Calculate the independence of two events using both analytic and empirical techniques. I’m trying to assess whether the probability of having a meal classified as “dinner” depends on whether that meal includes “chicken” as its main dish.
The options for the main dish:
The options for the side dishes:
All possible combinations to create a meal:
dishes <- data.frame(
main = c("chicken", "salmon", "pork", "chicken", "pancakes", "french toast"),
side = c("salad", "salad", "green beans", "corn", "carrots", "bacon")
)
possible_meals <- dishes %>%
cross_df() %>%
mutate_if(is.factor,as.character)
possible_meals
# A tibble: 36 × 2
main side
<chr> <chr>
1 chicken salad
2 salmon salad
3 pork salad
4 chicken salad
5 pancakes salad
6 french toast salad
7 chicken salad
8 salmon salad
9 pork salad
10 chicken salad
# … with 26 more rows
Event a will be, “the main course is chicken.” What is its probability?
# a = main course is chicken
# tally the number of meals that include chicken
sum(possible_meals$main == "chicken") / nrow(possible_meals)
[1] 0.3333333
So, p(a)
= 0.333. Event b will be, “the meal is dinner.” What is its probability?
# b = the meal is dinner (rather than breakfast)
# tally the number of meals that are dinners rather than breakfast
# any meals with pancakes, french toast, or bacon are not dinner
# number of meal options for 'main" X number of meal options for 'side'
(
sum(dishes$main != c('pancakes', 'french toast')) / nrow(dishes)
*
sum(dishes$side != "bacon") / nrow(dishes)
)
[1] 0.5555556
So, p(b)
= 0.555. If a and b are independent, then p(b)
should be the same as p(b | a)
. Does the probability of eating a meal classified as dinner depend on whether that meal includes chicken?
First, the analytic solution.
p(b | a) = p(b & a) / p(a)
p(dinner | chicken) = p(dinner & chicken) / p(chicken)
I need to find p(dinner & chicken)
to solve. So tally the possible ways chicken can combine with other dishes to create a dinner platter.
tally_count <- 0
for(i in 1:nrow(possible_meals)){
meal_df <- possible_meals[i,]
contain_chicken <- meal_df$main == "chicken"
no_bacon <- meal_df$side != "bacon"
if(contain_chicken == T && no_bacon == T){tally_count <- tally_count + 1}
}
tally_count / nrow(possible_meals)
[1] 0.2777778
Cool, p(dinner & chicken)
= 0.2777. Now I can calculate the conditional probability.
p(dinner | chicken) = p(dinner & chicken) / p(chicken)
X = 0.2777 / 0.333
X = 0.83
X does not equal p(b)
, so the two are dependent. How about the empirical solution?
# what is the empirical estimate of p(dinner | chicken)?
# to calculate, I need:
# p(dinner & chicken) / p(chicken)
sims <- 10000
df <- data.frame(
chicken_and_dinner = c(rep(0, sims)),
chicken = c(rep(0, sims))
)
for(j in 1:sims){
eat_main <- sample(dishes$main, 1, replace = F)
eat_side <- sample(dishes$side, 1, replace = F)
chicken_and_dinner <- F
if(eat_main == "chicken" &&
(eat_side == "salad" | eat_side == "green beans" | eat_side == "corn" | eat_side == "carrots")){
chicken_and_dinner <- T
}
chicken <- F
if(eat_main == "chicken"){chicken <- T}
single_run_result <- c(chicken_and_dinner, chicken)
df[j, "chicken_and_dinner"] <- chicken_and_dinner
df[j, "chicken"] <- chicken
}
tally_chicken_and_dinner <- sum(df$chicken_and_dinner == 1)
tally_chicken <- sum(df$chicken == 1)
prob_cd <- tally_chicken_and_dinner / sims
prob_c <- tally_chicken / sims
prob_cd / prob_c
[1] 0.8368014
Bo\(^2\)m =)