Monty Hall Generalized (in R)


The Monty Hall problem is a classic problem in game theory. It's based on the game show Let's Make a Deal hosted by Monty Hall, and goes like this:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

The problem as quoted above was posed by statistician Steve Selvin in 1975. With a few steps of reasoning, we can see that the strategy of always switching your choice dominates the strategy of always staying with your initial pick. If you always switch, you have twice the chance of winning the car.

What if we didn't want to limit ourselves to 2 "bad" doors and 1 "good" door? We could generalize the game to have \(k\) total doors, \(a\) doors revealed after the initial pick, and \(h\) "good" doors.

Here's an implementation in R (available as a gist). Simulating two instances of the game for 10,000 runs, we see that always switching handily beats always staying.

# Simulation of general Monty Hall game

# R has unpredictable sampling from singletons.
# So we need conditional statements for edge cases:
#   only one possible door to reveal
#   only one possible door to switch to

# Parameters:
#   k: total number of doors
#   a: number of doors revealed
#   h: number of doors with prize
game = function(switch, k, a, h)
{
  doors = sample(c(rep('prize', h), rep('goat', k - h)), k)
  pick = sample(1:k, 1)
  remaining = setdiff(1:k, pick)
  goats = intersect(remaining, which(doors == 'goat'))
  
  if (length(goats) == 1) {
    revealed = goats
  } else {
    revealed = sample(goats, a)
  }

  hidden = setdiff(remaining, revealed)
  
  if (switch){
    if (length(hidden) == 1) {
      pick = hidden
    } else{
      pick = sample(hidden, 1)
    }
  }
  
  doors[pick] == 'prize'
}

# Conventional game with 3 doors, 1 being revealed and 1 having a prize
stay = sapply(1:10000, FUN=function(i) game(F, 3, 1, 1))
switch = sapply(1:10000, FUN=function(i) game(T, 3, 1, 1))
print(c(mean(stay), mean(switch)))
# 0.33 0.67

stay = sapply(1:10000, FUN=function(i) game(F, 10, 5, 2))
switch = sapply(1:10000, FUN=function(i) game(T, 10, 5, 2))
print(c(mean(stay), mean(switch)))
# 0.2 0.45
		

In the generalized setting, we view the standard game as having parameters \(k=3,a=1,h=1\).

Always switching is the dominant strategy even in the generalized setting, assuming that the doors are revealed in a truly random manner. We can prove this analytically.

First, we consider the probability of winning when always switching. If the initial pick is "good", which has probability \(\frac{h}{k}\), then there are \(k-a-1\) remaining doors out of which \(h-1\) are "good". If the initial pick is "bad", then \(h\) of the remaining doors are "good". So, the probability of winning when always switching is $$\frac{h}{k}\times\frac{h-1}{k-a-1}+\frac{k-h}{k}\times\frac{h}{k-a-1}=\frac{h(k-1)}{k(k-a-1)}$$ This is clearly greater than \(\frac{h}{k}\), which is the probability of winning when always staying.