Optimizing an algorithm is important, both for real-world applications as well as to improve your problem solving skills. However, as computer hardware improves, most people end up neglecting this skillset in favor of a lazy approach since the computational time difference may only be a few seconds. I have been working through problems on Project Euler and have found myself falling into this trap. (Of course, there are problems where a brute force approach is perfectly acceptable, especially if there’s not a closed form solution. More on this later.)
However, I came across a simple little problem that I found was ripe for optimization when I started coding it up. I’ll start by showing code for a direct implementation of the problem, followed by code that only focuses on the right answers and has a few other tweaks to minimize execution time.
The Monty Hall Problem
The Monty Hall Problem (named after Monty Hall, the host of Let’s Make a Deal) is a veridical paradox that produces a result that appears absurd but is demonstrated to be true nevertheless.
The following question was posed to Marilyn vos Savant in 1990:
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats.
You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat.
He then says to you, “Do you want to pick door No. 2?”
Is it to your advantage to switch your choice?
Her response was that it is indeed advantageous to always switch, which in fact doubles the chances of winning the car. A lot of people didn’t believe her (after all, wouldn’t it be a 50/50 chance since there are two doors left?), and even when the logic was mapped out, it took running the problem a few million times through a computer until some people were convinced.
BTW: The answer is that you will win 66% of the time if you switch, and only 33% of the time if you stay.
Step 1: Direct Implementation
Let’s code the problem up as if it were occurring in real life. That is to say we are going to procedurally code the logic as if we were taking part in the show. This is the simplest way to go because we have a discrete number of steps to follow.
switch_wins = keep_wins = 0 # Read in number of iterations from command line, # otherwise default to 500K iterations = (ARGV.empty? ? 500000 : ARGV[0].to_i) iterations.times do # Hide the car car = rand(3) # Pick a door choice = rand(3) # Reveal a "goat" door. This will be done using some array logic. # Basically we are going to take an array of possible doors and subtract out # both their initial choice as well as the car's location # What is left are viable "goat" doors we can show. revealed_goat_door = ([0, 1, 2] - [car] - [choice]).first # Choose the other door and see if we win new_choice = ([0, 1, 2] - [revealed_goat_door] - [choice]).first if car == new_choice switch_wins += 1 end # Otherwise, see if staying would have won if car == choice keep_wins += 1 end end puts "-- After #{iterations} iterations of each strategy --" puts "Wins from switching: #{((switch_wins.to_f / iterations.to_f) * 100.0).round(2)}%" puts "Wins from staying: #{((keep_wins.to_f / iterations.to_f) * 100.0).round(2)}%."
Let’s take a look at the execution time. I’m running this script on my laptop using the built in “time” command and running it for increasing number of iterations (note I ran each one several times and averaged out the results to account for CPU load, etc):
> time ruby monty_hall_1.rb 500000 -- After 500000 iterations of each strategy -- Wins from switching: 66.65% Wins from staying: 33.35%. real 0m3.603s user 0m3.319s sys 0m0.022s > time ruby monty_hall_1.rb 1000000 -- After 1000000 iterations of each strategy -- Wins from switching: 66.64% Wins from staying: 33.36%. real 0m7.111s user 0m6.592s sys 0m0.032s > time ruby monty_hall_1.rb 2000000 -- After 2000000 iterations of each strategy -- Wins from switching: 66.68% Wins from staying: 33.32%. real 0m13.534s user 0m13.041s sys 0m0.039s
Step 2: Let’s Focus on the “Right” Answers Only
In addition to not mucking about trying to figure out which door to reveal, we can simplify the logic to only check for the two possible win scenarios. These are:
1) The player picked the car correctly on the first try and did not switch.
2) The player picked a goat on the first try and did switch.
Technically the actual door choices, location of the car, etc don’t matter…only whether or not the above conditions are met. Furthermore, the placement of the car doesn’t need to be random, if the player’s choice is already random then there is always a 1/3 chance of guessing the right door, so that can be treated as a constant.
keep_wins = switch_wins = 0 car = 2 # Read in number of iterations from command line, # otherwise default to 500K iterations = (ARGV.empty? ? 500000 : ARGV[0].to_i) iterations.times do # Pick a door choice = rand(3) # Test the "stay" strategy if car == choice # Picked the car, and stuck with it keep_wins += 1 end # Test the "switch" strategy if car != choice # Picked a goat, then switched switch_wins += 1 end end puts "-- After #{iterations} iterations --" puts "Wins from switching: #{((switch_wins.to_f / iterations.to_f) * 100.0).round(2)}%" puts "Wins from staying: #{((keep_wins.to_f / iterations.to_f) * 100.0).round(2)}%."
Let’s take a look at the execution time. I’m running this script on my laptop using the built in “time” command and running it for increasing number of iterations (note I ran each one several times and averaged out the results to account for CPU load, etc):
> time ruby monty_hall_2.rb 500000 -- After 500000 iterations -- Wins from switching: 66.71% Wins from staying: 33.29%. real 0m0.273s user 0m0.243s sys 0m0.008s > time ruby monty_hall_2.rb 1000000 -- After 1000000 iterations -- Wins from switching: 66.63% Wins from staying: 33.37%. real 0m0.483s user 0m0.467s sys 0m0.009s > time ruby monty_hall_2.rb 2000000 -- After 2000000 iterations -- Wins from switching: 66.65% Wins from staying: 33.35%. real 0m0.938s user 0m0.906s sys 0m0.009s
The Results: 93% more faster!!
By eliminating unnecessary random number generation, filtering of doors by focusing on the “win” conditions in an abstract manner, we were able to drop execution time significantly (it only taks 7% as long as the original problem). Sure, the difference of a few seconds may seem trivial for something like this, but if there were a script that was run 1,000 or 10,000 times a day, then the savings can be well worth it.
A real world example of this occurred at a photography company I used to work for. They had several scripts that assisted the staff in the processing and grouping of school portraits into “packages” (2 8×10, 5 5x7s, 10 wallets) based on how many photos of each child were taken, how many were “good”, etc. Initially the script only took around 5 seconds to run per child, but it was run all day long by a room full of staff for every child that had a photo taken by the company. Every time a staff member ran the script, they had to wait for it to complete before moving to the next child.
This amounted to a couple million executions per year, so the total execution time per year was around 2,700 hours. Just shaving 1 second off the execution time translated into around 500 hours of saved time, which is significant. I think we managed to get the execution down to an average of 2 or 3 seconds via logic improvements and hardware upgrades, which resulted in faster turnaround times, less overtime, and a happier staff.
References:
Leave a comment