Here’s a scientifically backed tip for finding the "right-est" person to marry: don’t marry your high school sweetheart. It sounds harsh, but there is math behind the concept!
Suppose you’re 18, want to be married by 28, and are just starting to seriously date. The mathematically optimal strategy says: date casually (no commitments) until you’re 21–22 (about 3.7 years), keeping track of the best "candidate" you’ve met so far. Then, commit to the very next person who is clearly better than anyone from that initial sample.
You’ll end up with the objectively best partner out of everyone you would ever have dated with about a 37% probability—and your expected outcome is astonishingly close to as good as it could be. Don't get mad, that's the science!
How committed to science are you? I wasn’t that committed to the science, I’ll admit. There is a caveat, which we'll address below.
The Multi-Armed Bandit
The first two chapters of Algorithms to Live By revolve around one surprisingly powerful, but strange, idea: almost every hard choice in life can be modeled as a row of slot machines in a casino! Each machine has a hidden payout rate, and you have a limited number of coins (time, money, attention, life) to spend across them.
Your goal: maximize total reward.
You contend with job offers, apartments, marketing channels, date-night spots, parenting techniques, hobbies. Consider each of those domains to be a hallway lined with one-armed bandits (aka slot machines). The book uses this metaphor to tackle three universal questions:
- When do I stop looking and just choose? (The 37% Rule)
- Once I’ve chosen, how often should I keep experimenting vs. sticking with what works? (Explore vs. Exploit)
- When the stakes are high, how do I commit without being paralyzed by future regret? (Regret Minimization)
1. The 37% Rule – When to Stop Looking
In the most unforgiving situation, where you get only one look at each option and must decide on the spot – like house or apartment hunting in a hot market where you view a place and then immediately say “yes” or “no” – there is a mathematically proven solution!
In earlier times, it was known as the “Secretary Problem”. But the problem is so universal it maps to house-hunting, apartment shopping, dating and marriage, hiring staff, job hunting, buying, selling – any situation where you have many options that have to be evaluated serially and a decision has to be made on the spot.
After decades of math, the provably optimal strategy is staggeringly simple:
- Look at (but reject) the first ~37% of the total options.
- Note the best one in that sample.
- Then accept the very first option that beats your benchmark.
You get the single best candidate in the pool of candidates ~37% of the time, and your average outcome is remarkably close to the theoretical maximum!
This doesn’t guarantee you the best outcome every time – it only gets you the best about 37% of the time – but for going in blind and having to pick on the spot, no strategy does better!
And what makes it actually useful is that it scales! 10 options? Look at about 4. 1,000 options? Look at 370. 10,000? Look at 3,700. Then leap!
(Side note: It's also interesting that the same value, 37%, shows up both as the fraction you should sample and as your success probability—math’s little poetry!)
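If you want to see that 37% figure emerge, here's a minimal Monte Carlo sketch of the look-then-leap strategy. The candidate pool size, trial count, and function names are my own for illustration; ranks are shuffled at random, with 0 standing for the single best candidate:

```python
import random

def secretary_trial(n, cutoff, rng):
    """One run: shuffle candidate ranks (0 = best), reject the first
    `cutoff` candidates, then take the first one better than that sample."""
    ranks = list(range(n))
    rng.shuffle(ranks)
    best_seen = min(ranks[:cutoff], default=n)   # benchmark from the sample
    for r in ranks[cutoff:]:
        if r < best_seen:
            return r == 0        # committed: was it the overall best?
    return ranks[-1] == 0        # never beat the benchmark: stuck with the last

def success_rate(n=100, trials=20_000, seed=0):
    """Fraction of trials where look-then-leap lands the single best candidate."""
    rng = random.Random(seed)
    cutoff = round(n * 0.37)
    return sum(secretary_trial(n, cutoff, rng) for _ in range(trials)) / trials

print(success_rate())   # hovers near 0.37
```

Run it and the success rate lands right around 37%, just as the math promises.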
Real-world variations (37% of what)
- Time: Want to marry by 30, starting at 18? That’s 12 years, so explore from 18 to about 22.4, then commit to the next person better than anyone seen so far.
- Budget: $50,000 marketing budget: spend the first ~$18,500 testing wildly.
- Count: 25 viable apartments: force yourself to view 9 before making any offer.
Pro tip: Add hard constraints early (“must be < $2,000/mo, hardwood floors, < 45 min commute”) to shrink the pool, which shrinks the cost of the 37% exploration phase. Relaxing constraints later can improve the ceiling but lengthens the search.
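The “37% of what” arithmetic is a one-liner; this tiny helper (my own naming) reuses the budget and apartment figures from the list above:

```python
def exploration_cutoff(total, fraction=0.37):
    """Portion of a resource (time, money, count) to spend exploring
    before you start committing."""
    return total * fraction

print(exploration_cutoff(50_000))  # of a $50,000 budget: 18500.0
print(exploration_cutoff(25))      # of 25 apartments: 9.25, i.e. view 9 first
```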
The rule bends beautifully when reality is kinder than the worst-case model:
- If you can re-approach options you rejected, you can explore longer than 37%.
- If the candidates can reject your offer, you can shorten the exploration phase to increase the likelihood of being accepted.
- If you have an objective test that instantly disqualifies the bottom 50% (they're objectively worse than the average), you need far fewer interviews.
- If there's an additional cost for every search or delay, reducing the number of searches must be considered.
And that set of rule-benders is the caveat I referred to above. If we know what we're looking for (we have objective criteria), we're not blind to other options around us (options aren't evaluated in a vacuum), there is a degree of commitment (the decision isn't imminent), and the interest goes both ways (it's a relationship, so it's in our partner's interest, too), then our decisions need much less exploration than the no-information situation of 37%!
2. Explore vs. Exploit – Stick with What Works
Most of life isn’t one-shot. You can keep pulling different arms forever (restaurants, marketing channels, workout routines, friends, hobbies)!
In this context, “explore” is the concept that we’re not trying to make any sort of decision yet. We’re just learning, collecting information and making the comparisons we need for the future. “Exploit” is simply using the information gleaned in the “explore” phase.
The winning strategies all follow the same beautiful pattern:
- Early → explore heavily (you have no idea which arms are good).
- Middle → gradually shift toward exploiting your current winners.
- Late → mostly exploit, with tiny continued exploration (because the world changes and you don’t want to get stuck on a 1990s winner).
It’s so intuitive that we do this without even realizing it.
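One common way to implement that early-to-late schedule is epsilon-greedy with a decaying exploration rate. Here's a minimal sketch – the payout rates, decay schedule, and function name are all invented for illustration:

```python
import random

def decaying_epsilon_greedy(payout_rates, pulls=5000, seed=0):
    """Pull bandit arms with an explore probability that decays over time:
    heavy exploration early, mostly exploitation of the best-looking arm late."""
    rng = random.Random(seed)
    n_arms = len(payout_rates)
    counts = [0] * n_arms      # pulls per arm
    means = [0.0] * n_arms     # running average reward per arm
    total = 0.0
    for t in range(1, pulls + 1):
        eps = 1.0 / t ** 0.5   # explore probability shrinks as t grows
        if rng.random() < eps:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = 1.0 if rng.random() < payout_rates[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]     # update average
        total += reward
    return total / pulls

print(decaying_epsilon_greedy([0.2, 0.5, 0.8]))
```

The average reward lands well above the 0.5 you'd get pulling arms uniformly at random, because the schedule converges on the best arm while still peeking at the others occasionally.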
New City - Old City
We can see it in the 37% Rule. Our "intuitive math” might not be optimal, but we have something built in! Consider the restaurants in your city or town. When you move to a new city, you’re likely willing to try any local dining establishment that doesn’t look like it will kill you. When you know you’re not going to be in the city for much longer, you don’t visit unfamiliar diners…there would be no real benefit!
Restaurant chains take advantage of our desire to “exploit”, saving us the trouble of “exploring” when we’re in unfamiliar territory. It's the explore/exploit trade-off, packaged and sold.
Optimism Bonus
We also tend to give unexplored options an “optimism bonus”. Unless we’re extremely averse to change, we’re willing to give the unknown a little extra latitude. If we didn’t, we’d never assume that a different, unknown option could be better than what we have now…we’d be stuck. In math terms, when we aren’t sure about something, intuitively or with real values, we give it an Upper Confidence Bound (UCB): a score for what the potential upside could be. Here's the formula:
UCB = Average so far + c × √(ln(total tries across all options) / number of times you’ve tried this one)
Where the bigger c is, the more weight you give to under-explored options. Usually it’s 1.4–2.0. During the 37% exploration phase it can be as high as 6 to force trying many options. When exploration is over, it’s set to 0, so the option on the table has to beat the best average so far outright.
Now compare the UCB for the new option with the UCBs for the existing options and pick the winner!
Here’s how we likely use the formula in every-day life:
Should I try this? Only if (its average outcome so far + its optimism bonus, scaled by my willingness to be wrong) is greater than that same score for every other option.
You may notice that the fewer times you've tried something (i.e., the less data you have), the bigger the bonus. We intuitively and naturally try new things exactly when we don't have all the information yet!
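Here's a minimal sketch of that scoring rule in Python, using the standard UCB1 bonus, √(ln N / nᵢ), where N is the total number of tries across all options and nᵢ is the tries of option i. The function name and the numbers in the usage example are invented for illustration:

```python
import math

def ucb1_pick(averages, tries, total_tries, c=1.4):
    """Return the index of the option with the highest UCB score:
    average so far plus an optimism bonus that shrinks with more data."""
    def score(i):
        if tries[i] == 0:
            return float("inf")   # never tried: maximal optimism
        return averages[i] + c * math.sqrt(math.log(total_tries) / tries[i])
    return max(range(len(averages)), key=score)

# A well-known option averaging 0.6 over 100 tries vs. a barely tested
# option averaging 0.5 over 2 tries: the under-explored one wins the pick.
print(ucb1_pick([0.6, 0.5], [100, 2], 102))  # 1
```

Notice how the big bonus on the barely tested option outweighs its slightly lower average – exactly the “try new things when you lack information” behavior described above.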
Win-Stay, Lose-Shift
Win-Stay, Lose-Shift is the toddler strategy that actually works! It’s simple, and it’s surprisingly close to optimal in many real-world situations: keep doing what worked last time, but the moment it stops working, randomly try something new.
Consider how toddlers put everything in their mouths. They are exploring, exploring, exploring, and exploring without end! Well, until they regret it–and they learn quickly! That's win-stay, lose-shift.
The application extends beyond toddlers, though. When there are multiple vendors to choose from, try one and work with them as long as they are treating you well. When things don’t work out, try a different one. We all tend to have similar strategies! But again, the multi-armed bandits give us the math to demonstrate the optimal algorithms for the worst-case scenarios.
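That vendor strategy can be sketched as a tiny simulation, where each “vendor” is an arm with an invented success rate (the rates, pull count, and function name here are my own assumptions):

```python
import random

def win_stay_lose_shift(payout_rates, pulls=2000, seed=1):
    """Stay on the current arm after a win; hop to a random
    different arm after a loss. Returns the average reward."""
    rng = random.Random(seed)
    arm = rng.randrange(len(payout_rates))
    wins = 0
    for _ in range(pulls):
        if rng.random() < payout_rates[arm]:
            wins += 1                                   # win: stay put
        else:
            others = [a for a in range(len(payout_rates)) if a != arm]
            arm = rng.choice(others)                    # lose: shift randomly
    return wins / pulls

# One unreliable vendor (20% success) and one good one (80% success).
print(win_stay_lose_shift([0.2, 0.8]))
```

The average comes out noticeably better than the 0.5 you'd get choosing vendors at random, because losses naturally push you off the bad arm and wins keep you parked on the good one.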
3. Regret Minimization – Our Mindful Scorekeeper
I think it’s safe to say that the point of having strategies for “looking and leaping” for one-time adventures and for the ongoing scenarios of “explore-exploit” is to make the best decision we can with the information we have. And all of that is to minimize our total regret. If we have no personal regret it’s only because we made the best choices we could.
So to put that in a formula:
Regret = (what you could have gotten if you’d been perfect) − (what you actually got).
The algorithms that minimize regret over thousands of pulls are the same ones that feel wisest in life.
However, regret is asymmetric: we feel the downside of a loss more than the upside of an equivalent gain. So, we really need a new formula:
Felt regret = K × max(0, Regret)
Where K is our loss-aversion factor, which social scientists indicate is usually between 2 and 2.5.
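As a worked example – taking regret as the shortfall from the best possible outcome, and K = 2.25, the midpoint of the range above (the function name and the sample numbers are mine):

```python
def felt_regret(actual, best_possible, k=2.25):
    """Asymmetric regret: shortfalls loom larger by the loss-aversion
    factor k, and doing better than the benchmark costs nothing."""
    regret = best_possible - actual
    return k * max(0.0, regret)

print(felt_regret(70, 100))   # a 30-point shortfall feels like 67.5
print(felt_regret(100, 70))   # outperforming the benchmark: 0.0
```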
Math aside, a meaningful antidote to regret, especially felt regret, is Jeff Bezos's "Regret Minimization Framework”. Simply, when the decision feels huge, project yourself to age 80 and ask: “Which version of this story will make future-me wince harder—the risk I took and failed, or the risk I was too scared to take?” Bezos used this exact framework to leave Wall Street and start Amazon.
Making decisions with the right time-scale in mind is valuable!
Bringing It Home
Flooring Project as a Multi-Armed Bandit
Hundreds of samples, 10 flooring stores, 6 weeks to have flooring at the house, 3 store recommendations, 1 contractor ready to go and wedded bliss.
I did my best to treat the whole thing as a multi-arm bandit problem:
- Despite having only some initial “fuzzy” constraints on samples – cost, color, tone, style – we explored and quickly developed strict constraints. With the pool of possible flooring options drastically reduced, we got through the explore phase with only 8 or 10 more samples…then we leapt!
- Regret check: We didn’t get the cheapest flooring, we didn’t get the highest quality flooring. I was straightforward about pricing with our potential vendors. 80-year-old us will have no regrets about that purchase!
As I write this, the sub-floors are being prepped for new flooring. There is so much unfamiliar and sporadic noise that the dog is nervous, and every surface in the house has a new layer of dust. So there might be a little regret.
Next time we’ll be looking at how to tame the chaos! So until then, keep Aiming Up!
Cheat Sheet
Next time you’re overwhelmed by options, run this sequence:
- Is this a one-shot decision? → 37% Rule + absolute floor.
- Is this an ongoing part of life? → Use the New City → Old City test or Upper Confidence Bound.
- Does it feel terrifyingly high-stakes? → Felt Regret minimization + Bezos "80-year-old test."
That’s it. Three tools from the exact same mathematical root (the multi-armed bandit) that can help solve so much decision paralysis!
Digging Deeper
Upper Confidence Bound explained (YouTube)
The Secretary Problem (Wikipedia)
37% Rule Derivation: Example with dating (Website)
An interview with Jeff Bezos about regret-minimization (YouTube)
Optimal stopping with absolute thresholds (PDF)
The Prisoner's Dilemma (Wikipedia)
The Nash Equilibrium (Website)
The Nash Equilibrium Explained (Website)
Cooperation-based algorithms (Website)
“Tit-For-Tat” (Grokipedia)
“Win-Stay, Lose-Shift” (or Pavlov) (Website)

