Have you ever been shopping for a product online and had trouble deciding between two similar products? When that happens to me, I always look to the product ratings and reviews to help me decide. Unfortunately, sometimes those ratings can be hard to interpret. For example, suppose one product has just five ratings all of which gave it 5 stars and another product has ten ratings, nine of which are 5 stars and one of which is 4 stars. Which product is better?
One way to try to resolve questions like this is to look at the average rating. In this example, the first product has an average rating of 5 stars and the second product has an average of 4.9. So does that mean the first product is better? Maybe. But what if the first product had only a single 5 star rating? Should we really be confident that an average based on so little information is reliable enough to conclude that it is better, even though we know a lot more about the quality of the second product and it is also highly rated? Perhaps not.
What is Bayesian Inference?
Fortunately, probability theory provides a better method for answering questions like this. Let's suppose that x represents the true rating of a product we are interested in. If we could have every person on earth rate the product and average the ratings, then we would know the value of x exactly. The problem is, of course, that this is impossible. So, instead we might try to describe the probability that x takes on different values given the information that we have (which we will refer to as D for data). For example, if we have observed one 5 star rating, we might want to know whether it is likely that x is actually a 5 star product or a 4 star product or worse. We can visualize these probabilities using a function called a probability density. I have displayed two such functions below.
The horizontal axis represents the star rating and the vertical axis represents the relative likelihoods of different ratings. For example, you can see that both curves reach their highest values above x=4. This means that average ratings around 4 stars are the most likely outcomes, whereas values around 3 stars are much less likely. For the orange curve, the most likely outcomes are around 4.5 stars and values of 3 and below are extremely unlikely. This suggests the second product (orange) is likely the better one.
If you want to be more precise, you can compare areas under the curves. In this example, 54% of the area under the blue curve is to the right of x=4. This means there is a 54% probability that the true rating of the first product (blue) is above 4 stars. Meanwhile, the probability that the second product's (orange) true rating is above 4 stars is 82%. Again, this suggests that we can be more confident that the orange product is highly rated than the blue.
But how do we figure out what these functions should look like? Enter Bayes' law. Back in the 1700s, a statistician and Presbyterian minister named Thomas Bayes formulated a theorem, now known as Bayes' law, that says the following:

P(x|D) ∝ P(D|x) P(x)

or in words, the probability of x given the data is proportional to the probability of the data given x, times the probability of x. If this seems like Greek to you, that is OK; let's try to break it down.
The left hand side is a probability density function like the ones above that we would like to compute. It is sometimes called the posterior density because it represents our beliefs after we have collected some evidence. The right hand side tells us how to actually calculate this. You just need to know two things:
1. The likelihood of the observed data (the star ratings we do have) for each possible value of x, P(D|x).
2. The prior probability of x, P(x), which represents our beliefs about the true rating ahead of time.
Both of those quantities can be determined given some simple assumptions about the problem at hand. For example, in the absence of any ratings (the prior), you might assume that ratings between 1 and 5 are equally likely (or better yet, you might look at the distribution of ratings for similar products). There is also a probability distribution known as a multinomial distribution that is appropriate for describing the likelihood of getting different collections of star ratings. So, armed with those two pieces of information, you can compute the posterior. Actually working through the calculations can be a bit tedious, but the good news is that it can be done (usually with the help of a computer) and you don't have to because I have created an interactive calculator to do it for us! Note: It takes a little while to load.
This calculator gives you the option of choosing the number of 1, 2, 3, 4 and 5 star ratings for each product, the threshold you want to use when computing probabilities (i.e. if the threshold is 4 then it gives the probability that each product has a true rating above 4), and the prior strength (which determines how sensitive the probabilities are to new data). Feel free to give it a spin the next time you are trying to decide between two products on Amazon.
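If you are curious what this kind of calculation looks like under the hood, here is a minimal sketch in Python. It is not the calculator's actual code; it assumes a symmetric Dirichlet prior over the five rating categories (the prior strength is just the number of pseudo-counts added to each category), so the exact percentages it produces may not match the figures quoted above.

```python
import numpy as np

def prob_rating_above(counts, threshold=4.0, prior_strength=1.0, n_samples=100_000, seed=0):
    """Estimate the probability that a product's true rating exceeds `threshold`,
    given observed counts of 1-5 star ratings.

    Assumes a symmetric Dirichlet prior (prior_strength pseudo-counts per star level)
    combined with a multinomial likelihood, and approximates the posterior
    probability by sampling."""
    rng = np.random.default_rng(seed)
    stars = np.arange(1, 6)                              # possible ratings: 1, 2, 3, 4, 5
    posterior_params = np.asarray(counts, float) + prior_strength
    samples = rng.dirichlet(posterior_params, size=n_samples)
    true_ratings = samples @ stars                       # average rating implied by each sample
    return (true_ratings > threshold).mean()

# Product 1: five 5-star ratings.  Product 2: one 4-star and nine 5-star ratings.
print(prob_rating_above([0, 0, 0, 0, 5]))
print(prob_rating_above([0, 0, 0, 1, 9]))
```

Under these assumptions, the second product typically comes out ahead, in line with the comparison above, though the exact percentages depend on the choice of prior.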
Optimal reasoning and imperfect heuristics
With that admittedly technical (sorry!) introduction out of the way, we can get into the point of this post. There are many situations in life where, like the preceding example, we are forced to make decisions amidst uncertainty and incomplete information. For example, we often need to evaluate whether a claim is true or false based on a relatively small amount of evidence. In these circumstances, we often begin with some assumptions that may cause us to lean in one particular direction even without having collected any specific evidence (our prior). We then look at the evidence that is available to us and interpret it in light of some sort of internal mental model for what we would expect if the claim is true or false (the likelihood). And then we revise our beliefs in response to that new information to obtain a new set of beliefs (the posterior). In other words, anytime we use evidence to update our beliefs about a topic, we are using a form of inference that is very similar to the Bayesian approach that I outlined above.
It turns out that Bayesian inference is the optimal way to perform this sort of reasoning. You can show that, under mild assumptions, Bayesian inference will allow your beliefs to converge to the truth as quickly as possible, even if your priors are not quite right. So, if your goal is to learn the truth, you should absolutely use this Bayesian approach.
Unfortunately, actually working out the calculations to use Bayesian inference for every decision we face is impractical. For this reason, our brains learn mental shortcuts that perform inference much faster, but not quite as reliably. These mental shortcuts, called heuristics, are what we might think of as gut reactions or instincts or intuitions, and they often do a surprisingly good job of approximating the optimal approach. However, there are situations where these intuitions lead us astray and cause us to come to flawed conclusions. In his book "Thinking, Fast and Slow" (which I found fascinating), psychologist Daniel Kahneman documents example after example where our intuitions lead to inaccurate and even nonsensical conclusions. One such failure is the base rate fallacy, which leads even trained professionals to misinterpret medical test results.
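To get a feel for the base rate fallacy, consider a standard textbook-style illustration (the numbers here are made up for the sake of the example). Suppose a disease affects 1 in 1,000 people, and a test catches 99% of true cases but also gives a false positive 1% of the time. Out of 1,000 people tested, roughly 1 person is a true positive while about 10 healthy people test positive anyway, so a positive result actually implies only about a 1-in-11 chance (roughly 9%) of having the disease. Intuition tends to latch onto the 99% figure and ignore the low base rate, which is exactly the kind of error Bayes' law protects against.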
Confirmation bias in action
Confirmation bias is yet another example of our intuitions' failure to replicate Bayesian inference properly. To see how this works, let's consider a simple example. Suppose there is an unusually shaped coin and we would like to determine the probability p that it comes up heads. If we had all the time in the world, we could just flip the coin over and over and over and count up the fraction of heads that we observe. Unfortunately, that might take too long, so instead we might want to estimate the plausible range of values for p (which can be determined from the posterior density) based on a relatively small number of flips.
We can accomplish this by again using Bayesian inference. We start with a prior, then we collect some data and compute the likelihood before updating the posterior. Let's go through the steps below.
For the prior, we might assume that any value of p between 0 and 1 is equally likely. This is called an uninformative prior. It leads to a flat curve like the blue curve in the first plot below. This is what you would use if you have no reason to favor heads or tails, and it is like starting the experiment from a blank slate (0 heads and 0 tails).
On the other hand, if the shape of the coin suggests that one side is more likely than the other, we might start with a belief that favors particular values of p. This is called an informative prior. It is illustrated by the blue curve in the second plot below.
Here the prior is biased towards tails: the most probable values for the probability of heads p are close to 0. This is a bit like starting the experiment having already seen a few flips, with more tails (9) than heads (1).
We can then flip the coin a bunch of times (e.g. 200 times), and update our beliefs using Bayes law to obtain the posterior distributions plotted in orange. Notice that both posteriors are sharply peaked, which means that we have less uncertainty after collecting all that evidence. Also, the true value used for the simulation (0.75) is relatively close to the peak in both cases. This means that as we collect more evidence our beliefs are getting closer and closer to the truth. Notice however that the second scenario gives a slightly less accurate posterior because it started with the wrong assumption. It assumed that tails was more likely than heads when in reality heads was more likely. The good news is that those incorrect priors can be overcome with enough evidence. This is important: without confirmation bias, our beliefs will converge to the correct answer given enough (representative) evidence.
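If you would like to reproduce this kind of experiment yourself, here is a minimal sketch in Python. It is not the code used to generate the plots above; it simply assumes the priors are encoded as the "head start" pseudo-counts described earlier and applies the standard Beta-Binomial update.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_p = 0.75                                   # true probability of heads used in the simulation
flips = rng.random(200) < true_p                # 200 simulated coin flips (True = heads)
heads, tails = int(flips.sum()), int((~flips).sum())

# Priors expressed as pseudo-counts of heads and tails seen "ahead of time".
priors = {"uninformative (0H, 0T)": (0, 0), "biased towards tails (1H, 9T)": (1, 9)}

for name, (prior_h, prior_t) in priors.items():
    # Beta prior + binomial likelihood gives a Beta posterior.
    posterior = stats.beta(1 + prior_h + heads, 1 + prior_t + tails)
    lo, hi = posterior.interval(0.95)
    print(f"{name}: 95% of the posterior mass lies in [{lo:.2f}, {hi:.2f}]")
```

Both posteriors end up concentrated near 0.75, with the tails-biased prior pulled down slightly, matching the behavior described above.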
However, once we introduce confirmation bias the story changes. One way to simulate this is to simply count observations that agree with the initial assumption a bit more and/or count observations that disagree with the initial assumption a bit less. For example, below, I plotted the outcome where each tails counts twice as much as each heads. Notice that although the posterior is still sharply peaked, it is not centered around the true value, meaning that we are very confident but our beliefs are wrong. We think p is around 0.55 when it should be much higher.
If the bias toward tails is even stronger (tails counts for 4x as much as heads), then the discrepancy is even more pronounced.
In this scenario we could even convince ourselves that the coin favors tails even though heads comes up 75% of the time.
To make matters worse, even if your priors are unbiased, interpreting the evidence in a biased way will still cause you to converge to the wrong answer, as seen below.
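Here is a rough sketch of how that biased updating might be simulated, again in Python. The way the bias is modeled here (simply multiplying the weight of the tails observations) is my own assumption for illustration, not the author's exact simulation, so the numbers will not match the plots exactly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_p = 0.75
flips = rng.random(200) < true_p
heads, tails = int(flips.sum()), int((~flips).sum())

# Flat prior, but observations that "agree" with an initial lean towards tails
# are over-counted by a factor of tail_weight (1 = no confirmation bias).
for tail_weight in (1, 2, 4):
    posterior = stats.beta(1 + heads, 1 + tail_weight * tails)
    mode = heads / (heads + tail_weight * tails)
    print(f"tails weighted {tail_weight}x: posterior peaks near p = {mode:.2f}")
```

Even though the simulated coin comes up heads about 75% of the time, over-counting the tails drags the posterior peak down, and with a strong enough bias it can land below 0.5.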
In other words, confirmation bias short-circuits our ability to properly weigh the evidence and identify the true answer. We lose all guarantees that our beliefs will converge to the truth, and unless our biases happen to point in the right direction, we are unlikely to arrive at the correct answer. In that case, it is not the evidence that shapes our conclusions; it is just our preconceived biases.
Wisdom from Proverbs
The stakes may seem very low when we are talking about coin flips, but this is a metaphor for how we evaluate truth claims in general. If you replace "probability of flipping heads" with "probability that a story is true" then the stakes are much higher. Our willingness to weigh the evidence objectively can determine whether we will be able to separate the truth from fake news, distinguish conspiracies from conspiracy theories, and discern between trustworthy people and con men.
Unfortunately, it can be difficult for us to recognize our own biases. That is why it is important to get advice from others. Sometimes they can see what we do not. Perhaps this is what King Solomon had in mind when, in the book of Proverbs, he addressed the importance of wise counsel, saying:
Proverbs 12:15 The way of a fool is right in his own eyes, but a wise man listens to advice.
I think Solomon would agree that to put too much trust in your priors without objectively considering the evidence is to be a fool. The wise man considers the possibility that he just might be wrong and is therefore willing to listen and even to seek out the counsel of others.
When we do that, it is important that we be willing to listen to a diversity of opinions. If we only listen to those who confirm our biases, then their advice adds nothing. It is like double counting the evidence that supports our priors, which, as we observed previously, can actually hinder our ability to ascertain the truth. Again, the book of Proverbs affirms this, saying:
Proverbs 11:14 Where there is no guidance, a people falls, but in an abundance of counselors there is safety.
In trying times like these, wise counsel is more important than ever. Let's hope that all of us, especially our leaders, will be willing to seek it out, and perhaps, to be a bit more Bayesian in evaluating the evidence.