Abrazolica


A Box Guessing Game

I have two identical boxes. One of them has 10 balls labeled 1 to 10. The other has 100 balls labeled 1 to 100. Let's call the one with 10 balls A and the one with 100 balls B. I want you to pick one of the boxes, remove a ball and look at the number. Now tell me if the box you picked is A or B.

If the number is greater than 10 you should immediately tell me the box is B. If it is less than or equal to 10 it could be either box. Since all the balls in A have numbers ≤ 10 while only 10% of the balls in B do, it is more likely the ball came from A. Exactly how much more likely? That is the question.

Before drawing the ball, the probability that the box you picked is A is 0.5. Let L be the event that you draw a ball with number ≤ 10. Then from Bayes' theorem we have

\[P(A|L)=\frac{P(L|A)P(A)}{P(L)}\]

\(P(L)=P(A,L)+P(B,L)\)
\(P(A,L)=P(L|A)P(A)\)
\(P(B,L)=P(L|B)P(B)\)

so the equation for \(P(A|L)\) is

\[P(A|L)=\frac{P(L|A)P(A)}{P(L|A)P(A)+P(L|B)P(B)}\]

where \(P(A)=P(B)=1/2\), \(P(L|A)=1\) and \(P(L|B)=10/100=1/10\). With these values we get \(P(A|L)=10/11=0.909\ldots\). The probability goes from 50% to over 90% after observing L, so the box is very likely A.

But I see that you're the cautious type and a 90% probability isn't good enough for you. So let's put the ball back in the box, shake it up a bit, and choose another ball. Once again the result is L. Now what's the probability you chose box A?

The calculation is the same as above but now we have \(P(A)=10/11\), \(P(B)=1/11\). The new probability is then \(P(A|L)=100/101=0.990099\ldots\). That's as close to certain as you're going to get.

For fun, look at the general case where A has n balls, B has m > n balls, and \(L_{k}\) is the event that you draw a ball k times from one of the boxes with replacement and each time the number is ≤ n. Show that the probabilities are

\[P(A|L_{k})=\frac{1}{1+x^{k}}\]

\[P(B|L_{k})=\frac{x^{k}}{1+x^{k}}\]

where \(x=n/m\).
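If you want to check your answer numerically, here is a small Python sketch. The function name `posterior_a` is just something made up for this illustration. It applies Bayes' theorem once per draw, feeding each posterior back in as the next prior, and prints the result next to the closed form \(1/(1+x^{k})\). Exact fractions are used so that values like 10/11 and 100/101 come out exactly.

```python
from fractions import Fraction


def posterior_a(n, m, k):
    """P(A | L_k): box A has n balls, box B has m > n balls, and k draws
    with replacement all showed a number <= n."""
    p_a = Fraction(1, 2)        # prior: both boxes equally likely
    like_a = Fraction(1)        # P(L|A): every ball in A is <= n
    like_b = Fraction(n, m)     # P(L|B): only n of B's m balls are <= n
    for _ in range(k):
        p_b = 1 - p_a
        # Bayes' theorem; the posterior becomes the prior for the next draw
        p_a = like_a * p_a / (like_a * p_a + like_b * p_b)
    return p_a


if __name__ == "__main__":
    x = Fraction(10, 100)
    for k in range(1, 4):
        # iterated update on the left, closed form on the right
        print(k, posterior_a(10, 100, k), 1 / (1 + x**k))
```

For k=1 and k=2 the two columns agree with the values 10/11 and 100/101 found above.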


Probability of Meeting Someone with COVID-19

Currently in Colorado (where we live) about 1 person in 50 has COVID-19. So let \(p=1/50\) be the probability that when you meet someone in Colorado, they have COVID-19.

The premise behind this assumption is that you have an equal probability of being exposed to everyone in the population. This is generally not true. People tend to confine themselves to subpopulations, and the probability of COVID-19 in different subpopulations can vary greatly. We just want to do a simple analysis so we'll assume it's true.

So the question is, if you come into contact with \(n\) people, what is the probability that at least one of them has COVID-19? Treating the contacts as independent, the probability that none of them has it is \((1-p)^n\). Therefore the probability that at least one of them has it is

\(1-(1-p)^n\)

The value of \(n\) at which this probability reaches \(0.5\), i.e. at which you have an even chance of contacting someone with COVID-19, is

\(n=-\log(2)/\log(1-p)\)

For Colorado, this number is currently about 34.3, so it takes 35 contacts to push the probability past one half.

The average number of people you have to come into contact with to meet someone with COVID-19 is \(1/p\). For Colorado right now this is \(50\).
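The numbers above are easy to reproduce. The following Python sketch uses function names invented for this example, and it keeps the same simplifying assumption that every contact independently has COVID-19 with probability \(p\).

```python
import math


def p_at_least_one(p, n):
    """Probability that at least one of n independent contacts is infected."""
    return 1 - (1 - p) ** n


def contacts_for_even_chance(p):
    """Real-valued n at which p_at_least_one(p, n) equals 0.5."""
    return -math.log(2) / math.log(1 - p)


if __name__ == "__main__":
    p = 1 / 50
    n_half = contacts_for_even_chance(p)
    print(f"n for an even chance: {n_half:.2f}")
    # the probability just below and just above the crossover
    for n in (math.floor(n_half), math.ceil(n_half)):
        print(n, round(p_at_least_one(p, n), 4))
    print("mean number of contacts until the first infected one:", 1 / p)
```

Changing `p` lets you redo the calculation for a different prevalence.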

Below is a plot of the probability of meeting someone with COVID-19 as a function of \(n\) given that \(p=1/50\).

Probability of contacting someone with COVID-19 as a function of \(n\) given that \(p=1/50\)

A Rock and Classical Music Mashup

Let's create new music by hybridizing Beethoven's 5th Symphony and Deep Purple's Smoke on the Water using a probabilistic automaton.

You might ask "How did you come up with such an idea?" Well, Deep Purple's guitarist Ritchie Blackmore, who created the central melody of Smoke on the Water, says the melody is derived from Beethoven's 5th, and that he owes Beethoven "a lot of money".

The use of automata for creating melodies is covered in our book Creating Melodies, and the methods in this post come from there.

We start by converting sheet music of the 1st movement of Beethoven's 5th Symphony to scientific pitch notation:

Notes: z [G3G4] [G3G4] [G3G4] [E3E4] z [F3F4] [F3F4] [F3F4] [D3D4] z G4 G4 G4 z A4 A4 A4 z E5 E5 E5 C5 G4 G4 G4 z A4 A4 A4 z F5 F5 F5 D5 G5 G5 F5 [E5D5] G5 G5 F5
Rhythm: 1 1 1 1 4 1 1 1 1 8 1 1 1 1 1 1 1 1 1 1 1 1 5 1 1 1 1 1 1 1 1 1 1 1 5 1 1 1 5 1 1 1

From there, we create an abc notation file consisting of the following:
X: 1
T: Symphony No 5, 1st movement
C: Original work: Ludwig van Beethoven
M: 2/4
K: none
Q: 140
L: 1/8
%%MIDI program 0
z1[G,1G1][G,1G1][G,1G1][E,4E4]z1[F,1F1][F,1F1][F,1F1][D,8D8]z1G1G1G1z1A1A1A1z1e1e1e1c5G1G1G1z1A1A1A1z1f1f1f1d5g1g1f1[e5d5]g1g1f1[e5d5]
Then we create a MIDI file from the abc file with the command:
abc2midi bvn5th.abc -o bvn5th.mid

You can listen to the result here.
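The step from the Notes/Rhythm lists to the abc note line can be automated. The Python sketch below is only an illustration and not the tool we used: the function names are made up, and it assumes the input looks like the lists above (rests written as z, chords in square brackets, single-digit octaves, an optional ^, _ or = accidental in front of the letter). It relies on the abc convention that C is middle C (C4), lowercase c is one octave higher, and each trailing comma drops an octave.

```python
import re

# One pitch in scientific pitch notation: optional abc-style accidental
# prefix (^ sharp, _ flat, = natural), a letter A-G, and an octave digit.
PITCH = re.compile(r"([\^_=]?)([A-G])(\d)")


def pitch_to_abc(accidental, letter, octave):
    """Convert one pitch to abc. C4 -> C, C5 -> c, C3 -> C, (with comma)."""
    octave = int(octave)
    if octave >= 5:
        return accidental + letter.lower() + "'" * (octave - 5)
    return accidental + letter + "," * (4 - octave)


def token_to_abc(token, duration):
    """Convert a rest 'z', a single pitch 'G4', or a chord '[G3G4]'."""
    if token == "z":
        return f"z{duration}"
    notes = [pitch_to_abc(*m.groups()) + str(duration)
             for m in PITCH.finditer(token)]
    return "[" + "".join(notes) + "]" if token.startswith("[") else notes[0]


def to_abc_body(notes, rhythm):
    """Pair each note token with its duration and join into one abc line."""
    return "".join(token_to_abc(t, d)
                   for t, d in zip(notes.split(), rhythm.split()))


if __name__ == "__main__":
    # the first five tokens of the Beethoven lists above
    print(to_abc_body("z [G3G4] [G3G4] [G3G4] [E3E4]", "1 1 1 1 4"))
```

Run on the first five tokens, this prints `z1[G,1G1][G,1G1][G,1G1][E,4E4]`, which is how the note line in the abc file begins.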

Now we do the same with the first four measures of Smoke on the Water whose notes and rhythm are:

Notes: [D3G3] [F3^B3] [G3C4] z [D3G3] z [F3^B3] [^A3^D4] [G3C4] [D3G3] [F3^B3] [G3C4] [F3^B3] [D3G3]
Rhythm: 2 2 2 1 1 1 2 1 4 2 2 2 2 7

As before, from there we create an abc notation file consisting of the following:
X: 1
T: Smoke on the Water
C: Original work: Deep Purple
M: 4/4
K: none
Q: 140
L: 1/8
%%MIDI program 0
[D,2G,2][F,2_B,2][G,2C2]z1[D,1G,1]z1[F,2_B,2][_A,1_D1][G,4C4][D,2G,2][F,2_B,2][G,3C3][F,2_B,2][D,7G,7]
[D,2G,2][F,2_B,2][G,2C2]z1[D,1G,1]z1[F,2_B,2][_A,1_D1][G,4C4][D,2G,2][F,2_B,2][G,3C3][F,2_B,2][D,7G,7]
Again, we create a MIDI file from the abc file with the command:
abc2midi smoke.abc -o smoke.mid

You can listen to the result here.

Now to hybridize the two works, we start by creating an automaton of each work. As our book describes in more detail, an automaton can be constructed for each work from the notes listed above. The automata are shown below.

Beethoven's Fifth automaton


Smoke on the Water automaton

The next step is to make the hybrid automaton. To do that, we need to connect the above two automata to make a single larger one. This can clearly be done in more than one way. Because we want the resulting music to sound good, experimentation is in order. A guiding principle is that the connection between the two needs to go both ways, so we don't get stuck in one or the other automaton. Below is one possible connection that seems reasonable.

Hybrid automaton

The Smoke on the Water automaton is marked by a dashed line rectangle. Here we've made three connections between the automata: two that enter the Smoke on the Water automaton, and one that exits it.

With the above hybrid automaton we can now make a probabilistic automaton file to represent it. Below is the file hybrid01.pat which consists of the following lines:

16
0 (0,0,0.33333) (1,1,0.33333) (12,c,0.33333)
1 (2,2,1.0)
2 (2,2,0.5) (3,3,0.5)
3 (4,4,0.5) (14,e,0.5)
4 (4,4,0.5) (5,5,0.5)
5 (5,5,0.33333) (6,6,0.33333) (8,8,0.33333)
6 (6,6,0.5) (7,7,0.5)
7 (4,4,1.0)
8 (8,8,0.33333) (9,9,0.33333) (11,b,0.33333)
9 (10,a,1.0)
10 (10,a,0.5) (8,8,0.5)
11 (10,a,1.0)
12 (13,d,0.5) (14,e,0.5)
13 (2,2,0.25) (12,c,0.25) (14,e,0.25) (15,f,0.25)
14 (13,d,1.0)
15 (12,c,1.0)

Note that for simplicity we've given all transitions out of a given node equal probability. There are 16 nodes in the automaton numbered in the file from 0 to 15 with note correspondences shown below:

0 → G3G4
1 → E3E4
2 → F3F4
3 → D3D4
4 → G4
5 → A4
6 → E5
7 → G5
8 → F5
9 → D5
10 → G5
11 → E5D5
12 → G3C4
13 → F3B3
14 → D3G3
15 → A3D4

Note that we've accidentally created a redundancy with nodes 7 and 10 being the same. Those two nodes could be merged, but that only makes things more efficient, and doesn't affect the outcome, so we'll just leave it as is.
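To make the file format concrete, here is a rough Python sketch of a reader and random-walk generator for it. This is not the pautogen program. It assumes, from the listing above, that the first line is the number of states and that every other line is a state number followed by (next state, output symbol, probability) triples. All function names are made up for the example, and Python's random number generator will not reproduce pautogen's output for the same seed.

```python
import random
import re

PAT_TEXT = """\
16
0 (0,0,0.33333) (1,1,0.33333) (12,c,0.33333)
1 (2,2,1.0)
2 (2,2,0.5) (3,3,0.5)
3 (4,4,0.5) (14,e,0.5)
4 (4,4,0.5) (5,5,0.5)
5 (5,5,0.33333) (6,6,0.33333) (8,8,0.33333)
6 (6,6,0.5) (7,7,0.5)
7 (4,4,1.0)
8 (8,8,0.33333) (9,9,0.33333) (11,b,0.33333)
9 (10,a,1.0)
10 (10,a,0.5) (8,8,0.5)
11 (10,a,1.0)
12 (13,d,0.5) (14,e,0.5)
13 (2,2,0.25) (12,c,0.25) (14,e,0.25) (15,f,0.25)
14 (13,d,1.0)
15 (12,c,1.0)
"""

TRANSITION = re.compile(r"\((\d+),(\w),([\d.]+)\)")


def parse_pat(text):
    """Return {state: [(next_state, symbol, probability), ...]}."""
    lines = text.strip().splitlines()
    automaton = {}
    for line in lines[1:]:
        state, rest = line.split(maxsplit=1)
        automaton[int(state)] = [(int(nxt), sym, float(p))
                                 for nxt, sym, p in TRANSITION.findall(rest)]
    assert len(automaton) == int(lines[0])  # state count from the first line
    return automaton


def generate(automaton, length, start, seed):
    """Random walk of `length` steps from `start`, one symbol per step."""
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(length):
        edges = automaton[state]
        nxt, sym, _p = rng.choices(edges, weights=[e[2] for e in edges])[0]
        out.append(sym)
        state = nxt
    return "".join(out)


def accepts(automaton, start, string):
    """True if `string` can be produced by a walk beginning at `start`."""
    state = start
    for sym in string:
        moves = [nxt for nxt, s, _p in automaton[state] if s == sym]
        if not moves:
            return False
        state = moves[0]  # in this file each symbol is unique per state
    return True


if __name__ == "__main__":
    automaton = parse_pat(PAT_TEXT)
    print(generate(automaton, 24, 0, seed=4886))
```

The `accepts` helper is handy during the hand-editing stage, since it tells you whether an edited note string still follows the rules of the automaton.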

To create an output string of length 24 with the above defined probabilistic automaton, with start state 0, and a random seed, we run the command:
seed=$RANDOM;echo $seed;pautogen hybrid01.pat 24 0 $seed

which outputs for example the lines:

4886
000122234555566744444456

with the 1st line being the random number seed used, and the second line the output string. Now we need to produce a rhythm for the 24 notes above.

We'll just use the Smoke on the Water rhythm, which has length 14, and repeat it once to make a rhythm string of length 28. Our note string has length 24, so we'll add 4 more notes to match the length of the rhythm string, while simultaneously imposing more symmetry on the note string by changing a few notes. Note that this is an "art" part of the process, involving listening to the result, then going back and changing the note string, and repeating until it sounds satisfying. The final note string may not even satisfy the rules of the automaton, but what counts, after all, is that the end result is something you like.

Now we paste the note strings and rhythm strings into a .str file (hybrid01.str), including our list of notes/chords used. Below is the complete file:

"Beethoven's 5th - Smoke on the Water" hybrid automaton variations
Stefan and Richard Hollos
Abrazol Publishing
16
[G,G] [E,E] [F,F] [D,D] G A e g f d g [ed] [G,C] [F,^B,] [D,G,] [A,^D]
0001211123455556674444456456 2221112142222722211121422227 2

For details on the format of .str files see the "Input file formats" section of the Software chapter near the end of the Creating Melodies book, or online here.

We create an abc file and MIDI file with the following commands:
str2abc hybrid01.str 0 C 100 1/16 > hybrid01.abc
abc2midi hybrid01.abc -o hybrid01.mid

You can listen to the result here.


Now let's generate another note string using our probabilistic automaton file, but this time, for variety, instead of starting at state 0, which is in the Beethoven's 5th part of the automaton, we'll start in the Smoke on the Water part, at state 13. We'll specify a string length of 28 to match the rhythm string length we've been using. The command is then:
seed=$RANDOM;echo $seed;pautogen hybrid01.pat 28 13 $seed
which gives us an output like this:
18274
edfcdedfcedced2223edced223ed
Adding this note string to the file hybrid01.str, calling it hybrid01b.str and alternating it with the note string already in the file, our file becomes:
"Beethoven's 5th - Smoke on the Water" hybrid automaton variations
Stefan and Richard Hollos
Abrazol Publishing
16
[G,G] [E,E] [F,F] [D,D] G A e g f d g [ed] [G,C] [F,^B,] [D,G,] [A,^D]
edfcdedfcedced2223edced223ed 2221112142222722211121422227 2
0001211123455556674444456456 2221112142222722211121422227 2
edfcdedfcedced2223edced223ed 2221112142222722211121422227 2
0001211123455556674444456456 2221112142222722211121422227 2

This makes for another nice tune.

Let's add a drum, by using the following abc file as a model (a Soukous rhythm from our book Creating Rhythms):

X: 1
T: bdf002
M: 16/4
K: C
Q: 480
V:1 clef=perc
L: 1/4
%%MIDI channel 10
%%MIDI drummap A 45
| A3A3A4AA5 | A3A3A4AA5 | A3A3A4AA5 |
V:2 clef=perc
L: 1/4
%%MIDI channel 10
%%MIDI drummap B 50
| z2B3B3B4BB3 | z2B3B3B4BB3 | z2B3B3B4BB3 |
Some of the info in that abc file was added to hybrid01b.abc, and the result was saved as hybrid02.abc. Now, as before, we convert the abc file to MIDI with the command:
abc2midi hybrid02.abc -o hybrid02.mid

You can listen to the result here.

Note that in hybrid02.abc we also slightly slowed the tempo (Q: parameter) from 100 to 90, since we liked it better. Reference info for drum mapping can be found here.

Now let's start from scratch to make another hybrid melody, putting it in hybrid03.str. We generate a couple of note strings using the automaton like this:

seed=$RANDOM;echo $seed;pautogen hybrid01.pat 35 13 $seed
which produced the output:
9291
edcdededfcdcd223eded23444444589aa89
seed=$RANDOM;echo $seed;pautogen hybrid01.pat 15 13 $seed
19789
cededcd23edfcde

Note that the first string is length 35, while the 2nd is length 15. We will end up with two note strings of length 35: the shorter string is repeated, and an additional 5 notes are added to bring it to length 35. In the file below the first string is reworked the same way, with its first 15 notes repeated and 5 more notes added.
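Judging from the hybrid03.str listing further down, both 35-note lines are a 15-note pattern cycled until 35 notes are reached. A minimal Python sketch of that stretching step, with a helper name invented for the example, could look like this:

```python
from itertools import cycle, islice


def extend_pattern(pattern, length):
    """Repeat `pattern` cyclically and cut the result off at `length` symbols."""
    return "".join(islice(cycle(pattern), length))


if __name__ == "__main__":
    first = "edcdededfcdcd223eded23444444589aa89"[:15]  # first 15 notes of the long string
    second = "cededcd23edfcde"                           # the 15-note string as generated
    print(extend_pattern(first, 35))
    print(extend_pattern(second, 35))
```

Of course nothing forces you to extend the strings mechanically like this. As before, the final note strings are a matter of taste.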

This time we use a Son Clave rhythm which consists of the abc file:

X: 1
T: bdf000
M: 16/4
K: C
Q: 480
V:1 clef=perc
L: 1/4
%%MIDI channel 10
%%MIDI drummap A 45
| A3A3A4A2A4 | A3A3A4A2A4 | A3A3A4A2A4 |
V:2 clef=perc
L: 1/4
%%MIDI channel 10
%%MIDI drummap B 50
| B3B3B4B2B4 | B3B3B4B2B4 | B3B3B4B2B4 |

The note strings and the rhythm from above are incorporated into hybrid03.str which consists of:

"Beethoven's 5th - Smoke on the Water" hybrid automaton variations
Stefan and Richard Hollos
Abrazol Publishing
16
[G,G] [E,E] [F,F] [D,D] G A e g f d g [ed] [G,C] [F,^B,] [D,G,] [A,^D]
edcdededfcdcd22edcdededfcdcd22edcde 33424334243342433424334243342433424 2
cededcd23edfcdecededcd23edfcdeceded 33424334243342433424334243342433424 2
We create the abc file with the command:
str2abc hybrid03.str 0 C 100 1/16 > hybrid03.abc
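To see how a note string and a rhythm string combine into an abc note line, here is a small Python sketch. It is not the str2abc program, only our reading of its output: each symbol of the note string is treated as a hexadecimal index into the chord list, and the matching digit of the rhythm string is appended as the duration. The function name is made up, and the trailing number on each .str line and the command-line arguments are not modeled.

```python
def render_voice(note_string, rhythm_string, chords):
    """Look up each note symbol (a hex digit 0-f) in the chord list and
    append the matching duration digit, giving one abc note line."""
    if len(note_string) != len(rhythm_string):
        raise ValueError("note and rhythm strings must have the same length")
    return "".join(chords[int(sym, 16)] + dur
                   for sym, dur in zip(note_string, rhythm_string))


if __name__ == "__main__":
    chords = ("[G,G] [E,E] [F,F] [D,D] G A e g f d g [ed] "
              "[G,C] [F,^B,] [D,G,] [A,^D]").split()
    # first five notes and durations of the first line in hybrid03.str
    print(render_voice("edcde", "33424", chords))
```

This prints `[D,G,]3[F,^B,]3[G,C]4[F,^B,]2[D,G,]4`, which is how the first voice of hybrid03.abc begins.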

Now let's add a drum by editing the just created hybrid03.abc to make it look like this:

X: 1
T: "Beethoven's 5th - Smoke on the Water" hybrid automaton variations
C: Stefan and Richard Hollos
Z: Abrazol Publishing
M: 4/4
K: C
Q: 160
V: 1
L: 1/16
%%MIDI program 24
[D,G,]3[F,^B,]3[G,C]4[F,^B,]2[D,G,]4[F,^B,]3[D,G,]3[F,^B,]4[A,^D]2[G,C]4[F,^B,]3[G,C]3[F,^B,]4[F,F]2[F,F]4[D,G,]3[F,^B,]3[G,C]4[F,^B,]2[D,G,]4[F,^B,]3[D,G,]3[F,^B,]4[A,^D]2[G,C]4[F,^B,]3[G,C]3[F,^B,]4[F,F]2[F,F]4[D,G,]3[F,^B,]3[G,C]4[F,^B,]2[D,G,]4[D,G,]3[F,^B,]3[G,C]4[F,^B,]2[D,G,]4[F,^B,]3[D,G,]3[F,^B,]4[A,^D]2[G,C]4[F,^B,]3[G,C]3[F,^B,]4[F,F]2[F,F]4[D,G,]3[F,^B,]3[G,C]4[F,^B,]2[D,G,]4[F,^B,]3[D,G,]3[F,^B,]4[A,^D]2[G,C]4[F,^B,]3[G,C]3[F,^B,]4[F,F]2[F,F]4[D,G,]3[F,^B,]3[G,C]4[F,^B,]2[D,G,]4
[G,C]3[D,G,]3[F,^B,]4[D,G,]2[F,^B,]4[G,C]3[F,^B,]3[F,F]4[D,D]2[D,G,]4[F,^B,]3[A,^D]3[G,C]4[F,^B,]2[D,G,]4[G,C]3[D,G,]3[F,^B,]4[D,G,]2[F,^B,]4[G,C]3[F,^B,]3[F,F]4[D,D]2[D,G,]4[F,^B,]3[A,^D]3[G,C]4[F,^B,]2[D,G,]4[G,C]3[D,G,]3[F,^B,]4[D,G,]2[F,^B,]4[G,C]3[D,G,]3[F,^B,]4[D,G,]2[F,^B,]4[G,C]3[F,^B,]3[F,F]4[D,D]2[D,G,]4[F,^B,]3[A,^D]3[G,C]4[F,^B,]2[D,G,]4[G,C]3[D,G,]3[F,^B,]4[D,G,]2[F,^B,]4[G,C]3[F,^B,]3[F,F]4[D,D]2[D,G,]4[F,^B,]3[A,^D]3[G,C]4[F,^B,]2[D,G,]4[G,C]3[D,G,]3[F,^B,]4[D,G,]2[F,^B,]4
V: 2 clef=perc
L: 1/16
%%MIDI channel 10
%%MIDI drummap A 45
| A3A3A4A2A4 | A3A3A4A2A4 | A3A3A4A2A4 | z3z3z4z2z4 | A3A3A4A2A4 | A3A3A4A2A4 | A3A3A4A2A4 |
| z3z3z4z2z4 | A3A3A4A2A4 | A3A3A4A2A4 | A3A3A4A2A4 | z3z3z4z2z4 | A3A3A4A2A4 | A3A3A4A2A4 |
| A3A3A4A2A4 | A3A3A4A2A4 | A3A3A4A2A4 | z3z3z4z2z4 | A3A3A4A2A4 | A3A3A4A2A4 | A3A3A4A2A4 |
| z3z3z4z2z4 | A3A3A4A2A4 | A3A3A4A2A4 | A3A3A4A2A4 | z3z3z4z2z4 | A3A3A4A2A4 | A3A3A4A2A4 |
V: 3 clef=perc
L: 1/16
%%MIDI channel 10
%%MIDI drummap B 50
| B3B3B4B2B4 | B3B3B4B2B4 | B3B3B4B2B4 | z3z3z4z2z4 | B3B3B4B2B4 | B3B3B4B2B4 | B3B3B4B2B4 |
| z3z3z4z2z4 | B3B3B4B2B4 | B3B3B4B2B4 | B3B3B4B2B4 | z3z3z4z2z4 | B3B3B4B2B4 | B3B3B4B2B4 |
| B3B3B4B2B4 | B3B3B4B2B4 | B3B3B4B2B4 | z3z3z4z2z4 | B3B3B4B2B4 | B3B3B4B2B4 | B3B3B4B2B4 |
| z3z3z4z2z4 | B3B3B4B2B4 | B3B3B4B2B4 | B3B3B4B2B4 | z3z3z4z2z4 | B3B3B4B2B4 | B3B3B4B2B4 |

where the drum rhythms are periodically interrupted by a measure of rests (z's).

Now, as before, we convert the abc file to MIDI with the command:
abc2midi hybrid03.abc -o hybrid03.mid

You can listen to the result here, which we call Beethoven on the Water.


COVID-19 Testing

Medical tests are not perfect. Sometimes a person with the disease will test negative and sometimes a healthy person will test positive. The COVID-19 tests are no exception. They are by no means perfect. In fact some are pretty bad. So if I take one of these tests and it comes back positive, what are the chances I actually have the disease? That's the question I want to answer.

How well a medical test performs is specified by two parameters called the sensitivity and specificity of the test. To see what these parameters mean, suppose you give the test to a group of people you know are sick and a group of people you know are healthy. Let

N(S) = number of people in the sick group
N(H) = number of people in the healthy group

The results can then be summarized in the following table.

           S        H        Total
+          N(S+)    N(H+)    N(+)
-          N(S-)    N(H-)    N(-)
Total      N(S)     N(H)

N(S+) = number of sick people who test positive
N(H+) = number of healthy people who test positive
N(S-) = number of sick people who test negative
N(H-) = number of healthy people who test negative
N(S+) + N(S-) = N(S)
N(H+) + N(H-) = N(H)
N(+) = N(S+) + N(H+) = number of positive test results
N(-) = N(S-) + N(H-) = number of negative test results

If the test were perfect you would get N(S-) = N(H+) = 0, i.e. there would be no false negatives or false positives. From the numbers in the table you can get an estimate of the sensitivity and specificity of the test by looking at the fraction of sick people who test positive and the fraction of healthy people who test negative.

sensitivity = N(S+)/N(S)
specificity = N(H-)/N(H)

In other words, the sensitivity is the probability that, given someone is sick, they test positive. The specificity is the probability that, given someone is healthy, they test negative. We will call these probabilities P(+|S) and P(-|H). It is important to remember that the above ratios are just estimates of these probabilities. The larger you make N(S) and N(H), the better the estimates.

Suppose for example you have N(S)=34. If you perform the test on these people and find that N(S+)=34, i.e. they all test positive, does that mean you have a perfect test, that P(+|S)=1? No, P(+|S)=1 is just an estimate. The real value will probably be close but it is almost surely not exactly equal to 1.

A medical test should specify what's called a 95% confidence interval for P(+|S) and P(-|H). This means that if you redo the testing many times, using the same N(S) and N(H), and compute an interval from each run, about 95% of those intervals will contain the true values of P(+|S) and P(-|H). There are many ways to calculate the interval and many papers have been written on the subject. It comes from the part of statistics called estimation theory. If you're interested you can start with the Wikipedia article on binomial proportion confidence intervals.
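As one concrete example, here is a Python sketch of the Wilson score interval, which is one of the methods described in that Wikipedia article. I'm not claiming it is the method used for any particular test. The function name is made up and z = 1.959964 is the usual normal quantile for 95% confidence.

```python
import math


def wilson_interval(k, n, z=1.959964):
    """Wilson score interval for a binomial proportion: k successes in n
    trials. z = 1.959964 is the normal quantile for 95% confidence."""
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # clamp to [0, 1] to guard against floating point overshoot
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    lo, hi = wilson_interval(34, 34)   # all 34 known-sick people test positive
    print(f"[{lo:.4f}, {hi:.4f}]")
```

For the N(S)=N(S+)=34 example the lower limit comes out at roughly 0.90, so even a test that catches all 34 cases leaves a fair amount of room for P(+|S) to be below 1.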

A less common approach to assessing how good the estimates for P(+|S) and P(-|H) are is to look at their probability distributions (yes, a probability can have a probability distribution). For the N(S)=34 example mentioned above what you have essentially is 34 tosses of a coin with two outcomes + or -. Since the population is known to be sick you can assume P(+) is much greater than P(-). In other words we have a highly biased coin but we don't know exactly what the probabilities are. We show in our book Coin Tossing: the Hydrogen Atom of Probability that one way you can model these probabilities is with a Beta distribution. See also our video of a Dancing Beta Distribution.

Let p=P(+|S) then the Beta distribution for p is

\[f(n,k,p)=(n+1)\binom{n}{k}p^k(1-p)^{n-k}\]

where n=N(S) and k=N(S+). So for k=n=34 you have

\[f(34,34,p)=35p^{34}\]

A plot of this distribution is shown below for 0.8 ≤ p ≤ 1. Note that the peak is at p=1 which is obviously the most likely value given the data. But there is a nonzero probability that p < 1.

Beta distribution

To get the probability that p is in the interval a ≤ p ≤ 1 you integrate f(34,34,p) over the interval. The result is

\[F(a)=P(a \lt p \lt 1) = 1 - a^{35}\]

A plot of F(a) for 0.8 ≤ a ≤ 1 is shown below. From the figure you can see that p is almost certainly in the interval 0.8 ≤ p ≤ 1. As you narrow the interval the probability that p is contained in it gets smaller. The 95% confidence interval appears to be about 0.92 ≤ p ≤ 1.

Integral of Beta distribution

The beta distribution is not a common way to model the value of p but in my opinion it is more accurate. Before I move on I should mention that f(n,k,p) always has a maximum at p=k/n and the expectation and standard deviation for the distribution are

\[\mu = \frac{k+1}{n+2}\]

\[\sigma = \frac{1}{n+2}\sqrt{\frac{(k+1)(n-k+1)}{n+3}}\]
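Here is a short Python sketch that evaluates these formulas for the k=n=34 example. The function names are invented for the illustration. It solves \(1-a^{35}=0.95\) for the lower end of the 95% interval and does a crude numerical integration as a sanity check on \(F(a)\).

```python
import math


def beta_pdf(n, k, p):
    """f(n,k,p) = (n+1) * C(n,k) * p^k * (1-p)^(n-k)."""
    return (n + 1) * math.comb(n, k) * p**k * (1 - p) ** (n - k)


def beta_mean_sd(n, k):
    """Mean and standard deviation of f(n,k,p)."""
    mean = (k + 1) / (n + 2)
    sd = math.sqrt((k + 1) * (n - k + 1) / (n + 3)) / (n + 2)
    return mean, sd


def lower_bound_all_positive(n, prob=0.95):
    """For k = n, F(a) = 1 - a^(n+1); solve F(a) = prob for a."""
    return (1 - prob) ** (1 / (n + 1))


if __name__ == "__main__":
    print("95% lower bound:", round(lower_bound_all_positive(34), 3))
    print("mean, sd:", beta_mean_sd(34, 34))
    # midpoint-rule check that f(34,34,p) integrates to F(0.8) on [0.8, 1]
    steps = 10_000
    width = 0.2 / steps
    area = sum(beta_pdf(34, 34, 0.8 + (i + 0.5) * width)
               for i in range(steps)) * width
    print(area, 1 - 0.8**35)
```

The lower bound comes out at about 0.918, in line with the value of about 0.92 read off the plot.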

Now let's look at the question I started with. Given that I get a positive COVID-19 test result, what is the probability I have the disease? I will denote this probability P(S|+). In the literature this is called the positive predictive value of the test. How do I calculate P(S|+) given the sensitivity and specificity of the test, i.e. given P(+|S) and P(-|H)?

From Bayes' Theorem we know that

\[P(S|+) = \frac{P(S+)}{P(+)} = \frac{P(+|S)P(S)}{P(+)}\]

and from the law of total probability we have:

\[P(+) = P(S+) + P(H+) = P(+|S)P(S) + P(+|H)P(H)\]

So the equation we're looking for is

\[P(S|+) = \frac{P(+|S)P(S)}{P(+|S)P(S) + P(+|H)P(H)}\]

where P(H) = 1 - P(S) and P(+|H) = 1 - P(-|H). We know P(+|S) and P(-|H), they are the sensitivity and specificity of the test. The only missing piece of information is P(S) i.e. the probability of being sick, also known as the prevalence of the disease. This will have to be estimated somehow. The FDA puts the estimate at 5% or P(S)=0.05.

As a sample calculation I'll use the sensitivity and specificity data for the Abbott Laboratories, Abbott Alinity i SARS-CoV-2 IgG test. This data, along with data for other FDA approved tests, can be found on the FDA's EUA Authorized Serology Test Performance web page. The 95% confidence intervals for P(+|S) and P(-|H) are

P(+|S) = [0.899, 1.000]
P(-|H) = [0.946, 0.998]

Using these in the above equation for P(S|+) along with P(S)=0.05 (5% of the general population has antibodies) will give you the following 95% confidence interval for having the disease given a positive test result.

P(S|+) = [0.467, 0.963]

Note the wide range. At the low end, the chance that you have COVID-19 is no better than a coin toss.
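The interval is obtained by plugging the two lower limits, and then the two upper limits, into the equation for P(S|+). A minimal Python sketch of that calculation, with a function name chosen for the example:

```python
def ppv(sensitivity, specificity, prevalence):
    """P(S|+) from Bayes' theorem and the law of total probability."""
    true_pos = sensitivity * prevalence                # P(+|S)P(S)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(+|H)P(H)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    prevalence = 0.05
    low = ppv(0.899, 0.946, prevalence)    # both lower limits
    high = ppv(1.000, 0.998, prevalence)   # both upper limits
    print(f"P(S|+) = [{low:.3f}, {high:.3f}]")
```

This prints `P(S|+) = [0.467, 0.963]`.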

What about the probability that you are healthy given a negative test result? This is called the negative predictive value of the test and we will denote it as P(H|-). Deriving a formula for P(H|-) in terms of P(+|S) and P(-|H) is pretty much the same as what I did above for P(S|+). Leaving out the details, the formula is

\[P(H|-) = \frac{P(-|H)P(H)}{P(-|S)P(S) + P(-|H)P(H)}\]

where P(-|S) = 1 - P(+|S). Doing the calculation using the above intervals for P(+|S) and P(-|H) will give you the following 95% confidence interval for being healthy given a negative test result.

P(H|-) = [0.994, 1.000]

The test has a much better negative predictive value than positive predictive value. This is generally true as long as the number of healthy people is much larger than the number of sick people.
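To see how the two predictive values depend on prevalence, here is a Python sketch that evaluates both formulas. The function names are made up for the example. The prevalence values 0.01 and 0.5 in the loop are arbitrary illustrations and are not taken from any data.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value P(S|+)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value P(H|-)."""
    true_neg = specificity * (1 - prevalence)     # P(-|H)P(H)
    false_neg = (1 - sensitivity) * prevalence    # P(-|S)P(S)
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    low = npv(0.899, 0.946, 0.05)    # both lower limits
    high = npv(1.000, 0.998, 0.05)   # both upper limits
    print(f"P(H|-) = [{low:.3f}, {high:.3f}]")
    # same test (lower limits), different assumed prevalences
    for prevalence in (0.01, 0.05, 0.5):
        print(prevalence,
              round(ppv(0.899, 0.946, prevalence), 3),
              round(npv(0.899, 0.946, prevalence), 3))
```

The first line printed is `P(H|-) = [0.994, 1.000]`. In the loop, the negative predictive value is the larger of the two at low prevalence, and the ordering flips at a prevalence of 0.5.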

The positive and negative predictive values of a test should be included in the test results. If not, you should ask for them. Also try to get the sensitivity and specificity along with information on exactly how they were estimated so you can do your own confidence interval analysis.

If you get a positive result on a test with a low positive predictive value you should have another independent test done. The positive predictive value of the two tests will usually be much higher. The simplest thing to do is to take the same test over again. While this may not exactly be independent of the first test you can treat it as such for a first approximation.

To see how this works let's assume we have a test with sensitivity and specificity equal to

P(+|S) = 0.95
P(-|H) = 0.98

and P(S)=0.01 (1% of the population is assumed to be sick) then

P(S|+) = 0.324

If you take the test again you use the same equation but the prevalence of the disease is now the value we just calculated, P(S)=0.324. The new calculation is

P(S|+) = 0.958

which is a much better result. Iterating this again you get

P(S|+) = 0.999

This is more or less common sense. If you take the test over and over again and it keeps coming up positive then you probably really do have the disease, barring the possibility that the additional tests are not at all independent and don't provide any additional information.
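The iteration is simple to code. In the Python sketch below, with function names invented for the example, each positive result turns the current probability of being sick into the prior for the next test, under the assumption that the tests are independent.

```python
def ppv(sensitivity, specificity, prevalence):
    """P(S|+) for a single positive test."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def repeated_positive(sensitivity, specificity, prevalence, tests):
    """Yield P(S|+) after each of `tests` consecutive positive results,
    treating the tests as independent."""
    p_sick = prevalence
    for _ in range(tests):
        # the posterior from one test is the prior for the next
        p_sick = ppv(sensitivity, specificity, p_sick)
        yield p_sick


if __name__ == "__main__":
    for i, p in enumerate(repeated_positive(0.95, 0.98, 0.01, 3), start=1):
        print(i, round(p, 3))
```

The three values printed are 0.324, 0.958 and 0.999, the same as above.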

To summarize, always look at the positive and negative predictive values of a test. Getting a positive test result doesn't always mean that you actually have or had the disease. Always try to have more than one test done. One test by itself almost never provides a definitive result. Finally, wash your hands, wear a mask, and avoid crowds so we can beat this virus.



© 2010-2020 Stefan Hollos and Richard Hollos