What are bits and bytes good for? Can they help consumers take informed responsibility for risks, with help from an appropriate handful of coins? Can jurors assess the significance of DNA evidence in the same context? That's only the beginning of what surprisal-based information measures do every day. Clues to the big picture are hiding in the table below, which connects bits of surprisal to a series of increasingly unlikely events...
| situation | probability p = 1/2^{#bits} | surprisal #bits |
|---|---|---|
| one equals one | 1 | 0 bits |
| wrong guess on a 4-choice question | 3/4 | ~0.415 bits |
| correct guess on a true-false question | 1/2 | 1 bit |
| correct guess on a 4-choice question | 1/4 | 2 bits |
| seven on a pair of dice | 6/6^{2} = 1/6 | ~2.58 bits |
| snake eyes on a pair of dice | 1/6^{2} = 1/36 | ~5.17 bits |
| N heads on a toss of N coins | 1/2^{N} | N bits |
| harm from a smallpox vaccination | ~1/1,000,000 | ~19.9 bits |
| win the UK Jackpot lottery | 1/13,983,816 | ~23.7 bits |
| RGB monitor choice of one pixel's color | 1/256^{3} ~ 6.0×10^{-8} | 24 bits |
| gamma-ray burst mass extinction event TODAY! | <1/(10^{9}×365) ~ 2.7×10^{-12} | >38 bits |
| availability to reset 1 gigabyte of random access memory | 1/2^{8E9} ~ 10^{-2.4E9} | 8×10^{9} bits ~ 7.6×10^{-14} J/K |
| choices for 6×10^{23} Argon atoms in a 24.2 L box at 295 K | ~1/2^{1.61E25} ~ 10^{-4.8E24} | ~1.61×10^{25} bits ~ 155 J/K |
| one equals two | 0 | ∞ bits |

Something that is certain has a probability of one. No surprisal there. As p decreases from one to zero, surprisal goes from zero to infinity, but less quickly than one might imagine. Less than a bit of surprisal is associated with happenings having a better than 50:50 chance. One bit of surprisal (only) is associated with a wild guess of the correct answer to a true-false question. Something with probability one fourth has 2 bits of surprisal, p = 1/16 means 4 bits of surprisal, p = 1/256 means 8 bits of surprisal, p = 1/16,777,216 means 24 bits of surprisal, etc.
| Mnemonic | Range |
|---|---|
| probability p = 1/2^{# of bits of surprisal} | 0 < p < 1 |
Thus a bit of surprisal is what you feel after "calling heads" on a coin toss, when the coin lands with heads up! Surprisal is two bits when you throw heads on two of two coins at once. Three bits of surprisal (heads up on three of three coins) is starting to feel respectable. Twenty-four bits of surprisal, on the other hand, is closer to what you experience when winning the lottery. Thus surprisal reduces the probability of an extremely rare event to a quantity of more manageable size.
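To see these conversions in code rather than on a calculator, here is a minimal Python sketch (ours, not part of the original) of the mnemonic above, plus the bit-to-entropy conversion (1 bit = k_B ln 2 joules/kelvin) behind the table's last two physical entries:

```python
import math

# The mnemonic in action: probability p = 1/2**bits drops fast as bits grow.
for bits in (1, 2, 3, 20, 24):
    print(f"{bits:>2} bits of surprisal  ->  p = 1/2**{bits} = {1 / 2**bits:.3g}")

# The table's J/K entries use the conversion 1 bit = k_B * ln(2) J/K.
k_B = 1.380649e-23                 # Boltzmann's constant in J/K
bits_to_JK = k_B * math.log(2)     # ~9.57e-24 J/K per bit
print(f"8e9 bits (one gigabyte)  ~ {8e9 * bits_to_JK:.2g} J/K")
print(f"1.61e25 bits (argon box) ~ {1.61e25 * bits_to_JK:.3g} J/K")  # ~155 J/K within rounding
```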
To refine your taste for surprisals, try the following experiment. Toss a single coin a few times, each time trying to predict which side of the coin will land facing up. The predictions (of ordinary folk, at least) are often wrong. That pleasant surprise we feel when our prediction comes true (after some practice) is associated with one bit of surprisal, as defined above. Next, toss two coins. Occasionally both coins will land "heads up", and the surprisal associated with that happening is two bits. Now imagine (or sample) the surprisal associated with three coins landing "heads up", or four, or ten. The twenty bits of surprisal associated with throwing "heads" on twenty coins on the first try will be better appreciated after you've tossed those twenty coins hundreds of thousands of times, without finding all heads up even once. Thus a few coins in your pocket can give you a feel for what a given amount of surprisal means, anytime you need a reminder.
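If tossing twenty real coins hundreds of thousands of times is impractical, a short Monte Carlo sketch (our illustration, under the fair-coin assumption) shows all-heads frequencies settling near 1/2^{N}:

```python
import random

def all_heads_frequency(n_coins, n_trials=100_000):
    """Estimate how often n_coins tossed together all land heads."""
    hits = sum(
        all(random.random() < 0.5 for _ in range(n_coins))
        for _ in range(n_trials)
    )
    return hits / n_trials

for n in (1, 2, 3, 10):
    print(f"{n} coins: observed {all_heads_frequency(n):.5f}, expected {1 / 2**n:.5f}")

# With 20 coins the expected frequency is 1/2**20 ~ 1e-6, so even
# 100,000 trials will usually turn up no all-heads outcome at all.
```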

The equation above tells how to calculate probability given surprisal. To do the inverse (i.e. to calculate surprisal in bits given probability) you can use a calculator to try different bit values in the equation p = 1/2^{#bits} until you get the correct probability, or you can use the logarithmic relation: s = log_{2}[1/p] = ln[1/p]/ln[2]. Thus in gambling, over 5 bits (~log_{2}[36]) of surprisal are involved in throwing "snake eyes" with two six-sided dice, while a throw of "seven" or "craps" is only half as surprising (s = log_{2}[6]). Since "snake eyes" has twice the surprisal of "craps", you're as likely to throw snake eyes in one throw of a pair of dice as to throw seven twice in two throws.
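In code the logarithmic relation is one line; this sketch (ours) checks the dice claims above:

```python
from math import log2

def surprisal(p):
    """Surprisal in bits: s = log2(1/p)."""
    return log2(1 / p)

print(f"snake eyes: {surprisal(1/36):.2f} bits")   # ~5.17 bits
print(f"seven:      {surprisal(1/6):.2f} bits")    # ~2.58 bits

# Surprisals add when independent probabilities multiply, so two sevens
# in two throws (1/6 * 1/6 = 1/36) match one snake eyes in one throw:
assert abs(2 * surprisal(1/6) - surprisal(1/36)) < 1e-9
```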
Surprisal can also be useful in assessing risk. In fact, we argue that consumers may be able to make informed decisions on taking a given risk by considering the chance that tossing an appropriate handful of coins will (or will not) result in all heads. For example, suppose you plan an action that will reduce the surprisal of your catching smallpox to 16 bits (like that of throwing 16 heads on the first throw of 16 coins). Still not very likely. But if the surprisal of dying from smallpox once caught is only 2 bits (i.e. probability = 1/2^{2} = 1/4), then you might consider getting a vaccination to protect you, as long as the surprisal of harm from the vaccination is greater than that of getting done in by smallpox (16+2 = 18 bits, since independent surprisals add). In practice the surprisal of harm from the vaccine might be closer to 20 bits. The odds of something bad happening either way are tiny, but this simple calculation would let you take informed responsibility for whichever choice you make.
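A hedged sketch of that arithmetic, using the paragraph's illustrative numbers (not real epidemiological figures):

```python
# Illustrative numbers from the paragraph above (not real epidemiology):
s_catch = 16                # bits: surprisal of catching smallpox after the action
s_die_given_catch = 2       # bits: surprisal of dying once you've caught it
s_die = s_catch + s_die_given_catch    # independent surprisals add: 18 bits
s_vaccine_harm = 20         # bits: surprisal of harm from the vaccination

print(f"dying of smallpox: {s_die} bits  (p ~ {1 / 2**s_die:.2g})")
print(f"harm from vaccine: {s_vaccine_harm} bits  (p ~ {1 / 2**s_vaccine_harm:.2g})")
if s_vaccine_harm > s_die:
    print("Harm from the vaccine is the more surprising (less likely) outcome.")
```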
Thus we should perhaps encourage the news media to provide surprisal estimates, instead of just telling us that "there's a small chance" of something bad or good happening, given the large difference between something with a few bits of surprisal (say 3 heads on 3 coins) and something with more than a dozen bits of surprisal. The risks of "shaving with an electric razor" versus "walking under a power line", or "eating an apple" versus "smoking a cigarette", might thus routinely be put into context. Likewise, use of surprisals in communicating and monitoring risks to medical patients, so that decisions about actions with a small chance of dire outcomes are as informed as possible, might also reduce the costs of medical malpractice in the long run by making the need for legal redress less frequent.
Although surprisals are quite useful (and large) when we are talking about things that are very unlikely to happen, a similar measure of unexpectedness is often needed in assessing likely hypotheses as well, e.g. the guilt of a suspect in a crime. A simple but robust measure of evidence for a hypothesis, designed to accommodate new information as it comes in (additively, if the new bits of evidence are independent), turns out to be the surprisal that the hypothesis is false minus the surprisal that it is true (cf. items 7 and 9 here). The resulting mnemonic for this evidence in bits (ebits) becomes "odds ratio" = 2^{ebits}. Thus "ebits" is just the binary-log version of good old racetrack odds. It goes to zero when the odds of something are 50:50, and to negative and positive extremes slowly (like surprisal) as the probability approaches 0 or 1. An ebit of evidence for a proposition's truth means that it has 2:1 odds, two ebits means 4:1 odds, three ebits 8:1 odds, etc.
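A minimal sketch of the ebit measure, assuming only the definitions given in this paragraph:

```python
from math import log2

def ebits(p_true):
    """Evidence in bits: surprisal(false) minus surprisal(true) = log2 of the odds."""
    return log2(1 / (1 - p_true)) - log2(1 / p_true)

print(ebits(1/2))   # ~0 ebits: 50:50 odds
print(ebits(2/3))   # ~1 ebit:  2:1 odds
print(ebits(8/9))   # ~3 ebits: 8:1 odds

# Independent evidence adds: a 2:1 update plus a 4:1 update gives
# 1 + 2 = 3 ebits, i.e. 8:1 odds, because odds ratios multiply.
print(ebits(2/3) + ebits(4/5))   # -> ~3.0
```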
Thus surprisal differences (like ebits) are a potentially useful element of Bayesian jurisprudence programs dedicated to developing objective tools for juries to work with in cases where the information they have can (like DNA evidence) be put into quantitative form. Such measures: (i) are conceptually accessible to most citizens, (ii) agree with common sense, and (iii) can apply different standards of proof, e.g. in civil versus criminal trials. However, they leave entirely up to human judgement whether the right yes/no question is being asked in the first place. The quantitative modeling of individual culpability in the face of evidence, including the weighing of question alternatives, may be put onto an even more solid footing in days ahead via more general net-surprisal (KL-information) measures, whose application development, e.g. in ecology, is already underway. For the time being those techniques are not yet ready for non-specialists, and vice versa.

Here you'll find more on the relevance of surprisal measures to: