Chapter 5: Weak Typicality. In this chapter we address the following question: for a random vector X = (X_1, X_2, ..., X_n), where the X_i are i.i.d. with distribution p(x), what would a typical outcome of X look like? We are going to see how typical sequences are related to data compression. Specifically, we will take a second look at data compression, namely, we will discuss Shannon's source coding theorem.

Section 5.1, the weak Asymptotic Equipartition Property, or simply the weak AEP. Let us now introduce the notion of typical sequences. Consider tossing a fair coin n times. If the outcome is heads for approximately half of the tosses, the sequence of outcomes is "normal", or typical. So the question is, how do we measure the typicality of a sequence with respect to the generic distribution of an i.i.d. process? There are two commonly used measures in information theory, namely weak typicality and strong typicality. The main theorems are the weak and strong Asymptotic Equipartition Property, or AEP, both of which are consequences of the weak law of large numbers. In this chapter, we discuss weak typicality.

Consider a discrete-time random process {X_k, k >= 1}, where the X_k are i.i.d. with generic distribution p(x). Let X denote the generic random variable, with entropy H(X) finite. Let bold X be the random vector (X_1, X_2, ..., X_n). Because the random variables are i.i.d., we have p(X) = p(X_1) p(X_2) ... p(X_n). Here we allow the alphabet script X to be countably infinite. Let the base of the logarithm be 2, so that H(X) is in bits.

Theorem 5.1 is the first version of the weak asymptotic equipartition property (Weak AEP I), which says that -1/n log p(X) tends to H(X) in probability as n tends to infinity.
This means that for any epsilon > 0, the limit as n tends to infinity of the probability that |-1/n log p(X) - H(X)| > epsilon is equal to zero. Another way to say the same thing: for any epsilon > 0 and n sufficiently large, the probability that |-1/n log p(X) - H(X)| <= epsilon is greater than 1 - epsilon.

The proof of Weak AEP I depends on the weak law of large numbers, which says that for i.i.d. random variables Y_1, Y_2, and so on, with generic random variable Y, the average 1/n sum_{k=1}^n Y_k tends to the expectation of Y in probability as n tends to infinity.

Here is the proof of Weak AEP I. Since X_1, X_2, ..., X_n are i.i.d., we have p(X) = p(X_1) p(X_2) ... p(X_n). Then -1/n log p(X) = -1/n log [p(X_1) p(X_2) ... p(X_n)] = -1/n sum_{k=1}^n log p(X_k), because the log of a product is equal to the sum of the logs. Now the random variables log p(X_k) are also i.i.d., so by the weak law of large numbers, this expression tends to -E[log p(X)], which is equal to H(X). The convergence is in probability, and that proves the theorem.

We now define the weakly typical set, which has two parameters: n, a positive integer, and epsilon, a small positive quantity.
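Before turning to the definition, the convergence in Weak AEP I can be checked numerically. Below is a minimal sketch in Python; the binary source p(0) = 0.2, p(1) = 0.8 (also used in a later example) is an illustrative assumption, not part of the theorem.

```python
import math
import random

# Illustrative source (an assumption, not part of the theorem):
# binary alphabet with p(0) = 0.2, p(1) = 0.8.
p = {0: 0.2, 1: 0.8}
H = -sum(q * math.log2(q) for q in p.values())   # entropy in bits, ~0.722

def empirical_entropy(x, p):
    """Empirical entropy -1/n * log2 p(x) of a sequence x."""
    return -sum(math.log2(p[s]) for s in x) / len(x)

# As n grows, the empirical entropy of an i.i.d. sample concentrates
# around H(X), which is what Weak AEP I asserts.
random.seed(0)
for n in (10, 100, 10_000):
    x = random.choices([0, 1], weights=[p[0], p[1]], k=n)
    print(n, empirical_entropy(x, p))
```

For large n the printed value lies within a small margin of H(X), about 0.722 bits, with high probability.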
Definition 5.2. The weakly typical set W^n_{[X]epsilon} with respect to p(x), a distribution on the alphabet script X, is the set of sequences x = (x_1, x_2, ..., x_n) in script X^n such that |-1/n log p(x) - H(X)| <= epsilon, or equivalently, H(X) - epsilon <= -1/n log p(x) <= H(X) + epsilon. The sequences in the typical set are called weakly epsilon-typical sequences. In other words, to test whether a sequence is weakly epsilon-typical, we take the log of the probability of that sequence and multiply it by -1/n. If the result is within the margin epsilon of H(X), then the sequence is weakly epsilon-typical.

We now define the empirical entropy of a sequence x as -1/n log p(x). Since the random variables are i.i.d., p(x) = prod_{k=1}^n p(x_k), and the log of this product is equal to the sum of the logs, so the empirical entropy can be written as 1/n sum_{k=1}^n [-log p(x_k)]. In other words, the empirical entropy is the average of the quantities -log p(x_k). With this definition, we can say that the empirical entropy of a weakly typical sequence is close to the true entropy H(X).

Theorem 5.2 is the second version of the weak AEP (Weak AEP II), which says that the following hold for any epsilon > 0. First, if a sequence x is weakly epsilon-typical, then its probability satisfies 2^{-n(H(X) + epsilon)} <= p(x) <= 2^{-n(H(X) - epsilon)}. Second, for n sufficiently large, the probability that the random sequence X is weakly epsilon-typical is greater than 1 - epsilon.
Third, for n sufficiently large, the size of the weakly epsilon-typical set satisfies (1 - epsilon) 2^{n(H(X) - epsilon)} <= |W^n_{[X]epsilon}| <= 2^{n(H(X) + epsilon)}.

Here is the proof of Weak AEP II. From Definition 5.2, for a weakly epsilon-typical sequence x, we have H(X) - epsilon <= -1/n log p(x) <= H(X) + epsilon. Multiplying by n, we have n(H(X) - epsilon) <= -log p(x) <= n(H(X) + epsilon). Multiplying by -1, and so changing the direction of the inequalities, we have -n(H(X) + epsilon) <= log p(x) <= -n(H(X) - epsilon). This is equivalent to property 1.

Property 2 is equivalent to Theorem 5.1, because by definition the event that the random sequence X is weakly epsilon-typical is the event that the empirical entropy of X is within epsilon of the entropy H(X).

To prove property 3, we use the lower bound in property 1 to obtain that the probability of the weakly typical set is lower bounded by the size of the typical set times 2^{-n(H(X) + epsilon)}, since 2^{-n(H(X) + epsilon)} is a lower bound on the probability of each weakly typical sequence. Because the probability of the weakly typical set is always less than or equal to 1, we obtain that the size of the typical set times 2^{-n(H(X) + epsilon)} is less than or equal to 1, which implies that the size of the weakly typical set is at most 2^{n(H(X) + epsilon)}. Note that this upper bound holds for any n >= 1. On the other hand, using the upper bound in property 1, we obtain that the probability of the weakly typical set is upper bounded by the size of the typical set times 2^{-n(H(X) - epsilon)}, since 2^{-n(H(X) - epsilon)} is an upper bound on the probability of each weakly typical sequence. By property 2, the probability of the weakly typical set is greater than or equal to 1 - epsilon.
So we obtain 1 - epsilon <= the size of the typical set times 2^{-n(H(X) - epsilon)}, and therefore the size of the typical set is greater than or equal to (1 - epsilon) 2^{n(H(X) - epsilon)}. Combining this lower bound with the upper bound obtained above, we obtain property 3, and the theorem is proved.

Basically, the weak AEP says that for large n, the sequence drawn has probability of occurrence close to 2^{-nH(X)} with very high probability, and the total number of weakly typical sequences is approximately equal to 2^{nH(X)}. However, the weak AEP does not say that most of the sequences in script X^n, the set of all possible sequences, are weakly typical. It also does not say that the most likely sequence is weakly typical.

To illustrate these ideas, let us look at a simple example. Consider a binary random variable X such that the probability of 0 is equal to 0.2 and the probability of 1 is equal to 0.8. With X being the generic random variable, the most likely sequence of length n is the all-1 sequence, with probability equal to (0.8)^n. Now the empirical entropy of the all-1 sequence is -1/n log p(x) = -1/n log (0.8)^n = -log 0.8, which is about 0.32 bits, while H(X) is about 0.72 bits. The empirical entropy is therefore not close to H(X), so the all-1 sequence is not weakly typical. This seems to be a contradiction, because on the one hand, the probability of the weakly typical set is approximately equal to 1, but the all-1 sequence, which is the most likely sequence, is not in the typical set. However, there is actually no contradiction, because as n tends to infinity, the probability of the all-1 sequence tends to 0. Therefore the weakly typical set, though it does not contain the all-1 sequence, can still have probability approaching 1 as n tends to infinity.
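The arithmetic in this example is easy to verify directly. Here is a small sketch; the variable names are mine.

```python
import math

# Checking the example: for p(0) = 0.2, p(1) = 0.8, the most likely
# sequence (all 1's) is not weakly typical, yet its probability
# vanishes as n grows.
p0, p1 = 0.2, 0.8
H = -(p0 * math.log2(p0) + p1 * math.log2(p1))   # ~0.722 bits

# Empirical entropy of the all-1 sequence of length n:
# -1/n * log2(p1 ** n) = -log2(p1), the same for every n.
emp = -math.log2(p1)                             # ~0.322 bits

print(emp, H)   # far apart, so the all-1 sequence is not weakly typical
for n in (10, 100, 1000):
    print(n, p1 ** n)   # probability of the all-1 sequence tends to 0
```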
The interpretation of the weak AEP is the following. One can almost think of the sequence X as being obtained by choosing a sequence from the weakly typical set according to the uniform distribution. Specifically, the size of the typical set is approximately equal to 2^{nH(X)}. If H(X) is strictly less than log |X|, which is usually the case, then 2^{nH(X)} is much smaller than 2^{n log |X|} = (2^{log |X|})^n = |X|^n, the size of the set of all possible sequences. Therefore the size of the typical set is much smaller than the size of the set of all possible sequences. By the weak AEP, we also have that the probability of the typical set is approximately equal to 1, and for a typical sequence, the probability is approximately equal to 2^{-nH(X)}. And so, the conditional distribution on the typical set is almost uniform.
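This picture, a typical set that is exponentially smaller than script X^n yet carries almost all the probability, can be checked numerically for the binary example above. In the sketch below, the helper typical_set_stats and the choices n = 1000, epsilon = 0.05 are mine, not from the text.

```python
import math

# Numerically checking the weak AEP for the binary source p(0) = 0.2,
# p(1) = 0.8 (the example source used above). Since p(x) depends only
# on the number k of 1's in x, we can group the C(n, k) sequences with
# k ones instead of enumerating all 2**n sequences individually.
p0, p1 = 0.2, 0.8
H = -(p0 * math.log2(p0) + p1 * math.log2(p1))   # ~0.722 bits

def typical_set_stats(n, eps):
    """Size and probability of the weakly eps-typical set for length n."""
    size, prob = 0, 0.0
    for k in range(n + 1):
        # empirical entropy of any sequence with exactly k ones
        emp = -(k * math.log2(p1) + (n - k) * math.log2(p0)) / n
        if abs(emp - H) <= eps:
            c = math.comb(n, k)
            size += c
            prob += c * (p1 ** k) * (p0 ** (n - k))
    return size, prob

n, eps = 1000, 0.05
size, prob = typical_set_stats(n, eps)
print(prob)                   # probability of the typical set: close to 1
print(size / 2 ** n)          # yet a tiny fraction of all 2**n sequences
print(math.log2(size) / n)    # 1/n log2 |W| is close to H(X)
```

For these parameters the typical set has probability well above 0.9 (tending to 1 as n grows) while containing only an exponentially small fraction of all sequences, and (1/n) log2 of its size lies between H(X) - epsilon and H(X) + epsilon, consistent with Weak AEP II.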