Chapter 2: Information Measures. In this chapter we are going to learn some basic tools of information theory. First we will review some basic concepts in probability. We will then introduce Shannon's information measures and prove some of their properties. After that we will discuss some other useful information measures, and finally some useful identities and inequalities in information theory.

Section 2.1 is about independence and Markov chains. Here is some notation that will be used throughout this course. A capital letter X denotes a discrete random variable taking values in a set 𝒳, called the alphabet of X. p(x) is the probability distribution of the random variable X. The support of X, denoted S_X, is the set of all outcomes x such that p(x) > 0. If the support of X is equal to the alphabet of X, i.e. all the probability masses are positive, then we say that p is strictly positive. Distributions that are not strictly positive are dangerous, in the sense that we need to handle them with great care, and we will look at an example in Proposition 2.12 to illustrate this point.

Definition 2.1 is about independence of two random variables. Two random variables X and Y are independent, denoted by X ⊥ Y, if p(x, y) = p(x) p(y) for all x and y.

Definition 2.2 is about mutual independence. For n ≥ 3, random variables X1, X2, ..., Xn are mutually independent if p(x1, x2, ..., xn) = p(x1) p(x2) ··· p(xn) for all x1, x2, ..., xn.

Definition 2.3 is about pairwise independence. For n ≥ 3, random variables X1, X2, ..., Xn are pairwise independent if Xi and Xj are independent for all 1 ≤ i < j ≤ n, i.e. any two of these variables are independent. It can be shown that pairwise independence is implied by mutual independence, but not vice versa. Definition 2.4 is about conditional independence.
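The gap between pairwise and mutual independence can be seen in a standard example: two fair bits and their XOR. The following sketch (the construction is a textbook illustration, not from this transcript) enumerates the joint distribution and checks both properties numerically.

```python
from itertools import product

# Illustration: X1, X2 are independent fair bits and X3 = X1 XOR X2.
# The three variables are pairwise independent but not mutually independent.

# Joint distribution p(x1, x2, x3): each of the 4 consistent outcomes
# has probability 1/4; inconsistent outcomes have probability 0.
p = {}
for x1, x2 in product([0, 1], repeat=2):
    for x3 in [0, 1]:
        p[(x1, x2, x3)] = 0.25 if x3 == (x1 ^ x2) else 0.0

def marginal(indices):
    """Marginal distribution over the given coordinates of (x1, x2, x3)."""
    m = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in indices)
        m[key] = m.get(key, 0.0) + prob
    return m

# Pairwise independence holds: p(xi, xj) = p(xi) p(xj) for every pair.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    mij, mi, mj = marginal([i, j]), marginal([i]), marginal([j])
    assert all(abs(mij[(a, b)] - mi[(a,)] * mj[(b,)]) < 1e-12
               for a, b in product([0, 1], repeat=2))

# Mutual independence fails: p(0,0,0) = 1/4, but p(x1) p(x2) p(x3) = 1/8.
print(p[(0, 0, 0)],
      marginal([0])[(0,)] * marginal([1])[(0,)] * marginal([2])[(0,)])
# -> 0.25 0.125
```

Knowing any one of the three bits tells us nothing about any other single bit, yet any two of them together determine the third, which is why the triple-product factorization fails.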
For random variables X, Y and Z, X is independent of Z conditioning on Y, denoted by X ⊥ Z | Y, if p(x, y, z) = p(x, y) p(y, z) / p(y) when p(y) > 0, and p(x, y, z) = 0 when p(y) = 0.

Here are some remarks. First, if p(y) > 0, then p(x, y, z) = p(x, y) p(y, z) / p(y). Since p(y, z) / p(y) = p(z | y), we have p(x, y, z) = p(x, y) p(z | y), so this can equally well be used as the definition of conditional independence. Conceptually, when X is independent of Z given Y, the variables X, Y, Z are related as shown in the following diagram. We start with a random variable X, pass it through a channel p(y | x) to obtain the random variable Y, and then pass Y through another channel p(z | y) to obtain the random variable Z. So the joint distribution of X, Y, and Z is given by p(x, y, z) = p(x) p(y | x) p(z | y), where p(x) p(y | x) can be written as p(x, y), and so we have p(x, y, z) = p(x, y) p(z | y), as we have seen before.
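The channel picture X → Y → Z can be checked numerically. In the sketch below, the input distribution p(x) and the two channel matrices are made-up numbers chosen only for illustration; the code builds the joint distribution p(x) p(y|x) p(z|y) and verifies that it satisfies Definition 2.4.

```python
from itertools import product

# A small numeric sketch of the Markov chain X -> Y -> Z.
# The input distribution and the two channels are illustrative values.
px = {0: 0.3, 1: 0.7}                                # p(x)
py_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # channel p(y|x)
pz_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}    # channel p(z|y)

# Joint distribution p(x, y, z) = p(x) p(y|x) p(z|y).
p = {(x, y, z): px[x] * py_x[x][y] * pz_y[y][z]
     for x, y, z in product([0, 1], repeat=3)}

def marg(indices):
    """Marginal over the given coordinates of (x, y, z)."""
    m = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in indices)
        m[key] = m.get(key, 0.0) + prob
    return m

pxy, pyz, py = marg([0, 1]), marg([1, 2]), marg([1])

# Definition 2.4 in cross-multiplied form:
# p(x, y, z) p(y) = p(x, y) p(y, z), i.e. X is independent of Z given Y.
for x, y, z in product([0, 1], repeat=3):
    assert abs(p[(x, y, z)] * py[(y,)] - pxy[(x, y)] * pyz[(y, z)]) < 1e-12
print("X -> Y -> Z verified: p(x,y,z) = p(x,y) p(z|y)")
```

Any choice of p(x) and the two channels would pass this check, since p(x, y) = p(x) p(y|x) and p(y, z) = p(y) p(z|y) by construction.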