Chapter Ten: Differential Entropy. In this course, all the random variables discussed so far have been discrete. In this chapter we will discuss real-valued random variables, in particular random vectors. We will discuss in depth the properties of symmetric matrices, positive definite matrices, and covariance matrices. We will then introduce differential entropy and mutual information for real-valued random variables. The AEP and informational divergence for real-valued random variables will be discussed. Finally, we will discuss in depth the Gaussian distribution, which will be used as the noise model in the next chapter.

We first discuss some very basic properties of real random variables. A real random variable X can be discrete, continuous, or mixed. This is characterized by the cumulative distribution function, or CDF, F_X(x), or simply F(x) when there is no ambiguity, which is defined as the probability that the random variable X is less than or equal to x.

The random variable is discrete if F_X increases only at a countable number of values of x. Here is an illustration. The value of F_X stays constant except at a countable number of values of x. The steps occur at those values of x where there is a probability mass, and the height of each step is the value of the corresponding probability mass.

The random variable is continuous if F_X is continuous, or equivalently, the probability that X is equal to x is zero for every value of x. Here is an illustration.

Finally, the random variable is mixed if F_X is neither discrete nor continuous. Here is an illustration. Again, the height of the step at a discontinuity is the value of the probability mass at that particular x.

The support of the random variable X is the set of all x such that F_X(x) is strictly greater than F_X(x − ε) for all ε > 0. Here is the CDF of the mixed distribution that we have seen before.
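The three cases above can be seen numerically through an empirical CDF. Below is a minimal Python sketch; the specific mixed distribution is an assumption chosen for illustration: a random variable that puts probability mass 1/2 at zero and is otherwise uniform on (0, 1), so its CDF jumps by 1/2 at the probability mass and rises continuously elsewhere.

```python
import random

# Hypothetical mixed random variable (chosen for illustration):
# with probability 1/2, X = 0 (a point mass); otherwise X is uniform on (0, 1).
# Its CDF has a step of height 1/2 at x = 0 and is continuous on (0, 1).

def empirical_cdf(samples, x):
    """Fraction of samples less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)

random.seed(0)
samples = [0.0 if random.random() < 0.5 else random.random()
           for _ in range(100_000)]

print(round(empirical_cdf(samples, -0.5), 2))  # ~ 0.0, below the support
print(round(empirical_cdf(samples, 0.0), 2))   # ~ 0.5, the jump at the mass
print(round(empirical_cdf(samples, 0.5), 2))   # ~ 0.75, continuous part
```

The jump at x = 0 is exactly the probability mass there, matching the description of the step heights above.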
And this is the support of X, which is precisely the complement of the set of all x at which F_X stays constant.

The expected value of a function g of a random variable X is obtained by integrating g(x) with respect to dF(x) over the support S of X, where the right-hand side is a Lebesgue-Stieltjes integral, which covers all cases of the CDF F_X: discrete, continuous, and mixed. We will come back to this in a moment.

A non-negative function f_X is called a probability density function, or pdf, of the random variable X if integrating this function from minus infinity to x gives exactly the value of the CDF evaluated at x, for all values of x. By the fundamental theorem of calculus, the derivative of the CDF with respect to x is equal to d/dx of the integral of f_X(u) du from minus infinity to x, and this is equal to f_X(x). Note that in the last step, we have replaced the dummy variable u in the density function f_X by the x in red. In other words, the pdf of a random variable X is equal to the derivative of the CDF of X. If a random variable X has a pdf, then it is necessarily continuous, but not vice versa. This means that it is possible for a random variable X to have a continuous CDF whose derivative does not exist.

Let us now go back to the last slide, where we introduced the Lebesgue-Stieltjes integral. In the case that the random variable X has a pdf, dF_X(x) can be written as f_X(x)dx. For those of you who are not familiar with measure theory, you can by and large think of dF_X(x) as f_X(x)dx, keeping in mind that it represents something more general.

Let X and Y be two real random variables with joint CDF F_XY, defined as the probability that X is less than or equal to x and Y is less than or equal to y. The marginal CDF of X, F_X(x), is obtained by evaluating the joint CDF F_XY at x and infinity.
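The relation between the pdf and the CDF can be checked numerically. A minimal sketch, assuming the exponential distribution with rate 1 as the example (so F(x) = 1 − e^(−x) and f(x) = e^(−x) for x ≥ 0): a finite-difference approximation of dF_X/dx should agree with the pdf.

```python
import math

# Exponential distribution with rate 1 (an assumed example):
# CDF F(x) = 1 - e^(-x) and pdf f(x) = e^(-x), both for x >= 0.

def F(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def f(x):
    return math.exp(-x) if x >= 0 else 0.0

def cdf_derivative(x, h=1e-6):
    """Central finite difference approximating dF/dx at x."""
    return (F(x + h) - F(x - h)) / (2 * h)

for x in [0.5, 1.0, 2.0]:
    # The two columns agree to many decimal places: f_X = dF_X/dx.
    print(x, round(cdf_derivative(x), 6), round(f(x), 6))
```

This is exactly the fundamental-theorem-of-calculus step in the lecture, with the derivative taken numerically instead of symbolically.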
A non-negative function f_XY is called a joint pdf of the random variables X and Y if by integrating f_XY(u,v), with v from minus infinity to y and u from minus infinity to x, we obtain the joint CDF.

The conditional pdf of Y given a particular value x of X, namely f_{Y|X}, is defined as f_XY divided by f_X. The conditional CDF of Y given a particular value x of X is obtained by integrating the conditional pdf f_{Y|X}(v|x) from minus infinity to y.

For a real random variable X, the variance is defined as the expectation of the square of (X minus the expectation of X), which can easily be shown to be equal to the expectation of X squared minus the square of the expectation of X. The proof is elementary. We start with the definition of the variance of X. By expanding the square, we obtain X squared minus 2X times the expectation of X, plus the square of the expectation of X. By the linearity of the expectation operator, we obtain the expectation of X squared, minus two times the expectation of X times the expectation of X, plus the square of the expectation of X. And this is equal to the expectation of X squared minus the square of the expectation of X.

The covariance between two random variables X and Y is defined as the expectation of (X minus the expectation of X) times (Y minus the expectation of Y), and this can be shown to be equal to the expectation of XY minus the expectation of X times the expectation of Y. Note that when X is equal to Y, the covariance of X and Y reduces to the variance of X.

Here are some remarks. First, the variance of X plus Y is equal to the variance of X plus the variance of Y, plus two times the covariance of X and Y. This is actually the reason why the covariance between X and Y is an important quantity. The proof is as follows. First, the variance of X plus Y is equal to the expectation of the square of X plus Y, minus the square of the expectation of X plus Y.
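The two identities just derived, Var(X) = E[X²] − (E[X])² and Cov(X, Y) = E[XY] − E[X]E[Y], can be sketched with a small Monte Carlo check. The distributions below are assumptions chosen for illustration: X is Gaussian with mean 2 and variance 1, and Y is X plus independent noise, so X and Y are correlated.

```python
import random

# Monte Carlo sketch of the variance and covariance identities.
# X ~ N(2, 1) and Y = X + noise are hypothetical choices for illustration.
random.seed(1)
n = 200_000
xs = [random.gauss(2.0, 1.0) for _ in range(n)]
ys = [x + random.gauss(0.0, 0.5) for x in xs]

def mean(v):
    return sum(v) / len(v)

ex, ey = mean(xs), mean(ys)

# Definition vs. alternative form of the variance: E[(X-EX)^2] = E[X^2]-(EX)^2.
var_def = mean([(x - ex) ** 2 for x in xs])
var_alt = mean([x * x for x in xs]) - ex ** 2

# Definition vs. alternative form of the covariance:
# E[(X-EX)(Y-EY)] = E[XY] - E[X]E[Y].
cov_def = mean([(x - ex) * (y - ey) for x, y in zip(xs, ys)])
cov_alt = mean([x * y for x, y in zip(xs, ys)]) - ex * ey

print(round(var_def, 3), round(var_alt, 3))  # both ~ 1 (the variance of X)
print(round(cov_def, 3), round(cov_alt, 3))  # both ~ Var(X) ~ 1 here
```

The two forms of each quantity agree to floating-point precision on the same samples, since the expansion in the proof above holds term by term for empirical averages as well.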
By expanding the first term and the second term, we obtain this expression, where the expectation of X squared minus the square of the expectation of X gives the variance of X, the expectation of Y squared minus the square of the expectation of Y gives the variance of Y, and the expectation of X times Y minus the expectation of X times the expectation of Y gives the covariance of X and Y.

Second, if X is independent of Y, then the covariance of X and Y is equal to zero. In this case, we say that X and Y are uncorrelated. The reason is that when X and Y are independent, the expectation of X times Y is equal to the expectation of X times the expectation of Y, and hence the covariance of X and Y is equal to zero. However, the converse is not true; that is, X and Y being uncorrelated does not imply that X is independent of Y.

Finally, for n random variables X_1, X_2, up to X_n, if they are mutually independent, then the variance of the sum of the X_i is equal to the sum of the variances of the X_i.
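These remarks can also be sketched numerically. The first check below verifies Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on simulated data (the Gaussian choices are assumptions for illustration). The second is the standard counterexample for the converse of the second remark: X uniform on {−1, 0, 1} with Y = X² has Cov(X, Y) = 0 even though Y is a function of X, hence far from independent.

```python
import random
from statistics import mean

def var(v):
    m = mean(v)
    return mean([(x - m) ** 2 for x in v])

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return mean([(a - mu) * (b - mv) for a, b in zip(u, v)])

# Remark 1: Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
# X ~ N(0, 1) and Y ~ N(0, 4) are hypothetical choices for illustration.
random.seed(2)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 2) for _ in range(n)]
sums = [x + y for x, y in zip(xs, ys)]
gap = abs(var(sums) - (var(xs) + var(ys) + 2 * cov(xs, ys)))
print(round(gap, 6))  # 0.0: the identity holds exactly for empirical moments

# Remark 2 (converse fails): X uniform on {-1, 0, 1}, Y = X^2.
# Computed exactly over the three equally likely support points.
support = [-1.0, 0.0, 1.0]
ex = mean(support)
ey = mean([x * x for x in support])
exy = mean([x * (x * x) for x in support])  # E[XY] with Y = X^2
print(exy - ex * ey)  # 0.0: Cov(X, Y) = 0, yet Y is a function of X
```

Note that the counterexample needs no sampling at all: the three-point distribution makes the zero covariance an exact calculation.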