Chapter 4: Expectation and Expected Values

4.1 The Expectation of a Random Variable The distribution of a random variable X contains all of the probabilistic information about X. The entire distribution of X, however, is usually too cumbersome for presenting this information. Summaries of the distribution, such as the average value, or expected value, can be useful for giving people an idea of where we expect X to be without trying to describe the entire distribution. The expected value also plays an important role in the approximation methods that arise in Chapter 6.

Summary The expectation, expected value, or mean of a random variable is a summary of its distribution. If the probability distribution is thought of as a distribution of mass along the real line, then the mean is the center of mass. The mean of a function r of a random variable X can be calculated directly from the distribution of X without first finding the distribution of r(X). Similarly, the mean of a function of a random vector X can be calculated directly from the distribution of X.
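
As a minimal numerical sketch of that last point (the mean of r(X) computed directly from the distribution of X), the snippet below uses a small made-up discrete distribution; the values, the probabilities, and the choice r(x) = x^2 are illustrative assumptions, not anything from the chapter.

```python
import numpy as np

# A hypothetical discrete distribution for X: its values and their probabilities.
x_values = np.array([0, 1, 2, 3])
x_probs  = np.array([0.1, 0.2, 0.3, 0.4])

# E(X): the center of mass of the distribution.
mean_x = np.sum(x_values * x_probs)

# E[r(X)] for r(x) = x**2, computed directly from the distribution of X,
# without first finding the distribution of r(X).
mean_r = np.sum(x_values**2 * x_probs)

print(mean_x)   # 2.0
print(mean_r)   # 5.0
```

The same numbers would be obtained by first working out the distribution of r(X) and averaging over it; the point is that this extra step is unnecessary.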

4.2 Properties of Expectations In this section, we present some results that simplify the calculation of expectations for some common functions of random variables.

Summary The mean of a linear function of a random vector is the linear function of the mean. In particular, the mean of a sum is the sum of the means. As an example, the mean of the binomial distribution with parameters n and p is np. No such relationship holds in general for nonlinear functions. For independent random variables, the mean of the product is the product of the means.
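
A rough simulation sketch of these facts, with sample sizes and parameters chosen only for illustration: the binomial mean np emerges as the sum of n Bernoulli means, and the product rule is checked for two independent exponential variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 0.3

# Simulate a binomial as a sum of n independent Bernoulli(p) indicators.
bernoullis = rng.random((100_000, n)) < p
binomial = bernoullis.sum(axis=1)

print(binomial.mean())      # close to n * p = 3.0 (mean of a sum = sum of means)

# For independent X and Y, E(XY) = E(X) E(Y).
x = rng.exponential(scale=2.0, size=100_000)   # E(X) = 2
y = rng.exponential(scale=3.0, size=100_000)   # E(Y) = 3
print((x * y).mean())                          # close to 2 * 3 = 6
```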

4.3 Variance Although the mean of a distribution is a useful summary, it does not convey very much information about the distribution. For example, a random variable X with mean 2 has the same mean as the constant random variable Y such that Pr(Y = 2) = 1 even if X is not constant. To distinguish the distribution of X from the distribution of Y in this case, it might be useful to give some measure of how spread out the distribution of X is. The variance of X is one such measure. The standard deviation of X is the square root of the variance. The variance also plays an important role in the approximation methods that arise in Chapter 6.

Summary The variance of X, denoted by Var(X), is the mean of [X − E(X)]^2 and measures how spread out the distribution of X is. The variance also equals E(X^2) − [E(X)]^2. The standard deviation is the square root of the variance. The variance of aX + b, where a and b are constants, is a^2 Var(X). The variance of the sum of independent random variables is the sum of the variances. As an example, the variance of the binomial distribution with parameters n and p is np(1 − p). The interquartile range (IQR) is the difference between the 0.75 and 0.25 quantiles. The IQR is a measure of spread that exists for every distribution.
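
The following sketch checks these identities numerically on an arbitrary gamma distribution (the shape and scale parameters are made up): it compares the defining formula for the variance with the shortcut E(X^2) − [E(X)]^2, verifies Var(aX + b) = a^2 Var(X), and computes the IQR.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.5, size=200_000)    # an arbitrary non-constant X

# Var(X) as the mean of [X - E(X)]^2, and the shortcut E(X^2) - [E(X)]^2.
var_def      = np.mean((x - x.mean())**2)
var_shortcut = np.mean(x**2) - x.mean()**2
print(var_def, var_shortcut)                         # essentially equal

# Var(aX + b) = a^2 Var(X): the shift b does not change the spread.
a, b = 3.0, 7.0
print(np.var(a * x + b), a**2 * np.var(x))

# IQR: the difference between the 0.75 and 0.25 quantiles.
print(np.quantile(x, 0.75) - np.quantile(x, 0.25))
```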


4.4 Moments For a random variable X, the means of powers X^k (called moments) for k > 2 have useful theoretical properties, and some of them are used for additional summaries of a distribution. The moment generating function is a related tool that aids in deriving distributions of sums of independent random variables and limiting properties of distributions.

Summary If the kth moment of a random variable exists, then so does the jth moment for every j < k. The moment generating function of X, ψ(t) = E(e^(tX)), if it is finite for t in a neighborhood of 0, can be used to find moments of X. The kth derivative of ψ(t) at t = 0 is E(X^k). The m.g.f. characterizes the distribution in the sense that all random variables that have the same m.g.f. have the same distribution.
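
As an illustration, the exponential distribution with rate λ has m.g.f. ψ(t) = λ/(λ − t) for t < λ. The sketch below, assuming SymPy is available, differentiates this ψ at t = 0 to recover the moments E(X^k) = k!/λ^k.

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)

# M.g.f. of an exponential distribution with rate lambda (finite for t < lambda):
#   psi(t) = E(e^(tX)) = lambda / (lambda - t)
psi = lam / (lam - t)

# The kth derivative of psi at t = 0 is E(X^k).
for k in range(1, 4):
    moment = sp.diff(psi, t, k).subs(t, 0)
    print(k, sp.simplify(moment))   # 1/lambda, 2/lambda**2, 6/lambda**3
```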

4.5 The Mean and the Median Although the mean of a distribution is a measure of central location, the median (see Definition 3.3.3) is also a measure of central location for a distribution. This section presents some comparisons and contrasts between these two location summaries of a distribution.

Summary A median of X is any number m such that Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2. To minimize E(|X − d|) by choice of d, one must choose d to be a median of X. To minimize E[(X − d)^2] by choice of d, one must choose d = E(X).
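
A simulation sketch of these two minimization facts, using an arbitrary skewed (lognormal) distribution and a crude grid search over d; all parameters here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)   # a skewed distribution

d_grid = np.linspace(0.1, 5.0, 500)

# Mean absolute error E(|X - d|) is minimized near the median of X.
mae = np.array([np.mean(np.abs(x - d)) for d in d_grid])
print(d_grid[mae.argmin()], np.median(x))       # both close to e^0 = 1

# Mean squared error E[(X - d)^2] is minimized near the mean of X.
mse = np.array([np.mean((x - d)**2) for d in d_grid])
print(d_grid[mse.argmin()], x.mean())           # both close to e^(1/2), about 1.65
```

Because the distribution is skewed, the two minimizers differ noticeably: the median is the better summary under absolute error, the mean under squared error.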

4.6 Covariance and Correlation When we are interested in the joint distribution of two random variables, it is useful to have a summary of how much the two random variables depend on each other. The covariance and correlation are attempts to measure that dependence, but they only capture a particular type of dependence, namely linear dependence.

Summary The covariance of X and Y is Cov(X, Y) = E{[X − E(X)][Y − E(Y)]}. The correlation is ρ(X, Y) = Cov(X, Y)/[Var(X) Var(Y)]^(1/2), and it measures the extent to which X and Y are linearly related. Indeed, X and Y are precisely linearly related if and only if |ρ(X, Y)| = 1. The variance of a sum of random variables can be expressed as the sum of the variances plus two times the sum of the covariances. The variance of a linear function is Var(aX + bY + c) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y).
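
A short numerical check of these formulas; constructing Y as a linear function of X plus independent noise is an arbitrary choice that makes the true correlation equal to 0.6.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200_000)
y = 0.6 * x + 0.8 * rng.normal(size=200_000)    # Var(Y) = 1, Cov(X, Y) = 0.6

# Covariance and correlation from their definitions.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho    = cov_xy / np.sqrt(np.var(x) * np.var(y))
print(cov_xy, rho)                              # both near 0.6

# Var(aX + bY + c) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
a, b, c = 2.0, -1.0, 5.0
print(np.var(a * x + b * y + c))
print(a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * cov_xy)
```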

4.7 Conditional Expectation Since expectations (including variances and covariances) are properties of distributions, there will exist conditional versions of all such distributional summaries as well as conditional versions of all theorems that we have proven or will later prove about expectations. In particular, suppose that we wish to predict one random variable Y using a function d(X) of another random variable X so as to minimize E([Y − d(X)]^2). Then d(X) should be the conditional mean of Y given X. There is also a very useful theorem that is an extension to expectations of the law of total probability.

Summary The conditional mean E(Y |x) of Y given X = x is the mean of the conditional distribution of Y given X = x. This conditional distribution was defined in Chapter 3. Likewise, the conditional variance Var(Y |x) of Y given X = x is the variance of the conditional distribution. The law of total probability for expectations says that E[E(Y |X)] = E(Y ). If we will observe X and then need to predict Y , the predictor that leads to the smallest mean squared error (M.S.E.) is the conditional mean E(Y |X).
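
A simulation sketch of the law of total probability for expectations and of the optimality of the conditional mean, under an arbitrary made-up model: X uniform on (0, 1) and, given X = x, Y normal with mean 2x and standard deviation 1, so that E(Y |X) = 2X.

```python
import numpy as np

rng = np.random.default_rng(4)

# X ~ Uniform(0, 1); given X = x, Y ~ Normal(mean = 2x, sd = 1).
x = rng.uniform(size=200_000)
y = rng.normal(loc=2 * x, scale=1.0)

cond_mean = 2 * x                   # E(Y | X)
print(cond_mean.mean(), y.mean())   # E[E(Y|X)] = E(Y); both near 1

# The conditional mean has a smaller M.S.E. than, say, the constant predictor E(Y).
print(np.mean((y - cond_mean)**2))  # near 1
print(np.mean((y - y.mean())**2))   # near 1 + Var(2X) = 1 + 4/12, about 1.33
```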

4.8 Utility Much of statistical inference consists of choosing between several available actions. Generally, we do not know for certain which choice will be best, because some important random variable has not yet been observed. For some values of that random variable one choice is best, and for other values some other choice is best. We can try to weigh the costs and benefits of the various choices against the probabilities that the various choices turn out to be best. Utility is one tool for assigning values to the costs and benefits of our choices. The expected value of the utility then balances the costs and benefits according to how likely the uncertain possibilities are.

Summary When we have to make choices in the face of uncertainty, we need to assess what our gains and losses will be under each of the uncertain possibilities. Utility is the value to us of those gains and losses. For example, if X represents the random gain from a possible choice, then U(X) is the value to us of the random gain we would receive if we were to make that choice. We should make the choice such that E[U(X)] is as large as possible.
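
A toy sketch of this idea; the two choices (a sure gain of 40 versus a gamble paying 100 with probability 1/2), the square-root utility, and the helper name utility are all hypothetical illustrations, not part of the chapter.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two hypothetical choices: a sure gain of 40, or a gamble that pays 100
# with probability 0.5 and 0 otherwise.
sure_gain = np.full(100_000, 40.0)
gamble    = np.where(rng.random(100_000) < 0.5, 100.0, 0.0)

# A concave (risk-averse) utility, e.g. U(x) = sqrt(x).
def utility(x):
    return np.sqrt(x)

print(gamble.mean(), sure_gain.mean())                     # about 50 vs 40 in raw gain
print(utility(gamble).mean(), utility(sure_gain).mean())   # about 5.0 vs 6.32 in utility
```

The gamble has the larger expected gain, but under this risk-averse utility the sure gain has the larger expected utility, so it is the choice to make.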










