ch 5 special distributions (and some of 6 CLT)


5.1 Introduction In this chapter, we shall define and discuss several special families of distributions that are widely used in applications of probability and statistics. The distributions that will be presented here include discrete and continuous distributions of univariate, bivariate, and multivariate types. The discrete univariate distributions are the families of Bernoulli, binomial, hypergeometric, Poisson, negative binomial, and geometric distributions. The continuous univariate distributions are the families of normal, lognormal, gamma, exponential, and beta distributions. Other continuous univariate distributions (introduced in exercises and examples) are the families of Weibull and Pareto distributions. Also discussed are the multinomial family of multivariate discrete distributions and the bivariate normal family of bivariate continuous distributions. We shall briefly describe how each of these families of distributions arises in applied problems and show why each might be an appropriate probability model for some experiment. For each family, we shall present the form of the p.f. or the p.d.f. and discuss some of the basic properties of the distributions in the family. The list of distributions presented in this chapter, or in this entire text for that matter, is not intended to be exhaustive. These distributions are known to be useful in a wide variety of applied problems. In many real-world problems, however, one will need to consider other distributions not mentioned here. The tools that we develop for use with these distributions can be generalized for use with other distributions. Our purpose in providing in-depth presentations of the most popular distributions here is to give the reader a feel for how to use probability to model the variation and uncertainty in applied problems, as well as for some of the tools that are used during probability modeling.
5.2 The Bernoulli and Binomial Distributions The simplest type of experiment has only two possible outcomes, call them 0 and 1. If X equals the outcome from such an experiment, then X has the simplest type of nondegenerate distribution, which is a member of the family of Bernoulli distributions. If n independent random variables X1,...,Xn all have the same Bernoulli distribution, then their sum is equal to the number of the Xi’s that equal 1, and the distribution of the sum is a member of the binomial family.
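
The link between Bernoulli sums and the binomial family can be checked numerically. The following is a minimal sketch, not taken from the text: the function names `binomial_pf` and `bernoulli_sum`, the seed, and the values n = 10, p = 0.3 are all illustrative assumptions.

```python
import math
import random

def binomial_pf(x, n, p):
    """P(X = x) when X has the binomial distribution with parameters n and p."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def bernoulli_sum(n, p, rng):
    """Add up n independent Bernoulli(p) draws, i.e. count how many equal 1."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

# Compare the simulated frequency of "sum == 3" with the binomial p.f. at 3.
rng = random.Random(0)          # arbitrary fixed seed (assumption)
n, p, reps = 10, 0.3, 20000     # illustrative values (assumption)
freq = sum(bernoulli_sum(n, p, rng) == 3 for _ in range(reps)) / reps
print(freq, binomial_pf(3, n, p))
```

The two printed numbers should be close, and the gap should shrink as `reps` grows.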

5.3 The Hypergeometric Distributions In this section, we consider dependent Bernoulli random variables. A common source of dependent Bernoulli random variables is sampling without replacement from a finite population. Suppose that a finite population consists of a known number of successes and failures. If we sample a fixed number of units from that population, the number of successes in our sample will have a distribution that is a member of the family of hypergeometric distributions.
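
As a rough illustration of sampling without replacement, the hypergeometric p.f. can be written directly with binomial coefficients. This sketch assumes a population of A successes and B failures and a sample of size n; the function name and the numbers used are hypothetical.

```python
import math

def hypergeometric_pf(x, A, B, n):
    """P(X = x): x successes in n draws without replacement
    from a population with A successes and B failures."""
    if x < max(0, n - B) or x > min(n, A):
        return 0.0
    return math.comb(A, x) * math.comb(B, n - x) / math.comb(A + B, n)

# Illustrative numbers (assumption): 5 successes, 15 failures, sample of 4.
probs = [hypergeometric_pf(x, 5, 15, 4) for x in range(5)]
mean = sum(x * q for x, q in enumerate(probs))
print(probs, sum(probs), mean)
```

The probabilities sum to 1, and the mean equals nA/(A + B), the same as it would be for sampling with replacement.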

5.4 The Poisson Distributions Many experiments consist of observing the occurrence times of random arrivals. Examples include arrivals of customers for service, arrivals of calls at a switchboard, occurrences of floods and other natural and man-made disasters, and so forth. The family of Poisson distributions is used to model the number of such arrivals that occur in a fixed time period. Poisson distributions are also useful approximations to binomial distributions with very small success probabilities.
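
The quality of the Poisson approximation to a binomial distribution with small success probability can be inspected numerically. The sketch below is only an illustration: the helper names and the choice n = 1000, p = 0.002 are assumptions, and the Poisson mean is taken to be np.

```python
import math

def binomial_pf(x, n, p):
    """Binomial p.f. with parameters n and p."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pf(x, lam):
    """Poisson p.f. with mean lam."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam**x / math.factorial(x)

# Many trials, tiny success probability (illustrative values).
n, p = 1000, 0.002
lam = n * p
max_gap = max(abs(binomial_pf(x, n, p) - poisson_pf(x, lam)) for x in range(21))
print(max_gap)
```

For these values the largest pointwise gap between the two p.f.'s is well under 0.01.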

5.5 The Negative Binomial Distributions Earlier we learned that, in n Bernoulli trials with probability of success p, the number of successes has the binomial distribution with parameters n and p. Instead of counting successes in a fixed number of trials, it is often necessary to observe the trials until we see a fixed number of successes. For example, while monitoring a piece of equipment to see when it needs maintenance, we might let it run until it produces a fixed number of errors and then repair it. The number of failures until a fixed number of successes has a distribution in the family of negative binomial distributions.
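
A short sketch of the negative binomial p.f. follows, under the convention stated above that X counts the failures observed before the r-th success. The function name and the test values are illustrative assumptions.

```python
import math

def negative_binomial_pf(x, r, p):
    """P(X = x): exactly x failures occur before the r-th success,
    with success probability p on each independent trial."""
    if x < 0:
        return 0.0
    return math.comb(r + x - 1, x) * p**r * (1 - p) ** x

# With r = 1 this reduces to the geometric p.f. p * (1 - p)**x.
geometric_check = negative_binomial_pf(4, 1, 0.25)

# Truncated sums (illustrative r = 3, p = 0.5); the tail beyond 400 is negligible.
total = sum(negative_binomial_pf(x, 3, 0.5) for x in range(400))
mean = sum(x * negative_binomial_pf(x, 3, 0.5) for x in range(400))
print(geometric_check, total, mean)
```

The mean computed here agrees with the formula r(1 - p)/p.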

5.6 The Normal Distributions The most widely used model for random variables with continuous distributions is the family of normal distributions. These distributions are the first ones we shall see whose p.d.f.’s cannot be integrated in closed form, and hence tables of the c.d.f. or computer programs are necessary in order to compute probabilities and quantiles for normal distributions.
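
In place of printed tables, a computer program can evaluate the normal c.d.f. and quantiles. The sketch below uses Python's standard library: `math.erf` for the c.d.f. and `statistics.NormalDist.inv_cdf` for quantiles. The wrapper names `normal_cdf` and `normal_quantile` are hypothetical.

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu=0.0, sigma=1.0):
    """C.d.f. of the normal distribution with mean mu and standard
    deviation sigma, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_quantile(prob, mu=0.0, sigma=1.0):
    """Quantile function (inverse c.d.f.) from the standard library."""
    return NormalDist(mu, sigma).inv_cdf(prob)

print(normal_cdf(1.96))          # close to 0.975
print(normal_quantile(0.975))    # close to 1.96
```

The c.d.f. has no closed form, but the error function is itself computed by accurate numerical routines, which is all that is needed in practice.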

5.7 The Gamma Distributions The family of gamma distributions is a popular model for random variables that are known to be positive. The family of exponential distributions is a subfamily of the gamma distributions. The times between successive occurrences in a Poisson process have an exponential distribution. The gamma function, related to the gamma distributions, is an extension of factorials from integers to all positive numbers.
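
Two of the facts above are easy to check numerically: the gamma function satisfies Γ(n + 1) = n! for nonnegative integers n, and the gamma p.d.f. with shape parameter 1 is an exponential p.d.f. This sketch assumes the shape/rate parameterization (alpha, beta); the function names are illustrative.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma p.d.f. with shape alpha and rate beta (assumed parameterization)."""
    if x <= 0:
        return 0.0
    return beta**alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)

def exponential_pdf(x, beta):
    """Exponential p.d.f. with rate beta."""
    return beta * math.exp(-beta * x) if x > 0 else 0.0

# Gamma(n + 1) agrees with n! for small n.
factorial_matches = all(
    math.isclose(math.gamma(k + 1), math.factorial(k)) for k in range(10)
)
print(factorial_matches)
print(gamma_pdf(0.7, 1.0, 2.0), exponential_pdf(0.7, 2.0))
```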

5.8 The Beta Distributions The family of beta distributions is a popular model for random variables that are known to take values in the interval [0, 1]. One common example of such a random variable is the unknown proportion of successes in a sequence of Bernoulli trials.
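
For reference, the beta p.d.f. can be written with the gamma function. The following sketch assumes parameters a and b (both positive) and uses a simple midpoint rule to confirm that the density integrates to 1; the helper names and the parameter values 2 and 3 are illustrative.

```python
import math

def beta_pdf(x, a, b):
    """Beta p.d.f. with parameters a and b, evaluated on the open interval (0, 1)."""
    if x <= 0 or x >= 1:
        return 0.0
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

def midpoint_integral(f, lo, hi, steps=10000):
    """Crude midpoint-rule integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

area = midpoint_integral(lambda x: beta_pdf(x, 2.0, 3.0), 0.0, 1.0)
mean = midpoint_integral(lambda x: x * beta_pdf(x, 2.0, 3.0), 0.0, 1.0)
print(area, mean)
```

The computed mean matches a/(a + b), which is 0.4 for these values.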

5.9 The Multinomial Distributions Many times we observe data that can assume three or more possible values. The family of multinomial distributions is an extension of the family of binomial distributions to handle these cases. The multinomial distributions are multivariate distributions.
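
A small sketch of the multinomial p.f. shows how it extends the binomial case. It assumes `counts` holds the observed number in each category and `probs` the category probabilities; the function name is hypothetical.

```python
import math

def multinomial_pf(counts, probs):
    """P(X1 = counts[0], ..., Xk = counts[k-1]) for n = sum(counts) trials
    with category probabilities probs."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if any(c < 0 for c in counts):
        return 0.0
    # Multinomial coefficient n! / (x1! * ... * xk!), kept as an exact integer.
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)
    value = float(coef)
    for c, q in zip(counts, probs):
        value *= q**c
    return value

# With two categories the multinomial p.f. is the binomial p.f.
two_category = multinomial_pf([3, 7], [0.3, 0.7])
binomial = math.comb(10, 3) * 0.3**3 * 0.7**7
print(two_category, binomial)
```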

5.10 The Bivariate Normal Distributions The first family of multivariate continuous distributions for which we have a name is a generalization of the family of normal distributions to two coordinates. There is more structure to a bivariate normal distribution than just a pair of normal marginal distributions.
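
The extra structure beyond the two marginals is the correlation ρ. One standard way to generate a bivariate normal pair, sketched here with hypothetical names and illustrative parameter values, is to build both coordinates from two independent standard normal variables Z1 and Z2.

```python
import math
import random

def bivariate_normal_draw(mu1, mu2, sigma1, sigma2, rho, rng):
    """One draw (x1, x2) built from independent standard normals z1, z2."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = mu1 + sigma1 * z1
    x2 = mu2 + sigma2 * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
    return x1, x2

def sample_correlation(pairs):
    """Sample correlation coefficient of a list of (x1, x2) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s11 = sum((a - m1) ** 2 for a, _ in pairs)
    s22 = sum((b - m2) ** 2 for _, b in pairs)
    s12 = sum((a - m1) * (b - m2) for a, b in pairs)
    return s12 / math.sqrt(s11 * s22)

rng = random.Random(1)  # arbitrary fixed seed (assumption)
pairs = [bivariate_normal_draw(0.0, 5.0, 1.0, 2.0, 0.6, rng) for _ in range(20000)]
r = sample_correlation(pairs)
print(r)
```

The sample correlation should land near the chosen ρ = 0.6, even though each coordinate on its own is just a normal random variable.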

6.1 Introduction In this chapter, we introduce a number of approximation results that simplify the analysis of large random samples. In the first section, we give two examples to illustrate the types of analyses that we might wish to perform and how additional tools may be needed to be able to perform them.

The law of large numbers (Theorem 6.2.4) will give a mathematical foundation to the intuition that the average of a large sample of i.i.d. random variables, such as the waiting times in Example 6.1.2, should be close to their mean. The central limit theorem (Theorem 6.3.1) will give us a way to approximate the probability that the sample average is close to the mean.

6.2 The Law of Large Numbers The average of a random sample of i.i.d. random variables is called their sample mean. The sample mean is useful for summarizing the information in a random sample in much the same way that the mean of a probability distribution summarizes the information in the distribution. In this section, we present some results that illustrate the connection between the sample mean and the expected value of the individual random variables that comprise the random sample.

Summary The law of large numbers says that the sample mean of a random sample converges in probability to the mean μ of the individual random variables, if the variance exists. This means that the sample mean will be close to μ if the size of the random sample is sufficiently large. The Chebyshev inequality provides a (crude) bound on how high the probability is that the sample mean will be close to μ. Chernoff bounds can be sharper, but are harder to compute.
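
To see how crude the Chebyshev bound can be, one can compare it with a simulated frequency. For a sample mean of n i.i.d. variables with variance σ², the bound is P(|sample mean − μ| ≥ ε) ≤ σ²/(nε²). The sketch below uses uniform(0, 1) draws (μ = 1/2, σ² = 1/12); the function names, seed, and numerical settings are illustrative assumptions.

```python
import random

def chebyshev_bound(variance, n, eps):
    """Chebyshev upper bound on P(|sample mean - mu| >= eps), capped at 1."""
    return min(1.0, variance / (n * eps**2))

def tail_frequency(n, eps, reps, rng):
    """Fraction of simulated samples of n uniform(0, 1) draws
    whose sample mean misses 0.5 by at least eps."""
    misses = 0
    for _ in range(reps):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) >= eps:
            misses += 1
    return misses / reps

rng = random.Random(2)  # arbitrary fixed seed (assumption)
bound = chebyshev_bound(1 / 12, 100, 0.1)
freq = tail_frequency(100, 0.1, 2000, rng)
print(bound, freq)
```

The simulated frequency is typically far below the bound, which is what "crude" means here: the bound holds for every distribution with that variance, so it cannot be tight for any particular one.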

6.3 The Central Limit Theorem The sample mean of a large random sample of random variables with mean μ and finite variance σ² has approximately the normal distribution with mean μ and variance σ²/n. This result helps to justify the use of the normal distribution as a model for many random variables that can be thought of as being made up of many independent parts. Another version of the central limit theorem is given that applies to independent random variables that are not identically distributed. We also introduce the delta method, which allows us to compute approximate distributions for functions of random variables.
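
The normal approximation can be compared with simulation. Under the central limit theorem, P(|sample mean − μ| ≤ ε) is approximately 2Φ(ε√n/σ) − 1, where Φ is the standard normal c.d.f. The sketch below checks this for exponential draws with mean 1 (so μ = σ = 1); the function names, seed, and settings are illustrative assumptions.

```python
import math
import random

def normal_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def clt_probability(eps, sigma, n):
    """CLT approximation to P(|sample mean - mu| <= eps)."""
    return 2.0 * normal_cdf(eps * math.sqrt(n) / sigma) - 1.0

def simulated_probability(eps, n, reps, rng):
    """The same probability estimated from simulated samples of
    n exponential draws with rate 1 (mean 1, standard deviation 1)."""
    hits = 0
    for _ in range(reps):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(mean - 1.0) <= eps:
            hits += 1
    return hits / reps

rng = random.Random(3)  # arbitrary fixed seed (assumption)
approx = clt_probability(0.2, 1.0, 50)
estimate = simulated_probability(0.2, 50, 5000, rng)
print(approx, estimate)
```

Even though the exponential distribution is strongly skewed, the two numbers are already close at n = 50.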

Summary Two versions of the central limit theorem were given. They conclude that the distribution of the average of a large number of independent random variables is close to a normal distribution. One theorem requires that the random variables all have the same distribution with finite variance. The other theorem does not require that the random variables be identically distributed, but instead requires that their third moments exist and satisfy condition (6.3.9). The delta method lets us find the approximate distribution of a smooth function of a sample average.
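
As a sketch of the delta method: if the sample mean is approximately normal with mean μ and variance σ²/n, then for a smooth function g with g′(μ) ≠ 0, g(sample mean) is approximately normal with mean g(μ) and standard deviation |g′(μ)|σ/√n. The example below takes g(x) = 1/x with exponential data; all names, the seed, and the numerical settings are illustrative assumptions.

```python
import math
import random

def delta_method_sd(g_prime_at_mu, sigma, n):
    """Approximate standard deviation of g(sample mean) from the delta method."""
    return abs(g_prime_at_mu) * sigma / math.sqrt(n)

def simulated_sd_of_reciprocal_mean(rate, n, reps, rng):
    """Simulated standard deviation of 1 / (sample mean) for samples of
    n exponential draws with the given rate."""
    values = []
    for _ in range(reps):
        mean = sum(rng.expovariate(rate) for _ in range(n)) / n
        values.append(1.0 / mean)
    center = sum(values) / reps
    return math.sqrt(sum((v - center) ** 2 for v in values) / (reps - 1))

rate, n = 2.0, 200
mu = sigma = 1.0 / rate                 # exponential: mean and sd both 1/rate
approx_sd = delta_method_sd(-1.0 / mu**2, sigma, n)   # g(x) = 1/x, g'(mu) = -1/mu^2
rng = random.Random(4)                  # arbitrary fixed seed (assumption)
sim_sd = simulated_sd_of_reciprocal_mean(rate, n, 4000, rng)
print(approx_sd, sim_sd)
```

The simulated and approximate standard deviations should agree to within a few percent for a sample size this large.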






