Econometrics cumulative exam 1 summaries
In this introductory chapter, we have discussed the purpose and scope of econometric
analysis. Econometrics is used in all applied economic fields to test economic theories,
to inform government and private policy makers, and to predict economic time
series. Sometimes an econometric model is derived from a formal economic model,
but in other cases econometric models are based on informal economic reasoning and
intuition. The goal of any econometric analysis is to estimate the parameters in the
model and to test hypotheses about these parameters; the values and signs of the
parameters determine the validity of an economic theory and the effects of certain
policies.
Cross-sectional, time series, pooled cross-sectional, and panel data are the most
common types of data structures that are used in applied econometrics. Data sets
involving a time dimension, such as time series and panel data, require special treatment
because of the correlation across time of most economic time series. Other issues,
such as trends and seasonality, arise in the analysis of time series data but not
cross-sectional data.
In Section 1.4, we discussed the notions of ceteris paribus and causal inference. In
most cases, hypotheses in the social sciences are ceteris paribus in nature: all other relevant
factors must be fixed when studying the relationship between two variables.
Because of the nonexperimental nature of most data collected in the social sciences,
uncovering causal relationships is very challenging.
Chapter 2 simple linear regression model
We have introduced the simple linear regression model in this chapter, and we have covered
its basic properties. Given a random sample, the method of ordinary least squares
is used to estimate the slope and intercept parameters in the population model. We have
demonstrated the algebra of the OLS regression line, including computation of fitted
values and residuals, and obtaining predicted changes in the dependent variable
for a given change in the independent variable. In Section 2.4, we discussed two issues
of practical importance: (1) the behavior of the OLS estimates when we change the
units of measurement of the dependent variable or the independent variable; (2) the use
of the natural log to allow for constant elasticity and constant semi-elasticity models.
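As a minimal illustration of these mechanics, here is a short Python sketch; the data and variable names are hypothetical, chosen only to make the script self-contained:

import numpy as np

# Hypothetical sample: x = years of education, y = log(hourly wage)
x = np.array([8, 10, 12, 12, 14, 16, 16, 18], dtype=float)
y = np.array([1.9, 2.1, 2.3, 2.2, 2.5, 2.8, 2.7, 3.0])

# OLS slope and intercept from the usual closed-form expressions
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# Fitted values and residuals
y_hat = beta0 + beta1 * x
u_hat = y - y_hat

# The predicted change in y for a one-unit change in x is just beta1; with
# log(y) as the dependent variable this is a semi-elasticity, so each extra
# unit of x changes y by roughly 100*beta1 percent
print(f"intercept = {beta0:.3f}, slope = {beta1:.3f}")
print(f"OLS residuals sum to ~0: {u_hat.sum():.2e}")

Rescaling x or y simply rescales the estimates in the corresponding way, which is the unit-of-measurement point made above.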
In Section 2.5, we showed that, under the four Assumptions SLR.1 through SLR.4,
the OLS estimators are unbiased. The key assumption is that the error term u has zero
mean given any value of the independent variable x. Unfortunately, there are reasons to
think this is false in many social science applications of simple regression, where the
omitted factors in u are often correlated with x. When we add the assumption that the
variance of the error given x is constant, we get simple formulas for the sampling variances
of the OLS estimators. As we saw, the variance of the slope estimator β̂1 increases
as the error variance increases, and it decreases when there is more sample variation in
the independent variable. We also derived an unbiased estimator for σ² = Var(u).
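In standard notation, these two results are, under Assumptions SLR.1 through SLR.5,

Var(β̂1) = σ² / SST_x,  where SST_x = Σᵢ (xᵢ − x̄)²,

and the unbiased estimator of the error variance is

σ̂² = SSR / (n − 2),  where SSR = Σᵢ ûᵢ².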
In Section 2.6, we briefly discussed regression through the origin, where the slope
estimator is obtained under the assumption that the intercept is zero. Sometimes this is
useful, but it appears infrequently in applied work.
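For reference, the through-the-origin slope estimator has the standard closed form

β̃1 = (Σᵢ xᵢ yᵢ) / (Σᵢ xᵢ²).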
Much work is left to be done. For example, we still do not know how to test
hypotheses about the population parameters, β0 and β1. Thus, although we know that
OLS is unbiased for the population parameters under Assumptions SLR.1 through
SLR.4, we have no way of drawing inference about the population. Other topics, such
as the efficiency of OLS relative to other possible procedures, have also been omitted.
The issues of confidence intervals, hypothesis testing, and efficiency are central to
multiple regression analysis as well. Since the way we construct confidence intervals
and test statistics is very similar for multiple regression—and because simple regression
is a special case of multiple regression—our time is better spent moving on to multiple
regression, which is much more widely applicable than simple regression. Our
purpose in Chapter 2 was to get you thinking about the issues that arise in econometric
analysis in a fairly simple setting.
Chapter 3 multiple regression analysis
1. The multiple regression model allows us to effectively hold other factors fixed
while examining the effects of a particular independent variable on the dependent variable.
It explicitly allows the independent variables to be correlated.
2. Although the model is linear in its parameters, it can be used to model nonlinear
relationships by appropriately choosing the dependent and independent variables.
3. The method of ordinary least squares is easily applied to the multiple regression
model. Each slope estimate measures the partial effect of the corresponding independent
variable on the dependent variable, holding all other independent variables fixed.
4. R2 is the proportion of the sample variation in the dependent variable explained by
the independent variables, and it serves as a goodness-of-fit measure. It is important not
to put too much weight on the value of R2 when evaluating econometric models.
5. Under the first four Gauss-Markov assumptions (MLR.1 through MLR.4), the
OLS estimators are unbiased. This implies that including an irrelevant variable in a
model has no effect on the unbiasedness of the intercept and other slope estimators. On
the other hand, omitting a relevant variable causes OLS to be biased. In many circumstances,
the direction of the bias can be determined.
6. Under the five Gauss-Markov assumptions, the variance of an OLS slope estimator
is given by Var(β̂j) = σ² / [SSTj(1 − Rj²)]. As the error variance σ² increases, so does
Var(β̂j), while Var(β̂j) decreases as the sample variation in xj, SSTj, increases. The
term Rj² measures the amount of collinearity between xj and the other explanatory
variables. As Rj² approaches one, Var(β̂j) is unbounded. (A numerical sketch of points
3, 4, and 6 follows this list.)
7. Adding an irrelevant variable to an equation generally increases the variances of
the remaining OLS estimators because of multicollinearity.
8. Under the Gauss-Markov assumptions (MLR.1 through MLR.5), the OLS estimators
are best linear unbiased estimators (BLUE).
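As a concrete, minimal sketch of points 3, 4, and 6 in Python (the data below are simulated and purely hypothetical, used only to make the script self-contained):

import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: two correlated regressors and a known population model
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)      # correlated with x1 by construction
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u

# OLS via the normal equations: beta_hat solves (X'X) b = X'y
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Goodness of fit: R² = 1 − SSR/SST
u_hat = y - X @ beta_hat
SSR = np.sum(u_hat ** 2)
SST = np.sum((y - y.mean()) ** 2)
R2 = 1.0 - SSR / SST

# Var(beta_1) = sigma² / [SST1(1 − R1²)], where R1² comes from regressing
# x1 on the other explanatory variables (here a constant and x2)
sigma2_hat = SSR / (n - 3)              # k = 2 slopes plus an intercept
Xo = np.column_stack([np.ones(n), x2])
gamma = np.linalg.solve(Xo.T @ Xo, Xo.T @ x1)
r1 = x1 - Xo @ gamma                    # the part of x1 not explained by x2
SST1 = np.sum((x1 - x1.mean()) ** 2)
R2_1 = 1.0 - np.sum(r1 ** 2) / SST1
var_beta1 = sigma2_hat / (SST1 * (1.0 - R2_1))

print(f"beta_hat = {beta_hat.round(3)}, R^2 = {R2:.3f}")
print(f"se(beta_1) = {np.sqrt(var_beta1):.4f}")

Note how the collinearity term R1² inflates Var(β̂1): rerunning the script with a stronger dependence of x2 on x1 raises the standard error of β̂1 even though the population model is unchanged.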
Chapter 4 regression analysis with cross-sectional data and inference
In this chapter, we have covered the very important topic of statistical inference, which
allows us to infer something about the population model from a random sample. We
summarize the main points:
1. Under the classical linear model assumptions MLR.1 through MLR.6, the OLS
estimators are normally distributed.
2. Under the CLM assumptions, the t statistics have t distributions under the null
hypothesis.
3. We use t statistics to test hypotheses about a single parameter against one- or
two-sided alternatives, using one- or two-tailed tests, respectively. The most common
null hypothesis is H0: βj = 0, but we sometimes want to test other values of βj
under H0. (The main test-statistic and confidence-interval formulas are collected
after this list.)
4. In classical hypothesis testing, we first choose a significance level, which, along
with the df and alternative hypothesis, determines the critical value against which
we compare the t statistic. It is more informative to compute the p-value for a t
test—the smallest significance level for which the null hypothesis is rejected—so
that the hypothesis can be tested at any significance level.
5. Under the CLM assumptions, confidence intervals can be constructed for each βj.
These CIs can be used to test any null hypothesis concerning βj against a two-sided
alternative.
6. Single hypotheses concerning more than one βj can always be tested by
rewriting the model to contain the parameter of interest. Then, a standard t statistic
can be used.
7. The F statistic is used to test multiple exclusion restrictions, and there are two
equivalent forms of the test. One is based on the SSRs from the restricted and
unrestricted models. A more convenient form is based on the R-squareds from the
two models.
8. When computing an F statistic, the numerator df is the number of restrictions
being tested, while the denominator df is the degrees of freedom in the unrestricted
model.
9. The alternative for F testing is two-sided. In the classical approach, we specify a
significance level which, along with the numerator df and the denominator df,
determines the critical value. The null hypothesis is rejected when the statistic, F,
exceeds the critical value, c. Alternatively, we can compute a p-value to summarize
the evidence against H0.
10. General multiple linear restrictions can be tested using the sum of squared residuals
form of the F statistic.
11. The F statistic for the overall significance of a regression tests the null hypothesis
that all slope parameters are zero, with the intercept unrestricted. Under H0, the
explanatory variables have no effect on the expected value of y.
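For reference, the formulas behind points 3, 5, and 7, written in standard notation (q is the number of restrictions, and n − k − 1 is the degrees of freedom of the unrestricted model):

t = (β̂j − aj) / se(β̂j)  for testing H0: βj = aj

β̂j ± c · se(β̂j)  where c is the critical value from the t(n − k − 1) distribution

F = [(SSRr − SSRur)/q] / [SSRur / (n − k − 1)] = [(R²ur − R²r)/q] / [(1 − R²ur) / (n − k − 1)]

The R-squared form of the F statistic requires the same dependent variable in the restricted and unrestricted models.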