
In this case the pop values from the arrest data set will be retained in the combined data set. The simplest command is list varlist, where varlist is a list of variables separated by spaces. If you omit the varlist, Stata will assume you want all the variables listed.

If you have too many variables to fit in a table, Stata will print out the data by observation. This is very difficult to read. I always use a varlist to make sure I get an easy-to-read table. Here is the resulting output from the crime and arrest data set we created above. The command summarize varlist produces the number of observations, the mean, standard deviation, min, and max for every variable in the varlist.

If you omit the varlist, Stata will generate these statistics for all the variables in the data set. This is particularly useful for large data sets with too many observations to list comfortably. Just to make sure, we use the tabulate command.

The resulting table looks OK. The tabulate command will also do another interesting trick: it can generate dummy variables corresponding to the categories in the table. So, for example, suppose we want to create dummy variables for each of the counties and each of the years. We can do it easily with the tabulate command and the generate option. Culling your data: the keep and drop commands. Occasionally you will want to drop variables or observations from your data set. The command drop varlist will eliminate the variables listed in the varlist.

Alternatively, you can use the keep command to achieve the same result. The command keep varlist will drop all variables not in the varlist. Suppose we want to create a per capita murder variable from the raw data in the crime2 data set we created above. This operation divides crmur by pop and then (remember the order of operations) multiplies the resulting ratio by 1,000. This produces a reasonable number for statistical operations.

According to the results of the summarize command above, crmur is a raw count of murders, while pop is far larger. We know that the state's population is measured in the millions of people, so our population variable must be recorded in thousands. If we divide the murder count by the population in persons, we get the probability of being murdered, a number too small to be useful in statistical analyses. Dividing crmur by pop (population in thousands) yields murders per thousand people, and multiplying that ratio by 1,000 yields the number of murders per million population. We used the replace command in our sample program in the last chapter to re-base our cpi index.

The first kind of graph is the line graph, which is especially useful for time series. Here is a graph of the major crime rate in Virginia over the sample period. Note that I am able to paste the graph into this Word document by generating the graph in Stata and clicking on Edit, Copy Graph.

Then, in Word, click on Edit, Paste (or control-v if you prefer keyboard shortcuts). Here is the scatter diagram relating the arrest rate to the crime rate. Does deterrence work? The first parenthesis contains the commands that created the original scatter diagram; the second parenthesis contains the command that generated the fitted line (lfit is short for linear fit). Here is the lfit part by itself. Here is another version of the above graph, with some more bells and whistles. Click on Graphics to reveal the pull-down menu.

Clicking on the tabs reveals further dialog boxes. Also, I sorted by the independent variable aomaj in this graph. The final command is created automatically by filling in dialog boxes and clicking on things. This graph was produced by the following command, again using the dialog boxes. You should take some time to experiment with some of the options. The graphics dialog boxes make it easy to produce great graphs. Do-files: a do-file is a file containing a list of Stata commands. It is also known as a Stata program or batch file. Except for very simple tasks, I recommend always using do-files.

The reason is that complicated analyses can be difficult to reproduce. It is possible to do a complicated analysis interactively, executing each command in turn, before writing and executing the next command. You can wind up rummaging around in the data doing various regression models and data transformations, get a result, and then be unable to remember the sequence of events that led up to the final version of the model.

Consequently, you may never be able to get the exact result again. It may also make it difficult for other people to reproduce your results. However, if you are working with a do-file, you make changes in the program, save it, execute it, and repeat. To record your output, open a log file. You will be presented with a dialog box where you can enter the name of the log file and change the default directory where the file will be stored.

When you write up your results you can insert parts of the log file to document your findings. If you want to replicate your results, you can edit the log file, remove the output, and turn the resulting file into a do-file. Because economics typically does not use laboratories where experimental results can be replicated by other researchers, we need to make our data and programs readily available to other economists; otherwise, how can we be sure the results are legitimate?

After all, we can edit the Stata output and data input and type in any values we want to. When the data and programs are readily available, it is easy for other researchers to take the programs, do the analysis themselves, and check the data against the original sources, to make sure we are on the up and up. For these reasons, I recommend that any serious econometric analysis be done with do-files, which can be made available to other researchers. You can create a do-file with the built-in Stata do-file editor, with any text editor, such as Notepad or WordPad, or with any word processing program, like Word or WordPerfect.

However, the first time you save the do-file, Word, for example, will insist on adding a .txt extension, so you wind up with a do-file with a txt extension tacked onto its name. You have to rename the file by dropping the .txt extension. You can do it from inside Word by clicking on Open, browsing to the right directory, right-clicking on the do-file with the txt extension, clicking on Rename, and then renaming the file. It is probably easiest to use the built-in Stata do-file editor. It is a full-service text editor.

It has mouse control; cut, copy, and paste; find and replace; and undo. Here is a simple do-file. The #delimit ; command sets the semicolon, instead of the carriage return, as the symbol signaling the end of a command. This allows us to write multi-line commands. The --more-- condition: in the usual interactive mode, when the results screen fills up, it stops scrolling and --more-- appears at the bottom of the screen.

To see the next screen, you must hit the spacebar. This can be annoying in do-files because it keeps stopping the program. Besides, you will be able to examine the output by opening the log file, so set more off. If you forget the replace option, Stata will not let you overwrite the existing log file. Finally, you do the analysis. To see the log file, you can open it in the do-file editor, or any other editor or word processor.

I use Word to review the log file, because I use Word to write the final paper. The log file looks like this. Finally, the official Stata manual consists of several volumes and is very expensive. So where does a student turn for help with Stata?

Answer: the Stata Help menu. Clicking on Help yields the following submenu. Clicking on Contents allows you to browse through the list of Stata commands, arranged according to function. Clicking on any of the hyperlinks brings up the documentation for that command. Now suppose you want information on how Stata handles a certain econometric topic. For example, suppose we want to know how Stata deals with heteroskedasticity.

Clicking on Help, Search, yields the following dialog box. Most of the commands have options that are not documented here. There are also lots of examples and links to related commands. You should certainly spend a little time reviewing the regress help documentation because it will probably be the command you use most over the course of the semester. Exercise: use the data on pool memberships from the first chapter. Use summarize to compute the means of the variables. A scatter plot with reference lines at the means can be produced by typing twoway scatter members rdues, with the yline() and xline() options set to the means, or by clicking on Graphics, Twoway graph, Scatter plot and filling in the blanks.

Click on Y axis, Major Tick Options, and Additional Lines, to reveal the dialog box that allows you to put a line at the mean of members. Similarly, clicking on X axis, Major Tick Options, allows you to put a line at the mean of rdues. Either way, the result is a nice graph. This means that high values of rdues are associated with low membership.

We need a measure of this association. One candidate is the sum, over all observations, of the product of each variable's deviation from its mean. This quantity is called the sum of corrected cross-products, or the covariation. If most of the points are in the upper left and lower right quadrants, then the covariation, computed over the entire sample, will be negative, indicating a negative association between X and Y.

Similarly, if most of the points lie in the lower left and upper right quadrants, then the covariation will be positive, indicating a positive relationship between X and Y. There are two problems with using covariation as our measure of association.

The first problem is that it depends on the number of observations: the larger the number of observations, the larger the positive or negative covariation. We can fix this problem by dividing the covariation by the sample size, N. The resulting quantity is the average covariation, known as the covariance. The second problem is that the covariance depends on the units in which the variables are measured: you get different values of the covariance if the two variables are height in inches and weight in pounds versus height in miles and weight in milligrams.
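As a concrete check, here is the covariation and covariance computation sketched in Python rather than Stata. The small data values standing in for rdues and members are assumptions for illustration only, chosen so that higher dues go with lower membership:

```python
import numpy as np

# Hypothetical data standing in for the pool example (assumed values).
x = np.array([2.0, 3.0, 4.0, 5.0, 6.0])    # rdues
y = np.array([90., 80., 70., 60., 50.])    # members

# Covariation: the sum of corrected cross-products.
covariation = np.sum((x - x.mean()) * (y - y.mean()))

# Covariance: the average covariation. Dividing by N removes the
# dependence on the number of observations.
covariance = covariation / len(x)

print(covariation, covariance)  # both negative: dues up, membership down
```

The sign of the covariation tells us the direction of the association, but its magnitude still depends on the units of x and y, which is the second problem noted above.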

With respect to regression analysis, the Stata command that corresponds to ordinary least squares (OLS) is the regress command. For example, regress members rdues produces a simple regression of members on rdues. That output was reported in the first chapter. Linear regression: sometimes we want to know more about an association between two variables than whether it is positive or negative. Consider a policy analysis concerning the effect of putting more police on the street.

Does it really reduce crime? If adding thousands of police nationwide only prevents a few crimes, the policy fails the cost-benefit test. So, we have to estimate the expected change in crime from a change in the number of police. The cost is determined elsewhere. So, we need an estimate of the slope of a line relating Y to X, not just the sign. A line is determined by two points. As statisticians, as opposed to mathematicians, we know that real world data do not line up on straight lines.

Why do we need an error term? There are two main reasons. First, the model is incomplete: we have left out some of the other factors that affect crime. The error term is then simply the sum of a whole bunch of variables that are each individually irrelevant, but taken together cause the error term to vary randomly. Second, the line might not be straight; that is, the relationship may be curved. We can get around this by transforming the model (taking logs, etc.). Because positive and negative errors would cancel if we simply added them up, we could square the errors or take their absolute values.

If we minimize the sum of squared errors, we are doing least squares. If we minimize the sum of absolute values, we are doing least absolute value estimation, which is beyond our scope. The least squares procedure has been around since at least 1805, and it is the workhorse of applied statistics. Suppose we have a quadratic formula and we want to minimize it by choosing a value for b.

The reason is that, as we have seen above, the regression line goes through the means. Here is the graph of the pool membership data with the regression line superimposed. Since the regression line does everything that the correlation coefficient does, plus yields the rate of change of y with respect to x, we will concentrate on regression from now on. How well does the line fit the data? We need a measure of the goodness of fit of the regression line.
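The claim that the least squares line passes through the means can be checked numerically. Here is a minimal Python sketch (the data values are made up for illustration): the slope is the covariation of x and y over the variation of x, and the intercept is chosen so the line passes through the point of means.

```python
import numpy as np

# Small made-up data set (assumed values for illustration).
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 5.])

# Least squares slope: covariation of x and y over variation of x.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

# Intercept chosen so the fitted line passes through the means.
a = y.mean() - b * x.mean()

# Evaluating the fitted line at the mean of x recovers the mean of y.
print(a + b * x.mean(), y.mean())
```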

Suppose we take the ratio of the explained sum of squares to the total sum of squares as our measure of the goodness of fit. This quantity is usually denoted R² and called the coefficient of determination. We know why now: in simple regressions, the coefficient of determination is equal to the correlation coefficient, squared. Whatever the number of regressors, the ratio of the regression sum of squares to the total sum of squares is always denoted R². Why is it called regression?
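The fact that R² equals the squared correlation coefficient in a simple regression is easy to verify numerically. A Python sketch, using simulated data (the coefficients and sample size are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)   # assumed true model

# Fit the simple regression by least squares.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a = y.mean() - b * x.mean()
yhat = a + b * x

# R-squared: explained (regression) sum of squares over total sum of squares.
r2 = np.sum((yhat - y.mean())**2) / np.sum((y - y.mean())**2)
r = np.corrcoef(x, y)[0, 1]                # sample correlation coefficient

print(r2, r**2)  # identical, up to floating point, in a simple regression
```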

Galton observed that unusually tall parents tend to have children shorter than themselves, while unusually short parents tend to have taller children. Consequently, he reasoned, we should eventually all be the same, mediocre, height. He knew that, to prove his thesis, he needed more than a positive association. He needed the slope of a line relating the two heights to be less than one. Taking the resulting scatter diagram, he eyeballed a straight line through the data and noted that the slope of the line was positive, but less than one. Mid-parentage is the average height of the two parents, expressed relative to the overall mean height. This is the regression toward mediocrity, now commonly called regression to the mean.

The original data collected by Galton are available on the web. The scatter diagram is shown below, with the regression line (estimated by OLS, not eyeballed) superimposed. We know that least squares as a descriptive device was invented by Legendre in 1805 to describe astronomical data. Galton was apparently unaware of this technique, although it was well known among mathematicians. The correlation coefficient is frequently referred to in textbooks as the Pearsonian product-moment correlation coefficient. However, least squares was still not being used.

In the second paper he actually estimates a multiple regression model of poverty as a function of welfare payments, including control variables. This was arguably the first econometric policy analysis. The regression fallacy: although Galton was in his time, and is even now, considered a major scientific figure, he was completely wrong about the law of heredity he claimed to have discovered.

This law was supposedly proved by the fact that the regression line has a slope less than one. We can see that the regression line does, in fact, have a slope less than one. Nevertheless, the fact that the law is bogus is easy to see. Mathematically, the explanation is simple: the slope estimate equals the correlation coefficient times the ratio of the standard deviations, and since the correlation coefficient is always less than one in absolute value, the slope is always less than one for any two variables with approximately equal variances.
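The identity behind this, slope = r times (sy/sx), can be demonstrated with simulated height-like data. In this Python sketch the means, spreads, and the true slope of 0.6 are assumptions chosen only so that the two variables end up with roughly equal variances:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "mid-parent" and "child" heights (all parameters assumed).
parent = rng.normal(68, 2, size=1000)
child = 68 + 0.6 * (parent - 68) + rng.normal(0, 1.6, size=1000)
# child's sd is about sqrt(0.36*4 + 2.56) = 2, matching parent's sd.

# OLS slope of child on parent.
b = (np.sum((parent - parent.mean()) * (child - child.mean()))
     / np.sum((parent - parent.mean())**2))

r = np.corrcoef(parent, child)[0, 1]
sx, sy = parent.std(), child.std()

# The slope is exactly r * sy/sx; with equal spreads it equals r < 1.
print(b, r * sy / sx)
```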

The means and standard deviations are produced by the summarize command; the correlation coefficient is produced by the correlate command. We therefore expect that the slope of the regression line will be positive. Galton was simply demonstrating the mathematical theorem with these data. If you do unusually well on a test, you can expect to do worse on the next one; similarly, if you do poorly, you can expect to do better.

The underlying theory is that ability is distributed randomly with a constant mean. Suppose ability is distributed uniformly with mean 50, and on each test you experience random error that is distributed normally with mean zero and some fixed standard deviation. If you scored higher than 50 on one test, you should expect to regress toward mediocrity on the next. We can use Stata to prove this result with a Monte Carlo demonstration.
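The same Monte Carlo logic can be sketched in Python. All the specific numbers here (the range of the uniform ability distribution, the noise standard deviation of 10, the sample size) are assumptions for illustration; the qualitative result does not depend on them:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Assumed setup: ability uniform with mean 50; each test adds
# independent normal noise with sd 10 (an arbitrary choice).
ability = rng.uniform(30, 70, size=n)
test1 = ability + rng.normal(0, 10, size=n)
test2 = ability + rng.normal(0, 10, size=n)

# Students who beat the mean on the first test mostly had good luck,
# which does not repeat, so their second-test average is lower.
high = test1 > 50
print(test1[high].mean(), test2[high].mean())
```

The group that scored above 50 on the first test still averages above 50 on the second (they do have above-average ability), but not as far above: regression toward the mean.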

I took the grades from the first two tests in a statistics course I taught some years ago and created a Stata data set called tests. You can download the data set and look at the scores. Here are the results. Horace Secrist documented the same pattern in business data: exceptionally profitable firms tended to become more average over time. Secrist thought he had found a natural law of competition and maybe a fundamental cause of the depression; it was the early 1930s, after all. Secrist was riding high. Then Harold Hotelling reviewed Secrist's book in JASA. In the review Hotelling pointed out that if the data were arranged according to the values taken at the end of the period, instead of the beginning, the conclusions would be reversed.

These diagrams really prove nothing more than that the ratios in question have a tendency to wander about. In his reply to Secrist, Hotelling wrote that this theorem is proved by simple mathematics and is illustrated by genetic, astronomical, physical, sociological, and other phenomena; the performance, though perhaps entertaining, and having a certain pedagogical value, is not an important contribution either to zoology or to mathematics (Hotelling, JASA). Even otherwise competent economists commit this error.

The result was Friedman's permanent income hypothesis, which can be derived from the test examples above by simply replacing test scores with consumption, ability with permanent income, and the random errors with transitory income. Some tools of the trade: we will be using these concepts routinely throughout the semester. The expected value of a random variable is the average result over repeated trials. It is a weighted average of the possible outcomes, where the weight attached to each outcome is the probability of that outcome.

Consider betting one dollar on a single number at roulette; an American wheel has 38 slots, so the probability of winning is 1/38. However, the payoff if your number comes up is 35 to 1. So, what is the expected value of a one dollar bet? Of course, on any single play you never lose the expected amount exactly; you either lose your dollar or win 35. However, over the long haul, you will eventually lose. This is known as the house edge and is how the casinos make a profit. If the expected value of the game is nonzero, as this one is, the game is said not to be fair.
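The expected value calculation is short enough to write out. A sketch, using the standard American-roulette figures of 38 slots and a 35-to-1 payoff:

```python
# American roulette: 38 slots, a $1 straight-up bet pays 35 to 1.
p_win = 1 / 38

# Expected value: win 35 with probability 1/38, lose 1 otherwise.
ev = p_win * 35 + (1 - p_win) * (-1)

print(ev)  # about -0.0526: a house edge of roughly 5.3 cents per dollar
```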

A fair game is one for which the expected value is zero. The expected value of X is also called the first moment of X. Operationally, to find an expected value, just take the mean. Ordinary least squares is ubiquitous in applied statistics, and the Gauss-Markov theorem is the theoretical basis for its ubiquity.

Method of least squares: least squares was first developed to help explain astronomical measurements. Different observations of the same object gave different values, so which observations are right? In Legendre's words, by this method a kind of equilibrium is established among the errors which, since it prevents the extremes from dominating, is appropriate for revealing the state of the system which most nearly approaches the truth. Gauss claimed that he had been using the technique since 1795, thereby greatly annoying Legendre.

It remains unclear who thought of the method first; certainly Legendre beat Gauss to print. Gauss later published another derivation, which has since become the standard. We will investigate the Gauss-Markov theorem below. But first, we need to know some of the properties of estimators, so we can evaluate the least squares estimator. An estimator is a formula; when we plug the sample data into the formula, the resulting value is the estimate. How good is that estimate? The answer depends on the distribution of the estimator.

There are two types of such properties: small sample properties, which hold for any sample size, and large sample properties, which are true only as the sample size approaches infinity. Small sample properties: there are three important small sample properties: bias, efficiency, and mean square error. Bias is the difference between the expected value of the estimator and the true value of the parameter. Obviously, unbiased estimators are preferred. An unbiased estimator is efficient if it has a smaller variance than any other unbiased estimator. Efficiency is important because the more efficient the estimator, the lower its variance, and the smaller the variance, the more precise the estimate.

Efficient estimators allow us to make more powerful statistical statements concerning the true value of the parameter being estimated. You can visualize this by imagining a distribution with zero variance. How confident can we be in our estimate of the parameter in this case?

The drawback to this definition of efficiency is that we are forced to compare unbiased estimators. It is frequently the case, as we shall see throughout the semester that we must compare estimators that may be biased, at least in small samples.

For this reason, we will use the more general concept of relative efficiency. Mean square error: we will often have to compare the properties of two estimators, one of which is unbiased but inefficient while the other is biased but efficient. The concept of mean square error, which equals the variance of the estimator plus the square of its bias, is useful here. Large sample properties: an estimator is consistent if its distribution collapses on the true parameter value as the sample size grows. This requires that the estimator, in the limit, be centered on the truth, with no variance. A distribution with no variance is said to be degenerate, because it is no longer really a distribution; it is a number.

Since we require that both the bias and the variance go to zero for a consistent estimator, an alternative definition is mean square error consistency: an estimator is consistent if its mean square error goes to zero as the sample size goes to infinity. All of the small sample properties have large sample analogs. An estimator can be asymptotically unbiased. An estimator can be asymptotically efficient relative to another estimator. Finally, an estimator can be mean square error consistent.

Note that a consistent estimator is, in the limit, a number. As such it can be used in further derivations. For example, the ratio of two consistent estimators is consistent. This is not true of unbiasedness. All we know about an unbiased estimate is that its mean is equal to the truth.

We do not know if our particular estimate is equal to the truth. Gauss-Markov theorem: consider the simple linear model Y = a + bX + u. The basic idea behind the assumptions is that the model describes a laboratory experiment: the researcher sets the value of X without error and then observes the value of Y, with some error. The observations of Y come from a simple random sample, which means that the observation Y1 is independent of Y2, which is independent of Y3, and so on.

The mean and the variance of Y do not change between observations. The Gauss-Markov assumptions, which are expressed above in terms of the distribution of Y, can be rephrased in terms of the distribution of the error term, u. Since a + bX is not random, the variance of u is equal to the variance of Y. Assumption 1 is very useful for proving the theorem, but it is not strictly necessary.

The crucial requirement is that X must be independent of u; that is, the covariance between X and u must be zero. Assumption 2 is crucial. Assumption 3 can be relaxed, as we shall see in Chapter 9 below. Assumption 4 can also be relaxed, as we shall see in Chapter 12 below.

OK, so why is least squares so good? Here is the Gauss-Markov theorem: if the GM assumptions hold, then the ordinary least squares estimates of the parameters are unbiased, consistent, and efficient relative to any other linear unbiased estimator. The proofs of the unbiased and consistent parts are easy.

We know that the mean of u is zero. What we have done is translate the axes so that the regression line goes through the origin. Define e as the error; we want to minimize the sum of squared errors. The result is a formula for deriving an estimate: when we plug in the values of x and y from the sample, we can compute the estimate, which is a numerical value. Note that so far, we have only described the data with a linear function.

Note that the unbiasedness property does not depend on any assumption concerning the form of the distribution of the error terms. We have not, for example, assumed that the errors are distributed normally, although we will assume normally distributed errors below. The independence and identical-distribution assumptions, however, are used in deriving the property that OLS is efficient: it can be shown that the OLS estimator has the smallest variance of any linear unbiased estimator.

That is, of all linear unbiased estimators, the OLS estimator has the smallest variance. Stated another way, any other linear unbiased estimator will have a larger variance than the OLS estimator.

We will be using all of the following distributions throughout the semester. If we express X as a standard score, z = (X − μ)/σ, then z has a standard normal distribution, because the mean of z is zero and its standard deviation is one.

We can generate a normal variable with a large number of observations as follows. The derived distributions below have associated with them their so-called degrees of freedom, which are the number of terms in the summation. Chi-square distribution: if z is distributed standard normal, then its square is distributed as chi-square with one degree of freedom. Since chi-square is a distribution of squared values, and, being a probability distribution, the area under the curve must sum to one, it has to be a skewed distribution of positive values, asymptotic to the horizontal axis.

Here is the graph. We can generate a chi-square variable by squaring a bunch of normal variables. The F distribution is the ratio of two chi-square variables, each divided by its own degrees of freedom, so it is positive and therefore skewed, like the chi-square. Here are graphs of some typical F distributions. We can create a variable with an F distribution as follows: generate five standard normal variables and sum their squares to make a chi-square variable v with five degrees of freedom; build a second chi-square variable w with ten degrees of freedom the same way; finally, divide v/5 by w/10 to make a variable distributed as F with 5 degrees of freedom in the numerator and 10 degrees of freedom in the denominator.
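The same constructions can be sketched in Python by simulation. The sample size is an arbitrary assumption; a useful check is that a chi-square variable's mean equals its degrees of freedom, and the F(5, 10) mean is 10/8 = 1.25:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000  # number of simulated draws (arbitrary)

# Chi-square(5): the sum of five squared standard normals.
chi5 = np.sum(rng.normal(size=(n, 5))**2, axis=1)

# Chi-square(10), built the same way from ten squared normals.
chi10 = np.sum(rng.normal(size=(n, 10))**2, axis=1)

# F(5, 10): ratio of independent chi-squares, each over its own df.
f = (chi5 / 5) / (chi10 / 10)

print(chi5.mean(), f.mean())  # near 5 and near 10/8 = 1.25
```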

The t distribution can be positive or negative and is symmetric, like the normal distribution. We can create a variable with a t distribution by dividing a standard normal variable by the square root of a chi-square variable that has itself been divided by its degrees of freedom. If we want to actually use a distribution to make inferences, we need to make an assumption concerning the form of the function.
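The t construction can also be checked by simulation. A Python sketch with 10 degrees of freedom (the df and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
k = 10  # degrees of freedom (assumed for illustration)

z = rng.normal(size=n)                              # standard normal
chi = np.sum(rng.normal(size=(n, k))**2, axis=1)    # chi-square(k)

# t(k): standard normal over the square root of chi-square(k)/k.
t = z / np.sqrt(chi / k)

print(t.mean(), t.std())  # mean near 0; sd near sqrt(k/(k-2))
```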

The estimated error variance is equal to the error sum of squares divided by the degrees of freedom, N − k, and is also known as the mean square error. By far the most common test is the significance test, namely that X and Y are unrelated. Degrees of freedom: Pearson and a number of other statisticians kept requiring ad hoc adjustments to their chi-square applications.

Sir Ronald Fisher solved the problem in the early 1920s and called the parameter degrees of freedom. Suppose we have a data set with N observations. We can use the data in two ways: to estimate the parameters or to estimate the variance. Suppose, for example, we have three observations whose sum is 6, so the mean is 2. Once we know the mean, we can find any data point knowing only the other two. That is, we used up one degree of freedom to estimate the mean, and we have only two left to estimate the variance.

To take another example, suppose we are estimating a simple regression model with ordinary least squares. This leaves the remaining N-2 data points to estimate the variance. We can also look at the regression problem this way: it takes two points to determine a straight line. If we only have two points, we can solve for the slope and intercept, but this is a mathematical problem, not a statistical one.

With three points and two estimated parameters, there is one degree of freedom left that could be used to estimate the variance. Our simple rule is that the number of degrees of freedom is the sample size minus the number of parameters estimated. In the example of the mean, we have one parameter to estimate, yielding N − 1 data points to estimate the variance.

To prove this, we have to demonstrate that dividing by the degrees of freedom yields an unbiased estimator for the variance of X, and therefore that dividing by N yields a biased estimator. In the regression case, we only have N − 2 independent errors on which to base our estimate of the variance, so we divide by N − 2 when we take the average.
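A quick Monte Carlo sketch in Python makes the bias visible for the one-parameter (mean) case; the same logic justifies N − 2 in the regression case. The true variance of 4 and the small sample size of 5 are assumptions chosen to make the bias obvious:

```python
import numpy as np

rng = np.random.default_rng(11)
n, reps = 5, 200_000          # small samples, many replications (assumed)

# Draw many samples from a population with variance 4.
samples = rng.normal(0, 2, size=(reps, n))
dev2 = (samples - samples.mean(axis=1, keepdims=True))**2

var_n = dev2.sum(axis=1) / n          # dividing by N: biased downward
var_nm1 = dev2.sum(axis=1) / (n - 1)  # dividing by N-1: unbiased

# Averaging over replications approximates the expected value of each
# estimator: roughly (n-1)/n * 4 = 3.2 versus the true 4.0.
print(var_n.mean(), var_nm1.mean())
```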

The new summation could not be larger than the original summation, which is the variance of x, because we are omitting some positive, squared terms. Law of Large Numbers: the law of large numbers states that if a situation is repeated again and again, the proportion of successful outcomes will tend to approach the constant probability that any one of the outcomes will be a success. For example, suppose we have been challenged to guess how many thumbtacks will end up face down if a boxful are dropped off a table.

How would we prepare for such a challenge? Toss a thumbtack into the air many times and record how many times it ends up face down. Divide that count by the number of trials and you have the proportion; multiply by the number of tacks to be dropped and that is your guess. Suppose we are presented with a large box full of black and white balls. We want to know the proportion of black balls, but we are not allowed to look inside the box.

What do we do? Answer: sample with replacement. The number of black balls that we draw, divided by the total number of balls drawn, will give us an estimate. The more balls we draw, the better the estimate. Mathematically, the theorem says that the mean of a sequence of independent draws X1, X2, ..., XN converges to the population mean as N grows. Central Limit Theorem: this is arguably the most amazing theorem in mathematics. Let X-bar be the mean of a random sample of size N from a distribution f(x). Then, as N grows, the distribution of X-bar approaches a normal distribution, whatever the form of f(x).
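The box-of-balls experiment takes only a few lines to simulate in Python. The true proportion of black balls (0.3 here) is an assumption known to the simulation but "hidden" from the sampler:

```python
import numpy as np

rng = np.random.default_rng(5)
p_black = 0.3   # true proportion of black balls (assumed, "unknown")

# Sampling with replacement: each draw is black with probability p_black.
draws = rng.random(100_000) < p_black

# The running proportion of black balls approaches the true probability
# as the number of draws grows: the law of large numbers.
for n in (10, 1_000, 100_000):
    print(n, draws[:n].mean())
```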

The incredible part of this theorem is that no restriction is placed on the distribution of x. No matter how x is distributed (as long as it has a finite variance), the sample mean of a large sample will be distributed normally. It used to be that a sample of 30 was thought to be enough; nowadays we usually require considerably more. Infinity comes quickly for the normal distribution.

The CLT is the reason that the normal distribution is so important. The theorem was first proved by DeMoivre in 1733 but was promptly forgotten. It was resurrected by Laplace, who derived the normal approximation to the binomial distribution, but it was still mostly ignored until Lyapunov generalized it in 1901 and showed how it worked mathematically.

Nowadays, the central limit theorem is considered to be the unofficial sovereign of probability theory. Galton wrote in Natural Inheritance: I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "Law of Frequency of Error". The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion.

The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along. We cannot prove the theorem here, but we can demonstrate it for certain cases using Monte Carlo methods.

For example, the binomial distribution is decidedly non-normal, since values must be either zero or one. However, according to the theorem, the means of large samples will be distributed normally. Here is the histogram of the binomial distribution with an equal probability of generating a one or a zero. I drew repeated samples, computed the mean of each, and then computed the mean and variance of the resulting 10,000 sample means.
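The same demonstration can be sketched in Python; the 10,000 replications and samples of size 100 are assumptions for illustration. Each underlying observation is 0 or 1, yet the sample means pile up in a normal-looking bell around 0.5 with standard deviation 0.5/sqrt(100) = 0.05:

```python
import numpy as np

rng = np.random.default_rng(9)
reps, n = 10_000, 100   # replications and sample size (assumed)

# Each observation is 0 or 1 with equal probability: as non-normal
# as a distribution can get.
means = rng.integers(0, 2, size=(reps, n)).mean(axis=1)

# By the CLT, the sample means are approximately normal with
# mean 0.5 and standard deviation 0.5 / sqrt(n).
print(means.mean(), means.std())
```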

The resulting sample means have approximately the predicted mean and standard deviation, and their histogram looks normal. Here is a more extreme example: the distribution of the populations of the larger cities in the United States. There are very few very large cities and a great number of relatively small cities.

Suppose we treat this distribution as the population, draw repeated samples from it, and compute the mean of each sample. I then computed the mean and standard deviation of the resulting sample means. According to the Central Limit Theorem, the z-scores of the sample means should have a zero mean and unit variance. Here is the histogram. Maximum likelihood: the sample consists of N observations. The probability of observing the sample is the probability of observing Y1 and Y2 and so on which, for independent observations, is the product of the individual probabilities. With normally distributed errors, maximizing this likelihood with respect to the parameters amounts to minimizing the sum of squared errors, which is nothing more than ordinary least squares. Therefore, OLS is also a maximum likelihood estimator.

As in the F-test, if the restriction is false, we expect that the residual sum of squares (ESS) for the restricted model will be higher than the ESS for the unrestricted model. This means that the likelihood function will be smaller (not maximized) if the restriction is false. If the hypothesis is true, then the two values of ESS will be approximately equal, and the two values of the likelihood function will be approximately equal. To do this test, run two regressions, one with the restriction, saving the residual sum of squares, ESSR, and one without the restriction, saving ESSU.

Like all large sample tests, its significance level is not well known in small samples; usually we just assume that the small sample significance is about the same as it would be in a large sample. Multiple regression and instrumental variables: suppose we have a model with two explanatory variables. Minimizing the sum of squared errors with respect to both coefficients would generate the least squares estimators for this multiple regression.

However, there is another approach, known as instrumental variables, that is both illustrative and easier. Since we do not observe ui, we have to make the assumptions operational. We therefore define the following empirical analogs of A1 and A2.

Divide both sides by N, and we can then work in deviations from means. Even though these equations look complicated, with all the summations, the summations are just numbers derived from the sample observations on x, y, and z. There are a variety of ways to solve two equations in two unknowns; back substitution from high school algebra will work just fine. We can make some sense of the resulting formulas if we translate them into simple regression coefficients.

This is the case of perfect collinearity. X and Z are said to be collinear because one is an exact function of the other, with no error term. This could happen if, say, X is total wages and Z is total income minus non-wage payments. If X and Z are perfectly collinear, then whenever X changes, Z has to change. It is therefore impossible to hold Z constant to find the separate effect of X.

Stata will simply drop Z from the regression and estimate a simple regression of Y on X. In the numerator, the second term, rzy·rxz, is the effect of X on Y that runs through Z: X is correlated with Z, so when X varies, Z varies, and this causes Y to change because Y is correlated with Z. Subtracting this term from rxy leaves the effect of X on Y while statistically holding Z constant, the so-called partial effect of X on Y. The formula partials the effect of Z out.
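The correlation-based formula just described can be verified numerically. This is a hedged sketch with simulated data: for standardized variables, the multiple regression slope on x equals (rxy − rzy·rxz)/(1 − rxz²), and the code checks that identity against a direct least squares fit.

```python
# Partial effect of x on y, holding z constant, written with simple
# correlations and compared to the multiple regression slope.
import numpy as np

rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)
x = 0.6 * z + rng.normal(size=n)       # x and z are correlated
y = 1.0 * x + 2.0 * z + rng.normal(size=n)

# Standardize so the slope on x equals the correlation formula directly.
sx = (x - x.mean()) / x.std()
sy = (y - y.mean()) / y.std()
sz = (z - z.mean()) / z.std()

r_xy = np.corrcoef(sx, sy)[0, 1]
r_zy = np.corrcoef(sz, sy)[0, 1]
r_xz = np.corrcoef(sx, sz)[0, 1]
b_formula = (r_xy - r_zy * r_xz) / (1 - r_xz**2)

# Multiple regression of standardized y on standardized x and z.
X = np.column_stack([np.ones(n), sx, sz])
beta, *_ = np.linalg.lstsq(X, sy, rcond=None)
print(abs(b_formula - beta[1]) < 1e-8)  # True: the two coincide
```

Subtracting rzy·rxz is exactly what removes the indirect path from x to y through z.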

In this case, the multiple regression estimators collapse to simple regression estimators. Suppose we type the following data into the Data Editor in Stata. The problem is that we have an omitted variable: temperature. The coefficients are not significant, probably because we have only eight observations.

Nevertheless, remember that omitting an important explanatory variable can bias your estimates on the included variables. The omitted variable theorem The reason we do multiple regression is that most things in economics are functions of more than one variable.

If we make a mistake and leave one of the important variables out, we cause the coefficients on the remaining variables to be biased and inconsistent. Omitting a relevant variable pushes it into the error term, violating the assumption that the explanatory variables are uncorrelated with the error term whenever the omitted variable is correlated with an included one. This was the case with rainfall and temperature in our crop yield example in Chapter 7.

If there are more omitted variables, add more equations. In the Data Editor, create the following variable. These numbers are exact because there is no random error in the model. You can see that the theorem works: both estimates are biased upward by the omitted variables.
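The omitted variable theorem can be illustrated in a few lines. This is a sketch with made-up numbers, not the text's data set; as in the text, there is no random error, so the bias is exact: the short-regression slope equals the true coefficient on x plus the coefficient on the omitted z times the slope of z on x.

```python
# Omitted variable theorem: regressing y on x alone picks up the effect of
# the omitted z, biasing the slope by (coef on z) * (slope of z on x).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = 2.0 * x                      # auxiliary slope of z on x is 2
y = 1.0 + 3.0 * x + 2.0 * z      # true coefficients: 3 on x, 2 on z

# Short regression of y on x alone.
b_short = np.polyfit(x, y, 1)[0]
print(round(b_short, 6))  # 7.0 = 3 + 2*2, biased upward
```

Because there is no random error, the estimate is exactly the true coefficient plus the bias term, just as the theorem predicts.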

Target and control variables: how many regressors? When we estimate a regression model, we usually have one or two parameters that we are primarily interested in. The variables associated with those parameters are called target variables. In a demand curve we are typically concerned with the coefficients on price and income; these coefficients tell us whether the demand curve is downward sloping and whether the good is normal or inferior.

We are primarily interested in the coefficient on the interest rate in a demand-for-money equation. In a policy study the target variable is frequently a dummy variable (see below) that is equal to one if the policy is in effect and zero otherwise. To get a good estimate of the coefficient on the target variable or variables, we want to avoid omitted variable bias. To that end we include a list of control variables.

For example, if we are studying the effect of a three-strikes law on crime, our target variable is the three-strikes dummy. We include in the crime equation all the other variables that might cause crime, aside from the three-strikes law. What happens if, in our attempt to avoid omitted variable bias, we include too many control variables? Instead of omitting relevant variables, suppose we include irrelevant ones.

However, including irrelevant variables makes the estimates inefficient relative to estimates from a model with only the relevant variables. It also inflates the standard errors, and therefore shrinks the t-ratios, on all the coefficients in the model, including those on the target variables. Thus, including too many control variables will tend to make the target variables appear insignificant even when they are truly significant.

We can summarize the effect of too many or too few variables as follows. Omitting a relevant variable biases the coefficients on all the remaining variables, but decreases the variance (increases the efficiency) of all the remaining coefficients.

Discarding a variable whose true coefficient is less than its true theoretical standard error decreases the mean square error (the variance plus the squared bias) of all the remaining coefficients. What is the best practice? I recommend the general-to-specific modeling strategy. After doing your library research and reviewing all previous studies on the issue, you will have compiled a list of all the control variables that previous researchers have used.

You may also come up with some new control variables. Start with a general model, including all the potentially relevant controls, and remove the insignificant ones. Use t-tests and F-tests to justify your actions. You can proceed sequentially, dropping one or two variables at a time, if that is convenient.

After you get down to a parsimonious model including only significant control variables, do one more F-test to make sure that you can go from the general model to the final model in one step. Sets of dummy variables should be treated as groups, including all or none for each group, so that you might have some insignificant controls in the final model, but not a lot.

At this point you should be able to do valid hypothesis tests concerning the coefficients on the target variables. Proxy variables It frequently happens that researchers face a dilemma. Data on a potentially important control variable is not available.

However, we may be able to obtain data on a variable that is known or suspected to be highly correlated with the unavailable variable. Such variables are known as proxy variables, or proxies. The dilemma is this: if we omit the proxy we get omitted variable bias; if we include the proxy we get measurement error.

As we see in a later chapter, measurement error causes biased and inconsistent estimates, but so does omitting a relevant variable. What is a researcher to do? Monte Carlo studies have shown that the bias tends to be smaller if we include a proxy than if we omit the variable entirely. However, the bias that results from including a proxy shrinks as the correlation between the proxy and the unavailable variable rises.

It is better to omit a poor proxy. It might be possible, in some cases, to see how the two variables are related in other contexts or other studies, but generally we just have to hope. The bottom line is that we should include proxies for control variables, but drop them if they are not significant. An interesting problem arises if the proxy stands in for the target variable itself; in that case, we are stuck with measurement error. Suppose the target variable X is unobservable, so we use Z, which is available and is related to X.

The estimated coefficient on Z mixes the effect of X on Y with the coefficient b that links Z to X. Unless we know the value of b, we have no measure of the effect of X on Y. However, we do know whether the coefficient d is significant and whether its sign is as expected. Therefore, if the target variable is a proxy variable, the estimated coefficient can be used only to determine sign and significance. An illustration of this problem occurs in studies of the relationship between guns and crime.

There is no good measure of the number of guns, so researchers have to use proxies. As we have just seen, the coefficient on the proxy for guns cannot be used to make inferences concerning the elasticity of crime with respect to guns. Nevertheless, two studies, one by Cook and Ludwig and one by Duggan, both make this mistake.

Dummy variables Dummy variables are also known as binary variables. They are extremely useful. For example, I happen to have a data set consisting of the salaries of the faculty of a certain nameless university (salaries).

The average salary at the time of the survey was as follows. We can then find the average female salary; the average male salary is somewhat higher. We can use the scalar command to find the difference in salary between males and females. Is this difference significant, given the variance in salary? It is somewhat more elegant to use regression analysis with dummy variables to achieve the same goal.

The t-ratio on female tests the null hypothesis that this salary difference is equal to zero. These are exactly the same results we got using the standard t-test of the difference between two means. So, there appears to be significant salary discrimination against women at this university. If we try to include both the male and female dummies along with the intercept, the regressors are perfectly collinear; this is the dummy variable trap. Stata drops one of the dummies, and the result is the same regression we got with male as the only independent variable.

It is possible to force Stata to drop the intercept term instead of one of the dummy variables. This formulation is less useful because it makes it more difficult to test the null hypothesis that the two salaries are different. Perhaps this salary difference is due to a difference in the amount of experience of the two groups.

Maybe the men have more experience, and that is what is causing the apparent salary discrimination. If so, the previous analysis suffers from omitted variable bias. Adding experience to the regression changes the coefficient on female, but it is still significant and negative. It is also possible that women receive lower raises than men do.

We can test this hypothesis by creating an interaction variable, multiplying experience by female, and including it as an additional regressor. There is a significant salary penalty associated with being female, but it is not caused by discrimination in raises. Useful tests F-test We have seen many applications of the t-test, but the F-test is also extremely useful. Suppose we want to test the null hypothesis that the coefficients on female and fexp are jointly equal to zero. The way to test this hypothesis is to run the regression as it appears above, with both female and fexp included, and note the residual sum of squares.

Then run the regression again without the two variables, imposing the null hypothesis, and see whether the two residual sums of squares are significantly different. If they are, then the null hypothesis is false. If they are not significantly different, then the two variables do not help explain the variance of the dependent variable and the null hypothesis cannot be rejected. Stata allows us to do this test very easily with the test command.

This command is used after the regress command; refer to the coefficients by the corresponding variable names. According to the F-ratio, we can firmly reject this hypothesis. Chow test This test is frequently referred to as a Chow test, after Gregory Chow, a Princeton econometrician who developed a slightly different version. However, because the t-tests on fexp and fadmin are not significant, we know that the difference is not due to raises or to differences paid to women administrators.

I have one more set of dummy variables to try: a dummy for each department. Maybe women tend to be over-represented in departments that pay lower salaries to everyone, male and female alike. This conclusion rests on the department dummy variables being significant as a group, and that is where the testparm command comes in: it lets us test the group with an F-test.

So there is no significant salary discrimination against women once we control for field of study. Female physicists, economists, chemists, and computer scientists are paid comparably to their male colleagues; unfortunately, the same is true for female French teachers and phys ed instructors. By the way, the departments that are significantly overpaid are dept 3 (Chemistry), dept 10 (Computer Science), and dept 11 (Economics).

It is important to remember, when testing groups of dummy variables for significance, that you must drop or retain all members of the group. If the group is not significant, you should drop all of its members, even if one or two are individually significant. If the group is significant, all should be retained, even if only a few are individually significant. Granger causality test Another very useful test is available only for time series.


We can check by looking at the number of observations in each model to make sure they are the same. Sometimes there are missing values in our data, so there may be fewer observations in the unrestricted model, since it uses more variables than the restricted model.

In our example, the number of observations is the same for both the unrestricted and restricted models. If the number of observations differs, we have to re-estimate the restricted model (the model after dropping some variables) using the same observations used to estimate the unrestricted model (the original model).

Stata performs listwise deletion: even if only one value of one variable is missing, Stata will not take that observation into account when running the regression. There is one special case of the F-test: testing the overall significance of a model. In other words, we want to know whether the regression model is useful at all, or whether we would need to throw it out and consider other variables.

This is rarely the case, though. This was such a painful and lengthy post. It has so many formulas that I had to do it in Microsoft Word and then convert it into several pictures… I hope I made sense, though. To recap: we want to see whether or not a group of variables should be kept in the model.

Also, unlike the bell-shaped t distribution, the F distribution is skewed to the right, with a smallest possible value of 0. Therefore, we reject the null hypothesis if the F-statistic from the formula is greater than the critical F value from the F table.


Here are the first few rows of the modified Data Frame. The first row contains a NaN, as there is nothing to lag that value with. But closer inspection reveals that at each time step, the model has simply learned to predict what is essentially the previously observed value, offset by a certain amount. Still, this lagged variable model may be statistically better than the intercept-only model at explaining the variance in Closing Price.
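Building the lagged column can be sketched with pandas. The column names and toy prices here are mine; the post's DJIA data file is not reproduced.

```python
# Create the lagged regressor by shifting the closing price one step.
import pandas as pd

df = pd.DataFrame({"Closing_Price": [100.0, 101.5, 99.8, 102.3, 103.0]})
df["Closing_Price_Lag1"] = df["Closing_Price"].shift(1)

n_missing = int(df["Closing_Price_Lag1"].isna().sum())
print(n_missing)  # 1: the first row has nothing to lag against

# Drop the NaN row before fitting, mirroring listwise deletion.
df = df.dropna()
print(len(df))  # 4
```

With the lagged column in place, the question becomes whether this model beats the intercept-only one.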

We will use the F-test to determine if this is true. We will run the F-test on the two models, the intercept-only model and the lagged variable model, to determine whether the added complexity is justified. Recollect that the F-test measures how much better a complex model is, compared with a simpler version of the same model, at explaining the variance in the dependent variable.

With the above definitions in place, the test statistic of the F-test for regression can be expressed as a ratio. The F-statistic measures how much of the variance in the dependent variable the simpler model fails to explain, relative to the complex model, expressed as a fraction of the unexplained variance from the complex model.
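The ratio described in words above can be written out as follows. This is the standard form of the statistic, restated here with my own notation since the post's formula image is not reproduced: RSS_1 is the restricted model's residual sum of squares with p_1 parameters, RSS_2 the unrestricted model's with p_2 parameters, and N the number of observations.

```latex
F \;=\; \frac{(RSS_1 - RSS_2)\,/\,(p_2 - p_1)}{RSS_2\,/\,(N - p_2)}
```

The numerator is the extra variance the simpler model leaves unexplained, and the denominator is the unexplained variance of the complex model, each scaled by its degrees of freedom.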

In regression analysis, the residual sum of squares of the fitted model is an excellent measure of unexplained variance, which explains the RSS terms in the numerator and the denominator. The numerator and the denominator are each scaled by the corresponding available degrees of freedom. Notice that both the numerator and denominator of the test statistic contain sums of squared residual errors. Recollect also that in regression, a residual error is a random variable with some probability density (or probability mass) function.

In this case we are concerned with finding the PDF of the F-statistic. If we assume that the residual errors from the two models are (1) independent and (2) normally distributed, which happen to be requirements of Ordinary Least Squares regression, then the numerator and denominator of the F-statistic formula contain sums of squares of independent, normally distributed random variables.

It can be proved that the sum of squares of k independent standard normal random variables follows the Chi-squared(k) distribution. Thus the numerator and denominator of the F-statistic formula can each be shown to obey a scaled Chi-squared distribution.
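This chain of claims can be checked by simulation. The following hedged sketch (degrees of freedom and simulation size are my choices) draws two independent chi-squared variables, scales each by its degrees of freedom, and confirms that their ratio matches the F distribution's quantiles.

```python
# Ratio of two suitably scaled, independent chi-squared variables follows
# the F distribution: about 95% of simulated ratios should fall below the
# F(d1, d2) 95th percentile.
import numpy as np
from scipy.stats import chi2, f

rng = np.random.default_rng(5)
d1, d2, n = 3, 40, 200_000

num = chi2.rvs(d1, size=n, random_state=rng) / d1
den = chi2.rvs(d2, size=n, random_state=rng) / d2
ratio = num / den

share = (ratio < f.ppf(0.95, d1, d2)).mean()
print(round(share, 2))  # 0.95
```

This is exactly the structure of the F-statistic: a scaled chi-squared numerator over an independent scaled chi-squared denominator.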

With a little bit of math, it can also be shown that the ratio of two suitably scaled Chi-squared random variables is itself a random variable that follows the F-distribution, whose PDF is shown below. All you need to do is print the fitted model's results, which report the F-statistic and its p-value; in the example, the reported p-value tells you whether to reject the intercept-only model.

The data file containing the DJIA closing prices is over here.

Why use the F-test in regression analysis In linear regression, the F-test can be used to answer the following questions: Will you be able to improve your linear regression model by making it more complex, i.e., by adding more linear regression variables to it? If you already have a complex regression model, would you be better off trading your complex model for the intercept-only model, which is the simplest linear regression model you can build?

The restricted model is said to be nested within the unrestricted model. (Figures referenced in this section: the unrestricted, restricted, and intercept-only linear regression models; the residual error; and the formula for the F-statistic applied to regression analysis.)
