## Stata Tidbits

These tidbits contain bits and pieces of information that I hope will help you use Stata more effectively. You can receive email notifications of new tidbits as they are added by clicking on the subscribe box at the left. (Every email has an unsubscribe link, making it a snap to unsubscribe.)

Tuesday, February 9, 2010

## Robust regression vs. robust standard errors, Part 1

The name robust regression sounds similar to regression with robust standard errors, but these are actually very different techniques used for different kinds of situations. This tidbit briefly describes robust regression and when that technique could be useful. Next week's tidbit will address regression with robust standard errors and contrast these two techniques.

First, let's consider robust regression. This is a technique that is useful when there are outlying observations that could influence the regression coefficients. Outlying observations are downweighted (their influence is diminished) and extremely outlying observations can be weighted by a factor of 0 (removing their influence entirely).
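One way to picture this downweighting, offered as a sketch rather than a description of any particular implementation, is through two standard weighting schemes: the Huber weights and the Tukey biweights. Here $u_i$ (the residual for observation $i$ divided by a robust estimate of the residual scale) and the tuning constants $c_h$ and $c_b$ are notation introduced for this illustration.

```latex
w_i^{\text{Huber}} =
\begin{cases}
1 & \text{if } |u_i| \le c_h \\
c_h / |u_i| & \text{if } |u_i| > c_h
\end{cases}
\qquad
w_i^{\text{biweight}} =
\begin{cases}
\left[ 1 - (u_i / c_b)^2 \right]^2 & \text{if } |u_i| \le c_b \\
0 & \text{if } |u_i| > c_b
\end{cases}
```

Huber weights shrink as the scaled residual grows but never reach 0, whereas biweights are exactly 0 beyond $c_b$, which is how an extremely outlying observation can lose its influence entirely.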

The Stata auto dataset contains a good example that we can use, looking at the relationship between a car's weight and its miles per gallon. Let's first use this file.

```
. sysuse auto
(1978 Automobile Data)
```

Now let's have a quick look at a scatterplot of miles per gallon (mpg) by weight of the car (weight). But first, let's make a variable called wt1k that is weight divided by 1000.

`. generate wt1k = weight / 1000`

Now let's look at the scatterplot of mpg by wt1k. The low-weight, high-mpg cars could be influential (e.g., the VW Diesel, the Datsun 210, Subaru, and Plym. Champ).

`. scatter mpg wt1k, mlabel(make) mlabsize(large)`

Let's run an OLS regression predicting mpg from wt1k.

```
. regress mpg wt1k

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =  134.62
       Model |  1591.99024     1  1591.99024           Prob > F      =  0.0000
    Residual |  851.469221    72  11.8259614           R-squared     =  0.6515
-------------+------------------------------           Adj R-squared =  0.6467
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4389

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        wt1k |  -6.008687   .5178782   -11.60   0.000    -7.041058   -4.976316
       _cons |   39.44028   1.614003    24.44   0.000     36.22283    42.65774
------------------------------------------------------------------------------
```

This regression shows that for every 1,000-pound increase in weight, mpg is expected to decrease by about 6 miles per gallon. In the plot of leverage against squared residual (below), the VW Diesel has the highest squared residual and above-average leverage. This could be an influential observation.

`. lvr2plot ,  mlabel(make) mlabsize(large)`
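A numeric complement to the plot is Cook's distance, which combines leverage and residual size into one influence measure per observation. The following is a minimal sketch, assuming a fresh copy of the auto data; the variable name cooksd_ols is a hypothetical name chosen for this example. It uses `predict` with the `cooksd` option, which is available after `regress`.

```stata
* Sketch: rank observations by Cook's distance after the OLS fit.
* cooksd_ols is a hypothetical variable name introduced for this example.
sysuse auto, clear
generate wt1k = weight / 1000

* Refit the OLS model quietly so predict has estimation results to use
quietly regress mpg wt1k

* Cook's distance for each observation
predict cooksd_ols, cooksd

* Show the five observations with the largest Cook's distance
gsort -cooksd_ols
list make mpg wt1k cooksd_ols in 1/5
```

If the VW Diesel is as influential as the plot suggests, it should appear at or near the top of this list. Because the last estimation command is still `regress`, the steps that follow work unchanged.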

Let's try running this as a robust regression so we can compare the results to the OLS results. But first, let's save the predicted values from the OLS regression, calling them yhatols.

```
. predict yhatols
(option xb assumed; fitted values)
```

Now, let's run this as a robust regression using the rreg command.

```
. rreg mpg wt1k

   Huber iteration 1:  maximum difference in weights = .79065461
   Huber iteration 2:  maximum difference in weights = .16435059
   Huber iteration 3:  maximum difference in weights = .07997524
   Huber iteration 4:  maximum difference in weights = .0208614
Biweight iteration 5:  maximum difference in weights = .27513221
Biweight iteration 6:  maximum difference in weights = .12290071
Biweight iteration 7:  maximum difference in weights = .0699518
Biweight iteration 8:  maximum difference in weights = .01619963
Biweight iteration 9:  maximum difference in weights = .00890816

Robust regression                                      Number of obs =      74
                                                       F(  1,    72) =  249.65
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        wt1k |  -5.341891   .3380843   -15.80   0.000     -6.01585   -4.667933
       _cons |   36.66249   1.053663    34.80   0.000     34.56205    38.76293
------------------------------------------------------------------------------
```

The coefficient for wt1k was -6.009 in the OLS regression and is -5.342 in the robust regression. The robust regression slope is slightly flatter than the OLS slope. Let's create a variable yhatrreg that contains the predicted values from the robust regression.

```
. predict yhatrreg
(option xb assumed; fitted values)
```

Now let's visually compare the results of the OLS and robust regressions, as shown below.

```
. graph twoway (scatter mpg wt1k) (line yhatols yhatrreg wt1k, sort) , ///
>   legend(label(1 "Observed MPG") label(2 "OLS Regression") label(3 "Robust Regression"))
```

The graph above shows the OLS regression line in red and the robust regression line in green. The robust regression line is not as steep because it was influenced less by outlying observations like the VW Diesel. In this case, the robust regression may appropriately discount the influence of outlying observations.
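To see which observations the robust regression actually downweighted, the weights can be saved and inspected. The following is a minimal sketch, assuming a fresh copy of the auto data; the names ols, robust, and rweight are hypothetical names chosen for this example. It uses the `genwt()` option of `rreg` to save the final weights and `estimates table` to put the two sets of coefficients side by side.

```stata
* Sketch: compare the two fits and inspect the robust-regression weights.
* ols, robust, and rweight are hypothetical names introduced for this example.
sysuse auto, clear
generate wt1k = weight / 1000

* OLS fit, stored for later comparison
quietly regress mpg wt1k
estimates store ols

* Robust fit; genwt() saves the final weight assigned to each observation
quietly rreg mpg wt1k, genwt(rweight)
estimates store robust

* Coefficients and standard errors side by side
estimates table ols robust, b(%9.3f) se(%9.3f)

* Observations with the smallest weights were downweighted the most
sort rweight
list make mpg wt1k rweight in 1/5
```

A weight near 1 means an observation was treated essentially as in OLS, while a weight near 0 means its influence was largely removed. Note that `sysuse auto, clear` discards the yhatols and yhatrreg variables created earlier, so run this after you are finished with the graph.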

Next week we will look at regression with robust standard errors.

You can download the example data files from this tidbit (as well as all of the other tidbits) as shown below. These commands will download all of the example data files into the current folder on your computer. (If you have done this before, you may need to specify `net get stowdata, replace` to overwrite the existing files.)

```
net from http://www.MichaelNormanMitchell.com/storage/stowdata
net get stowdata
```

If you have thoughts on this Stata Tidbit of the Week, you can post a comment. You can also send me an email at MichaelNormanMitchell and then the at sign and gmail dot com. If you are receiving this tidbit via email, you can find the web version at http://www.michaelnormanmitchell.com/ .


When or why would one want to use this robust regression technique as opposed to quantile regression, which also places less emphasis on outliers? Are there any rules of thumb when to use one versus the other?

February 9, 2010 | Brian

This is a great question, Brian. As you note, both "robust regression" and "quantile regression" are techniques that reduce the influence of outliers. I am not aware of any "rules of thumb" on when to choose one over the other. To me, there are two factors I would weigh. First, I would ask whether we want to estimate the "mean" of the outcome, conditional on the predictors, or the "median" (or other percentile), conditional on the predictors. If we want to estimate the "mean", then go with robust regression; if the "median", then go with quantile regression. The other factor that comes to my mind is the quantity and influence of the outliers. If there are a very large number of highly influential outliers (say 5% or 10% of the cases) and the results of OLS vs. robust regression would be very substantially different, then it seems to me it is pushing the limits of "robust regression", since such a high percentage of observations are being downweighted. In such a case, quantile regression seems preferable.

Thanks for the question. I hope others might weigh in with other thoughts.

February 9, 2010 | Michael Mitchell

Firstly, thanks for putting together a great website; may there be more tidbits to come.

Now I don't want to 'nitpick' a 'tidbit' .. but in the graph where you plot the predicted curves comparing robust to OLS regression, you label the scatter points as 'predicted MPG', though aren't they really 'observed MPG'?

March 18, 2010 | charles

Dear Charles

Thanks both for the kind words and the correction. You are right on the mark, and I am grateful to you for letting me know so I can fix it. The graph and the code have been fixed. Feel free to 'nitpick' any time!

Best regards,

Michael

March 18, 2010 | Michael