Research Guides: R: R for Economics

1 Introduction

1.1 Objectives

We’ll be able to:

Pull economic data from the web
Make a plot with that data
Do all that in a reusable script

Along the way, we’ll learn:

How to access R documentation
How R differs from Stata, Excel, etc.
How economists use R
Where you can learn more about R

1.2 What is R?

R is a statistical package and mathematical programming language.

Unlike Stata, SAS, SPSS, MATLAB, and other statistical packages, R is free and open source. Students can easily install it on their own computers and keep using it after they graduate.

Unlike Excel, in R you can easily write scripts that make your analysis reproducible.

1.3 What is RStudio?

RStudio is an integrated development environment (IDE) for R. You still need to install R to use RStudio, but it is a more helpful, graphical environment for working with R. Windows for scripts, files, packages, and plots, in addition to the console, make it easier to keep track of what you are doing. It has many add-ins to make R more powerful.

RStudio is especially good for making reports and presentations with the knitr package. It can even make PDFs via LaTeX.

2 A little about how R works

2.1 It can be a really fancy calculator

You can go ahead and type some arithmetic into the console and it will print the answer to the screen.

1+2

## [1] 3
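Beyond +, the console understands the usual arithmetic operators and math functions. A few examples follow. The last line is just an illustration of compound growth: 1 unit growing at 5% per period for 10 periods.

```r
2^10            # exponentiation: 1024
7 %/% 2         # integer division: 3
7 %% 2          # remainder (modulo): 1
sqrt(16)        # square root: 4
log(100)        # natural logarithm
log10(100)      # base-10 logarithm: 2
exp(1)          # the constant e
(1 + 0.05)^10   # 1 grown at 5% per period for 10 periods: about 1.629
```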

2.2 But working with objects makes it more flexible

However, what makes R so flexible is that everything in R can be an object. Objects can hold numeric values, text strings, datasets, models, anything, really. For example, let’s create an object to hold the value of 1 + 2. The assignment operator, <-, assigns a value to an object name.

oneplustwo <- 1 + 2

Nothing will print to the screen. But in RStudio, what do you see in the Environment pane, under Global Environment?

The object name chosen here, oneplustwo, is arbitrary and, for a number of reasons that may soon become apparent, not very smart. You can name objects whatever you want, but try to choose names that are meaningful, that do not start with a number or special character, and that do not contain spaces.
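To make those rules concrete, here is a short sketch; the object names are only illustrations. make.names() shows how R would repair strings that break the rules, and backticks let you use an invalid name anyway, which works but is best avoided.

```r
# Valid, descriptive names (pick one style and stick to it)
one_plus_two <- 1 + 2
onePlusTwo   <- 1 + 2

# How R repairs names that break the rules:
# a leading digit or underscore gets an "X" in front, and spaces become dots
make.names(c("1 plus 2", "one plus two", "_temp"))
# returns "X1.plus.2" "one.plus.two" "X_temp"

# Backticks allow an invalid name, but then you need them every time
`1 plus 2` <- 1 + 2
`1 plus 2` * 2
```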

Every object has a class, a type, and a structure, which will affect what you are able to do with the object.

class(oneplustwo)

## [1] "numeric"

str(oneplustwo)

##  num 3

typeof(oneplustwo)

## [1] "double"

The class of the object can be changed (or rather, coerced) on the fly using as.character(), as.numeric(), etc., which can be very useful.

str(as.character(oneplustwo))

##  chr "3"
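A few more coercions are shown below, including two that can surprise you. Text that R cannot interpret as a number becomes NA (with a warning). Also, as.numeric() applied to a factor returns the factor's internal level codes rather than the numbers it displays, so convert through as.character() first.

```r
as.numeric("3.5")    # text to number: 3.5
as.integer(3.9)      # truncates toward zero: 3
as.numeric(TRUE)     # logical to number: 1
as.numeric("three")  # NA, with a warning: R cannot read words as numbers

# A classic trap: numbers stored as a factor
f <- factor(c("10", "20", "10"))
as.numeric(f)                # 1 2 1 (the internal level codes, not the values!)
as.numeric(as.character(f))  # 10 20 10
```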

How would you increase the value of oneplustwo by 1?

We can store multiple values in a vector using the function c().

oneANDtwo <- c(1,2)
class(oneANDtwo)

## [1] "numeric"

str(oneANDtwo)

##  num [1:2] 1 2

typeof(oneANDtwo)

## [1] "double"

These vectors can be the building blocks of datasets, which in R parlance we call data frames. A simple way to put together a data frame is the data.frame() function.

data.frame(oneANDtwo, oneplustwo)

##   oneANDtwo oneplustwo
## 1         1          3
## 2         2          3

What did R do with the value in oneplustwo when it made this data frame?

Will you find this data frame in the Global Environment?

Datasets can be objects, regression models can be objects, anything can be an object.

What do you think will happen with the following?

oneANDtwo - 1

Let’s dispose of these objects.

rm(oneplustwo)
rm(oneANDtwo)

2.3 Scripts (.R files)

Everything in R can be automated, which makes it really powerful. In RStudio, make a new .R script by going to File -> New File -> R Script, and paste in the code we typed in above. If you save this script, it can be run and re-run whenever you need it.
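RStudio's Source button runs a whole script by calling source() on the file. The sketch below writes a tiny two-line script to a temporary file and runs it the same way. The temporary file stands in for a script you saved yourself, and its name is arbitrary.

```r
# Stand-in for a script saved from RStudio (the file name is arbitrary)
script_file <- file.path(tempdir(), "first_script.R")
writeLines(c(
  "oneplustwo <- 1 + 2",
  "print(oneplustwo)"
), script_file)

# Run every line of the script, top to bottom
source(script_file)
```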

You have been provided with a script with all the commands we will use in this session, downloadable on the left of this page.

RStudio has some handy tools that make it easier to write a script, especially under the “Code” menu. Another helpful menu is Session -> Set Working Directory.
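Session -> Set Working Directory types a setwd() call into the console for you. You can also do it in code, as shown below. Once the working directory is set, relative file names such as "treasury.csv" are read from and written to that folder. Here tempdir() is only a stand-in for your own project folder.

```r
getwd()                     # the folder R currently reads from and writes to

old_wd <- setwd(tempdir())  # change it; setwd() invisibly returns the previous folder
getwd()

setwd(old_wd)               # change back
```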

2.4 Packages

One of the strengths of R is the huge number of user-contributed packages that extend its functionality. It’s also one of its weaknesses, in that there are so many packages to keep track of, and many ways of doing many tasks.

Packages only need to be installed once. Below are a few of the packages we’ll use today. In RStudio, the Packages window is quite handy.

install.packages("pdfetch")
install.packages("xts")
install.packages("stargazer")
install.packages("zoo")
install.packages("ggplot2")

But packages need to be loaded in every session, that is, every time you start R (for example, by opening RStudio). Typically, the library() calls go at the top of a script.

library(pdfetch)
library(stargazer)
library(xts)
library(zoo)
library(ggplot2)

Installing new packages can take time, so let’s have a little economics interlude.

2.5 How does R relate to economics?

R can do essentially everything that a statistical package like Stata can do, plus more sophisticated modeling and machine learning techniques. Stata, however, will most likely continue to be used for typical regression analysis, because it is built for that and is so easy to use in those cases. R is more likely to outcompete a mathematical language like MATLAB.

Where R really shines for economists is machine learning. Machine learning describes (ahem) Big Data (ahem) techniques, such as decision trees and LASSO, that are used for prediction.

For an example, see the scripts that accompany this article from the Quarterly Journal of Economics:

Kleinberg, Jon; Lakkaraju, Himabindu; Leskovec, Jure; Ludwig, Jens; Mullainathan, Sendhil, 2017, “Replication Data for: ‘Human Decisions and Machine Predictions’”, https://doi.org/10.7910/DVN/VWDGHT, Harvard Dataverse, V1

3 Getting data into R

R has some built-in datasets that can be used for demonstration purposes; run data() to see the full list.

data("mtcars")
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

This is a data frame, the R object most similar to a spreadsheet or other kind of dataset, and the kind of object you’d usually use for data analysis. It’s easy to import simple tables, such as CSV files, using read.csv() or read.table(). In RStudio, it is also simple to use the Import Dataset dialog, which you can find under Environment -> Import Dataset or File -> Import Dataset.

realGDPgrowth <- read.csv("K:/My Drive/classes/ECON 456/R/realGDPgrowth.csv")  # change this path to wherever your copy of the file is saved
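If you don't have the CSV file at hand, you can still try read.csv(). Write a built-in dataset to a temporary file and read it back. Passing row.names = 1 turns the first column (the car names that write.csv() saved) back into row names.

```r
csv_file <- file.path(tempdir(), "mtcars.csv")

write.csv(mtcars, csv_file)                # car names are saved as the first column
cars <- read.csv(csv_file, row.names = 1)  # ...and become row names again

head(cars, 3)
str(cars)
```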

The foreign and haven packages can help you get datasets in many, many formats into R. I’ve found them useful just for opening datasets saved in the formats of software for which we do not have a license. A lifesaver!
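Here is a sketch of the foreign route. The foreign package ships with most R installations and can write and read Stata .dta files. The round trip below saves mtcars in Stata format and reads it back. Note that read.dta() only handles files from older Stata versions (up to Stata 12). For files saved by recent versions of Stata, haven's read_dta() is the usual choice.

```r
library(foreign)

dta_file <- file.path(tempdir(), "mtcars.dta")
write.dta(mtcars, dta_file)        # save in Stata format
cars_stata <- read.dta(dta_file)   # read it back as a data frame

head(cars_stata, 3)
```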

3.1 Data packages

What is really fun is using special packages for pulling in data automatically. These packages take the guesswork out of using an API to connect to a data source on the web.

We’ll look at pdfetch. Type ?pdfetch into the console to access help. FRED is just one of the sources from which it can fetch data.
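Besides ?, a few other base R functions help you find your way around the documentation. They are shown here on read.csv() and mean(), but any function name works.

```r
?read.csv               # help page for one function; same as help("read.csv")
??"time series"         # search the help pages of all installed packages
apropos("^read\\.")     # functions on the search path whose names start with "read."
args(read.csv)          # a function's arguments and their defaults
example(mean)           # run the examples at the bottom of a help page
```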

pdfetch_FRED("THREEFY10")

Assuming I intend to do something with this data, what did I do wrong here?

We can get multiple series from FRED at a time by concatenating them with c().

treasury <- pdfetch_FRED(c("THREEFY10", "THREEFYTP10"))
plot(treasury)

It’s not a great plot, but could it have been any easier to make?

Look through the list of data series in FRED and find your own series to import. When you go to the landing page of a series, the identifier is next to the title in parentheses. These series identifiers are case sensitive and should be enclosed in quotation marks within the pdfetch_FRED() function. Assign each series to an object with a meaningful name so we can refer to it later.

There are other handy packages that can import data directly into R, including:

Quandl Many economic series. Requires an API key.
quantmod Among other things, downloads data from Yahoo! Finance
Analyze Survey Data for Free Not actually a package, but code for importing a huge number of survey datasets

Additional data packages are listed in the following:

Crantastic These are packages tagged as “onlineData”
ropenSci Packages A lot are for biology, but many are of general interest.
R Views: Recent R Data Packages

You might be thinking, well this made it easy to get data in, but can I ever get it out? Yes, you can!

write.csv(as.data.frame(treasury),"treasury.csv")
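as.data.frame() keeps the dates of an xts object only as row names, and write.csv() writes those as a column with an empty header. If you'd rather have a proper date column, and be able to rebuild the xts object later, one option is to build the data frame yourself. This sketch uses a small made-up series with hypothetical columns a and b in place of treasury, so it runs without downloading anything.

```r
library(xts)

# Made-up daily series standing in for `treasury`
x <- xts(cbind(a = 1:3, b = 4:6), order.by = as.Date("2018-09-26") + 0:2)

# Dates as an ordinary, named column
out <- data.frame(date = index(x), coredata(x))
csv_file <- file.path(tempdir(), "series.csv")
write.csv(out, csv_file, row.names = FALSE)

# Reading it back: parse the dates, then rebuild the xts object
back <- read.csv(csv_file)
x_again <- xts(as.matrix(back[, c("a", "b")]), order.by = as.Date(back$date))
x_again
```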

3.2 Time series objects

There are a number of ways to store and work with time series data in R. ts objects and xts objects are both used to store time series. Unlike the basic data frames mentioned above, these are indexed by time. We’ll focus on xts objects because that is what pdfetch will fetch for you.
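To see the difference, here are the same four made-up quarterly values stored both ways. A ts object only records a start period and a frequency, so it suits regularly spaced data. An xts object attaches an explicit date to every observation.

```r
library(xts)

values <- c(1.2, 0.8, 1.5, 2.0)   # made-up numbers, for illustration only

# ts: a start period (year, quarter) plus 4 observations per year
q_ts <- ts(values, start = c(2017, 1), frequency = 4)

# xts: one date per observation (here, the first day of each quarter)
q_dates <- as.Date(c("2017-01-01", "2017-04-01", "2017-07-01", "2017-10-01"))
q_xts <- xts(values, order.by = q_dates)

q_ts
q_xts
```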

3.2.1 Information about xts objects

Let’s find out a little about our treasury object. class(), dim(), and names() are functions you can use with many other kinds of objects; start() and end() work on time series objects in general; and first(), last(), and periodicity() come from the xts package.

class(treasury)

## [1] "xts" "zoo"

dim(treasury)

## [1] 7358    2

names(treasury)

## [1] "THREEFY10"   "THREEFYTP10"

start(treasury)

## [1] "1990-07-18"

end(treasury)

## [1] "2018-09-28"

first(treasury)

##            THREEFY10 THREEFYTP10
## 1990-07-18    8.4931      2.2688

last(treasury)

##            THREEFY10 THREEFYTP10
## 2018-09-28    3.2447     -0.1119

periodicity(treasury)

## Daily periodicity from 1990-07-18 to 2018-09-28

3.2.2 Extracting data from xts objects

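The dollar sign and square brackets are the main tools for selecting parts of an xts object: $ picks out a column by name, and square brackets select rows by date, whether a year ("2000"), a month ("2000-07"), or a range ("2008/2011").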

head(treasury$THREEFY10)
treasury["2000"]
treasury["2000-07"]
treasury_subset <- treasury["2008/2011"]

Try to extract the values of THREEFY10 for October 2008.

You can change the periodicity of your series. Try the code below, then use head() on one to see the first six observations.

THREEFY10.monthly <- to.monthly(treasury$THREEFY10)
THREEFYTP10.monthly <- to.monthly(treasury$THREEFYTP10)
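Note that to.monthly() does not average. Like the other to.period() functions, it returns the first, highest, lowest, and last value of each month (Open, High, Low, and Close columns). If you want monthly averages instead, apply.monthly() applies a function to each month. The sketch below uses a made-up daily series whose values are simply 1, 2, 3, and so on.

```r
library(xts)

# Made-up daily series: Jan 1 to Feb 29, 2020 (60 days), values 1 to 60
days <- seq(as.Date("2020-01-01"), by = "day", length.out = 60)
daily <- xts(1:60, order.by = days)

ohlc <- to.monthly(daily)         # columns daily.Open, daily.High, daily.Low, daily.Close
avg  <- apply.monthly(daily, mean)  # one average per month, dated at the month's last observation

ohlc
avg
```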

We mentioned that xts objects are just one way of storing time series; ts is another. If necessary it is possible to convert between them.

gdp.FRED <- pdfetch_FRED("A191RO1Q156NBEA")  # Real GDP growth: Percent Change from Quarter One Year Ago, Seasonally Adjusted
gdp.quarterly <- to.quarterly(gdp.FRED)      # used only for its start and end quarters
gdp <- ts(gdp.FRED, start = start(gdp.quarterly), end = end(gdp.quarterly), frequency = 4)
plot(gdp)
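You can also check the conversion without downloading anything. The sketch below builds a made-up quarterly xts series indexed by zoo's yearqtr class and turns it into a ts object. A yearqtr value such as 2000 Q1 is stored as the number 2000, which ts() accepts as a start. The series is then converted back with as.xts().

```r
library(xts)  # also attaches zoo, which provides as.yearqtr()

# Made-up quarterly series, 2000 Q1 to 2001 Q4
quarters <- as.yearqtr(2000 + (0:7) / 4)
q_xts <- xts(1:8, order.by = quarters)

# xts -> ts: a start period and a frequency are all ts() needs
q_ts <- ts(as.numeric(coredata(q_xts)), start = as.numeric(start(q_xts)), frequency = 4)

# ts -> xts
back <- as.xts(q_ts)

q_ts
back
```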

3.3 Pretty plots

The plot() function works passably well, but I’m attached to ggplot2-style graphics. ggplot2 works on data frames, which xts objects are not. However, zoo’s autoplot.zoo() will convert an xts object to a data frame and feed it to ggplot2 for you.

p <- autoplot.zoo(treasury, facets=NULL)
p

These plots can be customized in many ways.

p1 <- p +
  labs(title = "Ten Year Treasury Yield and Term Premium, 1990-2018",
       caption = "Sources: Federal Reserve Bank of New York, Federal Reserve",
       x = "Year") +
  scale_colour_grey() +
  theme(legend.title = element_blank()) +
  theme(legend.justification = c(1, 1), legend.position = c(.95, .95))
p1 
p1 + theme_bw()

The last line draws the plot with the black-and-white theme by adding theme_bw(). You can find your theme options at ggplot2: Complete themes. Try p1 + theme_bw() again, but with a different theme, such as theme_minimal() or theme_classic(), instead of theme_bw().
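You can also do by hand what autoplot.zoo() does behind the scenes, which gives you full control over the ggplot2 call. zoo's fortify.zoo() with melt = TRUE turns a multi-column series into a long data frame with Index, Series, and Value columns. The sketch below uses a small made-up two-column series (hypothetical names yield and premium) in place of treasury.

```r
library(xts)
library(ggplot2)

# Made-up two-column daily series standing in for `treasury`
dates <- seq(as.Date("2018-01-01"), by = "day", length.out = 30)
x <- xts(cbind(yield = cumsum(rep(0.01, 30)), premium = rep(0.5, 30)), order.by = dates)

# Long format: one row per date and series
long <- fortify.zoo(x, melt = TRUE)

p2 <- ggplot(long, aes(x = Index, y = Value, colour = Series)) +
  geom_line() +
  labs(x = "Date", y = NULL) +
  theme_bw()
p2
```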

3.4 Linear regression

For any kind of regression, you first create a model object, then get summary information out of it.

model <- lm(mpg ~ wt + cyl, data = mtcars)
str(model)

## List of 12
##  $ coefficients : Named num [1:3] 39.69 -3.19 -1.51
##   ..- attr(*, "names")= chr [1:3] "(Intercept)" "wt" "cyl"
##  $ residuals    : Named num [1:32] -1.279 -0.465 -3.452 1.019 2.053 ...
##   ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##  $ effects      : Named num [1:32] -113.65 -29.12 -9.34 1.33 1.6 ...
##   ..- attr(*, "names")= chr [1:32] "(Intercept)" "wt" "cyl" "" ...
##  $ rank         : int 3
##  $ fitted.values: Named num [1:32] 22.3 21.5 26.3 20.4 16.6 ...
##   ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##  $ assign       : int [1:3] 0 1 2
##  $ qr           :List of 5
##   ..$ qr   : num [1:32, 1:3] -5.657 0.177 0.177 0.177 0.177 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##   .. .. ..$ : chr [1:3] "(Intercept)" "wt" "cyl"
##   .. ..- attr(*, "assign")= int [1:3] 0 1 2
##   ..$ qraux: num [1:3] 1.18 1.05 1.17
##   ..$ pivot: int [1:3] 1 2 3
##   ..$ tol  : num 1e-07
##   ..$ rank : int 3
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 29
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = mpg ~ wt + cyl, data = mtcars)
##  $ terms        :Classes 'terms', 'formula'  language mpg ~ wt + cyl
##   .. ..- attr(*, "variables")= language list(mpg, wt, cyl)
##   .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:3] "mpg" "wt" "cyl"
##   .. .. .. ..$ : chr [1:2] "wt" "cyl"
##   .. ..- attr(*, "term.labels")= chr [1:2] "wt" "cyl"
##   .. ..- attr(*, "order")= int [1:2] 1 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(mpg, wt, cyl)
##   .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:3] "mpg" "wt" "cyl"
##  $ model        :'data.frame':   32 obs. of  3 variables:
##   ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##   ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
##   ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ wt + cyl
##   .. .. ..- attr(*, "variables")= language list(mpg, wt, cyl)
##   .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:3] "mpg" "wt" "cyl"
##   .. .. .. .. ..$ : chr [1:2] "wt" "cyl"
##   .. .. ..- attr(*, "term.labels")= chr [1:2] "wt" "cyl"
##   .. .. ..- attr(*, "order")= int [1:2] 1 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(mpg, wt, cyl)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:3] "mpg" "wt" "cyl"
##  - attr(*, "class")= chr "lm"

class(model)

## [1] "lm"

summary(model)

## 
## Call:
## lm(formula = mpg ~ wt + cyl, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2893 -1.5512 -0.4684  1.5743  6.1004 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  39.6863     1.7150  23.141  < 2e-16 ***
## wt           -3.1910     0.7569  -4.216 0.000222 ***
## cyl          -1.5078     0.4147  -3.636 0.001064 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.568 on 29 degrees of freedom
## Multiple R-squared:  0.8302, Adjusted R-squared:  0.8185 
## F-statistic: 70.91 on 2 and 29 DF,  p-value: 6.809e-12
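summary() is only one way to get at the model. A few other standard functions pull out specific pieces, which is handy when you want to reuse results later in a script. The car with wt = 3 (wt is in 1000 lbs) and cyl = 4 is just a hypothetical input for prediction.

```r
model <- lm(mpg ~ wt + cyl, data = mtcars)

coef(model)                      # point estimates
summary(model)$coefficients      # estimates, std. errors, t values, p-values as a matrix
confint(model, level = 0.95)     # 95% confidence intervals
summary(model)$r.squared         # R-squared

# Predicted mpg for a hypothetical car weighing 3,000 lbs with 4 cylinders
predict(model, newdata = data.frame(wt = 3, cyl = 4))
```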

There are a variety of packages that can turn your model into a nice regression table.

stargazer(model, type="html")


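=============================================
                        Dependent variable:
                        -------------------
                                mpg
---------------------------------------------
wt                           -3.191***
                              (0.757)

cyl                          -1.508***
                              (0.415)

Constant                     39.686***
                              (1.715)

---------------------------------------------
Observations                     32
R²                             0.830
Adjusted R²                    0.819
Residual Std. Error       2.568 (df = 29)
F Statistic          70.908*** (df = 2; 29)
=============================================
Note:                *p<0.1; **p<0.05; ***p<0.01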

4 Making reports with Markdown

This is pretty easy if you are using RStudio (and I assume you are).

It starts with an .Rmd file. In RStudio, go to File -> New File -> R Markdown…

Chunks of R code go between a line reading ```{r} and a closing line of ```. Go ahead and paste some of the code we have been working on into a chunk.

Markdown is similar to HTML in that it structures documents, but it is much easier. Take a look at the RMarkdown Cheatsheet (PDF) for pointers.

When you have something in your .Rmd file, you are ready to knit! Clicking the Knit button (the one with the ball of yarn) creates an HTML document.

You can also make PDFs and Word documents, but it’s a little touchier.
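For reference, a minimal .Rmd file has three parts: a YAML header between --- lines, ordinary Markdown text, and R chunks that open with ```{r} and close with ```. The sketch below writes such a file from R; the title and contents are placeholders. It then renders the file with rmarkdown::render(), which is what the Knit button calls, but only if the rmarkdown package and pandoc are available (RStudio bundles pandoc).

```r
# Placeholder contents: YAML header, Markdown text, and one R chunk
rmd_lines <- c(
  "---",
  "title: \"My first report\"",
  "output: html_document",
  "---",
  "",
  "Fuel economy in the **mtcars** data:",
  "",
  "```{r}",
  "summary(mtcars$mpg)",
  "```"
)
rmd_file <- file.path(tempdir(), "first_report.Rmd")
writeLines(rmd_lines, rmd_file)

# Knit from code (needs the rmarkdown package and pandoc)
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)  # writes first_report.html next to the .Rmd
}
```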

5 Choose your own R adventure

Basic R fluency
- QuickR A good reference.
- Swirl An interactive course that works inside the R console.
- Coursera The Johns Hopkins R courses have been useful.
In-depth with R programming
- Software Carpentry: Programming with R
- Hadley Wickham’s Advanced R
Data manipulation, data cleaning
Reports, Markdown, Latex, and all that
- knitr in a knutshell
- RMarkdown Cheatsheet (PDF)
Time series, econometrics, etc.
- Manipulating Time Series Data in R with xts and zoo
- xts Cheat Sheet: Time Series in R
Geospatial analysis
Data visualization
- ggplot2 Probably the most popular for making charts.
Making apps
- Shiny
- dygraphs
