Country Name | Country Code | Series Name | Series Code | 2007 [YR2007] | 2008 [YR2008] | 2009 [YR2009] | 2010 [YR2010] |
---|---|---|---|---|---|---|---|
Liberia | LBR | Central government debt, total | GC.DOD.TOTL.GD.ZS | .. | .. | .. | .. |
Madagascar | MDG | Central government debt, total | GC.DOD.TOTL.GD.ZS | .. | .. | .. | .. |
Malawi | MWI | Central government debt, total | GC.DOD.TOTL.GD.ZS | .. | .. | 29.27341 | 28.03011 |
Mali | MLI | Central government debt, total | GC.DOD.TOTL.GD.ZS | .. | .. | .. | .. |
Mozambique | MOZ | Central government debt, total | GC.DOD.TOTL.GD.ZS | .. | .. | .. | .. |
Nepal | NPL | Central government debt, total | GC.DOD.TOTL.GD.ZS | 42.97866 | 43.79999 | .. | 33.86252 |
Niger | NER | Central government debt, total | GC.DOD.TOTL.GD.ZS | .. | .. | .. | .. |
Uganda | UGA | Central government debt, total | GC.DOD.TOTL.GD.ZS | 26.4924 | 33.1932 | 28.46653 | 27.54292 |
Zimbabwe | ZWE | Central government debt, total | GC.DOD.TOTL.GD.ZS | .. | .. | .. | .. |
Data from database: World Development Indicators | |||||||
Last Updated: 02/01/2017 |
The World Bank DataBank, which includes the World Development Indicators among other datasets, may be the perfect source for cross-national panel data on economic, social, and health topics. However, if you download from their website using the default settings, you may find that it is not optimally set up for a panel data analysis.
There are three big problems you'll see:
wbopendata is an add-on module that brings World Bank data directly into Stata for you. Install it from SSC, then open its dialog box:
ssc install wbopendata
db wbopendata
Downloading may take a while, but give it a second, then open up the Data Editor to see your data beautifully laid out in long format.
countryname | countrycode | iso2code | region | regioncode | year | bm_gsr_cmcp_zs | bm_gsr_fcty_cd | bm_gsr_gnfs_cd |
---|---|---|---|---|---|---|---|---|
Afghanistan | AFG | AF | South Asia | SAS | 1979 | 34.9931 | 2.40E+07 | 7.70E+08 |
Afghanistan | AFG | AF | South Asia | SAS | 1980 | 17.5939 | 6.90E+06 | 9.20E+08 |
Afghanistan | AFG | AF | South Asia | SAS | 1981 | 16.3703 | 2.00E+07 | 1.10E+09 |
Afghanistan | AFG | AF | South Asia | SAS | 1982 | 10.3989 | 2.00E+07 | 9.70E+08 |
Afghanistan | AFG | AF | South Asia | SAS | 1983 | 9.82736 | 2.10E+07 | 1.00E+09 |
Afghanistan | AFG | AF | South Asia | SAS | 1984 | 6.98267 | 1.90E+07 | 1.40E+09 |
Afghanistan | AFG | AF | South Asia | SAS | 1985 | 9.60415 | 8.60E+06 | 1.10E+09 |
Afghanistan | AFG | AF | South Asia | SAS | 1986 | 6.12245 | 3.50E+07 | 1.30E+09 |
Afghanistan | AFG | AF | South Asia | SAS | 1987 | 13.4997 | 1.10E+07 | 1.10E+09 |
Afghanistan | AFG | AF | South Asia | SAS | 1988 | 8.83333 | 1.20E+07 | 8.50E+08 |
Afghanistan | AFG | AF | South Asia | SAS | 1989 | 10.9284 | 7.90E+06 | 7.30E+08 |
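The same download can also be scripted instead of run from the dialog box. The lines below are a minimal sketch, assuming the `indicator()`, `long`, and `clear` options work as described in `help wbopendata`. The two indicator codes are simply the ones that appear elsewhere in this post, so swap in whichever series you need.

```stata
* Install once (replace updates an older copy if one is present).
ssc install wbopendata, replace

* Request two indicators in long (country-year) format.
* Multiple indicator codes are separated by semicolons.
wbopendata, indicator(SP.DYN.CBRT.IN; GC.DOD.TOTL.GD.ZS) long clear

* In the long layout each series should show up as a lowercase variable
* name with underscores (for example sp_dyn_cbrt_in); check with describe.
describe
```

Scripting the download in a do-file means the exact data request is recorded alongside the rest of your analysis.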
I definitely recommend using wbopendata, but if you insist, here are instructions for starting with the Excel default downloads from World Databank.
In Stata, go to File > Import > Text data (delimited, csv, etc.). This will bring up the import menu.
First, use the Browse button to find your CSV file. Next, make sure the drop down under "Use first row for variable names" is set to Always. Finally, in the data preview at the bottom of the menu, scroll over to look at your variables for each year. Because of those pesky double-period missing values, they will import as strings, which is why they are in red. You want them in numeric format. To fix this, click on the column so that they are highlighted in blue (hold down the Ctrl key to select more than one at once), then right-click and select "Force selected columns to use numeric types." Now they should turn black, indicating a numeric variable.
Now click OK. The data should now import.
The command now in the Review window should look something like this:
import delimited "C:\Users\yourusername\Downloads\Data_Extract_From_World_Development_Indicators.csv", varnames(1) numericcols(5 6) clear
destring yr2009, replace force
drop if countrycode==""
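Because `force` turns anything non-numeric into missing without telling you what it was, a more cautious variant is sketched here. It assumes the year columns came in as strings named yr2009, yr2010, and so on, and it uses a tiny made-up extract (a few rows copied from the table at the top of this post, plus a footer row) so that it runs on its own.

```stata
* Toy extract: year columns imported as strings because of the ".." values.
clear
input str50 countryname str3 countrycode str10 yr2009 str10 yr2010
"Malawi" "MWI" "29.27341" "28.03011"
"Mali" "MLI" ".." ".."
"Data from database: World Development Indicators" "" "" ""
end

* Footer and blank rows have no country code, so drop them.
drop if countrycode == ""

* Collect the year columns that are still stored as strings.
ds yr*, has(type string)
local strvars `r(varlist)'

* Blank out only the ".." placeholder, then convert without force.
* If any other text remains, destring leaves that variable as a string
* and reports it, instead of silently setting the value to missing.
foreach v of local strvars {
    replace `v' = "" if `v' == ".."
}
if "`strvars'" != "" {
    destring `strvars', replace
}

describe
```

With your real file, you would replace the `input` block with your own `import delimited` command and keep the rest.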
reshape long yr, i( countryname countrycode seriesname seriescode ) j(year)
yr comes from yr2000, yr2001, etc.: it is the bit of text that every year variable starts with. This is the "stub." The year information is the second half of the variable name. In i(), you'll put the variables that together uniquely identify each row of the wide data (here, each country and series). In j(), you'll put the name of the new variable you are creating. It can be anything, but in this case year makes the most sense.
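To see the stub logic in isolation, here is a small self-contained sketch using the Nepal and Uganda debt figures from the table at the top of this post. It keeps only countrycode as the identifier to stay short; the real data need the longer i() list shown above.

```stata
* Toy wide data: one row per country, one column per year.
clear
input str3 countrycode yr2007 yr2008 yr2009 yr2010
"NPL" 42.97866 43.79999 . 33.86252
"UGA" 26.4924 33.1932 28.46653 27.54292
end

* yr is the stub; the digits that follow it become the values of year.
reshape long yr, i(countrycode) j(year)

* Two countries x four years = eight rows, one per country-year.
list, sepby(countrycode)
```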
Now our data is long in the extreme. Take a look:
countryname | countrycode | seriesname | seriescode | year | yr |
---|---|---|---|---|---|
Angola | AGO | Birth rate, crude (per 1,000 people) | SP.DYN.CBRT.IN | 2007 | 48.888 |
Angola | AGO | Birth rate, crude (per 1,000 people) | SP.DYN.CBRT.IN | 2008 | 48.473 |
Angola | AGO | Birth rate, crude (per 1,000 people) | SP.DYN.CBRT.IN | 2009 | 48.018 |
Angola | AGO | Birth rate, crude (per 1,000 people) | SP.DYN.CBRT.IN | 2010 | 47.529 |
Angola | AGO | Birth rate, crude (per 1,000 people) | SP.DYN.CBRT.IN | 2011 | 47.018 |
Angola | AGO | Birth rate, crude (per 1,000 people) | SP.DYN.CBRT.IN | 2012 | 46.499 |
Angola | AGO | Birth rate, crude (per 1,000 people) | SP.DYN.CBRT.IN | 2013 | 45.985 |
Angola | AGO | Birth rate, crude (per 1,000 people) | SP.DYN.CBRT.IN | 2014 | 45.483 |
First let's use encode to turn the seriesname variable into a numeric variable with value labels.
encode seriesname, gen(series)
codebook series
codebook will tell you what these new numeric variables are, like so:
------------------------------------------------------------------------------------------------
series                                                                              Series Name
------------------------------------------------------------------------------------------------
                  type:  numeric (long)
                 label:  series
                 range:  [1,2]                        units:  1
         unique values:  2                        missing .:  0/960
            tabulation:  Freq.   Numeric  Label
                           480         1  Birth rate, crude (per 1,000 people)
                           480         2  Debt service on external debt, long-term (TDS, current US$)
A little more housekeeping: let's drop the seriesname and seriescode, because we don't need them. And, let's rename yr to var, because it doesn't contain the year, actually.
drop seriesname seriescode
rename yr var
reshape wide var, i(countryname countrycode year) j(series)
Now, the data look great.
countryname | countrycode | year | var1 | var2 |
---|---|---|---|---|
Angola | AGO | 2007 | 48.888 | 4.40E+09 |
Angola | AGO | 2008 | 48.473 | 1.60E+09 |
Angola | AGO | 2009 | 48.018 | 3.50E+09 |
Angola | AGO | 2010 | 47.529 | 2.30E+09 |
Angola | AGO | 2011 | 47.018 | 2.80E+09 |
Angola | AGO | 2012 | 46.499 | 4.20E+09 |
Angola | AGO | 2013 | 45.985 | 4.60E+09 |
Angola | AGO | 2014 | 45.483 | 5.90E+09 |
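The names var1 and var2 are not very informative, so a natural last step is to rename them and declare the panel structure. The sketch below is self-contained: it starts from a toy version of the long data (two years of the Angola figures shown above, with series already encoded as 1 and 2). The names birth_rate, debt_service, and country_id are made up for illustration.

```stata
* Toy long data: one row per country, year, and series.
clear
input str3 countrycode year series var
"AGO" 2007 1 48.888
"AGO" 2007 2 4.40e+09
"AGO" 2008 1 48.473
"AGO" 2008 2 1.60e+09
end

* One column per series: var1 and var2.
reshape wide var, i(countrycode year) j(series)

* Give the columns meaningful (hypothetical) names and labels.
rename var1 birth_rate
rename var2 debt_service
label variable birth_rate "Birth rate, crude (per 1,000 people)"
label variable debt_service "Debt service on external debt, long-term (TDS)"

* xtset needs a numeric panel identifier, so encode the country code first.
encode countrycode, gen(country_id)
xtset country_id year
```

Once the data are xtset, panel commands such as xtreg and time-series operators such as L. know which rows belong to the same country.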
Many do not realize that the World Bank DataBank has advanced download options that let you select the long format from the beginning. If you are downloading more than one series, however, you will still need to reshape the data a bit to get separate columns for each series.