Working with Economic data in Python¶

This notebook introduces you to working with data in Python. You can use packages like NumPy to manipulate and do computations with arrays, matrices, and other data (see my Introduction to Python). But given the needs of economists (and other scientists), it will be advantageous for us to use pandas. pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. pandas allows you to import and process data in many useful ways. It integrates well with other packages that complement it, making it a very powerful tool for data analysis.

With pandas you can

  1. Import many types of data, including
    • CSV files
    • Tab or other types of delimited files
    • Excel (xls, xlsx) files
    • Stata files
  2. Open files directly from a website
  3. Merge, select, join data
  4. Perform statistical analyses
  5. Create plots of your data

and much more. Let's start by importing pandas and using it to download some data and create some of the figures from the lecture notes. Note that when importing pandas it is customary to assign it the alias pd. I suggest you follow this convention, which will make using other people's code and snippets easier.

In [1]:
# Let's import pandas and some other basic packages we will use
# (%pylab is deprecated, so we import matplotlib explicitly)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Working with Pandas¶

The basic structures in pandas are pd.Series and pd.DataFrame. You can think of a pd.Series as a labeled vector that contains data and has a large set of functions that can be easily performed on it. A pd.DataFrame is similar to a table/matrix of multidimensional data where each column is a pd.Series. I know...this may not explain much, so let's start with some actual examples. Let's create two series, one containing some country names and another containing some fictitious data.

In [2]:
countries = pd.Series(['Colombia', 'Turkey', 'USA', 'Germany', 'Chile'], name='country')
print(countries)
print('\n', 'There are ', countries.shape[0], 'countries in this series.')
0    Colombia
1      Turkey
2         USA
3     Germany
4       Chile
Name: country, dtype: object

 There are  5 countries in this series.

Notice that we have assigned a name to the series that is different from the name of the variable containing the series. Our print(countries) statement shows the series and its contents, its name, and the dtype of the data it contains. Here our series is composed only of strings, so pandas assigns it the object dtype (not important for now, but we will use this later to convert data between types, e.g. strings to integers or floats or the other way around).
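
As a small preview of such conversions, here is a minimal sketch using astype. The series as_text and its contents are made up for illustration and are not part of the data used in this notebook.

```python
import pandas as pd

# A series of numbers stored as strings (illustrative data)
as_text = pd.Series(['1', '2', '3'], name='digits')

# Convert strings to integers or floats
as_int = as_text.astype(int)
as_float = as_text.astype(float)

# ...and back to strings
back_to_text = as_int.astype(str)

print(as_int.dtype, as_float.dtype)
```

Once the values are numeric, numeric methods such as sum() or mean() behave as expected.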

Let's create the data using some of the functions we already learned.

In [3]:
np.random.seed(123456)
data = pd.Series(np.random.normal(size=(countries.shape)), name='noise')
print(data)
print('\n', 'The average in this sample is ', data.mean())
0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
Name: noise, dtype: float64

 The average in this sample is  -0.24926597871826645

Here we have used the mean() function of the series to compute its mean. There are many other properties/functions for these series, including std(), shape, count(), max(), min(), etc. You can access these by writing series.name_of_function_or_property. To see what functions are available, type the name of the series followed by a period and hit tab.

Let's create a pd.DataFrame using these two series.

In [4]:
df = pd.DataFrame([countries, data])
df
Out[4]:
0 1 2 3 4
country Colombia Turkey USA Germany Chile
noise 0.469112 -0.282863 -1.509059 -1.135632 1.212112

Not exactly what we'd like, but don't worry, we can just transpose it so it has each country with its data in a row.

In [5]:
df = df.T
df
Out[5]:
country noise
0 Colombia 0.469112
1 Turkey -0.282863
2 USA -1.509059
3 Germany -1.135632
4 Chile 1.212112
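
An alternative that avoids the transpose is to combine the two series column-wise with pd.concat. This is only a sketch and not what the rest of the notebook uses; df_alt is an illustrative name. One practical difference is that each column keeps its own dtype, whereas the transposed dataframe above stores every column with the object dtype.

```python
import numpy as np
import pandas as pd

countries = pd.Series(['Colombia', 'Turkey', 'USA', 'Germany', 'Chile'], name='country')
np.random.seed(123456)
data = pd.Series(np.random.normal(size=countries.shape), name='noise')

# axis=1 places each series in its own column, named after the series
df_alt = pd.concat([countries, data], axis=1)
print(df_alt.dtypes)
```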

Now let us add some more data to this dataframe. This is done easily by defining new columns. Let's create the square of noise, create the sum of noise and its square, and get the length of the country's name.

In [6]:
df['noise_sq'] = df.noise**2
df['noise and its square'] = df.noise + df.noise_sq
df['name length'] = df.country.apply(len)
df
Out[6]:
country noise noise_sq noise and its square name length
0 Colombia 0.469112 0.220066 0.689179 8
1 Turkey -0.282863 0.080012 -0.202852 6
2 USA -1.509059 2.277258 0.768199 3
3 Germany -1.135632 1.289661 0.154029 7
4 Chile 1.212112 1.469216 2.681328 5

This shows some of the ways in which you can create new data. Especially useful is the apply method, which applies a function to the series. You can also apply a function to the whole dataframe, which is useful if you want to perform computations using various columns.
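
To make the dataframe version of apply concrete, here is a minimal sketch on a toy dataframe. The dataframe toy and the column label are made up for illustration. Passing axis=1 hands each row to the function, so the function can combine several columns.

```python
import pandas as pd

toy = pd.DataFrame({'country': ['Colombia', 'Chile'], 'noise': [0.5, -1.0]})

# axis=1 passes each row (as a Series) to the function,
# so we can use more than one column at a time
toy['label'] = toy.apply(lambda row: f"{row['country']}: {row['noise']:+.1f}", axis=1)
print(toy)
```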

Let's see some other ways in which we can interact with dataframes. First, let's select some observations, e.g., all countries in South America.

In [7]:
# Let's create a list of South American countries
south_america = ['Colombia', 'Chile']
# Select the rows for South American countries
df.loc[df.country.apply(lambda x: x in south_america)]
Out[7]:
country noise noise_sq noise and its square name length
0 Colombia 0.469112 0.220066 0.689179 8
4 Chile 1.212112 1.469216 2.681328 5

Now let's use this to create a dummy indicating whether a country belongs to South America. To understand what is going on let's show the result of the condition for selecting rows.

In [8]:
df.country.apply(lambda x: x in south_america)
Out[8]:
0     True
1    False
2    False
3    False
4     True
Name: country, dtype: bool

So in the previous selection of rows we told pandas which rows to include by passing a series of booleans (True, False). We can use this result to create the dummy; we only need to convert the output to int.

In [9]:
df['South America'] = df.country.apply(lambda x: x in south_america).astype(int)
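
The same boolean series can also be built with the isin method, which checks membership in a list without a lambda. The sketch below uses a small stand-in dataframe (toy) rather than the df from above.

```python
import pandas as pd

toy = pd.DataFrame({'country': ['Colombia', 'Turkey', 'USA', 'Germany', 'Chile']})
south_america = ['Colombia', 'Chile']

# isin returns True for rows whose value appears in the list
in_sa = toy['country'].isin(south_america)

# Use the boolean series to select rows...
print(toy.loc[in_sa])

# ...or convert it to a 0/1 dummy
toy['South America'] = in_sa.astype(int)
```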

Now, let's plot the various series in the dataframe

In [10]:
df.plot()
Out[10]:
<Axes: >
[Figure: line plot of the dataframe's columns against the row number]

Not very nice or useful. Notice that it assigned the row number to the x-axis labels. Let's change the row labels, which are contained in the dataframe's index, by assigning the country names as the index.

In [11]:
df = df.set_index('country')
print(df)
df.plot()
             noise  noise_sq noise and its square  name length  South America
country                                                                      
Colombia  0.469112  0.220066             0.689179            8              1
Turkey   -0.282863  0.080012            -0.202852            6              0
USA      -1.509059  2.277258             0.768199            3              0
Germany  -1.135632  1.289661             0.154029            7              0
Chile     1.212112  1.469216             2.681328            5              1
Out[11]:
<Axes: xlabel='country'>
[Figure: line plot of the dataframe's columns with country on the x-axis]

Better, but still not very informative. Below we will improve on this when we work with some real data.

Notice that by using the set_index function we have made the country names the index. This can be useful for selecting data. E.g., if we want to see only the row for Colombia we can use loc:

In [12]:
df.loc['Colombia']
Out[12]:
noise                   0.469112
noise_sq                0.220066
noise and its square    0.689179
name length                    8
South America                  1
Name: Colombia, dtype: object

Getting data¶

One of the nice features of pandas and its ecosystem is that it makes obtaining data very easy. To illustrate this, and also to revisit some of the basic facts of comparative development, let's download some data from various sources. This may require you to create accounts in order to access and download the data. Sometimes the process is very simple and does not require an actual project; in other cases you need to propose a project and be approved, usually due to privacy concerns with micro-data. Don't be afraid: all these sources are free and are used a lot in research, so it is good that you learn to use them. Let's start with a list of useful sources.

Country-level economic data¶

  • World Bank provides all kinds of socio-economic data.
  • Penn World Tables is a database with information on relative levels of income, output, input and productivity, covering 182 countries between 1950 and 2017.
  • Maddison Historical Data provides the most used historical statistics on population and GDP.
  • The Maddison Project Database, the follow-up to Maddison, provides information on comparative economic growth and income levels over the very long run.
  • Comparative Historical National Accounts provides information on Gross Domestic Product, including an industry breakdown, for the 19th and 20th centuries.
  • Human Mortality Database provides detailed mortality and population data for the world for the last two centuries.

Censuses, Surveys, and other micro-level data¶

  • IPUMS: provides census and survey data from around the world integrated across time and space.
  • General Social Survey provides survey data on what Americans think and feel about such issues as national spending priorities, crime and punishment, intergroup relations, and confidence in institutions.
  • European Social Survey provides survey measures on the attitudes, beliefs and behaviour patterns of diverse European populations in more than thirty nations.
  • UK Data Service is the UK’s largest collection of social, economic and population data resources.
  • SHRUG is the Socioeconomic High-resolution Rural-Urban Geographic Platform for India. It provides access to dozens of datasets covering India’s 500,000 villages and 8000 towns using a set of common geographic identifiers that span 25 years.

Divergence - Big time¶

To study the divergence across countries let's download and plot the historical GDP and population data. To keep the data and avoid downloading it from scratch every time, we'll create a folder ./data in the current directory and save each file there. Also, we'll make sure that if the data does not exist, we download it. We'll use the os package to create directories.

Setting up paths¶

In [13]:
import os

pathout = './data/'

if not os.path.exists(pathout):
    os.mkdir(pathout)
    
pathgraphs = './graphs/'
if not os.path.exists(pathgraphs):
    os.mkdir(pathgraphs)
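
A slightly more compact variant, shown here only as a sketch, is os.makedirs with exist_ok=True, which creates the folder if needed and does nothing if it already exists. To avoid touching ./data, the example works inside a throwaway temporary directory (base is an illustrative name).

```python
import os
import tempfile

# Throwaway base directory so this sketch leaves ./data and ./graphs untouched
base = tempfile.mkdtemp()
pathout = os.path.join(base, 'data')

# Creates the directory (and any missing parents); no error if it already exists
os.makedirs(pathout, exist_ok=True)
os.makedirs(pathout, exist_ok=True)  # calling it again is harmless
```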

Download New Maddison Project Data¶

In [14]:
try:
    # Use the local copies if they were saved on a previous run
    maddison_new = pd.read_stata(pathout + 'Maddison2020.dta')
    maddison_new_region = pd.read_stata(pathout + 'Maddison2018_region.dta')
    maddison_new_1990 = pd.read_stata(pathout + 'Maddison2018_1990.dta')
except FileNotFoundError:
    # Otherwise download each file and save a local copy for next time
    maddison_new = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2020.dta')
    maddison_new.to_stata(pathout + 'Maddison2020.dta', write_index=False, version=117)
    maddison_new_region = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_region_data.dta')
    maddison_new_region.to_stata(pathout + 'Maddison2018_region.dta', write_index=False, version=117)
    maddison_new_1990 = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_1990bm.dta')
    maddison_new_1990.to_stata(pathout + 'Maddison2018_1990.dta', write_index=False, version=117)
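
The "read the local copy, otherwise download and save" pattern can be wrapped in a small helper. The function load_or_fetch below is a hypothetical sketch, not part of pandas; it takes the local path and a function that produces the dataframe when no local copy exists. To keep the example runnable without internet access, a made-up fake_fetch stands in for the download.

```python
import os
import tempfile

import pandas as pd


def load_or_fetch(local_path, fetch):
    """Read a Stata file from local_path if it exists; otherwise call fetch() and cache the result."""
    if os.path.exists(local_path):
        return pd.read_stata(local_path)
    df = fetch()
    df.to_stata(local_path, write_index=False, version=117)
    return df


# Stand-in for a download: records each call and returns toy data
calls = []


def fake_fetch():
    calls.append(1)
    return pd.DataFrame({'year': [1820.0, 1870.0], 'gdppc': [1000.0, 1500.0]})


cache_file = os.path.join(tempfile.mkdtemp(), 'toy.dta')
first = load_or_fetch(cache_file, fake_fetch)   # no local copy yet: fetches and saves
second = load_or_fetch(cache_file, fake_fetch)  # local copy exists: reads from disk
print(len(calls))
```

With real data, fetch could be something like lambda: pd.read_stata(url), where url is one of the addresses used above.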
In [15]:
maddison_new
Out[15]:
countrycode country year gdppc pop
0 AFG Afghanistan 1820 NaN 3280.00000
1 AFG Afghanistan 1870 NaN 4207.00000
2 AFG Afghanistan 1913 NaN 5730.00000
3 AFG Afghanistan 1950 1156.0000 8150.00000
4 AFG Afghanistan 1951 1170.0000 8284.00000
... ... ... ... ... ...
21677 ZWE Zimbabwe 2014 1594.0000 13313.99205
21678 ZWE Zimbabwe 2015 1560.0000 13479.13812
21679 ZWE Zimbabwe 2016 1534.0000 13664.79457
21680 ZWE Zimbabwe 2017 1582.3662 13870.26413
21681 ZWE Zimbabwe 2018 1611.4052 14096.61179

21682 rows × 5 columns

This dataset is in long format. Also, notice that the year is not an integer. Let's correct this.

In [16]:
maddison_new['year'] = maddison_new.year.astype(int)
maddison_new
Out[16]:
countrycode country year gdppc pop
0 AFG Afghanistan 1820 NaN 3280.00000
1 AFG Afghanistan 1870 NaN 4207.00000
2 AFG Afghanistan 1913 NaN 5730.00000
3 AFG Afghanistan 1950 1156.0000 8150.00000
4 AFG Afghanistan 1951 1170.0000 8284.00000
... ... ... ... ... ...
21677 ZWE Zimbabwe 2014 1594.0000 13313.99205
21678 ZWE Zimbabwe 2015 1560.0000 13479.13812
21679 ZWE Zimbabwe 2016 1534.0000 13664.79457
21680 ZWE Zimbabwe 2017 1582.3662 13870.26413
21681 ZWE Zimbabwe 2018 1611.4052 14096.61179

21682 rows × 5 columns

Original Maddison Data¶

Now, let's download, save and read the original Maddison database. Since the original file is an Excel file with different data on each sheet, it will require us to use a different method to get all the data.

In [17]:
if not os.path.exists(pathout + 'Maddison_original.xlsx'):
    # Import the request submodule explicitly; "import urllib" alone does not guarantee it is loaded
    import urllib.request
    dataurl = "https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/md2010_horizontal.xlsx"
    urllib.request.urlretrieve(dataurl, pathout + 'Maddison_original.xlsx')

Some data munging¶

This dataset is not very nicely structured for importing, as you can see if you open it in Excel. I suggest you do so, so that you can better see what is going on. Notice that the first two rows really have no data. Also, every second column is empty. Moreover, there are a few empty rows. Let's import the data and clean it so we can plot and analyse it better.

In [18]:
maddison_old_pop = pd.read_excel(pathout + 'Maddison_original.xlsx', sheet_name="Population", skiprows=2)
maddison_old_pop
Out[18]:
Unnamed: 0 1 Unnamed: 2 1000 Unnamed: 4 1500 Unnamed: 6 1600 Unnamed: 8 1700 ... 2002 2003 2004 2005 2006 2007 2008 2009 Unnamed: 201 2030
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 500.0 NaN 700.0 NaN 2000.0 NaN 2500.0 NaN 2500.0 ... 8148.312 8162.656 8174.762 8184.691 8192.880 8199.783 8205.533 8210 NaN 8120.000
2 Belgium 300.0 NaN 400.0 NaN 1400.0 NaN 1600.0 NaN 2000.0 ... 10311.970 10330.824 10348.276 10364.388 10379.067 10392.226 10403.951 10414 NaN 10409.000
3 Denmark 180.0 NaN 360.0 NaN 600.0 NaN 650.0 NaN 700.0 ... 5374.693 5394.138 5413.392 5432.335 5450.661 5468.120 5484.723 5501 NaN 5730.488
4 Finland 20.0 NaN 40.0 NaN 300.0 NaN 400.0 NaN 400.0 ... 5193.039 5204.405 5214.512 5223.442 5231.372 5238.460 5244.749 5250 NaN 5201.445
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 Guadeloupe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 435.739 440.189 444.515 448.713 452.776 456.698 460.486 n.a. NaN 523.493
274 Guyana (Fr.) NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 182.333 186.917 191.309 195.506 199.509 203.321 206.941 n.a. NaN 272.781
275 Martinique NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 422.277 425.966 429.510 432.900 436.131 439.202 442.119 n.a. NaN 486.714
276 Reunion NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 743.981 755.171 766.153 776.948 787.584 798.094 808.506 n.a. NaN 1025.217
277 Total NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1784.330 1808.243 1831.487 1854.067 1876.000 1897.315 1918.052 n.a. NaN 2308.205

278 rows × 203 columns

In [22]:
maddison_old_gdppc = pd.read_excel(pathout + 'Maddison_original.xlsx', sheet_name="PerCapita GDP", skiprows=2)
maddison_old_gdppc
Out[22]:
Unnamed: 0 1 Unnamed: 2 1000 Unnamed: 4 1500 Unnamed: 6 1600 Unnamed: 8 1700 ... 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 425.000000 NaN 425.000000 NaN 707 NaN 837.200000 NaN 993.200000 ... 20065.093878 20691.415561 20812.893753 20955.874051 21165.047259 21626.929322 22140.725899 22892.682427 23674.041130 24130.547035
2 Belgium 450.000000 NaN 425.000000 NaN 875 NaN 975.625000 NaN 1144.000000 ... 19964.428266 20656.458570 20761.238278 21032.935511 21205.859281 21801.602508 22246.561977 22881.632810 23446.949672 23654.763464
3 Denmark 400.000000 NaN 400.000000 NaN 738.333333 NaN 875.384615 NaN 1038.571429 ... 22254.890572 22975.162513 23059.374968 23082.620719 23088.582457 23492.664119 23972.564284 24680.492880 24995.245167 24620.568805
4 Finland 400.000000 NaN 400.000000 NaN 453.333333 NaN 537.500000 NaN 637.500000 ... 18855.985066 19770.363126 20245.896529 20521.702225 20845.802738 21574.406196 22140.573208 23190.283543 24131.519569 24343.586318
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Total Africa 472.352941 NaN 424.767802 NaN 413.709504 NaN 422.071584 NaN 420.628684 ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474
191 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
192 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
193 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
194 World Average 466.752281 NaN 453.402162 NaN 566.389464 NaN 595.783856 NaN 614.853602 ... 5833.255492 6037.675887 6131.705471 6261.734267 6469.119575 6738.281333 6960.031035 7238.383483 7467.648232 7613.922924

195 rows × 200 columns

Let's start by renaming the first column, which has the region/country names

In [23]:
maddison_old_pop.rename(columns={'Unnamed: 0':'Country'}, inplace=True)
maddison_old_gdppc.rename(columns={'Unnamed: 0':'Country'}, inplace=True)

Now let's drop all the columns that do not have data

In [24]:
maddison_old_pop = maddison_old_pop[[col for col in maddison_old_pop.columns if not str(col).startswith('Unnamed')]]
maddison_old_gdppc = maddison_old_gdppc[[col for col in maddison_old_gdppc.columns if not str(col).startswith('Unnamed')]]

Now, let's change the name of the columns so they reflect the underlying variable

In [25]:
maddison_old_pop.columns = ['Country'] + ['pop_'+str(col) for col in maddison_old_pop.columns[1:]]
maddison_old_gdppc.columns = ['Country'] + ['gdppc_'+str(col) for col in maddison_old_gdppc.columns[1:]]
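
To see the three cleaning steps (rename, drop, relabel) in isolation, here is a sketch on a tiny made-up table that mimics the layout of the Excel sheet: a first unnamed column with names, year columns with data, and empty unnamed columns in between. The dataframe raw and its values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy table shaped like the imported sheet (values are illustrative)
raw = pd.DataFrame({
    'Unnamed: 0': ['Austria', 'Belgium'],
    1: [500.0, 300.0],
    'Unnamed: 2': [np.nan, np.nan],
    1000: [700.0, 400.0],
})

# 1. Rename the first column
raw = raw.rename(columns={'Unnamed: 0': 'Country'})

# 2. Keep only the columns whose label does not start with 'Unnamed'
keep = [col for col in raw.columns if not str(col).startswith('Unnamed')]
clean = raw[keep].copy()

# 3. Prefix the year columns with the variable name
clean.columns = ['Country'] + ['pop_' + str(col) for col in clean.columns[1:]]
print(clean)
```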
In [26]:
maddison_old_pop
Out[26]:
Country pop_1 pop_1000 pop_1500 pop_1600 pop_1700 pop_1820 pop_1821 pop_1822 pop_1823 ... pop_2001 pop_2002 pop_2003 pop_2004 pop_2005 pop_2006 pop_2007 pop_2008 pop_2009 pop_2030
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 500.0 700.0 2000.0 2500.0 2500.0 3369.0 3386.0 3402.0 3419.0 ... 8131.690 8148.312 8162.656 8174.762 8184.691 8192.880 8199.783 8205.533 8210 8120.000
2 Belgium 300.0 400.0 1400.0 1600.0 2000.0 3434.0 3464.0 3495.0 3526.0 ... 10291.679 10311.970 10330.824 10348.276 10364.388 10379.067 10392.226 10403.951 10414 10409.000
3 Denmark 180.0 360.0 600.0 650.0 700.0 1155.0 1167.0 1179.0 1196.0 ... 5355.826 5374.693 5394.138 5413.392 5432.335 5450.661 5468.120 5484.723 5501 5730.488
4 Finland 20.0 40.0 300.0 400.0 400.0 1169.0 1186.0 1202.0 1219.0 ... 5180.309 5193.039 5204.405 5214.512 5223.442 5231.372 5238.460 5244.749 5250 5201.445
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 Guadeloupe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 431.170 435.739 440.189 444.515 448.713 452.776 456.698 460.486 n.a. 523.493
274 Guyana (Fr.) NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 177.562 182.333 186.917 191.309 195.506 199.509 203.321 206.941 n.a. 272.781
275 Martinique NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 418.454 422.277 425.966 429.510 432.900 436.131 439.202 442.119 n.a. 486.714
276 Reunion NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 732.570 743.981 755.171 766.153 776.948 787.584 798.094 808.506 n.a. 1025.217
277 Total NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1759.756 1784.330 1808.243 1831.487 1854.067 1876.000 1897.315 1918.052 n.a. 2308.205

278 rows × 197 columns

In [27]:
maddison_old_gdppc
Out[27]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1821 gdppc_1822 gdppc_1823 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 425.000000 425.000000 707 837.200000 993.200000 1218.165628 NaN NaN NaN ... 20065.093878 20691.415561 20812.893753 20955.874051 21165.047259 21626.929322 22140.725899 22892.682427 23674.041130 24130.547035
2 Belgium 450.000000 425.000000 875 975.625000 1144.000000 1318.870122 NaN NaN NaN ... 19964.428266 20656.458570 20761.238278 21032.935511 21205.859281 21801.602508 22246.561977 22881.632810 23446.949672 23654.763464
3 Denmark 400.000000 400.000000 738.333333 875.384615 1038.571429 1273.593074 1320.479863 1326.547922 1307.692308 ... 22254.890572 22975.162513 23059.374968 23082.620719 23088.582457 23492.664119 23972.564284 24680.492880 24995.245167 24620.568805
4 Finland 400.000000 400.000000 453.333333 537.500000 637.500000 781.009410 NaN NaN NaN ... 18855.985066 19770.363126 20245.896529 20521.702225 20845.802738 21574.406196 22140.573208 23190.283543 24131.519569 24343.586318
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Total Africa 472.352941 424.767802 413.709504 422.071584 420.628684 419.755914 NaN NaN NaN ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474
191 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
192 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
193 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
194 World Average 466.752281 453.402162 566.389464 595.783856 614.853602 665.735330 NaN NaN NaN ... 5833.255492 6037.675887 6131.705471 6261.734267 6469.119575 6738.281333 6960.031035 7238.383483 7467.648232 7613.922924

195 rows × 195 columns

Let's choose the rows that hold the aggregates by region for the main regions of the world.

In [28]:
gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.apply(lambda x: str(x).upper().find('TOTAL')!=-1)].reset_index(drop=True)
gdppc = gdppc.dropna(subset=['gdppc_1'])
gdppc = gdppc.loc[2:]
gdppc['Country'] = gdppc.Country.str.replace('Total', '').str.replace('Countries', '').str.replace('\d+', '').str.replace('European', 'Europe').str.strip()
gdppc = gdppc.loc[gdppc.Country.apply(lambda x: x.find('USSR')==-1 and  x.find('West Asian')==-1)].reset_index(drop=True)
gdppc
Out[28]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1821 gdppc_1822 gdppc_1823 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 30 Western Europe 576.167665 427.425665 771.093805 887.906964 993.456911 1194.184683 NaN NaN NaN ... 18497.208533 19176.001655 19463.863297 19627.707522 19801.145425 20199.220700 20522.238008 21087.304789 21589.011346 21671.774225
1 Western Offshoots 400.000000 400.000000 400 400.000000 476.000000 1201.993477 NaN NaN NaN ... 26680.580823 27393.808035 27387.312035 27648.644070 28090.274362 28807.845958 29415.399334 29922.741918 30344.425293 30151.805880
2 7 East Europe 411.789474 400.000000 496 548.023599 606.010638 683.160984 NaN NaN NaN ... 5734.162109 5970.165085 6143.112873 6321.395376 6573.365882 6942.136596 7261.721015 7730.097570 8192.881904 8568.967581
3 Latin America 400.000000 400.000000 416.457143 437.558140 526.639004 691.060678 NaN NaN NaN ... 5765.585093 5889.237351 5846.295193 5746.609672 5785.841237 6063.068969 6265.525702 6530.533583 6783.869986 6973.134656
4 Asia 455.671021 469.961665 568.4179 573.550859 571.605276 580.626115 NaN NaN NaN ... 3623.902724 3797.608955 3927.186275 4121.275511 4388.982705 4661.517477 4900.563281 5187.253152 5408.383588 5611.198564
5 Africa 472.352941 424.767802 413.709504 422.071584 420.628684 419.755914 NaN NaN NaN ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474

6 rows × 195 columns
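
Notice that two of the region labels in the output above still carry numeric prefixes (30 Western Europe, 7 East Europe). One likely reason, assuming a recent pandas version (2.0 or later), is that str.replace treats its pattern as a literal string unless regex=True is passed, so the pattern '\d+' never matches. The sketch below shows the difference on a few made-up labels; the strings are illustrative and not copied from the spreadsheet.

```python
import pandas as pd

# Illustrative labels, assumed to resemble the aggregate rows in the sheet
labels = pd.Series(['Total 30 Western Europe', 'Total 7 East European Countries', 'Total Africa'])

# Literal matching: looks for the characters \d+ and finds nothing to replace
literal = labels.str.replace(r'\d+', '', regex=False)

# Regular-expression matching: removes runs of digits
cleaned = (
    labels
    .str.replace('Total', '', regex=False)
    .str.replace('Countries', '', regex=False)
    .str.replace(r'\d+', '', regex=True)
    .str.replace('European', 'Europe', regex=False)
    .str.strip()
)
print(cleaned.tolist())
```

If you apply this to the real data, the region names in the tables and figure legends below would appear without the numeric prefixes.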

Let's drop the columns (years) that have missing values for any region

In [29]:
gdppc = gdppc.dropna(axis=1, how='any')
gdppc
Out[29]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1870 gdppc_1900 gdppc_1913 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 30 Western Europe 576.167665 427.425665 771.093805 887.906964 993.456911 1194.184683 1953.068150 2884.661525 3456.576178 ... 18497.208533 19176.001655 19463.863297 19627.707522 19801.145425 20199.220700 20522.238008 21087.304789 21589.011346 21671.774225
1 Western Offshoots 400.000000 400.000000 400 400.000000 476.000000 1201.993477 2419.152411 4014.870040 5232.816582 ... 26680.580823 27393.808035 27387.312035 27648.644070 28090.274362 28807.845958 29415.399334 29922.741918 30344.425293 30151.805880
2 7 East Europe 411.789474 400.000000 496 548.023599 606.010638 683.160984 936.628265 1437.944586 1694.879668 ... 5734.162109 5970.165085 6143.112873 6321.395376 6573.365882 6942.136596 7261.721015 7730.097570 8192.881904 8568.967581
3 Latin America 400.000000 400.000000 416.457143 437.558140 526.639004 691.060678 676.005331 1113.071149 1494.431922 ... 5765.585093 5889.237351 5846.295193 5746.609672 5785.841237 6063.068969 6265.525702 6530.533583 6783.869986 6973.134656
4 Asia 455.671021 469.961665 568.4179 573.550859 571.605276 580.626115 553.459947 637.615593 695.131881 ... 3623.902724 3797.608955 3927.186275 4121.275511 4388.982705 4661.517477 4900.563281 5187.253152 5408.383588 5611.198564
5 Africa 472.352941 424.767802 413.709504 422.071584 420.628684 419.755914 500.011054 601.236364 637.433138 ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474

6 rows × 70 columns
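
To see what dropna(axis=1, how='any') does, here is a sketch on a toy table with made-up values: any column that contains at least one missing value is removed, so only years observed for every region survive.

```python
import numpy as np
import pandas as pd

wide = pd.DataFrame({
    'Country': ['A', 'B'],
    'gdppc_1820': [1.0, 2.0],
    'gdppc_1821': [np.nan, 2.1],   # missing for region A
    'gdppc_1870': [3.0, 4.0],
})

# axis=1 drops columns (not rows); how='any' drops a column if any value is missing
complete = wide.dropna(axis=1, how='any')
print(complete.columns.tolist())
```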

Let's convert from wide to long format

In [30]:
gdppc = pd.wide_to_long(gdppc, ['gdppc_'], i='Country', j='year').reset_index()
gdppc
Out[30]:
Country year gdppc_
0 30 Western Europe 1 576.167665
1 Western Offshoots 1 400.0
2 7 East Europe 1 411.789474
3 Latin America 1 400.0
4 Asia 1 455.671021
... ... ... ...
409 Western Offshoots 2008 30151.80588
410 7 East Europe 2008 8568.967581
411 Latin America 2008 6973.134656
412 Asia 2008 5611.198564
413 Africa 2008 1780.265474

414 rows × 3 columns
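
Here is a small sketch of the same reshaping on toy data with made-up values. It passes the stub without the underscore and gives sep='_' instead, which is an alternative to the call above and yields a value column named gdppc rather than gdppc_.

```python
import pandas as pd

wide = pd.DataFrame({
    'Country': ['A', 'B'],
    'gdppc_1820': [1.0, 2.0],
    'gdppc_1870': [3.0, 4.0],
})

# Columns named <stub><sep><suffix> are stacked; the suffix becomes the new column 'year'
long = pd.wide_to_long(wide, stubnames='gdppc', i='Country', j='year', sep='_').reset_index()
print(long)
```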

Plotting¶

We can now plot the data. Let's try two different ways. The first uses the plot function from pandas. The second uses the package seaborn, which improves on the capabilities of matplotlib. The main difference is how the data needs to be organized. Of course, these are not the only ways to plot and we can try others.

In [31]:
import matplotlib as mpl
import seaborn as sns
# Setup seaborn
sns.set()

Let's pivot the table so that each region is a column and each row is a year. This will allow us to plot using the plot function of the pandas DataFrame.

In [32]:
gdppc2 = gdppc.pivot_table(index='year',columns='Country',values='gdppc_',aggfunc='sum')
gdppc2
Out[32]:
Country 30 Western Europe 7 East Europe Africa Asia Latin America Western Offshoots
year
1 576.167665 411.789474 472.352941 455.671021 400.0 400.0
1000 427.425665 400.0 424.767802 469.961665 400.0 400.0
1500 771.093805 496 413.709504 568.4179 416.457143 400
1600 887.906964 548.023599 422.071584 573.550859 437.55814 400.0
1700 993.456911 606.010638 420.628684 571.605276 526.639004 476.0
... ... ... ... ... ... ...
2004 20199.2207 6942.136596 1558.099461 4661.517477 6063.068969 28807.845958
2005 20522.238008 7261.721015 1603.686517 4900.563281 6265.525702 29415.399334
2006 21087.304789 7730.09757 1663.531318 5187.253152 6530.533583 29922.741918
2007 21589.011346 8192.881904 1724.226776 5408.383588 6783.869986 30344.425293
2008 21671.774225 8568.967581 1780.265474 5611.198564 6973.134656 30151.80588

69 rows × 6 columns
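
The pivot is the reverse of the wide-to-long step. Here is a sketch on toy data with made-up values and illustrative names (long, wide). Since each year and region pair appears only once, aggfunc='sum' simply returns that single value.

```python
import pandas as pd

long = pd.DataFrame({
    'Country': ['A', 'B', 'A', 'B'],
    'year': [1820, 1820, 1870, 1870],
    'gdppc': [1.0, 2.0, 3.0, 4.0],
})

# One row per year, one column per region
wide = long.pivot_table(index='year', columns='Country', values='gdppc', aggfunc='sum')
print(wide)
```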

Ok. Let's plot using the pandas plot function.

In [33]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

# Set the size of the figure and get a figure and axis object
fig, ax = plt.subplots(figsize=(30,20))
# Plot using the axis ax and colormap my_cmap
gdppc2.loc[1800:].plot(ax=ax, linewidth=8, cmap=my_cmap)
# Change options of axes, legend
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(prop={'size': 40}).set_title("Region", prop = {'size':40})
# Label axes
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)
Out[33]:
Text(0, 0.5, "GDP per capita (1990 Int'l US$)")
[Figure: GDP per capita (1990 Int'l US$) by region since 1800, pandas line plot]
In [34]:
fig
Out[34]:
[Figure: the figure from the previous cell, displayed again]

Now, let's use seaborn

In [35]:
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)
# Plot
fig, ax = plt.subplots(figsize=(30,20))
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[gdppc.year>=1800].reset_index(drop=True), alpha=1, lw=8, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=False)
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)
Out[35]:
Text(0, 0.5, "GDP per capita (1990 Int'l US$)")
[Figure: GDP per capita (1990 Int'l US$) by region since 1800, seaborn line plot]
In [36]:
fig
Out[36]:
[Figure: the figure from the previous cell, displayed again]

Nice! Basically the same plot. But we can do better! Let's use seaborn again, but this time use different markers for each region, and let's use only a subset of the data so that it looks better. Also, let's export the figure so we can use it in our slides.

In [37]:
# Create category for hue
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1800) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1820-2010.pdf', dpi=300, bbox_inches='tight')
[Figure: GDP per capita (1990 Int'l US$) by region since 1800, seaborn line plot with a marker per region; saved as y1820-2010.pdf]
In [38]:
fig
Out[38]:
[Figure: the figure from the previous cell, displayed again]
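
The long list of years excluded in the plotting call is every year from 1951 to 2007 that is not a multiple of 10. As a sketch, the same list can be generated and combined with isin instead of apply; skip_years and the toy series years are illustrative names and are not used elsewhere in this notebook.

```python
import pandas as pd

# Every year from 1951 to 2007 except the round decades (1960, 1970, ..., 2000)
skip_years = [y for y in range(1951, 2008) if y % 10 != 0]

# Toy series of years to show which observations would be kept
years = pd.Series([1820, 1950, 1951, 1960, 1999, 2000, 2007, 2008], name='year')

# Keep years from 1800 on that are not in the skip list
keep = (years >= 1800) & ~years.isin(skip_years)
kept_years = years[keep].tolist()
print(kept_years)
```

With the real data, the selection would read gdppc.loc[(gdppc.year >= 1800) & ~gdppc.year.isin(skip_years)].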

Let's create the same plot using the updated data from the Maddison Project. Here we have fewer years, but the picture is similar.

In [39]:
maddison_new_region['Region'] = maddison_new_region.region_name

mycolors2 = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71", "orange", "b"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year.apply(lambda x: x in [1870, 1890, 1913, 1929,1950, 2016])) | ((maddison_new_region.year>1950) & (maddison_new_region.year.apply(lambda x: np.mod(x,10)==0)))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$)')
plt.savefig(pathgraphs + 'y1870-2016.pdf', dpi=300, bbox_inches='tight')
[Figure: GDP per capita (2011 Int'l US$) by region, 1870-2016]
In [40]:
fig
Out[40]:
[Figure: GDP per capita (2011 Int'l US$) by region, 1870-2016]

Let's show the same evolution starting from earlier years: 1700, 1500, 1000, and 1.
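
The following cells differ only in the starting year of the sample. As a sketch of how that selection could be wrapped in a function, here is a hypothetical helper `select_from` (not part of the notebook) applied to made-up data:

```python
import pandas as pd

# Years between 1951 and 2007 that are not multiples of 10
SKIP_YEARS = [y for y in range(1951, 2008) if y % 10 != 0]

def select_from(df, start_year, skip_years=SKIP_YEARS):
    """Hypothetical helper: rows with year >= start_year, excluding skip_years."""
    mask = (df.year >= start_year) & ~df.year.isin(skip_years)
    return df.loc[mask].reset_index(drop=True)

# Toy data with made-up values to show the effect of different starting years
toy = pd.DataFrame({'year': [1, 1000, 1500, 1700, 1820, 1950, 1955, 1960, 2008]})
toy['gdppc_'] = [float(i) for i in range(len(toy))]

for start in [1700, 1500, 1000]:
    print(start, select_from(toy, start).year.tolist())
```

With a helper like this, each plot would only need to change the `start_year` argument and the output file name.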

In [41]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
# Drop the years 1951-2007 that are not multiples of 10, keeping one observation per decade
skip_years = [y for y in range(1951, 2008) if y % 10 != 0]
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1700) & ~gdppc.year.isin(skip_years)].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'take-off-1700-2010.pdf', dpi=300, bbox_inches='tight')
[Figure: GDP per capita (1990 Int'l US$) by region, from 1700]
In [42]:
fig
Out[42]:
[Figure: GDP per capita (1990 Int'l US$) by region, from 1700]
In [43]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
# Drop the years 1951-2007 that are not multiples of 10, keeping one observation per decade
skip_years = [y for y in range(1951, 2008) if y % 10 != 0]
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1500) & ~gdppc.year.isin(skip_years)].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1500-2010.pdf', dpi=300, bbox_inches='tight')
[Figure: GDP per capita (1990 Int'l US$) by region, from 1500]
In [44]:
fig
Out[44]:
[Figure: GDP per capita (1990 Int'l US$) by region, from 1500]
In [45]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
# Drop the years 1951-2007 that are not multiples of 10, keeping one observation per decade
skip_years = [y for y in range(1951, 2008) if y % 10 != 0]
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1000) & ~gdppc.year.isin(skip_years)].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1000-2010.pdf', dpi=300, bbox_inches='tight')
[Figure: GDP per capita (1990 Int'l US$) by region, from 1000]
In [46]:
fig
Out[46]:
[Figure: GDP per capita (1990 Int'l US$) by region, from 1000]
In [47]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
# Drop the years 1951-2007 that are not multiples of 10, keeping one observation per decade
skip_years = [y for y in range(1951, 2008) if y % 10 != 0]
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=0) & ~gdppc.year.isin(skip_years)].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1-2010.pdf', dpi=300, bbox_inches='tight')
[Figure: GDP per capita (1990 Int'l US$) by region, from year 1]
In [48]:
fig
Out[48]:
[Figure: GDP per capita (1990 Int'l US$) by region, from year 1]

Let's plot the evolution of GDP per capita for the whole world.
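
The next cell uses `pd.wide_to_long` to turn one column per year into one row per year. Here is a minimal sketch of what that reshape does, on a made-up wide table (the countries `A`, `B` and all values are hypothetical):

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year (made-up values)
wide = pd.DataFrame({
    'Country': ['A', 'B'],
    'gdppc_1': [400.0, 450.0],
    'gdppc_1000': [420.0, 430.0],
    'gdppc_1500': [500.0, 600.0],
})

# Columns named 'gdppc_<year>' become rows; the numeric suffix goes into the new 'year' column
long_df = pd.wide_to_long(wide, ['gdppc_'], i='Country', j='year').reset_index()
print(long_df)
```

The stub name `'gdppc_'` is what remains as the value column, which is why the notebook plots `y='gdppc_'`.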

In [49]:
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = world_gdppc.Country.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
# Drop the years 1951-2007 that are not multiples of 10, keeping one observation per decade
skip_years = [y for y in range(1951, 2008) if y % 10 != 0]
sns.lineplot(x='year', y='gdppc_', hue='Region', data=world_gdppc.loc[(world_gdppc.year>=0) & ~world_gdppc.year.isin(skip_years)].reset_index(drop=True), alpha=1, style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'W-y1-2010.pdf', dpi=300, bbox_inches='tight')
[Figure: World GDP per capita (1990 Int'l US$), from year 1]
In [50]:
fig
Out[50]:
[Figure: World GDP per capita (1990 Int'l US$), from year 1]

Let's plot $\log(GDPpc)$ during the modern era, when we have sustained economic growth.
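
Why logs? If income grows at a constant rate $g$, then $y_t = y_0(1+g)^t$ and $\log y_t = \log y_0 + t\log(1+g)$, which is a straight line in $t$ with slope $\log(1+g) \approx g$. A small sketch with made-up numbers (the 2% rate and the starting level of 1000 are assumptions for illustration only):

```python
import numpy as np

g = 0.02                       # assumed constant annual growth rate (illustrative)
years = np.arange(0, 51)
y = 1000 * (1 + g) ** years    # made-up starting level of 1000

log_y = np.log(y)
slopes = np.diff(log_y)        # year-to-year change in log(y)

# Every slope equals log(1 + g), which is close to g when g is small
print(slopes.min(), slopes.max(), np.log(1 + g))
```

So in a plot of log income, steeper lines mean faster growth, and straight lines mean a constant growth rate.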

In [51]:
gdppc['lgdppc'] = np.log(gdppc.gdppc_)

# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='lgdppc', hue='Region', data=gdppc.loc[(gdppc.year>=1950)].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
# Log values span only a few units, so show one decimal to keep tick labels distinct
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1f}'))
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('Log[GDP per capita (1990 Int\'l US$)]')
plt.savefig(pathgraphs + 'sg1950-2000.pdf', dpi=300, bbox_inches='tight')
[Figure: Log GDP per capita (1990 Int'l US$) by region, from 1950]
In [52]:
fig
Out[52]:
[Figure: Log GDP per capita (1990 Int'l US$) by region, from 1950]
In [53]:
mycolors2 = ["#34495e", "#2ecc71"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year>=1870) & (maddison_new_region.region.apply(lambda x: x in ['we', 'wo']))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=['D', '^'],)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.set_yscale('log')
ax.set_yticks([500, 5000, 50000])
# Set the formatter after switching to a log scale, because set_yscale resets it
ax.yaxis.set_major_formatter(mpl.ticker.ScalarFormatter())
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$, log-scale)')
plt.savefig(pathgraphs + 'sg1870-2000.pdf', dpi=300, bbox_inches='tight')
[Figure: GDP per capita (log scale) for regions 'we' and 'wo', from 1870]

Growth Rates¶

Let's select a subsample of years between 1 CE and 2008 and compute the average annual growth rate of world income per capita between them. We will select the sample of years we want using the loc operator and then use the shift operator to get data from the previous observation.
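
For two observations that are $n$ years apart, the average annual growth rate is $g = (y_t / y_{t-n})^{1/n} - 1$. Here is a minimal sketch of how `shift` gives access to the previous observation, using made-up data (the column names `y` and `n_years` are chosen for this illustration):

```python
import pandas as pd

# Made-up levels observed at irregular intervals: 2% growth for 10 years, then flat
level = 100 * 1.02 ** 10
df = pd.DataFrame({'year': [1800, 1810, 1830], 'y': [100.0, level, level]})

# shift(1) moves a column down one row, so each row sees the previous observation
df['n_years'] = df['year'] - df['year'].shift(1)
df['growth'] = (df['y'] / df['y'].shift(1)) ** (1 / df['n_years']) - 1
print(df)
```

The first row has no previous observation, so its growth rate is `NaN`.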

In [54]:
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 2008]).astype(int)
world_gdppc
Out[54]:
           Country  year       gdppc_         Region  mysample
0    World Average     1   466.752281  World Average         1
1    World Average  1000   453.402162  World Average         1
2    World Average  1500   566.389464  World Average         1
3    World Average  1600   595.783856  World Average         0
4    World Average  1700   614.853602  World Average         0
...            ...   ...          ...            ...       ...
189  World Average  2004  6738.281333  World Average         0
190  World Average  2005  6960.031035  World Average         0
191  World Average  2006  7238.383483  World Average         0
192  World Average  2007  7467.648232  World Average         0
193  World Average  2008  7613.922924  World Average         1

69 rows × 5 columns

In [55]:
maddison_growth = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth['year_prev'] = maddison_growth['year'] - maddison_growth['year'].shift(1)
maddison_growth['growth'] = ((maddison_growth['gdppc_'] / maddison_growth['gdppc_'].shift(1)) ** (1/ maddison_growth.year_prev) -1)
maddison_growth['Period'] = maddison_growth['year'].astype(str).shift(1) + '-' + maddison_growth['year'].astype(str)
maddison_growth    
Out[55]:
         Country  year       gdppc_         Region  mysample  year_prev    growth     Period
0  World Average     1   466.752281  World Average         1        NaN       NaN        NaN
1  World Average  1000   453.402162  World Average         1      999.0 -0.000029     1-1000
2  World Average  1500   566.389464  World Average         1      500.0  0.000445  1000-1500
3  World Average  1820   665.735330  World Average         1      320.0  0.000505  1500-1820
4  World Average  2008  7613.922924  World Average         1      188.0  0.013046  1820-2008
In [56]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
# The first row has no previous observation (growth is NaN), so drop it before plotting
plot_data = maddison_growth.dropna(subset=['growth'])
# One color per bar; assign hue to the x variable so the palette is applied without a deprecation warning
blues = sns.color_palette("Blues", maddison_growth.shape[0]+4)[4:4+plot_data.shape[0]]
sns.barplot(x='Period', y='growth', hue='Period', legend=False, data=plot_data, alpha=1, palette=blues)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
#handles, labels = ax.get_legend_handles_labels()
#ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate of Income per capita')
plt.savefig(pathgraphs + 'W-g1-2010.pdf', dpi=300, bbox_inches='tight')
[Figure: World growth rate of income per capita by period, 1-2008]
In [57]:
fig
Out[57]:
[Figure: World growth rate of income per capita by period, 1-2008]

Growth of population and income (by regions)¶

In [58]:
# Growth rates gdppc
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = 'World'
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)
print(maddison_growth_gdppc)
         Country  year       gdppc_ Region  mysample  year_prev    growth     Period
0  World Average     1   466.752281  World         1        NaN       NaN        NaN
1  World Average  1000   453.402162  World         1      999.0 -0.000029     1-1000
2  World Average  1500   566.389464  World         1      500.0  0.000445  1000-1500
3  World Average  1820   665.735330  World         1      320.0  0.000505  1500-1820
4  World Average  1913  1524.430799  World         1       93.0  0.008948  1820-1913
In [59]:
# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country=='World Total']
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = 'World'
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
print(maddison_growth_pop)    
       Country  year          pop_ Region  mysample  year_prev    growth     Period
0  World Total     1  2.258200e+05  World         1        NaN       NaN        NaN
1  World Total  1000  2.673300e+05  World         1      999.0  0.000169     1-1000
2  World Total  1500  4.384280e+05  World         1      500.0  0.000990  1000-1500
3  World Total  1820  1.041708e+06  World         1      320.0  0.002708  1500-1820
4  World Total  1913  1.792925e+06  World         1       93.0  0.005856  1820-1913
In [60]:
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'GDPpc', 'growth_pop':'Population'})
maddison_growth
Out[60]:
   Region     Period      GDPpc  Population
1   World     1-1000  -0.000029    0.000169
2   World  1000-1500   0.000445    0.000990
3   World  1500-1820   0.000505    0.002708
4   World  1820-1913   0.008948    0.005856
In [61]:
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 
maddison_growth
Out[61]:
   Region     Period           variable     growth
0   World     1-1000  Income per capita  -0.000029
1   World  1000-1500  Income per capita   0.000445
2   World  1500-1820  Income per capita   0.000505
3   World  1820-1913  Income per capita   0.008948
4   World     1-1000         Population   0.000169
5   World  1000-1500         Population   0.000990
6   World  1500-1820         Population   0.002708
7   World  1820-1913         Population   0.005856
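
The merge and `pd.melt` steps above turn two growth tables into the long format that seaborn needs to draw grouped bars. Here is a stripped-down sketch of the same two steps with made-up growth rates (the frames `g_income` and `g_pop` are hypothetical stand-ins for the notebook's tables):

```python
import pandas as pd

# Made-up growth rates for two periods
g_income = pd.DataFrame({'Period': ['1-1000', '1000-1500'], 'growth': [0.001, 0.002]})
g_pop = pd.DataFrame({'Period': ['1-1000', '1000-1500'], 'growth': [0.003, 0.004]})

# Columns with the same name get the suffixes, giving one row per period
wide = g_income.merge(g_pop, on='Period', suffixes=('_gdppc', '_pop'))

# melt stacks the two growth columns into (variable, growth) pairs
long_df = pd.melt(wide, id_vars=['Period'], value_vars=['growth_gdppc', 'growth_pop'],
                  var_name='variable', value_name='growth')
print(long_df)
```

In the long table each row is one bar, and the `variable` column is what gets passed to `hue`.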
In [62]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
# Use only the first two colors of the palette, one per variable
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r")[:2])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles, labels=labels)
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + 'W-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
[Figure: World growth rates of income per capita and population by period, 1-1913]
In [63]:
fig
Out[63]:
[Figure: World growth rates of income per capita and population by period, 1-1913]
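
The cells that follow repeat the same growth computation for each region, once for income and once for population. As a sketch of how the repeated steps could be collected in one place, here is a hypothetical helper `period_growth` (not part of the notebook) tested on made-up data. Unlike the notebook cells, it sorts by year first, which is an extra safety step I am assuming is harmless.

```python
import pandas as pd

def period_growth(df, value_col, years):
    """Hypothetical helper: average annual growth of value_col between the selected years."""
    out = df.loc[df.year.isin(years)].sort_values('year').reset_index(drop=True)
    n_years = out['year'] - out['year'].shift(1)
    out['growth'] = (out[value_col] / out[value_col].shift(1)) ** (1 / n_years) - 1
    out['Period'] = out['year'].astype(str).shift(1) + '-' + out['year'].astype(str)
    # The first selected year has no previous observation, so drop it
    return out.dropna(subset=['growth'])[['Period', 'growth']]

# Made-up long-format data for one region
toy = pd.DataFrame({'year': [1, 1000, 1500, 1600], 'gdppc_': [400.0, 400.0, 500.0, 550.0]})
res = period_growth(toy, 'gdppc_', [1, 1000, 1500])
print(res)
```

Calling the helper once with `'gdppc_'` and once with `'pop_'` would give the two tables that are merged in each regional cell.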
In [64]:
# Growth rates gdppc
myregion = 'Western Offshoots'
fname = 'WO'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth')

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
fig, ax = plt.subplots()
# Use only the first two colors of the palette, one per variable
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r")[:2])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles, labels=labels)
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
[Figure: Western Offshoots growth rates of income per capita and population by period]
In [65]:
fig
Out[65]:
[Figure: Western Offshoots growth rates of income per capita and population by period]
In [66]:
# Growth rates gdppc
myregion = 'Western Europe'
fname = 'WE'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth')

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
fig, ax = plt.subplots()
# Use only the first two colors of the palette, one per variable
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r")[:2])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles, labels=labels)
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
[Figure: Western Europe growth rates of income per capita and population by period]
In [67]:
fig
Out[67]:
[Figure: Western Europe growth rates of income per capita and population by period]
In [68]:
# Growth rates gdppc
myregion = 'Latin America'
fname = 'LA'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth')

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
fig, ax = plt.subplots()
# Use only the first two colors of the palette, one per variable
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r")[:2])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles, labels=labels)
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
[Figure: Latin America growth rates of income per capita and population by period]
In [69]:
fig
Out[69]:
[Figure: Latin America growth rates of income per capita and population by period]
In [70]:
# Growth rates gdppc
myregion = 'Asia'
fname = 'AS'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Merge
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
# Only two hue levels are plotted, so pass just the first two palette colors
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r")[:2])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
[Figure: bar plot of annualized growth rates of income per capita and population by period, Asia (x: Period, y: Growth Rate)]
In [71]:
fig
Out[71]:
No description has been provided for this image
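
The growth column computed above is a compound annual growth rate: between two benchmark years s < t it equals (y_t / y_s)^(1/(t-s)) - 1. Here is a minimal sketch of the same calculation wrapped in a function. The function name `annualized_growth` and the toy numbers are made up for illustration; they are not part of the Maddison data.

```python
import pandas as pd


def annualized_growth(df, value_col, year_col='year'):
    """Compound annual growth between consecutive benchmark rows (hypothetical helper)."""
    out = df.sort_values(year_col).reset_index(drop=True).copy()
    # Number of years between each benchmark and the previous one
    span = out[year_col] - out[year_col].shift(1)
    # (y_t / y_s) ** (1 / (t - s)) - 1; the first row has no predecessor, so it is NaN
    out['growth'] = (out[value_col] / out[value_col].shift(1)) ** (1 / span) - 1
    # Label each row with the period it covers, e.g. '1500-1820'
    out['Period'] = out[year_col].astype(str).shift(1) + '-' + out[year_col].astype(str)
    return out


# Toy data: the level doubles between each pair of benchmark years
toy = pd.DataFrame({'year': [1500, 1820, 1913], 'gdppc_': [100.0, 200.0, 400.0]})
res = annualized_growth(toy, 'gdppc_')
print(res[['Period', 'growth']])
```

Doubling over 320 years implies a much lower annual rate than doubling over 93 years, which is why the growth rates are annualized before comparing periods of different lengths.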
In [72]:
# Growth rates gdppc
myregion = 'Africa'
fname = 'AF'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.isin([1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.isin([1, 1000, 1500, 1820, 1913]).astype(int)

# Merge
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
# Only two hue levels are plotted, so pass just the first two palette colors
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r")[:2])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
[Figure: bar plot of annualized growth rates of income per capita and population by period, Africa (x: Period, y: Growth Rate)]
In [73]:
fig
Out[73]:
No description has been provided for this image

Comparing richest to poorest region across time¶

Let's create a table that shows the GDP per capita levels for the 6 regions in the original data and compute the ratio of richest to poorest. Let's also plot it.

In [74]:
gdppc2['Richest-Poorest Ratio'] = gdppc2.max(axis=1) / gdppc2.min(axis=1)
# Keep the benchmark years and move the year index into a regular column
gdp_ratio = gdppc2.loc[[1, 1000, 1500, 1700, 1820, 1870, 1913, 1940, 1960, 1980, 2000, 2008]].reset_index()
gdp_ratio['Region'] = 'Richest-Poorest'
gdp_ratio['Region'] = gdp_ratio.Region.astype('category')
In [75]:
gdp_ratio
Out[75]:
Country year 30 Western Europe 7 East Europe Africa Asia Latin America Western Offshoots Richest-Poorest Ratio Region
0 1 576.167665 411.789474 472.352941 455.671021 400.0 400.0 1.440419 Richest-Poorest
1 1000 427.425665 400.0 424.767802 469.961665 400.0 400.0 1.174904 Richest-Poorest
2 1500 771.093805 496 413.709504 568.4179 416.457143 400 1.927735 Richest-Poorest
3 1700 993.456911 606.010638 420.628684 571.605276 526.639004 476.0 2.361838 Richest-Poorest
4 1820 1194.184683 683.160984 419.755914 580.626115 691.060678 1201.993477 2.863553 Richest-Poorest
5 1870 1953.06815 936.628265 500.011054 553.459947 676.005331 2419.152411 4.838198 Richest-Poorest
6 1913 3456.576178 1694.879668 637.433138 695.131881 1494.431922 5232.816582 8.209201 Richest-Poorest
7 1940 4554.045082 1968.706774 813.374613 893.992784 1932.850716 6837.844866 8.40676 Richest-Poorest
8 1960 6879.294331 3069.750386 1055.114678 1025.743131 3135.517072 10961.082848 10.685992 Richest-Poorest
9 1980 13154.033928 5785.933433 1514.558119 2028.654705 5437.924365 18060.162963 11.924378 Richest-Poorest
10 2000 19176.001655 5970.165085 1447.071701 3797.608955 5889.237351 27393.808035 18.930512 Richest-Poorest
11 2008 21671.774225 8568.967581 1780.265474 5611.198564 6973.134656 30151.80588 16.936691 Richest-Poorest
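
One thing to keep in mind: `max(axis=1)` and `min(axis=1)` run across every column in the row, so if the cell that creates the ratio is executed a second time, the ratio column itself enters the calculation. Here is a minimal sketch that restricts the calculation to the region columns. It uses a toy subset of three regions with rounded values, so the ratio for year 1 differs from the full table; the names `levels`, `region_cols`, `richest` and `poorest` are illustrative.

```python
import pandas as pd

# Toy subset: three regions, two benchmark years (rounded values)
levels = pd.DataFrame(
    {'Africa': [472.4, 637.4], 'Asia': [455.7, 695.1], 'Western Offshoots': [400.0, 5232.8]},
    index=pd.Index([1, 1913], name='year'),
)

# Freeze the list of region columns before adding any derived column
region_cols = list(levels.columns)

# Running this line more than once gives the same answer,
# because the ratio column is never part of region_cols
for _ in range(2):
    levels['Richest-Poorest Ratio'] = (
        levels[region_cols].max(axis=1) / levels[region_cols].min(axis=1)
    )

# idxmax/idxmin report which region is richest/poorest in each year
richest = levels[region_cols].idxmax(axis=1)
poorest = levels[region_cols].idxmin(axis=1)
print(levels)
print(richest)
```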
In [76]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Richest-Poorest Ratio', data=gdp_ratio, alpha=1, hue='Region', style='Region', dashes=False, markers=True, )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
handles, labels = ax.get_legend_handles_labels()
# Keep every legend entry: recent seaborn versions no longer add a title entry at position 0
ax.legend(handles=handles, labels=labels)
ax.set_xlabel('Year')
ax.set_ylabel('Richest-Poorest Ratio')
plt.savefig(pathgraphs + 'Richest-Poorest-Ratio.pdf', dpi=300, bbox_inches='tight')
[Figure: line plot of the richest-to-poorest GDP per capita ratio over time (x: Year, y: Richest-Poorest Ratio)]
In [77]:
fig
Out[77]:
No description has been provided for this image

Visualize as Table¶

In [78]:
gdp_ratio.style.format({
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format,
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format,
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format,
})
Out[78]:
Country year 30 Western Europe 7 East Europe Africa Asia Latin America Western Offshoots Richest-Poorest Ratio Region
0 1 576.167665 411.789474 472.352941 455.671021 400.000000 400.000000 1.440419 Richest-Poorest
1 1000 427.425665 400.000000 424.767802 469.961665 400.000000 400.000000 1.174904 Richest-Poorest
2 1500 771.093805 496 413.709504 568.417900 416.457143 400 1.927735 Richest-Poorest
3 1700 993.456911 606.010638 420.628684 571.605276 526.639004 476.000000 2.361838 Richest-Poorest
4 1820 1194.184683 683.160984 419.755914 580.626115 691.060678 1201.993477 2.863553 Richest-Poorest
5 1870 1953.068150 936.628265 500.011054 553.459947 676.005331 2419.152411 4.838198 Richest-Poorest
6 1913 3456.576178 1694.879668 637.433138 695.131881 1494.431922 5232.816582 8.209201 Richest-Poorest
7 1940 4554.045082 1968.706774 813.374613 893.992784 1932.850716 6837.844866 8.406760 Richest-Poorest
8 1960 6879.294331 3069.750386 1055.114678 1025.743131 3135.517072 10961.082848 10.685992 Richest-Poorest
9 1980 13154.033928 5785.933433 1514.558119 2028.654705 5437.924365 18060.162963 11.924378 Richest-Poorest
10 2000 19176.001655 5970.165085 1447.071701 3797.608955 5889.237351 27393.808035 18.930512 Richest-Poorest
11 2008 21671.774225 8568.967581 1780.265474 5611.198564 6973.134656 30151.805880 16.936691 Richest-Poorest
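
Notice that the numbers in the output are not rounded. A formatter dict is matched against column labels, and in `gdp_ratio` the years live in the `year` column while the column labels are region names, so keys like `1` or `1000` match nothing. Here is a minimal sketch that builds the dict from the column names instead. The frame `toy` and the names `numeric_cols` and `formatters` are made up for illustration.

```python
import pandas as pd

# Toy frame with the same layout as gdp_ratio: one row per year, one column per region
toy = pd.DataFrame({
    'year': [1, 1913],
    'Africa': [472.352941, 637.433138],
    'Asia': [455.671021, 695.131881],
    'Region': ['Richest-Poorest', 'Richest-Poorest'],
})

# Format every column except the year and the label column
numeric_cols = [c for c in toy.columns if c not in ('year', 'Region')]
formatters = {col: '{:,.1f}'.format for col in numeric_cols}

# to_string is used here because it needs no extra dependencies
text = toy.to_string(formatters=formatters, index=False)
print(text)
```

The same dict should also work with `toy.style.format(formatters)` and with the `formatters` argument of `to_latex` and `to_html`, since all of them look formatters up by column label.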

Export table to LaTeX¶

Let's print the table as LaTeX code that can be copied and pasted into our slides or paper.

In [79]:
print(gdp_ratio.to_latex(formatters={
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
}))
\begin{tabular}{lrllllllll}
\toprule
Country & year & 30  Western Europe & 7 East Europe & Africa & Asia & Latin America & Western Offshoots & Richest-Poorest Ratio & Region \\
\midrule
0 & 1 & 576.167665 & 411.789474 & 472.352941 & 455.671021 & 400.000000 & 400.000000 & 1.440419 & Richest-Poorest \\
1 & 1000 & 427.425665 & 400.000000 & 424.767802 & 469.961665 & 400.000000 & 400.000000 & 1.174904 & Richest-Poorest \\
2 & 1500 & 771.093805 & 496 & 413.709504 & 568.417900 & 416.457143 & 400 & 1.927735 & Richest-Poorest \\
3 & 1700 & 993.456911 & 606.010638 & 420.628684 & 571.605276 & 526.639004 & 476.000000 & 2.361838 & Richest-Poorest \\
4 & 1820 & 1194.184683 & 683.160984 & 419.755914 & 580.626115 & 691.060678 & 1201.993477 & 2.863553 & Richest-Poorest \\
5 & 1870 & 1953.068150 & 936.628265 & 500.011054 & 553.459947 & 676.005331 & 2419.152411 & 4.838198 & Richest-Poorest \\
6 & 1913 & 3456.576178 & 1694.879668 & 637.433138 & 695.131881 & 1494.431922 & 5232.816582 & 8.209201 & Richest-Poorest \\
7 & 1940 & 4554.045082 & 1968.706774 & 813.374613 & 893.992784 & 1932.850716 & 6837.844866 & 8.406760 & Richest-Poorest \\
8 & 1960 & 6879.294331 & 3069.750386 & 1055.114678 & 1025.743131 & 3135.517072 & 10961.082848 & 10.685992 & Richest-Poorest \\
9 & 1980 & 13154.033928 & 5785.933433 & 1514.558119 & 2028.654705 & 5437.924365 & 18060.162963 & 11.924378 & Richest-Poorest \\
10 & 2000 & 19176.001655 & 5970.165085 & 1447.071701 & 3797.608955 & 5889.237351 & 27393.808035 & 18.930512 & Richest-Poorest \\
11 & 2008 & 21671.774225 & 8568.967581 & 1780.265474 & 5611.198564 & 6973.134656 & 30151.805880 & 16.936691 & Richest-Poorest \\
\bottomrule
\end{tabular}
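
For slides it is often nicer to have regions as rows and years as columns. Here is a minimal sketch of that reshaping, assuming a frame laid out like `gdp_ratio`; the frame `toy` and the name `table` are illustrative. A single `float_format` is used instead of a per-column dict, since every cell should get the same format.

```python
import pandas as pd

# Toy frame laid out like gdp_ratio (one row per year)
toy = pd.DataFrame({
    'year': [1, 1000, 1913],
    'Africa': [472.352941, 424.767802, 637.433138],
    'Asia': [455.671021, 469.961665, 695.131881],
    'Region': 'Richest-Poorest',
})

# Drop the label column, index by year, then transpose: regions become rows
table = toy.drop(columns='Region').set_index('year').T
table.index.name = 'Country'

# One format for every number: thousands separator, one decimal
text = table.to_string(float_format='{:,.1f}'.format)
print(text)
```

`to_latex` accepts the same `float_format` argument, so `table.to_latex(float_format='{:,.1f}'.format)` should produce the LaTeX version (recent pandas versions may require `jinja2` for this).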

In [80]:
%%latex
\begin{tabular}{lrrrrrrrrrrrr}
\toprule
year &  1    &  1000 &  1500 &  1700 &    1820 &    1870 &    1913 &    1940 &     1960 &     1980 &     2000 &     2008 \\
Country               &       &       &       &       &         &         &         &         &          &          &          &          \\
\midrule
Africa                & 472.4 & 424.8 & 413.7 & 420.6 &   419.8 &   500.0 &   637.4 &   813.4 &  1,055.1 &  1,514.6 &  1,447.1 &  1,780.3 \\
Asia                  & 455.7 & 470.0 & 568.4 & 571.6 &   580.6 &   553.5 &   695.1 &   894.0 &  1,025.7 &  2,028.7 &  3,797.6 &  5,611.2 \\
East Europe           & 411.8 & 400.0 & 496.0 & 606.0 &   683.2 &   936.6 & 1,694.9 & 1,968.7 &  3,069.8 &  5,785.9 &  5,970.2 &  8,569.0 \\
Latin America         & 400.0 & 400.0 & 416.5 & 526.6 &   691.1 &   676.0 & 1,494.4 & 1,932.9 &  3,135.5 &  5,437.9 &  5,889.2 &  6,973.1 \\
Western Europe        & 576.2 & 427.4 & 771.1 & 993.5 & 1,194.2 & 1,953.1 & 3,456.6 & 4,554.0 &  6,879.3 & 13,154.0 & 19,176.0 & 21,671.8 \\
Western Offshoots     & 400.0 & 400.0 & 400.0 & 476.0 & 1,202.0 & 2,419.2 & 5,232.8 & 6,837.8 & 10,961.1 & 18,060.2 & 27,393.8 & 30,151.8 \\
Richest-Poorest Ratio &   1.4 &   1.2 &   1.9 &   2.4 &     2.9 &     4.8 &     8.2 &     8.4 &     10.7 &     11.9 &     18.9 &     16.9 \\
\bottomrule
\end{tabular}

Export Table to HTML¶

In [81]:
from IPython.display import display, HTML
display(HTML(gdp_ratio.to_html(formatters={
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
})))
Country year 30 Western Europe 7 East Europe Africa Asia Latin America Western Offshoots Richest-Poorest Ratio Region
0 1 576.167665 411.789474 472.352941 455.671021 400.0 400.0 1.440419 Richest-Poorest
1 1000 427.425665 400.0 424.767802 469.961665 400.0 400.0 1.174904 Richest-Poorest
2 1500 771.093805 496 413.709504 568.4179 416.457143 400 1.927735 Richest-Poorest
3 1700 993.456911 606.010638 420.628684 571.605276 526.639004 476.0 2.361838 Richest-Poorest
4 1820 1194.184683 683.160984 419.755914 580.626115 691.060678 1201.993477 2.863553 Richest-Poorest
5 1870 1953.06815 936.628265 500.011054 553.459947 676.005331 2419.152411 4.838198 Richest-Poorest
6 1913 3456.576178 1694.879668 637.433138 695.131881 1494.431922 5232.816582 8.209201 Richest-Poorest
7 1940 4554.045082 1968.706774 813.374613 893.992784 1932.850716 6837.844866 8.40676 Richest-Poorest
8 1960 6879.294331 3069.750386 1055.114678 1025.743131 3135.517072 10961.082848 10.685992 Richest-Poorest
9 1980 13154.033928 5785.933433 1514.558119 2028.654705 5437.924365 18060.162963 11.924378 Richest-Poorest
10 2000 19176.001655 5970.165085 1447.071701 3797.608955 5889.237351 27393.808035 18.930512 Richest-Poorest
11 2008 21671.774225 8568.967581 1780.265474 5611.198564 6973.134656 30151.80588 16.936691 Richest-Poorest

Take-off, industrialization and reversals¶

Industrialization per capita¶

Let's create a full dataframe by entering the data by hand. The data come from Bairoch, P., 1982. "International industrialization levels from 1750 to 1980". Journal of European Economic History, 11(2), p.269. For 1750-1913, the data come from Table 9.

[Image: Table 9 in Bairoch (1982)]

In [82]:
industrialization = [['Developed Countries', 8, 8, 11, 16, 24, 35, 55],
                     ['Europe', 8, 8, 11, 17, 23, 33, 45],
                     ['Austria-Hungary', 7, 7, 8, 11, 15, 23, 32],
                     ['Belgium', 9, 10, 14, 28, 43, 56, 88],
                     ['France', 9, 9, 12, 20, 28, 39, 59],
                     ['Germany', 8, 8, 9, 15, 25, 52, 85],
                     ['Italy', 8, 8, 8, 10, 12, 17, 26],
                     ['Russia', 6, 6, 7, 8, 10, 15, 20],
                     ['Spain', 7, 7, 8, 11, 14, 19, 22],
                     ['Sweden', 7, 8, 9, 15, 24, 41, 67],
                     ['Switzerland', 7, 10, 16, 26, 39, 67, 87],
                     ['United Kingdom', 10, 16, 25, 64, 87, 100, 115],
                     ['Canada', np.nan, 5, 6, 7, 10, 24, 46],
                     ['United States', 4, 9, 14, 21, 38, 69, 126],
                     ['Japan', 7, 7, 7, 7, 9, 12, 20],
                     ['Third World', 7, 6, 6, 4, 3, 2, 2],
                     ['China', 8, 6, 6, 4, 4, 3, 3],
                     ['India', 7, 6, 6, 3, 2, 1, 2],
                     ['Brazil', np.nan, np.nan, np.nan, 4, 4, 5, 7],
                     ['Mexico', np.nan, np.nan, np.nan, 5, 4, 5, 7],
                     ['World', 7, 6, 7, 7, 9, 14, 21]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
industrialization = pd.DataFrame(industrialization, columns=['Country'] + ['y'+str(y) for y in years])

For 1913-1980, the data come from Table 12.

[Image: Table 12 in Bairoch (1982)]

In [83]:
industrialization2 = [['Developed Countries', 55, 71, 81, 135, 194, 315, 344],
                      ['Market Economies', np.nan, 96, 105, 167, 222, 362, 387],
                      ['Europe', 45, 76, 94, 107, 166, 260, 280],
                      ['Belgium', 88, 116, 89, 117, 183, 291, 316],
                      ['France', 59, 82, 73, 95, 167, 259, 277],
                      ['Germany', 85, 101, 128, 144, 244, 366, 395],
                      ['Italy', 26, 39, 44, 61, 121, 194, 231],
                      ['Spain', 22, 28, 23, 31, 56, 144, 159],
                      ['Sweden', 67, 84, 135, 163, 262, 405, 409],
                      ['Switzerland', 87, 90, 88, 167, 259, 366, 354],
                      ['United Kingdom', 115, 122, 157, 210, 253, 341, 325],
                      ['Canada', 46, 82, 84, 185, 237, 370, 379],
                      ['United States', 126, 182, 167, 354, 393, 604, 629],
                      ['Japan', 20, 30, 51, 40, 113, 310, 353],
                      ['U.S.S.R.', 20, 20, 38, 73, 139, 222, 252],
                      ['Third World', 2, 3, 4, 5, 8, 14, 17],
                      ['India', 2, 3, 4, 6, 8, 14, 16],
                      ['Brazil', 7, 10, 10, 13, 23, 42, 55],
                      ['Mexico', 7, 9, 8, 12, 22, 36, 41],
                      ['China', 3, 4, 4, 5, 10, 18, 24],
                      ['World', 21, 28, 31, 48, 66, 100, 103]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
industrialization2 = pd.DataFrame(industrialization2, columns=['Country'] + ['y'+str(y) for y in years])

Let's join both dataframes so we can plot the whole series.

In [84]:
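# Inner merge on the shared columns (Country and y1913); countries that appear in
# only one table (Austria-Hungary, Russia, Market Economies, U.S.S.R.) are dropped
industrialization = industrialization.merge(industrialization2, on=['Country', 'y1913'])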
industrialization
Out[84]:
Country y1750 y1800 y1830 y1860 y1880 y1900 y1913 y1928 y1938 y1953 y1963 y1973 y1980
0 Developed Countries 8.0 8.0 11.0 16 24 35 55 71 81 135 194 315 344
1 Europe 8.0 8.0 11.0 17 23 33 45 76 94 107 166 260 280
2 Belgium 9.0 10.0 14.0 28 43 56 88 116 89 117 183 291 316
3 France 9.0 9.0 12.0 20 28 39 59 82 73 95 167 259 277
4 Germany 8.0 8.0 9.0 15 25 52 85 101 128 144 244 366 395
5 Italy 8.0 8.0 8.0 10 12 17 26 39 44 61 121 194 231
6 Spain 7.0 7.0 8.0 11 14 19 22 28 23 31 56 144 159
7 Sweden 7.0 8.0 9.0 15 24 41 67 84 135 163 262 405 409
8 Switzerland 7.0 10.0 16.0 26 39 67 87 90 88 167 259 366 354
9 United Kingdom 10.0 16.0 25.0 64 87 100 115 122 157 210 253 341 325
10 Canada NaN 5.0 6.0 7 10 24 46 82 84 185 237 370 379
11 United States 4.0 9.0 14.0 21 38 69 126 182 167 354 393 604 629
12 Japan 7.0 7.0 7.0 7 9 12 20 30 51 40 113 310 353
13 Third World 7.0 6.0 6.0 4 3 2 2 3 4 5 8 14 17
14 China 8.0 6.0 6.0 4 4 3 3 4 4 5 10 18 24
15 India 7.0 6.0 6.0 3 2 1 2 3 4 6 8 14 16
16 Brazil NaN NaN NaN 4 4 5 7 10 10 13 23 42 55
17 Mexico NaN NaN NaN 5 4 5 7 9 8 12 22 36 41
18 World 7.0 6.0 7.0 7 9 14 21 28 31 48 66 100 103
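
The merged table has fewer rows than either input, because an inner merge silently drops rows that have no match. Here is a minimal sketch of how to see which rows those are, using an outer merge with `indicator=True`. The frames `t9` and `t12` are small excerpts typed in for illustration (three countries, two years each), not the full tables.

```python
import pandas as pd

# Small excerpts of the two tables (values copied from the lists above)
t9 = pd.DataFrame({'Country': ['Europe', 'Russia', 'Japan'],
                   'y1900': [33, 15, 12],
                   'y1913': [45, 20, 20]})
t12 = pd.DataFrame({'Country': ['Europe', 'U.S.S.R.', 'Japan'],
                    'y1913': [45, 20, 20],
                    'y1928': [76, 20, 30]})

# An outer merge keeps everything; the _merge column says where each row came from
check = t9.merge(t12, on=['Country', 'y1913'], how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['Country', '_merge']]
print(unmatched)

# The inner merge (the default) keeps only the rows found in both tables
inner = t9.merge(t12, on=['Country', 'y1913'])
print(inner)
```

Here Russia and U.S.S.R. fail to match only because the name changes between tables; whether to treat them as the same unit is a judgment call that the merge cannot make for you.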

Let's convert to long format and plot the evolution of industrialization across regions and groups of countries.

In [85]:
industrialization = pd.wide_to_long(industrialization, ['y'], i='Country', j='year').reset_index()
industrialization.rename(columns={'y':'Industrialization'}, inplace=True)
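
To see what `pd.wide_to_long` does, here is a minimal sketch on a two-row excerpt. The stub name `'y'` is the common prefix of the year columns, and whatever follows the stub (the digits) becomes the new `year` column. The names `wide` and `long_df` are illustrative.

```python
import pandas as pd

# Two rows and two year columns from the wide table
wide = pd.DataFrame({'Country': ['World', 'Third World'],
                     'y1750': [7, 7],
                     'y1913': [21, 2]})

# Columns y1750, y1913 -> rows indexed by (Country, year), values in column 'y'
long_df = pd.wide_to_long(wide, stubnames=['y'], i='Country', j='year').reset_index()
long_df = long_df.rename(columns={'y': 'Industrialization'})
print(long_df)
```

Each country now has one row per year, which is the layout seaborn expects when `x`, `y` and `hue` are given as column names.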
In [86]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.Country.isin(['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
[Figure: line plot of industrialization per capita (UK in 1900=100) by year for Developed Countries, Third World, and World]
In [87]:
fig
Out[87]:
No description has been provided for this image
In [88]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

industrialization['dev_level'] = industrialization.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:11],
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev.pdf', dpi=300, bbox_inches='tight')
[Figure: line plot of industrialization per capita (UK in 1900=100) by year for the developed countries]
In [89]:
fig
Out[89]:
No description has been provided for this image
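
A note on `Series.map` with a dictionary: any value that is not a key of the dictionary is mapped to `NaN`. That is why aggregates such as 'World' or 'Europe' drop out when we later filter on `dev_level`. Here is a minimal sketch with a shortened, illustrative mapping (`dev_level_toy`).

```python
import pandas as pd

countries = pd.Series(['Belgium', 'India', 'World', 'Europe'], name='Country')

# Shortened mapping for illustration; aggregates are deliberately left out
dev_level_toy = {'Belgium': 'Developed', 'India': 'Developing'}

mapped = countries.map(dev_level_toy)

# Entries without a key in the dict come back as NaN
unmapped = countries[mapped.isna()].tolist()
print(mapped)
print(unmapped)
```

Checking the unmapped names after a `map` is a cheap way to catch typos in country names.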
In [90]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[11:],
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-NonDev.pdf', dpi=300, bbox_inches='tight')
[Figure: line plot of industrialization per capita (UK in 1900=100) by year for the developing countries]
In [91]:
fig
Out[91]:
No description has been provided for this image
In [92]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[
                 industrialization.Country.isin(['India', 'United Kingdom']) &
                 (industrialization.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:2],
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-UK-IND.pdf', dpi=300, bbox_inches='tight')
[Figure: line plot of industrialization per capita (UK in 1900=100) for India and the United Kingdom, 1750-1900]
In [93]:
fig
Out[93]:
No description has been provided for this image

Manufacturing¶

Let's use data from the same source to explore what happened to the share of manufacturing across regions.

[Images: source tables from Bairoch (1982)]

In [94]:
# 1750-1913
manufacturing = [['Developed Countries', 27.0, 32.3, 39.5, 63.4, 79.1, 89.0, 92.5],
                 ['Europe', 23.2, 28.1, 34.2, 53.2, 61.3, 62.0, 56.6],
                 ['Austria-Hungary', 2.9, 3.2, 3.2, 4.2, 4.4, 4.7, 4.4],
                 ['Belgium', 0.3, 0.5, 0.7, 1.4, 1.8, 1.7, 1.8],
                 ['France', 4.0, 4.2, 5.2, 7.9, 7.8, 6.8, 6.1],
                 ['Germany', 2.9, 3.5, 3.5, 4.9, 8.5, 13.2, 14.8],
                 ['Italy', 2.4, 2.5, 2.3, 2.5, 2.5, 2.5, 2.4],
                 ['Russia', 5.0, 5.6, 5.6, 7.0, 7.6, 8.8, 8.2],
                 ['Spain', 1.2, 1.5, 1.5, 1.8, 1.8, 1.6, 1.2],
                 ['Sweden', 0.3, 0.3, 0.4, 0.6, 0.8, 0.9, 1.0],
                 ['Switzerland', 0.1, 0.3, 0.4, 0.7, 0.8, 1.0, 0.9],
                 ['United Kingdom', 1.9, 4.3, 9.5, 19.9, 22.9, 18.5, 13.6],
                 ['Canada', np.nan, np.nan, 0.1, 0.3, 0.4, 0.6, 0.9],
                 ['United States', 0.1, 0.8, 2.4, 7.2, 14.7, 23.6, 32.0],
                 ['Japan', 3.8, 3.5, 2.8, 2.6, 2.4, 2.4, 2.7],
                 ['Third World', 73.0, 67.7, 60.5, 36.6, 20.9, 11.0, 7.5],
                 ['China', 32.8, 33.3, 29.8, 19.7, 12.5, 6.2, 3.6],
                 ['India', 24.5, 19.7, 17.6, 8.6, 2.8, 1.7, 1.4],
                 ['Brazil', np.nan, np.nan, np.nan, 0.4, 0.3, 0.4, 0.5],
                 ['Mexico', np.nan, np.nan, np.nan, 0.4, 0.3, 0.3, 0.3]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
manufacturing = pd.DataFrame(manufacturing, columns=['Country'] + ['y'+str(y) for y in years])

# 1913-1980
manufacturing2 = [['Developed Countries', 92.5, 92.8, 92.8, 93.5, 91.5, 90.1, 88.0],
                  ['Market Economies', 76.7, 80.3, 76.5, 77.5, 70.5, 70.0, 66.9],
                  ['Europe', 40.8, 35.4, 37.3, 26.1, 26.5, 24.5, 22.9],
                  ['Belgium', 1.8, 1.7, 1.1, 0.8, 0.8, 0.7, 0.7],
                  ['France', 6.1, 6.0, 4.4, 3.2, 3.8, 3.5, 3.3],
                  ['Germany', 14.8, 11.6, 12.7, 5.9, 6.4, 5.9, 5.3],
                  ['Italy', 2.4, 2.7, 2.8, 2.3, 2.9, 2.9, 2.9],
                  ['Spain', 1.2, 1.1, 0.8, 0.7, 0.8, 1.3, 1.4],
                  ['Sweden', 1.0, 0.9, 1.2, 0.9, 0.9, 0.9, 0.8],
                  ['Switzerland', 0.9, 0.7, 0.5, 0.7, 0.7, 0.6, 0.5],
                  ['United Kingdom', 13.6, 9.9, 10.7, 8.4, 6.4, 4.9, 4.0],
                  ['Canada', 0.9, 1.5, 1.4, 2.2, 2.1, 2.1, 2.0],
                  ['United States', 32.0, 39.3, 31.4, 44.7, 35.1, 33.0, 31.5],
                  ['Japan', 2.7, 3.3, 5.2, 2.9, 5.1, 8.8, 9.1],
                  ['U.S.S.R.', 8.2, 5.3, 9.0, 10.7, 14.2, 14.4, 14.8],
                  ['Third World', 7.5, 7.2, 7.2, 6.5, 8.5, 9.9, 12.0],
                  ['India', 1.4, 1.9, 2.4, 1.7, 1.8, 2.1, 2.3],
                  ['Brazil', 0.5, 0.6, 0.6, 0.6, 0.8, 1.1, 1.4],
                  ['Mexico', 0.3, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6],
                  ['China', 3.6, 3.4, 3.1, 2.3, 3.5, 3.9, 5.0]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
manufacturing2 = pd.DataFrame(manufacturing2, columns=['Country'] + ['y'+str(y) for y in years])

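# Merge on the shared columns (Country and y1913). Note that Europe is dropped here:
# its 1913 value differs between the two tables (56.6 vs. 40.8), so the rows do not match
manufacturing = manufacturing.merge(manufacturing2, on=['Country', 'y1913'])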
manufacturing = pd.wide_to_long(manufacturing, ['y'], i='Country', j='year').reset_index()
manufacturing.rename(columns={'y':'manufacturing'}, inplace=True)
manufacturing['manufacturing'] = manufacturing.manufacturing / 100
manufacturing
Out[94]:
Country year manufacturing
0 Developed Countries 1750 0.270
1 Belgium 1750 0.003
2 France 1750 0.040
3 Germany 1750 0.029
4 Italy 1750 0.024
... ... ... ...
216 Third World 1980 0.120
217 China 1980 0.050
218 India 1980 0.023
219 Brazil 1980 0.014
220 Mexico 1980 0.006

221 rows × 3 columns
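
The 221 rows correspond to 17 countries times 13 years. Europe appears in both input tables but not in the result: when `merge` is called without `on`, it matches on every shared column, here `Country` and `y1913`, and Europe's 1913 value differs between the two tables. Here is a minimal sketch on a two-row excerpt; merging on `Country` alone keeps both 1913 columns so they can be compared. The names `m1`, `m2` and the suffixes `_t1`, `_t2` are illustrative.

```python
import pandas as pd

# Two-row excerpts of the two manufacturing tables (values copied from above)
m1 = pd.DataFrame({'Country': ['Developed Countries', 'Europe'],
                   'y1900': [89.0, 62.0],
                   'y1913': [92.5, 56.6]})
m2 = pd.DataFrame({'Country': ['Developed Countries', 'Europe'],
                   'y1913': [92.5, 40.8],
                   'y1928': [92.8, 35.4]})

# Default merge: keys are all shared columns, so Europe (56.6 vs 40.8) is dropped
default = m1.merge(m2)
print(default['Country'].tolist())

# Merge on Country only: both 1913 columns survive with suffixes and can be compared
by_country = m1.merge(m2, on='Country', suffixes=('_t1', '_t2'))
by_country['diff_1913'] = by_country['y1913_t1'] - by_country['y1913_t2']
print(by_country[['Country', 'y1913_t1', 'y1913_t2', 'diff_1913']])
```

Which of the two 1913 values to keep for Europe depends on how each table defines the region, so that choice is left open here.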

In [95]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.Country.isin(['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
[Figure: line plot of the share of world manufacturing by year for Developed Countries and Third World]
In [96]:
fig
Out[96]:
No description has been provided for this image
In [97]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

manufacturing['dev_level'] = manufacturing.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:11],
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-Dev.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [98]:
fig
Out[98]:
No description has been provided for this image
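
Series.map with a dictionary returns NaN for any value that is not a key. This is why aggregates such as 'Third World' or 'World' drop out of the plot once we filter on dev_level. A minimal sketch on a toy frame, using three rows from the table above (toy is a made-up name):

```python
import pandas as pd

# Toy long-format data: two countries and one aggregate
toy = pd.DataFrame({
    'Country': ['Belgium', 'China', 'Third World'],
    'year': [1750, 1980, 1980],
    'manufacturing': [0.003, 0.050, 0.120],
})

dev_level = {'Belgium': 'Developed', 'China': 'Developing'}
toy['dev_level'] = toy.Country.map(dev_level)

# Rows whose Country is not a key of the dictionary get NaN
unmapped = toy.loc[toy.dev_level.isna(), 'Country'].tolist()
developed = toy.loc[toy.dev_level == 'Developed', 'Country'].tolist()
print(unmapped)
print(developed)
```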
In [99]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[11:],
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-NonDev.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [100]:
fig
Out[100]:
No description has been provided for this image
In [101]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[
                 (manufacturing.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (manufacturing.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:2],
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'manufacturing-UK-IND.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [102]:
fig
Out[102]:
No description has been provided for this image

Industrial Potential¶

We can also explore the industrial potential of these countries.

In [103]:
# 1750-1913
indpotential = [['Developed Countries', 34.4, 47.4, 72.9, 143.2, 253.1, 481.2, 863.0,],
                ['Europe', 29.6, 41.2, 63.0, 120.3, 196.2, 335.4, 527.8,],
                ['Austria-Hungary', 3.7, 4.8, 5.8, 9.5, 14.0, 25.6, 40.7,],
                ['Belgium', 0.4, 0.7, 1.3, 3.1, 5.7, 9.2, 16.3,],
                ['France', 5.0, 6.2, 9.5, 17.9, 25.1, 36.8, 57.3,],
                ['Germany', 3.7, 5.2, 6.5, 11.1, 27.4, 71.2, 137.7,],
                ['Italy', 3.1, 3.7, 4.2, 5.7, 8.1, 13.6, 22.5,],
                ['Russia', 6.4, 8.3, 10.3, 15.8, 24.5, 47.5, 76.6,],
                ['Spain', 1.6, 2.1, 2.7, 4.0, 5.8, 8.5, 11.0,],
                ['Sweden', 0.3, 0.5, 0.6, 1.4, 2.6, 5.0, 9.0,],
                ['Switzerland', 0.2, 0.4, 0.8, 1.6, 2.6, 5.4, 8.0,],
                ['United Kingdom', 2.4, 6.2, 17.5, 45.0, 73.3, 100.0, 127.2,],
                ['Canada', np.nan, np.nan, 0.1, 0.6, 1.4, 3.2, 8.7,],
                ['United States', 0.1, 1.1, 4.6, 16.2, 46.9, 127.8, 298.1,],
                ['Japan', 4.8, 5.1, 5.2, 5.8, 7.6, 13.0, 25.1,],
                ['Third World', 92.9, 99.4, 111.5, 82.7, 67.0, 59.6, 69.5,],
                ['China', 41.7, 48.8, 54.9, 44.1, 39.9, 33.5, 33.3,],
                ['India', 31.2, 29.0, 32.5, 19.4, 8.8, 9.3, 13.1,],
                ['Brazil', np.nan, np.nan, np.nan, 0.9, 0.9, 2.1, 4.3,],
                ['Mexico', np.nan, np.nan, np.nan, 0.9, 0.8, 1.7, 2.7,],
                ['World', 127.3, 146.9, 184.4, 225.9, 320.1, 540.8, 932.5,]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
indpotential = pd.DataFrame(indpotential, columns=['Country'] + ['y'+str(y) for y in years])

# 1913-1980
indpotential2 = [['Developed Countries', 863, 1259, 1562, 2870, 4699, 8432, 9718],
                 ['Market Economies', 715, 1089, 1288, 2380, 3624, 6547, 7388],
                 ['Europe', 380, 480, 629, 801, 1361, 2290, 2529],
                 ['Belgium', 16, 22, 18, 25, 41, 69, 76],
                 ['France', 57, 82, 74, 98, 194, 328, 362],
                 ['Germany', 138, 158, 214, 180, 330, 550, 590],
                 ['Italy', 23, 37, 46, 71, 150, 258, 319],
                 ['Spain', 11, 16, 14, 22, 43, 122, 156],
                 ['Sweden', 9, 12, 21, 28, 48, 80, 83],
                 ['Switzerland', 8, 9, 9, 20, 37, 57, 54],
                 ['United Kingdom', 127, 135, 181, 258, 330, 462, 441],
                 ['Canada', 9, 20, 23, 66, 109, 199, 220],
                 ['United States', 298, 533, 528, 1373, 1804, 3089, 3475],
                 ['Japan', 25, 45, 88, 88, 264, 819, 1001],
                 ['U.S.S.R.', 77, 72, 152, 328, 760, 1345, 1630],
                 ['Third World', 70, 98, 122, 200, 439, 927, 1323],
                 ['India', 13, 26, 40, 52, 91, 194, 254],
                 ['Brazil', 4, 8, 10, 18, 42, 102, 159],
                 ['Mexico', 3, 3, 4, 9, 21, 47, 68],
                 ['China', 33, 46, 52, 71, 178, 369, 553],
                 ['World', 933, 1356, 1684, 3070, 5138, 9359, 11041]]

years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
indpotential2 = pd.DataFrame(indpotential2, columns=['Country'] + ['y'+str(y) for y in years])

# Merge
indpotential = indpotential.merge(indpotential2[indpotential2.columns.difference(['y1913'])])
indpotential = pd.wide_to_long(indpotential, ['y'], i='Country', j='year').reset_index()
indpotential.rename(columns={'y':'indpotential'}, inplace=True)
indpotential
Out[103]:
Country year indpotential
0 Developed Countries 1750 34.4
1 Europe 1750 29.6
2 Belgium 1750 0.4
3 France 1750 5.0
4 Germany 1750 3.7
... ... ... ...
242 China 1980 553.0
243 India 1980 254.0
244 Brazil 1980 159.0
245 Mexico 1980 68.0
246 World 1980 11041.0

247 rows × 3 columns
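
The reshaping step above relies on pd.wide_to_long, which turns columns such as y1750 and y1900 into rows. A minimal sketch on a toy table with two countries and two years taken from the data above (wide and long_df are made-up names):

```python
import pandas as pd

# Toy wide table: one row per country, one column per year (prefix 'y')
wide = pd.DataFrame({
    'Country': ['India', 'United Kingdom'],
    'y1750': [31.2, 2.4],
    'y1900': [9.3, 100.0],
})

# The stub 'y' is stripped from the column names; the numeric suffix becomes 'year'
long_df = pd.wide_to_long(wide, ['y'], i='Country', j='year').reset_index()
long_df = long_df.rename(columns={'y': 'indpotential'})
print(long_df)
```

Each country now appears once per year, which is the layout seaborn expects for the x, y and hue arguments.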

In [104]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.Country.apply(lambda x: x in ['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [105]:
fig
Out[105]:
No description has been provided for this image
In [106]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

indpotential['dev_level'] = indpotential.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:11],
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-Dev.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [107]:
fig
Out[107]:
No description has been provided for this image
In [108]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[11:],
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-NonDev.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [109]:
fig
Out[109]:
No description has been provided for this image
In [110]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[
                 (indpotential.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (indpotential.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:2],
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-UK-IND.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [111]:
fig
Out[111]:
No description has been provided for this image

Persistence¶

Let's explore the persistence of economic development since 1950. To do so, let's get the Penn World Table and World Bank Data.

Penn World Table¶

Let's start by importing the data from the Penn World Table.

In [114]:
try:
    # Use the local copies if they exist
    pwt_xls = pd.read_excel(pathout + 'pwt.xlsx')
    pwt = pd.read_stata(pathout + 'pwt.dta')
except Exception:
    # Otherwise download the files and save local copies
    pwt_xls = pd.read_excel('https://dataverse.nl/api/access/datafile/354095', sheet_name=1)
    pwt = pd.read_stata('https://dataverse.nl/api/access/datafile/354098')
    pwt_xls.to_excel(pathout + 'pwt.xlsx', index=False)
    pwt.to_stata(pathout + 'pwt.dta', write_index=False, version=117)

# Get labels of variables
with pd.io.stata.StataReader(pathout + 'pwt.dta') as reader:
    pwt_labels = reader.variable_labels()

The Excel file lets us know the definition of the variables, while the Stata file has the data (of course, the Excel file also has the data). For some reason the original Stata file does not seem to have labels!
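
The loading cell above follows a cache-then-download pattern: read the local copy if it is there, otherwise fetch the data and save it for next time. Below is a minimal sketch of the same idea, assuming a hypothetical helper load_or_fetch and a stand-in fake_fetch function in place of the real download. It checks for the file explicitly instead of relying on try/except, and uses CSV only to keep the example free of extra dependencies.

```python
import os
import tempfile
import pandas as pd

def load_or_fetch(path, fetch):
    """Return the cached table at `path`, or call `fetch()` and cache its result."""
    if os.path.exists(path):
        return pd.read_csv(path)
    df = fetch()
    df.to_csv(path, index=False)
    return df

calls = []

def fake_fetch():
    # Hypothetical stand-in for a download such as pd.read_stata(<url>)
    calls.append(1)
    return pd.DataFrame({'countrycode': ['ZWE'], 'year': [2019]})

cache_path = os.path.join(tempfile.mkdtemp(), 'pwt_demo.csv')
first = load_or_fetch(cache_path, fake_fetch)    # file missing: fetches and saves
second = load_or_fetch(cache_path, fake_fetch)   # file present: reads from disk
print(len(calls))
```

Checking for the file explicitly means that other errors, such as a typo in an argument, are not silently turned into a new download.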

In [115]:
pwt_labels
Out[115]:
{'countrycode': '',
 'country': '',
 'currency_unit': '',
 'year': '',
 'rgdpe': '',
 'rgdpo': '',
 'pop': '',
 'emp': '',
 'avh': '',
 'hc': '',
 'ccon': '',
 'cda': '',
 'cgdpe': '',
 'cgdpo': '',
 'cn': '',
 'ck': '',
 'ctfp': '',
 'cwtfp': '',
 'rgdpna': '',
 'rconna': '',
 'rdana': '',
 'rnna': '',
 'rkna': '',
 'rtfpna': '',
 'rwtfpna': '',
 'labsh': '',
 'irr': '',
 'delta': '',
 'xr': '',
 'pl_con': '',
 'pl_da': '',
 'pl_gdpo': '',
 'i_cig': '',
 'i_xm': '',
 'i_xr': '',
 'i_outlier': '',
 'i_irr': '',
 'cor_exp': '',
 'statcap': '',
 'csh_c': '',
 'csh_i': '',
 'csh_g': '',
 'csh_x': '',
 'csh_m': '',
 'csh_r': '',
 'pl_c': '',
 'pl_i': '',
 'pl_g': '',
 'pl_x': '',
 'pl_m': '',
 'pl_n': '',
 'pl_k': ''}
In [116]:
pwt_xls
Out[116]:
Variable name Variable definition
0 Identifier variables NaN
1 countrycode 3-letter ISO country code
2 country Country name
3 currency_unit Currency unit
4 year Year
... ... ...
62 pl_g Price level of government consumption, price ...
63 pl_x Price level of exports, price level of USA GDP...
64 pl_m Price level of imports, price level of USA GDP...
65 pl_n Price level of the capital stock, price level ...
66 pl_k Price level of the capital services, price lev...

67 rows × 2 columns
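
Since the Stata labels are empty, the two columns of pwt_xls can stand in for them. A minimal sketch, using a toy copy of the first rows shown above (defs and labels are made-up names):

```python
import pandas as pd

# Toy version of the definitions sheet, including a section-header row with no definition
defs = pd.DataFrame({
    'Variable name': ['Identifier variables', 'countrycode', 'country', 'year'],
    'Variable definition': [None, '3-letter ISO country code', 'Country name', 'Year'],
})

# Drop header rows (missing definition) and build a name -> definition dictionary
labels = (defs.dropna(subset=['Variable definition'])
              .set_index('Variable name')['Variable definition']
              .to_dict())
print(labels['countrycode'])
```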

In [117]:
pwt
Out[117]:
countrycode country currency_unit year rgdpe rgdpo pop emp avh hc ... csh_x csh_m csh_r pl_c pl_i pl_g pl_x pl_m pl_n pl_k
0 ABW Aruba Aruban Guilder 1950 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 ABW Aruba Aruban Guilder 1951 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 ABW Aruba Aruban Guilder 1952 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 ABW Aruba Aruban Guilder 1953 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 ABW Aruba Aruban Guilder 1954 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12805 ZWE Zimbabwe US Dollar 2015 40141.617188 39798.644531 13.814629 6.393752 NaN 2.584653 ... 0.140172 -0.287693 -0.051930 0.479228 0.651287 0.541446 0.616689 0.533235 0.425715 1.778124
12806 ZWE Zimbabwe US Dollar 2016 41875.203125 40963.191406 14.030331 6.504374 NaN 2.616257 ... 0.131920 -0.251232 -0.016258 0.470640 0.651027 0.539631 0.619789 0.519718 0.419446 1.728804
12807 ZWE Zimbabwe US Dollar 2017 44672.175781 44316.742188 14.236595 6.611773 NaN 2.648248 ... 0.126722 -0.202827 -0.039897 0.473560 0.639560 0.519956 0.619739 0.552042 0.418681 1.756007
12808 ZWE Zimbabwe US Dollar 2018 44325.109375 43420.898438 14.438802 6.714952 NaN 2.680630 ... 0.144485 -0.263658 -0.020791 0.543757 0.655473 0.529867 0.641361 0.561526 0.426527 1.830088
12809 ZWE Zimbabwe US Dollar 2019 42296.062500 40826.570312 14.645468 6.831017 NaN 2.713408 ... 0.213562 -0.270959 -0.089798 0.494755 0.652439 0.500927 0.487763 0.430082 0.419883 1.580885

12810 rows × 52 columns

In [118]:
# Describe the data
pwt.describe()
Out[118]:
year rgdpe rgdpo pop emp avh hc ccon cda cgdpe ... csh_x csh_m csh_r pl_c pl_i pl_g pl_x pl_m pl_n pl_k
count 12810.000000 1.039900e+04 1.039900e+04 10399.000000 9529.000000 3492.000000 8637.000000 1.039900e+04 1.039900e+04 1.039900e+04 ... 10399.000000 10399.000000 10399.000000 10399.000000 10399.000000 10399.000000 10399.000000 10399.000000 10314.000000 7090.000000
mean 1984.500000 3.048523e+05 3.070802e+05 30.962982 14.171166 1986.923200 2.087200 2.249465e+05 3.049463e+05 3.061396e+05 ... 0.229317 -0.300829 0.017791 0.370850 0.423964 0.345635 0.409820 0.403422 0.364398 1.417965
std 20.205986 1.214332e+06 1.218457e+06 116.189454 58.056976 284.003338 0.727413 8.882342e+05 1.236096e+06 1.226056e+06 ... 0.266793 0.640212 0.216200 0.424091 0.635708 0.408015 0.196283 0.202566 0.435907 2.078080
min 1950.000000 2.036377e+01 2.765232e+01 0.004425 0.001200 1380.607643 1.007038 1.604856e+01 2.176663e+01 2.026185e+01 ... -1.937363 -23.237627 -12.568965 0.015589 0.006002 0.009270 0.007354 0.020806 0.014431 0.067465
25% 1967.000000 6.801782e+03 7.191773e+03 1.579663 0.775101 1788.478805 1.450483 5.892980e+03 7.194495e+03 6.689781e+03 ... 0.067027 -0.379584 -0.025204 0.171226 0.191958 0.116110 0.237652 0.240777 0.171785 0.691960
50% 1984.500000 3.031913e+04 3.084435e+04 6.150688 2.856044 1972.355973 1.987572 2.465417e+04 3.134864e+04 3.031841e+04 ... 0.140116 -0.200254 0.000326 0.306258 0.377943 0.244554 0.443856 0.452949 0.300982 1.000000
75% 2002.000000 1.559740e+05 1.587386e+05 19.934229 8.266107 2168.035042 2.674011 1.133967e+05 1.532297e+05 1.559242e+05 ... 0.300332 -0.102694 0.044529 0.484549 0.557087 0.452951 0.557034 0.541063 0.453684 1.511751
max 2019.000000 2.086051e+07 2.059584e+07 1433.783686 799.306641 3039.794005 4.351568 1.682624e+07 2.138355e+07 2.079136e+07 ... 3.523480 32.874020 7.598285 23.122841 34.444988 18.420809 2.056070 4.990355 20.694918 34.340618

8 rows × 44 columns

Computing $\log$ GDP per capita¶

Now we can create new variables, transform them, and plot the data.

To compute the $\log$ of income per capita (GDPpc), the first thing we need to know is the name of the column that contains the GDPpc data in the dataframe. To do this, let's find among the variables those that have the word capita in their description.

In [119]:
pwt_xls.columns
Out[119]:
Index(['Variable name', 'Variable definition'], dtype='object')

To be able to read the definitions better, let's tell pandas to show us more content.

In [120]:
pd.set_option("display.max_columns", 20)
pd.set_option('display.max_rows', 50)
pd.set_option('display.width', 1000)
#pd.set_option('display.max_colwidth', None)
In [121]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).lower().find('capita')!=-1)]
Out[121]:
Variable name Variable definition
12 hc Human capital index, based on years of schooli...
19 cn Capital stock at current PPPs (in mil. 2017US$)
20 ck Capital services levels at current PPPs (USA=1)
28 rnna Capital stock at constant 2017 national prices...
29 rkna Capital services at constant 2017 national pri...
34 delta Average depreciation rate of the capital stock
47 i_irr 0/1/2/3: the observation for irr is not an out...
53 csh_i Share of gross capital formation at current PPPs
61 pl_i Price level of capital formation, price level...
65 pl_n Price level of the capital stock, price level ...
66 pl_k Price level of the capital services, price lev...

So, it seems the data does not contain that variable. But do not panic...we know how to compute it based on GDP and Population. Let's do it!

Identify the name of the variable for GDP¶

In [122]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).upper().find('GDP')!=-1)]
Out[122]:
Variable name Variable definition
7 rgdpe Expenditure-side real GDP at chained PPPs (in ...
8 rgdpo Output-side real GDP at chained PPPs (in mil. ...
17 cgdpe Expenditure-side real GDP at current PPPs (in ...
18 cgdpo Output-side real GDP at current PPPs (in mil. ...
25 rgdpna Real GDP at constant 2017 national prices (in ...
32 labsh Share of labour compensation in GDP at current...
38 pl_con Price level of CCON (PPP/XR), price level of U...
39 pl_da Price level of CDA (PPP/XR), price level of US...
40 pl_gdpo Price level of CGDPo (PPP/XR), price level of ...
46 i_outlier 0/1: the observation on pl_gdpe or pl_gdpo is ...
57 csh_r Share of residual trade and GDP statistical di...
60 pl_c Price level of household consumption, price l...
61 pl_i Price level of capital formation, price level...
62 pl_g Price level of government consumption, price ...
63 pl_x Price level of exports, price level of USA GDP...
64 pl_m Price level of imports, price level of USA GDP...

Identify the name of the variable for population¶

In [123]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).lower().find('population')!=-1)]
Out[123]:
Variable name Variable definition
9 pop Population (in millions)
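
The three searches above follow the same pattern, so they can be wrapped in a small helper. The sketch below uses the vectorized str.contains instead of apply with a lambda. Here find_vars is a hypothetical helper and defs is a toy stand-in for pwt_xls with shortened definitions.

```python
import pandas as pd

defs = pd.DataFrame({
    'Variable name': ['Identifier variables', 'rgdpe', 'pop', 'hc'],
    'Variable definition': [None,
                            'Expenditure-side real GDP at chained PPPs',
                            'Population (in millions)',
                            'Human capital index'],
})

def find_vars(df, word):
    """Return rows whose definition contains `word`, ignoring case and missing values."""
    mask = df['Variable definition'].str.contains(word, case=False, na=False, regex=False)
    return df.loc[mask]

print(find_vars(defs, 'capita'))       # also matches 'capital', as in the search above
print(find_vars(defs, 'population'))
```

Setting na=False treats rows without a definition as non-matches, which replaces the str(x) conversion used in the lambda version.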

Create new variables/columns with real GDPpc for all the measures included in PWT¶

In [124]:
# Get columns with GDP measures
gdpcols = pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).upper().find('REAL GDP')!=-1), 'Variable name'].tolist()

# Generate GDPpc for each measure
for gdp in gdpcols:
    pwt[gdp + '_pc'] = pwt[gdp] / pwt['pop']

# GDPpc data
gdppccols = [col+'_pc' for col in gdpcols]
pwt[['countrycode', 'country', 'year'] + gdppccols]
Out[124]:
countrycode country year rgdpe_pc rgdpo_pc cgdpe_pc cgdpo_pc rgdpna_pc
0 ABW Aruba 1950 NaN NaN NaN NaN NaN
1 ABW Aruba 1951 NaN NaN NaN NaN NaN
2 ABW Aruba 1952 NaN NaN NaN NaN NaN
3 ABW Aruba 1953 NaN NaN NaN NaN NaN
4 ABW Aruba 1954 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
12805 ZWE Zimbabwe 2015 2905.732553 2880.905780 2892.674328 2856.095690 3040.848887
12806 ZWE Zimbabwe 2016 2984.619759 2919.616893 2970.770578 2912.558803 3016.730437
12807 ZWE Zimbabwe 2017 3137.841301 3112.875107 3137.841301 3112.875107 3112.875107
12808 ZWE Zimbabwe 2018 3069.860600 3007.236919 3071.061791 3017.391036 3217.517468
12809 ZWE Zimbabwe 2019 2887.996649 2787.658975 2889.980517 2805.080907 2915.172824

12810 rows × 8 columns

Now let's use the apply function to compute logs.

In [125]:
pwt[['l'+col for col in gdppccols]] = pwt[gdppccols].apply(np.log, axis=1)
pwt[['countrycode', 'country', 'year'] + ['l'+col for col in gdppccols]]
Out[125]:
countrycode country year lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
0 ABW Aruba 1950 NaN NaN NaN NaN NaN
1 ABW Aruba 1951 NaN NaN NaN NaN NaN
2 ABW Aruba 1952 NaN NaN NaN NaN NaN
3 ABW Aruba 1953 NaN NaN NaN NaN NaN
4 ABW Aruba 1954 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
12805 ZWE Zimbabwe 2015 7.974441 7.965860 7.969937 7.957211 8.019892
12806 ZWE Zimbabwe 2016 8.001228 7.979208 7.996577 7.976787 8.011929
12807 ZWE Zimbabwe 2017 8.051290 8.043302 8.051290 8.043302 8.043302
12808 ZWE Zimbabwe 2018 8.029387 8.008777 8.029779 8.012148 8.076365
12809 ZWE Zimbabwe 2019 7.968318 7.932957 7.969005 7.939188 7.977684

12810 rows × 8 columns
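
NumPy functions such as np.log also accept a whole DataFrame and return a DataFrame with the same index and columns, so the same logs can be computed without apply. A minimal sketch on toy numbers (the values are illustrative, not taken from PWT):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'rgdpe': [100.0, np.nan, 400.0],
                    'pop': [2.0, 1.0, 4.0]})
toy['rgdpe_pc'] = toy['rgdpe'] / toy['pop']

via_apply = toy[['rgdpe_pc']].apply(np.log)   # column by column
direct = np.log(toy[['rgdpe_pc']])            # element-wise on the whole frame
print(direct)
```

Missing values simply propagate: the log of NaN is NaN, which is why the Aruba rows above stay empty.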

How correlated are these measures of log GDP per capita?

In [130]:
pwt[['countrycode', 'country', 'year'] + ['l'+col for col in gdppccols]].groupby('year').corr(numeric_only=True)
Out[130]:
lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
year
1950 lrgdpe_pc 1.000000 0.995984 0.999441 0.995318 0.818344
lrgdpo_pc 0.995984 1.000000 0.996080 0.999158 0.823961
lcgdpe_pc 0.999441 0.996080 1.000000 0.996412 0.821171
lcgdpo_pc 0.995318 0.999158 0.996412 1.000000 0.828410
lrgdpna_pc 0.818344 0.823961 0.821171 0.828410 1.000000
... ... ... ... ... ... ...
2019 lrgdpe_pc 1.000000 0.996471 0.999999 0.996911 0.994584
lrgdpo_pc 0.996471 1.000000 0.996466 0.999960 0.997909
lcgdpe_pc 0.999999 0.996466 1.000000 0.996909 0.994581
lcgdpo_pc 0.996911 0.999960 0.996909 1.000000 0.997914
lrgdpna_pc 0.994584 0.997909 0.994581 0.997914 1.000000

350 rows × 5 columns

While it seems they are highly correlated, it is hard to see this directly here. Let's get summary statistics of each measure's correlations across all years.

In [132]:
pwt[['countrycode', 'country', 'year'] + ['l'+col for col in gdppccols]].groupby('year').corr(numeric_only=True).describe()
Out[132]:
lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
count 350.000000 350.000000 350.000000 350.000000 350.000000
mean 0.980488 0.976436 0.980427 0.979111 0.935778
std 0.036614 0.037566 0.036591 0.036655 0.052340
min 0.818344 0.822508 0.821171 0.825543 0.818344
25% 0.982882 0.975386 0.982467 0.978188 0.907210
50% 0.996006 0.991856 0.996237 0.995459 0.930806
75% 0.999751 0.999158 0.999751 0.999158 0.987301
max 1.000000 1.000000 1.000000 1.000000 1.000000

OK. This gives us a better sense of how strongly correlated these measures of log GDP per capita are. In what follows we will use only one, namely Log[GDPpc] based on expenditure-side real GDP at chained PPPs (in mil. 2017US$), i.e., lrgdpe_pc.
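
One caveat about the summary above: each yearly correlation matrix includes its diagonal of ones and counts every pair twice, which pulls the statistics toward 1. Below is a sketch of how one might summarize only the distinct off-diagonal pairs, using simulated data (the column names a, b, c and the helper pairwise_corr are hypothetical):

```python
import numpy as np
import pandas as pd

# Simulated data: three noisy versions of the same variable, observed in two years
rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy = pd.DataFrame({
    'year': np.repeat([1950, 1960], 100),
    'a': base,
    'b': base + rng.normal(scale=0.1, size=200),
    'c': base + rng.normal(scale=0.5, size=200),
})

def pairwise_corr(df):
    """Correlations for each distinct pair of columns (upper triangle, no diagonal)."""
    corr = df.corr()
    rows, cols = np.triu_indices(len(corr), k=1)
    pairs = [f'{corr.index[i]}-{corr.columns[j]}' for i, j in zip(rows, cols)]
    return pd.Series(corr.to_numpy()[rows, cols], index=pairs)

# One row per year, one column per pair of measures
by_year = pd.DataFrame({year: pairwise_corr(g[['a', 'b', 'c']])
                        for year, g in toy.groupby('year')}).T
print(by_year.describe())
```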

Convergence post-1960?¶

Let's start by looking at the distribution of Log[GDPpc] in 1960. For this we need to subset our dataframe and select only the rows for the year 1960. This is done with the loc property of the dataframe.

In [133]:
gdppc1960 = pwt.loc[pwt.year==1960, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
gdppc1960
Out[133]:
countrycode country year lrgdpe_pc
10 ABW Aruba 1960 NaN
80 AGO Angola 1960 NaN
150 AIA Anguilla 1960 NaN
220 ALB Albania 1960 NaN
290 ARE United Arab Emirates 1960 NaN
... ... ... ... ...
12470 VNM Viet Nam 1960 NaN
12540 YEM Yemen 1960 NaN
12610 ZAF South Africa 1960 8.783560
12680 ZMB Zambia 1960 7.958144
12750 ZWE Zimbabwe 1960 7.818258

183 rows × 4 columns

gdppc1960 has the data for all countries in the year 1960. We can plot the histogram using the dataframe's own plotting methods.

In [134]:
gdppc1960.lrgdpe_pc.hist()
Out[134]:
<Axes: >
No description has been provided for this image

We can also plot it using the seaborn package. Let's plot the kernel density of the distribution.

In [137]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, fill=True, label='1960', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-density.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [138]:
fig
Out[138]:
No description has been provided for this image

Let's now also include the distribution for other years.

In [139]:
gdppc1980 = pwt.loc[pwt.year==1980, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, fill=True, label='1960', linewidth=2)
sns.kdeplot(gdppc1980.lrgdpe_pc, ax=ax, fill=True, label='1980', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
ax.legend()
plt.savefig(pathgraphs + 'y1960-1980-density.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [140]:
fig
Out[140]:
No description has been provided for this image
In [143]:
gdppc2000 = pwt.loc[pwt.year==2000, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, fill=True, label='1960', linewidth=2)
sns.kdeplot(gdppc1980.lrgdpe_pc, ax=ax, fill=True, label='1980', linewidth=2)
sns.kdeplot(gdppc2000.lrgdpe_pc, ax=ax, fill=True, label='2000', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
ax.legend()
plt.savefig(pathgraphs + 'y1960-2000-density.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [144]:
fig
Out[144]:
No description has been provided for this image

Let's show the evolution of the distribution by looking at it every 10 years from 1950 onwards. Moreover, let's do everything in a single piece of code.

In [145]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Every 10 years from 1950 to 2010, plus the last year in the data
period = list(range(1950, 2020, 10)) + [pwt.year.max()]
#mycolors = sns.color_palette("GnBu", n_colors=len(period)+5)
mycolors = sns.cubehelix_palette(len(period), start=.5, rot=-.75)
# Plot
fig, ax = plt.subplots()
k = 0
for t in period:
    sns.kdeplot(pwt.loc[pwt.year==t].lrgdpe_pc, ax=ax, fill=True, label=str(t), linewidth=2, color=mycolors[k])
    k += 1
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
ax.legend()
plt.savefig(pathgraphs + 'y1950-2010-density.pdf', dpi=300, bbox_inches='tight')
No description has been provided for this image
In [146]:
fig
Out[146]:
No description has been provided for this image

Persistence¶

The lack of convergence in the last 60 years suggests that there is some persistence in (recent) development. Let's explore this by plotting the association between GDP per capita in different periods. To make things more comparable, let's normalize by looking at income levels relative to the US. To do so, it's easier to use the year as the index of the dataframe.

In [147]:
pwt.set_index('year', inplace=True)
pwt['lrgdpe_pc_US'] = pwt.loc[pwt.countrycode=='USA', 'lrgdpe_pc']
pwt['lrgdpe_pc_rel'] = pwt.lrgdpe_pc / pwt.lrgdpe_pc_US
pwt.reset_index(inplace=True)
pwt[['countrycode', 'country', 'year', 'lrgdpe_pc_rel']]
Out[147]:
countrycode country year lrgdpe_pc_rel
0 ABW Aruba 1950 NaN
1 ABW Aruba 1951 NaN
2 ABW Aruba 1952 NaN
3 ABW Aruba 1953 NaN
4 ABW Aruba 1954 NaN
... ... ... ... ...
12805 ZWE Zimbabwe 2015 0.726013
12806 ZWE Zimbabwe 2016 0.727573
12807 ZWE Zimbabwe 2017 0.730951
12808 ZWE Zimbabwe 2018 0.727346
12809 ZWE Zimbabwe 2019 0.720651

12810 rows × 4 columns
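
The index-alignment trick above works because the US series has exactly one value per year. An equivalent and perhaps more explicit route is to build a year-to-US-value lookup and map it onto the year column, which avoids changing the index. A minimal sketch on toy data (the values are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    'countrycode': ['USA', 'USA', 'ZWE', 'ZWE'],
    'year': [2018, 2019, 2018, 2019],
    'lrgdpe_pc': [11.0, 11.1, 8.0, 7.9],
})

# One US value per year, indexed by year
us_by_year = toy.loc[toy.countrycode == 'USA'].set_index('year')['lrgdpe_pc']

# Look up the US value for each row's year and take the ratio, as in the cell above
toy['lrgdpe_pc_US'] = toy['year'].map(us_by_year)
toy['lrgdpe_pc_rel'] = toy['lrgdpe_pc'] / toy['lrgdpe_pc_US']
print(toy)
```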

Let's plot the relative income levels in 1960 against those in later years, starting with 2019. First, let's create the wide version of this data.

In [148]:
relgdppc = pwt[['countrycode', 'year', 'lrgdpe_pc_rel']].pivot(index='countrycode', columns='year', values='lrgdpe_pc_rel')
relgdppc.columns = ['y' + str(col) for col in relgdppc.columns]
relgdppc.reset_index(inplace=True)
relgdppc
Out[148]:
countrycode y1950 y1951 y1952 y1953 y1954 y1955 y1956 y1957 y1958 ... y2010 y2011 y2012 y2013 y2014 y2015 y2016 y2017 y2018 y2019
0 ABW NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.971785 0.972698 0.966964 0.965812 0.961921 0.959951 0.958192 0.956832 0.953101 0.951019
1 AGO NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.799210 0.820745 0.825169 0.825301 0.827355 0.815370 0.809379 0.809938 0.811589 0.802877
2 AIA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.947898 0.946339 0.937914 0.932076 0.934279 0.934142 0.928033 0.913169 0.915423 0.917034
3 ALB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.845753 0.846474 0.847895 0.845258 0.847195 0.847901 0.847705 0.850693 0.852540 0.852854
4 ARE NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.019227 1.029266 1.026225 1.023208 1.024195 1.013461 1.010135 1.010522 1.005990 1.008647
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
178 VNM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.771328 0.777442 0.784826 0.786367 0.789719 0.792964 0.797312 0.802582 0.806857 0.810306
179 YEM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.772557 0.758383 0.749334 0.747058 0.744926 0.707611 0.686753 0.657612 0.674280 0.673591
180 ZAF 0.89849 0.893105 0.881282 0.88735 0.895034 0.891319 0.894591 0.897244 0.896683 ... 0.868685 0.871505 0.869000 0.867254 0.864386 0.862042 0.860408 0.859622 0.857125 0.855232
181 ZMB NaN NaN NaN NaN NaN 0.814531 0.817760 0.797963 0.786983 ... 0.748095 0.756346 0.748758 0.745580 0.744860 0.736409 0.737912 0.733792 0.732958 0.731187
182 ZWE NaN NaN NaN NaN 0.780592 0.776128 0.781405 0.787890 0.786626 ... 0.706880 0.709856 0.722509 0.725709 0.724249 0.726013 0.727573 0.730951 0.727346 0.720651

183 rows × 71 columns
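
The pivot step above can be seen more easily on a small example. The sketch below reshapes a toy long table into one row per country and one column per year (the values are illustrative and the names long_df and wide are made up):

```python
import pandas as pd

long_df = pd.DataFrame({
    'countrycode': ['ZAF', 'ZAF', 'ZWE', 'ZWE'],
    'year': [1960, 2019, 1960, 2019],
    'rel': [0.90, 0.86, 0.78, 0.72],
})

# One row per country, one column per year
wide = long_df.pivot(index='countrycode', columns='year', values='rel')
wide.columns = ['y' + str(col) for col in wide.columns]   # 1960 -> 'y1960'
wide = wide.reset_index()
print(wide)
```

Prefixing the year with a letter makes the column names valid Python identifiers, so they can be accessed as attributes (for example wide.y1960), which the plotting code relies on.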

In [149]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
k = 0
fig, ax = plt.subplots()
ax.plot([relgdppc.y1960.min()*.99, relgdppc.y1960.max()*1.01], [relgdppc.y1960.min()*.99, relgdppc.y1960.max()*1.01], c='r', label='45 degree')
sns.regplot(x='y1960', y='y2019', data=relgdppc, ax=ax, label='1960-2019')
movex = relgdppc.y1960.mean() * 0.006125
movey = relgdppc.y2019.mean() * 0.006125
for line in range(0,relgdppc.shape[0]):
    if (np.isnan(relgdppc.y1960[line])==False) & (np.isnan(relgdppc.y2019[line])==False):
        ax.text(relgdppc.y1960[line]+movex, relgdppc.y2019[line]+movey, relgdppc.countrycode[line], horizontalalignment='left', fontsize=12, color='black', weight='semibold')
ax.set_xlabel('Log[Income per capita 1960] relative to US')
ax.set_ylabel('Log[Income per capita in 2019] relative to US')
ax.legend()
plt.savefig(pathgraphs + '1960_versus_2019_drop.pdf', dpi=300, bbox_inches='tight')
In [150]:
fig
Out[150]:

Let's create a function that simplifies plotting this figure for different pairs of years.

In [151]:
def PersistencePlot(dfin, var0='y1960', var1='y2010', labelvar='countrycode', 
                    dx=0.006125, dy=0.006125, 
                    xlabel='Log[Income per capita 1960] relative to US', 
                    ylabel='Log[Income per capita in 2010] relative to US',
                    linelabel='1960-2010',
                    filename='1960_versus_2010_drop.pdf'):
    '''
    Plot the association between var0 and var1 in dfin, labeling each point with labelvar.
    The figure is saved to pathgraphs + filename.
    '''
    sns.set(rc={'figure.figsize':(11.7,8.27)})
    sns.set_context("talk")
    df = dfin.copy()
    df = df.dropna(subset=[var0, var1]).reset_index(drop=True)
    # Plot
    k = 0
    fig, ax = plt.subplots()
    ax.plot([df[var0].min()*.99, df[var0].max()*1.01], [df[var0].min()*.99, df[var0].max()*1.01], c='r', label='45 degree')
    sns.regplot(x=var0, y=var1, data=df, ax=ax, label=linelabel)
    movex = df[var0].mean() * dx
    movey = df[var1].mean() * dy
    for line in range(0,df.shape[0]):
        ax.text(df[var0][line]+movex, df[var1][line]+movey, df[labelvar][line], horizontalalignment='left', fontsize=12, color='black')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    plt.savefig(pathgraphs + filename, dpi=300, bbox_inches='tight')
    pass
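One detail of PersistencePlot is worth spelling out: the labeling loop looks up rows with df[var0][line], which selects by index label rather than by position. That is why the function resets the index after dropping missing values. A minimal sketch with made-up data (the dataframe `toy` and its values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one country has a missing value in 1960
toy = pd.DataFrame({'countrycode': ['AAA', 'BBB', 'CCC'],
                    'y1960': [0.80, np.nan, 0.90],
                    'y2010': [0.85, 0.70, 0.95]})

# Dropping rows leaves gaps in the index: the remaining labels are 0 and 2
dropped = toy.dropna(subset=['y1960', 'y2010'])
print(dropped.index.tolist())

# After reset_index(drop=True) the labels are 0, 1, ..., n-1 again,
# so looping over range(n) and looking up df[col][i] reaches every row
clean = dropped.reset_index(drop=True)
labels = [clean['countrycode'][i] for i in range(clean.shape[0])]
print(labels)
```

Without the reset, the loop would ask for label 1, which no longer exists after the drop, and the lookup would fail. The same applies when a filtered dataframe is passed in.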
In [152]:
PersistencePlot(relgdppc, var0='y1980', var1='y2010', xlabel='Log[Income per capita 1980] relative to US',
                ylabel='Log[Income per capita in 2010] relative to US', linelabel='1980-2010',
                filename='1980_versus_2010_drop.pdf')
In [153]:
PersistencePlot(relgdppc.loc[(relgdppc.countrycode!='BRN')& (relgdppc.countrycode!='ARE')], var0='y1980', var1='y2010', xlabel='Log[Income per capita 1980] relative to US',
                ylabel='Log[Income per capita in 2010] relative to US', linelabel='1980-2010',
                filename='1980_versus_2010_drop.pdf')
In [154]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
period = list(range(1980, 2020, 20)) + [pwt.year.max()]
#mycolors = sns.color_palette("GnBu", n_colors=len(period)+5)
mycolors = sns.cubehelix_palette(len(period), start=.5, rot=-.75)
# Plot
k = 0
fig, ax = plt.subplots()
for t in period:
    sns.regplot(x='y1960', y='y'+str(t), data=relgdppc, ax=ax, label='1960-'+str(t))
    k += 1
ax.set_xlabel('Log[Income per capita 1960] relative to US')
ax.set_ylabel('Log[Income per capita in other period] relative to US')
ax.legend()
Out[154]:
<matplotlib.legend.Legend at 0x183a48c40>
In [155]:
fig
Out[155]:

Getting data from the World Bank¶

The World Bank (WB) is a major source of free data. pandas-datareader, a companion package to pandas, lets you download data through the APIs of many providers, including the WB, OECD, and FRED (see here).

In [156]:
from pandas_datareader import data, wb

We can now use wb to get information and data from the WB. Let's start by downloading the basic information about the countries included in the API.

In [157]:
wbcountries = wb.get_countries()
wbcountries['name'] = wbcountries.name.str.strip()
wbcountries
Out[157]:
iso3c iso2c name region adminregion incomeLevel lendingType capitalCity longitude latitude
0 ABW AW Aruba Latin America & Caribbean High income Not classified Oranjestad -70.0167 12.5167
1 AFE ZH Africa Eastern and Southern Aggregates Aggregates Aggregates NaN NaN
2 AFG AF Afghanistan South Asia South Asia Low income IDA Kabul 69.1761 34.5228
3 AFR A9 Africa Aggregates Aggregates Aggregates NaN NaN
4 AFW ZI Africa Western and Central Aggregates Aggregates Aggregates NaN NaN
... ... ... ... ... ... ... ... ... ... ...
291 XZN A5 Sub-Saharan Africa excluding South Africa and ... Aggregates Aggregates Aggregates NaN NaN
292 YEM YE Yemen, Rep. Middle East & North Africa Middle East & North Africa (excluding high inc... Low income IDA Sana'a 44.2075 15.3520
293 ZAF ZA South Africa Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Upper middle income IBRD Pretoria 28.1871 -25.7460
294 ZMB ZM Zambia Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income IDA Lusaka 28.2937 -15.3982
295 ZWE ZW Zimbabwe Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income Blend Harare 31.0672 -17.8312

296 rows × 10 columns

Let's use wb to find all the series that have the word "population".

In [158]:
popvars = wb.search(string='population')
popvars
Out[158]:
id name unit source sourceNote sourceOrganization topics
24 1.1_ACCESS.ELECTRICITY.TOT Access to electricity (% of total population) Sustainable Energy for All Access to electricity is the percentage of pop... b'World Bank Global Electrification Database 2...
39 1.2_ACCESS.ELECTRICITY.RURAL Access to electricity (% of rural population) Sustainable Energy for All Access to electricity is the percentage of rur... b'World Bank Global Electrification Database 2...
40 1.3_ACCESS.ELECTRICITY.URBAN Access to electricity (% of urban population) Sustainable Energy for All Access to electricity is the percentage of tot... b'World Bank Global Electrification Database 2...
161 2.1_ACCESS.CFT.TOT Access to Clean Fuels and Technologies for coo... Sustainable Energy for All b''
1152 BAR.NOED.1519.FE.ZS Barro-Lee: Percentage of female population age... Education Statistics Percentage of female population age 15-19 with... b'Robert J. Barro and Jong-Wha Lee: http://www... Education
... ... ... ... ... ... ... ...
24382 per_sionl.overlap_pop_urb Population only receiving All Social Insurance... The Atlas of Social Protection: Indicators of ... Percentage of population only receiving All So... b'ASPIRE' Social Protection & Labor
24383 per_sionl.overlap_q1_preT_tot Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... Percentage of population only receiving All So... b'ASPIRE' Social Protection & Labor
24384 per_sionl.overlap_q1_rur Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... Percentage of population only receiving All So... b'ASPIRE' Social Protection & Labor
24385 per_sionl.overlap_q1_tot Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... Percentage of population only receiving All So... b'ASPIRE' Social Protection & Labor
24386 per_sionl.overlap_q1_urb Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... Percentage of population only receiving All So... b'ASPIRE' Social Protection & Labor

2289 rows × 7 columns

Lots of variables are available, collected by the WB from multiple sources. The WB website has more information on them and lets you search for the variables you want to focus on. Here let's download the number of males and females in the population by age group, the total population, as well as urban population measures for the year 2020.

In [159]:
femalepop = popvars.loc[popvars.id.apply(lambda x: x.find('SP.POP.')!=-1 and x.endswith('FE'))]
malepop = popvars.loc[popvars.id.apply(lambda x: x.find('SP.POP.')!=-1 and x.endswith('MA'))]
popfields = ['SP.POP.0014.FE.IN', 'SP.POP.1564.FE.IN', 'SP.POP.65UP.FE.IN',
             'SP.POP.0014.MA.IN', 'SP.POP.1564.MA.IN', 'SP.POP.65UP.MA.IN',
             'SP.POP.TOTL.FE.IN', 'SP.POP.TOTL.MA.IN', 'SP.POP.TOTL',
             'EN.URB.MCTY', 'EN.URB.LCTY'] + malepop.id.tolist() + femalepop.id.tolist()
popfields
Out[159]:
['SP.POP.0014.FE.IN',
 'SP.POP.1564.FE.IN',
 'SP.POP.65UP.FE.IN',
 'SP.POP.0014.MA.IN',
 'SP.POP.1564.MA.IN',
 'SP.POP.65UP.MA.IN',
 'SP.POP.TOTL.FE.IN',
 'SP.POP.TOTL.MA.IN',
 'SP.POP.TOTL',
 'EN.URB.MCTY',
 'EN.URB.LCTY',
 'SP.POP.0004.MA',
 'SP.POP.0509.MA',
 'SP.POP.1014.MA',
 'SP.POP.1519.MA',
 'SP.POP.2024.MA',
 'SP.POP.2529.MA',
 'SP.POP.3034.MA',
 'SP.POP.3539.MA',
 'SP.POP.4044.MA',
 'SP.POP.4549.MA',
 'SP.POP.5054.MA',
 'SP.POP.5559.MA',
 'SP.POP.6064.MA',
 'SP.POP.6569.MA',
 'SP.POP.7074.MA',
 'SP.POP.7579.MA',
 'SP.POP.80UP.MA',
 'SP.POP.0004.FE',
 'SP.POP.0509.FE',
 'SP.POP.1014.FE',
 'SP.POP.1519.FE',
 'SP.POP.2024.FE',
 'SP.POP.2529.FE',
 'SP.POP.3034.FE',
 'SP.POP.3539.FE',
 'SP.POP.4044.FE',
 'SP.POP.4549.FE',
 'SP.POP.5054.FE',
 'SP.POP.5559.FE',
 'SP.POP.6064.FE',
 'SP.POP.6569.FE',
 'SP.POP.7074.FE',
 'SP.POP.7579.FE',
 'SP.POP.80UP.FE']
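The selection of series by id in the cell above can also be written with pandas' vectorized string methods instead of apply with a lambda. A small sketch on a hypothetical stand-in for popvars (only the id column, with a handful of ids taken from the list above):

```python
import pandas as pd

# Hypothetical stand-in for the search result; only the 'id' column matters here
toyvars = pd.DataFrame({'id': ['SP.POP.0004.MA', 'SP.POP.0004.FE',
                               'SP.POP.TOTL.FE.IN', 'SP.POP.TOTL',
                               'EN.URB.MCTY']})

# regex=False makes the dots literal characters rather than wildcards
is_pop = toyvars['id'].str.contains('SP.POP.', regex=False)
female = toyvars.loc[is_pop & toyvars['id'].str.endswith('FE')]
male = toyvars.loc[is_pop & toyvars['id'].str.endswith('MA')]

print(female['id'].tolist())
print(male['id'].tolist())
```

Note that ids such as SP.POP.TOTL.FE.IN end in IN, not FE, so they are not picked up by the filter and have to be listed by hand, as is done in popfields.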

Let's also download GDP per capita in PPP at constant 2011 prices, which is the series NY.GDP.PCAP.PP.KD.

In [160]:
wdi = wb.download(indicator=popfields+['NY.GDP.PCAP.PP.KD'], country=wbcountries.iso2c.values, start=2020, end=2020)

wdi
/Users/ozak/anaconda3/envs/EconGrowthUG/lib/python3.9/site-packages/pandas_datareader/wb.py:592: UserWarning: Non-standard ISO country codes: 1A, 1W, 4E, 6F, 6N, 6X, 7E, 8S, A4, A5, A9, B1, B2, B3, B4, B6, B7, B8, C4, C5, C6, C7, C8, C9, D2, D3, D4, D5, D6, D7, EU, F1, F6, JG, M1, M2, N6, OE, R6, S1, S2, S3, S4, T2, T3, T4, T5, T6, T7, V1, V2, V3, V4, XC, XD, XE, XF, XG, XH, XI, XJ, XK, XL, XM, XN, XO, XP, XQ, XT, XU, XY, Z4, Z7, ZB, ZF, ZG, ZH, ZI, ZJ, ZQ, ZT
  warnings.warn(
/var/folders/q1/7qsx8kmj439d81kr4f_k_wbr0000gp/T/ipykernel_29873/2732027417.py:1: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_numeric without passing `errors` and catch exceptions explicitly instead
  wdi = wb.download(indicator=popfields+['NY.GDP.PCAP.PP.KD'], country=wbcountries.iso2c.values, start=2020, end=2020)
Out[160]:
SP.POP.0014.FE.IN SP.POP.1564.FE.IN SP.POP.65UP.FE.IN SP.POP.0014.MA.IN SP.POP.1564.MA.IN SP.POP.65UP.MA.IN SP.POP.TOTL.FE.IN SP.POP.TOTL.MA.IN SP.POP.TOTL EN.URB.MCTY ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
country year
Afghanistan 2020 8332095.0 1.041297e+07 534863.0 8.740826e+06 1.054440e+07 407078.0 1.927993e+07 1.969230e+07 3.897223e+07 4221532.0 ... 768023.0 633335.0 507800.0 397048.0 302270.0 218619.0 160542.0 93547.0 62155.0 1968.341002
Africa Eastern and Southern 2020 141247235.0 1.921395e+08 12503161.0 1.437468e+08 1.862840e+08 9192275.0 3.458899e+08 3.392231e+08 6.851130e+08 NaN ... 15395923.0 12524782.0 10625713.0 8505660.0 6613495.0 5007892.0 3491077.0 2131772.0 1872421.0 3467.484700
Africa Western and Central 2020 99662713.0 1.249775e+08 7237377.0 1.022874e+08 1.257562e+08 6267919.0 2.318776e+08 2.343115e+08 4.661891e+08 NaN ... 10646395.0 8403723.0 6673270.0 5395584.0 4188538.0 3055219.0 2030466.0 1263113.0 888582.0 3960.847898
Albania 2020 228549.0 9.597280e+05 230308.0 2.388870e+05 9.616770e+05 218700.0 1.418585e+06 1.419264e+06 2.837849e+06 NaN ... 80028.0 87422.0 96652.0 101778.0 94385.0 74597.0 58711.0 45242.0 51758.0 13278.434516
Algeria 2020 6514109.0 1.344434e+07 1360314.0 6.802559e+06 1.407596e+07 1254377.0 2.131877e+07 2.213290e+07 4.345167e+07 2767661.0 ... 1489438.0 1255755.0 1067267.0 857017.0 682176.0 532977.0 351505.0 232070.0 243762.0 10844.770764
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
West Bank and Gaza 2020 932281.0 1.384569e+06 91559.0 9.638950e+05 1.357662e+06 73304.0 2.408409e+06 2.394860e+06 4.803269e+06 NaN ... 112345.0 91777.0 76235.0 62183.0 47547.0 35414.0 24746.0 16514.0 14884.0 5402.538773
World 2020 973429809.0 2.503269e+09 410348021.0 1.036306e+09 2.570127e+09 326724638.0 3.887047e+09 3.933158e+09 7.820206e+09 NaN ... 242130383.0 236434353.0 221422176.0 195883003.0 165529683.0 141960156.0 103925353.0 70455922.0 94006592.0 16212.428176
Yemen, Rep. 2020 6332541.0 9.131117e+06 499409.0 6.642065e+06 9.300719e+06 378195.0 1.596307e+07 1.632098e+07 3.228405e+07 2972988.0 ... 707582.0 532845.0 415312.0 317871.0 242974.0 185041.0 146187.0 95079.0 73103.0 NaN
Zambia 2020 4135806.0 5.252019e+06 201278.0 4.131891e+06 5.080577e+06 126145.0 9.589102e+06 9.338613e+06 1.892772e+07 2774133.0 ... 440597.0 337444.0 248462.0 183631.0 130350.0 87662.0 55432.0 32712.0 25472.0 3183.650773
Zimbabwe 2020 3234256.0 4.735472e+06 314719.0 3.214765e+06 3.956124e+06 214331.0 8.284447e+06 7.385220e+06 1.566967e+07 1529920.0 ... 414475.0 302336.0 206909.0 153503.0 146678.0 132292.0 85703.0 50958.0 45766.0 1990.319419

266 rows × 46 columns

It looks like there are lots of missing values, but do not be fooled. This is a strange behavior of wb: since the original sources differ, it does not link the countries correctly. Let's see this by sorting the index.

In [161]:
wdi.sort_index()
Out[161]:
SP.POP.0014.FE.IN SP.POP.1564.FE.IN SP.POP.65UP.FE.IN SP.POP.0014.MA.IN SP.POP.1564.MA.IN SP.POP.65UP.MA.IN SP.POP.TOTL.FE.IN SP.POP.TOTL.MA.IN SP.POP.TOTL EN.URB.MCTY ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
country year
Afghanistan 2020 8332095.0 1.041297e+07 534863.0 8.740826e+06 1.054440e+07 407078.0 1.927993e+07 1.969230e+07 3.897223e+07 4221532.0 ... 768023.0 633335.0 507800.0 397048.0 302270.0 218619.0 160542.0 93547.0 62155.0 1968.341002
Africa Eastern and Southern 2020 141247235.0 1.921395e+08 12503161.0 1.437468e+08 1.862840e+08 9192275.0 3.458899e+08 3.392231e+08 6.851130e+08 NaN ... 15395923.0 12524782.0 10625713.0 8505660.0 6613495.0 5007892.0 3491077.0 2131772.0 1872421.0 3467.484700
Africa Western and Central 2020 99662713.0 1.249775e+08 7237377.0 1.022874e+08 1.257562e+08 6267919.0 2.318776e+08 2.343115e+08 4.661891e+08 NaN ... 10646395.0 8403723.0 6673270.0 5395584.0 4188538.0 3055219.0 2030466.0 1263113.0 888582.0 3960.847898
Albania 2020 228549.0 9.597280e+05 230308.0 2.388870e+05 9.616770e+05 218700.0 1.418585e+06 1.419264e+06 2.837849e+06 NaN ... 80028.0 87422.0 96652.0 101778.0 94385.0 74597.0 58711.0 45242.0 51758.0 13278.434516
Algeria 2020 6514109.0 1.344434e+07 1360314.0 6.802559e+06 1.407596e+07 1254377.0 2.131877e+07 2.213290e+07 4.345167e+07 2767661.0 ... 1489438.0 1255755.0 1067267.0 857017.0 682176.0 532977.0 351505.0 232070.0 243762.0 10844.770764
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
West Bank and Gaza 2020 932281.0 1.384569e+06 91559.0 9.638950e+05 1.357662e+06 73304.0 2.408409e+06 2.394860e+06 4.803269e+06 NaN ... 112345.0 91777.0 76235.0 62183.0 47547.0 35414.0 24746.0 16514.0 14884.0 5402.538773
World 2020 973429809.0 2.503269e+09 410348021.0 1.036306e+09 2.570127e+09 326724638.0 3.887047e+09 3.933158e+09 7.820206e+09 NaN ... 242130383.0 236434353.0 221422176.0 195883003.0 165529683.0 141960156.0 103925353.0 70455922.0 94006592.0 16212.428176
Yemen, Rep. 2020 6332541.0 9.131117e+06 499409.0 6.642065e+06 9.300719e+06 378195.0 1.596307e+07 1.632098e+07 3.228405e+07 2972988.0 ... 707582.0 532845.0 415312.0 317871.0 242974.0 185041.0 146187.0 95079.0 73103.0 NaN
Zambia 2020 4135806.0 5.252019e+06 201278.0 4.131891e+06 5.080577e+06 126145.0 9.589102e+06 9.338613e+06 1.892772e+07 2774133.0 ... 440597.0 337444.0 248462.0 183631.0 130350.0 87662.0 55432.0 32712.0 25472.0 3183.650773
Zimbabwe 2020 3234256.0 4.735472e+06 314719.0 3.214765e+06 3.956124e+06 214331.0 8.284447e+06 7.385220e+06 1.566967e+07 1529920.0 ... 414475.0 302336.0 206909.0 153503.0 146678.0 132292.0 85703.0 50958.0 45766.0 1990.319419

266 rows × 46 columns

Let's aggregate by country and year so that we have the correct data.

In [162]:
wdi = wdi.groupby(['country', 'year']).max()
wdi.reset_index(inplace=True)
wdi
Out[162]:
country year SP.POP.0014.FE.IN SP.POP.1564.FE.IN SP.POP.65UP.FE.IN SP.POP.0014.MA.IN SP.POP.1564.MA.IN SP.POP.65UP.MA.IN SP.POP.TOTL.FE.IN SP.POP.TOTL.MA.IN ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
0 Afghanistan 2020 8332095.0 1.041297e+07 534863.0 8.740826e+06 1.054440e+07 407078.0 1.927993e+07 1.969230e+07 ... 768023.0 633335.0 507800.0 397048.0 302270.0 218619.0 160542.0 93547.0 62155.0 1968.341002
1 Africa Eastern and Southern 2020 141247235.0 1.921395e+08 12503161.0 1.437468e+08 1.862840e+08 9192275.0 3.458899e+08 3.392231e+08 ... 15395923.0 12524782.0 10625713.0 8505660.0 6613495.0 5007892.0 3491077.0 2131772.0 1872421.0 3467.484700
2 Africa Western and Central 2020 99662713.0 1.249775e+08 7237377.0 1.022874e+08 1.257562e+08 6267919.0 2.318776e+08 2.343115e+08 ... 10646395.0 8403723.0 6673270.0 5395584.0 4188538.0 3055219.0 2030466.0 1263113.0 888582.0 3960.847898
3 Albania 2020 228549.0 9.597280e+05 230308.0 2.388870e+05 9.616770e+05 218700.0 1.418585e+06 1.419264e+06 ... 80028.0 87422.0 96652.0 101778.0 94385.0 74597.0 58711.0 45242.0 51758.0 13278.434516
4 Algeria 2020 6514109.0 1.344434e+07 1360314.0 6.802559e+06 1.407596e+07 1254377.0 2.131877e+07 2.213290e+07 ... 1489438.0 1255755.0 1067267.0 857017.0 682176.0 532977.0 351505.0 232070.0 243762.0 10844.770764
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
261 West Bank and Gaza 2020 932281.0 1.384569e+06 91559.0 9.638950e+05 1.357662e+06 73304.0 2.408409e+06 2.394860e+06 ... 112345.0 91777.0 76235.0 62183.0 47547.0 35414.0 24746.0 16514.0 14884.0 5402.538773
262 World 2020 973429809.0 2.503269e+09 410348021.0 1.036306e+09 2.570127e+09 326724638.0 3.887047e+09 3.933158e+09 ... 242130383.0 236434353.0 221422176.0 195883003.0 165529683.0 141960156.0 103925353.0 70455922.0 94006592.0 16212.428176
263 Yemen, Rep. 2020 6332541.0 9.131117e+06 499409.0 6.642065e+06 9.300719e+06 378195.0 1.596307e+07 1.632098e+07 ... 707582.0 532845.0 415312.0 317871.0 242974.0 185041.0 146187.0 95079.0 73103.0 NaN
264 Zambia 2020 4135806.0 5.252019e+06 201278.0 4.131891e+06 5.080577e+06 126145.0 9.589102e+06 9.338613e+06 ... 440597.0 337444.0 248462.0 183631.0 130350.0 87662.0 55432.0 32712.0 25472.0 3183.650773
265 Zimbabwe 2020 3234256.0 4.735472e+06 314719.0 3.214765e+06 3.956124e+06 214331.0 8.284447e+06 7.385220e+06 ... 414475.0 302336.0 206909.0 153503.0 146678.0 132292.0 85703.0 50958.0 45766.0 1990.319419

266 rows × 48 columns
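To see what this aggregation does, consider a case where the same country-year appears in two rows, each holding a different subset of the indicators. Taking the maximum within each group skips missing values, so the two partial rows collapse into one complete row. The data below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical download in which each country-year shows up twice,
# once per source, with the other source's columns missing
idx = pd.MultiIndex.from_tuples(
    [('Country A', '2020'), ('Country A', '2020'),
     ('Country B', '2020'), ('Country B', '2020')],
    names=['country', 'year'])
toy = pd.DataFrame({'pop': [100.0, np.nan, 250.0, np.nan],
                    'gdppc': [np.nan, 5000.0, np.nan, 12000.0]}, index=idx)

# max() ignores NaN within each (country, year) group
collapsed = toy.groupby(['country', 'year']).max().reset_index()
print(collapsed)
```

If a group held two different non-missing values for the same column, max() would silently keep the larger one, so it is worth checking for such conflicts before relying on this step.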

Let's merge this data with the original wbcountries dataframe, so that we can use it to plot.

In [163]:
wdi = wbcountries.merge(wdi, left_on='name', right_on='country')
wdi
Out[163]:
iso3c iso2c name region adminregion incomeLevel lendingType capitalCity longitude latitude ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
0 ABW AW Aruba Latin America & Caribbean High income Not classified Oranjestad -70.0167 12.51670 ... 3997.0 4172.0 4566.0 4728.0 4401.0 3461.0 2476.0 1587.0 1593.0 29236.047264
1 AFE ZH Africa Eastern and Southern Aggregates Aggregates Aggregates NaN NaN ... 15395923.0 12524782.0 10625713.0 8505660.0 6613495.0 5007892.0 3491077.0 2131772.0 1872421.0 3467.484700
2 AFG AF Afghanistan South Asia South Asia Low income IDA Kabul 69.1761 34.52280 ... 768023.0 633335.0 507800.0 397048.0 302270.0 218619.0 160542.0 93547.0 62155.0 1968.341002
3 AFW ZI Africa Western and Central Aggregates Aggregates Aggregates NaN NaN ... 10646395.0 8403723.0 6673270.0 5395584.0 4188538.0 3055219.0 2030466.0 1263113.0 888582.0 3960.847898
4 AGO AO Angola Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income IBRD Luanda 13.2420 -8.81155 ... 726370.0 587548.0 490029.0 391684.0 291906.0 202270.0 134874.0 88053.0 67534.0 6029.691895
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
261 XKX XK Kosovo Europe & Central Asia Europe & Central Asia (excluding high income) Upper middle income IDA Pristina 20.9260 42.56500 ... 61495.0 60089.0 54038.0 46897.0 38522.0 31598.0 24589.0 18694.0 22073.0 10706.510925
262 YEM YE Yemen, Rep. Middle East & North Africa Middle East & North Africa (excluding high inc... Low income IDA Sana'a 44.2075 15.35200 ... 707582.0 532845.0 415312.0 317871.0 242974.0 185041.0 146187.0 95079.0 73103.0 NaN
263 ZAF ZA South Africa Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Upper middle income IBRD Pretoria 28.1871 -25.74600 ... 1619533.0 1501805.0 1698176.0 1321598.0 969617.0 828009.0 642663.0 408430.0 446059.0 12866.568986
264 ZMB ZM Zambia Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income IDA Lusaka 28.2937 -15.39820 ... 440597.0 337444.0 248462.0 183631.0 130350.0 87662.0 55432.0 32712.0 25472.0 3183.650773
265 ZWE ZW Zimbabwe Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income Blend Harare 31.0672 -17.83120 ... 414475.0 302336.0 206909.0 153503.0 146678.0 132292.0 85703.0 50958.0 45766.0 1990.319419

266 rows × 58 columns
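Merging on country names only works when both dataframes spell the names identically; rows without a match are silently dropped by the default inner join. One way to check, sketched here on made-up data, is to merge with how='outer' and indicator=True and then inspect the rows that did not match on both sides:

```python
import pandas as pd

# Hypothetical country table and data table with one spelling mismatch
countries = pd.DataFrame({'iso3c': ['AAA', 'BBB', 'CCC'],
                          'name': ['Country A', 'Country B', 'Country C']})
values = pd.DataFrame({'country': ['Country A', 'Country B', 'Country  C'],
                       'pop': [1.0, 2.0, 3.0]})

check = countries.merge(values, left_on='name', right_on='country',
                        how='outer', indicator=True)

# Rows found in only one of the two dataframes
unmatched = check.loc[check['_merge'] != 'both']
print(unmatched[['iso3c', 'name', 'country', '_merge']])
```

In the merge above, wbcountries has 296 rows and the merged dataframe has 266, so the same check would show which entries of wbcountries have no data.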

Plot Male vs Female population in each country in 2020¶

In [164]:
PersistencePlot(wdi, var0='SP.POP.TOTL.FE.IN', var1='SP.POP.TOTL.MA.IN', xlabel='Number of Females',
                ylabel='Number of Males', labelvar='iso3c', linelabel='Female-Male', 
                dx=0.1, dy=0.1, filename='Female-Male-2017.pdf')

Let's take $\log$s so that we can see this better.

In [165]:
wdi['lpop_fe'] = np.log(wdi['SP.POP.TOTL.FE.IN'])
wdi['lpop_ma'] = np.log(wdi['SP.POP.TOTL.MA.IN'])
PersistencePlot(wdi, var0='lpop_fe', var1='lpop_ma', xlabel='Log[Number of Females]',
                ylabel='Log[Number of Males]', labelvar='iso3c', linelabel='Female-Male', 
                dx=0.01, dy=0.01, filename='Female-Male-2020.pdf')

It seems that the gender ratio, i.e., the number of males per female, is quite different from 1 in some countries. Let's plot the histogram of the gender ratio across countries to see this better.

In [166]:
(np.exp(wdi['lpop_ma'] - wdi['lpop_fe'])).hist()
Out[166]:
<Axes: >
In [167]:
wdi['gender_ratio'] = (wdi['SP.POP.TOTL.MA.IN'] / wdi['SP.POP.TOTL.FE.IN'])
wdi.gender_ratio.hist()
Out[167]:
<Axes: >
No description has been provided for this image
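The two histograms above are the same because $\exp(\log M - \log F) = M/F$: exponentiating the difference in logs recovers the ratio of males to females. A quick numerical check on made-up population counts:

```python
import numpy as np
import pandas as pd

# Hypothetical male and female population counts
toy = pd.DataFrame({'male': [1200.0, 980.0, 50500.0],
                    'female': [1000.0, 1020.0, 50000.0]})

ratio_direct = toy['male'] / toy['female']
ratio_logs = np.exp(np.log(toy['male']) - np.log(toy['female']))

# Both routes give the same gender ratio up to floating-point error
print(np.allclose(ratio_direct, ratio_logs))
```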
In [168]:
print('Maximum gender ratio = ', wdi.gender_ratio.max())
wdi.loc[wdi.gender_ratio>=1.05][['iso3c', 'name', 'region', 'gender_ratio']].sort_values('gender_ratio', ascending=False)
Maximum gender ratio =  2.656168128488438
Out[168]:
iso3c name region gender_ratio
200 QAT Qatar Middle East & North Africa 2.656168
8 ARE United Arab Emirates Middle East & North Africa 2.302827
22 BHR Bahrain Middle East & North Africa 1.652362
182 OMN Oman Middle East & North Africa 1.637503
127 KWT Kuwait Middle East & North Africa 1.581085
205 SAU Saudi Arabia Middle East & North Africa 1.382297
152 MDV Maldives South Asia 1.367235
164 MNP Northern Mariana Islands East Asia & Pacific 1.160089
225 SXM Sint Maarten (Dutch part) Latin America & Caribbean 1.144341
32 BTN Bhutan South Asia 1.127538
88 GNQ Equatorial Guinea Sub-Saharan Africa 1.122699
91 GRL Greenland Europe & Central Asia 1.114567
226 SYC Seychelles Sub-Saharan Africa 1.113009
183 OSS Other small states Aggregates 1.107204
208 SGP Singapore East Asia & Pacific 1.095805
188 PLW Palau East Asia & Pacific 1.090983
118 JOR Jordan Middle East & North Africa 1.078515
218 SST Small states Aggregates 1.077384
159 MLT Malta Middle East & North Africa 1.074322
7 ARB Arab World Aggregates 1.073889
153 MEA Middle East & North Africa Aggregates 1.073621
31 BRN Brunei Darussalam East Asia & Pacific 1.073359
78 FRO Faroe Islands Europe & Central Asia 1.072968
189 PNG Papua New Guinea East Asia & Pacific 1.069911
109 IND India South Asia 1.067528
245 TUV Tuvalu East Asia & Pacific 1.062418
114 ISL Iceland Europe & Central Asia 1.053227
In [169]:
print('Minimum gender ratio = ', wdi.gender_ratio.min())
wdi.loc[wdi.gender_ratio<=0.95][['iso3c', 'name', 'region', 'gender_ratio']].sort_values('gender_ratio')
Minimum gender ratio =  0.8217092819478253
Out[169]:
iso3c name region gender_ratio
10 ARM Armenia Europe & Central Asia 0.821709
25 BLR Belarus Europe & Central Asia 0.856668
96 HKG Hong Kong SAR, China East Asia & Pacific 0.857237
145 LVA Latvia Europe & Central Asia 0.859961
248 UKR Ukraine Europe & Central Asia 0.862351
202 RUS Russian Federation Europe & Central Asia 0.867508
143 LTU Lithuania Europe & Central Asia 0.881346
256 VIR Virgin Islands (U.S.) Latin America & Caribbean 0.883757
146 MAC Macao SAR, China East Asia & Pacific 0.885751
82 GEO Georgia Europe & Central Asia 0.888188
0 ABW Aruba Latin America & Caribbean 0.890692
265 ZWE Zimbabwe Sub-Saharan Africa 0.891456
194 PRT Portugal Europe & Central Asia 0.893635
192 PRI Puerto Rico Latin America & Caribbean 0.897965
71 EST Estonia Europe & Central Asia 0.900282
150 MDA Moldova Europe & Central Asia 0.904965
147 MAF St. Martin (French part) Latin America & Caribbean 0.906360
211 SLV El Salvador Latin America & Caribbean 0.909608
178 NPL Nepal South Asia 0.909701
12 ATG Antigua and Barbuda Latin America & Caribbean 0.913439
23 BHS Bahamas, The Latin America & Caribbean 0.917633
51 CUW Curacao Latin America & Caribbean 0.918326
101 HUN Hungary Europe & Central Asia 0.920135
30 BRB Barbados Latin America & Caribbean 0.920384
214 SRB Serbia Europe & Central Asia 0.920797
64 ECA Europe & Central Asia (excluding high income) Aggregates 0.922480
231 TEC Europe & Central Asia (IDA & IBRD countries) Aggregates 0.924504
120 KAZ Kazakhstan Europe & Central Asia 0.925032
138 LKA Sri Lanka South Asia 0.930419
255 VGB British Virgin Islands Latin America & Caribbean 0.930668
171 NAM Namibia Sub-Saharan Africa 0.933373
27 BMU Bermuda North America 0.934920
77 FRA France Europe & Central Asia 0.936113
190 POL Poland Europe & Central Asia 0.937093
201 ROU Romania Europe & Central Asia 0.937193
36 CEB Central Europe and the Baltics Aggregates 0.938474
250 URY Uruguay Latin America & Caribbean 0.938519
125 KNA St. Kitts and Nevis Latin America & Caribbean 0.939031
21 BGR Bulgaria Europe & Central Asia 0.941952
130 LBN Lebanon Middle East & North Africa 0.942925
65 ECS Europe & Central Asia Aggregates 0.943038
263 ZAF South Africa Sub-Saharan Africa 0.945721
168 MWI Malawi Sub-Saharan Africa 0.945820
119 JPN Japan East Asia & Pacific 0.945908
124 KIR Kiribati East Asia & Pacific 0.946543
233 THA Thailand East Asia & Pacific 0.947237
99 HRV Croatia Europe & Central Asia 0.947908
212 SMR San Marino Europe & Central Asia 0.949496

Gender ratio and development¶

In [170]:
wdi['lgdppc'] = np.log(wdi['NY.GDP.PCAP.PP.KD'])
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.scatterplot(x='lgdppc', y='gender_ratio', hue='region',
                hue_order=['East Asia & Pacific', 'Europe & Central Asia',
                           'Latin America & Caribbean ', 'Middle East & North Africa',
                           'North America', 'South Asia', 'Sub-Saharan Africa '],
                data=wdi.loc[wdi.region!='Aggregates'], alpha=1, style='incomeLevel', 
                style_order=['High income', 'Upper middle income', 'Lower middle income', 'Low income'],
                )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Log[GDP per capita]')
ax.set_ylabel('Gender Ratio')
plt.savefig(pathgraphs + 'Gender-Ratio-GDPpc.pdf', dpi=300, bbox_inches='tight')
In [171]:
fig
Out[171]:

Use statistical and mathematical functions to analyze the data¶

Now let's import the statsmodels module to run regressions.

In [172]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
from IPython.display import Latex

Let's estimate the elasticity of the number of men with respect to the number of women.

In [184]:
mod = sm.OLS(wdi['lpop_ma'],sm.add_constant(wdi['lpop_fe']), missing='drop').fit()
mod.summary2()
Out[184]:
Model: OLS Adj. R-squared: 0.999
Dependent Variable: lpop_ma AIC: -408.8997
Date: 2024-02-20 12:57 BIC: -401.7403
No. Observations: 265 Log-Likelihood: 206.45
Df Model: 1 F-statistic: 2.004e+05
Df Residuals: 263 Prob (F-statistic): 0.00
R-squared: 0.999 Scale: 0.012420
Coef. Std.Err. t P>|t| [0.025 0.975]
const 0.0343 0.0352 0.9752 0.3304 -0.0350 0.1036
lpop_fe 0.9979 0.0022 447.6682 0.0000 0.9935 1.0023
Omnibus: 309.703 Durbin-Watson: 1.929
Prob(Omnibus): 0.000 Jarque-Bera (JB): 15080.148
Skew: 5.061 Prob(JB): 0.000
Kurtosis: 38.543 Condition No.: 81

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [185]:
print('The elasticity is %8.4f' % mod.params.iloc[1])
print(r'The $R^2$ is %8.3f' % mod.rsquared)
The elasticity is   0.9979
The $R^2$ is    0.999
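The slope of a regression of $\log y$ on $\log x$ is an elasticity: if $y = A x^{\beta}$, then $\log y = \log A + \beta \log x$, so a 1% increase in $x$ is associated with a $\beta$% change in $y$. The sketch below checks this on synthetic data, using np.polyfit rather than statsmodels to keep the example short; the numbers are made up:

```python
import numpy as np

# Synthetic data generated exactly from y = A * x**beta (no noise)
A, beta = 2.0, 0.75
x = np.array([1.0, 2.0, 5.0, 10.0, 50.0])
y = A * x**beta

# Fit log(y) = intercept + slope * log(x) by least squares
slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
print('Estimated elasticity: %6.4f' % slope)
print('Estimated log(A):     %6.4f' % intercept)
```

In the regression above the estimated elasticity is 0.9979, with a confidence interval that includes 1, i.e., the male population is almost exactly proportional to the female population across countries.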

Let's instead use the smf module, which allows us to run the regression by writing a formula instead of passing the arrays and adding the constant as a new variable ourselves. Let's run a simple regression of $\log(GDPpc)$ on the gender ratio.

In [186]:
mod = smf.ols(formula='lgdppc ~ gender_ratio', data=wdi[['lpop_ma','lpop_fe', 'lgdppc', 'gender_ratio']], missing='drop').fit()
mod.summary2()
Out[186]:
Model: OLS Adj. R-squared: 0.020
Dependent Variable: lgdppc AIC: 730.1957
Date: 2024-02-20 12:57 BIC: 737.1736
No. Observations: 242 Log-Likelihood: -363.10
Df Model: 1 F-statistic: 6.005
Df Residuals: 240 Prob (F-statistic): 0.0150
R-squared: 0.024 Scale: 1.1868
Coef. Std.Err. t P>|t| [0.025 0.975]
Intercept 8.3018 0.4385 18.9309 0.0000 7.4380 9.1657
gender_ratio 1.0493 0.4282 2.4504 0.0150 0.2058 1.8928
Omnibus: 14.043 Durbin-Watson: 1.805
Prob(Omnibus): 0.001 Jarque-Bera (JB): 7.647
Skew: -0.251 Prob(JB): 0.022
Kurtosis: 2.288 Condition No.: 12

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [187]:
mysummary=mod.summary2()
Latex(mysummary.as_latex())
Out[187]:
\begin{table}
\caption{Results: Ordinary least squares}
\label{}
\begin{center}
\begin{tabular}{llll}
\hline
Model: & OLS & Adj. R-squared: & 0.020 \\
Dependent Variable: & lgdppc & AIC: & 730.1957 \\
Date: & 2024-02-20 12:57 & BIC: & 737.1736 \\
No. Observations: & 242 & Log-Likelihood: & -363.10 \\
Df Model: & 1 & F-statistic: & 6.005 \\
Df Residuals: & 240 & Prob (F-statistic): & 0.0150 \\
R-squared: & 0.024 & Scale: & 1.1868 \\
\hline
\end{tabular}
\end{center}
\begin{center}
\begin{tabular}{lrrrrrr}
\hline
 & Coef. & Std.Err. & t & P$> |$t$|$ & [0.025 & 0.975] \\
\hline
Intercept & 8.3018 & 0.4385 & 18.9309 & 0.0000 & 7.4380 & 9.1657 \\
gender\_ratio & 1.0493 & 0.4282 & 2.4504 & 0.0150 & 0.2058 & 1.8928 \\
\hline
\end{tabular}
\end{center}
\begin{center}
\begin{tabular}{llll}
\hline
Omnibus: & 14.043 & Durbin-Watson: & 1.805 \\
Prob(Omnibus): & 0.001 & Jarque-Bera (JB): & 7.647 \\
Skew: & -0.251 & Prob(JB): & 0.022 \\
Kurtosis: & 2.288 & Condition No.: & 12 \\
\hline
\end{tabular}
\end{center}
\end{table}
\bigskip
Notes: \newline
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [188]:
print('The semi-elasticity is %2.4f' % mod.params.iloc[1])
print(r'The $R^2$ is %1.3f' % mod.rsquared)
The semi-elasticity is 1.0493
The $R^2$ is 0.024

But of course we know that correlation is not causation! Moreover, from our figure we know that the positive association is driven by the rich oil-producing countries of the Middle East & North Africa. To see this, let's replicate the analysis without those countries.

In [189]:
# Re-estimate the model excluding the Middle East & North Africa
sample = wdi.loc[wdi.region != 'Middle East & North Africa', ['lpop_ma', 'lpop_fe', 'lgdppc', 'gender_ratio']]
mod = smf.ols(formula='lgdppc ~ gender_ratio', data=sample, missing='drop').fit()
mod.summary2()
Out[189]:
Model:               OLS               Adj. R-squared:      0.016
Dependent Variable:  lgdppc            AIC:                 677.1577
Date:                2024-02-20 12:58  BIC:                 683.9721
No. Observations:    223               Log-Likelihood:      -336.58
Df Model:            1                 F-statistic:         4.499
Df Residuals:        221               Prob (F-statistic):  0.0350
R-squared:           0.020             Scale:               1.2090

              Coef.    Std.Err.  t        P>|t|   [0.025   0.975]
Intercept     12.0873  1.3064    9.2524   0.0000  9.5127   14.6619
gender_ratio  -2.8007  1.3205    -2.1210  0.0350  -5.4030  -0.1984

Omnibus:         9.739   Durbin-Watson:     1.727
Prob(Omnibus):   0.008   Jarque-Bera (JB):  5.901
Skew:            -0.231  Prob(JB):          0.052
Kurtosis:        2.351   Condition No.:     35

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [190]:
print('The semi-elasticity is %2.4f with a p-value of %1.4f' % (mod.params.iloc[1], mod.pvalues.iloc[1]))
print(r'The $R^2$ is %1.3f' % mod.rsquared)
print("Luckily we had plotted the data, right?!")
The semi-elasticity is -2.8007 with a p-value of 0.0350
The $R^2$ is 0.020
Luckily we had plotted the data, right?!
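
The sign flip above can be reproduced on a small, deliberately constructed example. The data below are synthetic (not the WDI data) and the group names are made up. `np.polyfit` stands in for `smf.ols` only to keep the sketch short.

```python
import numpy as np
import pandas as pd

# Synthetic data: a main group with a negative slope plus a small group
# with high x and high y (all values are invented for illustration)
toy = pd.DataFrame({
    'group': ['main'] * 5 + ['outlier'] * 3,
    'x': [0.95, 0.97, 0.99, 1.01, 1.03, 1.80, 2.00, 2.20],
    'y': [9.2, 9.1, 9.0, 8.9, 8.8, 10.8, 11.0, 11.2],
})

def ols_slope(df):
    """Slope of a simple linear regression of y on x."""
    slope, intercept = np.polyfit(df['x'], df['y'], deg=1)
    return slope

slope_all = ols_slope(toy)
slope_main = ols_slope(toy.loc[toy['group'] != 'outlier'])

print('Slope, full sample: %2.4f' % slope_all)
print('Slope, without the outlier group: %2.4f' % slope_main)
```

A handful of observations far from the rest of the sample is enough to reverse the sign of the pooled slope, which is why plotting the data before running the regression pays off.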

Homework¶

Using pandas and statsmodels, write a Jupyter notebook that:

  1. Uses the data from the Maddison Project to plot the evolution of total population across the world.
  2. Plots the evolution of the share of the world population by country and WB region.
  3. Downloads fertility, mortality, and life expectancy data from the WB and plots their evolution over the last 60 years.
  4. Downloads mortality and life expectancy data (across regions and cohorts) from the Human Mortality Database and plots their evolution.
  5. Uses these data to analyze the convergence of life expectancy, mortality, and fertility.

Submit your notebook as a pull request to the course's GitHub repository.

Wages and Population In England 1200-1860¶

Let's get the population and wage series from Greg Clark's website for plotting.

In [191]:
import requests
from io import BytesIO

# File 1
url = 'http://faculty.econ.ucdavis.edu/faculty/gclark/English%20Data/England%20NNI%20-%20Clark%20-%202015.xlsx'
# Disable SSL certificate verification for the request
response = requests.get(url, verify=False)
uk1 = pd.read_excel(BytesIO(response.content), sheet_name='Decadal')

# File 2
url = 'http://faculty.econ.ucdavis.edu/faculty/gclark/English%20Data/Wages%202014.xlsx'
# Disable SSL certificate verification for the request
response = requests.get(url, verify=False)

uk2 = pd.read_excel(BytesIO(response.content), sheet_name='Decadal')
/Users/ozak/anaconda3/envs/EconGrowthUG/lib/python3.9/site-packages/urllib3/connectionpool.py:1061: InsecureRequestWarning: Unverified HTTPS request is being made to host 'faculty.econ.ucdavis.edu'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
/Users/ozak/anaconda3/envs/EconGrowthUG/lib/python3.9/site-packages/urllib3/connectionpool.py:1061: InsecureRequestWarning: Unverified HTTPS request is being made to host 'faculty.econ.ucdavis.edu'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
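
The download step can also be wrapped in a small helper that fails loudly on HTTP errors and does not hang on a slow server. This is only a sketch: `fetch_bytes` is a hypothetical name, and the `timeout=30` and `verify=True` defaults are assumptions. If the server's certificate cannot be verified, pass `verify=False` as the cell above does.

```python
from io import BytesIO

import requests

def fetch_bytes(url, session=None, timeout=30, verify=True):
    """Download `url` and return its body as an in-memory binary buffer.

    `session` may be a requests.Session (or any object with a compatible
    .get method); by default the module-level requests.get is used.
    """
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout, verify=verify)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return BytesIO(response.content)

# Usage (not executed here, since it needs network access):
# uk1 = pd.read_excel(fetch_bytes(url, verify=False), sheet_name='Decadal')
```

Calling `raise_for_status()` before `pd.read_excel` means a broken link surfaces as an HTTP error rather than as a confusing Excel parsing error.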
In [192]:
uk1
Out[192]:
Decade Unnamed: 1 Pop England Share Males farm sector Male Farm Wage Male Non-Farm Wage Male average Wage Male Work Days per Year Total Wage Income Land rents ... All Capital Income Indirect Taxes Net National Income Unnamed: 15 Price Index - Domestic Expenditure Price Index - GDP Price Index - Cost of Living Unnamed: 19 Real Net National Income (DE) Real NNI/N
0 NaN NaN m. NaN d./day d./day d./day NaN (₤ m) (₤ m) ... (₤ m) (₤ m) (₤ m) NaN (1860s=100) (1860s=100) (1860s=100) NaN (1860s=100) (1860s=100)
1 1200.0 NaN 3.395946 0.555168 1.373647 2.282816 2.088783 300.0 3.078466 1.606036 ... 1.741253 0 6.425755 NaN 6.586338 7.126418 6.544197 NaN 14.897218 86.621351
2 1210.0 NaN 3.395946 0.575784 1.269451 1.84928 2.021137 300.0 3.200434 1.606036 ... 1.95638 0 6.76285 NaN 7.494729 8.109296 7.575843 NaN 14.042469 81.651332
3 1220.0 NaN 3.738005 0.626021 1.255379 2.135947 1.947335 300.0 3.394164 1.628947 ... 1.971441 0 6.994552 NaN 8.332736 9.016021 8.535567 NaN 13.143741 69.432007
4 1230.0 NaN 3.903905 0.652303 1.178929 NaN 1.848722 300.0 3.365295 1.331461 ... 2.04084 0 6.737596 NaN 8.265396 8.943159 8.40574 NaN 12.462355 63.034958
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
63 1820.0 NaN 11.982104 0.345313 20.333416 34.53486 34.327787 300.0 191.868124 38.191485 ... 78.77882 29.164632 338.003061 NaN 108.478968 112.086157 110.194354 NaN 48.128176 79.290718
64 1830.0 NaN 13.773176 0.308229 20.042939 35.383693 35.429759 300.0 227.645679 36.557278 ... 93.747986 25.876734 383.827677 NaN 100.892148 102.97158 101.268842 NaN 58.593182 84.035129
65 1840.0 NaN 15.636482 0.264763 21.096252 36.16764 37.016669 300.0 269.976598 39.165564 ... 101.875156 26.184313 437.201631 NaN 96.899076 97.81461 98.799054 NaN 69.558992 87.724658
66 1850.0 NaN 17.589614 0.246630 22.09969 37.840784 39.129929 300.0 321.386522 39.474329 ... 124.452112 28.390429 513.703392 NaN 93.317821 93.166374 95.128327 NaN 84.548996 94.905732
67 1860.0 NaN 19.722236 0.239390 23.625775 43.597919 44.659538 300.0 411.41326 43.176349 ... 168.819083 30.282961 653.691653 NaN 99.949265 99.955451 99.996226 NaN 100.343409 100.349161

68 rows × 22 columns

In [193]:
uk2
Out[193]:
Decade Farm Laborers, d/day Coal Miners, d./day Building Laborers, d/day Building Craftsmen, d/day Unnamed: 5 Cost of Living (1860s=100) Unnamed: 7 Real Farm Wage (1860s=100) Real Building Laborer Wage (1860s=100) Real Building Craftsman Wage (1860s=100)
0 1200 1.373647 NaN NaN 2.783922 NaN 6.544197 NaN 88.841573 NaN 80.673336
1 1210 1.262561 NaN NaN 2.078984 NaN 7.575843 NaN 72.045676 NaN 52.335306
2 1220 1.249455 NaN 1.625946 2.602945 NaN 8.535567 NaN 60.578574 51.791535 56.307104
3 1230 1.178929 NaN NaN NaN NaN 8.405740 NaN 59.258095 NaN NaN
4 1240 1.246828 NaN 1.878412 2.893921 NaN 8.871055 NaN 61.132054 58.464596 62.484216
... ... ... ... ... ... ... ... ... ... ... ...
62 1820 20.333416 32.226677 27.009300 42.060419 NaN 110.194354 NaN 78.081590 71.212912 72.500372
63 1830 20.042939 32.680000 28.021165 42.746221 NaN 101.268842 NaN 83.892814 80.390114 80.295861
64 1840 21.096252 30.920000 29.023687 43.311592 NaN 98.771980 NaN 90.604982 85.635493 83.439177
65 1850 22.099690 36.680000 30.103970 45.577598 NaN 95.128327 NaN 98.270928 92.231871 91.251668
66 1860 23.625775 41.760000 34.466257 52.729581 NaN 99.996226 NaN 100.013083 100.110361 100.049356

67 rows × 11 columns

Let's clean the data and merge it into a single dataframe.

In [194]:
# Drop the units row (index 0) and the empty 'Unnamed' spacer columns
uk1 = uk1.loc[uk1.index.difference([0])].reset_index(drop=True)[[col for col in uk1.columns if col.find('Unnamed')==-1]]
uk2 = uk2[[col for col in uk2.columns if col.find('Unnamed')==-1]]
# Merge on the columns the two tables share, then fix the dtypes
uk = uk1.merge(uk2)
uk.Decade = uk.Decade.astype(int)
uk['Pop England'] = uk['Pop England'].astype(float)
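
The same cleaning steps can be tried on a toy example. The frames below are invented stand-ins for the two Excel sheets (made-up values, only three columns each), and `drop_unnamed` is a hypothetical helper. Passing `on='Decade'` and `validate='one_to_one'` to `merge` is an optional, more defensive variant of the merge above; it assumes `Decade` is the intended key.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the raw sheets: a units row, an empty spacer column
raw1 = pd.DataFrame({
    'Decade': [np.nan, 1200.0, 1210.0],
    'Unnamed: 1': [np.nan, np.nan, np.nan],
    'Pop England': ['m.', 1.0, 2.0],
})
raw2 = pd.DataFrame({
    'Decade': [1200, 1210],
    'Unnamed: 5': [np.nan, np.nan],
    'Real Farm Wage (1860s=100)': [10.0, 20.0],
})

def drop_unnamed(df):
    """Keep only the columns whose name does not start with 'Unnamed'."""
    return df.loc[:, ~df.columns.str.startswith('Unnamed')]

# Drop the units row, then the spacer columns, then fix the dtypes
clean1 = drop_unnamed(raw1.iloc[1:].reset_index(drop=True)).copy()
clean1['Decade'] = clean1['Decade'].astype(int)
clean1['Pop England'] = clean1['Pop England'].astype(float)
clean2 = drop_unnamed(raw2)

# Explicit key; validate raises an error if a decade appears twice on either side
merged = clean1.merge(clean2, on='Decade', how='inner', validate='one_to_one')
print(merged)
```

Naming the key makes the merge robust to the two sheets later sharing another column name, and `validate` catches accidental duplicate decades.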
In [195]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='Decade', y='Pop England', data=uk.loc[uk.Decade<1730], alpha=1, label='Population', color='r')
ax2 = ax.twinx()
sns.lineplot(x='Decade', y='Real Farm Wage (1860s=100)', data=uk.loc[uk.Decade<1730], alpha=1, label='Real Wages', color='b')
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
handles, labels = ax.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax.legend(handles=(handles+handles2), labels=(labels+labels2), loc='upper left')
ax2.legend(handles=(handles+handles2), labels=(labels+labels2), loc='upper left')
nticks = 7
ax.yaxis.set_major_locator(matplotlib.ticker.LinearLocator(nticks))
ax2.yaxis.set_major_locator(matplotlib.ticker.LinearLocator(nticks))
ax.set_xlabel('Year')
ax.set_ylabel('Population (millions)')
plt.savefig(pathgraphs + 'UK-pop-GDPpc-1200-1730.pdf', dpi=300, bbox_inches='tight')
[Figure: England's population (millions, left axis) and real farm wage (1860s=100, right axis) by decade, for decades before 1730]
In [196]:
fig
Out[196]:
[Figure: the same population and real-wage plot, redisplayed from `fig`]
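
A two-axis plot like this one can also be built by assigning each series to its axis explicitly and drawing a single combined legend. The sketch below uses plain matplotlib and invented toy values (not Clark's data); the column names and axis labels are illustrative assumptions.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy series (invented values, for illustration only)
toy = pd.DataFrame({
    'Decade': [1200, 1210, 1220, 1230],
    'Population': [3.0, 3.2, 3.5, 3.7],
    'Real wage': [90.0, 85.0, 75.0, 70.0],
})

fig, ax = plt.subplots()
ax2 = ax.twinx()  # second y-axis sharing the same x-axis

# Plot each series on an explicitly chosen axis
line1, = ax.plot(toy['Decade'], toy['Population'], color='r', label='Population')
line2, = ax2.plot(toy['Decade'], toy['Real wage'], color='b', label='Real Wages')

ax.set_xlabel('Year')
ax.set_ylabel('Population (millions)')
ax2.set_ylabel('Real wage (1860s=100)')

# One legend that collects the handles from both axes
ax.legend(handles=[line1, line2], loc='upper left')
```

Keeping the line handles returned by `plot` avoids having to collect them afterwards with `get_legend_handles_labels` on each axis.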