GIS with and ¶
Getting some data¶
There are many sources of GIS data. Here are some useful links:
- WorldMap
- FAO's GeoNetwork
- IPUMS USA Boundary files for Censuses
- IPUMS International Boundary files for Censuses
- GADM database of Global Administrative Areas
- Global Administrative Unit Layers
- Natural Earth: All kinds of geographical, cultural and socioeconomic variables
- Global Map
- Digital Chart of the World
- Sage and Sage Atlas
- Caloric Suitability Index CSI: Agricultural suitability data
- Ramankutti's Datasets on land use, crops, etc.
- SEDAC at Columbia Univesrity: Gridded Population, Hazzards, etc.
- World Port Index
- USGS elevation maps
- NOOA's Global Land One-km Base Elevation Project (GLOBE)
- NOOA Nightlight data: This is the data used by Henderson, Storeygard, and Weil AER 2012 paper.
- Other NOOA Data
- GEcon
- OpenStreetMap
- U.S. Census TIGER
- Geo-referencing of Ethnic Groups
See also Wikipedia links
Set-up¶
Let's import the packages we will use and set the paths for outputs.
# Let's import pandas and some other basic packages we will use
from __future__ import division
%pylab --no-import-all
%matplotlib inline
import pandas as pd
import numpy as np
import os, sys
Using matplotlib backend: <object object at 0x1164f0770> %pylab is deprecated, use %matplotlib inline and import the required libraries. Populating the interactive namespace from numpy and matplotlib
# GIS packages
import geopandas as gpd
from geopandas.tools import overlay
from shapely.geometry import Polygon, Point
import georasters as gr
# Alias for Geopandas
gp = gpd
# Plotting
import matplotlib as mpl
import seaborn as sns
# Setup seaborn
sns.set()
# Paths
pathout = './data/'
if not os.path.exists(pathout):
os.mkdir(pathout)
pathgraphs = './graphs/'
if not os.path.exists(pathgraphs):
os.mkdir(pathgraphs)
Initial Example -- Natural Earth Country Shapefile¶
Let's download a shapefile with all the polygons for countries so we can visualize and analyze some of the data we have downloaded in other notebooks. Natural Earth provides lots of free data so let's use that one.
For shapefiles and other polygon type data geopandas
is the most useful package. geopandas
is to GIS what pandas
is to other data. Since gepandas
extends the functionality of pandas
to a GIS dataset, all the nice functions and properties of pandas
are also available in geopandas
. Of course, geopandas
includes functions and properties unique to GIS data.
Next we will use it to download the shapefile (which is contained in a zip archive). geopandas
extends pandas
for use with GIS data. We can use many functions and properties of the GeoDataFrame
to analyze our data.
import requests
import io
#headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}
url = 'https://naturalearth.s3.amazonaws.com/10m_cultural/ne_10m_admin_0_countries.zip'
r = requests.get(url, headers=headers)
countries = gp.read_file(io.BytesIO(r.content))
#countries = gpd.read_file('https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip')
Let's look inside this GeoDataFrame
countries.head(10)
featurecla | scalerank | LABELRANK | SOVEREIGNT | SOV_A3 | ADM0_DIF | LEVEL | TYPE | TLC | ADMIN | ... | FCLASS_TR | FCLASS_ID | FCLASS_PL | FCLASS_GR | FCLASS_IT | FCLASS_NL | FCLASS_SE | FCLASS_BD | FCLASS_UA | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Admin-0 country | 0 | 2 | Indonesia | IDN | 0 | 2 | Sovereign country | 1 | Indonesia | ... | None | None | None | None | None | None | None | None | None | MULTIPOLYGON (((117.70361 4.16341, 117.70361 4... |
1 | Admin-0 country | 0 | 3 | Malaysia | MYS | 0 | 2 | Sovereign country | 1 | Malaysia | ... | None | None | None | None | None | None | None | None | None | MULTIPOLYGON (((117.70361 4.16341, 117.69711 4... |
2 | Admin-0 country | 0 | 2 | Chile | CHL | 0 | 2 | Sovereign country | 1 | Chile | ... | None | None | None | None | None | None | None | None | None | MULTIPOLYGON (((-69.51009 -17.50659, -69.50611... |
3 | Admin-0 country | 0 | 3 | Bolivia | BOL | 0 | 2 | Sovereign country | 1 | Bolivia | ... | None | None | None | None | None | None | None | None | None | POLYGON ((-69.51009 -17.50659, -69.51009 -17.5... |
4 | Admin-0 country | 0 | 2 | Peru | PER | 0 | 2 | Sovereign country | 1 | Peru | ... | None | None | None | None | None | None | None | None | None | MULTIPOLYGON (((-69.51009 -17.50659, -69.63832... |
5 | Admin-0 country | 0 | 2 | Argentina | ARG | 0 | 2 | Sovereign country | 1 | Argentina | ... | None | None | None | None | None | None | None | None | None | MULTIPOLYGON (((-67.19390 -22.82222, -67.14269... |
6 | Admin-0 country | 3 | 3 | United Kingdom | GB1 | 1 | 2 | Dependency | 1 | Dhekelia Sovereign Base Area | ... | Admin-0 dependency | None | Admin-0 dependency | Admin-0 dependency | Admin-0 dependency | Admin-0 dependency | Admin-0 dependency | None | Admin-0 dependency | POLYGON ((33.78094 34.97635, 33.76043 34.97968... |
7 | Admin-0 country | 1 | 5 | Cyprus | CYP | 0 | 2 | Sovereign country | 1 | Cyprus | ... | None | None | None | None | None | None | None | None | None | MULTIPOLYGON (((33.78183 34.97622, 33.78094 34... |
8 | Admin-0 country | 0 | 2 | India | IND | 0 | 2 | Sovereign country | 1 | India | ... | None | None | None | None | None | None | None | None | None | MULTIPOLYGON (((77.80035 35.49541, 77.81533 35... |
9 | Admin-0 country | 0 | 2 | China | CH1 | 1 | 2 | Country | 1 | China | ... | None | None | None | None | None | None | None | None | None | MULTIPOLYGON (((78.91769 33.38626, 78.91595 33... |
10 rows × 169 columns
Each row contains the information for one country.
Each column is one property or variable.
Unlike pandas
DataFrame
s, geopandas
always must have a geometry
column.
Let's plot this data
%matplotlib inline
fig, ax = plt.subplots(figsize=(15,10))
countries.plot(ax=ax)
ax.set_title("WGS84 (lat/lon)", fontdict={'fontsize':34})
Text(0.5, 1.0, 'WGS84 (lat/lon)')
We can also get some additional information on this data. For example its projection
countries.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
We can reproject the data from its current WGS84 projection to other ones. Let's do this and plot the results so we can see how different projections distort results.
fig, ax = plt.subplots(figsize=(15,10))
countries_merc = countries.to_crs(epsg=3857)
countries_merc.loc[countries_merc.NAME!='Antarctica'].reset_index().plot(ax=ax)
ax.set_title("Mercator", fontdict={'fontsize':34})
Text(0.5, 1.0, 'Mercator')
countries_merc.crs
<Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World between 85.06°S and 85.06°N. - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
cea = {'datum': 'WGS84',
'lat_ts': 0,
'lon_0': 0,
'no_defs': True,
'over': True,
'proj': 'cea',
'units': 'm',
'x_0': 0,
'y_0': 0}
cea = 'ESRI:54034'
fig, ax = plt.subplots(figsize=(15,10))
countries_cea = countries.to_crs(crs=cea)
countries_cea.plot(ax=ax)
ax.set_title("Cylindrical Equal Area", fontdict={'fontsize':34})
Text(0.5, 1.0, 'Cylindrical Equal Area')