{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1 Joining left, right and centre"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s take stock for a moment. There’s the original, unchanged table (with full country names) about the life expectancy:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "life"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>Country name</th>\n",
    "<th>Life expectancy (years)</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">China</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">75</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Russia</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United States</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">79</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">India</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">66</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United Kingdom</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">81</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "… and a table with the GDP in millions of pounds and also full country names.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "gdp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>Country name</th>\n",
    "<th>GDP (£m)</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United Kingdom</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1711727</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United States</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">10716029</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">China</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">5905202</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Brazil</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1435148</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">South Africa</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">233937</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "Both tables have a common column with a common name (‘Country name’). I can __join__ the two tables on that common column, using the "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__merge()__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " function. Merging basically puts all columns of the two tables together, without duplicating the common column, and joins any rows that have the same value in the common column.\n",
    "\n",
    "There are four possible ways of joining, depending on which rows I want to include in the resulting table. If I want to include only those countries appearing in the GDP table, I call the "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__merge()__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " function like so:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "merge(gdp, life, on='Country name', how='left')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>Country name</th>\n",
    "<th>GDP (£m)</th>\n",
    "<th>Life expectancy (years)</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United Kingdom</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1711727</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">81</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United States</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">10716029</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">79</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">China</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">5905202</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">75</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Brazil</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1435148</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">NaN</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">South Africa</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">233937</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">NaN</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "The first two arguments are the tables to be merged, with the first table being called the ‘left’ table and the second being the ‘right’ table. The "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__on__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " argument is the name of the common column, i.e. both tables must have a column with that name. The "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__how__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " argument states I want a __left join__ , i.e. the resulting rows are dictated by the left (GDP) table. You can easily see that India and Russia, which appear only in the right (expectancy) table, don’t show up in the result. You can also see that Brazil and South Africa, which appear only in the left table, have an undefined life expectancy. (Remember that ‘NaN’ stands for ‘not a number.)\n",
    "\n",
    "A __right join__ will instead take the rows from the right table, and add the columns of the left table. Therefore, countries not appearing in the left table will have undefined values for the left table’s columns:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "merge(gdp, life, on='Country name', how='right')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>Country name</th>\n",
    "<th>GDP (£m)</th>\n",
    "<th>Life expectancy (years)</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United Kingdom</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1711727</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">81</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United States</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">10716029</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">79</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">China</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">5905202</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">75</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Russia</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">NaN</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">India</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">NaN</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">66</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "The third possibility is an __outer join__ which takes all countries, i.e. whether they are in the left or right table. The result has all the rows of the left and right joins:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "merge(gdp, life, on='Country name', how='outer')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>Country name</th>\n",
    "<th>GDP (£m)</th>\n",
    "<th>Life expectancy (years)</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United Kingdom</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1711727</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">81</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United States</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">10716029</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">79</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">China</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">5905202</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">75</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Brazil</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1435148</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">NaN</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">South Africa</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">233937</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">NaN</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__5__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Russia</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">NaN</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__6__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">India</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">NaN</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">66</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "The last possibility is an __inner join__ which takes only those countries common to both tables, i.e. for which I know the GDP *and* the life expectancy. That’s the join I want, to avoid any undefined values:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "gdpVsLife = merge(gdp, life, on='Country name', how='inner')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>Country name</th>\n",
    "<th>GDP (£m)</th>\n",
    "<th>Life expectancy (years)</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United Kingdom</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1711727</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">81</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">United States</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">10716029</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">79</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">China</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">5905202</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">75</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "Now it’s just a matter of applying the data transformation and combination techniques seen so far to the real data from the World Bank."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 5 Joining left, right and centre"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Question\n",
    "\n",
    "Put your learning into practice by completing Exercise 5 in the Exercise notebook 3.\n",
    "\n",
    "Remember to run the existing code in the notebook before you start the exercise. When you’ve completed the exercise, save the notebook.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.1 Constant variables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may have noticed that the same column names appear over and over in the code.\n",
    "\n",
    "If, someday, I decide one of the new columns should be called ‘GDP (million GBP)’ instead of ‘GDP (£m)’ to make clear which currency is meant (because various countries use the pound symbol), I need to change the string in every line of code it occurs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](https://www.open.edu/openlearn/ocw/pluginfile.php/1393338/mod_oucontent/oucontent/71687/ou_futurelearn_learn_to_code_fig_1053.jpg)\n",
    "__Figure 1__\n",
    "\n",
    "An abstract image of different coloured vertical strips with a column of numbers through each. The strips are distorted by an arrow moving horizontally through them \n",
    "Laziness is the mother of invention. If I assign the string to a variable and then use the variable everywhere instead of the string, whenever I wish to change the string, I only have to edit one line of code, where it’s assigned to the variable. A second advantage of using names instead of values is that I can use the name completion facility of Jupyter notebooks by pressing ‘TAB’. Writing code becomes much faster…\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "\n",
    "gdpInGbp = 'GDP (million GBP)'\n",
    "gdpInUsd = 'GDP (US$)'\n",
    "country = 'Country name'\n",
    "gdp[gdpInGbp] = gdp[gdpInUsd].apply(usdToGbp)\n",
    "headings = [country, gdpInGbp]\n",
    "gdp = gdp[headings]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Such variables are meant to be assigned once. They are called __constants__ , because their value never changes. However, if someone else takes my code and wishes to adapt and extend it, they may not realise those variables are supposed to remain constant. Even I may forget it and try to assign a new value further down in the code! To help prevent such slip-ups the Python convention is to write names of constants in uppercase letters, with words separated by underscores. Thus, any further assignment to a variable in uppercase will ring an alarm bell (in your head, the computer remains silent).\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "\n",
    "GDP_GBP = 'GDP (million GBP)'\n",
    "GDP_USD = 'GDP (US$)'\n",
    "COUNTRY = 'Country name'\n",
    "gdp[GDP_GBP] = gdp[GDP_USD].apply(usdToGbp)\n",
    "headings = [COUNTRY, GDP_GBP]\n",
    "gdp = gdp[headings]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using constants is not just a matter of laziness. There are various advantages. First, constants stand out in the code.\n",
    "\n",
    "Second, when making changes to the repeated values throughout the code, it’s easy to miss an occurrence. Using constants means the code is always consistent throughout.\n",
    "\n",
    "Third, the name of the constant can help clarify what the value means. For example, instead of using the number 1995 throughout the code, define a constant that makes clear whether it’s a year, the cubic centimetres of a car engine or something else.\n",
    "\n",
    "To sum up, using constants makes the code clearer, easier to change, and less prone to silly (but hard to find) mistakes due to inconsistent values.\n",
    "\n",
    "Any value can be defined as a constant, whether it’s a string, a number or even a dataframe. For example, you could store the data you have loaded from the file into a constant, as a reminder to not change the original data. In the rest of the week, I’ll use constants mainly for the column names."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 6 Constants"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Question\n",
    "\n",
    "To practise using constants, rewrite your exercises in the Exercise notebook 3 using them.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.2 Getting real"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Having tried out the data transformations and combination on small tables, I feel confident about using the full data from the World Bank, which I pointed you to in Life expectancy project.\n",
    "\n",
    "Open a new browser window and go to the World Bank’s [data page](http://data.worldbank.org/). Type ‘GDP’ (without the quote marks) in the ‘Find an indicator’ box in the centre of the page and select ‘GDP current US$’. Click ‘Go’. This will take you to the data page you looked at earlier. Look at the top of your browser window. You will notice the URL is [http://data.worldbank.org/indicator/NY.GDP.MKTP.CD](http://data.worldbank.org/indicator/NY.GDP.MKTP.CD). Every World Bank dataset is for an indicator (in this case GDP in current dollars) with a unique name (in this case NY.GDP.MKTP.CD).\n",
    "\n",
    "Knowing the indicator name, it’s a doddle to get the data directly into a dataframe, by using the "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__download()__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " function of the "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__wb__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " (World Bank) module, instead of first downloading a CSV or Excel file and then loading it into a dataframe. (Note that CoCalc’s free plan doesn’t allow connecting to other sites, so if you are using CoCalc you’ll need to download the data as a CSV or Excel file from the World Bank and upload it to CoCalc.)\n",
    "\n",
    "Here’s the code to get the 2013 GDP values for all countries. It may take a little while for the code to fetch the data.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "from pandas.io.wb import download \n",
    "YEAR = 2013\n",
    "GDP_INDICATOR = 'NY.GDP.MKTP.CD'\n",
    "data = download(indicator=GDP_INDICATOR, country='all',\n",
    "              start=YEAR, end=YEAR)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th></th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<th>__country__</th>\n",
    "<th>year</th>\n",
    "<th></th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__Arab World__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2013__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.843483e+12</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__Caribbean small states__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2013__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">6.680344e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__Central Europe and the Baltics__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2013__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.418166e+12</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__East Asia &amp; Pacific (all income levels)__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2013__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.080794e+13</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__East Asia &amp; Pacific (developing only)__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2013__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.168563e+13</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "This table definitely has an odd shape. The three columns don’t have their headings side by side, and the row numbering (0, 1, 2, etc) is missing. That’s because the first two ‘columns’ are in fact the dataframe index. You saw a similar table in Changing a dataframe’s index , when the index of the weather dataframe was set to be the ‘GMT’ column, with values of type "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__datetime64__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ". There’s a dataframe method to do the inverse, i.e. to transform the row names into column values and thereby reinstate the default dataframe index.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "gdp = data.reset_index()\n",
    "gdp.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>year</th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Arab World</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.843483e+12</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Caribbean small states</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">6.680344e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Central Europe and the Baltics</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.418166e+12</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">East Asia &amp; Pacific (all income levels)</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.080794e+13</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">East Asia &amp; Pacific (developing only)</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.168563e+13</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "I repeat the whole process for the life expectancy:\n",
    "* search for ‘life expectancy’ on the World Bank sitesearch for ‘life expectancy’ on the World Bank site\n",
    "* choose the ‘total’ dataset, which includes both female and male inhabitantschoose the ‘total’ dataset, which includes both female and male inhabitants\n",
    "* note down its indicator (SP.DYN.LE00.IN)note down its indicator (SP.DYN.LE00.IN)\n",
    "* use it to get the datause it to get the data\n",
    "* reset the dataframe index.reset the dataframe index.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "LIFE_INDICATOR = 'SP.DYN.LE00.IN'\n",
    "data = download(indicator=LIFE_INDICATOR, country='all',\n",
    "              start=YEAR, end=YEAR)\n",
    "life = data.reset_index()\n",
    "life.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>year</th>\n",
    "<th>SP.DYN.LE00.IN</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Arab World</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">70.086392</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Caribbean small states</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71.966306</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Central Europe and the Baltics</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">76.127583</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">East Asia &amp; Pacific (all income levels)</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">74.893439</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">East Asia &amp; Pacific (developing only)</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">73.981255</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "By defining the year as a constant, it’s very quick to change the code to load both datasets for any other year. If you wish to get GDP data for an earlier year than for life expectancy, then you need to define a second constant."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 7 Getting real"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Question\n",
    "\n",
    "The approach described above requires an internet connection to download the data directly from the World Bank. That may require some time, or sometimes not even work if the connection fails. Moreover, the World Bank sometimes changes its data format, which could break the code in the rest of this week.\n",
    "\n",
    "Therefore, the Exercise notebook 3 loads instead the GDP and life expectancy data from files WB GDP 2013.csv and WB LE 2013.csv and Exercise 7 uses the file WB POP 2013.csv , which you should add to your disk folder or CoCalc project. All files are in the normal tabular format and need no resetting of the indices.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.3 Cleaning up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may have noticed that the initial rows are not about countries, but groups of countries. Such aggregated values need to be removed, because we’re only interested in individual countries.\n",
    "\n",
    "The expression "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__frame[m:n],__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " with "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__n__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " an integer bigger than "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__m__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " , represents the ‘sub-table’ from row "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__m__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " to row "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__n-1__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ". In other words, it is a slice of frame with exactly "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__n__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " minus "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__m__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " rows. The expression is equivalent to the more convoluted expression "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__frame.head(n).tail(n-m)__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ".\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "gdp[0:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>year</th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Arab World</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.843483e+12</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Caribbean small states</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">6.680344e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Central Europe and the Baltics</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.418166e+12</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "To slice all rows from "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__m__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " onwards, you don’t have to count how many rows there are beforehand, just omit "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__n__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ".\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "gdp[240:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>year</th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__240__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Uzbekistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">5.679566e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__241__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Vanuatu</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">8.017876e+08</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__242__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Venezuela, RB</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">3.713366e+11</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__243__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Vietnam</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.712220e+11</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__244__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Virgin Islands (U.S.)</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">NaN</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__245__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">West Bank and Gaza</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.247600e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__246__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Yemen, Rep.</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">3.595450e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__247__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Zambia</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.682081e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__248__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Zimbabwe</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.349023e+10</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "By trying out "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__head(m)__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " for different values of "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "__m__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " , I find that the list of individual countries starts in row number 34, with Afghanistan. Hence, I slice from row 34 onwards, and that’s my new dataframe.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "gdp = gdp[34:]\n",
    "gdp.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>year</th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__34__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Afghanistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.031088e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__35__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Albania</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.291667e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__36__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Algeria</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.101834e+11</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__37__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">American Samoa</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">NaN</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__38__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Andorra</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">3.249101e+09</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "Unsurprisingly, there is missing data, so I remove those rows, as shown in Missing values in Week 4.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "gdp = gdp.dropna()\n",
    "gdp.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>year</th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__34__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Afghanistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.031088e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__35__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Albania</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.291667e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__36__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Algeria</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.101834e+11</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__38__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Andorra</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">3.249101e+09</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__39__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Angola</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2013</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.241632e+11</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "Finally, I drop the irrelevant year column.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "COUNTRY = 'country'\n",
    "headings = [COUNTRY, GDP_INDICATOR]\n",
    "gdp = gdp[headings]\n",
    "gdp.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__34__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Afghanistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.031088e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__35__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Albania</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.291667e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__36__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Algeria</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.101834e+11</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__38__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Andorra</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">3.249101e+09</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__39__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Angola</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.241632e+11</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "And now I repeat the whole cleaning process for the life expectancy table.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "headings = [COUNTRY, LIFE_INDICATOR]\n",
    "life = life[34:].dropna()[headings]\n",
    "life.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>SP.DYN.LE00.IN</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__34__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Afghanistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">60.931415</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__35__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Albania</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">77.537244</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__36__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Algeria</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71.009659</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__39__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Angola</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">51.866171</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__40__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Antigua and Barbuda</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">75.829293</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "Note how a single line of code can chain a row slice, a method call and a column slice, because each takes a dataframe and returns a dataframe."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 8 Cleaning up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Question\n",
    "\n",
    "Clean up the population data from Exercise 7, in Exercise 8 in the exercise notebook 3.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.4 Joining and transforming"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the little tables, I first transformed the columns and then joined the tables."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](https://www.open.edu/openlearn/ocw/pluginfile.php/1393338/mod_oucontent/oucontent/71687/ou_futurelearn_learn_to_code_fig_1054.jpg)\n",
    "__Figure 2__\n",
    "\n",
    "An image of a bride and groom holding hands with the minister between them in the background \n",
    "As you may be starting to realise, there’s often more than one way to do it. Just for illustration, I’ll do the other way round for the big tables. Here are the tables, as a reminder.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "life.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>SP.DYN.LE00.IN</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__34__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Afghanistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">60.931415</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__35__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Albania</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">77.537244</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__36__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Algeria</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71.009659</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__39__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Angola</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">51.866171</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__40__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Antigua and Barbuda</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">75.829293</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "gdp.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__34__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Afghanistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.031088e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__35__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Albania</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.291667e+10</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__36__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Algeria</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.101834e+11</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__38__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Andorra</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">3.249101e+09</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__39__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Angola</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.241632e+11</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "First, an inner join on the common column to combine rows where the common column value appears in both tables.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "gdpVsLife = merge(gdp, life, on='country', how='inner')\n",
    "\n",
    "gdpVsLife.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "<th>SP.DYN.LE00.IN</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Afghanistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.031088e+10</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">60.931415</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Albania</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.291667e+10</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">77.537244</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Algeria</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.101834e+11</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71.009659</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Angola</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.241632e+11</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">51.866171</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Antigua and Barbuda</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.200588e+09</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">75.829293</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "Second, the dollars are converted to millions of pounds.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "GDP = 'GDP (£m)'\n",
    "column = gdpVsLife[GDP_INDICATOR]\n",
    "\n",
    "gdpVsLife[GDP] = column.apply(usdToGbp).apply(roundToMillions)\n",
    "\n",
    "gdpVsLife.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "<th>SP.DYN.LE00.IN</th>\n",
    "<th>GDP (£m)</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Afghanistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.031088e+10</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">60.931415</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">12980</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Albania</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.291667e+10</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">77.537244</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">8255</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Algeria</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.101834e+11</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71.009659</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">134322</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Angola</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.241632e+11</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">51.866171</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">79349</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Antigua and Barbuda</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.200588e+09</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">75.829293</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">767</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "Third, the life expectancy is rounded to the nearest integer, with a by now familiar function.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "LIFE = 'Life expectancy (years)'\n",
    "\n",
    "gdpVsLife[LIFE] = gdpVsLife[LIFE_INDICATOR].apply(round)\n",
    "\n",
    "gdpVsLife.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>NY.GDP.MKTP.CD</th>\n",
    "<th>SP.DYN.LE00.IN</th>\n",
    "<th>GDP (£m)</th>\n",
    "<th>Life expectancy (years)</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Afghanistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.031088e+10</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">60.931415</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">12980</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">61</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Albania</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.291667e+10</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">77.537244</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">8255</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">78</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Algeria</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">2.101834e+11</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71.009659</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">134322</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Angola</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.241632e+11</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">51.866171</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">79349</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">52</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Antigua and Barbuda</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">1.200588e+09</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">75.829293</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">767</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">76</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "Lastly, the original columns are discarded.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "__In []:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "headings = [COUNTRY, GDP, LIFE]\n",
    "gdpVsLife = gdpVsLife[headings]\n",
    "gdpVsLife.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "__Out[]:__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table xmlns:str=\"http://exslt.org/strings\">\n",
    "<caption></caption>\n",
    "<tbody>\n",
    "<tr>\n",
    "<th></th>\n",
    "<th>country</th>\n",
    "<th>GDP (£m)</th>\n",
    "<th>Life expectancy (years)</th>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__0__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Afghanistan</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">12980</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">61</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__1__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Albania</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">8255</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">78</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__2__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Algeria</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">134322</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">71</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__3__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Angola</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">79349</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">52</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">__4__</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">Antigua and Barbuda</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">767</td>\n",
    "<td class=\"highlight_\" rowspan=\"\" colspan=\"\">76</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "For the first five countries there doesn’t seem to be any relation between wealth and life expectancy, but that might be just for those countries."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 9 Joining and transforming"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Question\n",
    "\n",
    "Have a go at merging dataframes with an inner join in Exercise 9 in the Exercise notebook 3.\n",
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  },
  "kernelspec": {
   "display_name": "",
   "language": "",
   "name": ""
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}