4 Summary

Figure 7

An images of an elderly black couple riding their bicycles in a park This week you transformed and combined databy:

  • computing a correlation coefficient between two series of values and checking whether the correlation is statistically significantcomputing a correlation coefficient between two series of values and checking whether the correlation is statistically significant
  • generating scatterplots to ook for other relationshipsgenerating scatterplots to ook for other relationships
  • using a logarithmic scale when an indicator had a wide range of values.using a logarithmic scale when an indicator had a wide range of values.

Next week you’ll learn how to group, export and import data to generate pivot table style reports.

4.1 Weeks 5 and 6 glossary

Here are alphabetical lists, for quick look up, of what this week introduced.


Concepts

A conditional statement is of the form


if condition1:
     statements1
elif condition2:
     statements2
...
else:
     statements

The computer evaluates the conditions from top to bottom and executes only the statements for the first condition that is true. If all conditions are false, it executes the

__else__

statements. If there is no

__else__

part nothing happens. The

__elif__

parts are optional too. Each block of statements must be indented, usually by four spaces.

A constant is a variable that is assigned only once, i.e. its initial value never changes. Constant names are conventionally written in uppercase, with underscores to separate multiple words.

A function definition is typically of the form



def functionName (argumentName1, argumentName2,...):

statements using arguments to compute the result
return result

All statements in the body of the function must have the same indentation, usually four spaces. The statements use the arguments like normal variables. The execution of the function ends when a return statement is encountered.

A join is the merging of two tables on a common column. The resulting table has all columns of both tables (the common column isn’t duplicated), and the rows are determined by the type of join. Rows in the two tables that have the same value in the common column become a joined row in the resulting table.

In a logarithmic scale , each major tick represents a value that is the multiplication by some constant (usually 10) of the value of the previous major tick.

A method chain is an expression like

__ context.method1(args1).method2(args2).method3(args3) __

where each method has and returns the same type of context, except possibly the last method, which can return any type of value.

The p-value is an indication of the significance of the result. Usually a p-value below 0.05 is taken to mean the result is statistically significant.

A return statement is of the form

__return expression__

and passes the value of the expression back to the code that called the function to which the return statement belongs.

The Spearman rank correlation coefficient of two series of values (e.g. two columns) is a number from -1 (perfect inverse correlation) to 1 (perfect direct correlation), with 0 meaning there is no rank correlation. Correlation doesn’t imply causation. A rank correlation of 1 merely states that both values increase and decrease together, while a correlation of -1 states that if one value increases, the other decreases.

A test is some code that checks whether some other code works as expected, e.g. a boolean expression that compares the return value of a function call with the expected value.



Reserved Words

__def, elif, else,__

if and

__return__

cannot be used as names.



Functions and methods

__col.apply(functionName)__

returns a new column, obtained by applying the given one-argument function to each cell in column

__col__

.

__DataFrame(columns=listOfStrings, data=listOfLists)__

returns a new dataframe, given the data as a list of rows, each row being a list of values in column order.

__ download(indicator=string, country='all', start=number, end=number) __

is a function in the pandas.io.wb module that downloads the World Bank data for the given indicator and all countries and country groups from the given start year to the given end year.

__ merge(left=frame1, right=frame2, on=columnName, how=string) __

returns a new dataframe, obtained by joining the two frames on the columns with the given common name. The

__how__

argument can be one of

__‘left’, ‘right’, ‘inner’__

and

__'outer’.__
__print()__

is a Python function that takes one or more expressions and prints their values on the screen in a single line.

__frame.reset_index()__

returns a new dataframe in which rows are labelled from 0 onwards.

__spearmanr()__

is a function in the scipy.stats module that takes two columns and returns a pair of numbers: the Spearman rank correlation coefficient of the two series of values, and its p-value.