4.1 Week 4 glossary
Here is an alphabetical list of the terms introduced this week, for quick look-up.
Programming and data analysis concepts
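The bitwise operators __&__ (and) and __|__ (or) are used in pandas to combine two comparison expressions, typically column comparisons, into a more complex expression. Because these operators bind more tightly than the comparison operators, each comparison must be wrapped in its own parentheses.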
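As an illustration, here is a minimal sketch of combining two column comparisons. The dataframe, its column names and its values are invented for the example.

```python
import pandas as pd

# Hypothetical example data: column names and values are invented
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Population": [5, 120, 40, 300],
    "Cases": [10, 900, 15, 50],
})

# Each comparison sits in its own parentheses because & and | bind
# more tightly than > and <
both = df[(df["Population"] > 30) & (df["Cases"] > 20)]    # rows meeting both conditions
either = df[(df["Population"] > 200) | (df["Cases"] < 12)]  # rows meeting at least one

print(both)
print(either)
```

Without the parentheses, Python would first evaluate `30 & df["Cases"]`, which is not what is intended.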
A Boolean has one of two possible values: __True__ or __False__.
A Comma Separated Values (CSV) file is a plain text file that is used to hold tabular data.
A list is a sequence of values, separated by commas, and written within square brackets.
There are six comparison operators (__==__, __!=__, __<__, __>__, __<=__ and __>=__) that can be used to compare numbers, strings and dates. Expressions composed of these operators evaluate to __True__ or __False__. These operators can also be used to compare every value in a column, row by row, against some number, string or date value. When used in this manner, the operators return a series of Boolean values.
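The short sketch below contrasts comparing single values with comparing a whole column. The temperature values are made up for illustration.

```python
import pandas as pd

# Comparing single values gives a single Boolean
print(3 < 5)            # True
print("abc" == "abd")   # False

# Hypothetical column of temperatures
temps = pd.Series([12.5, 18.0, 9.1, 21.3])

# Comparing a column against a number gives a series of Booleans, one per row
is_warm = temps >= 18
print(is_warm)
```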
The ‘dot’ notation is used to access a dataframe’s methods and attributes.
The __Series__ data type is a collection of values with an index. By default, the index consists of integers starting from zero. Each column in a dataframe is an example of the __Series__ data type. The __Series__ data type has many of the same methods as the __DataFrame__ data type.
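A minimal sketch, using an invented two-column dataframe, of how selecting one column gives a __Series__ with a default zero-based index, and how that series shares methods such as __head()__ with the dataframe.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"City": ["Leeds", "York", "Hull"], "Rain": [3.2, 0.0, 1.5]})

rain = df["Rain"]              # selecting one column gives a Series
print(type(rain).__name__)     # Series
print(list(rain.index))        # [0, 1, 2]: default integer index from zero

# head() works on both the dataframe and the series
print(df.head(2))
print(rain.head(2))
print(rain.max())              # 3.2
```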
The __object__ data type is how pandas represents strings.
The __datetime64__ data type is how pandas represents dates.
The __int64__ data type is how pandas represents integers (whole numbers).
The __float64__ data type is how pandas represents floating point numbers (decimals).
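The __dtypes__ attribute of a dataframe shows which of these types pandas chose for each column. The sketch below uses invented data. One caveat: the name displayed for the string column may differ in newer pandas releases, so only the numeric and date columns are checked.

```python
import pandas as pd

# Hypothetical data: one column of each kind
df = pd.DataFrame({
    "Name": ["ann", "bob"],                                  # strings
    "Age": [34, 29],                                         # whole numbers
    "Height": [1.62, 1.80],                                  # decimals
    "Joined": pd.to_datetime(["2015-03-01", "2016-07-15"]),  # dates
})

# dtypes lists the data type of every column
print(df.dtypes)
```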
Functions and methods
__astype(aType)__
when applied to a dataframe column, the method returns a new column in which each value has been converted to the type given by the string __aType__ (for example __'int64'__).
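The following minimal sketch converts a column of digit strings to integers. The values are hypothetical. Note that __astype__ returns a new series, so the result has to be assigned to a name (or back to the column) for it to be kept.

```python
import pandas as pd

# Hypothetical column of numbers that were stored as strings
s = pd.Series(["10", "250", "7"])

as_ints = s.astype("int64")   # new series of integers; s is unchanged
print(as_ints.sum())          # 267
```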
__datetime(yyyy, mm, dd)__
the function, from Python's __datetime__ module, takes three integer arguments: __yyyy__, a four-digit year; __mm__, a month from 1 to 12; and __dd__, a day of the month. From these arguments, the function creates and returns a date value that can be compared with the values in a __datetime64__ column.
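As a sketch, assuming a dataframe with an invented __datetime64__ column, a value built with __datetime__ can be used to select rows by date.

```python
from datetime import datetime

import pandas as pd

# Hypothetical dataframe with a datetime64 column
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-15", "2014-06-01", "2014-12-31"]),
    "Sales": [5, 8, 3],
})

cutoff = datetime(2014, 6, 1)   # year, month, day

# The comparison is applied row by row to the datetime64 column
later = df[df["Date"] >= cutoff]
print(later)
```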
__dropna()__
when applied to a dataframe, returns a new dataframe without the rows that have at least one missing value.
__fillna(value)__
is a series method that returns a new series in which all missing values have been replaced by the given __value__.
__head()__
gets and displays the first five rows of a dataframe. Optionally, the method can take an integer argument to specify how many rows (from and including row 0) to get and display.
__iloc[index]__
gets and displays the row of the dataframe at the integer position given by __index__ (the first row is at position 0). Note that __iloc__ is used with square brackets, not parentheses.
__isnull()__
is a series method that returns a series of Boolean values showing, row by row, whether the value is missing (__True__) or not (__False__).
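The three missing-value methods can be compared side by side in a minimal sketch. The data is invented, and __np.nan__ is used to mark the missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Cases": [10.0, np.nan, 30.0],
})

missing = df["Cases"].isnull()   # True where the value is missing
filled = df["Cases"].fillna(0)   # new series with the missing value replaced by 0
complete = df.dropna()           # new dataframe without row "B"

print(missing)
print(filled)
print(complete)
```

None of these calls changes __df__ itself. Each one returns a new object.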
__plot()__
when applied to a dataframe column of numeric values, the method displays a graph of those values. The x-axis shows the dataframe's index and the y-axis the range of the column's values. In a Jupyter notebook, you first need to execute __%matplotlib inline__ so that the graph is displayed within the notebook.
__read_csv(csvFile)__
creates a dataframe from the dataset in the CSV file.
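A minimal sketch of __read_csv__ follows. To keep the example self-contained, __io.StringIO__ stands in for a CSV file on disk, and the CSV content is invented. With a real file, you would pass its file name instead.

```python
import io

import pandas as pd

# Hypothetical CSV content: a header row, then one row per line
csv_text = """Country,Year,Cases
A,2013,120
B,2013,85
C,2013,40
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)   # (3, 3): three rows, three columns
```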
__rename(columns={oldName : newName})__
returns a new dataframe in which the column __oldName__ has been renamed to __newName__.
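In the sketch below, the column names are hypothetical. Because __rename__ returns a new dataframe, the original keeps its old column name unless the result is assigned back to it.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"GMT": ["2014-1-1", "2014-1-2"], "Max TemperatureC": [8, 10]})

renamed = df.rename(columns={"GMT": "Date"})
print(list(renamed.columns))   # ['Date', 'Max TemperatureC']
print(list(df.columns))        # ['GMT', 'Max TemperatureC']
```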
__str.rstrip(suffix)__
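when applied to a dataframe column of string values, the method removes trailing characters from each string value in the column. The argument __suffix__ is treated as a set of characters, not as an exact ending: characters are removed from the end of each string for as long as they appear anywhere in __suffix__.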
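The sketch below uses invented values to show both the typical use and the set-of-characters behaviour. The second call removes more than the literal ending, because every trailing character found in the argument is stripped.

```python
import pandas as pd

# Hypothetical values with unwanted trailing text
s = pd.Series(["200<br />", "45<br />"])

# Any trailing '<', 'b', 'r', ' ', '/' or '>' is removed, which here
# strips the whole '<br />' ending and stops at the first digit
cleaned = s.str.rstrip("<br />")
print(list(cleaned))   # ['200', '45']

# Pitfall: stripping continues while characters are in the set {'a', 'r'}
pitfall = pd.Series(["bar", "star"]).str.rstrip("ar")
print(list(pitfall))   # ['b', 'st']
```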
__tail()__
gets and displays the last five rows of a dataframe. Optionally the method can take an integer argument to specify how many rows (until and including the last row) to get and display.
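A minimal sketch of __head()__, __tail()__ and __iloc__ on an invented ten-row dataframe follows. In a notebook, the last expression in a cell is displayed automatically. In a plain script, __print__ is needed, as here.

```python
import pandas as pd

# Hypothetical ten-row dataframe
df = pd.DataFrame({"n": range(10), "square": [i * i for i in range(10)]})

first = df.head()         # rows 0 to 4
last_three = df.tail(3)   # rows 7, 8 and 9
row = df.iloc[2]          # the row at integer position 2, as a Series

print(first)
print(last_three)
print(row["square"])      # 4
```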
__to_datetime(aSeries)__
when applied to a series, typically a column from a dataframe, this function returns a new series in which each value in __aSeries__ has been converted to type __datetime64__.
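As a sketch with invented dates stored as strings, the converted series is assigned back to the column so that the dataframe keeps the new type.

```python
import pandas as pd

# Hypothetical column of dates stored as strings
df = pd.DataFrame({"Date": ["2014-01-01", "2014-01-02", "2014-02-01"]})

df["Date"] = pd.to_datetime(df["Date"])   # replace the column with the converted series
print(df["Date"].dtype)                   # a datetime64 type
```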