February 23, 2021

Level Up: Mastering statistics with Python – part 2

Investigate a dataset with summary statistics and some basic data visualizations using the Python libraries NumPy, pandas, matplotlib, and Seaborn.

Welcome back! This is the second class in our Level Up series. If you’re just tuning in, you can catch up on what we’re doing and review the first lesson here.

In this session, we’ll continue to investigate a dataset with summary statistics and some basic data visualizations. We’ll be using the Python libraries NumPy, pandas, matplotlib, and Seaborn. 

We’re using a fun new dataset for this session—New York City housing data. We begin by looking at summary statistics for a quantitative variable, like rent. What does the mean rent for an apartment in New York City tell us compared to the median rent? How about the trimmed mean? How does a histogram relate to those statistics? We’ll also look into the spread of the data. What can we learn from looking at the minimum, maximum, 25th percentile, and 75th percentile?
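To make those questions concrete before we open the real data, here is a minimal sketch using NumPy. The rent values below are made up for illustration and are not taken from the New York City dataset, and `trimmed_mean` is a hypothetical helper written for this example, not a function from any of the libraries above.

```python
import numpy as np

# Hypothetical monthly rents in dollars (invented for illustration;
# NOT the NYC housing data). One very expensive listing acts as an outlier.
rents = np.array([1500, 1800, 2000, 2200, 2500, 2700, 3000, 3500, 4000, 20000])


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    ordered = np.sort(values)
    k = int(len(ordered) * proportion)  # how many values to drop per tail
    return ordered[k:len(ordered) - k].mean()


# Measures of center
mean_rent = rents.mean()
median_rent = np.median(rents)
trimmed_rent = trimmed_mean(rents, 0.10)

# Measures of spread: minimum, 25th percentile, 75th percentile, maximum
minimum, q1, q3, maximum = np.percentile(rents, [0, 25, 75, 100])

# Histogram counts: how many rents fall into each of 5 equal-width bins
counts, bin_edges = np.histogram(rents, bins=5)

print("mean:", mean_rent)
print("median:", median_rent)
print("10% trimmed mean:", trimmed_rent)
print("min, Q1, Q3, max:", minimum, q1, q3, maximum)
print("histogram counts:", counts.tolist())
```

With these invented numbers, the mean is 4320 while the median is 2600 and the 10% trimmed mean is 2712.5. The single expensive listing pulls the mean upward, and the histogram counts show the same thing: nine rents land in the first bin and one sits alone in the last.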

This session is particularly fun as we get to do some data investigation on the fly. As we begin plotting our data, we see some surprising irregularities. Why is there a block of apartments in New York that are 40 minutes away from the nearest subway station? Follow along as we try to solve this mystery!

