Welcome back! This is the second class in our Level Up series. If you’re just tuning in, you can catch up on what we’re doing and review the first lesson here.
In this session, we’ll continue to investigate a dataset with summary statistics and some basic data visualizations. We’ll be using the Python libraries NumPy, pandas, matplotlib, and Seaborn.
We’re using a fun new dataset for this session—New York City housing data. We begin by looking at summary statistics for a quantitative variable, like rent. What does the mean rent for an apartment in New York City tell us compared to the median rent? How about the trimmed mean? How does a histogram relate to those statistics? We’ll also look into the spread of the data. What can we learn from looking at the minimum, maximum, 25th percentile, and 75th percentile?
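To make those questions concrete, here is a minimal sketch of how the statistics compare on a small right-skewed sample. The rent values are made up for illustration and are not taken from the session's dataset, and `trimmed_mean` is a hypothetical helper written here with NumPy rather than a function from the session.

```python
import numpy as np

# Hypothetical monthly rents in dollars (made-up values, not the session's data).
# One very expensive listing is included on purpose to skew the sample.
rents = np.array([1800, 2100, 2300, 2500, 2700, 2900, 3100, 3400, 4000, 15000])


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each end."""
    ordered = np.sort(values)
    k = int(len(ordered) * proportion)  # number of values cut from each tail
    return ordered[k:len(ordered) - k].mean()


mean_rent = rents.mean()                    # pulled upward by the 15000 listing
median_rent = np.median(rents)              # middle of the sorted values
trimmed_rent = trimmed_mean(rents, 0.10)    # drops the lowest and highest rent

# Spread: minimum, maximum, and the 25th and 75th percentiles.
min_rent, max_rent = rents.min(), rents.max()
q25, q75 = np.percentile(rents, [25, 75])

# Histogram counts over 5 equal-width bins between the minimum and maximum.
counts, edges = np.histogram(rents, bins=5)

print(f"mean={mean_rent}, median={median_rent}, trimmed mean={trimmed_rent}")
print(f"min={min_rent}, 25th={q25}, 75th={q75}, max={max_rent}")
print("histogram counts:", counts.tolist())
```

In this sample the mean (3980) sits well above the median (2800) because a single high rent drags it up, while the 10% trimmed mean (2875) lands close to the median. The histogram counts show the same thing from another angle: nine listings fall in the first bin and one sits alone in the last.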
This session is particularly fun as we get to do some data investigation on the fly. As we begin plotting our data, we see some surprising irregularities. Why is there a block of apartments in New York that are 40 minutes away from the nearest subway station? Follow along as we try to solve this mystery!
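In the session the irregularity shows up in a plot, but a numeric check can point at the same kind of cluster. The sketch below is one possible approach, not the method used in the stream: the values are made up, the variable name `min_to_subway` is an assumption, and the 1.5 × IQR fence is a common rule of thumb rather than a definitive test.

```python
import numpy as np

# Hypothetical minutes-to-nearest-subway values (made up for illustration).
min_to_subway = np.array(
    [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 10, 12, 40, 40, 40]
)

# Count how many listings share each distinct value.
values, counts = np.unique(min_to_subway, return_counts=True)

# Rule-of-thumb upper fence: 75th percentile plus 1.5 times the interquartile range.
q25, q75 = np.percentile(min_to_subway, [25, 75])
upper_fence = q75 + 1.5 * (q75 - q25)

# Distinct values above the fence, and how many listings have each one.
is_outlier = values > upper_fence
outliers = values[is_outlier]
outlier_counts = counts[is_outlier]

for value, count in zip(outliers, outlier_counts):
    print(f"{count} listings are {value} minutes from a subway station")
```

Several listings sharing exactly the same extreme value is a hint that something other than ordinary variation is going on, which is the sort of pattern worth investigating by hand.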
Here are some Stack Overflow questions related to the work we did in today’s session:
Every Tuesday from now until March 2nd, we’ll be streaming a new session at 4 PM EST. You can set a reminder for the February 23rd stream here.
Finally, if you want even more stats content, you can sign up for the interactive course this series was based on here. This course was developed by Sophie and has many more quizzes, projects, and helpful nuggets that we can’t fit into our streams!