code-for-a-living February 23, 2021

# Level Up: Mastering statistics with Python – part 2

Investigate a dataset with summary statistics and some basic data visualizations using the Python libraries NumPy, pandas, matplotlib, and Seaborn.

Welcome back! This is the second class in our Level Up series. If you’re just tuning in, you can catch up on what we’re doing and review the first lesson here.

In this session, we’ll continue to investigate a dataset with summary statistics and some basic data visualizations. We’ll be using the Python libraries NumPy, pandas, matplotlib, and Seaborn.

We’re using a fun new dataset for this session—New York City housing data. We begin by looking at summary statistics for a quantitative variable, like rent. What does the mean rent for an apartment in New York City tell us compared to the median rent? How about the trimmed mean? How does a histogram relate to those statistics? We’ll also look into the spread of the data. What can we learn from looking at the minimum, maximum, 25th percentile, and 75th percentile?
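To make the mean-versus-median question concrete, here is a minimal sketch on a small set of made-up rents (illustrative values, not the lesson's NYC dataset) with one very expensive listing. The helper `trimmed_mean` is hypothetical. It is written out so the trimming step is visible. SciPy's `scipy.stats.trim_mean(values, 0.1)` performs the same calculation if SciPy is available. The spread statistics use pandas' `Series.quantile`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: ten monthly rents (USD) standing in for the real
# NYC housing dataset. One very expensive listing skews the distribution.
rents = pd.Series(
    [2000, 2100, 2250, 2400, 2500, 2600, 2800, 3000, 3200, 15000], name="rent"
)


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each tail."""
    arr = np.sort(np.asarray(values, dtype=float))
    k = int(len(arr) * proportion)  # how many values to cut from each end
    return arr[k:len(arr) - k].mean()


# Measures of center
mean_rent = rents.mean()                 # pulled upward by the 15000 listing
median_rent = rents.median()             # middle of the sorted values
trimmed_rent = trimmed_mean(rents, 0.1)  # drops the lowest and highest 10%

# Measures of spread
spread = {
    "min": rents.min(),
    "25th percentile": rents.quantile(0.25),
    "75th percentile": rents.quantile(0.75),
    "max": rents.max(),
}

print(f"mean={mean_rent}, median={median_rent}, trimmed={trimmed_rent}")
print(spread)
```

On this toy data the mean (3785.0) is higher than every rent except the outlier. The median (2550.0) and the 10% trimmed mean (2606.25) stay close to where most listings sit. The quartiles (2287.5 and 2950.0) show that the middle half of the listings spans only a few hundred dollars.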

This session is particularly fun as we get to do some data investigation on the fly. As we begin plotting our data, we see some surprising irregularities. Why is there a block of apartments in New York that are 40 minutes away from the nearest subway station? Follow along as we try to solve this mystery!
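One way to start on a mystery like this is to isolate the odd rows and check what they have in common. Below is a minimal sketch using a boolean mask and `value_counts()`. The column names (`neighborhood`, `rent`, `min_to_subway`) and all values are hypothetical and chosen only for illustration. This is not the lesson's actual data or its conclusion.

```python
import pandas as pd

# Hypothetical toy listings. The column names ("neighborhood", "rent",
# "min_to_subway") and all values are made up for illustration.
listings = pd.DataFrame({
    "neighborhood": ["Area A", "Area A", "Area B", "Area B", "Area C", "Area C", "Area C"],
    "rent": [2500, 2700, 3100, 2900, 2200, 2300, 2250],
    "min_to_subway": [3, 5, 8, 4, 43, 43, 43],
})

# Boolean mask: keep only listings that report a walk of 40+ minutes
far_from_subway = listings[listings["min_to_subway"] >= 40]

# Where do the suspicious listings cluster?
by_neighborhood = far_from_subway["neighborhood"].value_counts()

# Do they all report exactly the same distance? A single repeated value
# is a cue to inspect those rows individually.
distinct_distances = far_from_subway["min_to_subway"].nunique()

print(by_neighborhood)
print("distinct distances:", distinct_distances)
```

In this toy example, all three flagged rows fall in one area and share a single distance value. A pattern like that suggests looking at those rows directly before trusting them in summary statistics.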

Here are some Stack Overflow questions related to the work we did in today’s session:

- Finding the average of a dataframe column
- Creating a histogram using matplotlib
- Customizing the output from pandas describe function
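Here is a minimal sketch of the ideas behind these questions. It uses a toy `rent` column with made-up values, not the lesson's dataset. The `percentiles` argument of `Series.describe()` replaces the default quartiles. `np.histogram` returns the counts and bin edges that a histogram would draw. The matplotlib call is shown only as a comment so the sketch runs without a plotting backend.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the rent column of the housing dataset
df = pd.DataFrame({"rent": [2000, 2100, 2250, 2400, 2500, 2600, 2800, 3000, 3200, 15000]})

# describe() reports count, mean, std, min, 25%, 50%, 75%, max by default.
# The percentiles argument replaces the quartiles; pandas still adds the median.
summary = df["rent"].describe(percentiles=[0.1, 0.9])
print(summary)

# np.histogram returns the bar heights (counts) and the bin edges.
counts, bin_edges = np.histogram(df["rent"], bins=5)
print(counts)
print(bin_edges)

# To draw the same bars with matplotlib (not run here):
#   import matplotlib.pyplot as plt
#   plt.hist(df["rent"], bins=bin_edges)
#   plt.show()
```

With five equal-width bins, the counts come out as `[9 0 0 0 1]`. Nine rents fall in the lowest bin, and the single 15000 listing sits alone in the highest bin. That long right tail is the shape a histogram makes easy to spot.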

If you enjoyed this lesson, you can catch up on the rest of the series on YouTube. If you’d like to watch a session live, follow the Codecademy YouTube channel.

Every Tuesday from now until March 2nd, we’ll be streaming a new session at 4PM EST. You can set a reminder for the stream for February 23rd here.

Finally, if you want even more stats content, you can sign up for the interactive course this series was based on here. This course was developed by Sophie and has many more quizzes, projects, and helpful nuggets that we can’t fit into our streams!
