code-for-a-living March 1, 2021

# Level Up: Mastering statistics with Python – part 3

Welcome back! This is the third class in our Level Up series on statistics with Python. If you’re just tuning in, you can catch up on what we’re doing and review the first lesson here.

In this session, we’re continuing our investigation of our New York City apartment dataset and looking at the relationships between different sets of variables. For example, we’ll look at how NYC rent relates to the borough you live in. How does an in-unit washer/dryer change the equation? There are a lot of interesting features that we can explore in this dataset.

Comparing summary statistics like the mean and median can help us understand how these variables are related, but we can learn even more by using visualizations. We’ll look at how histograms, box plots, and scatter plots can help answer different questions about relationships between variables.
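As a rough sketch of that workflow, the snippet below first compares the mean and median rent per borough, then draws the three plot types side by side. The rows, the column names (`borough`, `rent`, `size_sqft`), and the output filename are made up for illustration and are not taken from the dataset used in the session. It uses plain pandas and matplotlib, which may differ from the plotting library used on stream.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only; not values from the session's dataset.
rentals = pd.DataFrame({
    "borough": ["Manhattan"] * 3 + ["Brooklyn"] * 3 + ["Queens"] * 3,
    "rent": [3200, 4500, 5100, 2400, 2900, 3600, 1900, 2300, 2500],
    "size_sqft": [550, 800, 950, 600, 750, 1000, 650, 800, 900],
})

# Summary statistics: mean and median rent within each borough.
summary = rentals.groupby("borough")["rent"].agg(["mean", "median"])
print(summary)

# Collect the rent values per borough once, for the histogram and box plot.
rent_by_borough = {
    name: group["rent"].to_numpy() for name, group in rentals.groupby("borough")
}

fig, (ax_hist, ax_box, ax_scatter) = plt.subplots(1, 3, figsize=(12, 4))

# Overlaid histograms: the shape of the rent distribution in each borough.
for name, rents in rent_by_borough.items():
    ax_hist.hist(rents, bins=5, alpha=0.5, label=name)
ax_hist.set_xlabel("rent")
ax_hist.set_ylabel("count")
ax_hist.legend()

# Box plots: median, quartiles, and spread, one box per borough.
ax_box.boxplot(list(rent_by_borough.values()))
ax_box.set_xticklabels(list(rent_by_borough.keys()))
ax_box.set_ylabel("rent")

# Scatter plot: relationship between two quantitative variables.
ax_scatter.scatter(rentals["size_sqft"], rentals["rent"])
ax_scatter.set_xlabel("size_sqft")
ax_scatter.set_ylabel("rent")

fig.tight_layout()
fig.savefig("rent_relationships.png")  # hypothetical output filename
```

Histograms and box plots suit a quantitative variable split by a categorical one (rent by borough), while a scatter plot suits two quantitative variables (rent against size).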

Finally, we’ll show you how to create and interpret a heatmap of a correlation matrix to simultaneously understand relationships between all quantitative variables in a dataset.
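One way to build such a heatmap is sketched below, assuming every column in the DataFrame is quantitative. The data and the column names are invented for the example, and the plot is drawn with matplotlib's `imshow` rather than whichever helper was used in the session.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up quantitative columns for illustration only.
apartments = pd.DataFrame({
    "rent": [3200, 4500, 5100, 2400, 2900, 3600, 1900, 2300],
    "size_sqft": [550, 800, 950, 600, 750, 1000, 650, 800],
    "bedrooms": [1, 2, 3, 1, 2, 3, 1, 2],
    "building_age_yrs": [90, 15, 5, 60, 40, 10, 75, 30],
})

# Pairwise Pearson correlations between all columns (pandas' default method).
corr = apartments.corr()

fig, ax = plt.subplots(figsize=(6, 5))

# A diverging colormap pinned to [-1, 1] keeps zero correlation at the midpoint.
image = ax.imshow(corr.to_numpy(), cmap="RdBu_r", vmin=-1, vmax=1)

# Label both axes with the variable names.
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)

# Write each coefficient into its cell so the plot can be read without the colorbar.
for row in range(corr.shape[0]):
    for col in range(corr.shape[1]):
        ax.text(col, row, f"{corr.iloc[row, col]:.2f}", ha="center", va="center")

fig.colorbar(image, ax=ax, label="Pearson correlation")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")  # hypothetical output filename
```

Cells near +1 or -1 indicate strong linear relationships. The diagonal is always 1 because each variable is perfectly correlated with itself, and the matrix is symmetric, so only one triangle carries new information.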

Here are some Stack Overflow questions related to the work we did in today’s session:

Plotting A Correlation Matrix With Pandas

Changing The Colors Of The Graphs In A Pairplot

Creating A Cross-Tabulation With Percentages Rather Than Frequencies
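For the last of these, a minimal sketch with `pandas.crosstab` might look like the following. The rows and the column names (`borough`, `washer_dryer`) are made up for illustration, and normalizing by row is only one of several choices.

```python
import pandas as pd

# Made-up rows for illustration only; not values from the session's dataset.
listings = pd.DataFrame({
    "borough": ["Manhattan"] * 4 + ["Brooklyn"] * 4 + ["Queens"] * 2,
    "washer_dryer": [
        "in-unit", "in-unit", "in-unit", "none",  # Manhattan
        "in-unit", "none", "none", "none",        # Brooklyn
        "in-unit", "none",                        # Queens
    ],
})

# Raw frequencies: how many listings fall into each borough/amenity pair.
counts = pd.crosstab(listings["borough"], listings["washer_dryer"])
print(counts)

# normalize="index" divides each row by its row total, giving proportions
# within each borough; multiplying by 100 turns them into percentages.
percent = pd.crosstab(
    listings["borough"], listings["washer_dryer"], normalize="index"
) * 100
print(percent.round(1))
```

Passing `normalize="columns"` or `normalize="all"` instead divides by column totals or by the grand total, so the right choice depends on which comparison the question calls for.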

If you enjoyed this lesson, you can catch up on the rest of the series on YouTube. If you’d like to watch a session live, follow the Codecademy YouTube channel.

Every Tuesday from now until March 2nd, we’ll be streaming a new session at 4 PM EST. You can set a reminder for the March 2nd stream here.

Finally, if you want even more stats content, you can sign up for the interactive course this series was based on here. This course was developed by Sophie and has many more quizzes, projects, and helpful nuggets that we can’t fit into our streams!
