code-for-a-living March 1, 2021

# Level Up: Mastering statistics with Python – part 3


Welcome back! This is the third class in our Level Up series on statistics with Python. If you’re just tuning in, you can catch up on what we’re doing and review the first lesson here.

In this session, we’re continuing our investigation of the New York City apartment dataset and looking at the relationships between different sets of variables. For example, we’ll look at how NYC rent relates to the borough you live in. How does an in-unit washer/dryer change the equation? There are a lot of interesting features that we can explore in this dataset.
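A minimal pandas sketch of how the borough and washer/dryer comparisons might start. It assumes the data sits in a DataFrame with columns named `rent`, `borough`, and `has_washer_dryer`; these names are hypothetical, and the toy rows are invented for illustration, not taken from the real dataset.

```python
import pandas as pd

# Toy stand-in for the apartment data (invented values, hypothetical column names)
df = pd.DataFrame({
    "borough": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens", "Queens"],
    "rent": [4000, 3500, 9000, 2800, 3000, 2200, 2400],
    "has_washer_dryer": [1, 0, 1, 0, 1, 0, 0],
})

# Mean and median rent for each borough
rent_by_borough = df.groupby("borough")["rent"].agg(["mean", "median"])
print(rent_by_borough)

# Median rent for units without (0) and with (1) an in-unit washer/dryer
rent_by_wd = df.groupby("has_washer_dryer")["rent"].median()
print(rent_by_wd)
```

In this toy data, Manhattan's mean rent (5,500) sits well above its median (4,000) because a single expensive listing pulls the mean up. Gaps like that are one reason to look at both statistics side by side.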

Comparing summary statistics like the mean and median can help us understand how these variables are related, but we can learn even more by using visualizations. We’ll look at how histograms, box plots, and scatter plots can help answer different questions about relationships between variables.
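As a rough sketch, the three plot types could be drawn with matplotlib as follows. The column names `borough`, `rent`, and `size_sqft` are assumptions, and the rows are invented, so the resulting plots only illustrate the mechanics. Histograms and box plots compare a quantitative variable across the levels of a categorical one; a scatter plot relates two quantitative variables.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (invented values; hypothetical column names)
df = pd.DataFrame({
    "borough": ["Manhattan"] * 4 + ["Brooklyn"] * 4,
    "rent": [4000, 3500, 9000, 5200, 2800, 3000, 2600, 3400],
    "size_sqft": [800, 650, 1500, 900, 750, 850, 600, 950],
})

# One array of rents per borough, in a fixed (alphabetical) order
boroughs = sorted(df["borough"].unique())
groups = [df.loc[df["borough"] == b, "rent"].to_numpy() for b in boroughs]

fig, (ax_hist, ax_box, ax_scatter) = plt.subplots(1, 3, figsize=(15, 4))

# Overlapping histograms: compare the full rent distribution per borough
for b, rents in zip(boroughs, groups):
    ax_hist.hist(rents, bins=5, alpha=0.5, label=b)
ax_hist.set_xlabel("rent")
ax_hist.legend()

# Side-by-side box plots: medians, spread, and outliers at a glance
ax_box.boxplot(groups)
ax_box.set_xticks(range(1, len(boroughs) + 1))  # boxes sit at x = 1, 2, ...
ax_box.set_xticklabels(boroughs)
ax_box.set_ylabel("rent")

# Scatter plot: relationship between two quantitative variables
ax_scatter.scatter(df["size_sqft"], df["rent"], alpha=0.7)
ax_scatter.set_xlabel("size_sqft")
ax_scatter.set_ylabel("rent")

fig.tight_layout()
# fig.savefig("rent_plots.png") writes the figure to disk
```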

Finally, we’ll show you how to create and interpret a heatmap of a correlation matrix to simultaneously understand relationships between all quantitative variables in a dataset.
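One way to build such a heatmap, sketched with pandas and plain matplotlib. The column names and values are invented for illustration, and the series itself may have used a different plotting library. `DataFrame.corr()` computes Pearson correlations by default, so each cell measures the strength of a *linear* relationship, from -1 to 1.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (invented values; hypothetical column names)
df = pd.DataFrame({
    "rent": [4000, 3500, 9000, 2800, 3000, 2200],
    "size_sqft": [800, 650, 1500, 750, 850, 600],
    "bedrooms": [1, 1, 3, 1, 2, 1],
    "min_to_subway": [5, 10, 2, 8, 12, 15],
    "borough": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"],
})

# Pearson correlation between every pair of numeric columns (borough is skipped)
corr = df.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(6, 5))
# Fix the color scale to [-1, 1] so that 0 maps to the neutral middle color
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)

# Write each coefficient into its cell
for i in range(len(corr.columns)):
    for j in range(len(corr.columns)):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax)
fig.tight_layout()
```

If seaborn is installed, `seaborn.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")` produces a similar annotated plot in one call.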

Here are some Stack Overflow questions related to the work we did in today’s session:

- Plotting A Correlation Matrix With Pandas
- Changing The Colors Of The Graphs In A Pairplot
- Creating A Cross-Tabulation With Percentages Rather Than Frequencies
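For the cross-tabulation question, `pandas.crosstab` accepts a `normalize` argument that turns counts into proportions. A minimal sketch, again on invented rows with hypothetical column names:

```python
import pandas as pd

# Toy data (invented values; hypothetical column names)
df = pd.DataFrame({
    "borough": ["Manhattan", "Manhattan", "Manhattan", "Manhattan",
                "Brooklyn", "Brooklyn", "Queens", "Queens"],
    "has_washer_dryer": [1, 1, 0, 1, 0, 1, 0, 0],
})

# Raw frequencies of each borough / washer-dryer combination
counts = pd.crosstab(df["borough"], df["has_washer_dryer"])

# normalize="index" divides each row by its total, so every row sums to 1
row_props = pd.crosstab(df["borough"], df["has_washer_dryer"], normalize="index")
print((row_props * 100).round(1))  # shown as percentages
```

Passing `normalize="columns"` instead divides by column totals, and `normalize="all"` divides by the grand total. For the pairplot question, seaborn's `pairplot` takes `hue` and `palette` arguments (for example `seaborn.pairplot(df, hue="borough", palette="Set2")`) that control how points are colored.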

If you enjoyed this lesson, you can catch up on the rest of the series on YouTube. If you’d like to watch a session live, follow the Codecademy YouTube channel.

Every Tuesday from now until March 2nd, we’ll be streaming a new session at 4PM EST. You can set a reminder for the March 2nd stream here.

Finally, if you want even more stats content, you can sign up for the interactive course this series was based on here. This course was developed by Sophie and has many more quizzes, projects, and helpful nuggets that we can’t fit into our streams!
