Welcome back! This is the third class in our Level Up series on statistics with Python. If you’re just tuning in, you can catch up on what we’re doing and review the first lesson here.
In this session, we're continuing to explore our New York City apartment dataset by looking at the relationships between different variables. For example, we'll look at how NYC rent relates to the borough you live in. How does an in-unit washer/dryer change the equation? The dataset has plenty of other interesting features to explore, too.
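Comparing summary statistics such as the mean and median across groups can help us understand how these variables are related, but we can learn even more from visualizations. We'll look at how histograms, box plots, and scatter plots each help answer different questions. Histograms and box plots are useful for comparing the distribution of a quantitative variable such as rent across categories. Scatter plots show how two quantitative variables vary together.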
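To make the group comparison concrete, here is a minimal sketch that computes the mean and median rent for each borough. It also computes the quartiles that a box plot's box is drawn from. The listings are made-up values for illustration, not rows from the NYC apartment dataset, and the field names `borough` and `rent` are assumptions. In pandas, something like `df.groupby("borough")["rent"].agg(["mean", "median"])` does the grouped part. The standard library is used here so the arithmetic stays visible.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

# Hypothetical listings (made-up values, not the real NYC dataset).
listings = [
    {"borough": "Manhattan", "rent": 3500},
    {"borough": "Manhattan", "rent": 4200},
    {"borough": "Manhattan", "rent": 2900},
    {"borough": "Brooklyn", "rent": 2600},
    {"borough": "Brooklyn", "rent": 3100},
    {"borough": "Brooklyn", "rent": 2200},
    {"borough": "Queens", "rent": 1900},
    {"borough": "Queens", "rent": 2300},
    {"borough": "Queens", "rent": 2100},
]

def rent_by_group(rows, key):
    """Collect rents into one list per value of `key` (e.g. per borough)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["rent"])
    return dict(groups)

def summarize(values):
    """Mean, median, and first/third quartiles (the box of a box plot)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"mean": mean(values), "median": median(values), "q1": q1, "q3": q3}

summary = {b: summarize(r) for b, r in rent_by_group(listings, "borough").items()}
for borough, stats in summary.items():
    print(borough, stats)
```

Each borough's box plot would span `q1` to `q3` with a line at the median. Plotting libraries may use a slightly different quartile convention, so their boxes can differ a little from these numbers.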
Finally, we'll show you how to create and interpret a heatmap of a correlation matrix to simultaneously understand relationships between all quantitative variables in a dataset.
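A correlation matrix holds the (by default, Pearson) correlation coefficient for every pair of quantitative columns, and a heatmap simply colors each cell by that value. The minimal sketch below builds the matrix by hand for a few columns. The column names `rent`, `size_sqft`, and `bedrooms` and their values are hypothetical. In pandas, `df.corr()` returns the same Pearson matrix, which a plotting library can then draw as a heatmap.

```python
from math import sqrt

# Hypothetical quantitative columns (made-up values for illustration).
columns = {
    "rent": [2000, 2500, 3000, 3500, 4000],
    "size_sqft": [500, 650, 700, 900, 1000],
    "bedrooms": [1, 1, 2, 2, 3],
}

def pearson(x, y):
    """Pearson correlation: co-variation divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(cols):
    """Nested dict where matrix[a][b] is the correlation of columns a and b."""
    names = list(cols)
    return {a: {b: pearson(cols[a], cols[b]) for b in names} for a in names}

matrix = correlation_matrix(columns)
names = list(columns)
# Print the matrix as a table; a heatmap would color each of these cells.
print(" " * 10 + "".join(f"{name:>10}" for name in names))
for a in names:
    print(f"{a:>10}" + "".join(f"{matrix[a][b]:>10.2f}" for b in names))
```

The diagonal is always 1 (each column correlates perfectly with itself), and the matrix is symmetric. That is why correlation heatmaps are often drawn with only one triangle shown.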
Here are some Stack Overflow questions related to the work we did in today's session:
Plotting A Correlation Matrix With Pandas
Changing The Colors Of The Graphs In A Pairplot
Creating A Cross-Tabulation With Percentages Rather Than Frequencies
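For the last question, the idea behind a percentage cross-tabulation is to count listings for each combination of two categorical variables and then divide each count by its row total. The minimal sketch below does this with the standard library for a made-up set of (borough, has washer/dryer) pairs. The data is hypothetical. In pandas, `pd.crosstab(df["borough"], df["has_washer_dryer"], normalize="index")` returns row proportions instead of frequencies (the column names there are assumptions too).

```python
from collections import Counter

# Hypothetical (borough, has_washer_dryer) pairs, made up for illustration.
pairs = [
    ("Manhattan", True), ("Manhattan", False), ("Manhattan", False), ("Manhattan", False),
    ("Brooklyn", True), ("Brooklyn", False),
    ("Queens", True), ("Queens", True), ("Queens", False), ("Queens", False),
]

def crosstab_percent(pairs):
    """Row-normalized cross-tab: percent of each row category in each column category."""
    counts = Counter(pairs)  # frequency of each (row, column) combination
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {}
    for r in rows:
        total = sum(counts[(r, c)] for c in cols)  # row total
        table[r] = {c: 100 * counts[(r, c)] / total for c in cols}
    return table

table = crosstab_percent(pairs)
for borough, pct in table.items():
    print(borough, {k: f"{v:.1f}%" for k, v in pct.items()})
```

Normalizing by row answers "within each borough, what share of listings have a washer/dryer?" Normalizing by column would instead answer "of the listings with a washer/dryer, what share are in each borough?"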
If you enjoyed this lesson, you can catch up on the rest of the series on YouTube. If you’d like to watch a session live, follow the Codecademy YouTube channel.
Every Tuesday from now until March 2nd, we’ll be streaming a new session at 4 PM EST. You can set a reminder for the March 2nd stream here.
Finally, if you want even more stats content, you can sign up here for the interactive course this series was based on. This course was developed by Sophie and has many more quizzes, projects, and helpful nuggets than we can fit into our streams!