
Why is the Migration to Python 3 Taking So Long?



At PyCon 2014, Guido van Rossum, the creator of Python and, at the time, the Benevolent Dictator For Life of the language, stood on stage in a shirt that had a large 2.8 written on it in block letters, with a big red no-entry sign through it. “It’s time to move on to Python 3,” he said, telling the audience that they should start adopting the new version of the language into their workflows.

After many years of hard work toward that goal from the core committers and the surrounding community of library maintainers, Python 2 is finally at end of life. According to pythonclock.org, January 1, 2020, is the drop-dead date for support of Python 2.

For companies that made the change years ago, it won’t be an issue. However, there’s a whole range of companies that won’t be making the change anytime soon, for a number of reasons.

What does this change mean for companies heavily utilizing the language, particularly those who may not be ready to migrate? To understand the entire context of what’s going on, let’s take a stroll back through Python history.

A brief history of Python

The idea behind developing Python 3 was to implement a single big change that got rid of a legacy problem in Python: representing all strings as Unicode behind the scenes. As Brett Cannon, one of the core developers of Python, writes:

People sometimes forget how old Python is; Guido started coding Python in December 1989 and was first released as open source in February 1991. This means that Python itself predates the first volume of the Unicode standard which came out in October 1991. Over the intervening years, languages created after Unicode standardized chose to base their implementation for strings on encodings that could support Unicode.

….

Supporting Unicode and text from any written language is important. Python is a language for the world, not just for those languages that support the Roman alphabet that ASCII covers. This is why Python 3 makes it "Unicode or bust" when it comes to text; it guarantees that all Python 3 code will support everyone in the world whether the developer who wrote the code explicitly meant for it to or not.

Unfortunately, the team assumed that everyone would make the big switchover immediately, so they made Python 3 backwards incompatible and turned Python 2 into a maintenance branch. However, many people didn’t want to switch because, as the PEP for the improvement put it, Python 3 was “a relatively mild improvement on Python 2.” To them, switching looked like mostly an inconvenience. At that time, the largest difference was the replacement of the print statement with a print() function, which broke a lot of code.
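To make those two changes concrete, here is a minimal sketch in Python 3 of print as a function and of text kept separate from bytes. The variable and function names are only illustrative.

```python
import io

# print is an ordinary function in Python 3, so it accepts keyword arguments
# such as sep and file; here the output is captured in an in-memory buffer.
buffer = io.StringIO()
print("hello", "world", sep=", ", file=buffer)
greeting = buffer.getvalue()

# str holds Unicode code points; bytes holds the encoded form.
text = "na\u00efve caf\u00e9"   # two non-ASCII characters
data = text.encode("utf-8")     # each of them takes two bytes in UTF-8


def mixing_raises_type_error(s, b):
    """Return True if concatenating str and bytes fails, as it does in Python 3."""
    try:
        s + b
    except TypeError:
        return True
    return False
```

In Python 2 the equivalent concatenation would often succeed through an implicit ASCII conversion and only fail later on non-ASCII input, which is the class of bug the stricter split is meant to surface early.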

As a result, Python 2 continued to be in active development. In 2019, though, Python 3 has finally (mostly) become the default version of the language for new Python development, and many companies and projects are using the top features of Python 3: f-strings, pathlib, type hints, asyncio, and, of course, Unicode handling.
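For readers who haven't used these features, the following short sketch combines three of them: pathlib paths, type hints, and f-strings. The file names are made up for illustration and nothing is read from disk.

```python
from pathlib import Path
from typing import List


def describe(paths: List[Path]) -> List[str]:
    """Summarize each path with an f-string; the annotations are type hints."""
    return [f"{p.name} has suffix {p.suffix!r}" for p in paths]


# pathlib overloads the / operator to join path segments.
project = Path("src") / "app"
files = [project / "main.py", project / "README"]
summary = describe(files)
```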

A slow adoption process

It’s been a long road to get to Python 3 adoption since the new major version was announced all the way back in 2008. Dustin shows just how long adoption has taken:

At first, there were a lot of good reasons for not adopting Python 3. Most importantly, it wasn’t backwards compatible with Python 2. As a result, major libraries were hesitant to move to the platform, which became a self-fulfilling prophecy, and it was hard to port code because supporting tools were lacking (a gap eventually filled by things like 2to3 and six).
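As a rough idea of what compatibility helpers do, here is a hand-rolled sketch of a shim that lets the same source run on both major versions. It is loosely modeled on the kind of helpers a library like six offers, but it is not six's actual code, and the names are assumptions chosen for illustration.

```python
import sys

# True only when running under a Python 2 interpreter.
PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 -- this name exists only on Python 2
else:
    text_type = str


def ensure_text(value, encoding="utf-8"):
    """Return value as text on either major version, decoding bytes if needed."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected text or bytes, got %r" % type(value))
```

Code written against a shim like this avoids version-specific syntax, which is why it uses %-formatting rather than f-strings.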

The tipping point for conversion occurred somewhere around 2016 or so, after the release of Python 3.5, which featured a matrix multiplication operator, async and await syntax for asyncio, speed improvements to OrderedDict, and an implementation of type hints that brought some static language-like features to Python.
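Two of those additions can be sketched briefly: the @ operator, which dispatches to a `__matmul__` method, and coroutines written with async and await. The `Matrix` class and helper names below are hypothetical and exist only for this example; real numerical code would normally use a library such as NumPy.

```python
import asyncio
from typing import List


class Matrix:
    """A toy matrix type that supports the @ operator."""

    def __init__(self, rows: List[List[float]]) -> None:
        self.rows = rows

    def __matmul__(self, other: "Matrix") -> "Matrix":
        # Transpose the right-hand side so each column can be zipped with a row.
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in self.rows]
        )


async def double(x: int) -> int:
    await asyncio.sleep(0)  # yield control to the event loop once
    return 2 * x


async def gather_doubles(values: List[int]) -> List[int]:
    # Run the coroutines concurrently; results come back in input order.
    return list(await asyncio.gather(*(double(v) for v in values)))


product = Matrix([[1, 2], [3, 4]]) @ Matrix([[0, 1], [1, 0]])

loop = asyncio.new_event_loop()
doubled = loop.run_until_complete(gather_doubles([1, 2, 3]))
loop.close()
```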

Later versions include even more features, like f-strings and wider support for pathlib paths across the standard library. With these changes, many libraries that people use (like scikit-learn for machine learning) started their migrations to Python 3.

As more and more dependencies started upgrading, companies started moving, too.

So now that we’re close to the end, what does the cutoff of Python 2 from development mean for the ecosystem of developers dependent on it?

Judging by the state of things on the internet, you’d think that everyone had completed their migrations. In a survey from JetBrains, which makes IDEs like IntelliJ and PyCharm, 75% of individual respondents indicated that they’ve already migrated. A flurry of blog posts has shown the same. For example, Dropbox detailed their migration in the fall of 2018. Instagram migrated in 2017. Facebook started in 2014. Splunk, at the urging of their customers, also did so recently.

However, just because Python 2 is reaching end of life doesn’t mean companies will stop using it overnight. How do we know there’s still significant energy being invested into Python 2? We can check out what’s going on directly with PyPI, the Python Package Index. In 2016, the core developers behind PyPI started sending logs to Google’s BigQuery so that anyone can run SQL against them, which makes it much easier to make architectural decisions based on usage.

For example, if you want to see how many package downloads came from each Python version over the last 30 days, you can create a new project in BigQuery (the first 1 TB queried per month is free) and run the following query, which uses BigQuery’s legacy SQL dialect:

-- Count PyPI downloads per Python minor version (for example 2.7 or 3.7)
SELECT
  REGEXP_EXTRACT(details.python, r"^([^\.]+\.[^\.]+)") AS python_version,
  COUNT(*) AS download_count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -31, "day"),
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "day")
  )
GROUP BY
  python_version
ORDER BY
  download_count DESC
LIMIT 100

Even though Python 3 has been the dominant version in the community for at least a year, the latest count of individual package downloads from PyPI shows that at least 40% of all package downloads for September of 2019 came from Python 2.7. Granted, this is down from 60% at the beginning of the year, but it is still significant given that EOL is only several months away.
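A percentage like that can be derived from per-version download counts by grouping the minor versions under their major version. The sketch below shows one way to do it in Python; the function name is hypothetical and the rows are placeholder numbers, not real PyPI figures.

```python
from typing import Dict, Iterable, Optional, Tuple


def share_by_major_version(
    rows: Iterable[Tuple[Optional[str], int]]
) -> Dict[str, float]:
    """Collapse (python_version, download_count) rows into a share per major version."""
    totals: Dict[str, int] = {}
    for python_version, download_count in rows:
        if python_version is None:
            continue  # some downloads may not report a Python version
        major = python_version.split(".")[0]
        totals[major] = totals.get(major, 0) + download_count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {major: count / grand_total for major, count in totals.items()}


# Placeholder rows for illustration only; these are NOT real PyPI figures.
example_rows = [("3.7", 300), ("2.7", 400), ("3.6", 250), ("3.5", 50), (None, 10)]
shares = share_by_major_version(example_rows)
```

Note that rows without a reported version are left out of the denominator here; including them would lower both shares, so the choice should be stated alongside any number quoted from such a calculation.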

On a per-library basis, it gets a little trickier: Most Flask downloads are completed using the Python 3 version, but only 26% of botocore downloads (the AWS SDK for Python) are using Python 3.

And there are several projects that are going to hold off on the migration: Twisted, an event-driven networking framework, which has only partially been ported, and PyPy, a frequently used alternative Python implementation with a JIT compiler, which will keep version 2 around indefinitely.

End of life for any given piece of software usually doesn’t mean that the software is no longer available. It does mean that it no longer receives security patches or bug fixes. The PEP (Python Enhancement Proposal) regulating the end of life specifies:

This declaration does not guarantee that bugfix releases will be made on a regular basis, but it should enable volunteers who want to contribute bugfixes for Python 2.7 and it should satisfy vendors who still have to support Python 2 for years to come.

But there are a lot of risks in not updating to Python 3: most importantly, losing security updates, but also missing out on new features like type hints and on speed gains.

Why the adoption rate is so slow

So why aren’t we at a higher adoption rate this close to the deadline?

In a tongue-in-cheek post, I wrote that IT runs on Java 8 (which is ancient by today’s standards).

Java 8 is still the dominant development environment, according to the JVM ecosystem report of 2018.

This holds the answer: most large organizations, outside of the hype cycle of technical news posts, move much more slowly than the press or blogs would have you think. Most major banks are still running some variation of FORTRAN and COBOL under the covers, for example.

So while many companies are outlining their migration strategies, just as many, or more, will stay on Python 2 for a long time. Why is this the case?

In reading the accounts of people who have already migrated, it’s easy to see: migrating codebases takes a long time, is a highly political decision, and runs into inertia, even at the most technically sophisticated companies with the best intentions.

For example, in order to move to Python 3 at Facebook, Jason Fried started by rewriting a service in 2014. Along the way, he made a lot of mistakes, changed a lot of code, and did a lot of finagling to make it known that people were moving to Python 3 at Facebook, including getting himself into new-developer training sessions. He then teamed up with Łukasz Langa, who had done the Instagram conversion to Python 3:

In 2016, he and Langa formed a brand new team in Facebook to shepherd Python within the company, which they dubbed "The Ministry of Silly Walks." Because they were "the Python team," the "perceived authority" he mentioned earlier worked; people assumed they could make decisions about Python at Facebook.

Instagram’s move itself took 10 months. Dropbox, where Guido and Langa now work, has spent three years on its migration, which, as of Guido’s retirement several weeks ago, is still in progress. Granted, all of these are enormous codebases, but you have to wonder: if it takes that long with the top people in Python working on it, how long would it take for a regular company, maybe one where Python is not even the primary language?

In all of these cases, politics played just as important a role as technical direction.

Second, security concerns are a problem. You would assume that not upgrading would be the bigger risk, but, ironically, in larger organizations many people are not allowed to upgrade Python by themselves: the admin or security team pushes updates to them. In some cases, pip downloads are also not allowed. If Python 2 is the default agreed upon by the security team, it can take a monumental effort to convince people to make the switch to 3, particularly in heavily regulated settings such as healthcare, finance, and government.

This brings us to the third reason: inertia. Although many Linux distributions, such as RHEL, include Python 3 alongside Python 2, it is by no means the default, and bugs keep turning up when switching between 2 and 3, especially around pointers to the system version of Python, for example in Debian.
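On a machine where both versions coexist, it can help to check which interpreter each command name actually points to. The following is a small sketch, assuming it is run under Python 3; the command names are common conventions and may not all exist on a given system.

```python
import shutil
import sys
from typing import Dict, Optional, Sequence


def find_interpreters(
    names: Sequence[str] = ("python", "python2", "python3")
) -> Dict[str, Optional[str]]:
    """Map each command name to the executable it resolves to on PATH, or None."""
    return {name: shutil.which(name) for name in names}


interpreters = find_interpreters()

# The version and location of the interpreter running this script.
running = "%d.%d" % sys.version_info[:2]
current_executable = sys.executable
```

Comparing `interpreters["python"]` with `current_executable` is a quick way to spot cases where the bare `python` command still points at a Python 2 installation.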

Python has traveled a long road from 2 to 3, and individuals and forward-looking startups have adopted the new version. Now the second great migration will occur when large enterprises start their migrations away from 2. With regard to Python 2, we’ll see that 40% number shrink further in 2020, but the changes will be incremental, and there will be companies running Python 2.7 well into the future.
