Why is the Migration to Python 3 Taking So Long?
At PyCon in 2014, Guido van Rossum, the creator of Python and, at the time, the Benevolent Dictator For Life of the language, stood on stage in a shirt with a large 2.8 written on it in block letters and a big red no-entry sign through it. “It’s time to move on to Python 3,” he said, telling the audience that they should start adopting the new version of the language into their workflows.
After many years of hard work towards that goal from the core committers and the surrounding community of library maintainers, Python 2 is finally at end of life. January 1, 2020, according to pythonclock.org, is the drop-dead date for support of Python 2.
For companies that made the change years ago, it won’t be an issue. However, there’s a whole range of companies that won’t be making the change anytime soon, for a number of reasons.
What does this change mean for companies heavily utilizing the language, particularly those who may not be ready to migrate? To understand the entire context of what’s going on, let’s take a stroll back through Python history.
A brief history of Python
The idea behind developing Python 3 was to implement a single big change that got rid of a legacy problem in Python: rendering all strings as Unicode behind the scenes. As Brett Cannon, one of the core developers of Python, writes,
People sometimes forget how old Python is; Guido started coding Python in December 1989 and was first released as open source in February 1991. This means that Python itself predates the first volume of the Unicode standard which came out in October 1991. Over the intervening years, languages created after Unicode standardized chose to base their implementation for strings on encodings that could support Unicode.
Supporting Unicode and text from any written language is important. Python is a language for the world, not just for those languages that support the Roman alphabet that ASCII covers. This is why Python 3 makes it “Unicode or bust” when it comes to text; it guarantees that all Python 3 code will support everyone in the world whether the developer who wrote the code explicitly meant for it to or not.
Unfortunately, the team assumed that everyone would switch over immediately, so they made Python 3 backwards incompatible and relegated Python 2 to a maintenance branch. However, many people didn’t want to switch because, as the PEP for the improvement said, Python 3 was “a relatively mild improvement on Python 2.” To them, switching looked like mostly an inconvenience. At the time, the most visible difference was that the print statement became a function, which broke a lot of code.
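To make the two changes concrete, here is a minimal Python 3 sketch (the sample strings are made up for illustration). It shows print behaving as an ordinary function that accepts keyword arguments, and text (str) being kept apart from raw bytes:

```python
import io

# Python 3: print is an ordinary function, so it accepts keyword arguments.
buffer = io.StringIO()
print("hello", "world", sep=", ", file=buffer)
printed = buffer.getvalue()

# Python 3: str is Unicode text; bytes is a separate type.
# Escapes are used so the sample does not depend on the file's encoding.
text = "na\u00efve caf\u00e9"
encoded = text.encode("utf-8")

# len() counts code points for str, but raw bytes for bytes.
char_count = len(text)
byte_count = len(encoded)
```

In Python 2, the same print line is a syntax error unless the module opts in with from __future__ import print_function, which is part of why so much existing code broke.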
As a result, Python 2 continued to be in active development. In 2019, though, Python 3 has finally (mostly) become the default version of the language for new Python development, and many companies and projects are using the top features of Python 3: f-strings, Path, type hints, asyncio, and, of course, Unicode rendering.
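As a rough illustration of three of those features, the sketch below combines f-strings, pathlib's Path, and type hints. The function name and file names are hypothetical, chosen only for the example:

```python
from pathlib import Path
from typing import List


def describe(paths: List[Path]) -> List[str]:
    """Return one summary line per path; the type hints document the contract."""
    # f-string: expressions are interpolated directly inside the literal.
    return [f"{p.name} has suffix {p.suffix!r}" for p in paths]


# Hypothetical file names, used only for illustration.
# Path overloads "/" to join path components.
config = Path("project") / "settings" / "app.toml"
lines = describe([config, Path("README")])
```

None of this is required to write Python 3, but these are the kinds of conveniences that made staying on 2.7 feel increasingly costly.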
A slow adoption process
It’s been a long road to get to Python 3 adoption since the new major version was announced all the way back in 2008. Dustin shows just how long adoption has taken:
At first, there were a lot of good reasons for not adopting Python 3. Most importantly, it wasn’t backwards compatible with Python 2. As a result, major libraries were hesitant to move to the platform, which turned into a self-fulfilling prophecy, and it was hard to port code without supporting tools (a gap eventually filled by things like 2to3 and six).
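For a sense of what that porting work looked like, here is a minimal, standard-library-only sketch of the kind of shim that libraries such as six packaged up. The helper name to_text is hypothetical; six exposes the text type itself as six.text_type. The code is written so that it should run unchanged on Python 2.7 and Python 3:

```python
import sys

PY2 = sys.version_info[0] == 2

# Pick the text type by hand. The name `unicode` exists only on Python 2,
# and the conditional expression never evaluates it on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Decode bytes to text; convert anything else with the text type."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)
```

Multiply this sort of branching across a large codebase and its dependencies, and the hesitation of library maintainers becomes easier to understand.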
The tipping point for conversion occurred somewhere around 2016, after the release of Python 3.5, which featured a matrix multiplication operator, async/await syntax for asyncio, speed improvements to OrderedDict, and an implementation of type hints that brought some static language-like features to Python.
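Two of those 3.5 features can be sketched in a few lines. The Matrix class and the double coroutine below are hypothetical toy examples, not part of any library: the first hooks into the @ operator through __matmul__, and the second uses async/await.

```python
import asyncio


class Matrix:
    """Tiny 2-D matrix that supports the @ operator via __matmul__."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Transpose the right-hand side so columns can be iterated directly.
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )


async def double(x: int) -> int:
    """Coroutine declared with async/await syntax."""
    await asyncio.sleep(0)  # yield control to the event loop once
    return 2 * x


product = (Matrix([[1, 2], [3, 4]]) @ Matrix([[0, 1], [1, 0]])).rows

# An explicit event loop keeps the sketch close to how 3.5-era code ran coroutines.
loop = asyncio.new_event_loop()
try:
    doubled = loop.run_until_complete(double(21))
finally:
    loop.close()
```

In practice the @ operator matters mostly to numerical libraries, which is one reason the scientific Python stack had a concrete incentive to move.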
Later versions added even more, like f-strings and wider standard-library support for pathlib paths in Python 3.6. With these changes, many libraries that people use (like scikit-learn for machine learning) started their migrations to Python 3.
As more and more dependencies started upgrading, companies started moving, too.
So now that we’re close to the end, what does the cutoff of Python 2 from development mean for the ecosystem of developers dependent on it?
Judging by the state of things on the internet, you’d think that everyone had completed their migrations. In a survey from JetBrains, which makes IDEs like IntelliJ and PyCharm, 75% of individual respondents indicated that they had already migrated. A flurry of blog posts has shown the same. For example, Dropbox detailed their migration in the fall of 2018. Instagram migrated in 2017. Facebook started in 2014. Splunk, at the urging of their customers, also did so recently.
However, just because Python 2 is reaching end of life doesn’t mean companies will stop using it overnight. How do we know there’s still significant energy being invested into Python 2? We can check out what’s going on directly with PyPI, the Python Package Index. In 2016, the core developers behind PyPI started sending logs to Google’s BigQuery so that it’s possible to run SQL against them, which makes it much easier to make architectural decisions based on usage.
For example, if you want to see which libraries have been downloaded, by Python version, over the last 30 days, you can create a new project in BigQuery (the first 1 TB queried per month is free), and run:
SELECT
  REGEXP_EXTRACT(details.python, r"^([^\.]+\.[^\.]+)") as python_version,
  COUNT(*) as download_count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -31, "day"),
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "day")
  )
GROUP BY
  python_version
ORDER BY
  download_count DESC
LIMIT 100
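To see what the query's regular expression and grouping do without touching BigQuery, here is a rough Python equivalent. The function name is hypothetical and the sample version strings are made up; they are not real PyPI data. Unlike the SQL, this sketch simply skips rows with no version rather than grouping them under NULL.

```python
import re
from collections import Counter

# Same pattern as the query: keep "major.minor", drop the patch level.
VERSION_RE = re.compile(r"^([^\.]+\.[^\.]+)")


def count_by_minor_version(versions):
    """Group raw version strings by major.minor, like GROUP BY python_version."""
    counts = Counter()
    for raw in versions:
        match = VERSION_RE.match(raw or "")
        if match:
            counts[match.group(1)] += 1
    # most_common() sorts by count, like ORDER BY download_count DESC.
    return counts.most_common()


# Made-up sample rows, not real PyPI data.
sample = ["2.7.16", "3.7.4", "2.7.15", "3.6.9", "2.7.16", None]
result = count_by_minor_version(sample)
```

Each entry in result is a (version, count) pair, so dividing a count by the total gives the per-version share of downloads.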
Even though Python 3 has been the dominant version in the community for at least a year, the latest count of individual package downloads from PyPI shows that at least 40% of all package downloads in September 2019 came from Python 2.7. Granted, this is down from 60% at the beginning of the year, but it is still significant given that EOL is only several months away.
On a per-library basis, it gets a little trickier: Most Flask downloads are completed using the Python 3 version, but only 26% of botocore downloads (the AWS SDK for Python) are using Python 3.
And there are several projects that are going to hold off on the migration: Twisted, an event-driven networking framework, which has only partially been ported, and PyPy, an alternative Python implementation with a JIT compiler, which will keep version 2 around indefinitely.
End of life for any given piece of software usually doesn’t mean that the software is no longer available. It does mean that it no longer receives security patches or bug fixes. The Python PEP regulating the end of life (the language spec) specifies that,
This declaration does not guarantee that bugfix releases will be made on a regular basis, but it should enable volunteers who want to contribute bugfixes for Python 2.7 and it should satisfy vendors who still have to support Python 2 for years to come.
But there are real risks in not updating to Python 3: most importantly, losing security updates, and also missing out on speed gains and new features like type hints.
Why the adoption rate is so slow
So why aren’t we at a higher adoption rate this close to the deadline?
In a tongue-in-cheek post, I wrote that IT runs on Java 8 (which is ancient by today’s standards).
Java 8 is still the dominant development environment, according to the JVM ecosystem report of 2018.
This holds the answer: most large organizations, outside of the hype cycle of technical news posts, move much more slowly than the press or blogs would have you think. Most major banks are still running some variation of FORTRAN and COBOL under the covers, for example.
So while many companies are outlining their migration strategies, just as many, or more, will stay on Python 2 for a long time. Why is this the case?
In reading the accounts of people who have already migrated, it’s easy to see: migrating codebases takes a long time, is a highly political decision, and runs into inertia, even at the most technically advanced companies with the best intentions.
For example, to move Facebook to Python 3, Jason Fried started by rewriting a service in 2014. Along the way, he made a lot of mistakes, changed a lot of code, and did a lot of finagling to make it known that people were moving to Python 3 at Facebook, such as inserting himself into new-developer training sessions. He then teamed up with Łukasz Langa, who had done the Instagram conversion to Python 3:
In 2016, he and Langa formed a brand new team in Facebook to shepherd Python within the company, which they dubbed “The Ministry of Silly Walks.” Because they were “the Python team,” the “perceived authority” he mentioned earlier worked; people assumed they could make decisions about Python at Facebook.
Instagram’s move itself took 10 months. At Dropbox, where Guido worked until his retirement several weeks ago, the migration has taken three years and is still in progress. Granted, all of these are enormous codebases, but you have to wonder: if it takes that long with the top people in Python working on it, how long would it take for a regular company, maybe one where Python is not even the primary language?
In all of these cases, politics played just as important a role as technical direction.
Second, security concerns are a problem. You would assume that not upgrading would be the bigger risk, but, ironically, in larger organizations many people are not allowed to upgrade Python by themselves: the admin or security team pushes updates to them. In some cases, pip downloads are also not allowed. If Python 2 is the default agreed upon by the security team, it can take a monumental effort to convince people to make the switch to 3, particularly in government and in heavily regulated settings such as healthcare or finance.
This brings us to the third reason: inertia. Although many Linux distributions, such as RHEL, include Python 3 alongside Python 2, it is by no means the default. Switching between 2 and 3 also keeps surfacing bugs, especially around pointers to the system version of Python, as has happened at Debian.
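When it is unclear which interpreter a given machine will hand a script, one defensive habit is to check the running version up front and fail with a clear message. The helper below is a hypothetical sketch, not a standard API. The version_info parameter exists only so the check can be exercised without a second interpreter:

```python
import sys


def require_python(minimum=(3, 5), version_info=None):
    """Raise RuntimeError if the running interpreter is older than `minimum`."""
    current = tuple((version_info or sys.version_info)[:2])
    if current < tuple(minimum):
        # sys.executable shows which interpreter the system actually picked.
        raise RuntimeError(
            "Python %d.%d+ required, but this is %d.%d (%s)"
            % (minimum[0], minimum[1], current[0], current[1], sys.executable)
        )
    return current
```

Calling require_python() at the top of an entry point turns a confusing syntax or import error deep in the code into an immediate, readable failure.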
Python has been on a long path from 2 to 3, and individuals and forward-looking startups have adopted the new version. The second great migration will occur when large enterprises start moving away from 2. With regard to Python 2, we’ll see that 40% number shrink further in 2020, but the changes will be incremental, and there will be companies running Python 2.7 well into the future.
Tags: bulletin, python, stackoverflow
Good analysis, but you only just touched on the other major way python is installed – from OS distributions.
PyPI won’t have any data on how many times apt or rpm installs a python2 package vs. python3. All the major distributions’ current LTS versions include python2, and will continue to be supported next year.
Ehm, mercurial? (I am not sure whether they have finally at least passing test suite)
Just wanted to quickly let you know that Łukasz Langa does not work at Dropbox, nor has he ever to my knowledge.
I think you didn’t mean to refer to Jason Fried in this article (wrt the Facebook migration.)
s/Jason Fried/Jake Edge
Whadayamean? Jason Fried is exactly the right person to mention wrt naming the person that kickstarted the 2to3 migration at FB. There is no mention of a Jake Edge in the article?
My organisation has no plans to rewrite internal tools as doing so will generate no “alpha” (income) at best and at worst will most likely create new bugs.
One thing to add: The download statistics for Python 2 are not representative in the sense that modern pip versions employ much heavier local caching. As such, Python 2 downloads are overrepresented relative to their use.
Ah, here we go again
The Debian issue you link is a non-issue. It was talking about packages relying on Python 3 packages which are non-existent. The people working on Debian packages are normal human beings; they couldn’t do everything given such a short period of time.
For adoption, all companies contribute the whole team to open source in the Linux server space for necessary packages. Problem solved.
If the package is not present at the OS level, compile it yourself. People are busy solving important problems.
You want to talk about security issues? That’s funny. You can code everything with security in mind by yourself too.
Do the logs for these PyPI statistics include information about where they were downloaded from?
It might be interesting to know how much of that 40% of 2.7 downloads are for automated tests (a good indicator of which would be the download client being a Gitlab/Travis CI/etc server). I certainly leave 2.7 tests around longer than I probably should in code bases I work on. If everyone else does that too, would that skew the figure substantially?
Just a small data point here: when I write a new script at work (even if it’s just for in-house stuff), I choose a language and version that is supported on all platforms our software currently runs on. That’s RHEL 6, 7, 8. RHEL 6 doesn’t even have Python 3 in its official repos. RHEL 7 doesn’t install it by default. So if I ever want to use that script on a RHEL 6 or 7 system, I would face a battle with Python 3 that I don’t face with Python 2. And since Python 2 is good enough for most tasks, there’s no incentive to move (except that Python 2 goes EOL).
I guess the general lesson is: if you want your language to spread, make sure it’s available in the official repos of the main distros. That’s the main way into large, slow-moving companies.
This. This, is exactly what people don’t realize.
Personally, I’m just waiting for Python 4 to release so I can rewrite everything again.
What you quickly learn is just because developers put out what they perceive as the “next-best-thing-since-sliced-bread”, does not mean the user base will follow. Why? There are real-world costs to implement change and unless the changes provide a benefit to the user, by definition, the cost will always outweigh the benefit. Take for example Gtk+2 to Gtk+3 (now 4). There are improvements, but there are huge downsides and a complete lack of styles compared to Gtk+2 requiring extensive porting costs. Python 2/3 has similar costs. A balance of “if it ain’t broke — don’t fix it” against “gee look at all the new tricks it can do”…
Where is the tool that upgrades python 2 to python 3 syntax? Angular has this tool. VB has this as you upgrade it through versions 6 to 2005 to 2008 to 2013 to 2017…
Without a tool to automatically upgrade code, better linting to identify code smell, etc., why would people upgrade? The issue isn’t the companies but the lousy toolset.
2to3, it’s mentioned in this blog post
The tool is called 2to3 and it’s been included in Python 3 since it was released back in 2008.
Python 3 is an object lesson in how language designers DON’T have the option of redacting features. And changing the syntax of the ‘print’ statement (!) was one of the most invasive changes possible.
The punchline to the joke is, changing the syntax has nothing to do with Unicode adoption as such, they just took that occasion to “clean things up”.
Łukasz Langa never worked at Dropbox; he left Facebook when he moved to Poland (I know because I used to work with him at Facebook). Guido van Rossum has retired from Dropbox, see https://blog.dropbox.com/topics/company/thank-you--guido.
The Python 2.7 download stats are largely driven by CI tests for packages that are actually already Python 3 compatible, but still need to validate that things still work on Python 2, plus a certain number of packages that are still Python 2 only. It is really hard to tease apart CI downloads from actual use, but the Python 3 side of the equation doesn’t have a single 3.x version that everyone tests against like on the 2.x side of the scales.
> So why aren’t we at a higher adoption rate?
Because Python 3 sucks.
The responsible developers could have left the traditional print-statement without brackets in. They still can put it in again. It won’t take them long to write the necessary interpreter code at all.
But they don’t. Why? Probably because they say: “I have decided, that people all over the world have to write millions of useless brackets again, and I will keep that decision forever, because I made it, and I’m the greatest”.
So you can do with your Python 3 whatever you want. I won’t use it, ever. And there are more people like me. We are many. That’s why Python 2.7 won’t die, no matter what you people declare.
Adding brackets isn’t a problem just because you have to type the 2 extra brackets. It is a problem because it is not backward compatible. BTW, I like the bracketed version of print and there are many like me.
A refusal to upgrade because you have to type some brackets is the most childish thing I’ve ever heard professionally.
On Fedora 30, python3 is the default. So for about a year you’ve been able to get Python 3 as the default in an OS.
As long as the burden of rewriting and re-testing half a million lines of code from Python 2.7 to Python 3.something is much larger than the burden of keeping a repository for the Python 2.7 source on a central server, nobody in our company will even start thinking about migration. There is a lot of other work that has direct business impact.
And since we don’t use PyPI at all, that wouldn’t affect your statistics. (And even if it did, how much does one download to a local server count for, when it is distributed from there to many hundreds of other computers?)
So your statistics are nice, but inaccurate.
There is something to be said for having a script that runs on every version of an OS for 15 years without modification.
MacOS X has bundled python2 since forever, and there are still parts of the OS using the same v2 scripts since Tiger in 2004, even though Apple itself has deprecated it! Catalina now comes with python3, though future versions will have no bundled python, requiring the user and devs to bundle or install their own.