EuroPython 2018: PyPI: Past, Present and Future
Nicole Harris is the lead designer and HTML/CSS developer on the Warehouse project - the new codebase powering the Python Package Index (PyPI).
PyPI: The Ministry of Installation
The Python Packaging Working Group is a group that seeks funding for PyPI, pip, and virtualenv, as well as community funds. There's also the Python Packaging Authority (PyPA), which decides on the future of Python packaging.
PyPI is THE place for Python programmers to publicly share code, so that other people can use that code, as supported and recommended by the PSF. This is the place you use by default when using pip.
PyPI served 11.2 billion HTTP requests last month, which is an immense load. In addition, there is a web interface at pypi.org, which had about another 12 million visitors last month.
The infrastructure costs about 118000 USD/month, or 1.4 million USD/year, which comes out of donations.
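To put these figures in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: the 30-day month is an assumption, and the variable names are made up for illustration. Only the request count and the monthly cost come from the talk.

```python
# Figures quoted in the talk
requests_per_month = 11.2e9     # HTTP requests served last month
cost_per_month_usd = 118_000    # infrastructure cost per month

# Assumption: a 30-day month, used only to get a rough average
seconds_per_month = 30 * 24 * 60 * 60

# Average request rate; real traffic will have peaks well above this
requests_per_second = requests_per_month / seconds_per_month

# Yearly infrastructure cost, which the talk rounds to 1.4 million USD
cost_per_year_usd = cost_per_month_usd * 12

print(f"about {requests_per_second:,.0f} requests per second on average")
print(f"about {cost_per_year_usd:,} USD per year")
```

So the index handles a few thousand requests every second, averaged over the whole month.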
History of PyPI
In the early 2000s, we rocked Tamagotchi and Hotmail, and the PSF was founded in 2001. Many indexes started sprouting up; one of the first was the Vaults of Parnassus.
In PEP 301, a central package index server was proposed and then implemented. Check the web archive for early 2003 versions of PyPI (273 projects!). This version was not directly searchable, but you could browse tags. You could then read the README and figure out where the project was actually hosted and how to get it. This was also the time of the first PyCon (200 attendees), so the community was kicked off.
Then in 2004, we got Easy Install, which tried to automate the "follow links and download sources" workflow.
In 2005 we got file uploads to the index server, so that nobody had to host their own sources anymore. In 2007 the PyPI web interface reached a state it kept for about 10 years. But that's not to say nothing happened behind the scenes: Python grew much more popular, so people had to work to scale the system, keep up with spam, malicious attacks, and put out fires everywhere.
Before 2012, everything was hosted on Dinsdale, a single hard disk. In 2013, DRBD was used for hard drive replication, and a CDN (Fastly) was added. Then in 2014, the system was moved to Rackspace with GlusterFS, a cluster file system.
But the PyPI code was still difficult to maintain. Naturally, it predates most of what is on PyPI, including any web frameworks. This also made it hard to set up and hard to find contributors for, and it was an overall hurdle. As a result, there was no significant new feature development, and the bus factor was poor.
Donald Stufft created crate.io in 2011 but shut it down again in 2014. He built a lot of proofs of concept and started efforts to make PyPI more maintainable. Finally, in 2015, Warehouse stuck. In June 2015, Nicole got involved as a designer.
Warehouse
Warehouse uses modern tech (Pyramid etc.) and modern tools (Docker, continuous deployment). It's more stable and more secure, it has a far better user experience, and it is much easier to contribute to. Contributors are looked after and made welcome.
Despite optimistic estimates, the work took some time: there was a lot to do, there was no project management, and the old PyPI still required care. Then in 2017, the Mozilla Open Source Support (MOSS) program awarded 170000 USD to work on this project, bring it up to feature parity, and put it in the place of the old PyPI. This made it possible to have a team of six people working on the project for five months.
Work included authentication workflows, account administration, management of projects, releases, and files, as well as UI improvements, bug fixes, and documentation. This will help introduce newcomers to the Python community. There was also time for an infrastructure overhaul. They merged 425 PRs, closed 302 issues, and supported 26 new contributors.
They came in on time, on spec, and on budget.
On March 26th, 2018, the beta was launched, and by April 30th the old code base was shut down. Yay.
Features
- Markdown support: search for 'markdown description' to see how to enable it in an example project
- Vastly improved search via Elasticsearch
- It's fully responsive on mobile, including administration tasks
- Lots of help resources were introduced
- An easily visible history of each package
- For the crew: scalable, extendable, usable, maintainable.
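For the Markdown support mentioned in the list above, the following is a minimal sketch of the packaging metadata involved, assuming a setuptools-based project. `long_description_content_type` is the setuptools keyword that declares the README format. The project name, version, and the `build_metadata` helper are hypothetical and only serve the illustration, and you will likely need reasonably recent versions of setuptools and your upload tool for the field to be passed on.

```python
from pathlib import Path


def build_metadata(readme_path="README.md"):
    """Return setup() keyword arguments for a project with a Markdown README.

    Hypothetical helper for illustration; it is not part of setuptools.
    """
    readme = Path(readme_path)
    # Use the README as the long description if it exists, else fall back to ""
    long_description = readme.read_text(encoding="utf-8") if readme.exists() else ""
    return {
        "name": "example-project",  # hypothetical project name
        "version": "0.1.0",         # hypothetical version
        "long_description": long_description,
        # Declares that the description should be rendered as Markdown
        "long_description_content_type": "text/markdown",
    }


# In a real setup.py you would finish with something like:
#   from setuptools import setup
#   setup(**build_metadata())
```

Without the content type, the description is typically treated as reStructuredText, so a Markdown README may show up unrendered.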
Next up?
- Accessibility improvements: there was an audit, and its findings need to be addressed.
- Localisation and internationalisation
- Design research, studies, and UX improvements
- Two-factor authentication (a spec is there already)
- Improving the audit trail on projects, e.g. for releases, deletion, team management
But how do we get there? First we need to figure out how much we care about PyPI. Can we carry the maintenance costs? The development costs? The hosting costs? There are really only a handful of people to maintain and improve PyPI currently. How should PyPI evolve to meet the changing needs of the Python community? Are we prepared to allow commercial interests to fill the gap?
How can you help?
- Verify your email! (to reduce spam ratings)
- Engage in PyPA IRC and issue trackers
- Contribute on GitHub
- Sprint! There are tagged issues for sprints
- Donate at donate.pypi.org
- Thank the sponsors: Mozilla, Rackspace,