Glyph – Shipping Software To Users With Python – PyCon 2016

[applause] Thanks for coming, everybody. So, I’m Glyph.
There are all of the identifiers that you can find me
on the Internet with. I work at Rackspace
on open source research, including maintaining my main
claim to fame, which is Twisted, an event-driven
networking system for Python. And today, I am here
to talk to you about how to ship software
to users with Python. Everybody feeling pretty good?
Pretty excited about PyCon? Yeah! All right. Well, don’t worry —
if you are feeling good, I am definitely going to
fix that with this topic, which you can see
from the alternate title: “distutils:
A Tragedy in Three Acts”. [laughter] So what am I going to
tell you in this talk? The fact that my distutils joke
got a laugh, and I put that in the speaker notes
so I knew it was a certainty, is an important hint. People feel like getting
Python software installed is really difficult
and they want to know how to do it. I want to communicate
two important things here. First of all, it’s mostly
just confusing, not difficult, to get software shipped
with Python, if you know how. And please, for the love
of all that is good, do not rewrite
your entire application in Go because you couldn’t
figure out distutils. Do not rewrite your entire
Python application in Go just because you saw
one weird error message when you were trying to make
an EXE for your Windows users. However, the second thing
I want to get across is that while the problems aren’t
insurmountable, there are problems. We should aspire to do
much better as a community, and we can do a lot better
without much work. So who’s “we,”
if we have work to do? There are three audiences
for this talk and I’m going to ask
each of you to do some stuff. The first audience,
which is the position that I most often find myself in,
is library developers. For the purposes of this talk,
library developers are people who make stuff
and upload it to PyPI. I’m going to ask you
to build some wheels for all the platforms you support
and put them on PyPI, and I’m going to tell you
why you should. And I’m going to be honest, this
crowd has it the easiest right now. The tooling in this part
of the ecosystem has improved tremendously
in recent years. The second audience, which is
probably the biggest audience, is application developers. Application developers are people
with some code in Python that they want to either run
in production themselves or ship to Python developers
for them to run as part of their toolchain. What I’m going to ask of you is that you stop making
your users care about Python. I’ll tell you why
you should want to do that, how you can do that today,
and what general conceptual knowledge you can use
to do it better in the future, assuming that our third audience
listens to me. But before I move on
to that audience: broadly speaking, there are two types
of application developers. There are service developers,
who write code that runs on computers that they
themselves control. Given the current Python
ecosystem, the majority of us are service developers
at this point. The second type is of course
client application developers, who write code that runs on computers
owned and controlled by end users. Although I’m going to talk a lot
about server-side tooling, because that’s what’s
the most mature at this point, one thing I’m hoping
to convince you of is that more of us should
become client-side developers since there’s some pretty cool
stuff going on there and it’s very nearly
within our reach. The third audience
is distribution tool developers. I’m not talking about
the people who make pip and PyPI. I know Donald. If I want him
to do something, I just ask on IRC. I don’t have to, like,
do a 45-minute Python presentation. I’m talking about the authors
of tools that distribute for client-side
applications, mostly. Distribution tool developers
are the people who make the tools that convert the code that application developers
and library developers write into artifacts that can be
installed on the user’s platform, whether that platform is servers
or mobile or whatever. Let’s also define some terms. In an operating system,
like Debian or Red Hat, a package is a collection of files
with a name and a version in an archive
that get installed together. They can be installed with a tool
that comes with the operating system. Packages can also refer
to other packages, depend on them, recommend them, and so on. In Python, though, just to get us
off to a really good start in getting our applications
over to users, the word “package”
is differently defined and refers to the collection
of files within a namespace. In Python, the kind of thing
that you would call a package in an operating system
is called a “distribution” — specifically,
a distutils distribution. This distinction in terminology
is important because knowing what you need
to do with packages versus knowing what you need
to do with distributions becomes very important
when you’re using certain types of packaging tools. And of course,
for maximum clarity, the institution set up to deal
with distributions in Python is called
the Python Packaging Authority. So, to a programmer, a distutils
distribution looks like a directory containing a bunch of Python code
next to a setup.py which describes that code and adds metadata
to it like name and version. I am not going to explain how
to write a setup.py in this talk, partially because we don’t really
have enough time for a full tutorial on all of the different ways
that setup.py can be composed, but also because
this is the sort of thing you can learn just by reading
the packaging documentation. To someone preparing to distribute
some Python code, though, either users or other programmers,
a distutils distribution may come in lots of other forms,
including the — and most especially a wheel,
which is an archive that contains the code
in a nearly ready-to-use form. And I say “nearly,”
and that’s important. Distributions can be described
with a requirement which gives a name
and optionally a version. A requirement which typically
goes in a requirements.txt looks like this.
This requirement says, “I need Twisted,
exactly version 16.1.1.” This is not true, of course;
you could use any version of Twisted, but that’s how you would describe it.
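The slide itself isn’t reproduced in this transcript, but the requirement being described is just one line of a requirements.txt, along these lines:

    Twisted==16.1.1

A looser specifier, say Twisted>=16.0, is how you would say that any reasonably recent version will do.
So, let me begin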
our story now in a dream. What I’m going to tell you here
is how things should be. Let me set the scene for you. A ten-year-old programmer
is seated at a laptop typing away
at their kitchen table. They’re making a little game
for their friends. Of course, just like you do
when you want to do this, they pop open pip.app; they decide
what environment they want to work in; they click the
“New Environment…” button because they’re going to be
working on a new project; they make a game
in that environment; they create a new distribution;
they specify the version number; they go write some Python code,
which we all know how to do, so I’m going to skip that part
in the interest of time; and finally, they export their
Python distribution as an application. Of course, their friends might not
have the same kind of computer that they do, so they can
export their application as many different types of script. Of course, you’re all
probably wondering where you can get
this miraculous tool. It’s just an idea. That doesn’t
exist anywhere. It’s a dream. But this is where we want to be: seamless publishing of applications
to multiple platforms from a simple user interface
that you can understand. In this act,
I’ve just described my wish for a tool that could solve
all these problems for us and shown what it might look like, but now let us descend
into the desert of the real. As I said when I was defining
my audience earlier, the most popular way to ship
a Python application today is to deploy it
on a Unix-like application — er, sorry,
a Unix-like server of some kind. So what I’m going to tell you now
is the right way to produce artifacts
for deployment in the cloud. Once again, not going to tell you
how to write the code for the cloud. You’ve got Flask and Django
and Twisted and whatever, so you can take care of that part. Once you’ve written
some Python code and you’ve put it in a distribution
by writing a setup.py, now it’s time to get the
distribution installed somewhere. So where do you install
a distribution? When you install any Python code, you install it
into a Python environment. A Python environment is a collection
of file system locations where you can put files that will later be found
by an import statement in some Python code that’s being
executed in that environment. The default Python environment
is sort of the intersection of a few different things: the system standard
library directory, the system site packages directory,
the user packages directory, and directories on the
PYTHONPATH environment variable.
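If you ever want to see exactly what the environment you are running in consists of, the interpreter itself will tell you; these are standard commands, nothing specific to this talk:

    # print every location this environment will search for imports
    python -c "import sys; print(sys.path)"
    # show site-packages plus the per-user packages directory
    python -m site

In the bad old days, PYTHONPATH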
as an environment variable was used quite extensively to glom
together multiple code locations. But in the modern Python ecosystem, that is thankfully
no longer necessary. In fact, it’s usually considered
quite a bad idea, so don’t do that. You shouldn’t be deploying
Python code with PYTHONPATH. And speaking of things
from the bad old days, you should never install code
into the system Python environment. [applause] Did not see that coming,
but awesome. I’m glad that message
is getting out there. So apparently for many of you
this will be old hat, but any Python code installed
into your operating system is going to expect your
operating system versions of code. This means that if you ever
sudo pip install a package, you’ve potentially broken
your entire operating system. Your OS vendor has hopefully,
although not really, done integration testing across
all of the packages that they ship. But when you start making
import MyLib do something arbitrarily different,
you are now responsible for testing every single tool
in /usr/bin that might ever import MyLib. In addition, if you start writing
stuff into system Python directories, you are creating files,
but those files don’t have matching database entries in your
system package manager’s database. So not only have you created
a Frankenstein package environment in the import namespace for Python,
you’ve also potentially broken the ability to upgrade,
and ruined any checks that the operating system
has in place for integrity. So once again, never install
anything with sudo pip install. Never tell users that they should
use sudo pip install, because you’re potentially
overwriting files from your platform with a tool that doesn’t know
how your platform is managed. Another thing you should avoid doing
is invoking setup.py install directly without going through pip. If you do use setup.py install, it won’t make a record
of what files it created. It won’t necessarily use setuptools;
it’s pip that forces setuptools to be loaded before it loads your setup.py,
even if that setup.py just imports distutils. That means your version information
may be inaccurate or absent. It might not be
in the right format. So, there’s also no such thing
as setup.py uninstall, and if you install
with setup.py, pip won’t know what files you’ve installed,
so it can’t uninstall them. One way you can create
your own Python environment is that you can compile your own
version of Python from scratch. This might seem extreme, but in many cases
it actually makes sense. If you build your own Python,
you can bundle up that whole directory
and be fairly sure that it’ll work, but only relatively sure;
your build of Python might accidentally depend on stuff
from the operating system and you would need to discover
which things those are. The other problem with building
your own Python environment is that it might be
a little heavyweight. A Python build is approximately
a hundred megabytes of overhead if you include all of the stuff
that gets generated and is included in the source.
In our modern era of terabyte hard drives,
this isn’t prohibitive for a final artifact,
but for development, where you might be working on
two or three different projects, each of which might be
using a dozen or so tools, that’s more than a gigabyte
of overhead, not to mention the expense in time,
of compiling an entire Python VM any time you want
to change anything. So this brings up a popular tool for
addressing this problem, virtualenv. One of the reasons I started off
with a long description of what a Python environment is is virtualenv is one of the
worst-named projects out there. Nobody gets out of bed
their first day of learning how to use Python and says,
“I need a virtualenv.” Virtualenv creates
lightweight Python environments by sharing most of the common bits,
most of that hundred megabytes, with your system’s
Python installation. So this offers
a 90% reduction in size. A single virtualenv has the
overhead of about ten megabytes, which means you can have
ten times as many. Perhaps more significantly,
it takes only about a second rather than minutes
to produce a new virtualenv, so it’s something you can easily do
over and over again while you’re debugging.
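As a concrete sketch, with an illustrative project name, the whole cycle is a couple of commands:

    # create a fresh, disposable environment and use it without activating anything
    virtualenv ~/envs/scratch
    ~/envs/scratch/bin/pip install -r requirements.txt
    ~/envs/scratch/bin/python -m myproject   # "myproject" is a placeholder

So, is virtualenv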
what you should use to distribute software
to all your end users? Sadly, no. Virtualenv
achieves these optimizations by creating an incomplete
environment which points it at and shares resources with
your main Python environment. So you can’t move a virtualenv
between computers without moving the entire
main Python environment at the same time,
which can be extremely tricky. But this is a qualified “no.” There are cases where virtualenv
is a totally reasonable solution, as long as you build the virtualenv
on the target machine where the software
is being deployed. If your user is a server
that you need to ship code to for deployment, then your deployment target
isn’t really the target OS, it’s a target Python environment
on that server. You have to prepare that server
by ensuring it has a Python environment ready. Once you’ve done that,
you can make sure that virtualenv is part
of that root environment, ideally by using
system package tools, although as a last resort,
you could just use pip to install virtualenv
into root’s home directory, if that’s what it comes to. Then you need to create
and populate a virtualenv for your application
on that server. This can be done
very fast and reliably starting with a requirements.txt. If you use the same requirements.txt
in build and production, you can have a high degree
of confidence that the environment on the server will look very much
like the one you used in development, and you can create
the environment on your build — you can create all the resources
you need to construct the environment on your build server,
then construct the environment itself on the deployment server with
virtualenv and populate it with pip. Once you’ve done that, though,
a requirements.txt file isn’t quite enough,
for one major reason. It’s fine to have C code
as part of your project, but when you’re done
building your code and it’s time to install it,
you really don’t want to have a C compiler on the server. Having a C compiler on your server
is bad because it means you get potentially different compiled
bits in each deployment. [laughter] A couple Russian speakers,
I guess, in the audience? There’s a principle in statistics
and systems theory — this is a real thing — it’s called
the Anna Karenina principle. In Tolstoy’s book from which
the principle takes its name, he formulates it as,
“Happy families are all alike. “Every unhappy family
is unhappy in its own way.” This is also true of servers:
every happy server is the same, but every unhappy server
is unhappy in its own way. If you think about administering
a server in production, have you ever had
a pleasant surprise when you found something unusual? [laughter] This is what
that surprise looks like. [laughter] These differences can make —
between different build environments on different servers can make
debugging really difficult. Furthermore, this kind of issue, when it crops up,
it’s cropping up in C, and now you’re a C developer
and not a Python developer, and this is not C Con. If you need to invoke
a C compiler on your server, it also slows down deployments,
sometimes dramatically. Although running compiles on all
of your servers at the same time does parallelize,
it means that you add a fixed amount of latency
to your entire build pipeline. There’s also the fact
that development tooling can considerably increase the size
of each virtual machine or container. The basic set of Debian packages
you need to compile almost any C module
on a Debian-derived system such as Ubuntu (python-dev,
libffi-dev, libssl-dev) takes up about 180 megabytes
of install space and requires twice that to unpack
and get installed in the first place. This means you need
half a gigabyte of space just to get ready
to install your application, before the first part of it
is even installed. That’s a lot of overhead,
even more so in the container-oriented world
we are moving towards where each image
for a small service is expected to cram
into a few tens of megabytes, already a tall order for the
hundred-megabyte Python interpreter. I hope I’ve made the case
that you don’t want a build toolchain on your production servers. This means that you don’t really
want to ship source packages to your production machines
for installing. Even if some of those source packages
only include pure Python code, the fact remains that the format
of a source archive could always contain extension modules,
and it’s better not to take that chance. And if it helps you to remember this,
just remember that when you see a tool for building in the right place,
it just looks like a tool. But when you see a tool
for building in the wrong place, it raises ominous questions
like, “Why is that there?” [laughter] Before we get around
to what you should do, there’s another thing that you
shouldn’t do in production which is to do any network I/O. You should be able to push
your code into production as a whole unit
without having to worry that your production machines
are then going to need to go back out to the Internet
to retrieve more information. Why don’t you want your install
process to do network I/O? Well, the main reason
is reliability. You want to ensure that your
deploys are all or nothing. Of course, building
software is complicated: it might fail, networks are flaky,
and copying the code to the production machines
might fail in the first place, but once everything
has made it all the way there, you don’t want any special checks
and retries or halfway states between “it’s deployed”
and “it’s not deployed.” If one of your dependencies
goes missing due to a temporary network outage,
you don’t want your service to be unable to start up again
until that retry finally succeeds. You also want to be able to restrict
your networks as much as possible. If you’ve got two internal services
that are part of an application which should only be able
to talk to each other, you should be able to cut off
their network access entirely once the packages are
transferred to those machines. Not everything needs to be
hardened quite this much, but then again,
not many applications really have a need
to talk to PyPI at runtime. So this should be
the sort of thing you can leave up to your network administrator. There’s also the question
of availability. Now, PyPI hardly ever
goes down these days. In fact, even to take this screenshot,
I had to just make a fake 404 because there’s nowhere
that PyPI’s down from. But your network connectivity
to PyPI is probably less reliable than the service itself,
and you should be able to complete your deploys
even if you have a routing problem. So when you build
your application, which should be
a distutils distribution, and all the libraries
that it depends on, each of which should be
a distutils distribution, you want to build each one
into something that contains your Python code and any
compiled extension modules in such a form that it’s
going to put the files into place and not do anything dynamic
at installation time. Does the Python packaging
ecosystem have such a thing? Why, yes it does: wheels. A wheel is the representation
of a single Python distribution, which can be installed on
a platform or set of platforms. It includes in its name metadata about which platforms
it’s suitable for. When I say “platform” here,
I’m describing a general concept which can be either a Python version
(which could be 2, 3, or both), a specific runtime
(CPython, PyPy, or either), a set of operating systems
(OS X, Linux, or Windows) and so on. Now when pip is presented
with a repository containing multiple wheels that
represent the same distribution, it can figure out which one
it needs based on that metadata and get it installed for you. So if you’re a library developer, making wheels and uploading them
shouldn’t be too hard. Mostly you just need to run
this exact code.
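The exact commands on the slide aren’t reproduced in this transcript, but assuming a setuptools-based setup.py and a PyPI account already configured, the gist is:

    # from the project directory, next to setup.py
    python setup.py bdist_wheel
    twine upload dist/*

This creates a single wheel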
for your library for the platform that you are running this build on,
and then uploads it to PyPI. However — there’s no way to make
this asterisk big enough — it’s not quite that easy or everyone
would have done it already. But if you’re a library developer,
please do this for all major platforms: Windows, Mac, and manylinux1. While you probably already know
how to deal with any C libraries you’re developing against
as a library developer, your users don’t. I don’t have time to cover
all the nuances of that process, so you’ll have to stay for Paul
Kehrer’s talk right after this one. It’s really kind of a non-optional
extension of the same topic. But I will say
that you have two choices: either you deal with the subtle
nuances of any C code you’re using, or you force every single
developer who uses your library and then probably most of those
developer’s users as well to deal with those problems. If you’re a service application
developer, it’s even easier, assuming that the library
developers have done their jobs and built wheels for you
and uploaded them to PyPI. If not, you’ll want to stay
for Paul’s talk too. You need to have a requirements.txt
which describes the exact versions of everything
that your application needs. If you are security-conscious,
you should also put hashes of all of those things
into your requirements.txt, and you can go back and watch
Ying Li’s talk from yesterday to see how to do that with some
pretty cool signing technology from Docker.
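For reference, a hash-pinned requirements.txt entry (supported since pip 8) looks roughly like this; the digest is a placeholder:

    Twisted==16.1.1 --hash=sha256:<digest of the exact wheel or sdist you vetted>

Installing with pip install --require-hashes -r requirements.txt then refuses anything that isn’t pinned and hashed.
So once you’ve got that, though,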
you just need to run this. You need to run pip wheel
with your requirements.txt, and it will output your built wheels
into a directory of your choosing.
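A minimal version of that step, assuming the requirements.txt from above, is:

    # on the build host: build, or just download where binary wheels already exist,
    # a wheel for every dependency into one directory
    pip wheel -r requirements.txt --wheel-dir ./wheelhouse

The best part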
for application developers is that the more
of your dependency vendors have uploaded binary wheels,
the less work this step is. If everybody’s done their job,
then this just downloads a bunch of wheels
and compiles your one thing. I’m going to leave it to you
as an exercise for the listener as to how to best get these wheels
from your build host to your production host.
There are a number of different ways. That’s at least two or three
different talks in its own right. But once you’ve done so, all you need to do
is create an environment, tell pip to install only
the wheels you just built, make a virtualenv and run pip
to install the wheels into it. So this here shows
a zero downtime blue-green deploy that’s — hopefully you’ve got
a load balancer in front of it. But this is how you make
a new virtualenv, install stuff into it,
and then blow away the old one.
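The slide isn’t included in the transcript; a rough equivalent, with made-up paths and service names, looks like this on the deployment host:

    # everything installs from the wheelhouse you shipped over; no index, no compiler
    virtualenv /srv/myapp/env-new
    /srv/myapp/env-new/bin/pip install --no-index \
        --find-links=/srv/myapp/wheelhouse -r requirements.txt
    # point the service at env-new, restart it, and only then delete the old environment
    rm -rf /srv/myapp/env-old

To recap the workflow, to tell you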
what I just told you again, in order to get a Python service
into production, what we need to do is describe the
service as a distutils distribution, saying what code it contains
and a requirements.txt that carefully describes
what each version of its dependencies
it’s going to use; build the software
and all of its dependencies into wheels
for our target platform; and finally, ship the software
to the production system where we install the software
into a virtualenv with pip. I don’t want to talk too much
about containerization in this talk, but at this point, it’s important
to correct a common misconception, and this misconception was reflected
in a talk just yesterday, so apparently
it’s still out there. Docker is great,
you should definitely use it, but many people in their exuberance
hear about Docker and assume that this workflow I just described
no longer applies to them. “It’s a container, it’s isolated.
Root can’t break the operating system. “Just ship images everywhere.
Sudo pip install everything. “You don’t need virtualenv,
you don’t need wheels. “Everything can just run
in one big container.” Containers do improve
this situation somewhat, and there’s less
that you can break. They certainly make it
harder to make mistakes. But, fundamentally,
the same general rules hold because, fundamentally,
the thing you’re running in the container
is a Linux distribution. Instead of “you don’t want
to have build tools installed “on your production host,”
it’s “you don’t want to have “build tools installed
in your service containers.” And instead of
“you don’t want to use sudo pip “to break your operating system,”
it’s “you don’t want to use “sudo pip to break your base image.” This last point bears repeating, I think. Debian and Red Hat are fantastically
complex engineering projects integrating billions
of lines of C code. For example, you can just
apt install libavcodec, or yum install ffmpeg.
Writing a working build system for one of those things
is a PhD thesis. They integrate thousands
of Python packages simultaneously into one working environment.
They don’t always tell you whether their tools
use Python or not. And so, you might
want to docker exec some tools inside a container.
They might be written in Python. If you sudo pip installed
your application in there, now it’s all broken.
You can’t use any of those tools. So even in containers,
isolate your application code from the system’s Python tooling,
or build all of your containers using from scratch without
the benefit of apt or yum. And if you do the latter,
that may help you appreciate all that stuff I just said
about being hard. So far the story is pretty good.
We can build software repeatably, we can ship it to production,
and we can assemble it into a working system
on the deployment host without any build tools
except for pip. However, you’ll notice
that I’ve been talking almost exclusively
about server-side developers. So, in this section,
hopefully I’ve covered all the best practices
for server-side deployment. As we turn our eye to shipping
Python software to end users, though, this is where things
begin to fall apart. In this segment, I’m going to tell you
about the sometimes rocky road to deploying to end user
devices in Python. The problem begins
with the caveat I gave at the end of the last section,
“except for pip,” because pip itself
is a tool written in Python, which means that if you just upload
your software to PyPI for end users, you have to tell your users
to get a working Python and a working pip
before they can install anything. Many projects tell their end users
to pip install various things, so this is a common anti-pattern.
Don’t do that. Let’s assume that getting a working
Python on your system is trivial. It’s already there on OS X.
It’s on — on Windows, let’s pretend that every user
can instantly intuit the difference between Python 2
and 3, download the right thing, figure out how to start up cmd.exe,
or maybe install Git Bash. Let’s also assume that pip
is installed correctly. If you just pip install
after having done that, you will get a “permission denied”
error because you aren’t root. It’s going to try to write
to a directory owned by root. If you have the option to sudo pip
install, which you might not, congratulations, you just
broke your operating system, like I just spent
many minutes describing. There is an issue in pip
to fix this, ticket 1668. It was opened in March of 2014,
has over 100 comments on it, a half a dozen open dependencies,
and it shows no sign of being fixed anytime soon. But even if that issue were fixed, or you somehow in advance
can convince users to do pip install --user
to put their — to put your application
into their home directory, and your application —
and we’re still assuming here that your application’s, like,
a nerdy command-line thing — is generally documented as
a command-line, like, awesome app. And pip install --user
will put the script in a place that isn’t available
on their command line by default. In order to fix that problem,
now we have to tell the user to learn how to configure
their shell. We all like Python.
We like Python because we get to write code in Python.
Writing code in Python’s fun. But users do not like Python.
Users don’t like Python because users by definition
are not writing code in Python. They’re just trying to run
our code that’s written in Python. But instead of our code actually
running on their systems, their phones, their desktops,
their servers, instead, if they try to just
follow the instructions they find on the web that say
“sudo pip install everything,” they’re going to run into
error after error. So for us, Python is a language that
lets us do amazing and fun things, but for our users
it is a hollow, imperious voice telling them that their system
has been judged and found wanting. It is telling them
that they are unworthy to run the software
that they think is so cool because they do not have
the all-important vcvarsall.bat. [laughter] So now that I’ve dug this hole for us,
it’s time to claw our way out of it. Think for a second about
how you get software when you actually want
to accomplish a task other than programming in Python. You might download it
from an app store, or you might go to somebody’s website
and double-click an installer, or you might use a package manager
like apt, dnf, or brew. And as a brief aside here, I’d like to just talk about
the Go build toolchain. Go has a great build toolchain.
It produces a single file executable. That’s great, and that’s a much
simpler jumping-off point for what we’re trying to get to
here than what Python offers us, but it’s not the full story. The problem is
that one file isn’t enough. Every app store and package
manager requires an archive of many files to install, usually mandating at least
a few pieces of metadata. If you want your software
to be automatically updated, if you want it to have a GUI,
if you want it to generally be a thing that you would use,
you don’t get most of your software by just curling some random URL
chmod a+x and then you’re good to go. You get it out of these systems
that have multiple pieces of metadata. So, there have been
a lot of efforts to prioritize single file executables
coming out of Python, and while those efforts are cool
and people should work on them by all means,
we don’t need to do that and we shouldn’t really
prioritize it in the service of getting an easier
deployment workflow. We should target these stores,
these package managers, these operating systems
as our deployment targets because that’s where these
executables are going to go anyway. So before we get into
more self-contained options, for your more nerdier users,
you might want to consider the benefits of integrating
with system package managers. After all, if brew install
is good enough for you, maybe it’s good enough
for your users who are like you. Obviously this is not
a great way to get started on mass market application,
but HomeBrew can automate ensuring that you have
the other pieces in place that you need
for your application to work. So, to get started on this,
if you want to build something that can be brew installed,
you can take a look at the docker-compose formula
which is already available in HomeBrew. And there’s a pretty straightforward
collection of Python packages that are all already hosted on PyPI, and it takes something which is
sort of like a requirements.txt and converts it into
a usable command-line tool. It’s kind of odd
in that it uses a combination of a launcher shell script
and PYTHONPATH to achieve partial isolation
from your system Python, but it is a really straightforward
example which gets the job done. However, there’s still no simple
automated way to get from “I have a Python tool available
in the distutils distribution” to “I have a pull request
to open in HomeBrew,” so that’s something that an enterprising
distribution developer could work on. But there are tools that go
from a Python source distribution to an operating system package.
One tool from Spotify can convert a Python package
into a virtualenv that is created
as part of a Debian package. This is the best of both worlds.
You can participate in Debian’s package database,
which, in the class of sort of client
Python distribution, if a desktop Debian user
is using your thing, you need to install desktop files and other bits
of operating system metadata, yet you can still
isolate your dependencies so they don’t conflict
with Debian’s Python environment. There’s also one built into Python
called bdist_rpm which converts your distribution
into a Red Hat package.
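(The unnamed Spotify tool is, as far as I can tell, dh-virtualenv, which is driven by an ordinary debian/ packaging directory rather than a single command.) The bdist_rpm path really is one command, run next to setup.py:

    python setup.py bdist_rpm    # leaves .rpm artifacts under dist/

Much as with Homebrew, however,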
neither of these give us a place to upload rpms
or Debian packages so that users can simply
apt install or dnf install. One gap that could really
easily be filled here would be to have twine
or a tool like it start to automate the part
where you create a package archive for users to actually
get the package installed from. But enough with nerds.
We’re mostly muddling along OK with our Python installations anyway.
So what about regular people? Let’s say you’ve gone to the trouble
to write a GUI application. How do you get that to people? There are a number of tools
that can start with a Python script and produce a usable
application for users. However, you will notice that I said
“script” and not “distribution.” Starting again with OS X,
a pretty good tool for creating redistributable
applications is py2app. Starting with a standard setup.py, you add a special app keyword
argument to setup, and then you run
python setup.py py2app. You need to specify a main entry point
for your application to run, since double-clicking
can only do one thing, but you can also specify any
info.plist values that you want, so you can set an icon as well as
interact with any OS X platform APIs that require metadata
to interact with. It does basically the right thing
with shared libraries, you can include frameworks for native dependencies,
and it can bundle in a Python interpreter,
or optionally use the system one if you want a really
lightweight executable. Py2app builds what is effectively
a self-contained Python environment that is an application by itself. I’m not going to include a full
walkthrough of the docs here, but in many ways, you’re in
good shape out of the box.
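As an illustration, a minimal py2app-flavored setup.py looks roughly like the following; the script name and icon are placeholders, and the py2app documentation is the authority on the exact options:

    from setuptools import setup

    setup(
        app=["main.py"],                      # the entry point that double-clicking will run
        options={"py2app": {"iconfile": "MyApp.icns"}},
        setup_requires=["py2app"],
    )

Running python setup.py py2app with a file like that leaves a self-contained .app bundle in dist/.
Py2exe and PyInstaller are tools that can do pretty much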
the same thing on Windows. They have a similar mechanism
where you put a few extra annotations
in your setup.py. Or there’s a cross-platform
version, cx_Freeze, which may be more to your taste
since it requires only one metadata specification. In many ways, you’re in
good shape now with these tools. However, in one key way, you’re not. This is really the inspiration
for this talk. I’ve talked a lot about overhead
and optimization so far, but one almost universal problem
shared between almost every tool for end user distribution is that
it attempts to optimize too much. One of the things we all love
about Python is its dynamism. You can make all kinds of dynamic
decisions at runtime, about what to do or not to do,
how objects behave, and whether or not
to import modules. This is one of the reasons
we have setup.py to describe what modules to include
in a distribution. Just because a .py file
is present in a Git repo doesn’t mean it’s something
we actually want to import, and Python could do
basically anything at runtime. So we describe what modules
make up a package to instruct it what to install
as a way of making a promise about how we expect a particular
installed package to behave. However, there’s this module inside
that’s a dependency of py2app and a few other things:
modulegraph. And what modulegraph and its cousins
in py2exe and cx_Freeze do is to parse the main script
that you provide, statically read
all the import statements in that file,
then all of the import statements in the files identified
by those imports and so on, so that it can create
a stripped-down executable. This is useful functionality
to have eventually because you often do care
about overhead when distributing to end users. But before you care
about that performance, you really want to care
about a working thing. And by eliminating imports
that are not done statically, various pieces of built-in
Python functionality break horribly
when you try to use them. Namespace packages, for example,
which you may be depending upon without realizing it, because
the limited static import analyzer doesn’t know how to parse them,
just won’t import. If you pip install any packages
with a dot in their name like zope.interface or flufl.enum,
you’re probably using those. This is another
“broken by default” scenario, like pip’s insistence on installing
into the operating system by default, even though it knows that that’s
going to fail by default, and is a bad idea
even if it works. A small change to the default here could
be a huge improvement to usability. But things aren’t all that bad.
In particular, if you hit an issue
with one of these tools where it doesn’t include
a module that you expect it to, once again, don’t rewrite
your whole application in Go. It’s only going to take you
five more minutes. It’s not an 18-month
re-engineering project. So it’s easier to work around
these packaging tools than to rewrite
all of your application code. One workaround
which fairly universally works across all of these
different tools is to just take every possible import that your
application might be doing anywhere and statically include it
in your initial file. Every one of these tools
takes a main script as input because, again, you’re building
a single executable. It can only do one thing
when it starts up. So the debugging cycle
for working through all of these packaging tools is:
build the binary, run it, watch something like this happen,
then once you’ve got an import error that says what module
couldn’t be imported, you go back to your main script
and then you add the import statement. Do that 30 or 40 times and you’ll
generally get a working executable.
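Concretely, the workaround is a block like this at the top of the main script you feed to py2app, py2exe, or cx_Freeze; the module names are only examples of the kinds of imports the static analyzer tends to miss:

    # main.py
    import encodings.idna         # imported lazily by the codec machinery at runtime
    import zope.interface         # namespace packages often need to be spelled out
    import myapp.plugins.extras   # hypothetical plugin your app loads by name

    from myapp import main
    main()

Now it’s really unfortunate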
that we have to do this, but it’s, in the grand scheme
of things, not that much work. Of course, who could forget
everyone’s favorite client environment: the browser.
And there are several mostly working runtime environments
for Python in the browser. I was hoping to have a whole section
of this talk dedicated to them. But exactly zero of them
have any mechanism for distributing anything
to end users at all. You just have to kind of
build a script yourself to somehow glom them together. So all I can say here is:
good luck. [laughter] There are, however, some trends
in the right direction, people who are starting to adopt
the right design principles around this problem. The PyBee project in particular
I’d like to call out because they’ve been doing a bunch
of work in the area of generating redistributable applications
for a variety of platforms. Now they’ve done work in the area
of graphical user interfaces and other types of tools
to help with this process which are out of the scope
of this talk of actually doing the build,
like foreign function interfaces. But the specific library
I’d like to call your attention to is briefcase,
which implements the conversion of a setuptools distribution
into several different formats. And you pretty much just do this
to produce OS X, iOS, and Android applications
out of whatever code you happen to have
in your project. There’s one major limitation
to this, which is: PyPI does not support wheels
for iOS or Android, because most of us Python
developers aren’t there yet. This is something I hope to talk
about in a future year, but if there are any enterprising
mobile developers out there, this would make a huge impact
if you could get this working. Another tool that I’d like
to mention is pynsist, which is Thomas Kluyver’s
Python wrapper around the Windows
install program NSIS. And that can convert
your Python code into kind of a traditional wizard-based
Windows installer. It’s also moving
in the right direction here. Thanks to some of the
discussions that we had in preparation
for this talk, in fact, Thomas had the idea
to make pynsist work by downloading collections
of wheels from PyPI, and those collections
of wheels now include things like piglet and PyCute which allow you to build graphical
applications out of the box, no compiler or anything, that just
install seamlessly onto Windows.
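For a sense of what driving pynsist looks like, it reads an installer.cfg rather than a setup.py; this sketch is from memory of its documented format, so treat the exact keys as assumptions and check the pynsist docs:

    [Application]
    name=My Game
    version=1.0
    entry_point=mygame:main

    [Python]
    version=3.5.1

    [Include]
    pypi_wheels = pyglet==1.2.4

So there are definitely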
some rays of hope. And I may have painted
a somewhat bleak picture earlier when describing everything
as broken by default, but many people manage
to do interesting things with Python and distribute them
to end users all the time. This here is a screenshot
of MCEdit, which is an editor for Minecraft
written in Python that millions of users
use successfully without knowing Python had anything to do with it.
There are Mac and Windows versions. So this is all eminently possible
with today’s tooling, and it’s worth trying it. In closing, I’d like to issue
a call to action. We should have tools
that target each platform. I could ask you to write that tool,
but of course that’s a lot of work. However, the hard part
of writing that tool is almost always done. Each of the tools
that I’ve referenced today are amazing technological
feats of mastery. They do things like rewriting
executable files in place to point at the correct
shared libraries. They automate multiplatform builds.
They sign and secure software. In many — most of the shortcomings
in the build toolchain have to do with really simple bugs
like not taking those amazing files they just constructed
and putting them in the right place, or trying to optimize them
too aggressively and skipping something
that they needed to do. So these tools don’t need
massive technical innovation, they just need basic fixes. The situation in browser-side
Python is a testament to this general systematic problem. Multiple teams have written
entire implementations of the Python language,
transpilers, interpreters, automated test suites,
even a just-in-time compiler that compiles C code
into JavaScript code and then has C code in there
that emits dynamic JavaScript code to JIT your Python code
into the browser’s JIT. PyPy.js actually does that. But the thing that’s missing
is a script that you run over a zip file full of Python code to, like, run the same command
over it a few times. That’s — that is the missing piece. So if the world of packaging
feels intimidating, just remember that it’s all about
putting the right files in the right place
at the right time, and if you’re savvy enough
to be watching this talk or the video of it later,
chances are you’re savvy enough to have moved a file or two
in your day, so you can fix this. Finally, if I missed it, I apologize for not covering
your favorite packaging tool that’s one epsilon closer
to the desirable goal here. There’s already
a lot of stuff in this talk. The ecosystem’s already
bursting at the seams. So, go forth and package. [applause] (host)
All right, we have a few minutes for questions, so I’ll come around. (audience member)
Just a quick correction: PyInstaller is cross-platform,
not only Windows. It runs on Mac and Linux also. (Glyph)
Thank you for the correction. (audience member)
If we should never, ever use the pip that’s shipped by Debian,
then why does Debian ship pip? [laughter] (Glyph)
No comment. [laughter, applause] (audience member)
So we were having an internal discussion at our company about
vendoring pip requirements for, you know,
for our production machines so that we know we always have a —
like a solid, repeatable pip environment
for our production packages. Please give me a reason
to go back to them and say, “No, we definitely
don’t want to do that.” (Glyph)
I’m sorry, I didn’t catch — (audience member)
Please give me a reason to not do that. (Glyph)
Yeah, vendoring is kind of an odd thing that people do. Like,
you should be making a mirror of the upstream packages
and building them, but vendoring by, like,
changing all of their names in the Python import hierarchy just makes it harder
to do security updates. There’s no real benefit
to doing it that I’m aware of. Pip does it, for example,
because of some very specific kind of bootstrap requirements
for being the tool that you use
to get other stuff installed, but unless you’re writing pip,
you probably don’t need to do it. (audience member)
Hi. So, wheels are awesome, all the binary packages are awesome,
but please also still recommend people ship a source tarball
to the Python Package Index because there are a lot of platforms
that that’s the only way the packagers on that platform
can build the binary for it. (Glyph)
Sorry, yes, I should have mentioned that. You should absolutely
upload your source alongside any wheels
that you support, and you should build all of your
wheels by building that source. You shouldn’t make a bunch
of source tarballs that you never look at and then
build straight out of your Git repo. It’s important that there be
correspondence between those two things. (audience member)
As sort of a halfway between, like, packaging a complete EXE
in the example desired program that you put up at the beginning,
do you feel like shipping a requirements file
and a virtualenv inside a Docker container
is a pretty good halfway point? Or what else would you like to see
with that to make that pretty viable? (Glyph)
So, I think that shipping Docker images, like on Docker Hub or something,
that are virtualenvs inside a container,
that’s a great way to go. That is a complete deployment
artifact for a platform. That platform is Docker,
so you have to — you’re targeting Docker users
if you’re doing that. We do need to target other platforms,
but that’s totally a valid way to go. (audience member)
So I have an answer for my own question. The Debian package maintainers
for Python are in the room, and they don’t want to identify
themselves for fear of being assaulted. [laughter] But the Debian pip that’s shipped
with the operating systems installs to /usr by default. (Glyph)
Since when? That — I — Wow, that’s awesome. Thank you! Yes. (host)
On that happy note, that concludes Q&A. [applause] Yes, thank you again, Glyph. (Glyph)
So yeah, thank you. And thank you for everybody who asked questions.
Obviously this ecosystem moves fast, so don’t trust anything I just said.
Go read the docs and stay up to date.

7 thoughts to “Glyph – Shipping Software To Users With Python – PyCon 2016”

  1. Amen! Great talk. Definitely think this is a part of Python that could improve, but wheels are a step in the right direction.

  2. I really liked the talk. There's a cool approach to all of this, a lot like the py2anything described in the outro, which is the Buildozer tool that we use at Kivy. The idea is to make an easy to use frontend to all platform-specific packaging tools. It's a WIP, but definitely a step in the right direction. Stay tuned 🙂

  3. Ever since kde-misc/pyrad has been in the Gentoo portage tree, it has become much easier for me to get it onto other computers. That's my own package, which heavily supports Glyph's point about getting into the distribution.

  4. A year ago I spent a day trying to make pyinstaller build a PyQt application for Windows from my GNU/Linux system. It did not work. Did the situation change?
