Six years ago, I described my system for
configuring a reverse proxy for docker containers. It involved six containers including a key-value store and
a webserver. Nothing in that system has persisted to this day. Don’t get me wrong – it worked – but there were
a lot of rough edges and areas for improvement.
Microservices and their limitations
My goal was to follow the UNIX philosophy of “do one thing and do it well”. Unfortunately, that doesn’t really
work when applied to network services that have to interact with one another. UNIX tools are built upon a
common file system and simple data passed over STDIN. Microservices don’t have that shared foundation. You
could make one: companies that use microservices in anger often have a team that deals with the “developer
experience” of creating and using microservices. But as a solo developer that’s not something I wanted to
spend my time doing.
Adventures in IPv6 routing in Docker
Published on Oct 24, 2022
One of the biggest flaws in Docker’s design is that it wasn’t created with IPv6 in mind. Out of the box Docker
assigns each container a private IPv4 address, and they won’t be able to reach IPv6-only services. While
incoming connections might work, the containers won’t know the correct remote IP address which can cause
problems for some applications. This situation is obviously suboptimal in the current day and age. It’s a bit
like not supporting HTTPS on a website – you might not have any issues because of it immediately, but you’re
fighting against the currents of progress and are making life worse for your users.
Thankfully, it’s now relatively easy to make Docker behave a lot nicer. The
docker-ipv6nat project has been around since 2016,
and uses an IPv6 overlay network and some iptables magic to route traffic to and from containers in a sensible
fashion. It uses NAT to emulate the behaviour Docker employs for IPv4 traffic; while using NAT with IPv6 is
anathema to many, I think it makes sense for containers. You could give each container a publicly routable IPv6
address, but that brings with it a lot of headaches: you’re basically going to be forced to implement service
discovery and some kind of DNS management to deal with the fact that your containers will be popping up on
randomly assigned IP addresses. That is completely overkill for people running a small number of services on
one or two physical boxes; and if it’s not overkill for you then you’re probably already looking at more
complicated orchestration solutions like Kubernetes.
More recently, similar functionality has been built into the Docker daemon itself. You can now
edit the config file to enable IPv6 and each
container will be assigned an address in the range specified when it uses the default bridge network. This
gives more-or-less the same functionality as docker-ipv6nat – you lose a little flexibility as you can’t
disable IPv6 on the default bridge, but that’s a very worthy trade for having the functionality built-in.
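As a rough sketch of what that config change might look like, the snippet below generates the relevant keys for the daemon's config file (usually /etc/docker/daemon.json on Linux). The `ipv6` and `fixed-cidr-v6` keys are the documented ones for the default bridge. The `fd00:dead:beef::/48` prefix is a made-up example, and whether you also need `ip6tables` and `experimental` depends on your Docker version, so treat those two as assumptions to check against the docs for your release.

```python
import ipaddress
import json


def ipv6_daemon_config(prefix: str) -> dict:
    """Build the IPv6-related daemon.json keys for the default bridge.

    `prefix` is the range containers get their addresses from; it is
    validated here so a typo fails early instead of at daemon start-up.
    """
    network = ipaddress.IPv6Network(prefix)  # raises ValueError if invalid
    return {
        "ipv6": True,
        "fixed-cidr-v6": str(network),
        # Assumption: on some Docker versions the NAT rules are only
        # managed when these two options are switched on as well.
        "ip6tables": True,
        "experimental": True,
    }


if __name__ == "__main__":
    # Hypothetical unique-local prefix, purely for illustration.
    print(json.dumps(ipv6_daemon_config("fd00:dead:beef::/48"), indent=2))
```

The output would be merged into any existing daemon config by hand, and the daemon restarted afterwards for it to take effect.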
So far this all seems very simple. Hardly worthy of being called an “adventure”. Enter stage left: the wicked
witch of destination address selection…
Reproducible Builds and Docker Images
Published on Feb 18, 2022
Reproducible builds are builds which you are able to reproduce
byte-for-byte, given the same source input. Your initial reaction to that statement might be “Aren’t nearly
all builds ‘reproducible builds’, then? If I give my compiler a source file it will always give me the same
binary, won’t it?” It sounds simple, like it’s something that should just be fundamentally true
unless we go out of our way to break it, but in reality it’s actually quite a challenge. A group of Debian
developers have been working on reproducible packages for the best part of a decade and while they’ve made
fantastic progress, Debian still isn’t reproducible. Before
we talk about why it’s a hard problem, let’s take a minute to ponder why it’s worth that much effort.
On supply chain attacks
Suppose you want to run some open-source software. One of the many benefits of open-source software is that
anyone can look at the source and, in theory, spot bugs or malicious code. Some projects even have sponsored
audits or penetration tests to affirm that the software is safe. But how do you actually deploy that software?
You’re probably not building from source – more likely you’re using a package manager to install a pre-built
version, or downloading a binary archive, or running a docker image. How do you know whoever prepared those
binary artifacts did so from an un-doctored copy of the source? How do you know a
middle-man hasn’t decided to add malware to the binaries to make money?
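If a build is reproducible, one way to answer those questions is to rebuild from source yourself and compare digests with the published artifact. Here is a minimal sketch of that idea, using gzip compression as a stand-in for a real build step (an assumption for illustration only; `build` and `digest` are hypothetical names). It also shows how something as small as an embedded timestamp is enough to make two builds of identical source differ.

```python
import gzip
import hashlib


def build(source: bytes, timestamp: int) -> bytes:
    """Stand-in 'build': gzip the source, embedding `timestamp` in the header."""
    return gzip.compress(source, mtime=timestamp)


def digest(artifact: bytes) -> str:
    """SHA-256 of an artifact, as a hex string suitable for publishing."""
    return hashlib.sha256(artifact).hexdigest()


if __name__ == "__main__":
    source = b"int main(void) { return 0; }\n"

    # Same source, but each build stamps its own time: the digests differ,
    # so a third party cannot tell a tampered binary from an honest one.
    print(digest(build(source, 1)) == digest(build(source, 2)))

    # Same source with the timestamp pinned: the digests match, so anyone
    # can rebuild and check the published artifact byte-for-byte.
    print(digest(build(source, 0)) == digest(build(source, 0)))
```

Real builds have many more sources of variation than a single timestamp, but the verification step stays the same: rebuild, hash, compare.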
Artisanal Docker images
Published on Feb 5, 2022
I run a fair number of services as docker containers. Recently, I’ve been moving away from pre-built images
pulled from Docker Hub in favour of those I’ve hand-crafted myself. If you’re thinking “that sounds like a lot
of effort”, you’re right. It also comes with a number of advantages, though, and has been a fairly fun
journey.
The problems with Docker Hub and its images
Rate limits
For the last few years, I’ve been getting increasingly unhappy with Docker Hub itself. Docker-the-technology
is wonderful, but Docker-the-company has been making some rather large missteps. The biggest and most
impactful of these has been introducing “pull rate” limits. At the time of writing, if you want to just pull a
public image without logging in then you are limited to 100 pulls every 6 hours. If you log in then you’re
limited to 200 pulls per 6 hours, but it’s account-wide. This might seem like a big enough number, but I
repeatedly hit it and there is no way to actually audit what is causing it. I have various containers that may
all pull images at arbitrary times (e.g. continuous integration build agents), and the only information you
get back from Docker Hub is the number of pulls remaining.
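For what it's worth, that remaining-pulls figure comes back as response headers on manifest requests. The sketch below parses values in the shape Docker documented at the time, such as `100;w=21600` (a count, then the window in seconds); the header format is an assumption that may change, and `parse_rate_limit` and `RateLimit` are hypothetical names.

```python
from typing import NamedTuple


class RateLimit(NamedTuple):
    pulls: int           # number of pulls (the limit, or what remains)
    window_seconds: int  # length of the window, 0 if not reported


def parse_rate_limit(value: str) -> RateLimit:
    """Parse a header value such as '100;w=21600' into its parts."""
    count, _, params = value.partition(";")
    window = 0
    for param in params.split(";"):
        key, _, val = param.strip().partition("=")
        if key == "w":
            window = int(val)
    return RateLimit(int(count), window)


if __name__ == "__main__":
    remaining = parse_rate_limit("76;w=21600")
    hours = remaining.window_seconds // 3600
    print(f"{remaining.pulls} pulls left in the current {hours}-hour window")
```

This only tells you how close you are to the limit; it does nothing to reveal which container or build agent used up the pulls.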
On the utility of user stories
Published on Oct 16, 2021
User stories are a staple of most agile methodologies. You’d be hard-pressed to find an experienced software
developer who’s not come across them at some point in their career. In case you haven’t, they look something
like this:
As a frequent customer,
I want to be able to browse my previous orders,
So that I can quickly re-order products.
They provide a persona (in this case “a frequent customer”), a goal (“browse my previous orders”) and a reason
(“so that I can quickly re-order products”). This fictitious user story would probably rank among the
better ones I’ve seen. More typically you end up with something like:
As a user,
I want to be able to login,
So that I can browse while logged in.
This doesn’t really provide a persona or any proper reasoning. It’s just a straightforward task pretending to
be a user story. If this is written in an issue then it provides no extra information over one that simply
says “Allow users to login”. In fact, because it’s expressed so awkwardly I’d argue that it’s worse.