or “Why the other team will make your job with Docker harder”
During my internship, I got to play with Docker, which is quite impressive in terms of ease of use, but also in terms of performance. The technology behind it feels rock solid (but maybe that’s just marketing! 😉 ).
Please check out this post if you don’t know Docker already.
There are Docker images containerizing OSes (Ubuntu, Debian) but also software, like MySQL or OpenVPN. Anyone can use any of these images as a base for their own custom image, adding software on top of an existing environment to create their own microservice.
To create an image, Docker users have to write a Dockerfile which contains a list of instructions to perform to rebuild the image on top of a base image. That’s a powerful feature, and also the standard way of building maintainable Docker images.
But one of the biggest issues with Docker I see at this time is the lack of a convenient way to propagate an update from a base image to the chains of images based on it.
With an example
Let’s say you have 3 Docker images: image1 is built on top of the base image, and image2 is built on top of image1.
base image -> image1 -> image2.
The corresponding Dockerfiles would be:
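A minimal sketch of what they could look like (the image names follow the example; the installed packages and copied files are just illustrative):

```dockerfile
# Dockerfile for image1, based on the base image
FROM baseimage:latest
RUN apt-get update && apt-get install -y curl   # illustrative layer
```

```dockerfile
# Dockerfile for image2, based on image1
FROM image1:latest
COPY ./app /app   # illustrative layer
```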
The problem is that, even when specifying the ‘latest’ version in the Dockerfile of image1, you only get the ‘latest’ version of baseimage at build time.
So if I build image2 a few weeks after image1 has been built, baseimage (on which image1 is based) may have received some important updates in the meantime that image2 won’t get.
image2 won’t get those updates until the maintainer of image1 performs a rebuild. So you’d better choose wisely which images your own images depend upon, and make sure there’s a team that will maintain them throughout your own image’s lifespan.
It’s not a bug, it’s a feature
If you try to decipher the philosophy behind the official documentation, this approach is actually quite explainable, and above all it is about build reproducibility. I will soon publish another post, diving into more detail about the Docker philosophy and explaining why update propagation isn’t a real Docker problem.
But let’s say you are working on a project and your team is maintaining its own chain of Docker images as part of its workflow. This way, you can provide images containing multiple (and incremental) layers of software to other teams, while still having a simple way of managing each layer (= image) independently.
Then, one day, your team fixes a critical security problem in the root image all the other images are based on.
To propagate the patch to the child images (used by other teams all over the company), the team would have to rebuild all the layers in the right order, which means every single image.
And at this point, I hope for your team’s sake that their Docker image building process is fully automated; otherwise, they’ll have a bad time.
The “Docker” solution to this problem would be to have multiple containers running separately each “layer” of your software (core(s), DB, storage, etc…) and link them up.
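With the container-linking approach, that could look something like this (container and image names here are made up for the example):

```shell
# Run the database as its own container
docker run -d --name db mysql

# Run the application container and link it to the database;
# the app can reach the DB through the "db" alias
docker run -d --name app --link db:db myapp
```

This way, patching the database means rebuilding and restarting only the db container, not the whole image chain.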
And here comes the other team…
So, why not link the containers together instead of wrapping all of them in a single image?
First, because I’m trying to make a point! 😉
Then, even if Docker makes it really convenient to wire multiple containers together, it could still go wrong if not done properly by the other team. And as everybody knows, the other team always screws things up, so your team prefers to keep full control over the images!
Thus, it is actually a pretty plausible workflow choice to provide other teams with “black box” images that work out of the box, even if it doesn’t match the Docker philosophy, which would have us create many images and link all the containers together instead of relying on a long dependency chain.
(What is disputable, though, is the fact that the latest tag in the FROM instruction of an image is resolved in a way that results in a loss of information for sub-images, totally preventing any update propagation, as far as I know. To me, by using latest, base image developers already expect the build of their image not to be reproducible. Why not give sub-image developers the same freedom of choice about the base image they’re using?)
So, what’s really missing in the end?
If the lack of update propagation has to be considered a design choice, then what’s missing from Docker?
Even if you can’t control the update chain when depending on third-party images, you should still be able to control your internal workflow when working with your own images.
Because having to rebuild all the images of the dependency chain one by one (and in the right order!) is somewhat cumbersome, Docker should provide a convenient way of rebuilding an image chain for people who are so inclined.
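Until then, a small wrapper script can at least encode the rebuild order. A minimal sketch, assuming each image’s build context lives in a directory named after it:

```shell
#!/bin/sh
# Rebuild the whole chain, parents first, so each child image
# picks up the freshly built version of its base image.
set -e
for image in baseimage image1 image2; do
    docker build -t "$image:latest" "./$image"
done
```

The obvious limitation is that the order is hard-coded; the script has to be kept in sync with the dependency chain by hand.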
A solution to this problem could be to reference other Dockerfiles when building your image. The FROM instruction is currently used to define the name of the repository your image is based on.
e.g.: FROM ubuntu:14.04
But now, you could also point to another Dockerfile, which would make Docker rebuild the entire chain before packing everything into your image.
Then, with our example, you would end up with the following Dockerfiles:
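Something like the following (note that this path-based FROM syntax does not exist in Docker; it is the hypothetical feature being proposed, and the layers are illustrative):

```dockerfile
# image1/Dockerfile — FROM points at the base image's Dockerfile,
# so baseimage would be rebuilt first (hypothetical syntax)
FROM ../baseimage/Dockerfile
RUN apt-get update && apt-get install -y curl   # illustrative layer
```

```dockerfile
# image2/Dockerfile — would transitively rebuild baseimage, then image1
FROM ../image1/Dockerfile
COPY ./app /app   # illustrative layer
```

Building image2 would then always propagate the latest state of the whole chain down to it.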
Is it an acceptable way of using Docker?
As I said above, even if that might not perfectly match the Docker philosophy, it’s still a use case that seems realistic in some environments.
Actually, I think the reactions to this post will be some kind of acceptability indicator!
If you have any thought about the topic, please leave a comment!