Major docker build refactor#3995
Conversation
|
Test env:
container build from top-level directory with command: docker build -t netdata:latest -f docker/Dockerfile ./container run with command: docker run -d --cap-add=SYS_PTRACE --hostname=netdata --name netdata -p 19999:19999 -v /sys:/host/sys -v /proc:/host/proc -v /var/run/docker.sock:/var/run/docker.sock:ro -e PGID=113 netdata |
|
Done. All Dockerfiles are refactored to be similar to each other and use builder stage. Everything builds, I tested alpine and debian versions to check if #3972 is solved and it is. Sizes (uncompressed - DO NOT COMPARE with ones shown on docker hub as those are compressed): @Ferroin @ktsaou @pgassmann please review. |
|
This could also be a base for creating netdata packages since it has a clear separation of build and runtime stages. |
|
Why are the images double the size of the current images? https://hub.docker.com/r/firehol/netdata/tags/ |
|
| Dockerfile.alpine \ | ||
| Dockerfile.aarch64 \ | ||
| Dockerfile.armv7hf \ | ||
| docker |
| ln -sf /dev/stdout /var/log/netdata/debug.log && \ | ||
| ln -sf /dev/stderr /var/log/netdata/error.log | ||
|
|
||
| RUN [ "cross-build-end"] |
There was a problem hiding this comment.
why is this here, but not in the alpine Dockerfile?
There was a problem hiding this comment.
Because alpine is not a cross-arch build
| # Install runtime dependencies | ||
| RUN apt-get -qq update && \ | ||
| apt-get install --no-install-recommends -y python lm-sensors nodejs python python-mysqldb python-yaml jq curl fping netcat-openbsd && \ | ||
| rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* |
There was a problem hiding this comment.
add
apt-get -y autoremove
apt-get clean
There was a problem hiding this comment.
What for? I don't have anything to remove or clean since rm -rf /var/lib/apt/lists/* cleans it.
There was a problem hiding this comment.
I'm not sure. Saw that in other jobs. Is there something in /var/cache/apt that is cleaned by this?
There was a problem hiding this comment.
This is based on kubernetes build system (here) and on hackernoon blog post (here). I have also checked what is the size of /var/cache/apt in a container:
root@ea5f76d07045:/var/cache/apt# du -hcd1
0 ./archives
0 .
0 total
So there is no need to add apt-get clean.
As for apt-get -y autoremove, there is no unnecessary artifacts so there is also no need to add that command. In some cases it can also cause more harm than good.
|
I would also like to have msmtp that can be configured to send alerts via mail. Have a look also at @titpetric s scripts. But this could be discussed separately. I did not yet configure alerts. |
This is a matter for another PR, not for this one. |
|
Note: to build this, at least docker 17.06 CE is needed. It can run on older docker versions, but it cannot build there. |
|
I have pushed images to docker hub here: https://hub.docker.com/r/paulfantom/netdata/tags/ Size comparission to official images: |
|
@paulfantom nice work! May I invite the people that have contributed dockerfiles in the past to have their views? I see the contributors on these files are: @paulfantom did a marvelous job to reduce the container sizes and allow netdata use the docker socket (working around the mapping of GIDs between container and host). Could you please review his work? Your comments are important to make sure this will not break anything once merged. |
justin8
left a comment
There was a problem hiding this comment.
Changes look good.
It'd be great to have a common docker build file with just the changes per-OS, at least for all the debian based ones for multi-arch. But not necessarily in this change.
| #set -e | ||
|
|
||
| if [ ${PGID+x} ]; then | ||
| echo "Adding user netdata to group with id ${PGID}" |
There was a problem hiding this comment.
Is this documented somewhere for users to know when/how to use this? If not, then we should.
There was a problem hiding this comment.
Probably the best place would be in wiki page regarding docker installation. I am planing on simplifing that page and promote this installation method by adding links in netdata installation page.
Probably will do it today or tomorrow.
There was a problem hiding this comment.
Awesome! sounds good.
Might be worth adding it in to the readme file too, since that is what gets shown on the dockerhub page automatically I believe.
There was a problem hiding this comment.
How is the "Full Description" on dockerhub managed? https://hub.docker.com/r/firehol/netdata/ Is this just the Readme of the source repo? It contains a log of generic information but no USAGE information for the docker images.
The dockerhub description and the Wiki page should also clearly show the limitations of this installation method.
There was a problem hiding this comment.
I hadn't tried it before, but just did a quick test here [0] and it seems that docker will read a README.md in the directory the dockerfile is in, then the root one only if that is not found. (source is at [1]).
So.... We should be able to add a readme for the docker image in the /docker folder
[0] https://hub.docker.com/r/justin8/dockertest/
[1] https://github.com/justin8/dockertest
With this PR all dockerfiles are aligned with each other so there might be a possiblity to unify them with some clever scripting in the future. However as long as we have an alpine build it will introduce more hacks than necessary. Just to be clear. I am a strong opponent of adding new docker images for x86 platform (let's have one, but a good one). I am also trying to get debian image to around 40-50MB so we can get rid of alpine and have nice, unified Dockerfiles. |
|
That is true. But we could get it down to 2 files, the cross-build stuff should really only need the cross-build-start/end lines. A little jinja template or similar would make it pretty easy. I'd also be in favour of reducing it to one x86 image. The alpine image was a hack to make a smaller images before docker support the more modern build/run environment stuff. |
|
Cross-build Jinja for a Dockerfile, that's a neat idea! 👍 I need to investigate this. |
|
Just a note: Guys from resin.io created this nice tool to do dockerfile templating: https://www.npmjs.com/package/dockerfile-template |
|
I can only speak for the alpine one: it looks ok, but I haven't tested it. I have a different opinion. Personally, I care a lot about image sizes. 99.9 % of the time, a container runs a single process anyway. All that stuff debian comes with - not needed at runtime. That's why there is alpine in the first place. So I'm more of a minimalist guy, we can add stuff in derived images or something ;) That doesn't mean that netdata deps like python should be excluded too. Most of the plugins require python. whether node is needed is debatable. I'm not sure whether it's worth the effort to jinja templating the dockerfiles. If we have to make changes to dockerfile, debug a cross build one or troubleshoot something else, it will only make things complicated with lots of if-else-constructs. I'd say lets keep it simple. It'll get complicated or unreadable enough just considering there are different package managers that require different arguments, differently named packages etc. Let's leave these problems to Ansible and other IaC tools. Sometimes a little code duplication is ok to me ;) But I won't stand in the way if you guys absolutely want it. TL;DR: I favor a Debian-slim for a "normal" linux distro, an Alpine for small images and the non-x86 images as they are. I agree that we do not need other distros such as centos or redhat. If ppl need red hat subscriptions or other specific stuff, they can build their own. |
|
I'd disagree. I think having the same code duplicated by copy+pasting in 6 locations is what causes fragmentation and further issues. If someone is going to make a change only to the x86 alpine image, why are they not making it to the others? We shouldn't be accepting PRs that only fix things in a single environment and ignore all others in the first place. I'd also point out that once you have a single image not based on alpine, you're pulling in the shared layers for e.g. debian already. |
Couldn't agree more.
No one talks here about excluding netdata functionalites.
Pleas run |
|
Anyway, lets get this good change out, and then worry about templating in a different issue/PR. |
I'm well aware. But I believe that you also know that if you use an image that hasn't been rebuilt in while, that it will point out to some "older" layers of debian. Then you add an image that runs on a bleeding edge debian. You end up having multiple debians anyway, just on different patch levels. It's the same with alpine. The difference in MBs/GBs is measurable though. |
|
Oh! We could probably go the other way and delete debian image 😄 |
|
Have you also considered supporting OpenShift? It's basically Kubernetes based, but comes with additional security stuff, such as running in arbitrary user IDs. Docker containers, that run as root (almost every docker hub image) will not run in Openshift, and starting as root and then switching to another in the startup script doesn't work either. Most of the time, it's just a permission issue though. Unsufficient permissions for files and directories. Many people have to build their own docker images of their favorite apps just to run them in openshift. In this PR, setting the UID and adding the user with "useradd" etc. won't matter/work in openshift. I don't know if netdata actually requires an existing user/group or if permissions are enough. Openshift is becoming more and more popular in companies. If netdata could support Openshift I think it will further boost the reputation and popularity of netdata. Openshift is shifting towards prometheus as its default monitoring tool. Maybe netdata won't be used to monitor the host itself, but for app specific monitoring (mongodb, postgres, mysql, http). with netdata's 1-second resolution this would be a real nice combination, since netdata already supports prometheus I believe. I don't know prometheus yet. Maybe it's already monitoring db and web servers already. |
|
I'm against a templating tool for the Dockerfiles. espially against a npm package for templating. Please don't introduce a nodejs dependency to build the docker images. The shell stuff in the Dockerfiles could be moved to a script: |
|
I'm against dropping the alpine image if the primary reason is that it makes maintenance harder. Alpine builds are significantly lighter, and considering that this project gets updated really frequently, that adds up to a lot of saved bandwidth globally for end users. On top of that, you get a much smaller disk footprint and marginally better runtime performance. That's important to me cause I run this on pretty slimmed down/resource constrained machines. I think we should only drop alpine if we are:
I don't know much about using Docker outside of x86 but if we can get the other platforms to switch to alpine instead, that might be an even better way to unify them |
|
This discussion came to something different than discussion about this PR. I think all of us agree to merge this.
I am an OpenShift team member, how could I not consider it, especially when I am sitting next to monitoring guys? 😛
That is why I won't implement any support for OpenShift. I won't block any development towards it, but I cannot implement it since it can be seen as a conflict of interest. Sorry.
And how do you plan on using different
That is not a technical argument.
Why not? We already use nodejs for some plugins. Having different builds for alpine and for debian makes code maintenance harder. In turn harder code maintanance leads to slower introduction of new features and increases number of test which are needed to be done before new version roll out. This cost developer time. And time is money, which is especially true in open source projects since right now everyone here pays with time.
Yeah, whole 100MB.
I need proof.
Thats why I proposed usage of Guys, we cannot support EVERYTHING. I don't know how constrained are your boxes, but I am running netdata in containers on couple of raspberry pi 2 with 16GB SD card and 1Mbit ADSL connection, unless you have greater resource constraints then this argument is totally invalid for me. If someone cares about space, then run cleaning jobs. |
The Node.js plugins are not anywhere near as widely used as the Python plugins, and there are a lot of people who don't want to deal with the monstrosity that is Node.js (and there's whole argument against JS in general, but I don't want to get into that here as it's really OT). Python on the other hand is almost always installed on any arbitrary Linux system (except Alpine, but they're obsessive about minimalism and even there it's not hugely unlikely to have Python except on purpose-built systems that would not likely be building Netdata Docker images anyway), and is generally easy to install on pretty much every other UNIX system. |
|
I would vote to drop shiny docker thingies and build images in CI system. This way maybe we could get docker manifests working at some point. This would allow downloading images by version, independent from cpu architecture. So one could just run If we want to have trustworthy images maybe we should look into Notary? |
|
That is true. If we do go with option 2 we'd also be able to push tagged images, not just latest. Personally I'm indifferent between options 2 and 3. They'll be similar amounts of maintenance in the long term. Notary sounds interesting, but would be even more moving parts and things to go wrong than option 3, but does have some additional benefits. Either way, neither of us are long term maintainers, so it's probably best for someone like @ktsaou to weigh in on where the project should go for this. |
|
I don't think that notary would introduce much more moving parts. It can be quite easily integrated in With that whole process of building trusted images would come down to running modified version of current |
|
@paulfantom is there anything pending on this? Is it ready to be merged? If we are going to merge it, can you identify what do I need to do at docker hub to still provide smooth experience to existing users? |
|
@paulfantom @ktsaou The Docker group id could be set to 999 by default for backwards compatibility. @paulfantom Why is the Env variable named |
|
@pgassmann it is not named
I don't agree. This will cause a security problem on systems where
I think that best we can do is document it. @ktsaou I am on vacation till the end of this week so this will need to wait. To finish this I need to add container building stage to |
|
@ktsaou this is finished. Docker images now can be build and pushed to remote registry using only Steps you need to do to activate automation:
|
|
ok, I am on vacations too. I'll be back by Sunday. So, I'll merge it then. |
| sed -i "s/${PGID}:$/${PGID}:netdata/g" /etc/group | ||
| fi | ||
|
|
||
| /usr/sbin/netdata -u netdata -D -s /host -p "${NETDATA_PORT}" "$@" |
There was a problem hiding this comment.
Note to self: This probably should be wrapped into exec as per https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#entrypoint
|
@paulfantom when would it be a good time to merge this? |
|
I would wait until I get back from my vacations, so till 6.09. |
|
ok, so when you are back, ping me here. |
|
@ktsaou lets do it! |
|
@paulfantom this is merged now! Let my now try to update docker hub. |
|
@ktsaou this won't be an automated build. Building process will happen on travis, not in docker hub infra. Those entries should be deleted and automated build system should be deleted as per https://success.docker.com/article/how-do-i-disable-automated-build-of-a-repository-on-docker-cloud |
|
Wait a moment. People install netdata directly from docker hub. So, instead of building them from source, they download pre-built binaries from docker hub. The ARM versions for example, are not compiled on the target computer. They are just installed. So, the above are jobs that instruct docker hub which versions to build. I don't think I should delete them. |
|
hm... I also suggest to move |
WHAT? This PR implements building docker images on travis. This way we have more control over build process. We don't need automatic builds on docker hub anymore with that as we build and push it ourselves.
It is. In simple projects and netdata isn't one. Storing everything docker-related in docker directory makes everything clean from project management perspective. Especially when Dockerfile now is not enough and you need |
|
To fully enable this PR, someone with owner rights need to do what I wrote in #3995 (comment) |
|
Check this too: https://github.com/firehol/netdata/wiki/Install-netdata-with-Docker
So, we push these on docker hub? |
ok, I missed that.... sorry! I disabled the docker builds: Let me now set these variables.... |
|
ok, added them. |
|
hit "retry" in travis on last job on master branch |
|
Yeap, figured it out already. |
|
ok, this is done. So, the final status on docker hub is:
|


dockerrun.shscript as a container entrypointnetdatawith static UID of 201 and/usr/sbin/nologinas shellmultiarch/alpineas a base for all images.Initially I've got uncompressed image size reduction from 276MB to 111MB and also size reduction for other images:
Images are built with
./docker/build.shcommandResolves #3972