Reproducible builds #4334

Merged
ethomson merged 2 commits into libgit2:master from pks-t:pks/reproducible-builds
Sep 20, 2017

Member

pks-t commented Aug 30, 2017

Reproducible builds have the aim of generating the exact same binary files for the same input files, thus giving an actual verifiable path from source code to binary code. So this is actually a security feature.

I've set out to make our build system fully deterministic in order to enable reproducible builds. Unfortunately, the expected epic journey was more of a small trip out of the door, as most stuff is already built in a deterministic way. There were only two small outliers to this.

The first one is our test suite. The "generate.py" script, which generates our test suite definitions, dumped the modules in a non-deterministic way. As such, our clar test suite was compiled with differently ordered structs and was thus not deterministic.
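To illustrate the ordering problem, here is a minimal Python sketch. It is not the actual "generate.py" code: the `render_suites` helper, the suite names and the struct format are made up for the example. It only shows that iterating over the sorted suite names yields the same output regardless of the order in which the suites were collected.

```python
def render_suites(suites):
    """Emit one struct-like line per suite, ordered by suite name.

    `suites` maps a suite name to its list of test names. The output
    format is hypothetical and only stands in for the generated C structs.
    """
    lines = []
    for name in sorted(suites):  # sorting makes the order independent of the dict
        lines.append('{ "%s", %d }' % (name, len(suites[name])))
    return "\n".join(lines)


# The same suites, collected in two different orders.
first = {"refs::create": ["basic"], "attr::lookup": ["simple", "nested"]}
second = {"attr::lookup": ["simple", "nested"], "refs::create": ["basic"]}

# Both runs produce byte-identical output.
assert render_suites(first) == render_suites(second)
```

Without the `sorted()` call, the output would follow whatever order the dict happens to iterate in, which Python only started guaranteeing (as insertion order) with version 3.7.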

The second one was how we generate static libraries. The tools ar(1) and ranlib(1) are both non-deterministic by default because they embed information like UID, GID and timestamps into the resulting static archive. This can be turned off by enabling the deterministic mode via a simple flag. While this sounds rather simple, I don't really like the solution for the CMake build system, as there is no simple way to just pass additional flags to these commands. Instead, we have to override the complete commands as defined by three variables. We could hide this behind a simple build-time option "DETERMINISTIC_BUILD" or similar.
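The effect of those metadata fields can be modelled with a short Python sketch. This is a simplified model of the common ar member layout (a 60-byte header followed by the data), not the implementation of ar(1) itself. The helper names are hypothetical, the archive index written by ranlib(1) is left out, and the fixed mode 0644 used for the deterministic case is an assumption for illustration.

```python
def ar_member(name, data, mtime, uid, gid, mode, deterministic=False):
    """Build one archive member: a 60-byte header plus the member data."""
    if deterministic:
        # Model of deterministic mode: zero the variable metadata and
        # use one fixed mode (0644 is assumed here for illustration).
        mtime, uid, gid, mode = 0, 0, 0, 0o644
    # Header fields: name (16), mtime (12), uid (6), gid (6),
    # mode in octal (8), size (10), end marker (2).
    header = "%-16s%-12d%-6d%-6d%-8o%-10d`\n" % (
        name + "/", mtime, uid, gid, mode, len(data))
    assert len(header) == 60
    padding = b"\n" if len(data) % 2 else b""  # members are 2-byte aligned
    return header.encode("ascii") + data + padding


def ar_archive(members):
    """Concatenate members behind the global archive magic."""
    return b"!<arch>\n" + b"".join(members)


def build(mtime, deterministic):
    """Archive the same object file, as if it had been created at `mtime`."""
    obj = b"\x7fELF-placeholder-object"
    member = ar_member("a.o", obj, mtime, 1000, 1000, 0o100644, deterministic)
    return ar_archive([member])


# Same input, built a minute apart: the archives differ by default ...
assert build(1504123000, False) != build(1504123060, False)
# ... but are byte-identical once the metadata is zeroed out.
assert build(1504123000, True) == build(1504123060, True)
```

The object file content is identical in all four builds; only the header metadata changes, which is exactly what the deterministic mode normalizes.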

All in all, this leaves us with three files in the build directory which are not reproducible (assuming the path to the build directory does not change): two of them are log files and the third is the clar cache. The first two are non-deterministic by definition and should stay so; the third is too unimportant to care about. As it is a simple serialization of Python objects via pickle, there's also no easy fix here (I think, though I may be mistaken).

The script I've used to test:

#!/bin/sh

set -e

for i in 1 2
do
    rm -rf /tmp/build /tmp/build${i}.sha1sum
    mkdir -p /tmp/build
    cd /tmp/build
    cmake /home/pks/Development/libgit2
    make -j5

    # Hash every file in a stable order so that both runs are comparable
    find . -type f |
    sort |
    while IFS= read -r f
    do
        sha1sum "$f" >>/tmp/build${i}.sha1sum
    done
done

diff -u /tmp/build1.sha1sum /tmp/build2.sha1sum | grep '^+'
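The same comparison can be sketched in Python. This is only an illustration of the approach, not part of the change; `hash_tree` and `changed_files` are hypothetical helper names. Each build is reduced to a mapping from relative path to SHA-1 digest, and the two mappings are then compared. As with the shell script, the snapshot of the first build can be taken before the build directory is removed and recreated at the same path.

```python
import hashlib
import os


def hash_tree(root):
    """Map each file below `root` (by relative path) to its SHA-1 digest."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as fh:
                digest = hashlib.sha1(fh.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def changed_files(first, second):
    """List paths whose digest differs or that exist in only one snapshot."""
    paths = set(first) | set(second)
    return sorted(p for p in paths if first.get(p) != second.get(p))
```

A run would then look like `changed_files(snapshot_of_build_1, snapshot_of_build_2)`, where an empty list means both builds produced identical files.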

pks-t added 2 commits August 30, 2017 21:56
The script "generate.py" is used to parse all test source files for unit
tests. These are then written into a "clar.suite" file, which can be
included by the main test executable to make available all test suites
and unit tests.

Our current algorithm simply collects all test suites inside of a dict,
iterates through its items and dumps them in a special format into the
file. As the order is not guaranteed to be deterministic for Python
dictionaries, this may result in arbitrarily ordered C structs. This
obviously defeats the purpose of reproducible builds, where the same
input should always result in the exact same output.

Fix this issue by sorting the test suites by name before dumping
them as structs. This enables reproducible builds for the libgit2_clar
file.

By default, both ar(1) and ranlib(1) will insert additional information
like timestamps into generated static archives and indices. As a
consequence, generated static archives are not deterministic when
created with default parameters.

Both programs do support a deterministic mode, which will simply zero
out non-deterministic information with `ar D` and `ranlib -D`.
Unfortunately, CMake does not provide an easy knob to add these command
line parameters. Instead, we have to redefine the complete command
definitions stored in the variables CMAKE_C_ARCHIVE_CREATE,
CMAKE_C_ARCHIVE_APPEND and CMAKE_C_ARCHIVE_FINISH.

Introduce a new build option `ENABLE_REPRODUCIBLE_BUILDS`. This option
is available on Unix-like systems with the exception of macOS, which
does not have support for the required flags. If the option is
enabled, we add those flags to the invocation of both `ar` and `ranlib`
to enable deterministically building the static archive.
pks-t force-pushed the pks/reproducible-builds branch from 8294a2a to d630887 Compare September 15, 2017 06:21
Comment thread CMakeLists.txt
OPTION( ENABLE_WERROR "Enable compilation with -Werror" OFF )
IF (UNIX AND NOT APPLE)
OPTION( ENABLE_REPRODUCIBLE_BUILDS "Enable reproducible builds" OFF )
ENDIF()
Member

Why UNIX AND NOT APPLE? Are these GNU-only settings? If so, what about (say) FreeBSD? I wonder if there's a better way to detect GNUness...

Member Author

Those options are not available on all implementations of "ar" and "ranlib", unfortunately. I think being GNU is not even sufficient here, as those options were introduced not that long ago. I even think Ubuntu 14.04 does not have the ability to build it like this.

So originally, I intended to implement this mode as the default, such that all builds are deterministic. But as I saw that it wasn't available on quite a lot of platforms, I simply made it an option such that the distributor can decide for himself if he needs reproducible builds or not. And in case he knows what a reproducible build is and what it is for, he probably also has enough knowledge to fix his toolchain.

So I bet that some BSDs have the ability to have reproducible builds. I'd at least expect OpenBSD to have them, given their focus on security. But as far as I know, this is not at all available on macOS. So yeah, we could probably come up with something which just tests if those tools support the required flags. But in the end, I don't think it's really gaining us much, as it is only an option for experts who know what they are doing.
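Such a feature test could look roughly like the following Python sketch. It is not part of this pull request, and in the build system it would have to be written in CMake instead. The function name is hypothetical, and the probe assumes that an `ar` which does not know the `D` modifier exits with a non-zero status.

```python
import os
import subprocess
import tempfile


def ar_supports_deterministic_mode(ar="ar"):
    """Return True if `ar` accepts the D modifier when creating an archive."""
    with tempfile.TemporaryDirectory() as tmp:
        member = os.path.join(tmp, "probe.o")
        with open(member, "wb") as fh:
            fh.write(b"probe")
        archive = os.path.join(tmp, "probe.a")
        try:
            # q = quick append, c = create silently, D = deterministic mode
            result = subprocess.run(
                [ar, "qcD", archive, member],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            return False  # the tool is missing or cannot be executed
        return result.returncode == 0
```

Calling `ar_supports_deterministic_mode()` probes the `ar` found in `PATH`; a different tool can be passed by name or path.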

Member

I see. That seems reasonable...

ethomson merged commit 524c1d3 into libgit2:master Sep 20, 2017
pks-t deleted the pks/reproducible-builds branch November 11, 2017 20:34