Commit 6586977

Use a multi-stage Docker build
This build allows us to split the Docker build process into multiple stages, some of which can be run in parallel.

* It uses cargo chef to cache and build only the Rust dependencies in a separate layer first. This allows changes to the code without new dependencies to be incrementally built from the Docker image, speeding up dev-test cycles.
* The Rust and Java builds happen in parallel, given that they are independent of each other.
* The final target is a release build that puts together only the required artifacts from the Rust and Java builds.
* It adds an optional dev target which also bundles rpk, the Python API, and the demo projects.

Signed-off-by: Lalith Suresh <lsuresh@vmware.com>
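To see why the Rust and Java builds can proceed independently, it helps to write down which stage each `FROM` line builds on. The following is a minimal sketch: the stage names are transcribed from `deploy/Dockerfile` in this commit, while the helper functions `stage_parent` and `stage_chain` are hypothetical and exist only for illustration. It tracks `FROM` inheritance only, not the `COPY --from=...` links that tie `release` to `builder` and `javabuild`.

```shell
#!/usr/bin/env bash
# Parent of each stage, transcribed from the FROM lines in deploy/Dockerfile.
# (stage_parent and stage_chain are illustrative helpers, not part of the commit.)
stage_parent() {
  case "$1" in
    base)      echo "ubuntu:22.04" ;;
    chef)      echo "base" ;;
    planner)   echo "chef" ;;
    builder)   echo "chef" ;;
    javabuild) echo "base" ;;
    release)   echo "base" ;;
    dev)       echo "builder" ;;
    *)         return 1 ;;  # not a stage: end of the chain
  esac
}

# Print the FROM chain of a stage, from the stage down to the external image.
stage_chain() {
  local stage="$1" chain="$1" parent
  while parent="$(stage_parent "$stage")"; do
    chain="$chain -> $parent"
    stage="$parent"
  done
  echo "$chain"
}

stage_chain builder
stage_chain javabuild
```

The two printed chains share only `base` and the Ubuntu image, which is what lets a builder that schedules stages concurrently (BuildKit does this for independent stages) run the Rust and Java builds side by side.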
1 parent 20fd150 commit 6586977

3 files changed

Lines changed: 66 additions & 28 deletions

.dockerignore

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # Ignore target and other large directories.
 target
+**/deploy
 **/Dockerfile
 **/pipeline_data
 **/ldbc-graphalytics-data
 **/fraud_data
+**/.vscode

deploy/Dockerfile

Lines changed: 54 additions & 26 deletions
@@ -1,44 +1,72 @@
-FROM ubuntu:22.04
-
-# Skip past interactive prompts during apt install
+# The base image contains tools to build the code given that
+# we need a Java and Rust compiler to run alongside the pipeline manager
+# as of now. This will change later.
+FROM ubuntu:22.04 AS base
 ENV DEBIAN_FRONTEND noninteractive
-
 RUN apt update && apt install libssl-dev build-essential pkg-config \
     git gcc clang libclang-dev python3-pip python3-plumbum hub numactl cmake \
     curl openjdk-19-jre-headless maven netcat jq \
     adduser libfontconfig1 unzip -y
+RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y

-RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+# Use cargo-chef to produce a recipe.json file
+# to cache the requisite dependencies
+FROM base as chef
+RUN /root/.cargo/bin/cargo install cargo-chef
+RUN /root/.cargo/bin/cargo install cargo-make
+WORKDIR app

-RUN ~/.cargo/bin/cargo install --no-default-features --force cargo-make
+# Cache dependencies from rust
+FROM chef AS planner
+COPY . .
+RUN /root/.cargo/bin/cargo chef prepare --recipe-path recipe.json

-# Install rpk
-RUN arch=`dpkg --print-architecture`; \
-    curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-$arch.zip \
-    && unzip rpk-linux-$arch.zip -d /bin/ \
-    && rpk version \
-    && rm rpk-linux-$arch.zip
-
-COPY . /database-stream-processor
+# Use the recipe.json file to build dependencies first and cache that
+# layer for faster incremental builds of source-code only changes
+FROM chef AS builder
+COPY --from=planner /app/recipe.json recipe.json
+RUN /root/.cargo/bin/cargo chef cook --release --recipe-path recipe.json --bin=dbsp_pipeline_manager
+COPY . .
+RUN rm /app/crates/dbsp/benches/ldbc-graphalytics.rs \
+    && rm /app/crates/dbsp/benches/gdelt.rs \
+    && rm /app/crates/nexmark/benches/nexmark.rs \
+    && rm /app/crates/nexmark/benches/nexmark-gen.rs
+RUN /root/.cargo/bin/cargo build --release --bin=dbsp_pipeline_manager

+# Java build can be performed in parallel
+FROM base as javabuild
+RUN mkdir sql
+COPY .git /sql/
+COPY sql-to-dbsp-compiler /sql/sql-to-dbsp-compiler
 # Update SQL compiler submodule to the version specified in the repo, unless
 # the submodule is _not_ in detached head state, which indicates that the user
 # is working on the submodule and wants to build a container with their modified
 # SQL compiler version.
-RUN cd /database-stream-processor && \
-    if [[ ! -e sql-to-dbsp-compiler/.git || -z $(cd sql-to-dbsp-compiler && git branch --show-current) ]]; \
+RUN if [[ ! -e sql-to-dbsp-compiler/.git || -z $(cd sql-to-dbsp-compiler && git branch --show-current) ]]; \
     then git submodule update --init; fi
+RUN cd sql/sql-to-dbsp-compiler/SQL-compiler && mvn -DskipTests package

-RUN cd /database-stream-processor/crates/pipeline_manager \
-    && ~/.cargo/bin/cargo make openapi_python \
-    && ~/.cargo/bin/cargo install --path . \
-    && rm -rf /database-stream-processor/target .git
-
-RUN cd /database-stream-processor/sql-to-dbsp-compiler/SQL-compiler && mvn -DskipTests package
-
+# Minimal image for running the pipeline manager
+FROM base as release
 ENV PATH="$PATH:/root/.cargo/bin"
+COPY --from=builder /app/target/release/dbsp_pipeline_manager dbsp_pipeline_manager
+COPY --from=javabuild /sql/sql-to-dbsp-compiler sql-to-dbsp-compiler
+COPY . /database-stream-processor
+CMD ./dbsp_pipeline_manager --bind-address=0.0.0.0 --working-directory=/working-dir --sql-compiler-home=/sql-to-dbsp-compiler --dbsp-override-path=/database-stream-processor

-RUN /root/.cargo/bin/dbsp_pipeline_manager --bind-address=0.0.0.0 --working-directory=/working-dir --sql-compiler-home=/database-stream-processor/sql-to-dbsp-compiler --dbsp-override-path=/database-stream-processor --unix-daemon \
-    && /database-stream-processor/demo/create_demo_projects.sh
+# The dev target adds an rpk client and demo projects
+FROM builder as dev
+COPY --from=javabuild /sql/sql-to-dbsp-compiler /sql-to-dbsp-compiler
+RUN arch=`dpkg --print-architecture`; \
+    curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-$arch.zip \
+    && unzip rpk-linux-$arch.zip -d /bin/ \
+    && rpk version \
+    && rm rpk-linux-$arch.zip
+RUN cd crates/pipeline_manager && /root/.cargo/bin/cargo make openapi_python
+# RUN /app/target/release/dbsp_pipeline_manager --bind-address=0.0.0.0 --working-directory=/working-dir --sql-compiler-home=/sql-to-dbsp-compiler --dbsp-override-path=/app/ --unix-daemon \
+# && /app/demo/create_demo_projects.sh
+# RUN /app/demo/create_demo_projects.sh
+CMD bash

-CMD /root/.cargo/bin/dbsp_pipeline_manager --bind-address=0.0.0.0 --working-directory=/working-dir --sql-compiler-home=/database-stream-processor/sql-to-dbsp-compiler --dbsp-override-path=/database-stream-processor
+# By default, only build the release version
+FROM release
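The `planner` and `builder` stages rely on Docker reusing the `cargo chef cook` layer as long as the copied `recipe.json` is byte-identical, and on the recipe describing dependencies rather than application source. The sketch below imitates that idea without Docker or cargo. The `fake_recipe` function is a hypothetical stand-in that hashes only `Cargo.toml` files; the real `recipe.json` contains more than that, so treat this as an illustration of the cache-key reasoning, not of cargo-chef's actual output.

```shell
#!/usr/bin/env bash
# Stand-in for `cargo chef prepare`: checksum the manifests only.
# (fake_recipe is a hypothetical helper, far simpler than a real recipe.json.)
fake_recipe() {
  find "$1" -name Cargo.toml -type f | sort | xargs cat | cksum
}

# Build a throwaway one-crate project in a temporary directory.
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src"
printf '[package]\nname = "demo"\nversion = "0.1.0"\n' > "$work/Cargo.toml"
echo 'fn main() {}' > "$work/src/main.rs"

before="$(fake_recipe "$work")"

# Source-only change: the stand-in recipe is unchanged, so the layer would be reused.
echo 'fn main() { println!("edited"); }' > "$work/src/main.rs"
after_src_edit="$(fake_recipe "$work")"

# Manifest change: the stand-in recipe differs, so dependencies would be rebuilt.
printf '\n[dependencies]\n' >> "$work/Cargo.toml"
after_manifest_edit="$(fake_recipe "$work")"

[ "$before" = "$after_src_edit" ] && echo "source edit: cache hit"
[ "$before" != "$after_manifest_edit" ] && echo "manifest edit: cache miss"
```

In the Dockerfile the same split appears as `COPY --from=planner /app/recipe.json recipe.json` followed by `cargo chef cook`, with the full `COPY . .` deferred until after the dependency layer.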

deploy/README.md

Lines changed: 10 additions & 2 deletions
@@ -16,9 +16,17 @@ Open your browser and you should now be able to see the pipeline manager dashboa
 If you don't, double check that there are no port conflicts on your system (you can view and modify
 the port mappings in `deploy/docker-compose.yml`).

-If you're a developer and want to bring up a local instance of DBSP from sources, run the following from the
-`deploy/` folder:
+If you want to bring up a local instance of DBSP from sources, run the following from the `deploy/` folder:

 ```
+docker build -f Dockerfile -t dbspmanager ../
+docker compose -f docker-compose-dev.yml up
+```
+
+If you'd like a "dev" image which has additional utilities installed (like rpk,
+the DBSP python API and some demo projects), build the dev target for the Docker image:
+
+```
+docker build -f Dockerfile --target=dev -t dbspmanager ../
 docker compose -f docker-compose-dev.yml up
 ```
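The two `docker build` commands in the README differ only in the `--target` flag. A small dry-run sketch of a wrapper that assembles the command for any stage is shown here. The function name `dbsp_build_cmd` is hypothetical, and prefixing `DOCKER_BUILDKIT=1` is an assumption for Docker installations where BuildKit is not already the default builder.

```shell
#!/usr/bin/env bash
# Print (do not run) the docker build command for an optional target stage.
# Intended to be used from the deploy/ folder, as in the README.
# (dbsp_build_cmd is an illustrative helper, not part of the repository.)
dbsp_build_cmd() {
  local target="${1:-}"
  local cmd="DOCKER_BUILDKIT=1 docker build -f Dockerfile"
  if [ -n "$target" ]; then
    cmd="$cmd --target=$target"
  fi
  echo "$cmd -t dbspmanager ../"
}

dbsp_build_cmd        # no target: Docker builds the final stage (FROM release)
dbsp_build_cmd dev    # dev image with rpk and the Python API
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; to actually build, the output could be passed to `eval`, or the printed line copied into a terminal.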
