flink-kafka

Running Nexmark Benchmarks on Flink with Kafka

It's useful to be able to run the same benchmark on Feldera and other systems. These instructions explain how to run the Nexmark benchmarks on Flink in a reproducible manner. The benchmarks run on a single physical node using multiple containers using Docker Compose. One container runs the Flink jobmanager, and additional containers (by default 8) run Flink taskmanager.

These instructions run Flink with Kafka as a data source. These instructions are unsatisfactory because they will use all your disk space quickly (1+ TB in 30 minutes). Presumably that is fixable but it is not yet fixed.

cd to the directory that contains this README.md, because the files in this directory are useful for building the Docker container and running Docker Compose.
Download and unpack the Nexmark benchmark:

git clone https://github.com/nexmark/nexmark.git

Apply a couple of patches:

(cd nexmark && patch -p1 < ../avoid-jps-dependency.patch)
(cd nexmark && patch -p1 < ../omit-q10.patch)

Build Nexmark:

(cd nexmark/nexmark-flink && ./build.sh)

If it doesn't build, it might mean you've got the wrong version of the JDK installed. Nexmark and Flink only work with JDK 1.8 and 1.11, not newer versions.

This should output nexmark/nexmark-flink/nexmark-flink.tgz. Be warned that build.sh doesn't bail out on errors, so you might still get something there even if the build partially fails.

Extract Nexmark archive:

tar xzf nexmark/nexmark-flink/nexmark-flink.tgz

Download and unpack Kafka:

wget https://dlcdn.apache.org/kafka/3.4.0/kafka_2.13-3.4.0.tgz
tar xfz kafka_2.13-3.4.0.tgz

Download the Flink connector for Kafka:

wget https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka/1.17.0/flink-sql-connector-kafka-1.17.0.jar

Build a Docker container based on the upstream Flink container that also contains Nexmark and Kafka, and give it the tag nexmark-kafka:

docker build -t nexmark-kafka .

Start the containers:

docker compose -p nexmark up

To reduce output spew, add -d (but sometimes that output can be informative):

docker compose up -d -p nexmark

Run the insert_kafka query to start inserting data into Kafka. This query will continue running in the background even after it "finishes". It will consume your disk space at an incredible rate (over 1 TB in 30 minutes).

You might want to do this from a separate terminal, because the docker compose from the previous step will spew tons of distractions (unless you added -d above).

docker exec nexmark-jobmanager-1 run-nexmark.sh insert_kafka

Run tests.

You can run a single test, a comma-separated list, or all of them:

docker exec nexmark-jobmanager-1 run-nexmark.sh q0,q1,q2
docker exec nexmark-jobmanager-1 run-nexmark.sh all

You can also run a shell in any of the containers, e.g.:

docker exec -it nexmark-jobmanager-1 bash
docker exec -it nexmark-taskmanager-1 bash

You'll probably run out of disk space if you run more than one.

If there are failures, then /opt/flink/log and /opt/nexmark/log can be informative.

When you're done, stop the containers:

docker compose -p nexmark down

Name		Name	Last commit message	Last commit date
parent directory ..
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
avoid-jps-dependency.patch		avoid-jps-dependency.patch
config.sh		config.sh
docker-compose.yml		docker-compose.yml
entrypoint.sh		entrypoint.sh
flink-conf.yaml		flink-conf.yaml
flink-sql-connector-kafka-1.17.0.jar		flink-sql-connector-kafka-1.17.0.jar
nexmark.yaml		nexmark.yaml
omit-q10.patch		omit-q10.patch
run-nexmark.sh		run-nexmark.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Running Nexmark Benchmarks on Flink with Kafka

FilesExpand file tree

flink-kafka

Directory actions

More options

Directory actions

More options

Latest commit

History

flink-kafka

Folders and files

parent directory

README.md

Running Nexmark Benchmarks on Flink with Kafka