# Running Nexmark Benchmarks with Apache Beam Apache Beam is a layer over other data processing systems with multiple backends ("runners"). It includes its own implementation of the Nexmark benchmarks. These instructions document how to run them for four of its runners: * Direct: This is built into Beam. This is meant for checking correctness and is not optimized for performance. * Flink * Spark * Google Cloud Dataflow Each runner supports both batch and streaming processing. In addition, Beam's Nexmark suite includes an implementation of each query in as many as three forms. All of them are implemented using the Beam native representation in terms of transforms and aggregations. Some of them are also implemented in two forms of SQL: "standard" SQL and ZetaSQL. For a Beam overview, see https://beam.apache.org/get-started/beam-overview/. For information about the Nexmark benchmark for Beam, see https://beam.apache.org/documentation/sdks/java/testing/nexmark/. ## Prerequisites Install the Java Development Kit. These instructions were tested with OpenJDK 21.0.2. ## Setting up Beam You can follow the instructions below to build Nexmark, or run `setup.sh` in this directory. 1. Clone the Beam repository: ``` git clone https://github.com/apache/beam.git ``` If you wish to benchmark a particular version, check it out: ``` (cd beam && git checkout origin/release-2.55.0) ``` 2. Apply `configurable-spark-master.patch`: ``` (cd beam && git am < ../configurable-spark-master.patch) ``` There's no need to run a separate build step. Beam will build the first time you run a benchmark. ## Running the benchmarks Use `run-nexmark.sh` in the parent directory.