# Contributing to Feldera

The Feldera team welcomes contributions from the community. Before you start working with Feldera, please read our [Developer Certificate of Origin (DCO)](https://developercertificate.org/). To acknowledge the DCO, sign your commits by adding `Signed-off-by: Your Name <your@email.com>` to the last line of each Git commit message. Your signature certifies that you wrote the patch or have the right to pass it on as an open-source patch. The e-mail address used to sign must match the e-mail address of the Git author. If you set your `user.name` and `user.email` git config values, you can sign your commit automatically with `git commit -s`.

## Dependencies

Our team develops and tests using Linux and macOS. Windows Subsystem for Linux works fine too. The Feldera container images and CI workflows use Linux. You can see our setup in our [Dockerfile](deploy/Dockerfile).

Our dependencies for building the project are:

- C and C++ compiler toolchain (e.g., gcc, g++)
- cmake
- libssl-dev
- libsasl2-dev
- zlib1g-dev
- a Rust toolchain (install rustup and the default toolchain)
- a Java Virtual Machine (at least Java 19)
- maven
- graphviz
- Python 3.10
- Bun (https://bun.sh/docs/installation)

Additional dependencies are automatically installed by the Rust, maven, Python, and TypeScript build tools.

## Contribution Flow

### Forking

We recommend forking the Feldera repository and contributing from a fork. This [page](https://docs.github.com/en/get-started/quickstart/fork-a-repo) has instructions on how to fork a repository.
After forking, do not forget to add Feldera as a remote repository:

```shell
git remote add upstream https://github.com/feldera/feldera.git
```

### Workflow

This is a rough outline of what a contributor's workflow looks like:

- Create a topic branch from where you want to base your work
- Make commits of logical units
- Make sure your commit messages are in the proper format (see below)
- Push your changes to a topic branch in the repository. Push to your fork if you don't have commit access to the Feldera repository; pushing directly to the repo is preferred because CI can then add benchmark results to the PR as comments.
- Submit a pull request

Example:

```shell
git checkout -b my-new-feature main
git commit -a
git push origin my-new-feature
```

### Staying In Sync With Upstream

When your branch gets out of sync with the feldera/main branch, use the following to update:

```shell
git checkout my-new-feature
git fetch -a
git pull --rebase upstream main
git push --force-with-lease upstream my-new-feature
```

If you don't have permission to push to the Feldera repository, replace the last command with:

```shell
git push --force-with-lease origin my-new-feature
```

### Updating pull requests

If your PR fails to pass CI or needs changes based on code review, you'll most likely want to squash these changes into existing commits.

If your pull request contains a single commit or your changes are related to the most recent commit, you can simply amend the commit.

```shell
git add <files>
git commit --amend
git push --force-with-lease origin my-new-feature
```

If you need to squash changes into an earlier commit, you can use:

```shell
git add <files>
git commit --fixup <commit>
git rebase -i --autosquash main
git push --force-with-lease origin my-new-feature
```

Be sure to add a comment to the PR indicating your new changes are ready to review, as GitHub does not generate a notification when you git push.
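The fixup-and-autosquash flow from "Updating pull requests" can be tried safely before using it on a real branch. The following is a minimal sketch that runs entirely in a throwaway repository; the file names, commit messages, and the `Example Dev` identity are hypothetical, and `GIT_SEQUENCE_EDITOR=true` is used only to accept the generated rebase plan without opening an editor.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example Dev"         # hypothetical identity
git config user.email "dev@example.com"

# One commit on main, then two commits on a topic branch.
echo one > a.txt; git add a.txt; git commit -q -s -m "Add a.txt"
git checkout -q -b my-new-feature
echo two > b.txt; git add b.txt; git commit -q -s -m "Add b.txt"
target=$(git rev-parse HEAD)               # the commit the review fix belongs to
echo three > c.txt; git add c.txt; git commit -q -s -m "Add c.txt"

# A review fix that belongs to the earlier "Add b.txt" commit.
echo "two, fixed" > b.txt
git add b.txt
git commit -q --fixup "$target"

# Accept the autosquash plan as generated, without opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

commits=$(git rev-list --count main..HEAD)
echo "commits on branch: $commits"
git log --oneline main..HEAD
```

After the rebase, the branch again holds two commits, and the fix is folded into "Add b.txt". On a real branch the rewritten history then needs `git push --force-with-lease`, as shown above.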
### Merging a pull request

Since we run benchmarks as part of the CI, it's a good practice to preserve the commit IDs of the feature branch we've worked on (and benchmarked). Unfortunately, [the GitHub UI does not have support for this](https://github.com/community/community/discussions/4618) (it only allows rebase, squash and merge commits to close PRs). Therefore, it's recommended to merge PRs using the following git CLI invocation:

```shell
git checkout main
git merge --ff-only feature-branch-name
git push upstream main
```

### Code Style

Execute the following commands to make `git push` check the code for formatting issues.

```shell
GITDIR=$(git rev-parse --git-dir)
ln -sf "$(pwd)/scripts/pre-push" "${GITDIR}/hooks/pre-push"
```

### Formatting Commit Messages

We follow the conventions on [How to Write a Git Commit Message](http://chris.beams.io/posts/git-commit/).

Be sure to include any related GitHub issue references in the commit message. See [GFM syntax](https://guides.github.com/features/mastering-markdown/#GitHub-flavored-markdown) for referencing issues and commits.

## Reporting Bugs and Creating Issues

When opening a new issue, try to roughly follow the commit message format conventions above.

# For developers

## Building Feldera from sources

Feldera is implemented in Rust and uses Rust's `cargo` build system. The SQL to DBSP compiler is implemented in Java and uses `maven` as its build system.

You can build the Rust sources by running the following at the top level of this tree.

```
cargo build
```

To build the SQL to DBSP compiler, run the following from `sql-to-dbsp-compiler`:

```
./build.sh
```

If you want to develop Feldera without installing the required toolchains locally, you can use GitHub Codespaces; from https://github.com/feldera/feldera, click on the green `<> Code` button, then select Codespaces and click on "Create codespace on main".

## Learning the DBSP Rust code

DBSP is a key crate that powers Feldera's pipelines.
To learn how the DBSP core works, we recommend starting with the tutorial.

From the project root:

```
cargo doc --open
```

Then search for `dbsp::tutorial`.

Another good place to start is the `circuit::circuit_builder` module documentation, or the examples folder. For more sophisticated examples, try looking at the `nexmark` benchmark in the `benches` directory.

## Running Benchmarks against DBSP

The repository has a number of benchmarks available in the `benches` directory that provide a comparison of DBSP's performance against a known set of tests.

Each benchmark has its own options and behavior, as outlined below.

### Nexmark Benchmark

You can run the complete set of Nexmark queries, with the default settings, with:

```shell
cargo bench --bench nexmark
```

By default this will run each query with a total of 100 million events emitted at 10M per second (by two event generator threads), using 2 CPU cores for processing the data.

To run just the one query, q3, with only 10 million events, but using 8 CPU cores to process the data and 6 event generator threads, you can run:

```shell
cargo bench --bench nexmark -- --query q3 --max-events 10000000 --cpu-cores 8 --num-event-generators 6
```

For further options that you can use with the Nexmark benchmark, run:

```shell
cargo bench --bench nexmark -- --help
```

An extensive blog post about the implementation of Nexmark in DBSP:

## Updating the pipeline manager database schema

The pipeline manager serves as the API server for Feldera. It persists API state in a Postgres DB instance. Here are some guidelines when contributing code that affects this database's schema.

* We use SQL migrations to apply the schema to a live database to facilitate upgrades. We use [refinery](https://github.com/rust-db/refinery) to manage migrations.
* The migration files can be found in `crates/pipeline-manager/migrations`
* Do not modify an existing migration file.
  If you want to evolve the schema, add a new SQL or Rust file to the migrations folder following [refinery's versioning and naming scheme](https://docs.rs/refinery/latest/refinery/#usage). The migration script should update an existing schema as opposed to assuming a clean slate. For example, use `ALTER TABLE` to add a new column to an existing table and fill that column for existing rows with the appropriate defaults.
* If you add a new migration script `V{i}`, add tests for migrations from `V{i-1}` to `V{i}`. For example, add tests that invoke the pipeline manager APIs before and after the migration.

## Logging

By default, the pipeline-manager and pipelines install a `tracing` subscriber which logs the Feldera crates at INFO level and all other crates at WARN level. This can be overridden by setting the `RUST_LOG` environment variable. For example, the following is equivalent to the default, with backtraces additionally enabled:

```bash
RUST_BACKTRACE=1 RUST_LOG=warn,pipeline_manager=info,feldera_types=info,project=info,dbsp=info,dbsp_adapters=info,dbsp_nexmark=info cargo run --package=pipeline-manager --bin pipeline-manager -- --dev-mode
```

## Release process

The release process is done through GitHub Actions. Launch the "Create a release" action manually from the GitHub Actions UI. You have to provide the Git SHA of the commit that you want to release as the new version.
The release CI scripts will then run in this order:

* ci-release.yml
  - Publishes a new release on GitHub for the commit
  - Adds the binaries that we built during the merge queue
  - Tags the Docker image that we also built during the merge queue as `$version` and `latest`
* ci-post-release.yml
  - Releases the Python library to PyPI
  - Releases the Rust crates to crates.io
  - Determines the next version (this is controlled by a `RELEASE_NEXT_VERSION` variable in the repo settings)
  - Bumps the versions in Cargo.toml, pyproject.toml, and openapi.yaml to the next version
  - Commits and pushes the changes to main

Note that the release process requires that the commit you want to release was merged into main through the merge queue; otherwise the build artifacts will not be available.
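Before launching the release action, it can help to confirm that the chosen SHA is at least reachable from main. The sketch below uses `git merge-base --is-ancestor` for that check and demonstrates it in a throwaway repository; the helper name `is_reachable_from` and the sample commits are hypothetical. Reachability alone does not prove that the commit went through the merge queue, so treat it as a quick sanity check only.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: succeeds when commit $1 is reachable from branch $2.
is_reachable_from() {
    git merge-base --is-ancestor "$1" "$2"
}

# Demonstrate the check in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example Dev"         # hypothetical identity
git config user.email "dev@example.com"

echo one > a.txt; git add a.txt; git commit -q -m "Commit on main"
on_main=$(git rev-parse HEAD)

git checkout -q -b side
echo two > b.txt; git add b.txt; git commit -q -m "Commit not merged into main"
not_merged=$(git rev-parse HEAD)

if is_reachable_from "$on_main" main; then r1=yes; else r1=no; fi
if is_reachable_from "$not_merged" main; then r2=yes; else r2=no; fi
echo "on main: $r1, unmerged: $r2"
```

In a real checkout, the equivalent check would be `git fetch upstream` followed by `git merge-base --is-ancestor <sha> upstream/main`, assuming the `upstream` remote set up in the Forking section.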