Commit b57ffb9
Parent: 7c98e7b

docs: Update README

Among other things, it gives a higher-level overview of the project.

Signed-off-by: Lalith Suresh <lalith@feldera.com>

3 files changed: 130 additions & 134 deletions
CONTRIBUTING.md

Lines changed: 44 additions & 5 deletions
@@ -87,7 +87,7 @@ notification when you git push.
 
 ### Merging a pull request
 
-Since we run benchmarks as part of the CI, it'sa good practice to preserve the commit IDs of the feature branch
+Since we run benchmarks as part of the CI, it's a good practice to preserve the commit IDs of the feature branch
 we've worked on (and benchmarked). Unfortunately, [the github UI does not have support for this](https://github.com/community/community/discussions/4618)
 (it only allows rebase, squash and merge commits to close PRs).
 Therefore, it's recommended to merge PRs using the following git CLI invocation:
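The actual invocation is elided at this hunk boundary (the next hunk's context line shows `git push upstream main`), so the following is only a self-contained sketch of why a fast-forward merge is preferred here: unlike squash or rebase merges, it keeps the feature branch's benchmarked commit IDs intact. All repository and branch names are illustrative.

```shell
# Throwaway-repo demo: a fast-forward merge preserves the feature branch's
# commit IDs, so CI benchmark results stay attached to the same commits.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "init"
git checkout -qb feature
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "benchmarked change"
feature_id=$(git rev-parse HEAD)
git checkout -q main
git merge --ff-only -q feature      # fast-forward: no new merge commit is created
echo "HEAD equals benchmarked feature commit: $([ "$(git rev-parse HEAD)" = "$feature_id" ] && echo yes)"
```

A squash or rebase merge would rewrite the commits, producing new IDs and orphaning the benchmark history.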
@@ -101,7 +101,7 @@ git push upstream main
 
 ### Code Style
 
-Execute the following command to make `git commit` check the code for formatting issues before commit. It is not yet applied to the sql compiler.
+Execute the following command to make `git push` check the code for formatting issues.
 
 ```shell
 GITDIR=$(git rev-parse --git-dir)
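The hunk above truncates the hook setup after the `GITDIR` line, so the command the project actually installs is not shown here. The snippet below is a hypothetical sketch of the general pattern (a pre-push hook running a `cargo fmt` check is assumed, not confirmed by this diff):

```shell
# Hypothetical sketch: install a pre-push hook that runs a rustfmt check.
# The project's real hook contents are truncated in the hunk above.
set -e
cd "$(mktemp -d)" && git init -q     # throwaway repo for the demo
GITDIR=$(git rev-parse --git-dir)
cat > "$GITDIR/hooks/pre-push" <<'EOF'
#!/bin/sh
# Reject the push if Rust sources are not rustfmt-clean (illustrative check).
exec cargo fmt --all -- --check
EOF
chmod +x "$GITDIR/hooks/pre-push"
ls "$GITDIR/hooks/pre-push"
```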
@@ -122,6 +122,44 @@ When opening a new issue, try to roughly follow the commit message format conven
 
 # For developers
 
+## Building DBSP from sources
+
+DBSP is implemented in Rust and uses Rust's `cargo` build system. The SQL
+to DBSP compiler is implemented in Java and uses `maven` as its build system.
+
+You can build the rust sources by running the following at the top level of this tree.
+
+```
+cargo build
+```
+
+To build the SQL to DBSP compiler, run the following from `sql-to-dbsp-compiler/SQL-compiler`:
+
+```
+mvn package
+```
+
+If you want to develop DBSP without installing the required toolchains
+locally, you can use Github Codespaces; from
+https://github.com/feldera/dbsp, click on the green `<> Code` button,
+then select Codespaces and click on "Create codespace on main".
+
+## Learning the DBSP Rust code
+
+To learn how the DBSP core works, we recommend starting with the tutorial.
+
+From the project root:
+
+```
+cargo doc --open
+```
+
+Then search for `dbsp::tutorial`.
+
+Another good place to start is the `circuit::circuit_builder` module documentation,
+or the examples folder. For more sophisticated examples, try looking
+at the `nexmark` benchmark in the `benches` directory.
+
 ## Running Benchmarks against DBSP
 
 The repository has a number of benchmarks available in the `benches` directory that provide a comparison of DBSP's performance against a known set of tests.
@@ -156,9 +194,10 @@ An extensive blog post about the implementation of Nexmark in DBSP:
 
 ## Updating the pipeline manager database schema
 
-Here are some guidelines when contributing code that affects the Pipeline Manager's DB schema.
+The pipeline manager serves as the API server for Feldera. It persists API state in a Postgres DB instance.
+Here are some guidelines when contributing code that affects this database's schema.
 
-* We use SQL migrations to apply the schema to a live database to faciliate upgrades. We use [refinery](https://github.com/rust-db/refinery) to manage migrations.
+* We use SQL migrations to apply the schema to a live database to facilitate upgrades. We use [refinery](https://github.com/rust-db/refinery) to manage migrations.
 * The migration files can be found in `crates/pipeline_manager/migrations`
 * Do not modify an existing migration file. If you want to evolve the schema, add a new SQL or rust file to the migrations folder following [refinery's versioning and naming scheme](https://docs.rs/refinery/latest/refinery/#usage). The migration script should update an existing schema as opposed to assuming a clean slate. For example, use `ALTER TABLE` to add a new column to an existing table and fill that column for existing rows with the appropriate defaults.
-* If you add a new migration script `V{i}`, add tests for migrations from `V{i-1} to V{i}`. For example, add tests that invoke the pipeline manager APIs before and after the migration.
+* If you add a new migration script `V{i}`, add tests for migrations from `V{i-1}` to `V{i}`. For example, add tests that invoke the pipeline manager APIs before and after the migration.
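The guidelines above can be sketched as a hypothetical new migration following refinery's `V{i}__name.sql` convention. The version number, table, and column here are illustrative only, not the project's actual schema:

```shell
# Hypothetical refinery migration (names illustrative, not the real schema).
set -e
cd "$(mktemp -d)"                    # sandbox; in the real tree, work at the repo root
mkdir -p crates/pipeline_manager/migrations
cat > crates/pipeline_manager/migrations/V3__add_created_at.sql <<'EOF'
-- Evolve the existing schema in place; never assume a clean slate.
ALTER TABLE pipeline ADD COLUMN created_at TIMESTAMP;
UPDATE pipeline SET created_at = NOW();
EOF
ls crates/pipeline_manager/migrations
```

Note the migration uses `ALTER TABLE` plus a backfilling `UPDATE` rather than `CREATE TABLE`, matching the "update an existing schema" rule above.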

README.md

Lines changed: 85 additions & 129 deletions
@@ -1,55 +1,109 @@
-# Database Stream Processor
-
-Database Stream Processor (DBSP) is a framework for computing over data streams
-that aims to be more expressive and performant than existing streaming engines.
+# The Feldera Continuous Analytics Platform
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
 [![CI workflow](https://github.com/feldera/dbsp/actions/workflows/ci.yml/badge.svg)](https://github.com/feldera/dbsp/actions)
 [![codecov](https://codecov.io/gh/feldera/dbsp/branch/main/graph/badge.svg?token=0wZcmD11gt)](https://codecov.io/gh/feldera/dbsp)
 [![nightly](https://github.com/feldera/dbsp/actions/workflows/containers.yml/badge.svg)](https://github.com/feldera/dbsp/actions/workflows/containers.yml)
 
-## DBSP Mission Statement
-
-Computing over streaming data is hard. Streaming computations operate over
-changing inputs and must continuously update their outputs as new inputs arrive.
-They must do so in real time, processing new data and producing outputs with a
-bounded delay.
+The [Feldera Continuous Analytics Platform](https://www.feldera.com/), or Feldera in short, is a
+fast computational engine for *continuous analytics* over data in-motion. It
+allows users to build data pipelines as SQL programs that are continuously
+evaluated as new data arrives from various sources. What makes Feldera
+[unique](#theory) is its ability to *evaluate arbitrary SQL programs
+incrementally*, making it more expressive and performant than existing
+alternatives like streaming engines.
+
+With Feldera, software engineers and data scientists building data pipelines
+are not exposed to the complexities of querying changing data, an otherwise
+notoriously hard problem. Instead, they can express their
+computations as declarative queries and have Feldera evaluate
+these queries incrementally, correctly and efficiently.
 
-We believe that software engineers and data scientists who build streaming data
-pipelines should not be exposed to this complexity. They should be able to
-express their computations as declarative queries and use a streaming engine to
-evaluate these queries correctly and efficiently. DBSP aims to be such an
-engine. To this end we set the following high-level objectives:
+To this end we set the following high-level objectives:
 
-1. **Full SQL support and more.** While SQL is just the first of potentially
-many DBSP frontends, it offers a reference point to characterize the
-expressiveness of the engine. Our goal is to support the complete SQL syntax
-and semantics, including joins and aggregates, correlated subqueries, window
-functions, complex data types, time series operators, UDFs, etc. Beyond
-standard SQL, DBSP supports recursive queries, which arise for instance in graph
-analytics problems.
+1. **Full SQL support and more.** Our goal is to support the complete SQL
+syntax and semantics, including joins and aggregates, correlated subqueries,
+window functions, complex data types, time series operators, UDFs, and
+recursive queries.
 
-1. **Scalability in multiple dimensions.** The engine scales with the number and
-complexity of queries, streaming data rate and the amount of state the system
-maintains in order to process the queries.
+1. **Scalability in multiple dimensions.** The engine scales with the number
+and complexity of queries, input data rate and the amount of state the
+system maintains in order to process the queries.
 
 1. **Performance out of the box.** The user should be able to focus on the
-business logic of their application, leaving it to the system to evaluate this
-logic efficiently.
+business logic of their application, leaving it to the system to evaluate
+this logic efficiently.
+
+## Architecture
+
+With Feldera, users create data pipelines out of SQL programs and data
+connectors. An SQL program comprises tables and views. Connectors feed data to
+input tables in a program or receive outputs computed by views. Example
+connectors we currently support are Kafka, Redpanda and an HTTP API to push/pull
+directly to and from tables/views. We are working on more connectors such as
+ones for database CDC streams. Let us know of any connectors you'd like us to
+cover.
+
+Feldera fundamentally operates on changes to data, i.e., inserts and deletes to
+tables. This model covers all kinds of data in-motion use cases, like
+insert-only streams of event, log, HTTP and timeseries data, as well as changes
+to traditional databases extracted via CDC streams.
+
+The following diagram shows Feldera's architecture.
+
+![Feldera Architecture](architecture.svg)
+
+## What's in this repository?
+
+This repository comprises all the building blocks to run continuous analytics pipelines using Feldera.
+
+* [web UI](web-ui): a web interface for writing SQL, setting up connectors, and managing pipelines.
+* [pipeline-manager](crates/pipeline_manager): serves the web UI and is the REST API server for building and managing data pipelines.
+* [dbsp](crates/dbsp): the core [engine](#theory) that allows us to evaluate arbitrary queries incrementally.
+* [SQL compiler](sql-to-dbsp-compiler): translates SQL programs into DBSP programs.
+* [connectors](crates/adapters/): to stream data in and out of Feldera pipelines.
+
+## Quick start
+
+First, make sure you have [Docker Compose](https://docs.docker.com/compose/) installed.
+
+Next, run the following command to download a Docker Compose file, and use it to bring up
+a DBSP deployment suitable for demos, development and testing:
+
+```text
+curl https://raw.githubusercontent.com/feldera/dbsp/main/deploy/docker-compose.yml | docker compose -f - --profile demo up
+```
+
+It can take some time for the container images to be downloaded. About ten seconds after that, the DBSP
+web interface will become available. Visit [http://localhost:8085](http://localhost:8085) in your browser
+to bring it up. We suggest going through our [demos](https://docs.feldera.io/docs/demos) next.
+
+Our [Getting Started](https://docs.feldera.io/docs/intro) guide has more detailed instructions on running the demo.
+
+## Documentation
+
+To learn more about Feldera, we recommend going through the [documentation](https://docs.feldera.io/docs/intro).
+
+* [Getting started](https://docs.feldera.io/docs/intro)
+* [UI tour](https://docs.feldera.io/docs/tour/)
+* [Demos](https://docs.feldera.io/docs/category/demos)
+* [SQL reference](https://docs.feldera.io/docs/sql/intro)
+* [API reference](https://docs.feldera.io/docs/api/rest/)
 
 ## Theory
 
-The above objectives can only be achieved by building on a solid mathematical
-foundation. The formal model that underpins our system, also called DBSP, is
+Feldera achieves its objectives by building on a solid mathematical
+foundation. The formal model that underpins our system, called DBSP, is
 described in the accompanying paper:
 
 - [Budiu, Chajed, McSherry, Ryzhyk, Tannen. DBSP: Automatic
 Incremental View Maintenance for Rich Query Languages, Conference on
 Very Large Databases, August 2023, Vancouver,
 Canada](https://github.com/feldera/dbsp/blob/main/docs/static/vldb23.pdf)
 
-- Here is the [video of a DBSP
-presentation](https://www.youtube.com/watch?v=iT4k5DCnvPU) at the 2023
+- Here is [a presentation about DBSP](https://www.youtube.com/watch?v=iT4k5DCnvPU) at the 2023
 Apache Calcite Meetup.
 
 The model provides two things:
@@ -59,105 +113,7 @@ queries built out of these operators, and precisely specifies how these queries
 must transform input streams to output streams.
 
 1. **Algorithm.** DBSP also gives an algorithm that takes an arbitrary query and
-generates a dataflow program that implements this query correctly (in accordance
+generates an incremental dataflow program that implements this query correctly (in accordance
 with its formal semantics) and efficiently. Efficiency here means, in a
 nutshell, that the cost of processing a set of input events is proportional to
 the size of the input rather than the entire state of the database.
-
-## DBSP Concepts
-
-DBSP unifies two kinds of streaming data: time series data and change data.
-
-- **Time series data** can be thought of as an infinitely growing log indexed by
-time.
-
-- **Change data** represents updates (insertions, deletions, modifications) to
-some state modeled as a table of records.
-
-In DBSP, a time series is just a table where records are only ever added and
-never removed or modified. As a result, this table can grow unboundedly; hence
-most queries work with subsets of the table within a bounded time window. DBSP
-does not need to wait for all data within a window to become available before
-evaluating a query (although the user may choose to do so): like all queries,
-time window queries are updated on the fly as new inputs become available. This
-means that DBSP can work with arbitrarily large windows as long as they fit
-within available storage.
-
-DBSP queries are composed of the following classes of operators that apply to
-both time series and change data:
-
-1. **Per-record operators** that parse, validate, filter, transform data streams
-one record at a time.
-
-1. The complete set of **relational operators**: select, project, join,
-aggregate, etc.
-
-1. **Recursion**: Recursive queries express iterative computations, e.g.,
-partitioning a graph into strongly connected components. Like all DBSP queries,
-recursive queries update their outputs incrementally as new data arrives.
-
-In addition, DBSP supports **windowing operators** that group time series data
-into time windows, including various forms of tumbling and sliding windows,
-windows driven by watermarks, etc.
-
-## Architecture
-
-The following diagram shows the architecture of the DBSP platform. Blocks
-with solid borders indicate existing components. Blocks with dashed borders
-are on our TODO list.
-
-```text
-[ASCII architecture diagram; alignment lost in extraction. It showed frontends (a SQL frontend and language bindings for Python etc. over an optimizer), I/O adapters (Kinesis, Kafka, PostgreSQL, ...), and a distributed scale-out runtime containing the DBSP core engine and persistent indexes.]
-```
-
-The DBSP core engine is written in Rust and provides a Rust API for building
-data-parallel dataflow programs by instantiating and connecting streaming
-operators. Developers can use this API directly to implement complex
-streaming queries. We are also developing a
-[compiler from SQL to DBSP](sql-to-dbsp-compiler) that
-enables engineers and data scientists to use the engine via a familiar
-query language. In the future, we will add DBSP bindings for languages
-like Python and Scala.
-
-At runtime, DBSP can consume inputs from and send outputs to
-event streams, e.g., Kafka, databases, e.g., Postgres, and data warehouses,
-e.g., Snowflake.
-
-The distributed runtime will extend DBSP's data-parallel execution model to
-multiple nodes for high availability and throughput.
-
-## Getting started
-
-DBSP is implemented in Rust and uses Rust's `cargo` build system. You
-can build everything with `cargo build` at the top level of this tree.
-If you want to do development without installing the Rust toolchain
-locally, you can use Github Codespaces: from
-https://github.com/feldera/dbsp, click on the green `<> Code` button,
-then select Codespaces and click on "Create codespace on main".
-
-To learn about using DBSP as a Rust programmer, start by reading the
-[tutorial]. Another good place to start is the
-[`circuit_builder`](`circuit::circuit_builder`) module documentation,
-or the examples folder. For more sophisticated examples, try looking
-at the `nexmark` benchmark in the `benches` directory.
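The efficiency claim in the Theory hunk above (cost proportional to the input delta, not the database state) can be illustrated with a toy sketch. This is not DBSP's actual API, just the idea of incremental view maintenance applied to a `COUNT(*)` aggregate:

```shell
# Toy illustration (not DBSP's API): an incrementally maintained COUNT(*) view.
# Each delta is absorbed by updating the stored aggregate, so the work done is
# proportional to the size of the change, not to the size of the table.
count=0
apply_delta() {          # $1 = signed number of rows inserted (+) or deleted (-)
  count=$((count + $1))
}
apply_delta 3            # three rows inserted
apply_delta 2            # two more rows inserted
apply_delta -1           # one row deleted
echo "COUNT(*) view = $count"    # prints "COUNT(*) view = 4"
```

A non-incremental engine would instead rescan every row on each change; here the same answer is maintained with constant work per delta.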

architecture.svg

Lines changed: 1 addition & 0 deletions