## Development Guide

This guide provides comprehensive information for developers who want to contribute to the OpenSearch SQL CLI project.

- [Development Environment Setup](#development-environment-setup)
- [Code Architecture Details](#code-architecture-details)
- [Run CLI](#run-cli)
- [Testing](#testing)
- [Style](#style)
- [Release Guide](#release-guide)

## Development Environment Setup

For complete setup instructions including prerequisites and installation steps, see the main [README.md](README.md#prerequisites) file.

Once you have the environment set up, you can proceed with the development-specific information below.

## Code Architecture Details

#### Layered Architecture

The OpenSearch SQL CLI uses a layered architecture to bridge Python's interactive capabilities with Java's robust OpenSearch client libraries:

```
Python CLI Layer → Py4J Bridge → Java Gateway → OpenSearch Client → OpenSearch Cluster
```

1. **Python CLI Layer**
   - Handles user interaction, command parsing, and display formatting
   - Manages configuration, history, and saved queries
   - Key components: `main.py`, `interactive_shell.py`, `execute_query.py`

2. **Py4J Bridge**
   - Enables Python code to access Java objects
   - Manages communication between Python and Java processes
   - Key components: `sql_connection.py`, `sql_library_manager.py`

3. **Java Gateway**
   - Provides the entry point for Python to access Java functionality
   - Initializes connections to OpenSearch
   - Key components: `Gateway.java`, `GatewayModule.java`

4. **OpenSearch Client**
   - Handles communication with the OpenSearch cluster
   - Executes queries and processes results
   - Key components: `Client4.java`/`Client5.java`, `QueryExecution.java`

#### Key Classes and Their Responsibilities

##### Python Components

- **OpenSearchSqlCli** (`main.py`): Entry point for the CLI, processes command-line arguments
- **InteractiveShell** (`interactive_shell.py`): Manages the interactive shell, command history, and user input
- **ExecuteQuery** (`execute_query.py`): Handles query execution and result formatting
- **SqlConnection** (`sql_connection.py`): Manages the connection to the Java gateway
- **SqlLibraryManager** (`sql_library_manager.py`): Manages the Java process lifecycle
- **SqlVersion** (`sql_version.py`): Handles version detection and JAR file selection

##### Java Components

- **Gateway** (`Gateway.java`): Main entry point for Java functionality, exposed to Python via Py4J
- **GatewayModule** (`GatewayModule.java`): Guice module for dependency injection
- **QueryExecution** (`QueryExecution.java`): Executes queries against OpenSearch
- **Client4/Client5** (`Client4.java`/`Client5.java`): HTTP client implementations for different OpenSearch versions

#### Version-Specific JAR Building

The CLI supports multiple OpenSearch versions by dynamically building version-specific JARs. The build system automatically selects the appropriate HTTP client and its unified query packages based on the provided OpenSearch SQL version:

- For OpenSearch 3.x and above: Uses the HTTP5 client
- For OpenSearch below 3.x: Uses the HTTP4 client

##### 1. Version Detection and Build Triggering

The CLI supports three methods for specifying the SQL version:

1. **Maven Repository Version** (e.g., `opensearchsql -v 3.1`):
   - `sql_version.py` parses the version and normalizes it to a full version (e.g., `3.1.0.0`)
   - The system checks whether a corresponding JAR file exists (e.g., `opensearchsql-3.1.0.0.jar`)
   - If the JAR is not found, it automatically triggers the Gradle build process:
   ```bash
   # For OpenSearch SQL 3.1.0.0
   ./gradlew 3_1_0_0
   ```

2. **Local Directory** (e.g., `opensearchsql --local /path/to/sql/plugin/directory`):
   - Uses a local directory containing the SQL plugin JAR files
   - Extracts the version from the JAR filename
   - Builds a local version-specific JAR:
   ```bash
   # For local OpenSearch SQL 3.1.0.0
   ./gradlew 3_1_0_0_local -PlocalJarDir=/path/to/sql/plugin/directory
   ```

3. **Remote Git Repository** (e.g., `opensearchsql --remote https://github.com/opensearch-project/sql.git -b `):
   - Clones the specified git repository and branch using:
   ```bash
   git clone --branch --single-branch
   ```
   - Extracts the version from the cloned repository's JAR files
   - Builds a local version-specific JAR using the cloned repository:
   ```bash
   # For local OpenSearch SQL 3.1.0.0
   ./gradlew 3_1_0_0_local -PlocalJarDir=/project_root/remote/git_directory
   ```

##### 2. Dynamic Gradle Task Creation

The build system uses Gradle's task rules to dynamically create tasks based on version numbers:

- When `./gradlew 3_1_0_0` is executed, it creates configurations specific to version 3.1.0.0
- The `createVersionConfigurations` function sets up dependencies and configurations
- This allows supporting any OpenSearch version without hardcoding version-specific tasks

##### 3. HTTP Client Selection

Based on the version, the system automatically selects the appropriate HTTP client:

- HTTP5 for OpenSearch 3.x and above
- HTTP4 for OpenSearch below 3.x

This selection affects:

- Which source sets are compiled (`http4` or `http5`)
- Which dependencies are included
- How the client connects to OpenSearch

##### 4. Dependencies Configuration

The system adds shared dependencies and version-specific dependencies:

```bash
# Shared dependencies plus the dependencies specific to the selected version:
org.opensearch.query:unified-query-common:3.1.0.0-SNAPSHOT
org.opensearch.query:unified-query-core:3.1.0.0-SNAPSHOT
org.opensearch.query:unified-query-opensearch:3.1.0.0-SNAPSHOT
org.opensearch.query:unified-query-ppl:3.1.0.0-SNAPSHOT
org.opensearch.query:unified-query-sql:3.1.0.0-SNAPSHOT
org.opensearch.query:unified-query-protocol:3.1.0.0-SNAPSHOT
org.opensearch.query:unified-query-datasources:3.1.0.0-SNAPSHOT
org.opensearch.client:opensearch-rest-high-level-client:3.1.0
org.opensearch.client:opensearch-rest-client:3.1.0
org.opensearch.client:opensearch-java:3.1.0
org.apache.httpcomponents.core5:httpcore5:5.2
org.apache.httpcomponents.client5:httpclient5:5.2.1
```

##### 5. Shadow JAR Creation

The `createShadowJarTask` function creates a task to build a fat JAR with all dependencies:

- The resulting JAR is named with the specific version (e.g., `opensearchsql-3.1.0.0.jar`)
- This JAR includes all necessary dependencies for the specified version
- The JAR is then loaded by the Python process when the CLI runs

Similarly, the `createLocalShadowJarTask` function creates a task for building a fat JAR using local JAR files:

- It accepts a local directory path containing the SQL plugin JAR files
- It includes the JAR files from the specified local directory
- The resulting JAR is named with the specific version, and the Gradle task name includes "_local" (e.g., `3_1_0_0_local`)

This architecture allows the CLI to support multiple OpenSearch versions without requiring separate installations or complex configuration.

## Run CLI

- Start an OpenSearch instance: a local installation, Docker with the OpenSearch SQL plugin, or AWS OpenSearch.
- To launch the CLI, run `opensearchsql` followed by the endpoint of your running OpenSearch instance. If no endpoint is specified, the CLI uses http://localhost:9200 by default. If the endpoint has no port number, HTTP endpoints default to port 9200 and HTTPS endpoints default to port 443.

### CLI Flow

The OpenSearch SQL CLI follows this execution flow when processing queries:

1. **Command Invocation**: The `opensearchsql` command runs with its default settings

2. **Initialization Process**:
   - Version detection determines whether to use the HTTP4 (OpenSearch < 3.x) or HTTP5 (OpenSearch ≥ 3.x) client
   - The connection to the OpenSearch cluster is verified
   - The Java Gateway server is started via `sql_library_manager`
   - The appropriate JAR file is loaded based on the OpenSearch version

3. **Query Processing Flow**:
   ```
   Input Query → Python → Java → OpenSearch → Java → Python → Output
   ```

   Detailed steps:
   1. The user enters a query in the interactive shell
   2. `InteractiveShell.execute_query()` processes the input
   3. `ExecuteQuery.execute_query()` prepares the query
   4. `sql_connection.query_executor()` sends the query to the Java gateway
   5. `Gateway.queryExecution()` in Java receives the query
   6. `QueryExecution.execute()` processes the query:
      - Determines whether it is PPL or SQL
      - Sends it to the appropriate service (pplService or sqlService)
      - Formats the results based on the requested format (JSON, Table, CSV)
   7. Results are returned to Python and displayed to the user

4. **Component Interaction**:
   - Python components use Py4J to communicate with Java
   - Java components use OpenSearch client libraries to communicate with OpenSearch
   - The HTTP4 or HTTP5 client is used based on the OpenSearch version

## Testing

### Automated Testing with Gradle

The project includes an automated build and test system that handles setting up a test cluster, loading test data, and running all tests.

#### Running All Tests

```bash
# Run the complete test suite (Java + Python)
./gradlew build
```

This single command will:

1. Build the Java application
2. Run Java unit tests
3. Clone/pull the OpenSearch SQL repository to `./remote/sql`
4. Start an OpenSearch test cluster in the background
5. Wait for the cluster to be ready
6. Load test data from `test_data/accounts.json`
7. Run all pytest tests
8. Automatically stop the test cluster when done

#### Available Gradle Tasks

The build system provides several tasks for managing the test cluster:

| Task | Description |
|------|-------------|
| `./gradlew cloneOrPullSqlRepo` | Clone or update the SQL repository in `./remote/sql` |
| `./gradlew startTestCluster` | Start the OpenSearch cluster in the background |
| `./gradlew waitForCluster` | Wait for the cluster to be ready (polls `localhost:9200`) |
| `./gradlew loadTestData` | Load `test_data/accounts.json` into the `accounts` index |
| `./gradlew pytest` | Run all Python tests (automatically sets up the cluster first) |
| `./gradlew stopTestCluster` | Stop the running test cluster |

#### Test Cluster Management

The build manages the test cluster lifecycle automatically:

- **External Cluster Detection**: Automatically detects if a cluster is already running at `localhost:9200`
- **Cluster Reuse**: If an external cluster exists, it is reused (and not stopped after tests)
- **PID Tracking**: The cluster process ID is stored in `build/cluster.pid` for clusters the build starts
- **External Marker**: Creates the `build/cluster.external` flag when using an external cluster
- **Log Output**: Cluster logs are written to `build/cluster.log`
- **Automatic Cleanup**: Only stops clusters that the build started (external clusters are left running)
- **Idempotent Data Loading**: Checks whether test data already exists before reloading
- **Fast Reruns**: Subsequent test runs are faster because they skip cluster startup and data loading if both are already present

#### Test Workflow Scenarios

The build system handles the following development scenarios:

**Scenario 1: No cluster running**

```bash
./gradlew pytest
```

- Clones/pulls the SQL repository
- Starts a new cluster
- Loads test data
- Runs tests
- Stops the cluster

**Scenario 2: External cluster already running**

```bash
# You have a cluster running at localhost:9200
./gradlew pytest
```

- Detects the existing cluster
- Marks it as external
- Loads test data (if not already present)
- Runs tests
- Leaves your cluster running

**Scenario 3: Running tests multiple times**

```bash
./gradlew pytest  # First run: starts cluster, loads data
./gradlew pytest  # Second run: reuses cluster and data (much faster!)
```

**Scenario 4: Manual cluster control**

```bash
./gradlew startTestCluster  # Start cluster manually
# Do development work, run tests multiple times
./gradlew pytest            # Tests run against existing cluster
./gradlew stopTestCluster   # Stop when done
```

#### Running Python Tests Only

If you want to run pytest directly without Gradle:

```bash
# Manual approach (without automated cluster setup)
cd src/main/python/opensearchsql_cli/tests
pytest
```

The manual approach has these prerequisites:

- Build the application
- Start a local OpenSearch instance manually
- Install test frameworks: `pip install -r requirements-dev.txt`

#### Test Data

Test data is located in `test_data/accounts.json` and contains sample account records in bulk format. The automated test system loads this data into the `accounts` index before running tests.

#### Remote SQL Repository

The build system uses `./remote/sql` to store a local copy of the OpenSearch SQL repository. This directory:

- Is automatically created on first build
- Is pulled/updated on subsequent builds
- Is ignored by Git (configured in `.gitignore`)
- Is used instead of `../sql` for better isolation

To use a local SQL build during development, you can still use the `-PuseLocalSql=true` flag, which will reference `./remote/sql` instead of `../sql`.

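The clone-on-first-build, pull-on-later-builds behavior described for `./remote/sql` can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of the Gradle task `cloneOrPullSqlRepo`: the helper name `clone_or_pull_command`, the check for a `.git` directory, the `--ff-only` flag, and the `main` branch in the usage example are all hypothetical choices made for the sketch.

```python
from pathlib import Path
from typing import List


def clone_or_pull_command(repo_dir: Path, url: str, branch: str) -> List[str]:
    """Return the git command that would bring repo_dir up to date.

    Hypothetical helper: the real build performs this step in Gradle,
    and its exact git invocation may differ.
    """
    if (repo_dir / ".git").is_dir():
        # Existing checkout: update it in place (assumed: fast-forward only).
        return ["git", "-C", str(repo_dir), "pull", "--ff-only"]
    # No checkout yet: clone only the requested branch into repo_dir.
    return ["git", "clone", "--branch", branch, "--single-branch", url, str(repo_dir)]


if __name__ == "__main__":
    # The branch name "main" is an assumption used only for this example.
    command = clone_or_pull_command(
        Path("remote/sql"),
        "https://github.com/opensearch-project/sql.git",
        "main",
    )
    # Pass the list to subprocess.run(command, check=True) to execute it.
    print(" ".join(command))
```

Returning the command as a list rather than running it keeps the decision logic easy to inspect and test without network access.
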
#### Troubleshooting Tests

If tests fail or the cluster doesn't start:

```bash
# Check cluster logs
cat build/cluster.log

# Manually stop any stuck cluster processes
./gradlew stopTestCluster

# Clean and rebuild
./gradlew clean build
```

For more details on the test structure and writing new tests, refer to [tests/README.md](src/main/python/opensearchsql_cli/tests/README.md).

## Style

- Use [black](https://github.com/psf/black) to format Python code and Spotless (`./gradlew spotlessApply`) to format Java code.

```bash
# Format all Python files
black .

# Format all Java files
./gradlew spotlessApply
```

## Release Guide

- Package Manager: pip
- Repository of software for Python: PyPI

### Workflow

1. Update version number
   1. Modify the version number in [`__init__.py`](src/main/python/opensearchsql_cli/__init__.py). It will be used by `setup.py` for release.
2. Create/Update `setup.py` (if needed)
   1. For more details refer to https://packaging.python.org/tutorials/packaging-projects/#creating-setup-py
3. Update README.md, legal, and copyright files (if needed)
   1. Update README.md when a critical feature is added.
   2. Update `THIRD-PARTY` files if a new dependency is added.
4. Generate distribution archives
   1. Make sure you have the latest versions of `setuptools` and `wheel` installed: `python3 -m pip install --user --upgrade setuptools wheel`
   2. Run this command from the same directory where `setup.py` is located: `python3 setup.py sdist bdist_wheel`
   3. Check the artifacts under `sql-cli/dist/`. There should be a `.tar.gz` file and a `.whl` file with the correct version. Remove any other outdated artifacts.
5. Upload the distribution archives to TestPyPI
   1. Register an account on [TestPyPI](https://test.pypi.org/)
   2. `python3 -m pip install --user --upgrade twine`
   3. `python3 -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*`
6. Install your package from TestPyPI and test it manually
   1. `pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple opensearchsql`
7. Upload to PyPI
   1. Register an account on [PyPI](https://pypi.org/). Note that TestPyPI and PyPI are two separate servers, and the credentials from the test server are not shared with the main server.
   2. Use `twine upload dist/*` to upload your package and enter your credentials for the account you registered on PyPI. You don't need to specify `--repository`; the package will upload to https://pypi.org/ by default.
8. Install your package from PyPI using `pip install [your-package-name]`

### Reference

- https://medium.com/@joel.barmettler/how-to-upload-your-python-package-to-pypi-65edc5fe9c56
- https://packaging.python.org/tutorials/packaging-projects/
- https://packaging.python.org/guides/using-testpypi/