Crunchy Bridge for Analytics allows users to query Parquet files on S3 directly.
- Set up the cluster:
  - Go to https://crunchybridge.com/
  - Click "Create Cluster" -> "Create Analytics Cluster"
  - Choose the region (eu-central-1) where s3://clickhouse-public-datasets/hits_compatible/hits.parquet is located
  - Click "Analytics-256"
  - Click "Create Cluster"
  - Step Two: Set Up Analytics Credentials: Click "Skip for now"
  - Wait until the state of the machine becomes "Running"
- Set up a VM on AWS in the same region as the cluster (eu-central-1).
  This keeps the latency between the server and the client low. The VM needs psql, so install it with, for example, sudo yum install -y postgresql16, or the equivalent package for your Linux distro.
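The install step above can be sketched as a small helper that prints an install command for the package manager found on the VM. Only the yum line comes from the instructions; the apt-get line and the function names psql_install_cmd and detect_pkg_manager are assumptions for illustration, and package names may differ on your distro.

```shell
# Print (but do not run) a psql install command for a given package manager.
# Only the yum line is taken from the instructions above; the apt-get line
# is an assumption and the package name may differ on your distro.
psql_install_cmd() {
  case "$1" in
    yum)     echo "sudo yum install -y postgresql16" ;;
    apt-get) echo "sudo apt-get install -y postgresql-client" ;;
    *)       echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Print the first known package manager found on PATH, or fail if none is.
detect_pkg_manager() {
  local pm
  for pm in yum apt-get; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}
```

For example, `psql_install_cmd "$(detect_pkg_manager)"` prints the command to run. Afterwards, `psql --version` confirms that the client is on the PATH.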
- Get the connection strings:

  3.1) Application connection
  - Click the "Connection" tab from the left menu
  - Pick role: application, Format psql
  - Click "Copy"

  Set APPCONNCMD to the command you copied above:

      export APPCONNCMD='psql postgres://application:XXXX@XXXXX.postgresbridge.com:5432/postgres'

  3.2) Get the postgres connection string:
  - Click the "Connection" tab from the left menu
  - Pick role: postgres, Format psql
  - Click "Copy"

  Set SUPERUSERCONNCMD to the command you copied above:

      export SUPERUSERCONNCMD='psql postgres://postgres:XXXX@XXXX.postgresbridge.com:5432/postgres'

- Run the script:

      ./run.sh

  For the cold run, the queries access the file on S3 directly. For the warm runs, the file is first downloaded from S3 to a local cache drive, and then the queries run against it. This logic is implemented in the run.sh script.
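Before running the script, it can help to confirm that both connection variables are set as described above. The following is a minimal sketch of such a check, not part of run.sh; the function names check_conn_cmd and preflight are hypothetical, and the only assumption about the values is that they start with "psql ", as in the examples above.

```shell
# Check that a connection command is non-empty and starts with "psql ".
# $1 is the variable name (used only in messages), $2 is its value.
check_conn_cmd() {
  local name="$1" value="$2"
  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  case "$value" in
    "psql "*) return 0 ;;
    *) echo "$name does not start with 'psql '" >&2; return 1 ;;
  esac
}

# Validate both variables from the steps above; succeeds only if both pass.
preflight() {
  check_conn_cmd APPCONNCMD "${APPCONNCMD:-}" &&
    check_conn_cmd SUPERUSERCONNCMD "${SUPERUSERCONNCMD:-}"
}
```

One possible use is `preflight && ./run.sh`. To also test connectivity, `$APPCONNCMD -c 'SELECT 1'` runs a trivial query through the copied psql command.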