
aws-samples/aurora-dsql-loader

Aurora DSQL Data Loader

Fast, parallel data loader for Aurora DSQL. Load CSV, TSV, and Parquet files into DSQL with automatic schema detection and progress tracking.

Migrating from Python v1? See CHANGELOG.md.

Quick Start

1. Install

Download a pre-built binary from the latest releases page.

Or build from source:

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone and install
git clone https://github.com/aws-samples/aurora-dsql-loader.git
cd aurora-dsql-loader
cargo install --path .

2. Configure AWS credentials

aws configure
# or
aws sso login

3. Load data

aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri data.csv \
  --table my_table

That's it! The loader will:

  • Auto-detect the file format (CSV, TSV, or Parquet)
  • Infer the schema from your data
  • Load data in parallel with progress tracking
  • Handle retries and errors automatically

Common Examples

Load from S3:

aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri s3://my-bucket/data.parquet \
  --table analytics_data

Create the table automatically if it does not already exist:

aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri data.csv \
  --table new_table \
  --if-not-exists
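
To load several files in one go, the command above can be wrapped in a loop. The following is a minimal sketch, not part of the loader: the function name `load_csv_dir`, the `LOADER` override variable, and the rule that derives a table name from the file name are all hypothetical. It uses only the flags shown in this README.

```shell
# LOADER can be overridden (for example in tests); defaults to the installed binary.
LOADER="${LOADER:-aurora-dsql-loader}"

# Load every *.csv file in a directory into a table named after the file.
# Hypothetical naming rule: sales-2024.csv -> sales_2024.
load_csv_dir() {
  local endpoint="$1" dir="$2" file table failed=0
  for file in "$dir"/*.csv; do
    [ -e "$file" ] || continue   # the glob matched nothing
    # Replace any character that is not a letter, digit, or underscore.
    table="$(basename "$file" .csv | tr -c 'A-Za-z0-9_\n' '_')"
    if ! "$LOADER" load \
      --endpoint "$endpoint" \
      --source-uri "$file" \
      --table "$table" \
      --if-not-exists; then
      echo "load failed for $file" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Example:
# load_csv_dir your-cluster.dsql.us-east-1.on.aws ./exports
```

The loop keeps going after a failure and reports a non-zero status at the end, so one bad file does not block the rest.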

Validate without loading:

aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri data.csv \
  --table my_table \
  --dry-run
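
A dry run can gate the real load in a script. This sketch assumes the loader exits with a non-zero status when `--dry-run` finds a problem, which this README does not state; check the behavior of your version before relying on it. The function name `validate_then_load` and the `LOADER` variable are hypothetical.

```shell
# LOADER can be overridden (for example in tests); defaults to the installed binary.
LOADER="${LOADER:-aurora-dsql-loader}"

# Run a dry run first and start the real load only if it succeeds.
# Assumption: a failed validation produces a non-zero exit status.
validate_then_load() {
  local endpoint="$1" source_uri="$2" table="$3"
  if "$LOADER" load \
    --endpoint "$endpoint" \
    --source-uri "$source_uri" \
    --table "$table" \
    --dry-run; then
    "$LOADER" load \
      --endpoint "$endpoint" \
      --source-uri "$source_uri" \
      --table "$table"
  else
    echo "dry run failed; skipping load of $source_uri" >&2
    return 1
  fi
}

# Example:
# validate_then_load your-cluster.dsql.us-east-1.on.aws data.csv my_table
```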

Key Features

  • Fast: Parallel loading with configurable workers
  • Smart: Auto-detects file format and region
  • Reliable: Automatic retries and fault-tolerant loading
  • Flexible: Works with local files or S3 URIs
  • Formats: CSV, TSV, and Parquet support

Resuming Failed Loads

If a load fails mid-execution, resume from where it left off using --resume-job-id:

# Initial load (note the Job ID in output)
aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri large-file.csv \
  --table my_table \
  --manifest-dir ./my-load-manifest

# Resume after failure
aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri large-file.csv \
  --table my_table \
  --manifest-dir ./my-load-manifest \
  --resume-job-id abc-123-def-456

Resuming retries failed chunks and skips completed ones. To guard against duplicate rows on retry, define unique constraints on your table; the loader uses ON CONFLICT DO NOTHING to skip duplicates.
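
For unattended runs, the resume command can be retried a few times. This is a sketch under assumptions: it takes the Job ID as an argument (copied from the initial run's output), reuses the manifest directory from the initial run, and assumes a failed load exits with a non-zero status. The function name `resume_with_retries` and the variables `LOADER` and `RETRY_DELAY_SECONDS` are hypothetical.

```shell
# LOADER can be overridden (for example in tests); defaults to the installed binary.
LOADER="${LOADER:-aurora-dsql-loader}"

# Resume a job until it succeeds or the attempts run out.
# Arguments: endpoint, source URI, table, manifest dir, job ID, max attempts (default 3).
resume_with_retries() {
  local endpoint="$1" source_uri="$2" table="$3" manifest_dir="$4" job_id="$5"
  local max_attempts="${6:-3}" attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$LOADER" load \
      --endpoint "$endpoint" \
      --source-uri "$source_uri" \
      --table "$table" \
      --manifest-dir "$manifest_dir" \
      --resume-job-id "$job_id"; then
      echo "job $job_id completed on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    # Pause between attempts, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "${RETRY_DELAY_SECONDS:-5}"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example:
# resume_with_retries your-cluster.dsql.us-east-1.on.aws large-file.csv \
#   my_table ./my-load-manifest abc-123-def-456
```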

Requirements

  • Rust: 1.85 or later (for building)
  • AWS: An Aurora DSQL cluster and IAM permission for dsql:DbConnectAdmin or dsql:DbConnect
  • Credentials: Configured via AWS CLI, SSO, or IAM role
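
Before building from source, the Rust requirement can be checked with a short script. This is an illustrative sketch: the helper names `version_ge` and `check_rust` are hypothetical, and the comparison relies on `sort -V`, which is available in GNU coreutils and recent macOS releases.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check that rustc is installed and meets the 1.85 minimum.
check_rust() {
  local required="1.85.0" found
  if ! command -v rustc >/dev/null 2>&1; then
    echo "rustc not found; install Rust with rustup" >&2
    return 1
  fi
  # `rustc --version` prints e.g. "rustc 1.85.0 (...)"; field 2 is the version.
  found="$(rustc --version | awk '{print $2}')"
  if version_ge "$found" "$required"; then
    echo "rustc $found is new enough"
  else
    echo "rustc $found is older than $required; run: rustup update stable" >&2
    return 1
  fi
}

# Example:
# check_rust
```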

Options

See all options with:

aurora-dsql-loader load --help

Troubleshooting

Authentication error?

aws sts get-caller-identity  # Verify credentials

Build error?

rustup update stable  # Update Rust

Connection error? See Aurora DSQL Troubleshooting

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

MIT-0 License. See the LICENSE file.
