Data Management
Large datasets shouldn't be uploaded with every job. C3 lets you upload a dataset once and reference it from any job. Because the data is already in the cloud, it mounts instantly.
Upload a dataset
c3 data cp ./local-data/ /datasets/my-dataset/
This uploads your data with content-addressed deduplication: files are identified by their content, so re-uploading unchanged files is instant.
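To make the idea of content-addressed deduplication concrete, here is a minimal Python sketch. It is an illustration of the general technique, not C3's actual implementation: the `ContentStore` class, its in-memory dictionaries, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical, not C3's real storage layer).

    Blobs are keyed by the SHA-256 digest of their bytes, so identical
    content is stored once no matter how many paths point at it.
    """

    def __init__(self):
        self.blobs = {}     # digest -> bytes
        self.manifest = {}  # file path -> digest

    def upload(self, path, data):
        """Record `path`; return True only if new bytes had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        self.manifest[path] = digest
        if digest in self.blobs:
            return False  # identical content already stored, nothing to transfer
        self.blobs[digest] = data
        return True


store = ContentStore()
first = store.upload("train.csv", b"a,b\n1,2\n")             # new content
again = store.upload("train.csv", b"a,b\n1,2\n")             # unchanged re-upload
duplicate = store.upload("backup/train.csv", b"a,b\n1,2\n")  # same bytes, new path
changed = store.upload("train.csv", b"a,b\n1,3\n")           # modified content
print(first, again, duplicate, changed)  # True False False True
```

Only the first and the modified uploads store new bytes. The unchanged re-upload and the duplicate under a different path are recognized by their digest, which is why repeating an upload of unchanged files can be cheap.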
Use it in a job
Reference the dataset in your submission script:
#C3 DATASET /datasets/my-dataset ./data
When your job runs, the dataset appears at ./data with no download wait.
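The directive above takes two arguments: the dataset path in C3 storage and the mount point inside the job. The Python sketch below shows one way such a line could be read out of a submission script. It is only an illustration of the directive's shape. How C3 actually parses scripts is not documented here, and the `parse_dataset_directives` helper, the bash shebang, and the `train.py` command are hypothetical.

```python
from typing import NamedTuple


class DatasetMount(NamedTuple):
    source: str       # dataset path in C3 storage
    mount_point: str  # where the dataset appears inside the job


def parse_dataset_directives(script_text):
    """Collect `#C3 DATASET <source> <mount_point>` lines from a script.

    Hypothetical helper: assumes whitespace-separated arguments and
    ignores every line that does not match the directive exactly.
    """
    mounts = []
    for line in script_text.splitlines():
        tokens = line.split()
        if len(tokens) == 4 and tokens[:2] == ["#C3", "DATASET"]:
            mounts.append(DatasetMount(tokens[2], tokens[3]))
    return mounts


# Example submission script; the shebang and training command are assumptions.
script = """#!/bin/bash
#C3 DATASET /datasets/my-dataset ./data
python train.py --data ./data
"""

mounts = parse_dataset_directives(script)
print(mounts)
```

Because the directive is written as a comment, the shell ignores it when the script runs, and the rest of the script can simply read from `./data`.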
Commands
| Command | Description |
|---|---|
| c3 data ls /path/ | List files or datasets |
| c3 data cp SRC DST | Copy files (upload or download) |
| c3 data rm /path/ | Delete files |
| c3 data du /path/ | Show disk usage |
| c3 data log /path/ | Show version history |
Versioning
Every upload creates a new version, so a job can always be tied to a known state of the data. To see a dataset's version history:
c3 data log /datasets/my-dataset/
VERSION  CREATED              FILES  SIZE
v3       2024-01-15 10:00:00  1000   2.5GB
v2       2024-01-10 09:00:00  1000   2.4GB
v1       2024-01-05 08:00:00  500    1.2GB
Jobs reference the latest version by default, or you can pin to a specific version for reproducibility.
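The difference between "latest by default" and "pinned" can be modeled in a few lines of Python. This is a conceptual sketch only: it parses the sample log shown above and picks a version, and the `parse_log` and `resolve` helpers are hypothetical names. It does not show C3's pinning syntax, which is not given in this section.

```python
from typing import NamedTuple


class Version(NamedTuple):
    name: str
    created: str  # "YYYY-MM-DD HH:MM:SS", which sorts correctly as text
    files: int
    size: str


# Sample output copied from the `c3 data log` example.
LOG_OUTPUT = """\
VERSION CREATED FILES SIZE
v3 2024-01-15 10:00:00 1000 2.5GB
v2 2024-01-10 09:00:00 1000 2.4GB
v1 2024-01-05 08:00:00 500 1.2GB
"""


def parse_log(text):
    """Parse whitespace-separated log rows, skipping the header line."""
    versions = []
    for line in text.splitlines()[1:]:
        name, date, time, files, size = line.split()
        versions.append(Version(name, f"{date} {time}", int(files), size))
    return versions


def resolve(versions, pin=None):
    """Return the pinned version if one is given, else the newest one."""
    if pin is None:
        return max(versions, key=lambda v: v.created)
    for version in versions:
        if version.name == pin:
            return version
    raise KeyError(f"unknown version: {pin}")


versions = parse_log(LOG_OUTPUT)
print(resolve(versions).name)          # v3
print(resolve(versions, "v1").files)   # 500
```

An unpinned job follows whatever the newest version is at run time, so its input can change after a later upload. A pinned job keeps reading the same 500 files from `v1` regardless of later uploads, which is what makes its results reproducible.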