Data Management

Large datasets shouldn't be uploaded with every job. C3 lets you upload a dataset once and then reference it from any job. Because the data is already in the cloud, it mounts without a per-job upload.

Upload a dataset

c3 data cp ./local-data/ /datasets/my-dataset/

This uploads your data using content-addressed deduplication: files are identified by their content, so re-uploading unchanged files is effectively instant.
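To make the idea of content-addressed deduplication concrete, here is a minimal sketch in Python. It is not C3's implementation: the SHA-256 choice, the `content_key` and `plan_upload` helpers, and the in-memory `stored_keys` set (a stand-in for whatever the remote store tracks) are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def content_key(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_upload(local_dir: Path, stored_keys: set) -> tuple:
    """Split files under local_dir into (to_send, skipped) by content key.

    stored_keys is a hypothetical stand-in for the keys the remote store
    already holds. It is updated in place as files are marked for sending.
    """
    to_send, skipped = [], []
    for path in sorted(p for p in local_dir.rglob("*") if p.is_file()):
        key = content_key(path)
        if key in stored_keys:
            # Identical content is already stored, so nothing is transferred.
            skipped.append(path)
        else:
            stored_keys.add(key)
            to_send.append(path)
    return to_send, skipped
```

Running `plan_upload` twice over an unchanged directory puts every file in `skipped` on the second pass, which is why a repeat upload costs little more than hashing the files.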

Use it in a job

Reference the dataset in your submission script:

#C3 DATASET /datasets/my-dataset ./data

When your job runs, the dataset at /datasets/my-dataset is mounted at ./data, with no download wait.
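A job may want to fail fast if the mount is missing or empty before starting real work. The sketch below assumes the job runs Python and that the mounted dataset behaves like an ordinary directory; `summarize_mount` is a hypothetical helper, not part of C3.

```python
from pathlib import Path


def summarize_mount(mount_point: str = "./data") -> tuple:
    """Return (file_count, total_bytes) for everything under mount_point."""
    root = Path(mount_point)
    if not root.is_dir():
        # The directive's mount path did not materialize; stop early.
        raise FileNotFoundError(f"dataset mount not found: {root}")
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)
```

Calling `summarize_mount()` at the top of a job and logging the result gives a quick record of what the job actually saw at ./data.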

Commands

Command               Description
c3 data ls /path/     List files or datasets
c3 data cp SRC DST    Copy files (upload or download)
c3 data rm /path/     Delete files
c3 data du /path/     Show disk usage
c3 data log /path/    Show version history

Versioning

Every upload creates a new version, so each job can run against exactly the data it expects. Use c3 data log to view a dataset's version history:

c3 data log /datasets/my-dataset/
VERSION   CREATED              FILES   SIZE
v3        2024-01-15 10:00:00  1000    2.5GB
v2        2024-01-10 09:00:00  1000    2.4GB
v1        2024-01-05 08:00:00  500     1.2GB

Jobs reference the latest version by default, or you can pin to a specific version for reproducibility.
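For reproducibility it can help to record which version was current when a job was submitted. The sketch below parses text shaped like the sample c3 data log output above and picks the newest entry. It assumes the output format matches that sample exactly; `DatasetVersion`, `parse_log`, and `latest_version` are hypothetical names, and the syntax for pinning a version is not covered here.

```python
from datetime import datetime
from typing import NamedTuple


class DatasetVersion(NamedTuple):
    version: str
    created: datetime
    files: int
    size: str


def parse_log(text: str) -> list:
    """Parse log text with rows of: version, date, time, file count, size."""
    versions = []
    for line in text.splitlines():
        parts = line.split()
        # The header has four fields and data rows have five, so this
        # skips the header and any blank or unexpected lines.
        if len(parts) != 5:
            continue
        version, date, time, files, size = parts
        created = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        versions.append(DatasetVersion(version, created, int(files), size))
    return versions


def latest_version(versions: list) -> DatasetVersion:
    """Return the entry with the most recent creation time."""
    return max(versions, key=lambda v: v.created)
```

Storing the returned version label alongside a job's results makes it possible to tell later which upload the job ran against.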