Data Management

Large datasets shouldn't be uploaded with every job. C3 lets you upload a dataset once and then reference it from any job. Because the data is already in the cloud, it mounts instantly.

Upload a dataset

c3 data cp ./local-data/ /datasets/my-dataset/

This uploads your data with content-addressed deduplication: files are identified by their content, so re-uploading unchanged files is instant.
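To illustrate why re-uploading unchanged files costs nothing, here is a minimal local sketch of the general idea behind content-addressed storage. It is not C3's implementation, which this page does not describe. The function name `store_file` and the store directory are hypothetical, and the sketch assumes `sha256sum` (or `shasum` as a fallback) is installed.

```shell
# Illustrative only: a local stand-in for a content-addressed store.
# store_file SRC STORE copies SRC into STORE under its SHA-256 digest
# and skips the copy when an object with that digest already exists.
store_file() {
  src="$1"
  store="$2"
  if command -v sha256sum >/dev/null 2>&1; then
    digest="$(sha256sum "$src" | awk '{print $1}')"
  else
    digest="$(shasum -a 256 "$src" | awk '{print $1}')"
  fi
  mkdir -p "$store"
  if [ -e "$store/$digest" ]; then
    # Same content is already stored, so nothing needs to be transferred.
    echo "skipped $src"
  else
    cp "$src" "$store/$digest"
    echo "stored $src"
  fi
}
```

Two files with identical bytes map to the same digest, so the second one is skipped regardless of its name.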

Use it in a job

Reference the dataset in your submission script:

#C3 DATASET /datasets/my-dataset ./data

When your job runs, the dataset appears at ./data with no download wait.
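A submission script might guard against a missing mount before starting real work. The sketch below assumes the script is a shell script and that the `#C3 DATASET` directive sits near the top, as in the line above. The helper `check_dataset` is a hypothetical name introduced here, not part of C3.

```shell
#!/bin/sh
#C3 DATASET /datasets/my-dataset ./data

# check_dataset DIR: fail if DIR is missing, otherwise print the number
# of regular files visible under it. Hypothetical helper, not a C3 command.
check_dataset() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "dataset directory $dir not found" >&2
    return 1
  fi
  find "$dir" -type f | wc -l | tr -d ' '
}
```

Calling `check_dataset ./data || exit 1` before the main workload makes a misconfigured path fail fast instead of surfacing later as a confusing read error.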

Commands

Command	Description
`c3 data ls /path/`	List files or datasets
`c3 data cp SRC DST`	Copy files (upload or download)
`c3 data rm /path/`	Delete files
`c3 data du /path/`	Show disk usage
`c3 data log /path/`	Show version history
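These commands combine naturally in small wrappers. As a sketch, the hypothetical function `upload_and_report` below uploads a directory and then lists and sizes the destination. It uses only the subcommands from the table, and the exact output of each command is not assumed.

```shell
# upload_and_report SRC DST: upload SRC to DST, then list DST and show
# its disk usage. Stops early if the upload fails.
upload_and_report() {
  src="$1"
  dst="$2"
  c3 data cp "$src" "$dst" || return 1
  c3 data ls "$dst"
  c3 data du "$dst"
}
```

For example, `upload_and_report ./local-data/ /datasets/my-dataset/` runs the upload from the first section and then confirms what landed.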

Versioning

Every upload creates a new version, so a job can always get exactly the data it expects. To see a dataset's version history:

c3 data log /datasets/my-dataset/

VERSION   CREATED              FILES   SIZE
v3        2024-01-15 10:00:00  1000    2.5GB
v2        2024-01-10 09:00:00  1000    2.4GB
v1        2024-01-05 08:00:00  500     1.2GB

Jobs reference the latest version by default, or you can pin to a specific version for reproducibility.
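For reproducibility it can help to record which version was current when a job was submitted. The sketch below extracts the newest version label from `c3 data log` output. It assumes the output matches the sample above: a header row starting with `VERSION`, followed by rows listed newest first. The function name `latest_version` is hypothetical.

```shell
# latest_version: read `c3 data log` output on stdin and print the
# version label from the first row after the VERSION header.
latest_version() {
  awk '$1 == "VERSION" { seen = 1; next }
       seen && NF     { print $1; exit }'
}
```

With that in place, `c3 data log /datasets/my-dataset/ | latest_version` can be saved alongside the job's results.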
