Data Transfer and Batch Tools
The app supports Hugging Face (HF) dataset transfer from the Data menu, plus script and API workflows for batch conversion. Dataset JSON inputs follow the OSL JSON Format.
In-App Data Menu
Download Dataset from HF...
The download dialog asks for:
- repo ID
- branch/revision
- split
- format
- output directory
- optional token
- dry-run mode
It supports JSON split downloads (<split>.json) and Parquet split downloads (<split>/). Files are written under <output directory>/<revision>/<split>.
For successful non-dry-run JSON downloads, the following source metadata keys are written into the JSON root:
- hf_repo_id
- hf_branch
- hf_split
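A minimal sketch of what this provenance looks like in a loaded dataset, assuming the JSON root is an object. The helper names stamp_hf_source and read_hf_source are hypothetical and only mirror the three documented keys; they are not the app's own code.

```python
from __future__ import annotations

from typing import Any

# The three source keys documented for successful non-dry-run JSON downloads.
HF_SOURCE_KEYS = ("hf_repo_id", "hf_branch", "hf_split")


def stamp_hf_source(root: dict[str, Any], repo_id: str, branch: str, split: str) -> dict[str, Any]:
    """Hypothetical helper: return a shallow copy of `root` with the HF source keys added."""
    stamped = dict(root)  # the caller's dict is left unchanged
    stamped.update(zip(HF_SOURCE_KEYS, (repo_id, branch, split)))
    return stamped


def read_hf_source(root: dict[str, Any]) -> dict[str, Any] | None:
    """Hypothetical helper: return the HF source keys of a loaded JSON, or None if any is absent."""
    if not all(key in root for key in HF_SOURCE_KEYS):
        return None  # e.g. a JSON that did not come from a completed HF download
    return {key: root[key] for key in HF_SOURCE_KEYS}
```

To check where a local file came from, pass the parsed file (for example json.load on the downloaded JSON) to read_hf_source.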
Dry-run support
Dry-run size estimation is available only for JSON downloads; Parquet downloads always run as real downloads and conversions.
Upload Dataset to HF...
Uploading requires a dataset JSON that was opened from disk.
Upload modes:
- Upload as JSON: uploads the current dataset JSON plus every file referenced by data[].inputs[].path.
- Parquet + WebDataset: converts locally, then uploads the generated Parquet/WebDataset artifacts.
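Before an Upload as JSON, it can help to preview which files will be sent and whether they exist. This sketch assumes the layout implied by data[].inputs[].path (a top-level data array whose items hold an inputs array of objects with a path key) and assumes relative paths resolve against the JSON file's directory; that resolution rule and the function names are illustrative, not documented app behavior.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def referenced_paths(root: dict[str, Any]) -> list[str]:
    """Collect every data[].inputs[].path value, de-duplicated, in first-seen order."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for item in root.get("data", []):
        for inp in item.get("inputs", []):
            path = inp.get("path")
            if isinstance(path, str) and path:
                seen.setdefault(path, None)
    return list(seen)


def missing_files(root: dict[str, Any], json_path: str | Path) -> list[str]:
    """Return referenced paths that are not files on disk.

    Assumption: relative paths are resolved against the JSON file's directory;
    absolute paths are checked as-is (Path joining keeps the absolute operand).
    """
    base = Path(json_path).resolve().parent
    return [p for p in referenced_paths(root) if not (base / p).is_file()]
```

An empty result from missing_files suggests every referenced file is in place under that assumption.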
If the target repository or branch does not exist, the app can offer to create it and then retry the upload.
CLI Scripts
Run commands from the repository root.
Download Referenced Files
python tools/download_osl_hf.py \
--repo-id <org/repo> \
--revision main \
--split test \
--format json \
--output-dir <LOCAL_DIR> \
--dry-run
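For batch work, the download script can be driven from Python, one split at a time. This sketch uses only the flags shown in the command above; the helper names and the idea of looping over splits are illustrative assumptions. Like the shell command, it must be run from the repository root because the script path is relative.

```python
from __future__ import annotations

import subprocess
import sys


def build_download_cmd(repo_id: str, split: str, output_dir: str,
                       revision: str = "main", fmt: str = "json",
                       dry_run: bool = False) -> list[str]:
    """Build an argv list for tools/download_osl_hf.py from the documented flags."""
    cmd = [
        sys.executable, "tools/download_osl_hf.py",
        "--repo-id", repo_id,
        "--revision", revision,
        "--split", split,
        "--format", fmt,
        "--output-dir", output_dir,
    ]
    if dry_run:
        cmd.append("--dry-run")  # size estimation only applies to JSON downloads
    return cmd


def download_splits(repo_id: str, splits: list[str], output_dir: str,
                    dry_run: bool = False) -> None:
    """Run the download script once per split; check=True stops at the first failure."""
    for split in splits:
        subprocess.run(build_download_cmd(repo_id, split, output_dir, dry_run=dry_run), check=True)
```

Calling download_splits with dry_run=True first gives the size estimates; repeating the call with dry_run=False performs the downloads.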
Upload Referenced Files
python tools/upload_dataset_to_hf.py \
--repo-id <org/repo> \
--json-path <local_dataset.json> \
--split test \
--revision main \
--format json
Convert JSON to Parquet + WebDataset
python tools/osl_json_to_parquet_webdataset.py \
annotations.json \
/path/to/media/root \
/path/to/output_dataset
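To batch-convert several annotation files, the converter can be invoked once per file with the same three positional arguments (annotation JSON, media root, output directory). Naming each output directory after the JSON file's stem is a hypothetical convention chosen for this sketch, as is the convert_all helper; run=False only builds the commands so they can be reviewed first.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def convert_all(json_files: list[str], media_root: str, out_root: str,
                run: bool = True) -> list[list[str]]:
    """Convert each annotation JSON into <out_root>/<json stem> (hypothetical naming scheme)."""
    commands = []
    for json_file in json_files:
        out_dir = Path(out_root) / Path(json_file).stem
        # Positional arguments, in order: annotation JSON, media root, output directory.
        cmd = [sys.executable, "tools/osl_json_to_parquet_webdataset.py",
               json_file, media_root, str(out_dir)]
        commands.append(cmd)
        if run:
            subprocess.run(cmd, check=True)  # stop at the first failed conversion
    return commands
```

Two files that share a stem would map to the same output directory under this convention, so rename or nest them first.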
Convert Parquet + WebDataset Back to JSON
Python Conversion API
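from opensportslib.tools import convert_json_to_parquet, convert_parquet_to_json
# JSON annotations + media -> Parquet/WebDataset dataset in out_parquet/
convert_json_to_parquet(json_path="annotations.json", media_root=".", output_dir="out_parquet")
# Parquet/WebDataset dataset in out_parquet/ -> reconstructed JSON
convert_parquet_to_json(dataset_dir="out_parquet", output_json_path="reconstructed.json")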
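Running both calls back to back gives a quick round-trip check. Comparing the entry count and the referenced input paths is an assumption about what a useful check looks like, not a guarantee from the library that the reconstructed JSON matches the original; if the conversion rewrites media paths, compare only the counts. The helper names are hypothetical, and the opensportslib import is deferred into round_trip so summarize works without the library installed.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def summarize(root: dict[str, Any]) -> tuple[int, list[str]]:
    """Entry count plus the sorted data[].inputs[].path values of a dataset JSON root."""
    data = root.get("data", [])
    paths = sorted(
        inp["path"] for item in data for inp in item.get("inputs", []) if "path" in inp
    )
    return len(data), paths


def round_trip(json_path: str, media_root: str, work_dir: str) -> bool:
    """JSON -> Parquet/WebDataset -> JSON, then compare summaries of both JSON files."""
    from opensportslib.tools import convert_json_to_parquet, convert_parquet_to_json

    out_dir = Path(work_dir) / "out_parquet"
    rebuilt = Path(work_dir) / "reconstructed.json"
    convert_json_to_parquet(json_path=json_path, media_root=media_root, output_dir=str(out_dir))
    convert_parquet_to_json(dataset_dir=str(out_dir), output_json_path=str(rebuilt))

    original = json.loads(Path(json_path).read_text(encoding="utf-8"))
    restored = json.loads(rebuilt.read_text(encoding="utf-8"))
    return summarize(original) == summarize(restored)
```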
For full script options, run any tool with --help.