[rank0]: Traceback (most recent call last):
[rank0]: File "/home/nanotron/run_train.py", line 234, in <module>
[rank0]: dataloader = get_dataloader(trainer)
[rank0]: File "/home/nanotron/run_train.py", line 204, in get_dataloader
[rank0]: get_dataloader_from_data_stage(
[rank0]: File "/home/nanotron/run_train.py", line 152, in get_dataloader_from_data_stage
[rank0]: train_dataset = Nanoset(
[rank0]: File "/home/nanotron/src/nanotron/data/nanoset.py", line 52, in __init__
[rank0]: DatatroveFolderDataset(
[rank0]: File "/usr/local/lib/python3.10/site-packages/datatrove/utils/dataset.py", line 101, in __init__
[rank0]: raise FileNotFoundError(f'No files matching "{filename_pattern}" found in {folder_path}')
[rank0]: FileNotFoundError: No files matching "datasets/fineweb-edu-dedup/*.ds" found in /home/nanotron/datasets/fineweb-edu-dedup
https://github.com/huggingface/smollm/blob/main/pre-training/README.md
https://github.com/huggingface/smollm/blob/main/pre-training/smollm1/config_smollm1_1B.yaml
The smollm1 config uses several datasets.
The datasets on the Hugging Face Hub are in .parquet format, but training fails with the ".ds" error above.
For example, "fineweb-edu-dedup":
https://huggingface.co/datasets/argilla-warehouse/fineweb-edu-dedup-filtered/tree/main/data
docs/nanoset.md: https://github.com/huggingface/nanotron/blob/main/docs/nanoset.md
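
To confirm what the loader sees, here is a minimal diagnostic sketch using only the Python standard library. It does not call nanotron or datatrove; it only assumes, based on the traceback, that `DatatroveFolderDataset` raises when no file in the folder matches `*.ds`. The helper names `count_extensions` and `find_shards` are hypothetical and exist only for this illustration.

```python
from collections import Counter
from pathlib import Path


def count_extensions(folder):
    """Count files under `folder` (searched recursively) by file extension."""
    return Counter(p.suffix for p in Path(folder).rglob("*") if p.is_file())


def find_shards(folder, pattern="*.ds"):
    """Return the sorted paths under `folder` (searched recursively) that match `pattern`."""
    return sorted(Path(folder).rglob(pattern))
```

Calling `count_extensions("/home/nanotron/datasets/fineweb-edu-dedup")` shows which file types are actually in the folder, and `find_shards(...)` returns an empty list when there are no `.ds` files, which is the condition that would trigger the `FileNotFoundError` above.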