adding documentation + catching exception independently of torch version
mfarre committed Jul 29, 2024
1 parent 87ad8b5 commit 79b311f
Showing 3 changed files with 16 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -163,7 +163,7 @@ These arguments give coarse control over input/output "shape" of the dataset. Fo

## Downloading YouTube Metadata

If we want to download a large amount of YouTube videos with video2dataset we can specify some parameters and also extract useful metadata as well. For directions on how to do so please see this [example](https://github.com/iejMac/video2dataset/blob/main/examples/yt_metadata.md).
If we want to download a large number of YouTube videos with video2dataset, we can specify some parameters - including a proxy to distribute requests - and also extract useful metadata. For directions on how to do so, please see this [example](https://github.com/iejMac/video2dataset/blob/main/examples/yt_metadata.md).

## Incremental mode

13 changes: 13 additions & 0 deletions examples/yt_metadata.md
@@ -1,3 +1,16 @@
### Setting up yt-dlp proxy:
#### Usage

yt-dlp allows you to set up a proxy for the requests it sends to YouTube. We surface this feature in our config file via the `proxy` field and the `proxy-check-certificate` flag. If `proxy-check-certificate` is set to False, HTTPS certificate validation is suppressed.

```yaml
yt_args:
download_size: 360
download_audio_rate: 44100
proxy: "url:port"
  proxy-check-certificate: True   # or False to skip HTTPS certificate validation
```
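
The proxy settings travel with the rest of the `yt_args`, so nothing else changes in how a job is launched. As a minimal sketch (assuming a local `config.yaml` containing the block above and a placeholder `videos.csv` with a `url` column), a Python invocation could look like this:

```python
# Minimal sketch: run a download using the proxy-enabled config.
# config.yaml is assumed to contain the yt_args block shown above;
# videos.csv and its "url" column are placeholder inputs.
from video2dataset import video2dataset

video2dataset(
    url_list="videos.csv",
    input_format="csv",
    url_col="url",
    output_folder="dataset",
    config="config.yaml",
)
```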
### Download YouTube metadata & subtitles:
#### Usage
3 changes: 2 additions & 1 deletion video2dataset/dataloader/custom_wds.py
@@ -507,9 +507,10 @@ def __init__(
main_datapipe.apply_sharding(world_size, global_rank)
# synchronize data across processes to prevent hanging if sharding is uneven (which is likely)
main_datapipe = main_datapipe.fullsync()
except ValueError as e:
except (RuntimeError, ValueError) as e:
if str(e) == "Default process group has not been initialized, please make sure to call init_process_group.":
print("torch distributed not used, not applying sharding in dataloader")
pass
else:
raise # re-raise if it's a different error

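For context on the dataloader change above: depending on the torch version, the `fullsync()` call can raise either a `ValueError` or a `RuntimeError` when the default process group has not been initialized, so both are now caught and only that specific message is swallowed. A minimal sketch of the same guard in isolation (assuming torchdata's `IterableWrapper` and its `fullsync` functional hook, with placeholder data):

```python
# Minimal sketch of the guard, assuming torchdata's IterableWrapper and the
# fullsync() functional hook on IterDataPipe (placeholder data).
from torchdata.datapipes.iter import IterableWrapper

main_datapipe = IterableWrapper(range(8))
try:
    # fullsync() needs torch.distributed; without init_process_group it raises.
    main_datapipe = main_datapipe.fullsync()
except (RuntimeError, ValueError) as e:
    # Exception type varies across torch versions; the message check mirrors the code above.
    if str(e) == (
        "Default process group has not been initialized, "
        "please make sure to call init_process_group."
    ):
        print("torch distributed not used, not applying sharding in dataloader")
    else:
        raise  # unrelated error, propagate it
```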
