adding documentation + catching exception independently of torch version
mfarre committed Jul 29, 2024
1 parent 87ad8b5 commit 79b311f
Showing 3 changed files with 16 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -163,7 +163,7 @@ These arguments give coarse control over input/output "shape" of the dataset. Fo

## Downloading YouTube Metadata

If we want to download a large amount of YouTube videos with video2dataset we can specify some parameters and also extract useful metadata as well. For directions on how to do so please see this [example](https://github.com/iejMac/video2dataset/blob/main/examples/yt_metadata.md).
If we want to download a large number of YouTube videos with video2dataset, we can specify some parameters - including a proxy to distribute requests - and also extract useful metadata. For directions on how to do so, please see this [example](https://github.com/iejMac/video2dataset/blob/main/examples/yt_metadata.md).

## Incremental mode

13 changes: 13 additions & 0 deletions examples/yt_metadata.md
@@ -1,3 +1,16 @@
### Setting up yt-dlp proxy:
#### Usage

yt-dlp allows you to set up a proxy for the requests it sends to YouTube. We surface this feature in our config file via the `proxy` field and the `proxy-check-certificate` flag. If `proxy-check-certificate` is set to False, HTTPS certificate validation is suppressed.

```yaml
yt_args:
download_size: 360
download_audio_rate: 44100
proxy: "url:port"
  proxy-check-certificate: True   # or False to skip HTTPS certificate validation
```
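
The proxy settings travel with the rest of the `yt_args`, so nothing else changes in how a job is launched. As a minimal sketch (assuming a local `config.yaml` containing the block above and a placeholder `videos.csv` with a `url` column), a Python invocation could look like this:

```python
# Minimal sketch: run a download using the proxy-enabled config.
# config.yaml is assumed to contain the yt_args block shown above;
# videos.csv and its "url" column are placeholder inputs.
from video2dataset import video2dataset

video2dataset(
    url_list="videos.csv",
    input_format="csv",
    url_col="url",
    output_folder="dataset",
    config="config.yaml",
)
```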
### Download YouTube metadata & subtitles:
#### Usage
3 changes: 2 additions & 1 deletion video2dataset/dataloader/custom_wds.py
@@ -507,9 +507,10 @@ def __init__(
main_datapipe.apply_sharding(world_size, global_rank)
# synchronize data across processes to prevent hanging if sharding is uneven (which is likely)
main_datapipe = main_datapipe.fullsync()
except ValueError as e:
except (RuntimeError, ValueError) as e:
if str(e) == "Default process group has not been initialized, please make sure to call init_process_group.":
print("torch distributed not used, not applying sharding in dataloader")
pass
else:
raise # re-raise if it's a different error

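For context on the dataloader change above: depending on the torch version, the `fullsync()` call can raise either a `ValueError` or a `RuntimeError` when the default process group has not been initialized, so both are now caught and only that specific message is swallowed. A minimal sketch of the same guard in isolation (assuming torchdata's `IterableWrapper` and its `fullsync` functional hook, with placeholder data):

```python
# Minimal sketch of the guard, assuming torchdata's IterableWrapper and the
# fullsync() functional hook on IterDataPipe (placeholder data).
from torchdata.datapipes.iter import IterableWrapper

main_datapipe = IterableWrapper(range(8))
try:
    # fullsync() needs torch.distributed; without init_process_group it raises.
    main_datapipe = main_datapipe.fullsync()
except (RuntimeError, ValueError) as e:
    # Exception type varies across torch versions; the message check mirrors the code above.
    if str(e) == (
        "Default process group has not been initialized, "
        "please make sure to call init_process_group."
    ):
        print("torch distributed not used, not applying sharding in dataloader")
    else:
        raise  # unrelated error, propagate it
```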
