Merge pull request huggingface#181 from QasidSaleem/remove_import_ListFilter

remove ListFilter from the process_common_crawl_dump example
hynky1999 authored May 10, 2024
2 parents 4d83342 + e41e40c commit 9f5f7b0
Showing 1 changed file with 0 additions and 2 deletions.
2 changes: 0 additions & 2 deletions examples/process_common_crawl_dump.py
@@ -6,7 +6,6 @@
GopherQualityFilter,
GopherRepetitionFilter,
LanguageFilter,
- ListFilter,
URLFilter,
)
from datatrove.pipeline.readers import WarcReader
@@ -39,7 +38,6 @@
),
GopherRepetitionFilter(exclusion_writer=JsonlWriter(f"{MAIN_OUTPUT_PATH}/removed/repetitive/{DUMP}")),
GopherQualityFilter(exclusion_writer=JsonlWriter(f"{MAIN_OUTPUT_PATH}/removed/quality/{DUMP}")),
- ListFilter(exclusion_writer=JsonlWriter(f"{MAIN_OUTPUT_PATH}/removed/list/{DUMP}")),
JsonlWriter(f"{MAIN_OUTPUT_PATH}/output/{DUMP}"),
],
tasks=8000,