Getting repetitions after pre-training #1573

Open
tonyv opened this issue Jun 18, 2024 · 0 comments

tonyv commented Jun 18, 2024

Hello, I am pre-training a T5X model on a large text corpus to translate into Japanese. When I tried translating a simple "Hello", the output repeated the Japanese "Hello" several times, rendered as escaped Unicode sequences. The number of repetitions matches the task feature length I have defined.

  1. Is there a setting I can tweak to reduce the number of repetitions, similar to what CTranslate2 offers?
  2. In my preprocessor for the training task, I add the EOS tokens automatically as follows:
    preprocessors=[
        seqio.preprocessors.tokenize,
        seqio.preprocessors.append_eos_after_trim,
    ],
  3. Any tips on how to reduce repetitions?
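
For anyone trying to reproduce this, here is a minimal sketch of two checks that might help narrow the problem down: whether the decoded token IDs contain an EOS at all (and what the output looks like when cut at the first EOS), and what the escaped Unicode output actually says. The helper names `truncate_at_eos`, `has_eos`, and `unescape_unicode` are hypothetical and not part of T5X or seqio. The EOS ID of 1 is an assumption based on the default T5 SentencePiece vocabularies, so it should be checked against the vocabulary actually in use.

```python
# Hypothetical diagnostic helpers; they are not part of T5X or seqio.
EOS_ID = 1  # assumption: the default T5 SentencePiece vocabularies use 1 for EOS


def has_eos(token_ids, eos_id=EOS_ID):
    """Report whether the model ever produced an EOS token."""
    return eos_id in list(token_ids)


def truncate_at_eos(token_ids, eos_id=EOS_ID):
    """Return the tokens before the first EOS, or all tokens if there is no EOS."""
    ids = list(token_ids)
    if eos_id in ids:
        return ids[: ids.index(eos_id)]
    return ids


def unescape_unicode(text):
    """Turn literal escapes such as \\u3053 into the characters they encode."""
    # backslashreplace keeps the text ASCII-safe so unicode_escape can decode it.
    return text.encode("ascii", "backslashreplace").decode("unicode_escape")


if __name__ == "__main__":
    # Made-up token IDs standing in for a decoded prediction.
    prediction = [5, 6, 1, 5, 6, 1]
    print(has_eos(prediction))          # True
    print(truncate_at_eos(prediction))  # [5, 6]
    print(unescape_unicode("\\u3053\\u3093\\u306b\\u3061\\u306f"))  # こんにちは
```

If `has_eos` returns False for real predictions, the model may not have learned to emit EOS, which would be consistent with output that fills the whole target feature length. If it returns True, the repetition may only come from text after the first EOS not being cut off.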
tonyv changed the title from "Getting repetitions after translation pre-training" to "Getting repetitions after pre-training" on Jun 18, 2024