Mandatory metadata are not all set #427
Comments
It should indeed ... but it is probably not feasible, because we do not (yet) have all metadata. I suspect the problem is something like the title, which we source from the main webpage. Since we source a lot of things from the main page (favicon, title, description, ...), one idea could be to modify zimit to run two crawls: one first crawl with ... Another idea could be to provide fallback values for these mandatory metadata so that the ZIM is always valid; this is simpler, and we already do it for the illustration, but it is clearly a bit dirty when it comes to title and description.
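To make the fallback idea concrete, here is a minimal sketch, not how zimit or warc2zim actually implement it. The function name `apply_fallbacks`, the `FALLBACKS` mapping, and the default strings are all hypothetical and chosen only for illustration.

```python
# Hypothetical defaults, used only when the main page gave us nothing usable.
FALLBACKS = {
    "Title": "Untitled",
    "Description": "No description available",
}


def apply_fallbacks(metadata, fallbacks=FALLBACKS):
    """Return a copy of metadata with empty values replaced by defaults.

    Also returns the list of keys that were replaced, so the caller can
    log a warning instead of silently shipping placeholder values.
    """
    filled = dict(metadata)
    used = []
    for key, default in fallbacks.items():
        value = filled.get(key)
        # Treat missing, None, and whitespace-only values as "not set".
        if value is None or not str(value).strip():
            filled[key] = default
            used.append(key)
    return filled, used


if __name__ == "__main__":
    filled, used = apply_fallbacks({"Title": "  ", "Description": "A site"})
    print(filled, used)
```

Returning the list of replaced keys keeps the "dirty" part visible: a scraper could log which values are placeholders, or refuse to use them unless a flag allows it.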
Yes, that's a possibility; I was only suggesting that when we figure out there's a missing title, for instance, we fail with a clear message.
Displaying the list of missing metadata is already fixed in scraperlib 5.0.0 (just a matter of releasing it then). I still consider that failing a scrape which might have taken hours or even days of crawling, just because a title or description is missing, is quite disappointing for the user.
Yes. This has been discussed. You're welcome to revive the discussion if you want. I'd propose at least a flag for zimit SaaS so that missing values are replaced with defaults. For our ZIMs, it's a different issue:
Here's the log of a zimit run that eventually died with a ValueError: Mandatory metadata are not all set.
Should warc2zim fail on missing mandatory metadata before starting the creator?
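As an illustration of what failing before starting the creator could look like, here is a rough sketch under stated assumptions. It does not use the real warc2zim or scraperlib API: `missing_mandatory` and `check_before_creator` are hypothetical helpers, and the `MANDATORY_KEYS` tuple is an assumption based on the metadata the openZIM specification lists as mandatory.

```python
# Assumed list of mandatory ZIM metadata keys (not read from scraperlib).
MANDATORY_KEYS = (
    "Name",
    "Title",
    "Creator",
    "Publisher",
    "Date",
    "Description",
    "Language",
    "Illustration_48x48@1",
)


def missing_mandatory(metadata, mandatory=MANDATORY_KEYS):
    """Return the mandatory keys that are absent or empty in metadata."""
    missing = []
    for key in mandatory:
        value = metadata.get(key)
        if isinstance(value, str):
            value = value.strip()
        # None, "", b"" and whitespace-only strings all count as missing.
        if not value:
            missing.append(key)
    return missing


def check_before_creator(metadata):
    """Raise early, naming every missing key, before any ZIM is written."""
    missing = missing_mandatory(metadata)
    if missing:
        raise ValueError(
            "Mandatory metadata are not all set. Missing: " + ", ".join(missing)
        )


if __name__ == "__main__":
    try:
        check_before_creator({"Name": "example", "Title": ""})
    except ValueError as exc:
        print(exc)
```

Calling such a check right after the metadata has been gathered, and before the creator is started, would surface the problem with the full list of missing keys instead of a bare error at the end of the run. It does not address the concern that the crawl itself may already have taken hours by that point.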