crawl https://git-scm.com looking for broken links #986
Adds a build matrix entry that uses the broken-link-checker Node module to crawl https://git-scm.com recursively, request every link it finds, and report whether each one succeeds or fails. This should make broken links on the site easier to find. (Closes git#957.) Also moves the sudo: line to the top of the file for style: it is a global build matrix setting, so it belongs with the other global config settings up top.
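For readers unfamiliar with what such a crawler does, here is a minimal sketch using only the Python standard library. It is not the broken-link-checker module and does not mirror its options; it only illustrates the same idea: start at one URL, follow links on the same host breadth-first, request every link found (external ones are checked but not followed), and record each status. The injectable fetch callable and the page limit are assumptions made for this sketch.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# A fetcher takes a URL and returns (HTTP status, HTML body).
Fetch = Callable[[str], Tuple[int, str]]


class LinkParser(HTMLParser):
    """Collects href/src values from <a>, <img> and <script> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and ((tag == "a" and name == "href")
                          or (tag in ("img", "script") and name == "src")):
                self.links.append(value)


def http_fetch(url: str) -> Tuple[int, str]:
    """Default fetcher; status 0 means the request itself failed."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        return err.code, ""
    except URLError:
        return 0, ""


def check_site(start: str, fetch: Fetch = http_fetch, limit: int = 1000) -> Dict[str, int]:
    """Breadth-first crawl from `start`; returns {url: status} for every link tried."""
    host = urlparse(start).netloc
    results: Dict[str, int] = {}
    queue = deque([start])
    while queue and len(results) < limit:
        url = queue.popleft()
        if url in results:
            continue
        status, body = fetch(url)
        results[url] = status
        # Only parse and follow pages on the start host that loaded successfully.
        if status != 200 or urlparse(url).netloc != host:
            continue
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Skip mailto:, javascript: and similar non-HTTP links.
            if urlparse(absolute).scheme in ("http", "https") and absolute not in results:
                queue.append(absolute)
    return results


def broken(results: Dict[str, int]) -> List[str]:
    """URLs whose status is not 2xx/3xx (0 = network failure)."""
    return sorted(u for u, s in results.items() if not 200 <= s < 400)
```

Calling broken(check_site("https://git-scm.com")) would return the failing URLs, and a CI step could exit non-zero when that list is non-empty. The fetch parameter exists so the crawler can be exercised against canned pages without network access.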
Hmm. This tests the live site. But when will it get kicked off? I assume whenever any PR is updated. Those two things aren't really linked, though. Ideally you'd check the PR itself to make sure it doesn't contain or cause any broken links. But it's hard to test even a single state, because so much of the content is imported into the database (pre-processed, but by unknown vintages of the Ruby code; it depends on what was deployed when a particular version of the manpages was imported, or when I last kicked off a manual rebuild). So I'm not sure this really fits a per-PR Travis build. I think Travis does support periodic jobs, and this seems like a better match for one.
To check at PR or push time, we could try spawning the web site locally, but that would mean importing all the additional data from git and progit, which is heavy work and prone to failure. Periodically running a test against the site would be a good idea, but I'm definitely against adding a dependency on npm for that. There are surely such tools available natively.
Another complication is that there are known broken links in older versions of the git manpages. We don't fix those, but preserve them in their broken state. So any link-checking would want to avoid digging into old versions at all, I'd think.
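One way to honor this is to filter URLs before a checker follows them. The sketch below assumes, without confirmation from this thread, that older manpage versions live under paths such as /docs/git-log/2.9.5 while the current page is /docs/git-log; the regular expression and the helper names are hypothetical stand-ins for whatever rule the site actually needs.

```python
import re
from typing import Iterable, List
from urllib.parse import urlparse

# Hypothetical URL layout (an assumption):
#   current manpage -> /docs/<page>
#   older version   -> /docs/<page>/<version>, e.g. /docs/git-log/2.9.5
VERSIONED_DOC = re.compile(r"^/docs/[^/]+/\d+(\.\d+)+/?$")


def is_old_manpage(url: str) -> bool:
    """True if `url` points at a pinned, older version of a manpage."""
    return bool(VERSIONED_DOC.match(urlparse(url).path))


def links_to_check(urls: Iterable[str]) -> List[str]:
    """Drop links into frozen old manpage versions; keep everything else."""
    return [u for u in urls if not is_old_manpage(u)]
```

Because the version segment must start with a digit, non-numeric suffixes (for example a language code) are not treated as old versions under this assumed layout.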
@jnavila, what do you mean by "natively"? A Ruby gem, like this one? https://github.com/endymion/link-checker Also, I just checked, and it should be fairly straightforward to set this up as a cron-only job: https://docs.travis-ci.com/user/cron-jobs/#detecting-builds-triggered-by-cron
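Per the Travis documentation linked above, cron-triggered builds can be detected through the TRAVIS_EVENT_TYPE environment variable, which Travis sets to push, pull_request, api, or cron. A minimal Python guard, using a hypothetical run_if_cron helper, could make the link check a no-op on every other kind of build:

```python
from typing import Callable, Mapping


def is_cron_build(env: Mapping[str, str]) -> bool:
    """Travis sets TRAVIS_EVENT_TYPE to push, pull_request, api or cron."""
    return env.get("TRAVIS_EVENT_TYPE") == "cron"


def run_if_cron(env: Mapping[str, str], check: Callable[[], int]) -> int:
    """Run `check` (returning an exit code) only on cron builds; otherwise succeed."""
    if not is_cron_build(env):
        print("Not a cron build; skipping link check.")
        return 0
    return check()
```

A CI script step could call run_if_cron(os.environ, some_check) and exit with its return value, so PR and push builds stay unaffected by the state of the live site.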
@sxlijin, a Ruby gem for instance, or maybe even a simple, correctly crafted
Just found a solution using the awesome_bot gem: https://github.com/marmelo/tech-companies-in-portugal/blob/master/.travis.yml