Migrate git-scm.com to a static site, generated via Hugo, served via GitHub Pages #1804
In the current effort to migrate https://git-scm.com/ to a static Hugo site (see git#1804), we saw a bogus tag that would confuse Hugo. We also saw a now-unused banner that we probably do not want to bother migrating to Hugo. So let's drop both. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
🎉 This is great! Thank you so much for picking this up! The demo site looks great! |
👋 Sneaking in here with some thoughts from the search side! On first interactions, the search has some notable issues compared to the production rails search, for a few reasons on both sides of the fence.
(Amazing work migrating this to Hugo! ❤️) |
Oh wow, Mr Pagefind himself! I'm honored to meet you, @bglw!
I kind of wanted to be able to find stuff in old versions that is no longer present in current versions. That's why I added dscho@e9fa963.
Excellent!
Heh, thank you for that!
Right, I had not worked on that because I hoped that the sorting by relevance would be "good enough"... |
About Heroku
That is true, but there has been an update since that 2022 mail. https://lore.kernel.org/git/ZRHTWaPthX%2FTETJz@nand.local/
It does seem like the PLC is still in favor of moving to a static solution, though. https://lore.kernel.org/git/ZRrfAdX0eNutTSOy@nand.local/
About the preview:
Search
That is true. And in both the search results page as well as the little preview (
Minor issues
There are some broken links in the preview on https://dscho.github.io/git-scm.com/docs/ that lead to https://dscho.github.io/docs/ <topic>
There's a broken link on https://dscho.github.io/git-scm.com/about/free-and-open-source/ to https://dscho.github.io/git-scm.com/trademark. On the live site that redirects from https://git-scm.com/trademark to https://git-scm.com/about/trademark (dscho#1)
The "Setup and Config" headline on https://dscho.github.io/git-scm.com/docs/ is blue in the preview, but not in the live site. This is not happening for me in local testing.
There's some redirect that swallows anchors. https://dscho.github.io/git-scm.com/docs/ links to https://dscho.github.io/git-scm.com/docs/git#_git_commands, which redirects to https://dscho.github.io/git-scm.com/docs/git/ instead of https://dscho.github.io/git-scm.com/docs/git/#_git_commands
https://dscho.github.io/git-scm.com/downloads/mac/ has an odd grammar issue that https://git-scm.com/download/mac doesn't. (dscho#2) It says
https://git-scm.com/download/mac correctly says
Also note the slight URL change there from download to downloads. There is a redirect for that, though, so that should be fine. |
One additional note: There is a commit about porting the old 404 page, 18a3ac2, but I've only seen the generic GitHub pages 404 page on the preview in my testing. |
Switching to pagefind also changed search behaviour in another way. The rails site always searches the english content. Pagefind defaults to what they call multilingual search, i.e. searching only pages in the same language as the one you're searching from. That's theoretically a usability improvement, but with the partial nature of our non-english content (availability of any given language can vary from man page to man page, the book exists in languages that don't have any man pages, everything else only exists in english), we might need a fallback to english here. Pagefind offers an option to force all pages to be indexed as english, but I think we can slightly abuse |
Partial review. Only looked at the first 47 commits
This addresses that part of git#1804 (comment): There are some broken links in the preview on https://dscho.github.io/git-scm.com/docs/ that lead to https://dscho.github.io/docs/ <topic> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
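One plausible cause of links landing on https://dscho.github.io/docs/ is a root-relative `href` on a site that is served below a sub-path. That cause is an assumption, as the thread does not spell it out. The sketch below only illustrates standard URL resolution; the `git-config` topic is used purely as an example.

```python
from urllib.parse import urljoin

# The preview is served below the /git-scm.com/ sub-path.
PAGE = "https://dscho.github.io/git-scm.com/docs/"


def resolve(href, page=PAGE):
    """Resolve an href the way a browser would, relative to the page URL."""
    return urljoin(page, href)


# A root-relative href ignores the sub-path of the preview site.
print(resolve("/docs/git-config"))
# A page-relative href, or one that includes the sub-path, stays inside it.
print(resolve("git-config"))
print(resolve("/git-scm.com/docs/git-config"))
```

On the live site, which is served from the domain root, the first form resolves as intended, which would explain why the problem only shows up in the preview.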
I managed to fix it via 2d0f6c8 |
Hmm. The more I think about it, the more I get convinced that the older versions of the manual pages should be excluded from the search, I thought it was a feature, but it looks as if it incurs more downsides than upsides. |
this was a major effort @dscho , thank you very much! sorry for the silence, but i've been busy with other stuff. in the meanwhile, and to ensure this effort wont be wasted, can you summarize what do you need to make this merge-ready? what do you still need to tackle? where do you need help from other people? :) |
@pedrorijo91 Yes.
The big blocker is the "live search" one. |
Oh, and there's a ton of work still needed to address @rimrul's excellent feedback. |
Updated via the `update-download-data.yml` GitHub workflow.
Sorry for the delay on my end, I just had a chance to make the cut-over and all appears to be working. I can load git-scm.com correctly on my end after purging caches and my local DNS cache, and it all appears to be working. book.git-scm.com is currently broken (it just redirects to a GitHub Pages site that says "There's no GitHub Pages site here"), but I'm willing to live with that since I suspect very few folks are using the book.git-scm.com address. Hopefully the fix is straightforward-ish on your end @dscho! If you need anything let me know. I think all that's left to do is announce the change to the mailing list and merge this branch, both of which @dscho should definitely have the honor of doing! ❤️ |
Updated via the `update-git-version-and-manual-pages.yml` GitHub workflow.
@ttaylorr I believe that this needs a |
@dscho: Thanks for the reference -- I just added a record there, though Cloudflare is taking a little while to propagate it. LMK when you're online tomorrow if you have time whether or not it works for you! Also I noticed that you merged this into 'main', but we have 'gh-pages' as well. Which should be the default branch? |
I merged this into I'd like to keep BTW I just disabled the two Heroku webhooks because we do not want the site to be deployed there anymore. |
That all sounds great to me, thanks!
Thanks again. I'll spin down the account in the next few days, I just wanted to make sure it was still intact in case we had to do an emergency revert back to Heroku, which seems unlikely now. Thanks again for all of this great work 😍 |
I like that idea.
Hereby done: https://lore.kernel.org/git/c3e372f6-3035-9e6b-f464-f1feceacaa4b@gmx.de/T/#u |
@ttaylorr It seems to take quite a bit more time than I would have expected. Looking at https://digwebinterface.com/?hostnames=book.git-scm.com&type=&ns=resolver&useresolver=9.9.9.10&nameservers=, I still see:
For comparison, this is what happens with
|
I think that there is some misconfiguration on the GitHub Pages side of things. The difference in dig lookups makes sense there, since the two Do we need to tell GitHub Pages that there is a custom domain that it should be responding to when sending traffic via a CNAME record from book.git-scm.com -> git.github.io? At least that redirect is working, so I think that the configuration issues may be within how we have Pages setup on the GitHub side of things. |
The way I read the documentation, we are supposed to have only a But I have to admit that I then fail to see how this could find the correct GitHub Pages site. Maybe The documentation on GitHub can also be interpreted in the way that any subdomain (with a Maybe the best we can do is to add a new repository at https://github.com/git/book.git-scm.com that has a
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Redirecting…</title>
<link rel="canonical" href="https://git-scm.com/book/en/v2">
<meta http-equiv="refresh" content="0; url=https://git-scm.com/book/en/v2">
<meta name="robots" content="noindex">
</head>
<body>
<script>window.location.replace(document.querySelector("link[rel='canonical']").href + window.location.hash)</script>
<h1>Redirecting…</h1>
<a href="https://git-scm.com/book/en/v2">Click here if you are not redirected.</a>
</body>
</html>
Shall we try whether this works? |
I think that's the way it's configured from Cloudflare's perspective (i.e., that there is a CNAME record, but no A record for But since I went ahead and configured that, but it still looks like it's broken. I'm guessing that's either (a) a caching thing (seems unlikely) or (b) GitHub Pages rejects a request coming from book.git-scm.com, because it doesn't think there should be a GitHub Pages site there. I tried Googling for things like "multiple custom subdomains GitHub Pages" but couldn't come up with anything definitive, so I'm guessing that this is unsupported. I'm not opposed to the workaround you came up with, and think that that may be our best path forward. |
@ttaylorr I initialized https://github.com/git/book.git-scm.com and it seems to work (it does not work for https://book.git-scm.com/ -- yet: the response headers suggest that cloudflare cached this from its previous state, but if you try, say, https://book.git-scm.com/abc, it redirects as intended). I guess with this approach, we should actually go for those |
Thanks!
Done. |
And now it does! |
Also nice to see everything works as expected now a new release is out. 👏 |
Changes
This Pull Request adjusts the existing files such that the site is no longer served via a Rails App, but by GitHub Pages instead. A preview can be seen here: https://dscho.github.io/git-scm.com/ (which is generated and deployed from this Pull Request's branch, and will be updated via automation whenever that branch changes).
It is the culmination of a very long, and large, effort that started in February 2017 with the first attempt to migrate the site to Jekyll. Several years, and a substantial effort by @spraints, @vdye and myself, later, here is the result: No longer a Jekyll site but a Hugo site (because of render times: 20 minutes vs 30 seconds), search implemented using Pagefind, links verified by Lychee.
The main themes of the subsequent migration from the Rails App to a Hugo-generated static site are:
We move the original Rails App files that contain Rails code mixed into HTML to `content/`, where the files defining the pages live in the Hugo world, then modify them to drop the Rails code and replace it with Hugo constructs. More often than not, we separate the commits that move the files from the commits that adjust the contents, to help Git realize that there has been a move (as opposed to a delete/add, Git's rename detection does have its shortcomings). This allows for noticing upstream changes that need to be reflected in moved & modified files when rebasing to upstream.
In Hugo setups, the files live in the following locations:
hugo.yml
This is the central configuration file that tells Hugo how to render the site.
layouts/
This is where the "boiler plate" is defined that ties the site together, i.e. the header, the footer and the sidebar as well as the main scaffolding in which the pages' content is to be rendered.
This is the location where most of Hugo's functionality is available and complex stuff can happen such as looping or accessing site parameters.
layouts/partials/
This directory contains recurring templates, i.e. reusable partial layouts that are used to structure the elements of the site. This includes the side bar, how videos are rendered, etc.
layouts/shortcodes/
This directory contains so-called "shortcodes", i.e. reusable elements similar to partial layouts. The major difference is that shortcodes can be used within `content/` while partial layouts can only be used from within `layouts/`.
See https://gohugo.io/content-management/shortcodes/ for more information on this topic.
content/
This defines the content of the pages that are served. Only a subset of Hugo's functionality is available here (the idea is to leave the complicated stuff to the layout used to render the pages). These files have the extension `.html` but need to be processed using Hugo before becoming proper HTML pages. For example, most of these files begin with so-called front matter, i.e. metadata relevant to Hugo, specified using YAML that is enclosed in `---` lines.
To discern clearly between pages maintained in this repository vs HTML pages that are pre-generated using content from other repositories (such as the ProGit book and the manual pages), the pre-generated HTML pages are tracked in `external/book/` and `external/docs/`, mapped via Hugo mounts. These files are not meant to be edited directly, and are clearly marked as such by a comment at the top of the files, inside the front matter. Instead, these files are intended to be updated via GitHub workflows whenever the external repositories change.
static/
These files are not processed by Hugo, but copied as-is. Good for images, for example.
assets/
These files are processed in specific ways. That is where the SASS-based style sheets live, for example.
data/
These files define metadata that can be used in Hugo's functions. For example, it contains the list of documentation categories that are rendered in various ways, and the GUIs that are shown at https://git-scm.com/downloads/guis are defined there.
In contrast to most Hugo-managed sites, we will refrain from using a Hugo theme, and instead stick with the existing style sheets.
Likewise, we refrain from using Markdown at all: The existing site did not use it, therefore it makes little sense to start using it now.
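To illustrate the front matter convention described for the files in `content/`, here is a minimal sketch in Python of how a page could be split into its front matter and its body. It is not how Hugo itself parses pages; the function name and the sample page are hypothetical, and the front matter is returned as raw lines rather than parsed as YAML.

```python
def split_front_matter(text):
    """Split a page into (front_matter_lines, body).

    Assumes the front matter, if present, starts on the very first line
    and is enclosed in lines consisting only of '---'.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return [], text  # no front matter at all
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = [line.rstrip("\n") for line in lines[1:i]]
            return front, "".join(lines[i + 1:])
    return [], text  # opening '---' without a closing one


# Hypothetical page: a marker comment and one metadata key, then HTML.
page = "---\n# generated by a script, do not edit\ntitle: Example\n---\n<p>Hello</p>\n"
front, body = split_front_matter(page)
print(front)
print(body)
```

A check along these lines could, for instance, be used to spot the "do not edit" comment in generated files before accepting a manual change to them.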
In addition to Hugo's directories, we also have these:
script/
This directory contains scripts to perform recurring tasks such as pre-rendering Git's manual pages into HTML that are then stored inside `external/docs/`.
For historical reasons, these are Ruby scripts for the most part, as it is easier to follow the development when that functionality is extracted from the current Rails App and turned into Ruby scripts that can be run stand-alone.
.github/workflows/ and .github/actions/
The latter directory contains a file that defines a custom GitHub Action that accommodates for the lack of Hugo support in GitHub Pages: By default, only Jekyll pages are supported out of the box, but Hugo sites require a custom GitHub workflow to deploy the site.
The former directory contains files that define GitHub workflows that are typically run on a schedule, updating the various parts that are generated from external sources: the Git version, the ProGit Book, manual pages, etc. These workflows essentially keep the rendered HTML files in `content/` up to date with the respective external repositories.
These workflows can be seen in action (pun intended) here: https://github.com/dscho/git-scm.com/actions
external/book/
It makes very, very little sense to render the ProGit book from scratch every time the site is deployed (and every time a PR build is run). To avoid that, one of the script/GitHub workflow pairs mentioned earlier populates and updates this directory with the latest version of the ProGit book.
The subdirectories of `external/book/` recapitulate Hugo's standard layout: `content/`, `data/`, `static/`, and Hugo mounts map them into the Hugo project. The only exception to this rule is `sync/`, which contains `.sha` files reflecting the tip commits of the ProGit book and its translations.
Note: An alternative to this layout would have been to use submodules. However, the complexities, in particular in GitHub workflows, have been deemed not worth this approach and I opted for simplicity instead.
Also note: The files in `external/` are not meant to be edited directly, and are therefore clearly marked as such by a comment at the top of the files, inside the front matter. The comment indicates the script that was used to populate/update the content; this will hopefully direct contributors who are tempted to edit these generated files to the right place to make their changes.
external/docs/
Like the `book/` subdirectory, the `docs/` subdirectory contains pre-rendered versions of Git's manual pages and their translations (which is particularly important here because rendering them from scratch would easily take 20 minutes), and it is populated and updated via scripts that are run in regularly-scheduled GitHub workflows.
Just like `external/book/sync/`, the `external/docs/sync/` directory contains `.sha` files whose contents reflect the tip commits of the external repositories.
In addition, there is the `external/docs/asciidoc/` directory which serves as a cache of "expanded AsciiDoc": many of Git's manual pages include content from other files, and therefore it is non-trivial to determine whether or not a manual page has changed and needs to be re-rendered (essentially, the only way is to expand them by inlining the `include`d files and then comparing the contents). Caching this content speeds up updating the manual pages drastically.
Most of the core logic lives in `layouts/`. Hugo discerns between logic that is allowed in `layouts/` and logic that is allowed in `content/`; the latter can only access so-called "shortcodes". These shortcodes are essentially snippets of Hugo pages and are free to use the entire set of Hugo's functionality.
tl;dr whenever we need to do something complicated that is confined to only a few pages, we have to implement it in `layouts/shortcodes/` and insert the corresponding `{{< shortcode-name >}}` in the page itself. Whenever we need to do something complicated that is used in more places, it is implemented elsewhere in `layouts/`.
Some of the logic that cannot be performed statically (such as telling the user how long ago the latest macOS installer was released, or adjusting the Windows downloads to reflect the CPU architecture indicated by the current user agent) is implemented using JavaScript instead.
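The "expanded AsciiDoc" cache described for `external/docs/asciidoc/` can be sketched as follows. This is an illustration in Python, not the Ruby scripts in `script/`; the function names and the cache file layout are assumptions, and only plain `include::path[]` lines are handled (no conditionals, attributes or tag filters).

```python
import os
import re

# Matches a line such as: include::config.txt[]
INCLUDE_RE = re.compile(r"^include::(?P<path>[^\[]+)\[[^\]]*\]\s*$")


def expand_includes(path, _depth=0):
    """Return the text of `path` with include:: lines inlined recursively."""
    if _depth > 20:
        raise RecursionError(f"include nesting too deep at {path}")
    base = os.path.dirname(path)
    out = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            match = INCLUDE_RE.match(line)
            if match:
                # Included paths are resolved relative to the including file.
                included = os.path.join(base, match.group("path"))
                out.append(expand_includes(included, _depth + 1))
            else:
                out.append(line)
    return "".join(out)


def needs_rerender(source_path, cache_path):
    """Compare the expanded source with the cached copy.

    Returns True (and refreshes the cache) when the expanded content
    differs from what was cached, False when nothing changed.
    """
    expanded = expand_includes(source_path)
    try:
        with open(cache_path, encoding="utf-8") as f:
            if f.read() == expanded:
                return False
    except FileNotFoundError:
        pass  # nothing cached yet, so the page has to be rendered
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(expanded)
    return True
```

With a check like `needs_rerender()`, a change to an included file triggers a re-render of every manual page that includes it, while untouched pages are skipped without invoking the renderer at all.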
The site search needs to move to the client side, as there is no longer a server that can perform that service. Luckily, Pagefind matured in the meantime (I have helped, too): it is a very performant client-side search solution implemented in JavaScript that makes use of a pre-computed, fine-grained search index that is loaded incrementally on demand.
In contrast to the Rails App, the static pages are easy to check for broken links. We use Lychee for that (which I helped support GitHub Pages better).
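To show why static pages are easy to check, here is a minimal sketch of an internal link check over a directory of rendered HTML. It is written in Python with the standard library only and is not how Lychee works; the function and class names are hypothetical, and only root-relative links (starting with `/`) are checked against the files on disk.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_broken_internal_links(site_root):
    """Return sorted (page, href) pairs whose root-relative target is missing."""
    broken = []
    for dirpath, _dirs, files in os.walk(site_root):
        for name in files:
            if not name.endswith(".html"):
                continue
            page = os.path.join(dirpath, name)
            collector = LinkCollector()
            with open(page, encoding="utf-8") as f:
                collector.feed(f.read())
            for href in collector.hrefs:
                parts = urlsplit(href)
                # Skip external links, fragments and page-relative links.
                if parts.scheme or parts.netloc or not parts.path.startswith("/"):
                    continue
                target = os.path.join(site_root, parts.path.lstrip("/"))
                if parts.path.endswith("/"):
                    # A directory URL is served from its index.html.
                    target = os.path.join(target, "index.html")
                if not os.path.exists(target):
                    broken.append((os.path.relpath(page, site_root), href))
    return sorted(broken)
```

Because every page already exists as a file after the build, such a check needs no running server, which is the property that a Rails App cannot offer.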
Context
Changes required to finalize the migration in addition to this Pull Request
This Pull Request is not actually meant to be merged, not to the `main` branch at least, but to be pushed to the `gh-pages` branch which then should be made the default branch.
To successfully deploy to GitHub Pages, the `Pages` configuration was already switched from "Deploy from a branch" to "GitHub Actions".
Once everything is golden in this Pull Request and the decision to move to GitHub Pages is final, `git-scm.com` needs to be pointed to GitHub Pages (read: `CNAME` needs to be configured to make use of the GitHub Pages-deployed site).
The Pull Request branch was actually pushed to `gh-pages` already, reflected by the preview that can be seen at https://git.github.io/git-scm.com/.
Why make these changes?
`hugo serve -w`, then editing the files to your heart's content.
`git remote renom` is not the correct Git command. This page is supposedly generated from the `git-html-l10n` repository but the typo does not exist there. It is quite unclear where the bug is, seeing as https://dscho.github.io/git-scm.com/docs/git-remote/fr does not show the bug. I am still flummoxed how this bug could be fixed, as I haven't found the culprit despite investigating for multiple hours. This type of bug will be much easier to fix in the Hugo site than in the current Rails App, where this bug persists to this day.