Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce --root-dir #1576

Merged
merged 26 commits into from
Dec 13, 2024
Merged

Introduce --root-dir #1576

merged 26 commits into from
Dec 13, 2024

Conversation

trask
Copy link
Contributor

@trask trask commented Nov 28, 2024

Related to #452 (comment)

I tried a couple of more ambitious refactoring, but they kept spiraling out of (my) control, so tried to keep it minimal.

lychee-lib/src/utils/path.rs Outdated Show resolved Hide resolved
@trask
Copy link
Contributor Author

trask commented Nov 28, 2024

Also, I don't understand the (remaining) CI failures if you could give some hint, thanks!

Update: there are only lint failures remaining now which seem unrelated to this PR, though I'm probably missing something

@trask trask marked this pull request as ready for review November 28, 2024 22:13
lychee-lib/src/types/input.rs Outdated Show resolved Hide resolved
lychee-lib/src/types/base.rs Outdated Show resolved Hide resolved
@mre
Copy link
Member

mre commented Nov 30, 2024

Hey @trask,

this is a very solid attempt and a very long-standing issue of the project, so thanks a lot for looking into that!

I haven't checked all the edge-cases yet, but I think you're on the right track.
However, I wonder if it supports setting base-url and root-path at the same time? They serve a different purpose: base-url is used for relative URLs, while --root-path is used for absolute URLs. Setting both at the same time is expected to be a common scenario.
I've added an explanation to the section in the code that seemed the most relevant.

On another note, we could use the opportunity and deprecate --offline at the same time.
See #668 for more information.
In order to deprecate it, we should add a warning message when it's used and update the help message accordingly to
point people to the new --root-path option.
We can also add a note in the README about the deprecation and create a pull request to update the documentation
at https://github.com/lycheeverse/lycheeverse.github.io.

But we don't have to do that in one go. Let's take it step by step.

@trask
Copy link
Contributor Author

trask commented Dec 1, 2024

Thanks for the feedback! I've incorporated your suggestions.

@trask trask changed the title Introduce --root-path Introduce --root-dir Dec 6, 2024
@mre
Copy link
Member

mre commented Dec 7, 2024

Awesome. Thanks!

From my side, this is good to go. Let's wait for the feedback of at least one other person and then we can merge it.

@trask trask mentioned this pull request Dec 10, 2024
@mre
Copy link
Member

mre commented Dec 10, 2024

@george-gca, any updates? Did you find the time to give it a try?

@thomas-zahner
Copy link
Member

thomas-zahner commented Dec 10, 2024

Code-wise it looks good. But now I'm confused about its usage.

When I'm in the fixtures directory I do:

$ cargo run -- resolve_paths_from_root_dir/ --root-dir resolve_paths_from_root_dir/

Issues found in 1 input. Find details below.

[resolve_paths_from_root_dir/nested/index.html]:
   [ERROR] file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested/about/index.html#missing | Cannot find file
   [ERROR] file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested/about/index.html#fragment | Cannot find file
   [ERROR] file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested/another%20page | Cannot find file
   [ERROR] file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested | Cannot find file
   [ERROR] file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested/about | Cannot find file

🔍 7 Total (in 0s) ✅ 2 OK 🚫 5 Errors

But:

$ cargo run -- resolve_paths_from_root_dir/ --root-dir ..

🔍 7 Total (in 0s) ✅ 7 OK 🚫 0 Errors

@trask Why isn't the first one working? Why is the second working?

@trask
Copy link
Contributor Author

trask commented Dec 10, 2024

@trask Why isn't the first one working? Why is the second working?

relative --root-dir paths are definitely odd (and not covered currently by the cli.rs tests)

--root-dir is prepended to all absolute local links

and so when --root-dir is a relative path, it effectively turns absolute local links into relative links

e.g. running

cargo run -- resolve_paths_from_root_dir/ --root-dir resolve_paths_from_root_dir/

turns the absolute local link /nested into a relative link resolve_paths_from_root_dir/nested which get resolved against the html file's directory and results in file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested

and running

cargo run -- resolve_paths_from_root_dir/ --root-dir ..

turns the absolute local link /nested into a relative link ../nested which get resolved against the html file's directory and results in file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested

It probably makes more sense to resolve relative --root-dir into an absolute dir, I'll give this a try...

@mre
Copy link
Member

mre commented Dec 10, 2024

Good point.

An alternative would be to prevent relative links by throwing an error. It would be simpler to implement while avoiding ambiguity and unwanted side effects.

The help text would need to be updated to indicate that an absolute directory is expected.

We could start with a conservative design and allow for future support of relative paths if we see a use-case.

@trask
Copy link
Contributor Author

trask commented Dec 10, 2024

An alternative would be to prevent relative links by throwing an error. It would be simpler to implement while avoiding ambiguity and unwanted side effects.

The help text would need to be updated to indicate that an absolute directory is expected.

done!

@mre
Copy link
Member

mre commented Dec 11, 2024

@thomas-zahner, what do you think? Can you try it again?

@george-gca
Copy link

george-gca commented Dec 11, 2024

Wouldn't it be better to accept both absolute and relative path, and handle them internally? Maybe doing something like suggested in these answers on stackoverflow? One uses PathBuf itself with std::fs::canonicalize, the other would add a new dependency path-clean.

@george-gca
Copy link

george-gca commented Dec 11, 2024

@mre sorry for the late reply, I haven't had the time during the weekend. Now, would this be useful for my case? How should I use it?

@mre
Copy link
Member

mre commented Dec 11, 2024

Ah, sorry. It's not applicable for your case because you have a more complex routing setup. root-dir is only if you want to prepend an absolute path before a relative link.

let mut collector = Collector::new(opts.config.base.clone())
if let Some(root_dir) = &opts.config.root_dir {
if root_dir.is_relative() {
bail!("`--root_dir` must be an absolute path");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing; I think we should move this to Collector::new. lychee can also be used as library through lychee-lib so that's another entrypoint where this invariant should be enforced

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

impl Collector {
/// Create a new collector with an empty cache
#[must_use]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clippy told me to remove this:

warning: this function has a #[must_use] attribute with no message, but returns a type already marked as #[must_use]

@norswap
Copy link

norswap commented Dec 13, 2024

Just tested it locally and it works as expected. Good work! 🥳

@norswap, @dkarlovi, @vanbroup, @george-gca, can any of you test this version and provide some feedback? Does it cover your use-cases? 🙏

Unfortunately I haven't been at the company at which we were using this for more than one year now :') They've also moved the specs this was to be used on, so it's hard to say.

@thomas-zahner
Copy link
Member

@trask Awesome, thank you 👍
From my side this looks good @mre

@mre mre merged commit 6d0e94c into lycheeverse:master Dec 13, 2024
7 checks passed
@mre
Copy link
Member

mre commented Dec 13, 2024

Great! Thanks for your work @trask! 👏

@mre
Copy link
Member

mre commented Jan 6, 2025

I changed my mind regarding relative path handling for --root-dir. Toying around with the feature and writing the docs for --root-dir, I found the lack of support for . to be a bit tedious.

# ✅ Good
lychee --root-dir "$(pwd)/public"

# ❌ Bad: will fail
lychee --root-dir ./public/

Maybe we should implement what @george-gca suggested all along: we simply always canonicalize the root-dir:

use std::fs::canonicalize;
use std::path::PathBuf;

fn main() {
        let path = PathBuf::from(".");
        println!("{:?}", canonicalize(path));
}

@trask, @george-gca, @thomas-zahner, I've created a separate issue for that here: #1606.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants