Normalize all line endings to Unix on the way in #78

DavisVaughan · 2024-12-02T21:46:47Z

Closes #74

DavisVaughan · 2024-12-02T21:47:45Z

.gitattributes

 * text=auto eol=lf
+
+# Windows specific test files where we need CRLF endings
+crates/air_r_formatter/tests/specs/r/crlf/*.R text eol=crlf


This makes every file check out with LF endings except the files in this one directory, which are forced to have CRLF endings on every platform (even on Mac, and particularly on CI).

DavisVaughan · 2024-12-02T21:49:15Z

crates/air/src/commands/format.rs

 fn format_file(path: &PathBuf) -> anyhow::Result<ExitStatus> {
-    let text = std::fs::read_to_string(path)?;
+    let contents = std::fs::read_to_string(path)?;
+
+    let line_ending = line_ending::infer(&contents);
+
+    // Normalize to Unix line endings
+    let contents = match line_ending {
+        LineEnding::Lf => contents,
+        LineEnding::Crlf => line_ending::normalize(contents),
+    };

    let parser_options = RParserOptions::default();
-    let parsed = air_r_parser::parse(text.as_str(), parser_options);
+    let parsed = air_r_parser::parse(contents.as_str(), parser_options);

    if parsed.has_errors() {
        return Ok(ExitStatus::Error);
    }

-    let formatter_options = RFormatOptions::default();
+    // TODO: Respect user specified `LineEnding` option too, not just inferred line endings
+    let line_ending = match line_ending {
+        LineEnding::Lf => biome_formatter::LineEnding::Lf,
+        LineEnding::Crlf => biome_formatter::LineEnding::Crlf,
+    };
+
+    let formatter_options = RFormatOptions::default().with_line_ending(line_ending);


The idea is that the CLI should:

1. Infer the line endings from the contents.
2. Normalize to Unix line endings before parsing/formatting.
3. Convert back to the inferred line endings on the way out (eventually respecting a forced `LineEnding` option if the user set one).
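The three steps can be sketched with the standard library alone. This is only an illustration of the flow, not the code in this PR: the inference rule (any `\r\n` means CRLF), the `restore` helper, and the `format_normalized` stand-in for parsing and formatting are all assumptions made for the example. In the real change the formatter emits the requested ending itself via `with_line_ending`.

```rust
/// Line ending style, using the same `Lf`/`Crlf` naming as the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineEnding {
    Lf,
    Crlf,
}

/// Assumed inference rule: any `\r\n` in the text means the file is CRLF.
pub fn infer(text: &str) -> LineEnding {
    if text.contains("\r\n") {
        LineEnding::Crlf
    } else {
        LineEnding::Lf
    }
}

/// Replace every `\r\n` with `\n`. Lone `\r` characters are left untouched.
pub fn normalize(text: String) -> String {
    if text.contains("\r\n") {
        text.replace("\r\n", "\n")
    } else {
        text
    }
}

/// Hypothetical stand-in for parse + format. It only ever sees `\n` endings
/// and just trims trailing whitespace from each line.
pub fn format_normalized(text: &str) -> String {
    let lines: Vec<&str> = text.lines().map(|line| line.trim_end()).collect();
    lines.join("\n") + "\n"
}

/// Hypothetical helper: convert `\n`-only text back to the requested ending.
pub fn restore(text: String, ending: LineEnding) -> String {
    match ending {
        LineEnding::Lf => text,
        LineEnding::Crlf => text.replace('\n', "\r\n"),
    }
}

/// Infer, normalize, format, then convert back on the way out.
pub fn format_contents(contents: String) -> String {
    let ending = infer(&contents);
    let contents = normalize(contents);
    let formatted = format_normalized(&contents);
    restore(formatted, ending)
}
```

Because everything between `normalize` and `restore` only ever sees `\n`, the parser and formatter never need to handle `\r\n` themselves.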

DavisVaughan · 2024-12-02T21:50:17Z

crates/air_formatter_test/src/spec.rs

        let input_code = std::fs::read_to_string(input_file).unwrap();

+        // Normalize to Unix line endings
+        let input_code = line_ending::normalize(input_code);
+


We don't have a way to set options on a per-test basis right now, but eventually it would be nice to have a test that forces the formatter output to be CRLF, to make sure that path works.

DavisVaughan · 2024-12-02T21:53:40Z

crates/line_ending/src/lib.rs

+
+use memchr::memmem;
+
+static FINDER: LazyLock<memmem::Finder> = LazyLock::new(|| memmem::Finder::new(b"\r\n"));


One of the benefits of `Finder` is that you can construct one up front and reuse it, like with regexes:
https://github.com/astral-sh/uv/blob/81569c47bfa91b24ff0712baf1c001ef9604676e/crates/uv-scripts/src/lib.rs#L17
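The construct-once, reuse-everywhere pattern can be shown without the `memchr` dependency. In this sketch `CrlfFinder` is a hypothetical, deliberately naive stand-in for `memmem::Finder`, and the `CONSTRUCTIONS` counter exists only to make the single construction observable. Neither is part of this PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how many times a `CrlfFinder` has been built (illustration only).
static CONSTRUCTIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `memmem::Finder`: holds a needle and searches
/// for it with a simple window scan.
pub struct CrlfFinder {
    needle: &'static [u8],
}

impl CrlfFinder {
    fn new(needle: &'static [u8]) -> Self {
        CONSTRUCTIONS.fetch_add(1, Ordering::SeqCst);
        Self { needle }
    }

    /// Byte offset of the first occurrence of the needle, if any.
    pub fn find(&self, haystack: &[u8]) -> Option<usize> {
        haystack
            .windows(self.needle.len())
            .position(|window| window == self.needle)
    }
}

/// Built lazily on first use, then shared by every later call.
static FINDER: LazyLock<CrlfFinder> = LazyLock::new(|| CrlfFinder::new(b"\r\n"));

/// Returns `true` if the text contains at least one `\r\n` sequence.
pub fn has_crlf(text: &str) -> bool {
    FINDER.find(text.as_bytes()).is_some()
}

/// Number of times the finder has been constructed so far.
pub fn constructions() -> usize {
    CONSTRUCTIONS.load(Ordering::SeqCst)
}
```

However many times `has_crlf` is called, the initializer closure runs only once, so any setup cost is paid a single time.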

DavisVaughan · 2024-12-02T21:54:42Z

crates/line_ending/src/lib.rs

+    Lf,
+
+    /// Carriage Return + Line Feed characters (\r\n), common on Windows
+    Crlf,


I switched to the convention that `biome_formatter::LineEnding` uses so it maps over nicely.

DavisVaughan · 2024-12-02T21:56:25Z

crates/line_ending/src/lib.rs

+/// # Source
+///
+/// ---
+/// authors = ["rust-analyzer team"]
+/// license = "MIT OR Apache-2.0"
+/// origin = "https://github.com/rust-lang/rust-analyzer/blob/master/crates/rust-analyzer/src/line_index.rs"
+/// ---
+pub fn normalize(x: String) -> String {


I feel like it is more flexible to move towards providing attribution in doc comments:

- It lets us structure the folders however we want, without needing a rust-analyzer folder.
- It allows us to change function names and whatnot while still pointing to the original source.

DavisVaughan · 2024-12-06T18:52:14Z

I feel pretty good about this one; we can iterate as needed!

DavisVaughan requested a review from lionel- December 2, 2024 21:57

DavisVaughan added 4 commits December 6, 2024 13:48

Normalize line endings to Unix everywhere on the way in

1bfcf37

Turn multiline raw string test back on

aae8d8b

Explicitly add a multiline string test

9a3938a

Add CRLF formatter test

12cb862

DavisVaughan force-pushed the feature/normalize-line-endings branch from 9e89a60 to 12cb862 Compare December 6, 2024 18:48

DavisVaughan merged commit 5d5a2db into main Dec 6, 2024
4 checks passed

DavisVaughan deleted the feature/normalize-line-endings branch December 6, 2024 18:52

DavisVaughan mentioned this pull request Jan 3, 2025

Don't normalize strings in the CLI #127

Merged
