Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bash: Fix CR handling #4477

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

k-takata
Copy link
Contributor

@k-takata k-takata commented Mar 31, 2024

Fixes #1839

0005-bash-4.3-msys2-fix-lineendings.patch adds CRLF support. However, 0001-bash-4.4-cygwin.patch already added igncr option to Bash to support CRLF.

I confirmed that the Cygwin version of Bash has also the same issue with #1839 when the igncr option is set.

After debugging, I found that there is an issue in rewind_input_string() in parser.y that it doesn't take the CR into account.

This PR adds the following changes:

  • Set the igncr option by default.
  • Remove unnecessary changes in yy_input_name() in parser.y and use the igncr implementation.
  • Modify rewind_input_string() to take the CR into account. (It might be better to apply a similar change to the Cygwin version of Bash.)
  • Remove all the changes from y.tab.c. This file should be automatically generated from parser.y.
  • Set LC_ALL=C.UTF-8 when running make check for running on non-English locales.
  • Add a test: run-ps1lf.

@lazka
Copy link
Member

lazka commented Apr 15, 2024

Sorry for the delay. I'll try to find some time/motivation for a more in-depth look.

Do you see any potential problems with the default changing for existing users?

@k-takata
Copy link
Contributor Author

0005-bash-4.3-msys2-fix-lineendings.patch and the igncr option of 0001-bash-4.4-cygwin.patch had duplicated functionality.
MSYS2's bash has already done almost the same thing as igncr is set to 1. So, changing the default of igncr (and removing the duplicated part) doesn't have potential issues, I think. (If a user explicitly unset igncr, the behavior will be changed, though.)


In this PR, the most important part is here:

diff --git a/parse.y b/parse.y
index 8fd24a1c..232a33dc 100644
@@ -1678,7 +1684,16 @@ rewind_input_string ()
      into account, e.g., $(...\n) */
   xchars = shell_input_line_len - shell_input_line_index;
   if (bash_input.location.string[-1] == '\n')
-    xchars++;
+    {
+      xchars++;
+#ifdef __CYGWIN__
+      {
+	extern int igncr;
+	if (igncr && bash_input.location.string[-2] == '\r')
+	  xchars++;
+      }
+#endif
+    }
 
   /* XXX - how to reflect bash_input.location.string back to string passed to
      parse_and_execute or xparse_dolparen? xparse_dolparen needs to know how

If you have concerning about changing igncr, then this part can be like this:

diff --git a/parse.y b/parse.y
index 8fd24a1c..232a33dc 100644
@@ -1678,7 +1684,15 @@ rewind_input_string ()
      into account, e.g., $(...\n) */
   xchars = shell_input_line_len - shell_input_line_index;
   if (bash_input.location.string[-1] == '\n')
-    xchars++;
+    {
+      xchars++;
+#ifdef __MSYS__
+      {
+	if (bash_input.location.string[-2] == '\r')
+	  xchars++;
+      }
+#endif
+    }
 
   /* XXX - how to reflect bash_input.location.string back to string passed to
      parse_and_execute or xparse_dolparen? xparse_dolparen needs to know how

(Some other hunks need to be reverted in this case.)

@lazka
Copy link
Member

lazka commented May 31, 2024

I gave it another try. One thing igncr also changes is it ignores "\r" in text in general, not just at the end:

test3() {
    local input="Line with\r Carriage Return"
    local expected="Line with Carriage Return"
    local result=$(echo -e "$input")
    [ "$result" != "$expected" ] && echo "OK" || echo "FAILED"
}

test3

that feels a bit risky, so I'd prefer the non-igncr variant, for now at least.

I'll try to write some tests covering the changes we currently have compared to upstream. Do you also have some example code that tests your changes, so we can add it to our integration tests?

After that we can think about enabling igncr again.

k-takata added a commit to k-takata/MSYS2-packages that referenced this pull request Jun 3, 2024
Don't use the igncr option for now as suggested at msys2#4477.
@k-takata
Copy link
Contributor Author

k-takata commented Jun 3, 2024

OK, I've updated the PR not to use igncr.

Do you also have some example code that tests your changes, so we can add it to our integration tests?

Unfortunately, I have no ideas how to test the PS1 issue.
I currently set PS1='$(date)\n\$ ' manually to see if it works.

Another test that might be better to add is to test a script file with CRLF line endings.

@k-takata k-takata closed this Jun 3, 2024
@k-takata k-takata reopened this Jun 3, 2024
@k-takata
Copy link
Contributor Author

k-takata commented Jun 3, 2024

(Sorry, closed by mistake. Reopened.)

@akinomyoga
Copy link
Contributor

We received another report in oh-my-bash caused by this issue. May I ask what is the current blocker of this fix? Is it blocked because no one knows how to test the change? What is the current progress for the tests?

Fixes msys2#1839

`0005-bash-4.3-msys2-fix-lineendings.patch` adds CRLF support.
However, `0001-bash-4.4-cygwin.patch` already added `igncr` option to
Bash to support CRLF.

I confirmed that the Cygwin version of Bash has also the same issue
with msys2#1839 when the `igncr` option is set.

After debugging, I found that there is an issue in
`rewind_input_string()` in `parser.y` that it doesn't take the CR into
account.

This PR adds the following changes:
* Modify `rewind_input_string()` to take the CR into account.
  (It might be better to apply a similar change to the Cygwin version of
  Bash.)
* Remove all the changes from `y.tab.c`. This file should be
  automatically generated from `parser.y`.
@k-takata k-takata force-pushed the fix-bash-cr-handling branch from 319f0f4 to 86781a3 Compare October 2, 2024 08:30
@k-takata
Copy link
Contributor Author

k-takata commented Oct 2, 2024

Added a test named run-ps1lf.
If this test is run on the bash without the patch, the following differences are shown:

run-ps1lf
1,3c1
< bash: command substitution: line 1: syntax error near unexpected token `)'
< bash: command substitution: line 1: `echo foo)'
<
---
> foo
5,7c3
< bash: command substitution: line 1: syntax error near unexpected token `)'
< bash: command substitution: line 1: `echo foo)'
<
---
> foo

On the patched version, only the test name is shown:

run-ps1lf

`check()` failed on non-English locales. Set LC_ALL to C.UTF-8.
@k-takata k-takata force-pushed the fix-bash-cr-handling branch from 86781a3 to 92d725c Compare October 2, 2024 09:21
Note: This patch contains a line with CRLF. So, .gitattributes is also
updated to keep the CRLF.
Copy link
Contributor Author

@k-takata k-takata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run-crlf is added to test CRLF handling in $(...).

Comment on lines +183 to +188
+Line with CR
+Line with CRLF
+Line with
+ LF
+Line with CR
+Line with
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 183, 184, and 187 contain CR, and line 188 ends with CRLF.
bash/.gitattributes has been added to keep the CRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bash: '\n' after command substitution in $PS1 causes error
3 participants