Can't search "aws help" output with "grep" due to ^H in the document text #5455

Closed
2 tasks done
atsalolikhin-spokeo opened this issue Aug 6, 2020 · 12 comments

@atsalolikhin-spokeo

Confirm by changing [ ] to [x] below to ensure that it's a bug:

Describe the bug
"aws help" output includes special character ^H which means I can't use grep to search it

SDK version number
aws-cli/1.18.114

Platform/OS/Hardware/Device
macOS Catalina

To Reproduce (observed behavior)

I can't simply grep the output of "aws help". For example:

[~] $ aws help | grep endpoint
[~] 1 $

Yet the text is clearly there:

[~] $ aws help | head -27 | tail -3
       --endpoint-url (string)

       Override command's default URL with the given URL.
[~] $

Then I tried:

[~] 1 $ aws help > /tmp/aws-help.txt
[~] $ head -27 /tmp/aws-help.txt | tail -3
       --endpoint-url (string)

       Override command's default URL with the given URL.
[~] $

Finally I opened /tmp/aws-help.txt in vim:

 27        Override command's default URL with the given URL.
 28
 29        -^H--^H-n^Hno^Ho-^H-v^Hve^Her^Hri^Hif^Hfy^Hy-^H-s^Hss^Hsl^Hl (boolean)

Why do you have ^H in that line? That's what's breaking my grep. :)
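
The effect can be reproduced without the AWS CLI. The following is a minimal sketch, assuming a POSIX shell with `cat -v` available. The names `sample`, `visible`, and `matches` are illustrative, and the sample string only mimics the start of line 29 in the vim listing above:

```shell
# A literal backspace character (what vim and cat -v display as ^H).
bs=$(printf '\b')

# "--no" with every character followed by a backspace and a repeat of itself,
# mirroring the pattern seen in the vim listing.
sample="-${bs}--${bs}-n${bs}no${bs}o"

# Show the control characters in visible form.
visible() { printf '%s\n' "$1" | cat -v; }

# Count matching lines; "|| true" keeps a zero-match result from being an error.
matches() { printf '%s\n' "$1" | grep -c -- "$2" || true; }

visible "$sample"          # prints: -^H--^H-n^Hno^Ho
matches "$sample" '--no'   # prints: 0
```

The bytes `--no` never appear next to each other in the stream, so a plain-text pattern cannot match them.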

Expected behavior
I was expecting the grep command to return the matching text from the "aws help" output:

       --endpoint-url (string)

Logs/output
N/A (but if you need it, let me know and I can provide it)

Additional context
It's not a show-stopper for me. I know about the AWS CLI User Guide and Command Reference, and about the AWS-Shell project.

@atsalolikhin-spokeo atsalolikhin-spokeo added the needs-triage This issue or PR still needs to be triaged. label Aug 6, 2020
@kdaily
Member

kdaily commented Aug 12, 2020

Hi @atsalolikhin-spokeo, looking into this.

@kdaily kdaily self-assigned this Aug 12, 2020
@kdaily kdaily added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Aug 12, 2020
@kdaily
Member

kdaily commented Aug 12, 2020

Hi @atsalolikhin-spokeo, it looks like, because of the way the help pages are output, emphasized text gets wrapped with ^H (a backspace) in order to modify its appearance. Changing this behavior would require an overhaul of the way help is generated. Search within the paginated help does work as expected, as does searching the web-based user guide or aws shell, as you noted.

@kdaily kdaily added feature-request A feature should be added or improved. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Aug 12, 2020
@atsalolikhin-spokeo
Author

atsalolikhin-spokeo commented Aug 12, 2020

Thanks @kdaily !

emphasized text gets wrapped with ^H (a backspace) in order to modify their appearance

I don't quite understand -- doesn't ^H delete the character?

I am not necessarily requesting a change in behavior. :) (If you feel that's appropriate, that's fine -- I see you added the "feature-request" label.) Just curious about it.

BTW, awscli is quite impressive, it can do so much! I'm really enjoying learning to use it.

@kdaily
Member

kdaily commented Aug 12, 2020

Good question - I know we use groff to generate these, which is common for man page generation. I confirmed that this behavior is consistent with other man pages: after running man ping > /tmp/help.txt, I observe the same unprintable characters! I know groff has command characters for formatting, but I don't know how they are rendered as text.

Do you have a use case for the help pages being available as text files, or just curious as to why you couldn't use them this way? If you have a use case, I can leave it as a feature request, or else I can close it as is.

Thanks for confirming, and glad you're learning about the capabilities of the AWS CLI.

@kdaily
Member

kdaily commented Aug 12, 2020

Total speculation and out of curiosity, if I look at this output from the ping man page:

N^HNA^HAM^HME^HE

It looks like to get bold, a character is written (N), then a command to move back a space is written (^H), and then the next two characters (NA), etc - so the effect is writing the same character twice in the same place to bold?
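
That reading matches the typewriter-style overstrike convention: bold is a character struck over itself, and underline is an underscore struck over by the character. As a sketch of how such output could be made searchable, assuming a POSIX shell and sed (`strip_overstrike` is an illustrative name, not an AWS CLI feature):

```shell
# Remove every "character + backspace" pair, leaving a single copy of each glyph.
strip_overstrike() {
  bs=$(printf '\b')      # literal backspace, shown as ^H by vim and cat -v
  sed "s/.${bs}//g"
}

printf 'N\bNA\bAM\bME\bE\n' | strip_overstrike   # prints: NAME
```

If the help output looks like the listings in this thread, something along the lines of `aws help | strip_overstrike | grep endpoint` should then find the option. Where it is installed, `col -b` performs the same kind of cleanup.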

@atsalolikhin-spokeo
Author

Wow, funky! Yeah, "man ping" has it too. I was not aware of that.

Nope, I don't have a use case, feel free to close this case -- you folks have enough real work to do.

And I agree with your speculation and analysis there.

This sounds like printing twice!! If you think of ink printing -- printing a character twice would make it darker.

How's that for IT archeology?

@atsalolikhin-spokeo
Author

Thank you very much! :)

@kdaily kdaily added guidance Question that needs advice or information. and removed feature-request A feature should be added or improved. labels Aug 12, 2020
@salewski

FWIW, I'd like to see the behavior changed. While not The End of the World (how many bugs truly are?), not being able to grep the output makes the tool feel like it doesn't quite work -- it unnecessarily casts a shade of "Good enough for 1992" on it :-)

I stumbled over this "ungrepable help" behavior today, and was going to open a new issue for it but found this one.

While writing up issue #5478 I was looking for the aws command line option that is supposed to disable paging:

    $ aws help | grep -i pag
           Disable automatic pagination.

Say what?! I know there's a command line option with 'pag' in it...

Looking at the aws help in my pager showed the expected content:

       --no-paginate (boolean)

       Disable automatic pagination.

That "ungrepable help" behavior is surprising.

Compare with the ls command from GNU coreutils. In the output of man 1 ls in a pager displayed on my terminal, if I search for "control" I see the following content with formatting (in my terminal, the names of the command line opts are displayed in bold):

       -q, --hide-control-chars
              print ? instead of nongraphic characters

       --show-control-chars
              show nongraphic characters as-is (the default, unless program is
              'ls' and output is a terminal)

That does not prevent the options from being grepped from either the man page:

    $ man 1 ls | grep control
           -q, --hide-control-chars
           --show-control-chars

...or from the command's --help output:

    $ ls --help | grep control
      -q, --hide-control-chars   print ? instead of nongraphic characters
          --show-control-chars   show nongraphic characters as-is (the default,

Since others have mentioned ping above, I'll note what I see, too. On my machine, ping[0] has a similar story to ls; I see formatted output in the pager on my terminal (here searched for "nodeinfo"):

       -N nodeinfo_option
           IPv6 only. Send ICMPv6 Node Information Queries (RFC4620), instead
           of Echo Request. CAP_NET_RAW capability is required.

...but I can grep both the man page output:

    $ man 8 ping | grep nodeinfo
                [-N nodeinfo_option] [-w deadline] [-W timeout] [-p pattern]
           -N nodeinfo_option

...and the ping help (-h) output:

    $ ping -h 2>&1 | grep nodeinfo
                 [-N nodeinfo_option] [-p pattern] [-Q tclass] [-s packetsize]

[0] From the Debian iputils-ping package (version 3:20180629-2+deb10u1).

The man command on my machine[1] (which, similar to aws, also wraps groff) only emits formatting characters when the output is being directed at a terminal. It has a knob in the form of the MAN_KEEP_FORMATTING environment variable to override this behavior, though. Here's the documentation for it from man(1):

        MAN_KEEP_FORMATTING
              Normally,  when output is not being directed to a terminal (such
              as to a file or a pipe), formatting characters are discarded  to
              make  it  easier to read the result without special tools.  How‐
              ever, if $MAN_KEEP_FORMATTING is set  to  any  non-empty  value,
              these  formatting  characters  are retained.  This may be useful
              for wrappers around man that can  interpret  formatting  charac‐
              ters.

[1] From the Debian man-db package (version 2.8.5-2), which comes from: https://nongnu.org/man-db/

I would like to see the aws help command adopt a similar behavior -- only emit formatting characters when output is going to a terminal, or when explicitly requested by configuration.
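
As a rough illustration of that behavior, here is a sketch of a filter that passes formatting through only when standard output is a terminal or when an override variable is set. It assumes a POSIX shell and sed. `render_help` and `HELP_KEEP_FORMATTING` are hypothetical names modeled on MAN_KEEP_FORMATTING, and they are not part of the AWS CLI or man-db:

```shell
# Hypothetical filter: keep formatting for terminals or on explicit request,
# otherwise drop overstrike pairs so the text is plain.
render_help() {
  if [ -t 1 ] || [ -n "${HELP_KEEP_FORMATTING:-}" ]; then
    cat                    # terminal or opt-in: pass the bytes through unchanged
  else
    bs=$(printf '\b')      # literal backspace
    sed "s/.${bs}//g"      # pipe or file: remove "character + backspace" pairs
  fi
}
```

With this shape, `some_formatter | render_help | grep pattern` sees plain text, while `some_formatter | render_help` at an interactive prompt keeps the bold rendering.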

I'll also note that I think the help content is more important than the formatting of it. In ultimate terms, it would be better to have grepable content without formatting than ungrepable content with formatting. But if the man-db behavior is adopted, we can have the best of both worlds.

Finally, here we have been talking about grepable help output because that was the use case that led a couple of us here. But the issue is not specific to grep; it applies for any non-pager app processing the aws tool's output. In design terms, a Unix tool should be designed for generality, but also optimized for the common case. The current behavior of aws help is optimized for the common case (reading aws help output in a pager on a terminal), but the output is not general because it assumes the help output is always going to a terminal; it violates the expectation of the "universal interface".

@kdaily
Member

kdaily commented Aug 14, 2020

Hi @salewski, thanks for the thoughtful and detailed response. I'm going to open this as a feature request for more discussion. I do note that we pass the man macro package to groff:

https://github.com/aws/aws-cli/blob/v2/awscli/help.py#L112

@kdaily kdaily reopened this Aug 14, 2020
@kdaily kdaily added feature-request A feature should be added or improved. needs-discussion and removed guidance Question that needs advice or information. labels Aug 14, 2020
@kdaily kdaily removed their assignment Oct 1, 2020
@richard-mauri

richard-mauri commented Jun 3, 2021

I'm not certain if this is related, but the AWS CLI emits output with special characters that make automation/parsing almost impossible.

For example, aws sqs create-queue --queue-name queue1 --output text returns the following:

^[[?1h^[=^Mhttp://localhost:4566/000000000000/queue1^[[m^M
^M^[[K^[[?1l^[>

The same problem happens regardless of output format (text, json, etc.).

Oooh - this may be a dockerism. The problem occurs when running the CLI from Docker as shown below.
On my Jenkins CI system slave host, the awscli is not installed, so I run the commands through Docker, but this leads to the special character problem.

docker run --network=host --rm -it -e AWS_DEFAULT_REGION=us-east-1 -e AWS_ACCESS_KEY_ID=not_needed_locally -e AWS_SECRET_ACCESS_KEY=not_needed_locally amazon/aws-cli:2.2.8 --endpoint-url=http://localhost:4566 sqs create-queue --queue-name queue1 --output text

I found the "-it" in the docker run caused the control character mess, so this is not an issue with awscli, but I'll leave this note in case someone else finds this useful.
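
One way to avoid hard-coding "-it" is to request a pseudo-TTY only when the calling script itself is attached to a terminal. This is a sketch assuming a POSIX shell. `pick_tty_flags`, `tty_flags`, and `aws_in_docker` are illustrative names, and the network and environment options from the command above are omitted for brevity:

```shell
# Choose docker's terminal flags from the caller's own stdin and stdout.
pick_tty_flags() {
  if [ -t 0 ] && [ -t 1 ]; then
    tty_flags="-it"   # interactive session: allocating a pseudo-TTY is fine
  else
    tty_flags="-i"    # CI job, pipe, or redirect: no pseudo-TTY requested
  fi
}

# Illustrative wrapper around the image used above; the arguments are handed
# to the container as the aws command line.
aws_in_docker() {
  pick_tty_flags
  docker run --rm "$tty_flags" amazon/aws-cli:2.2.8 "$@"
}
```

When the wrapper is called inside a command substitution or a pipeline, standard output is not a terminal, so "-t" is left out and the captured text should be free of the terminal control sequences shown above.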

@github-actions

github-actions bot commented Jun 3, 2022

Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Jun 3, 2022
@github-actions github-actions bot closed this as completed Jun 5, 2022
@kevin-sellers

kevin-sellers commented Jun 30, 2022

#7041 seems to indirectly solve this by using man directly instead of groff to format the help pages, but #6973 should be the final solution such that man -wK ${search} could be used to properly search all the help pages at once.

Before

% MANPAGER='cat -v' aws help | head
AWS()                                                                    AWS()



^[[1mNAME^[[0m
       aws -

^[[1mDESCRIPTION^[[0m
       The  AWS  Command  Line  Interface is a unified tool to manage your AWS
       services.

After

% MANPAGER='cat -v' aws help | head
AWS()                                                                    AWS()



NAME
       aws -

DESCRIPTION
       The  AWS  Command  Line  Interface is a unified tool to manage your AWS
       services.
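
On a version that still produces the "Before" output, the bold markers are escape sequences of the form ESC [ ... m rather than backspaces. A minimal sketch for removing just those, assuming a POSIX shell and sed (`strip_sgr` is an illustrative name, and other kinds of escape sequences are deliberately left alone):

```shell
# Remove sequences such as ESC[1m and ESC[0m (shown by cat -v as ^[[1m and ^[[0m).
strip_sgr() {
  esc=$(printf '\033')           # literal ESC character
  sed "s/${esc}\[[0-9;]*m//g"
}

printf '\033[1mNAME\033[0m\n' | strip_sgr   # prints: NAME
```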
