
awscli2 ignores SIGTSTP, breaks shell job control (only from pyinstaller) #5478

Closed
salewski opened this issue Aug 14, 2020 · 10 comments
Labels
enhancement · feature-request (A feature should be added or improved.) · pyinstaller · v2

Comments


salewski commented Aug 14, 2020


Describe the bug

The aws command from awscli2 seems to be ignoring the SIGTSTP signal. This
violates the user's expectation when working interactively in the shell because
the process cannot be easily suspended (CTRL-Z) and resumed. Effectively, this
behavior breaks job control.

This is a regression from the awscli version 1 behavior, which works with
standard Unix job control.
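
For readers unfamiliar with the symptom class, the sketch below is a minimal,
hypothetical Python script (not awscli code) that cannot be suspended with
CTRL-Z because it ignores SIGTSTP; whether awscli2 ignores the signal outright
or mishandles it in some other way is explored in the comments below.

    #!/usr/bin/env python3
    # Minimal demonstration of the symptom class (hypothetical; not awscli code):
    # a process that sets SIGTSTP to SIG_IGN cannot be suspended with CTRL-Z.
    import os
    import signal
    import time

    # Comment out the next line and CTRL-Z suspends the process as usual.
    signal.signal(signal.SIGTSTP, signal.SIG_IGN)

    print(f"pid={os.getpid()}: CTRL-Z is ignored; use CTRL-C to exit")
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        pass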

SDK version number

This is using the awscli2 program downloaded on 2020-08-13:

    $ aws --version
    aws-cli/2.0.40 Python/3.7.3 Linux/4.19.0-9-amd64 exe/x86_64.debian.10

Platform/OS/Hardware/Device

What are you running the cli on?

    $ uname -srmvo
    Linux 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux

To Reproduce (observed behavior)

You'll need two terminal emulator windows. The first ("window A" below) is for
normal user activity; the second ("window B" below) is to see what is happening
and to allow us to easily "rescue" the first window when it inevitably hangs.

There are four steps, but only the first two are needed to reproduce the issue.

  • Step 1 of 4 - setup
  • Step 2 of 4 - hang terminal window A
  • Step 3 of 4 (optional) - rescue with SIGSTOP
  • Step 4 of 4 (optional) - rescue with SIGCONT

Steps 3 and 4 describe ways of dealing with the issue once encountered, and give
us an opportunity to show some additional detail.

Step 1 of 4 - setup

In window A, we note our terminal and then just run 'aws help'. That lands the
user in the pager (less).

    $ tty
    /dev/pts/52

    $ aws help
    [output to pager (less)]

In window B, use the ps command to take a look at the process tree from window A.

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
     7971 pts/52   S+     0:00  \_ aws help
     7972 pts/52   S+     0:00      \_ aws help
     7981 pts/52   S+     0:00          \_ less

So far, so good. No problems yet.

Note that the plus (+) signs in the STAT column indicate that the aws help
process (and friends) are in the foreground process group.

Step 2 of 4 - hang terminal window A

In window A, press CTRL-Z to suspend the process; the terminal session will be
effectively hung:

    CTRL-Z
    [terminal is hung]

In window B, again use ps to look at the process tree from window A. This time
we can see that only two of the three processes involved have been stopped (as
indicated by the T for them in the STAT column). The plus (+) signs in the
STAT column tell us that the aws help process (and friends) are still in the
foreground process group. The process group is "half suspended":

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
     7971 pts/52   S+     0:00  \_ aws help
     7972 pts/52   T+     0:00      \_ aws help
     7981 pts/52   T+     0:00          \_ less
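
As an aside, one way to see how the still-running parent (PID 7971 above) is
treating SIGTSTP is to decode the signal masks in /proc/<pid>/status. A rough
helper sketch (Linux-specific; the PID argument is whatever ps reports):

    #!/usr/bin/env python3
    # Rough helper (Linux-specific): report whether a PID blocks, ignores, or
    # catches SIGTSTP, based on the hex bitmasks in /proc/<pid>/status.
    import signal
    import sys

    def sigtstp_disposition(pid: int) -> None:
        bit = 1 << (signal.SIGTSTP - 1)   # SIGTSTP is signal 20 on x86 Linux
        with open(f"/proc/{pid}/status") as f:
            fields = dict(line.split(":\t", 1) for line in f if ":\t" in line)
        for name in ("SigBlk", "SigIgn", "SigCgt"):
            mask = int(fields[name], 16)
            print(f"{name}: SIGTSTP {'set' if mask & bit else 'clear'}")

    if __name__ == "__main__":
        sigtstp_disposition(int(sys.argv[1]))   # e.g. the stuck aws parent PID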

Step 3 of 4 (optional) - rescue with SIGSTOP

We can force the aws process to stop by sending it a SIGSTOP signal (recall
that SIGSTOP, unlike SIGTSTP, cannot be caught or ignored; see the brief aside
at the end of this step):

    $ kill -SIGSTOP 7971

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss+    0:00 bash
     7971 pts/52   T      0:00  \_ aws help
     7972 pts/52   T      0:00      \_ aws help
     7981 pts/52   T      0:00          \_ less

You can see that the aws process (and friends) are no longer in the foreground
process group. The single + in the STAT column shows that the parent shell
(bash) is back in control.

Furthermore, the shell in window A is again usable:

    $ jobs
    [1]+  Stopped                 aws help
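
(Aside, as promised above: the reason SIGSTOP works as a rescue where CTRL-Z
does not is that, unlike SIGTSTP, SIGSTOP cannot be caught or ignored. A quick,
illustrative Python check:)

    import signal

    # SIGTSTP can be caught or ignored, which is how a process can end up
    # swallowing CTRL-Z...
    signal.signal(signal.SIGTSTP, lambda signum, frame: None)

    # ...but SIGSTOP cannot: the kernel refuses to install a handler for it.
    try:
        signal.signal(signal.SIGSTOP, lambda signum, frame: None)
    except OSError as exc:
        print(f"cannot catch SIGSTOP: {exc}")   # e.g. [Errno 22] Invalid argument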

Step 4 of 4 (optional) - rescue with SIGCONT

In window A, bring the suspended aws help process group back into the
foreground. This will resume the pager. Press CTRL-Z to again hang the
terminal:

    $ fg
    [resumed pager (less)]

    CTRL-Z
    [terminal is hung (again)]

In window B, confirm the state looks like the hung scenario described above:

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
     7971 pts/52   S+     0:00  \_ aws help
     7972 pts/52   T+     0:00      \_ aws help
     7981 pts/52   T+     0:00          \_ less

Now, rather than sending SIGSTOP to force the process group to suspend, send
two SIGCONT signals to the already-stopped processes to resume them:

    $ kill -SIGCONT 7972
    $ kill -SIGCONT 7981

And again confirm the state of the process tree. We can see that the entire aws help
process group is again in the foreground; it is no longer in a "half suspended" state:

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
     7971 pts/52   S+     0:00  \_ aws help
     7972 pts/52   S+     0:00      \_ aws help
     7981 pts/52   S+     0:00          \_ less

At this point, the pager in window A is again usable. You can type 'q' to exit
out of 'less' and otherwise continue using the terminal window.

Expected behavior

Standard Unix job control should work with aws (awscli2).

Pressing CTRL-Z while reading the docs would suspend the aws help/pager
process group and return control to the shell. The user would be able to jump
back into the pager using the standard Unix job control features of the shell.
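
For reference, the conventional way for a launcher/wrapper process to stay
compatible with job control is either to leave SIGTSTP at its default
disposition, or, if it must intercept the signal, to forward it to its child
and then stop itself. The following is only an illustrative Python sketch of
that convention (it is not awscli or PyInstaller code, and the
"less /etc/hosts" child is a stand-in):

    #!/usr/bin/env python3
    # Illustrative sketch of job-control-friendly SIGTSTP handling in a wrapper
    # process (not awscli/PyInstaller code): forward the signal to the child,
    # then stop ourselves, so the shell sees the whole foreground job stop.
    import os
    import signal
    import subprocess
    import sys

    child = subprocess.Popen(["less", "/etc/hosts"])   # stand-in for the real child

    def on_sigtstp(signum, frame):
        child.send_signal(signal.SIGTSTP)      # forward to the child...
        os.kill(os.getpid(), signal.SIGSTOP)   # ...and stop ourselves as well

    signal.signal(signal.SIGTSTP, on_sigtstp)
    sys.exit(child.wait())   # EINTR is retried automatically (PEP 475)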

Logs/output

Nothing much relevant. All logging precedes the triggering of the hung behavior,
and no additional log messages are recorded beyond that point when performing
the above steps. But FWIW, below are some snippets as captured by running:

    $ aws --debug help 2> stderr.log

The top and bottom of the debug log:

2020-08-14 11:46:32,351 - MainThread - awscli.clidriver - DEBUG - CLI version: aws-cli/2.0.40 Python/3.7.3 Linux/4.19.0-9-amd64 exe/x86_64.debian.10
2020-08-14 11:46:32,351 - MainThread - awscli.clidriver - DEBUG - Arguments entered to CLI: ['--debug', 'help']
2020-08-14 11:46:32,351 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_timestamp_parser at 0x7f2716c3f158>
2020-08-14 11:46:32,351 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function register_uri_param_handler at 0x7f27175d8b70>
2020-08-14 11:46:32,351 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_binary_formatter at 0x7f2716bfe400>
2020-08-14 11:46:32,351 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_assume_role_provider_cache at 0x7f2717534b70>
2020-08-14 11:46:32,353 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function attach_history_handler at 0x7f2716d83f28>
2020-08-14 11:46:32,353 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_json_file_cache at 0x7f2716dbe8c8>
...
2020-08-14 11:46:32,365 - MainThread - botocore.hooks - DEBUG - Event doc-relateditems-start.aws: calling handler <bound method CLIDocumentEventHandler.doc_relateditems_start of <awscli.clidocs.ProviderDocumentEventHandler object at 0x7f2716b73a20>>
2020-08-14 11:46:32,365 - MainThread - botocore.hooks - DEBUG - Event doc-relateditem.aws.aws help topics: calling handler <bound method CLIDocumentEventHandler.doc_relateditem of <awscli.clidocs.ProviderDocumentEventHandler object at 0x7f2716b73a20>>
2020-08-14 11:46:32,432 - MainThread - awscli.help - DEBUG - Running command: ['groff', '-m', 'man', '-T', 'ascii']
2020-08-14 11:46:32,445 - MainThread - awscli.help - DEBUG - Running command: ['less']

Additional context

Empirical testing suggests there are actually two different flavors of this
issue: one that affects aws help output, and one that affects the output of
AWS API service calls.

The former is what is described above, and is the worse of the two, both
because a) it is the author's more common use case for aws with job control,
and b) there is no easy workaround for it.

[UPDATE (2020-08-20): There is a workaround: build from the git 'v2'
branch and invoke the aws python script directly; avoid invoking the aws
binary executable from the pyinstaller-created installer. See the comments
below, especially the one from 2020-08-20.]

The latter has a workaround, which is to use the pager to consume all of the
service output data before attempting to suspend the pager. While clumsy, this
workaround can be performed "inline" in the shell session; it can be applied
without the need to open additional terminal windows and go hunting for PIDs to
which we would then send signals (as described above).

Note that for the second flavor the bug works differently depending on the
cli_pager setting[0]. With the default settings it will be aws that
launches less, and when that is the case the "consume all data before
suspending" workaround does not apply. Even after consuming all of the data, the
aws process will go into the "half suspended" state if you press CTRL-Z in
the pager.

[0] For the above tests, the --no-paginate command line option is not
equivalent to setting the cli_pager option to an empty value in
~/.aws/config. That seems like a different bug, though.
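
For context on why less shows up as a child of aws in the default
configuration: the general pattern is that the CLI spawns its own pager and
writes output to the pager's stdin. The following is only a rough, generic
sketch of that pattern (it is not the actual awscli implementation; the debug
log above only shows that ['less'] is launched):

    import subprocess

    # Rough, generic sketch of the "CLI launches its own pager" pattern (not
    # the actual awscli implementation): the pager runs as a child of the CLI,
    # which is why it appears beneath the aws process in the ps trees above.
    def page_output(text, pager_cmd=("less",)):
        pager = subprocess.Popen(pager_cmd, stdin=subprocess.PIPE)
        try:
            pager.stdin.write(text.encode())
            pager.stdin.close()
        except BrokenPipeError:
            pass   # the user quit the pager before reading everything
        pager.wait()

    if __name__ == "__main__":
        page_output("\n".join(f"line {i}" for i in range(500)))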

With cli_pager set to an empty value, it would be the user piping the output
of the aws command to less. The less invocation is part of the same process group,
so is still susceptible to hanging the terminal. But in this arrangement, having
the pager consume all of the output data from the AWS service call allows the
aws command to exit. When that happens, the only process left in the
foreground process group will be the pager; at that point job control will work
as expected. The user can suspend the pager, resume it, etc.

Step 1 of 3 - disable paging of service data output

In ~/.aws/config, add this setting to the relevant profile:

    cli_pager =

Step 2 of 3 - pipe data to pager

In terminal window A, invoke a command that triggers an AWS API service call
and emits a decent amount of data (enough for multiple pages in your pager
app). Here we happen to be using iam list-policies for that purpose:

    $ aws iam list-policies | less

In window B, use ps to look at the state of the process tree from window A:

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
    10135 pts/52   S+     0:00  \_ aws iam list-policies
    10137 pts/52   S+     0:00  |   \_ aws iam list-policies
    10136 pts/52   S+     0:00  \_ less

So far, so good. Note that less is a direct descendant of the shell (bash)
rather than of the aws command.

Note: If you were to press CTRL-Z at this point, then window A would
effectively be hung as described above. Don't do that here.

Step 3 of 3 - use pager to slurp-up all data

In window A, tell your pager to "jump to the end". This will have the effect of
consuming all of the data from the AWS API service call, as emitted by the aws
command. In less, this can be done by issuing the command 0G (zero
capital-gee).

In window B, again look at the process tree:

    $ ps fww -t /dev/pts/52
      PID TTY      STAT   TIME COMMAND
      896 pts/52   Ss     0:00 bash
    10136 pts/52   S+     0:00  \_ less

Notice that the aws process has completed its work and exited. The only
process still in the foreground process group is the pager. At this point, you
can suspend the pager (CTRL-Z) and it will work as expected; it will not
hang the terminal.

salewski added the needs-triage (This issue or PR still needs to be triaged.) label Aug 14, 2020

kdaily (Member) commented Aug 19, 2020

Hi @salewski,

Thanks for the detailed report. I'm able to CTRL-Z out of running aws help and successfully resume it using fg, so I'm not sure that I've reproduced the scenario you've explained.

I'm also unable to reproduce the scenario mentioned for the output of a service call:

> If you were to press CTRL-Z at this point, then window A would effectively be hung as described above. Don't do that here.

I ran aws ec2 describe-instances which has more than one page, and am able to CTRL-Z and resume.

For your comment at [0]: the --no-paginate option refers to pagination of API responses, not output paging. We have another open issue to address this in the documentation: #5330

kdaily added the guidance (Question that needs advice or information.) and response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) labels and removed the needs-triage label Aug 19, 2020
salewski (Author) commented:

Hi @kdaily,

Thanks for your response.

> I'm able to CTRL-Z out of running aws help and successfully resume it using fg, so I'm not sure that I've reproduced the scenario you've explained.

Encouraged by what you wrote, I sanity-checked my setup and the above issue report with "fresh" terminal windows (and a "fresh" head :-) but still see the issue as described.

I then downloaded the latest aws-cli, which bumped me up from 2.0.40 to 2.0.41:

    $ aws --version
    aws-cli/2.0.41 Python/3.7.3 Linux/4.19.0-9-amd64 exe/x86_64.debian.10

Unfortunately I still see the issue with this new version, too.

And just to help rule out something with my particular xterm setup, I tried the test with other terminal emulators, too (urxvt and kitty). I see the same behavior with both of them.

github-actions bot removed the response-requested label Aug 20, 2020

salewski (Author) commented Aug 20, 2020

Hi @kdaily,

I found that the problem is somehow related to the binary aws executable created by pyinstaller. Just running the aws python script directly does not have the problem. I have not worked with pyinstaller before, but I'll poke around to see what I can find out.

I will note, though, that my "normal" and historical use of aws-cli has been via the awscli Debian package. The /usr/bin/aws command it installs is the Python script, not an ELF executable. So my comment above about this SIGTSTP behavior being a regression may be incorrect, as it is only in the last week that I've run the ELF executable version of the program, and only because I wanted to run version 2. I have never run the version 1 aws-cli as an ELF executable. When I wrote up this issue description I was not aware of there being different "python script" and "binary executable" flavors of the aws command.

The rest of this comment is details for future reference, recorded here to track my evolving understanding.


I built the tip of the aws-cli 'v2' branch, and when I run the 'aws' python3 script in my python venv, I am able to run aws help, press CTRL-Z, and have the whole process group correctly suspend. Here's the process tree before and after suspending; note there are only two processes in the 'aws help' process subtree:

    $ ps fww -t /dev/pts/76
      PID TTY      STAT   TIME COMMAND
    27603 pts/76   Ss     0:00 bash
    12605 pts/76   S+     0:00  \_ /path/to/aws-cli/aljunk-venv/bin/python3 /path/to/aws-cli/aljunk-venv/bin/aws help
    12623 pts/76   S+     0:00      \_ less

[CTRL-Z pressed]

    $ ps fww -t /dev/pts/76
      PID TTY      STAT   TIME COMMAND
    27603 pts/76   Ss+    0:00 bash
    12605 pts/76   T      0:00  \_ /path/to/aws-cli/aljunk-venv/bin/python3 /path/to/aws-cli/aljunk-venv/bin/aws help
    12623 pts/76   T      0:00      \_ less

[Process is now correctly suspended, and I can type in the bash shell]

I then built the pyinstaller-based executable:

    $ ./scripts/installers/make-exe
    ...
    6408 INFO: Building EXE from EXE-00.toc completed successfully.
    6413 INFO: checking COLLECT
    6413 INFO: Building COLLECT because COLLECT-00.toc is non existent
    6413 INFO: Building COLLECT COLLECT-00.toc
    6764 INFO: Building COLLECT COLLECT-00.toc completed successfully.

    Update metadata values {'distribution_source': 'exe'}
    Copying contents of /path/to/aws-cli/exe/pyinstaller/dist/aws_completer into /tmp/tmpcksw1h05/aws/dist
    Copying contents of /path/to/aws-cli/exe/assets into /tmp/tmpcksw1h05/aws
    Deleted build directory: /path/to/aws-cli/exe/pyinstaller/build
    Deleted build directory: /path/to/aws-cli/exe/pyinstaller/dist
    Exe build is available at: /path/to/aws-cli/dist/awscli-exe.zip

I then did the following to unpack the zip file and install it in-tree beneath the dist subdir:

    $ cd dist

    $ mkdir tt && cd tt

    $ unzip -q ../awscli-exe.zip

    $ ./aws/install -i "$(pwd)/INSTALLED" -b "$(pwd)/INSTALLED/bin"
    You can now run: /path/to/aws-cli/dist/tt/INSTALLED/bin/aws --version

    $ ./INSTALLED/bin/aws --version
    aws-cli/2.0.41 Python/3.7.3 Linux/4.19.0-9-amd64 exe/x86_64.debian.10

    $ file ./INSTALLED/bin/aws
    ./INSTALLED/bin/aws: symbolic link to /path/to/aws-cli/dist/tt/INSTALLED/v2/current/bin/aws

    $ file $(readlink -f ./INSTALLED/bin/aws)
    /path/to/aws-cli/dist/tt/INSTALLED/v2/2.0.41/dist/aws: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=28ba79c778f7402713aec6af319ee0fbaf3a8014, stripped

When I run the ELF version of the aws command, I am not able to suspend the process.

Here's our ELF 'aws help' command before hitting CTRL-Z:

    $ ps fww -t /dev/pts/76
      PID TTY      STAT   TIME COMMAND
    27603 pts/76   Ss     0:00 bash
     6539 pts/76   S+     0:00  \_ ./INSTALLED/bin/aws help
     6540 pts/76   S+     0:00      \_ ./INSTALLED/bin/aws help
     6550 pts/76   S+     0:00          \_ less

Here's strace(1) waiting for me to hit CTRL-Z:

    $ strace -p 6539
    strace: Process 6539 attached
    wait4(6540, 

[CTRL-Z pressed here]

strace shows our SIGTSTP arriving:

    $ strace -p 6539
    strace: Process 6539 attached
    wait4(6540, 0x7ffe5f8dc3fc, 0, NULL)    = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
    --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_STOPPED, si_pid=6540, si_uid=1000, si_status=SIGTSTP, si_utime=28, si_stime=2} ---
    --- SIGTSTP {si_signo=SIGTSTP, si_code=SI_KERNEL} ---
    kill(6540, SIGTSTP)                     = 0
    rt_sigreturn({mask=[]})                 = 61
    wait4(6540, 

And ps(1) shows us its state afterwards, in the "half suspended" state:

    $ ps fww -t /dev/pts/76
      PID TTY      STAT   TIME COMMAND
    27603 pts/76   Ss     0:00 bash
     6539 pts/76   S+     0:00  \_ ./INSTALLED/bin/aws help
     6540 pts/76   T+     0:00      \_ ./INSTALLED/bin/aws help
     6550 pts/76   T+     0:00          \_ less
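
Reading the strace output above, the parent appears to catch SIGTSTP, forward
it to the child (the kill(6540, SIGTSTP) call), and then go straight back into
wait4() without ever stopping itself. Below is a standalone Python sketch that
reproduces the same "half suspended" shape (illustrative only; the real
PyInstaller bootloader is C code, and "less /etc/hosts" is a stand-in child):

    #!/usr/bin/env python3
    # Standalone reproduction of the "half suspended" shape seen in the strace
    # output (illustrative only; the real PyInstaller bootloader is C): the
    # parent catches SIGTSTP and forwards it to the child, but never stops
    # itself, so the shell never sees the foreground job stop.
    import signal
    import subprocess
    import sys

    child = subprocess.Popen(["less", "/etc/hosts"])   # stand-in child pager

    def broken_sigtstp(signum, frame):
        child.send_signal(signal.SIGTSTP)   # the child stops (T+ in ps)...
        # ...but the parent never stops itself, so it stays S+ and the
        # terminal appears hung, just like the ps output above.

    signal.signal(signal.SIGTSTP, broken_sigtstp)
    sys.exit(child.wait())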

salewski (Author) commented:

Looks like this issue might be due to pyinstaller issue #4057, fixed in PR #4244 (merged on 2020-05-02), and included in the project's 4.0 release on 2020-08-08.

salewski (Author) commented Aug 20, 2020

I've confirmed that the following trivial patch to use pyinstaller 4.0 (instead of 3.5) produces an aws ELF executable that works with SIGTSTP, and it seems to work generally, though I've not tested it extensively:

    $ git diff requirements-build.txt
    diff --git a/requirements-build.txt b/requirements-build.txt
    index 5a2d6c82d..7020bb0fd 100644
    --- a/requirements-build.txt
    +++ b/requirements-build.txt
    @@ -2,4 +2,4 @@
     # We create the separation for cases where we're doing installation
     # from a local dependency directory instead of requirements.txt.
     cryptography==2.8
    -PyInstaller==3.5
    +PyInstaller==4.0

I'm not submitting a pull request for that change, though, because the resulting ELF binary can no longer be launched via a symlink:

    $ file /symlink/path/to/bin/aws
    /symlink/path/to/bin/aws: symbolic link to /another/symlink/path/to/install/dir/v2/current/bin/aws

    $ /symlink/path/to/bin/aws --version
    [21164] Error loading Python lib '/symlink/path/to/bin/libpython3.7m.so.1.0': dlopen: /symlink/path/to/bin/libpython3.7m.so.1.0: cannot open shared object file: No such file or directory

It works fine if you use the real path (either relative or absolute):

    $ file $(readlink -f /symlink/path/to/bin/aws)
    /some/real/path/to/install/dir/v2/2.0.41/dist/aws: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=4485007e1df54173663f148d5840c963a7e868ee, stripped

    $ ./real/path/to/install/dir/v2/2.0.41/dist/aws --version
    aws-cli/2.0.41 Python/3.7.3 Linux/4.19.0-9-amd64 exe/x86_64.debian.10

That issue looks like pyinstaller issue #4674, which was opened in February and has had some active discussion within the last two weeks. It looks like the easiest path forward for the current issue might be to wait for a pyinstaller 4.0+N release.

A workaround for now is to avoid the pyinstaller-based ELF aws binary and just run the aws python script directly.

salewski changed the title from "awscli2 ignores SIGTSTP, breaks shell job control" to "awscli2 ignores SIGTSTP, breaks shell job control (only from pyinstaller)" Aug 20, 2020
kdaily added the enhancement label and removed the guidance label Sep 1, 2020
salewski (Author) commented:

Just want to note that there has been movement on that pyinstaller issue #4674 mentioned above. It has an open PR that has seen some activity within the last two weeks.

kdaily added the feature-request (A feature should be added or improved.) and v2 labels Feb 2, 2021

kdaily (Member) commented Mar 29, 2021

Hi @salewski,

We bumped PyInstaller to 4.2. Can you confirm if this is still an issue with a new version of the V2 CLI?

#5958

Thanks!

kdaily added the response-requested label Mar 29, 2021
salewski (Author) commented:

> Hi @salewski,
>
> We bumped PyInstaller to 4.2. Can you confirm if this is still an issue with a new version of the V2 CLI?
>
> #5958
>
> Thanks!

Hi @kdaily, Thanks for the heads-up; I'll take a look.
More soon,
-Al

github-actions bot removed the response-requested label Mar 30, 2021
salewski (Author) commented:

> We bumped PyInstaller to 4.2. Can you confirm if this is still an issue with a new version of the V2 CLI?

Hi @kdaily, I no longer see the issue. I tested both the nightly built artifact from https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip:

    $ /path/to/awscli2-2021-03-29_205242/bin/aws --version
    aws-cli/2.1.32 Python/3.8.8 Linux/4.19.0-9-amd64 exe/x86_64.debian.10 prompt/off

and a version I built from source:

    $ ./dist/tttt/INSTALLED/v2/current/dist/aws --version
    aws-cli/2.1.32 Python/3.9.1+ Linux/4.19.0-9-amd64 exe/x86_64.debian.10 prompt/off

In both cases, suspending (CTRL-Z) and resuming the ELF binary (pyinstaller) version of the awscli2 program now "Just Works" for me. I also used ps to peek at the process states as described above, and everything looks good now.

Thank you to the @pyinstaller folks for the upstream fix, and to the @aws aws-cli team for tracking the fix and incorporating it.

kdaily closed this as completed Jul 14, 2021
github-actions (bot) commented:

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
