Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: return ffmpeg error logs to caller, and fix StreamingStatus #254

Merged

Conversation

ajcasagrande
Copy link
Contributor

@ajcasagrande ajcasagrande commented May 19, 2023

FFmpeg transcoder logic has been implemented within the driver code now, so that way we have more control over it. This allows us to keep track of the process output and log it to the console as well as return it to the caller of StartStreaming. Because we have more precise control over the process, we can better sync the StreamingStatus to whether or not the ffmpeg process is running.

The handling of publishing StreamingStatus to the message bus has been optimized to send better based on when it actually changes, and not send when the user calls StopStreaming when already stopped, or StartStreaming when already started.

Logic has also been added to redact the rtsp credentials from the log file and API responses to avoid exposing secrets to the caller.

If your build fails due to your commit message not passing the build checks, please review the guidelines here: https://github.com/edgexfoundry/edgex-go/blob/main/.github/Contributing.md

PR Checklist

Please check if your PR fulfills the following requirements:

  • I am not introducing a breaking change (if you are, flag in conventional commit message with BREAKING CHANGE: describing the break)
  • I am not introducing a new dependency (add notes below if you are)
  • I have added unit tests for the new feature or bug fix (if not, why?)
  • I have fully tested (add details below) this the new feature or bug fix (if not, why?)
  • I have opened a PR for the related docs change (if not, why?) N/A

Testing Instructions

  • checkout and build this PR
  • run edgex in secure or non secure mode using the image built from this pr
    • update add-device-usb-camera.yaml
    • run make run ds-usb-camera
  • wait for your camera to be auto-detected, and get the device name
  • execute StartStreaming, and expect a 500 error with a log message that contains messages related to 401 authentication error
  • execute StreamingStatus and verify isstreaming=false and error is the same (or a subset) of the api call result
  • post your rtspauth credentials to /api/v3/secrets using the guide from edgex-docs (see step 4 here)
  • execute StartStreaming with valid inputs and expect it to return a 200 OK, taking about 1 second as before
  • view the stream using mplayer or ffplay (see here)
  • execute StreamingStatus and see that there is no error, and isstreaming=true
  • execute StartStreaming and expect an error already running
  • execute StreamingStatus and see that there is no error, and isstreaming=true
  • execute StopStreaming and expect 200 OK
  • execute StreamingStatus and see that there is no error, and isstreaming=false
  • execute StopStreaming and expect 200 OK
  • execute StreamingStatus and see that there is no error, and isstreaming=false
  • execute StartStreaming with invalid inputs and expect it to return a 500 error, with details about the error such as mpeg456 not being valid
  • execute StreamingStatus and verify isstreaming=false and error is the same (or a subset) of the api call result
  • execute StopStreaming and expect 200 OK
  • execute StreamingStatus and nothing has changed

additional things to validate:

  • start streaming and then stop the device service while still streaming. make sure the camera is no longer active. start the service back up and ensure it starts fine and u can start streaming the same camera again no problems.
  • try and start streaming invalid input, and then make sure stream with valid afterwards still works
  • try and start streaming with invalid input and then stop the device service. start the service back up and ensure it starts fine and u can start streaming the same camera again no problems.

New Dependency Instructions (If applicable)

Fixes #253
Implements a proper fix for #23

FFmpeg transcoder logic has been implemented within the driver code now,
so that way we have more control over it. This allows us to keep track
of the process output and log it to the console as well as return it to
the caller of StartStreaming. Because we have more precise control over
the process, we can better sync the StreamingStatus to whether or not
the ffmpeg process is running.

The handling of publishing StreamingStatus to the message bus has been
optimized to send better based on when it actually changes, and not send
when the user calls StopStreaming when already stopped, or
StartStreaming when already started.

Logic has also been added to redact the rtsp credentials from the log
file and API responses to avoid exposing secrets to the caller.

Signed-off-by: Anthony Casagrande <anthony.j.casagrande@intel.com>
@ChristianDarr-personal
Copy link
Contributor

To start and view the stream, see here

For how to set credentials - see step 4 here

Signed-off-by: Anthony Casagrande <anthony.j.casagrande@intel.com>
@ajcasagrande
Copy link
Contributor Author

ajcasagrande commented May 20, 2023

It looks like gosec is flagging a G204: Subprocess launched with a potential tainted input or cmd arguments (gosec).

This should be evaluated, however do understand that the G204 already exists in the code base and is just not being detected due to the fact it resides in a 3rd-party library.

Update:
After researching, the way it is utilized is safe. The reason G204 was being flagged was because exec.Command was being passed a function, and not a variable. Since the function call is just a getter for a variable, I moved it out into a temp var and that removed the warning.

More info:

https://github.com/securego/gosec/blob/87cc45e1cd903e2038e868eaf026cbc5d1dd1a26/rules/subproc.go#L34-L42

https://github.com/securego/gosec/blob/87cc45e1cd903e2038e868eaf026cbc5d1dd1a26/rules/subproc.go#L91C89-L92

Signed-off-by: Anthony Casagrande <anthony.j.casagrande@intel.com>
@codecov-commenter
Copy link

Codecov Report

Merging #254 (a7aa7f0) into main (faedb97) will decrease coverage by 0.25%.
The diff coverage is 0.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@           Coverage Diff            @@
##            main    #254      +/-   ##
========================================
- Coverage   2.70%   2.45%   -0.25%     
========================================
  Files          7       8       +1     
  Lines        777     856      +79     
========================================
  Hits          21      21              
- Misses       756     835      +79     
Impacted Files Coverage Δ
internal/driver/device.go 11.42% <0.00%> (-0.34%) ⬇️
internal/driver/driver.go 0.00% <0.00%> (ø)
internal/driver/helper.go 0.00% <0.00%> (ø)
internal/driver/transcoder.go 0.00% <0.00%> (ø)

Comment on lines 124 to 125
dev.lc.Debugf("FFmpeg process with pid %d for device %s exited with code %d. User time: %v, System time: %v",
proc.Process.Pid, dev.name, proc.ProcessState.ExitCode(), proc.ProcessState.UserTime(), proc.ProcessState.UserTime())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dev.lc.Debugf("FFmpeg process with pid %d for device %s exited with code %d. User time: %v, System time: %v",
proc.Process.Pid, dev.name, proc.ProcessState.ExitCode(), proc.ProcessState.UserTime(), proc.ProcessState.UserTime())
dev.lc.Debugf("FFmpeg process with pid %d for device %s exited with code %d. User time: %v, System time: %v",
proc.Process.Pid, dev.name, proc.ProcessState.ExitCode(), proc.ProcessState.UserTime(), proc.ProcessState.SystemTime())

@vyshali-chitikeshi
Copy link
Contributor

I validated this PR thoroughly both in secure and non-secure modes.
Tested what was recommend in test instructions of this PR and I added some of my own tests.
Functionality wise it works correctly, no issues seen. Tried several time start steaming with valid and invalid inputs, stop streaming, restarting/stopping/starting usb service while streaming is going on, etc checked streaming status for every test to ensure it reflects correct streaming status.
But it gives status code 200 in below scenarios:
Start streaming with valid inputs without rtsp creds uploaded
Start streaming with invalid inputs
Based on PR testing instructions, its supposed to give status code 500 for above 2 cases.
However though it gives status code 200 in above 2 cases, streaming will not start and gives appropriate error in streaming status api error section. only problem is api output status code. Find attached logs and screenshots.

@vyshali-chitikeshi
Copy link
Contributor


// wait a little bit before returning to see if there are any errors on startup
select {
case <-time.After(time.Second):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes, one second is not enough for FFmpeg to complete the execution of the start streaming command. This could be the root cause of the issue @vyshali-chitikeshi encountered.
I tried extending it to 3 seconds, and then I was able to receive the expected 500 status code when the streaming failed to start.

However, defining how long to wait seems challenging as it may depend on machine performance. We also need to consider the HTTP request timeout, which is 5 seconds by default. https://github.com/edgexfoundry/edgex-go/blob/3d7959681c73ad1dd76784b2336248bd6b6384c3/cmd/core-common-config-bootstrapper/res/configuration.yaml#L32

Copy link
Contributor Author

@ajcasagrande ajcasagrande May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, originally I had made this to be 2 seconds (instead of the current 1 second) when I started this work, but I dropped it back down to 1 as I noticed my typical error response was like 400-600ms. However, I think that may be based on what the actual error was (eg. bad argument, rtsp server offline, etc), and like you said, system performance. In theory we could make this configurable as well.

I had an idea that we could use something in the output stream to determine if the streaming has started ok, but was not sure what we could parse as a potential indicator of that. I know enabling progress logs would allow that, but doesn't seem worth it for just grabbing beginning of stream confirmation.

The last idea is that we return right away, and expect the user to monitor StreamingStatus endpoint for any errors, since as @vyshali-chitikeshi has validated, that always contains the error when the api returned success too early.

Copy link
Contributor

@ChristianDarr-personal ChristianDarr-personal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Anthony Casagrande <anthony.j.casagrande@intel.com>
@ajcasagrande ajcasagrande requested a review from FelixTing May 24, 2023 23:24
@ajcasagrande
Copy link
Contributor Author

ajcasagrande commented May 24, 2023

@vyshali-chitikeshi @FelixTing I have just pushed updated code to handle this situation. It now waits up to 4 seconds before returning in case the error takes longer on certain machines. I have also enabled ffmpeg progress reporting of 60 second intervals. This way it is not super chatty, but still allows us to have something to parse when the stream has successfully started.

In my experimentation, I have noticed ffmpeg will always print progress for the first frame, so no matter if the reporting interval is 1 minute, we still get notified right away. This seems to be the best of both worlds, as it allows more time for slower machines/commands, but doesn't drastically delay the response of successful streams.

Successful response in 669ms
Screenshot from 2023-05-24 16-32-02

Error response in 1097ms
Screenshot from 2023-05-24 16-32-39

@vyshali-chitikeshi
Copy link
Contributor

With the updated code, it works correctly. Its gives status code 500 with appropriate error message n below 2 scenarios as expected
Start streaming with valid inputs without uploading rtsp creds
Start streaming with invalid inputs
attached screenshots.
Screenshot from 2023-05-24 23-51-02
Screenshot from 2023-05-24 23-50-14

Also tested other api's, and some negative scenarios like starting streaming while streaming in-progress and restarting usb service while streaming is going on etc Everything looks good.

@vyshali-chitikeshi
Copy link
Contributor

Also from performance perspective, I don't see noticeable delay to get rest-api outputs.

Copy link
Member

@FelixTing FelixTing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@FelixTing FelixTing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ajcasagrande ajcasagrande merged commit e4cb32a into edgexfoundry:main May 25, 2023
@ajcasagrande ajcasagrande deleted the return-ffmpeg-error branch May 25, 2023 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StartStreaming fails with unknown error when rtsp auth credentials are not set
5 participants