Unable to Download ePSF Files: 403 Forbidden Error in psf_retriever Function #213

Closed
haticekaratay opened this issue Oct 28, 2024 · 6 comments

Comments

@haticekaratay

Description

I'm encountering a 403 Forbidden error when trying to download focus-diverse ePSF files using the psf_retriever function from the acstools.focus_diverse_epsfs module. This error occurs in both local environments and containerized environments (e.g., GitHub Codespaces).

The error arises while running a Jupyter notebook that demonstrates the workflow for downloading and examining a single focus-diverse ePSF file. You can view the notebook here: https://github.com/spacetelescope/hst_notebooks/blob/main/notebooks/ACS/acs_focus_diverse_epsfs/acs_focus_diverse_epsfs.ipynb

Expected behavior

The psf_retriever function should successfully download the specified ePSF FITS file when provided with a valid observation rootname as demonstrated in the notebook.

Actual behavior

The function returns a 403 Forbidden error from the AWS API Gateway, indicating that access to the resource is denied.

Steps to Reproduce

  1. Open the notebook and set up the download location in the current working directory:
download_location = os.path.join(os.getcwd(), 'downloads')
os.makedirs(download_location, exist_ok=True)
  2. Attempt to retrieve the file by running:
download_location = os.path.join(os.getcwd(), 'downloads')
os.makedirs(download_location, exist_ok=True)
  3. Observe the 403 Forbidden error response.
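
The setup and retrieval steps above can be sketched roughly as follows. This is an illustration, not the notebook's exact code: prepare_download_dir and retrieve_epsf are hypothetical helpers, and the two-argument call psf_retriever(rootname, download_location) is an assumption that should be checked against the acstools documentation.

```python
import os


def prepare_download_dir(base_dir=None):
    """Create (if needed) and return a 'downloads' directory under base_dir."""
    base_dir = base_dir or os.getcwd()
    download_location = os.path.join(base_dir, "downloads")
    os.makedirs(download_location, exist_ok=True)
    return download_location


def retrieve_epsf(rootname, download_location, retriever=None):
    """Call a retriever for a single observation rootname.

    If no retriever is passed, psf_retriever is imported from acstools.
    The (rootname, download_location) call shape is an assumption.
    """
    if retriever is None:
        # Imported lazily so this helper can be exercised without acstools.
        from acstools.focus_diverse_epsfs import psf_retriever as retriever
    return retriever(rootname, download_location)
```

In a notebook this would be used as `retrieve_epsf(rootname, prepare_download_dir())`, where rootname is a valid observation rootname. Passing a stand-in function as retriever makes it possible to exercise the surrounding logic in an environment where the real service returns 403.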

System Details

Environment: Tested in local Python environment and GitHub Codespaces.
Python Version: 3.11 and 3.12
Operating System: macOS locally, Ubuntu in Codespaces

I attempted to access the API directly using curl commands with and without authentication. Below are the outputs from these attempts:

$ curl -I https://8cclxcxse4.execute-api.us-east-1.amazonaws.com/main/psf-server-ops/
HTTP/2 403
content-type: application/json
content-length: 0
date: Mon, 28 Oct 2024 19:24:02 GMT
x-amz-apigw-id: AYCw-FVWIAMEhHQ=
x-amzn-requestid: dc6c6afd-0059-4861-8db2-0f81c41a5dfc
x-amzn-errortype: ForbiddenException
x-cache: Error from cloudfront
via: 1.1 6d5b0fa46ef77b2ff227bdbcee6603ee.cloudfront.net (CloudFront)
x-amz-cf-pop: IAD55-P4
x-amz-cf-id: eSPdYfE1MQb0DvGIIp8ZfsEieuqly7q9PIWHINoLwSZkDp4mAin5OA
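
The same check can be made from Python, which is convenient inside the notebook or a CI job. The sketch below uses only the standard library. head_request and summarize are hypothetical helper names, and the header names are the ones visible in the curl output above.

```python
import urllib.error
import urllib.request


def head_request(url, timeout=30):
    """Send a HEAD request and return (status_code, headers_dict)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry status and headers.
        return err.code, dict(err.headers.items())


def summarize(status, headers):
    """Pick out the fields that are useful when reporting a failed request."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "status": status,
        "error_type": lowered.get("x-amzn-errortype"),
        "request_id": lowered.get("x-amzn-requestid"),
        "via_cloudfront": "cloudfront" in lowered.get("via", "").lower(),
    }
```

Calling `summarize(*head_request(url))` in both the local and the CI environment gives two small dictionaries that can be compared directly. The request ID is the value a service maintainer is most likely to ask for.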

@gsanand
Contributor

gsanand commented Oct 28, 2024

@haticekaratay can you clarify one detail? You said that you are getting errors locally as well. Do you mean locally through the notebook, or locally through curl commands (the latter of which is not an intended use case)?

pllim added the PSF label Oct 28, 2024
@haticekaratay
Author

The notebook runs fine locally. Thank you for clarifying that curl isn't an intended way to access this API.
The primary issue is that psf_retriever works as expected when I run the notebook locally, and the files download successfully. However, when I run the same notebook in a container or in CI (both running on Ubuntu), psf_retriever fails with a 403 Forbidden error.
To diagnose the discrepancy between the local and containerized/CI environments, I tested the API with curl to check whether the issue might be related to differences in network permissions, session handling, or other environmental factors. Please correct me if I am wrong, but I suspect additional configuration might be needed when running in these environments.
Should the notebook workflow be adjusted to account for differences in runtime environments? If there are any known configuration requirements specific to these environments, guidance on how to proceed would be greatly appreciated.
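
One way to make the local versus container/CI difference explicit is to record the runtime next to any failure. The sketch below is only an illustration: describe_runtime is a hypothetical helper, and it assumes the environment variables that GitHub documents for Actions (GITHUB_ACTIONS, CI) and for Codespaces (CODESPACES).

```python
import os
import platform


def describe_runtime(environ=None):
    """Return a small description of where the notebook is running."""
    environ = os.environ if environ is None else environ
    if environ.get("GITHUB_ACTIONS") == "true":
        runtime = "github-actions"
    elif environ.get("CODESPACES") == "true":
        runtime = "codespaces"
    elif environ.get("CI"):
        # Many CI providers set CI; this branch covers the non-GitHub ones.
        runtime = "other-ci"
    else:
        runtime = "local"
    return {
        "runtime": runtime,
        "python": platform.python_version(),
        "os": platform.system(),
    }
```

Printing `describe_runtime()` at the top of the notebook makes it easy to see, from the executed output alone, which environment produced a 403.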

@pllim
Collaborator

pllim commented Oct 29, 2024

@haticekaratay, what service does this CI run on? AWS probably has to whitelist the IP addresses.

For example: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/about-githubs-ip-addresses

It is not uncommon for some servers to blacklist CI services.
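
To check whether a given runner address falls inside a set of published ranges, the standard-library ipaddress module is enough. The sketch below is generic: ip_in_ranges is a hypothetical helper and the ranges in the usage note are documentation-only example networks. As far as I know, GitHub also publishes its ranges as JSON through its meta API endpoint, with the Actions ranges under an "actions" key, but verify that against the page linked above.

```python
import ipaddress


def ip_in_ranges(ip, cidr_ranges):
    """Return True if ip belongs to any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    for cidr in cidr_ranges:
        network = ipaddress.ip_network(cidr, strict=False)
        # Compare only like with like (IPv4 with IPv4, IPv6 with IPv6).
        if address.version == network.version and address in network:
            return True
    return False
```

For example, `ip_in_ranges("192.0.2.17", ["192.0.2.0/24", "2001:db8::/32"])` tests one address against a mixed IPv4/IPv6 list. An allow list on the server side would need to cover every range the CI provider can send traffic from.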

@haticekaratay
Author

Thank you for following up. The CI runs on GitHub Actions, which generally communicates over port 443, the standard HTTPS port. Given that the API queried by psf_retriever is hosted on AWS API Gateway, any IP restrictions or access controls would need to be configured on the AWS side to permit access from the GitHub Actions IP ranges.
From what I understand, API Gateway may be blocking requests from GitHub Actions due to network restrictions or, as you said, blacklisting of CI provider IPs. This is the repository the notebook is running on

@pllim
Collaborator

pllim commented Oct 29, 2024

Hmm. There is really nothing we can do in the repository here. I think this is better handled as a private help call to HST Help Desk as you might need to exchange sensitive info with the ACS Team.

https://hsthelp.stsci.edu

@haticekaratay
Author

This issue has now been resolved with the help of the HST help desk support.
