Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Decompression Bombs #13557

Merged
merged 26 commits into from
Feb 15, 2024
Merged

Python: Decompression Bombs #13557

merged 26 commits into from
Feb 15, 2024

Conversation

am0o0
Copy link
Contributor

@am0o0 am0o0 commented Jun 24, 2023

This is part of All for one, one for all query submission, I'm going to submit an issue in github/securitylab for this pull request too.
I've also added FileAndFormRemoteFlowSource module which is related to all queries that need a file as their user provided remote flow sources. in this module I added support of file and form of FastAPI framework.
I tried my best to add as much sanitizers as possible. if you have another idea about sanitizers I can write more sanitizers too.

@smowton
Copy link
Contributor

smowton commented Jun 24, 2023

Anyone not watching the repo in general, note this is part of a family of submissions:

#13553
#13554
#13555
#13556
#13557
#13558

@am0o0
Copy link
Contributor Author

am0o0 commented Jun 25, 2023

Anyone not watching the repo in general, note this is part of a family of submissions:

#13553 #13554 #13555 #13556 #13557 #13558

#13560 is added too

@github-actions
Copy link
Contributor

github-actions bot commented Jun 25, 2023

QHelp previews:

python/ql/src/experimental/Security/CWE-409/DecompressionBombs.qhelp

Decompression Bomb

Extracting Compressed files with any compression algorithm like gzip can cause to denial of service attacks.

Attackers can compress a huge file which created by repeated similiar byte and convert it to a small compressed file.

Recommendation

When you want to decompress a user-provided compressed file you must be careful about the decompression ratio or read these files within a loop byte by byte to be able to manage the decompressed size in each cycle of the loop.

Example

python ZipFile library is vulnerable by default

import zipfile


def Bad(zip_path):
    zipfile.ZipFile(zip_path, "r").extractall()

By checking the decompressed size from input zipped file you can check the decompression ratio. attackers can forge this decompressed size header too. So can't rely on file_size attribute of ZipInfo class. this is recommended to use "ZipFile.open" method to be able to manage decompressed size.

Reading decompressed file byte by byte and verifying the total current size in each loop cycle in recommended to use in any decompression library.

import zipfile


def safeUnzip(zipFileName):
    '''
    safeUnzip reads each file inside the zipfile 1 MB by 1 MB
    and during extraction or reading of these files it checks the total decompressed size
    doesn't exceed the SIZE_THRESHOLD
    '''
    buffer_size = 1024 * 1024 * 1  # 1 MB
    total_size = 0
    SIZE_THRESHOLD = 1024 * 1024 * 10  # 10 MB
    with zipfile.ZipFile(zipFileName) as myzip:
        for fileinfo in myzip.infolist():
            with myzip.open(fileinfo.filename, mode="r") as myfile:
                content = b''
                chunk = myfile.read(buffer_size)
                total_size += buffer_size
                if total_size > SIZE_THRESHOLD:
                    print("Bomb detected")
                    return False  # it isn't a successful extract or read
                content += chunk
                # reading next bytes of uncompressed data
                while chunk:
                    chunk = myfile.read(buffer_size)
                    total_size += buffer_size
                    if total_size > SIZE_THRESHOLD:
                        print("Bomb detected")
                        return False  # it isn't a successful extract or read
                    content += chunk

                # An example of extracting or reading each decompressed file here
                print(bytes.decode(content, 'utf-8'))
    return True  # it is a successful extract or read

References

@am0o0
Copy link
Contributor Author

am0o0 commented Jun 26, 2023

Hi, I've completed the work on this query and I don't have any further updates/commits here.

@ghsecuritylab ghsecuritylab marked this pull request as draft July 31, 2023 16:02
@ghsecuritylab ghsecuritylab marked this pull request as ready for review October 17, 2023 18:22
@am0o0
Copy link
Contributor Author

am0o0 commented Nov 22, 2023

Hi @RasmusWL
I'm sorry if I shouldn't mention you here!
Is there any chance for you and your colleagues to start reviewing this PR in the next two weeks? because after these two weeks, it is not specific for me whether I have enough time to focus on this PR or not.

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things, besides the inline comments:

  1. Have you applied the feedback from Ruby/Go to this PR as well?
  2. My understanding of the safeUnzip function is that it will return early if a file is larger than SIZE_THRESHOLD, which is set to 100 kb. Overall that doesn't sound super useful in practice, since I imagine many zip files to contain images (which are usually in the 1-10mb filesize). I also noted that it doesn't actually extract anything, but only checks if a zip-file is safe, it should probably also be called is_zipfile_safe and return true/false.
  3. If the open call has a mode argument that is not known, and later an .extractall() call is made, I would have assumed we can deduct this is a problem... but your code currently doesn't seem to allow this. Can you explain why you added this restriction?
  4. When you have looked through results for this query, what is your own impression on number of FPs?

NIT: After being promoted we do care to keep the .ql file and .qll file in separate directories, but for Python we have adopted a simpler approach for experimental queries, so it's just fine to place python/ql/src/experimental/semmle/python/security/DecompressionBomb.qll right next to the .ql file 😊

NIT: It would be nice if you could adopt DataflowQueryTest.ql, like we do here for path-problem tests in the future, since it makes reading the expected good/bad results much easier.

@am0o0
Copy link
Contributor Author

am0o0 commented Dec 7, 2023

A few things, besides the inline comments:

0. Have you applied the feedback from Ruby/Go to this PR as well?

1. My understanding of the `safeUnzip` function is that it will return early if a file is larger than `SIZE_THRESHOLD`, which is set to 100 kb. Overall that doesn't sound super useful in practice, since I imagine many zip files to contain images (which are usually in the 1-10mb filesize). I also noted that it doesn't actually extract anything, but only checks if a zip-file is safe, it should probably also be called `is_zipfile_safe` and return true/false.

2. If the `open` call has a `mode` argument that is not known, and later an `.extractall()` call is made, I would have assumed we can deduct this is a problem... but your code currently doesn't seem to allow this. Can you explain why you added this restriction?

3. When you have looked through results for this query, what is your own impression on number of FPs?

NIT: After being promoted we do care to keep the .ql file and .qll file in separate directories, but for Python we have adopted a simpler approach for experimental queries, so it's just fine to place python/ql/src/experimental/semmle/python/security/DecompressionBomb.qll right next to the .ql file 😊

NIT: It would be nice if you could adopt DataflowQueryTest.ql, like we do here for path-problem tests in the future, since it makes reading the expected good/bad results much easier.

  1. I'll apply the feedback of other submissions too.
  2. this is my mistake, I've updated the example, the new example has comments for each part and has a reasonable SIZE_THRESHOLD value.
  3. Yes you are right, checking for mode can be deducted.
  4. The number of FPs is low because most of the results from my GitHub MRVA scan are vulnerable.

I'll add inline tests, but I think I can't import decompression bomb sinks into inline tests I put the DecompressionBomb.qll file into the current DecompressionBomb.ql directory?

@am0o0
Copy link
Contributor Author

am0o0 commented Dec 7, 2023

I tried to add the inline test but I failed :

Could not evaluate queries in /home/am/CodeQL-home/codeql-repo-amammad/python/ql/test/experimental/query-tests/Security/CWE-409: The CodeQL database is not compatible with a QL library referenced by the query you are trying to run.
        CodeQL database location: CWE-409.testproj
        Query location: DataflowQueryTest.ql
The database may be too new for the QL libraries the query is using; try upgrading them.
Alternatively, running 'codeql database upgrade CWE-409.testproj' with an appropriate --search-path option might help.
The CodeQL database is not compatible with a QL library referenced by the query you are trying to run.
        CodeQL database location: CWE-409.testproj
        Query location: DataflowQueryTest.ql
The database may be too new for the QL libraries the query is using; try upgrading them.
Alternatively, running 'codeql database upgrade CWE-409.testproj' with an appropriate --search-path option might help.

Since the helper predicate had nice qldocs
The old title/id matches how we used to write queries, but I think just
using the normal conversational name is easier for everyone :)
All those newlines are not good for inline expectations
You're only allowed to have `result=OK` if there is a sink on that line...
@RasmusWL
Copy link
Member

Thanks for making these updates. I've taken the liberty to fix up a few things in this PR, instead of merging it as-is and then filing a PR to do it afterwards.

Overall I think some of the additional taint-steps looks a little too broad, and it would have been more clear if they were split up a bit more (example). So something to keep in mind for future PRs 😊

RasmusWL
RasmusWL previously approved these changes Feb 14, 2024
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taking my comments above into account, I think it's good enough to merge as-is (at least for experimental)

@am0o0
Copy link
Contributor Author

am0o0 commented Feb 14, 2024

@RasmusWL you are the best, thanks a lot.

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready to merge to experimental (see other comments above)

@RasmusWL RasmusWL merged commit e4c3037 into github:main Feb 15, 2024
13 checks passed
@am0o0 am0o0 deleted the amammad-python-bombs branch September 14, 2024 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants