Description
Any thoughts on adding a way (see the options below) to skip the base64 output of the scanned file in the JSON format? I recognize that having it in there is part of SBuD, and I can definitely see the benefit: a more-or-less "self-contained" format that carries the file data is convenient for, say, later security/virus/malware analysis. But it also makes the JSON output absolutely gigantic, and the size scales with the size of the scanned input file.
Options could be:
- Add a new output format (like "json-nob64") that doesn't include it
- Add a command line switch to skip it (--no-contents or something like that?)
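To make the two options concrete, here is a minimal sketch using argparse. The parser, the "json-nob64" format name, the --no-contents flag, and the wants_contents helper are all hypothetical names taken from or invented for this proposal; none of this is polyfile's actual CLI code.

```python
import argparse

# Hypothetical parser for illustration only; the real polyfile CLI differs.
parser = argparse.ArgumentParser(prog="polyfile-sketch")
parser.add_argument(
    "--format",
    choices=["json", "json-nob64"],  # "json-nob64" is the proposed new format
    default="json",
)
parser.add_argument(
    "--no-contents",  # the proposed switch
    action="store_true",
    help="omit the base64-encoded file contents from the JSON output",
)


def wants_contents(argv):
    """Return True when the base64 contents should be included in the output."""
    args = parser.parse_args(argv)
    return not args.no_contents and args.format != "json-nob64"
```

Either route leaves the default invocation untouched: with no extra arguments, wants_contents([]) is still True.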
A second question then follows: what should happen to the b64contents key itself?
- Change the schema of the JSON output and remove the b64contents key entirely (this is probably a bad idea...)
- Just set the b64contents key to an empty string (or even None)
- Set the b64contents key to some string actually encoded in base64 ... say base64("null")...
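For comparison, the three choices above can be sketched as a small post-processing helper. The strip_contents function and its mode names are hypothetical and exist only to show what each choice would produce; the only key assumed from the real output is b64contents.

```python
import base64
import json


def strip_contents(sbud: dict, mode: str) -> dict:
    """Return a copy of an SBuD-style dict with b64contents handled per mode."""
    out = dict(sbud)  # shallow copy so the caller's dict is not modified
    if mode == "remove":
        # Schema change: the key disappears entirely.
        out.pop("b64contents", None)
    elif mode == "empty":
        out["b64contents"] = ""
    elif mode == "none":
        # Serializes to JSON null.
        out["b64contents"] = None
    elif mode == "placeholder":
        # Still valid base64, so consumers that decode blindly keep working.
        out["b64contents"] = base64.b64encode(b"null").decode("ascii")
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out


example = {"b64contents": base64.b64encode(b"file data").decode("ascii")}
print(json.dumps(strip_contents(example, "placeholder")))
```

Only the "remove" mode changes the set of keys; the other three keep the schema intact and differ in whether the value can still be passed to a base64 decoder.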
Ultimately, the idea is to avoid introducing a breaking change into the default behavior; either a new output format or a --no-contents flag preserves existing functionality. As for the key itself, it is debatable which is better: removing the b64contents key, replacing its value with None/null, or setting it to a short base64-encoded string such as "null".
In my experience, at least in the Python world, developers often don't check whether a key exists in a dict, and often don't use dict.get(), which handles a missing key gracefully (unlike mydict['noKey'], which raises KeyError). I suppose the concern is somewhat moot, since the default behavior won't change.
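The difference between the two access styles is easy to demonstrate. In this sketch the record dict and the two helper functions are made up for illustration; they stand in for downstream code that consumes the JSON output.

```python
def read_contents_strict(record: dict) -> str:
    # Subscript access: raises KeyError if the key was removed from the schema.
    return record["b64contents"]


def read_contents_safe(record: dict) -> str:
    # dict.get() returns the supplied default instead of raising.
    return record.get("b64contents", "")


record = {"fileName": "example.bin"}  # hypothetical output with the key removed

print(repr(read_contents_safe(record)))
try:
    read_contents_strict(record)
except KeyError as err:
    print("KeyError:", err)
```

Consumers written in the strict style are the ones that would break if the key were dropped, which is the main argument for keeping the key and changing only its value.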
With either option, it seems prudent to add an optional parameter to the polyfile.Analyzer.sbud method (see below) to skip the encoding of the data to base64 - there doesn't appear to be a reason to waste CPU cycles (and memory) to convert the data to base64 if it will be stripped from the output.
Line 372 in 438628f:
    def sbud(self, matches: Optional[Iterable[Match]] = None) -> Dict[str, Any]:
Line 383 in 438628f:
    b64contents = base64.b64encode(data)
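A rough sketch of what such an optional parameter might look like follows. This is a standalone stand-in, not the actual polyfile.Analyzer.sbud method: the function name sbud_sketch, the include_contents parameter, and the reduced set of output keys are all assumptions made for illustration.

```python
import base64
from typing import Any, Dict


def sbud_sketch(data: bytes, include_contents: bool = True) -> Dict[str, Any]:
    """Hypothetical stand-in for the relevant part of Analyzer.sbud."""
    if include_contents:
        # Default path: same encoding step as today.
        b64contents = base64.b64encode(data).decode("utf-8")
    else:
        # Skip the encoding entirely, so no CPU or memory is spent on it.
        b64contents = ""
    # The real output contains more keys; only two are shown here.
    return {"b64contents": b64contents, "length": len(data)}


print(sbud_sketch(b"hello", include_contents=False))
```

Because include_contents defaults to True, existing callers would see no change; only callers that opt out skip the b64encode call.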