[Feature Request/Improvement] Alternate JSON Output w/o b64contents #3399

@danieldjewell

Any thoughts on adding a way (see below) to skip the base64 output of the scanned file in the JSON format? I recognize that including it is part of SBuD, and I can definitely see the benefit: a more-or-less "self-contained" format that carries the file data is great for, say, later security/virus/malware analysis. But it also makes the JSON output absolutely gigantic, since its size scales with the size of the scanned input file.

Options could be:

  • Add a new output format (like "json-nob64") that doesn't include it
  • Add a command line switch to skip it (--no-contents or something like that?)
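A rough sketch of how the two options above could look from the command line. This is not polyfile's actual CLI: the program name, the `build_parser` and `include_contents` helpers, and the `--format` option are assumptions made for illustration, and only the `json-nob64` and `--no-contents` names come from the proposal itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real tool's arguments may differ.
    parser = argparse.ArgumentParser(prog="example-scanner")
    # Option 1: a separate output format that omits the file contents.
    parser.add_argument("--format", choices=["json", "json-nob64"], default="json")
    # Option 2: a switch that strips the contents from the normal JSON output.
    parser.add_argument("--no-contents", action="store_true")
    return parser


def include_contents(args: argparse.Namespace) -> bool:
    # Contents are kept only when neither opt-out was requested,
    # so the default invocation behaves exactly as before.
    return args.format != "json-nob64" and not args.no_contents
```

Either way, an invocation without the new format or flag keeps the current output unchanged.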

A second question is what to do with the b64contents key itself:

  • Change the schema of the JSON output and remove the b64contents key entirely (this is probably a bad idea...)
  • Just set the b64contents key to an empty string (or even None)
  • Set the b64contents key to some string actually encoded in base64 ... say base64("null")...

Ultimately, the idea is to avoid introducing a breaking change into the default behavior; either a new output format or a --no-contents flag preserves existing functionality. As for the key itself, I suppose it's arguable which is better or worse: removing the b64contents key, replacing its value with None/null, or setting it to a short base64-encoded string of "null".
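To make the trade-off concrete, here is a minimal sketch of the candidate treatments applied to a stand-in dict. The `strip_contents` helper, its `mode` names, and the `name` key are hypothetical and are not part of polyfile or the SBuD schema. Only the `b64contents` key comes from the existing output.

```python
import base64
import json


def strip_contents(doc: dict, mode: str) -> dict:
    """Return a shallow copy of doc with b64contents handled per mode."""
    out = dict(doc)
    if mode == "remove":
        # Schema change: the key disappears entirely.
        out.pop("b64contents", None)
    elif mode == "null":
        # Key stays, value serializes to JSON null.
        out["b64contents"] = None
    elif mode == "empty":
        # Key stays, value is an empty string.
        out["b64contents"] = ""
    elif mode == "b64null":
        # Key stays and still holds valid base64: base64("null").
        out["b64contents"] = base64.b64encode(b"null").decode("ascii")
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out


# Stand-in document; real SBuD output has more fields.
doc = {"name": "example.bin", "b64contents": "AAECAw=="}
for mode in ("remove", "null", "empty", "b64null"):
    print(mode, json.dumps(strip_contents(doc, mode)))
```

Only the last variant keeps the value decodable as base64, which matters for consumers that decode it unconditionally.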

In my experience, at least in the Python world, developers often don't check whether a key exists in a dict, and they often don't use the dict.get() method, which gracefully handles a missing key (unlike mydict['noKey'], which raises a KeyError). I suppose the concern is somewhat moot, since the default behavior won't change.
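A small illustration of the difference, using a stand-in record from which the key has been removed (the `record` contents and the two reader functions are made up for the example):

```python
# Stand-in output record with b64contents deliberately absent.
record = {"name": "example.bin"}


def read_strict(rec: dict):
    # Direct indexing raises KeyError when the key is missing.
    return rec["b64contents"]


def read_lenient(rec: dict):
    # dict.get() returns None (or a supplied default) instead of raising.
    return rec.get("b64contents")


try:
    read_strict(record)
except KeyError as exc:
    print("strict consumer failed on missing key:", exc)

print("lenient consumer got:", read_lenient(record))
```

A consumer written in the strict style would break if the key were removed, but would keep working if the key were kept with a null or placeholder value.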

With either option, it seems prudent to add an optional parameter to the polyfile.Analyzer.sbud method (see below) to skip encoding the data to base64; there doesn't appear to be a reason to waste CPU cycles (and memory) converting the data to base64 if it will be stripped from the output.

def sbud(self, matches: Optional[Iterable[Match]] = None) -> Dict[str, Any]:

b64contents = base64.b64encode(data)
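The idea of skipping the encoding could be sketched roughly as follows. This is a standalone stand-in, not the actual `sbud` implementation: the `build_output` function, its `include_contents` parameter, and the `length` key are assumptions for illustration, and setting the key to None is just one of the choices discussed above.

```python
import base64
from typing import Any, Dict


def build_output(data: bytes, include_contents: bool = True) -> Dict[str, Any]:
    # Hypothetical stand-in for the part of sbud() that builds the dict.
    out: Dict[str, Any] = {"length": len(data)}
    if include_contents:
        # Default path: same encoding step as today.
        out["b64contents"] = base64.b64encode(data).decode("ascii")
    else:
        # Opt-out path: no encoding is performed at all, so no CPU or
        # memory is spent on data that would be dropped anyway.
        out["b64contents"] = None
    return out
```

Because the new parameter defaults to True, existing callers would see no change in behavior.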
