I thought I’d point out some new capabilities that we’ve recently added to our python-maec API that may be of interest.
A) Package merging APIs/script: this code will merge multiple MAEC Packages into one. It also attempts to intelligently merge “related” MAEC Malware Subjects, that is, those defined for the same malware entity (as identified by their Malware_Instance_Object_Attributes).
In this latter case, it will merge multiple such Malware Subjects into a single one, deduplicating common entities like Labels while keeping unique, analysis-derived data such as Bundles. This may be useful for aggregating data about the same piece
of malware from multiple sources, such as different malware sandboxes.
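
To make the merging behaviour concrete, here is a minimal sketch of the idea. It does not use the python-maec API: each Malware Subject is a hypothetical plain dict, the "md5" key is an assumed stand-in for the Malware_Instance_Object_Attributes, and merge_subjects is an illustrative name only.

```python
from collections import OrderedDict


def merge_subjects(subjects):
    """Merge subjects that describe the same malware instance.

    Each subject is a hypothetical plain dict (not a python-maec object):
    {"md5": str, "labels": [str, ...], "bundles": [str, ...]}
    """
    merged = OrderedDict()
    for subject in subjects:
        # Assumption: the (case-insensitive) MD5 identifies the malware instance.
        key = subject["md5"].lower()
        target = merged.setdefault(key, {"md5": key, "labels": [], "bundles": []})
        for label in subject["labels"]:
            if label not in target["labels"]:  # deduplicate shared labels
                target["labels"].append(label)
        target["bundles"].extend(subject["bundles"])  # keep every bundle
    return list(merged.values())


# Two sandboxes report on the same sample; a third subject is unrelated.
sandbox_a = {"md5": "ABC123", "labels": ["trojan"], "bundles": ["bundle-a"]}
sandbox_b = {"md5": "abc123", "labels": ["trojan", "dropper"], "bundles": ["bundle-b"]}
other = {"md5": "def456", "labels": ["worm"], "bundles": ["bundle-c"]}

result = merge_subjects([sandbox_a, sandbox_b, other])
```

Here the first two subjects collapse into one that carries both Bundles and a single copy of the shared Label, while the unrelated subject passes through unchanged.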
B) Distance calculation APIs/script: this code calculates the pairwise distances between two or more MAEC Malware Subjects and outputs the resulting distance matrix. It creates a feature vector (using numpy) based on both dynamic (e.g. Actions)
and static (e.g. PE file attributes) features found in the Malware Subject for use in the distance calculation. It makes use of the recent CybOX Object normalization code added to python-cybox (https://github.com/CybOXProject/python-cybox/blob/master/cybox/utils/normalize.py)
to better correlate Objects (e.g. Windows files) whose features may be reported slightly differently depending on the particular tool used. At the moment, the distance calculation is fairly simple and is based on whether or not the Malware Subject possesses
a particular feature (e.g. for Actions, each unique instance corresponds to an individual feature), which is then used to calculate the Euclidean distance between Malware Subjects. The APIs are fairly configurable: you can choose whether
to use static features, dynamic features, or both in the distance calculation, which dynamic features to ignore, which static features to use, etc. We also plan on making them more extensible and potentially integrating them further with other libraries such as scipy to
allow one, for instance, to use this calculation directly in clustering. Any feedback on further extensions or applications of this code would be most welcome!
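
For anyone curious what the presence/absence approach looks like in practice, here is a rough sketch. It is not the python-maec implementation: the feature strings and function names are made up for illustration, and it uses plain Python lists instead of numpy so that it runs without any dependencies.

```python
import math


def binary_vectors(subjects):
    """Turn {name: set of feature strings} into {name: 0/1 vector}.

    Each unique feature seen in any subject becomes one vector position.
    """
    vocabulary = sorted(set().union(*subjects.values()))
    return {
        name: [1 if feature in features else 0 for feature in vocabulary]
        for name, features in subjects.items()
    }


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(subjects):
    """Return (names, matrix) where matrix[i][j] is the distance
    between names[i] and names[j]."""
    names = sorted(subjects)
    vectors = binary_vectors(subjects)
    matrix = [[euclidean(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical feature strings mixing dynamic (action) and static (pe) features.
subjects = {
    "subject_a": {"action:create_file", "action:create_mutex", "pe:section:.text"},
    "subject_b": {"action:create_file", "pe:section:.text", "pe:section:.rsrc"},
    "subject_c": {"action:connect_socket"},
}

names, matrix = distance_matrix(subjects)
for name, row in zip(names, matrix):
    print(name, ["%.3f" % value for value in row])
```

With binary features, the squared distance is simply the number of features that one subject has and the other lacks, so subject_a and subject_b (which differ on two features) end up closer to each other than either is to subject_c.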