python-maec Object Deduplicator

Kirillov, Ivan A.
All,

I thought I'd send out a quick blurb about a new capability we've added to python-maec: the MAEC "Deduplicator". This is a new API that deduplicates MAEC data by parsing a document, looking for unique entities and their duplicates, and replacing the duplicates with references to the unique entity. At the moment it operates only on Objects in MAEC Bundles (including those in Actions), but it should be straightforward to extend to Actions and other entities in the future.

Here's an example of what it does with Objects. Say you have a MAEC Bundle with two Actions that operate on the same Object, which is defined twice (once in each Action):

<maecBundle:Action id="maec-anubis_to_maec-act-1">
  <cybox:Name xsi:type="maecVocabs:FileActionNameVocab-1.0">create file</cybox:Name>
  <cybox:Associated_Objects>
      <cybox:Associated_Object id="maec-anubis_to_maec-obj-1">
          <cybox:Properties xsi:type="FileObj:FileObjectType">
               <FileObj:File_Path>C:\WINDOWS\system32\i1ru74n4.exe</FileObj:File_Path>
           </cybox:Properties>
          <cybox:Association_Type xsi:type="maecVocabs:ActionObjectAssociationTypeVocab-1.0">output</cybox:Association_Type>
      </cybox:Associated_Object>
  </cybox:Associated_Objects>
</maecBundle:Action>

<maecBundle:Action id="maec-anubis_to_maec-act-2">
  <cybox:Name xsi:type="maecVocabs:FileActionNameVocab-1.0">modify file</cybox:Name>
  <cybox:Associated_Objects>
      <cybox:Associated_Object id="maec-anubis_to_maec-obj-2">
          <cybox:Properties xsi:type="FileObj:FileObjectType">
               <FileObj:File_Path>C:\WINDOWS\system32\i1ru74n4.exe</FileObj:File_Path>
           </cybox:Properties>
          <cybox:Association_Type xsi:type="maecVocabs:ActionObjectAssociationTypeVocab-1.0">input</cybox:Association_Type>
      </cybox:Associated_Object>
  </cybox:Associated_Objects>
</maecBundle:Action>

Running the Deduplicator on this Bundle moves a single copy of the duplicated Object into a "Deduplicated Objects" Collection and replaces each instance of the Object with a reference to that copy (via its ID):

<maecBundle:Object_Collection name="Deduplicated Objects" id="maec-anubis_to_maec-objc-1">
  <maecBundle:Object_List>
      <maecBundle:Object id="maec-anubis_to_maec-obj-1">
          <cybox:Properties xsi:type="FileObj:FileObjectType">
              <FileObj:File_Path>C:\WINDOWS\system32\i1ru74n4.exe</FileObj:File_Path>
          </cybox:Properties>
      </maecBundle:Object>
  </maecBundle:Object_List>
</maecBundle:Object_Collection>

<maecBundle:Action id="maec-anubis_to_maec-act-1">
  <cybox:Name xsi:type="maecVocabs:FileActionNameVocab-1.0">create file</cybox:Name>
  <cybox:Associated_Objects>
      <cybox:Associated_Object idref="maec-anubis_to_maec-obj-1"/>
  </cybox:Associated_Objects>
</maecBundle:Action>

<maecBundle:Action id="maec-anubis_to_maec-act-2">
  <cybox:Name xsi:type="maecVocabs:FileActionNameVocab-1.0">modify file</cybox:Name>
  <cybox:Associated_Objects>
      <cybox:Associated_Object idref="maec-anubis_to_maec-obj-1"/>
  </cybox:Associated_Objects>
</maecBundle:Action>
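
For readers who want to see the idea in code, here is a minimal sketch of the transformation shown above. It is not the python-maec Deduplicator API; it uses only Python's standard xml.etree.ElementTree on a simplified, namespace-free version of the example, and the names properties_key and deduplicate are hypothetical. It assumes two Objects are duplicates when their Properties have identical tags, attributes, and text, and, like the example output above, it keeps only the idref on each replaced Object.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified, namespace-free stand-in for the Bundle above (an assumption
# made for brevity; real MAEC documents use namespaced elements).
SAMPLE = r"""<Bundle>
  <Action id="act-1">
    <Associated_Objects>
      <Associated_Object id="obj-1">
        <Properties><File_Path>C:\WINDOWS\system32\i1ru74n4.exe</File_Path></Properties>
        <Association_Type>output</Association_Type>
      </Associated_Object>
    </Associated_Objects>
  </Action>
  <Action id="act-2">
    <Associated_Objects>
      <Associated_Object id="obj-2">
        <Properties><File_Path>C:\WINDOWS\system32\i1ru74n4.exe</File_Path></Properties>
        <Association_Type>input</Association_Type>
      </Associated_Object>
    </Associated_Objects>
  </Action>
</Bundle>"""


def properties_key(obj):
    """Hypothetical helper: a hashable fingerprint of an Object's Properties."""
    props = obj.find("Properties")
    if props is None:  # e.g. an Object that is already a reference
        return None
    return tuple(
        (el.tag, tuple(sorted(el.attrib.items())), (el.text or "").strip())
        for el in props.iter()
    )


def deduplicate(bundle):
    """Move duplicated Objects into a Collection and leave idrefs behind."""
    # Pass 1: group Objects by the content of their Properties.
    groups = defaultdict(list)
    for obj in bundle.iter("Associated_Object"):
        key = properties_key(obj)
        if key is not None:
            groups[key].append(obj)

    collection = ET.Element("Object_Collection", {"name": "Deduplicated Objects"})
    object_list = ET.SubElement(collection, "Object_List")

    # Pass 2: for each group with duplicates, keep the first Object's ID and
    # Properties in the Collection and turn every instance into a reference.
    for objs in groups.values():
        if len(objs) < 2:
            continue
        unique_id = objs[0].get("id")
        unique = ET.SubElement(object_list, "Object", {"id": unique_id})
        unique.append(objs[0].find("Properties"))
        for obj in objs:
            obj.clear()  # drops children and attributes
            obj.set("idref", unique_id)

    if len(object_list):
        bundle.insert(0, collection)
    return bundle


bundle = deduplicate(ET.fromstring(SAMPLE))
print(ET.tostring(bundle, encoding="unicode"))
```

Grouping first and rewriting afterwards avoids modifying the tree while it is being traversed.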

This makes better use of MAEC's Collections and, as a result, helps reduce the size of MAEC documents. Of course, it means having to dereference these Objects when parsing the deduplicated documents, which might not be ideal for all use cases. However, we do plan on incorporating it into all MAEC utilities in the future (as a feature that one can toggle on/off) to allow for more "best-practice" oriented generation of MAEC content and smaller output sizes.
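
To give a rough idea of what that dereferencing step involves on the consumer side, here is a small sketch, again using only the standard library on a simplified, namespace-free document rather than any python-maec API. The function name dereference is hypothetical, and the sketch assumes every idref points at an Object defined somewhere in the same document.

```python
import copy
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for a deduplicated Bundle (assumption).
DEDUPLICATED = r"""<Bundle>
  <Object_Collection name="Deduplicated Objects">
    <Object_List>
      <Object id="obj-1">
        <Properties><File_Path>C:\WINDOWS\system32\i1ru74n4.exe</File_Path></Properties>
      </Object>
    </Object_List>
  </Object_Collection>
  <Action id="act-1">
    <Associated_Objects><Associated_Object idref="obj-1"/></Associated_Objects>
  </Action>
  <Action id="act-2">
    <Associated_Objects><Associated_Object idref="obj-1"/></Associated_Objects>
  </Action>
</Bundle>"""


def dereference(bundle):
    """Copy each referenced Object's content back into the referencing Object."""
    # Index the Objects in the Collection by ID.
    by_id = {obj.get("id"): obj for obj in bundle.iter("Object")}

    # Snapshot the references first so the tree is not modified mid-traversal.
    for ref in list(bundle.iter("Associated_Object")):
        target = by_id.get(ref.get("idref"))
        if target is None:
            continue  # not a reference, or a dangling one; left untouched
        for child in target:
            # Deep-copy so each Action gets its own independent Properties.
            ref.append(copy.deepcopy(child))
    return bundle


bundle = dereference(ET.fromstring(DEDUPLICATED))
for obj in bundle.iter("Associated_Object"):
    print(obj.get("idref"), obj.find("Properties/File_Path").text)
```

The idref attribute is left in place here so the link back to the Collection is not lost; a consumer that wants fully standalone Objects could remove it after copying.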

At the moment, the Deduplicator API resides in the "deduplicator" branch of our GitHub repository: https://github.com/MAECProject/python-maec/tree/deduplicator. After more testing, we plan to merge it into the master branch. I've added a small sample script that demonstrates how the API can be used: https://github.com/MAECProject/python-maec/blob/deduplicator/examples/deduplicator_test.py. I recommend running it against the newly added sample file (https://github.com/MAECProject/python-maec/blob/deduplicator/examples/sample_report_maec.xml) to see a good demonstration of how it works.

Please let us know if you have any questions/thoughts/bug reports on this or any other MAEC capability.

Regards,
Ivan Kirillov
MITRE