MAEC Deduplicator Analysis

Kirillov, Ivan A.

I thought it’d be interesting to do a quick little study of how much the MAEC deduplicator can save in terms of file size. Here are the results for the files in the Zeus dataset:

 

             All    Anubis    ThreatExpert    Cuckoo
Average     6.41      4.49            2.31     12.74
Minimum    -1.96     -1.96           -0.97      2.05
Maximum    31.89     30.10            9.26     31.89
Median      4.31      0.00            1.21     10.71

 

It’s no surprise that the largest benefit comes from using it with the Cuckoo outputs, which are often the most verbose. The negative numbers were unexpected, but are accounted for by the fact that there are cases where only one or two simple Objects are deduplicated, and the extra overhead from creating a new Object_Collection with these Objects results in a larger file.
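
The overhead effect described above can be illustrated with a toy size model. This is only a sketch and not the MAEC deduplicator itself: the function name `deduplicated_size` and the constants `reference_size` and `collection_overhead` are hypothetical values chosen for illustration, and sizes are in arbitrary bytes.

```python
# Toy model only: NOT the MAEC deduplicator. All constants are assumptions.

def deduplicated_size(original_size, object_size, duplicates,
                      reference_size=40, collection_overhead=200):
    """Estimate file size after moving one repeated Object into a collection.

    Each of the `duplicates` inline copies is replaced by a short reference,
    one full copy is stored in the new collection, and the collection wrapper
    itself costs `collection_overhead` bytes.
    """
    removed = duplicates * object_size
    added = duplicates * reference_size + object_size + collection_overhead
    return original_size - removed + added


# A small Object repeated twice: the collection overhead outweighs the saving.
small_case = deduplicated_size(10_000, object_size=100, duplicates=2)

# A larger Object repeated ten times: deduplication pays off.
large_case = deduplicated_size(10_000, object_size=500, duplicates=10)

print(small_case, large_case)
```

Under these assumed constants the first case grows past the original 10,000 bytes while the second shrinks well below it, which matches the pattern of small negative values alongside large positive ones.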

 

-Ivan

RE: MAEC Deduplicator Analysis

Kirillov, Ivan A.

Oh, and I probably should have mentioned that these numbers all reflect the % difference in file size after running the deduplicator :)
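
For anyone wanting to run the same kind of comparison on their own files, a minimal sketch of the calculation might look like the following. The exact formula is an assumption (positive values meaning the file got smaller, which is consistent with the negative entries corresponding to larger files), the helper names `percent_difference` and `summarize` are hypothetical, and the sizes are made-up examples, not values from the Zeus dataset.

```python
from statistics import mean, median


def percent_difference(size_before, size_after):
    """Percent change in file size; positive means the file got smaller.

    Assumed sign convention: a negative result means the output grew.
    """
    return 100.0 * (size_before - size_after) / size_before


def summarize(size_pairs):
    """Return average, minimum, maximum and median percent difference."""
    diffs = [percent_difference(before, after) for before, after in size_pairs]
    return {
        "average": mean(diffs),
        "minimum": min(diffs),
        "maximum": max(diffs),
        "median": median(diffs),
    }


# Hypothetical (before, after) sizes in bytes; NOT from the Zeus dataset.
sizes = [(1000, 900), (2000, 2000), (5000, 5100), (4000, 3000)]
summary = summarize(sizes)
print(summary)
```

In practice the size pairs could be collected with `os.path.getsize` on each file before and after deduplication.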

 

Thanks,

Ivan

 

From: Kirillov, Ivan A.
Sent: Monday, January 13, 2014 11:26 AM
To: maec-discussion-list Malware Attribute Enumeration Discussion
Subject: MAEC Deduplicator Analysis

 


Re: MAEC Deduplicator Analysis

Terry MacDonald
Hi Ivan,

That's an interesting bit of information. It's probably a good idea to send those figures through to the [hidden email] mailing list so that the developers of Cuckoo can roll some deduplication into the MAEC module. It could possibly save some money on bandwidth for the guys at malwr.com (if they are letting people download MAEC feeds).

Cheers

Terry MacDonald


---------- Forwarded message ----------
From: Kirillov, Ivan A. <[hidden email]>
Date: 14 January 2014 05:27
Subject: RE: MAEC Deduplicator Analysis
To: maec-discussion-list Malware Attribute Enumeration Discussion <[hidden email]>

