Re: Test Data Set Initiative - Draft Straw Man Proposal

Barnum, Sean D.
I think this is a great idea.
I will try to look through this in detail when I have cycles.

sean

From: Patrick Maroney <[hidden email]>
Date: Tuesday, October 22, 2013 5:08 PM
To: "Barnum, Sean D." <[hidden email]>, "Thomas. Millar" <[hidden email]>, "Struse, Richard" <[hidden email]>
Cc: stix-discussion-list Structured Threat Information Expression/ST <[hidden email]>, cybox-discussion-list Cyber Observable Expression/CybOX Discussi <[hidden email]>, maec-discussion-list Malware Attribute Enumeration Discussion <[hidden email]>
Subject: Test Data Set Initiative - Draft Straw Man Proposal

Sean/Tom/Richard,

I'm sending this initial draft outline of a proposed Test Data Set Initiative to gauge interest in the initiative (or to find out whether any similar initiatives are already underway or planned). I've attached the original iThoughts mind map and exports in several other formats, and pasted the text inline below.

I believe that the proposed data sets would benefit the community and advance our shared objectives. I welcome any feedback.


Patrick Maroney

President
Integrated Networking Technologies, Inc.
PO Box 569
Marlton, NJ 08053
Office: (856)983-0001
Cell: (609)841-5104
Fax: (856)983-0001
[hidden email]

Test Data Sets

·      Overview
This proposal seeks to engage the community in the development of a rich and diverse library of common publicly available test data sets. 

Initial targeted data formats include:

STIX
CybOX
MAEC
TAXII

The initiative could extend to other formal and de facto standard formats (e.g., IODEF, OpenIOC, CIF). Data in these formats could either be derived automatically from STIX/CybOX or submitted in its original form and then converted to STIX/CybOX.

Such data sets would provide numerous benefits to the community and advance our shared objectives.


·      Objectives

·      Provide a variety of simple to complex standard test data sets that can be applied to a broad range of functional testing requirements:

       ·      Small test data sets focusing on complex structural elements and relationships

       ·      Large test data sets focusing on performance, "big data", and related use cases

·      Targeted environments include development, test, and fully operational production networks and systems.
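
For the large, performance-oriented data sets listed above, one option is a reproducible generator. A minimal sketch, assuming file-hash indicators are enough for load testing: a fixed seed yields the same data set on every run, and each value is a random 128-bit number formatted like an MD5 digest, so it is very unlikely to match a real file. The record fields (`id`, `md5`, `source`) and the `TEST-DATA` tag are hypothetical, not STIX/CybOX element names. It exercises volume only, not realistic relationships between objects.

```python
import random
from typing import Dict, List

# Hypothetical marker; the proposal calls for source tagging but defines no value.
TEST_DATA_TAG = "TEST-DATA"


def generate_hash_indicators(count: int, seed: int = 0) -> List[Dict[str, str]]:
    """Return `count` synthetic file-hash indicators; same seed, same output."""
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    records = []
    for i in range(count):
        # 128 random bits rendered as 32 hex digits, the shape of an MD5 digest
        fake_md5 = f"{rng.getrandbits(128):032x}"
        records.append({"id": f"test-indicator-{i}",
                        "md5": fake_md5,
                        "source": TEST_DATA_TAG})
    return records


if __name__ == "__main__":
    for record in generate_hash_indicators(3, seed=42):
        print(record)
```

Scaling `count` up produces larger sets for performance work, while keeping the seed fixed makes results comparable across test runs.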

·      Efficacy

·      Test Data Set Generation Options


There are a number of challenges in fully meeting the objectives of this initiative. With the exception of nonpublic sources, each of the options below may contribute to a hybrid approach to meeting the various objectives of this initiative.

Initial thinking is that test data sets derived, aggregated, and correlated from open source threat intelligence (OSI) are likely to produce the best overall outcome.

Sources of OSI would be engaged and shown the value of supporting this initiative.


·      Open source

       ·      Use of open source, public domain threat intelligence may raise reuse and copyright issues.

       ·      Inclusion of unvetted indicators could lead to unintentional operational impacts.

·      Nonpublic sources

       ·      Use of nonpublic threat intelligence is not possible, since it cannot be published in a public test data set.

·      Private addressing

       ·      While private addressing could cover some use cases, it does not provide all of the required capabilities.

       ·      Use of private addresses could lead to unintentional operational impacts on organizations that deploy the same private address ranges internally.

·      Randomly generated test data

       ·      Randomly generated test data sets have their own issues, both in terms of potential unintentional operational impacts and in terms of how well they represent the underlying data models and relationships.

·      Other options?
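
The private-addressing concern above can be checked mechanically before a data set is published. A minimal sketch, assuming indicators arrive as plain strings: it flags any value that falls in the RFC 1918 private IPv4 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and skips values that are not IP addresses. The function name `flag_private_addresses` is hypothetical and not part of any STIX/CybOX tooling.

```python
import ipaddress
from typing import Iterable, List

# RFC 1918 private IPv4 ranges. Test data that uses these could collide with
# addresses an organization already deploys internally.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def flag_private_addresses(indicators: Iterable[str]) -> List[str]:
    """Return the indicator values that fall inside an RFC 1918 range."""
    flagged = []
    for value in indicators:
        try:
            addr = ipaddress.ip_address(value)
        except ValueError:
            continue  # not an IP address (domain, hash, ...); out of scope here
        # An IPv6 address is never "in" an IPv4 network, so it is not flagged.
        if any(addr in net for net in RFC1918_NETWORKS):
            flagged.append(value)
    return flagged


if __name__ == "__main__":
    sample = ["10.1.2.3", "8.8.8.8", "192.168.0.1", "172.32.0.1", "example.com"]
    print(flag_private_addresses(sample))  # prints ['10.1.2.3', '192.168.0.1']
```

Note that 172.32.0.1 is not flagged: the 172.16.0.0/12 block ends at 172.31.255.255.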

·      Requirements

·      Publicly accessible:

       ·      Statically on GitHub

       ·      Dynamically on TAXII reference gateways

·      Minimize unforeseen/inadvertent impacts on internal and external operational production networks

·      Support for all major schema versions/formats:

       ·      STIX

       ·      CybOX

       ·      MAEC

       ·      TAXII

·      Specific source identification and tagging, so that test data can be distinguished from operational data and its use controlled

·      Ensure OPSEC exposure is limited to publicly available intelligence only

·      Processes and tools for localization/customization of key parameters
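
Two of the requirements above, source tagging and localization of key parameters, can be illustrated together. A minimal sketch, assuming records are plain text templates with `$name` placeholders (Python's `string.Template` syntax): `localize` fills in site-specific values and stamps the result with a test-data tag, and `production_safe` drops tagged records before they reach an operational feed. The `Record` type, the `TEST-DATA` tag, and both function names are hypothetical.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, Iterable, List

# Hypothetical marker value; the proposal does not define a concrete tag.
TEST_DATA_TAG = "TEST-DATA"


@dataclass
class Record:
    body: str    # e.g. a serialized STIX/CybOX document or a single indicator
    source: str  # source identification, used to tell test data apart


def localize(template: str, params: Dict[str, str]) -> Record:
    """Fill $placeholders with site-specific values and tag the result.

    Template.substitute raises KeyError when a placeholder has no value,
    so a partially customized record is never produced silently.
    """
    return Record(body=Template(template).substitute(params), source=TEST_DATA_TAG)


def production_safe(records: Iterable[Record]) -> List[Record]:
    """Drop anything tagged as test data before it reaches an operational feed."""
    return [r for r in records if r.source != TEST_DATA_TAG]
```

For example, `localize("Host $hostname beaconed to $c2_domain", {"hostname": "ws-042", "c2_domain": "c2.example.com"})` yields a record whose body is `Host ws-042 beaconed to c2.example.com`, tagged as test data.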

·      Implementation

·      Community engagement

·      Establish consensus on:

·      Value proposition

·      Working Group and Leadership for Initiative

·      Objectives

·      Process

·      Key Milestones

·      Test Data Set Generation Methodology

·      Open source derivation?

·      Private addressing?

·      Randomly generated test data?

·      Hybrid?

·      Other?

·      Deliverables

·      Priorities

·      Establish test data set specifications



RE: Test Data Set Initiative - Draft Straw Man Proposal

Kirillov, Ivan A.
Hi James,

I just pushed out some fixes to the OpenIOC to STIX tool on GitHub which should hopefully fix the issues that you encountered; please let us know if it does not. Thanks again for the bug report!

Regards,
Ivan Kirillov
MITRE

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Poole, James
Sent: Wednesday, October 30, 2013 9:46 AM
To: Jerome Athias; Poole, James
Cc: [hidden email]; stix-discussion-list Structured Threat Information Expression/ST; cybox-discussion-list Cyber Observable Expression/CybOX Discussi; maec-discussion-list Malware Attribute Enumeration Discussion
Subject: Re: Test Data Set Initiative - Draft Straw Man Proposal

Does anyone have experience running the openioc_to_stix tool?  I was
trying to convert the example files, but ran into issues.  I created a bug
against the tool: https://github.com/STIXProject/Tools/issues/26

I'll be happy to post the example files back to the github project if I
can get it working.

-James
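
The quick win Jerome suggests further down the thread (running openioc_to_stix over the published .ioc files) amounts to a batch conversion. A minimal sketch, assuming the converter script sits in the working directory and accepts `-i <input>` and `-o <output>` arguments; those flags are an assumption, so check the script's own help output before relying on them. The output naming scheme and the function names are hypothetical. Files that fail to convert are collected so they can be reported.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# ASSUMPTION: the converter is invoked as `openioc_to_stix.py -i <in.ioc> -o <out.xml>`.
# Adjust CONVERTER and the flags below if the script's --help says otherwise.
CONVERTER = "openioc_to_stix.py"


def output_path_for(ioc_file: Path, out_dir: Path) -> Path:
    """Map foo.ioc to <out_dir>/foo.stix.xml (hypothetical naming scheme)."""
    return out_dir / (ioc_file.stem + ".stix.xml")


def build_command(ioc_file: Path, out_dir: Path) -> List[str]:
    """Build the argument list for converting a single .ioc file."""
    return [sys.executable, CONVERTER,
            "-i", str(ioc_file),
            "-o", str(output_path_for(ioc_file, out_dir))]


def convert_all(in_dir: Path, out_dir: Path) -> List[Path]:
    """Convert every .ioc file in in_dir; return the files that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = []
    for ioc_file in sorted(in_dir.glob("*.ioc")):
        result = subprocess.run(build_command(ioc_file, out_dir))
        if result.returncode != 0:
            failures.append(ioc_file)
    return failures


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python convert_iocs.py <dir-of-.ioc-files> <output-dir>
    for failed in convert_all(Path(sys.argv[1]), Path(sys.argv[2])):
        print(f"conversion failed: {failed}")
```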

On 10/26/13 1:47 AM, "Jerome Athias" <[hidden email]> wrote:

>Maybe a quick win could be to use
>https://github.com/STIXProject/Tools/tree/master/openioc_to_stix
>with some .ioc files
>http://www.openioc.org/iocs/
>
>2013/10/26 Jerome Athias <[hidden email]>:
>> Hi,
>>
>> Maybe we could use the GitHub projects so that examples/test data sets can
>> be found directly at this level:
>> https://github.com/STIXProject
>> rather than having to dig down to this level:
>> https://github.com/STIXProject/Tools/tree/master/stix_to_html/examples
>>
>> https://github.com/MAECProject/
>> https://github.com/MAECProject/schemas/tree/master/examples
>>
>> https://github.com/TAXIIProject
>>
>> I think it just needs some coordination between the projects, and the
>> community will be happy to contribute.
>>
>> Best regards
>> /JA
>>
>> 2013/10/25 Poole, James <[hidden email]>:
>>> As a new member of the email list, I just want to concur that this is a
>>> great idea.  My team is trying to incorporate the STIX standard into a new
>>> product, but the lack of test data or public feeds is proving to be a
>>> hindrance to adoption.
>>>
>>> -James
>>>
>>> From: "Barnum, Sean D." <[hidden email]>
>>> Date: Wednesday, October 23, 2013 11:05 AM
>>> To: Patrick Maroney <[hidden email]>, Tom Millar
>>> <[hidden email]>, "Struse, Richard"
>>> <[hidden email]>
>>> Cc: stix-discussion-list Structured Threat Information Expression/ST
>>> <[hidden email]>, cybox-discussion-list Cyber
>>> Observable Expression/CybOX Discussi
>>> <[hidden email]>, maec-discussion-list Malware
>>> Attribute Enumeration Discussion <[hidden email]>
>>> Subject: Re: Test Data Set Initiative - Draft Straw Man Proposal
>>>
>>> I think this is a great idea.
>>> I will try to look through this in detail when I have cycles.
>>>
>>> sean
>>>

Philip Waters
On Oct 30, 2013, at 9:54 AM, "Kirillov, Ivan A." <[hidden email]> wrote:

> Hi James,
>
> I just pushed out some fixes to the OpenIOC to STIX tool on GitHub which should hopefully fix the issues that you encountered; please let us know if it does not. Thanks again for the bug report!
>
> Regards,
> Ivan Kirillov
> MITRE