Without going too far down the rabbit hole right now, let me take a VERY simple stab at providing some context on your question.
Our stack for the CTI TC ecosystem language specs (STIX and CybOX) looks something like: ontology/data-model, binding specifications, representation formats.
JSON-LD would basically fit into this stack at the binding specification and representation format levels.
The “context” structure of JSON-LD lets you do the sort of mappings from the ontology/data-model to a particular representation that are the purpose of the binding specifications. In this case the “context” (which can be expressed in a separate referenceable file rather than only inline with the content) would capture the binding specification rules for a JSON format implementation, and the “context” file(s) itself would form the JSON representation format implementation specification.
At that point instance CTI content could be expressed in JSON with the referenced JSON-LD “context” providing the mechanism for interpreting it.
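As a purely illustrative sketch, a referenceable “context” file for the trivial indicator example in Mark's message below might look something like this. Every IRI and term here is a hypothetical placeholder, not anything the TC has defined:

    {
      "@context": {
        "stix": "http://example.org/stix#",
        "type": "@type",
        "indicator": "stix:Indicator",
        "content-type": "stix:contentType",
        "signature": "stix:signatureValue"
      }
    }

Each term definition plays the role of one binding rule, mapping a short JSON key (or type name) to a node in the ontology/data-model.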
I have not personally worked directly with JSON-LD nor done any sort of very detailed analysis of its capabilities. It is unclear whether or not JSON-LD has adequate expressivity to fully map our domain or the capability to provide automated validation.
It may. It may not. That would be one dimension we would need to explore if we wish to consider JSON-LD as an option (which I would personally support).
It should be pointed out that, for this option to really be practical, we would need to move our ontology/data-model spec from its current UML-model-with-text-docs form to a full semantic form with dereferenceable structures (such that the JSON-LD context could do the appropriate mappings in an LD fashion). This is something we have talked about for quite a while as our ultimate goal, for many reasons, but it has not to date been something we have put on the roadmap to tackle for STIX 2.0.
Does that help put it in context?
Anyone familiar with JSON-LD please feel free to point out any errors in my explanation.
sean
From: "[hidden email]" <[hidden email]> on behalf of Mark Davidson <[hidden email]>
Date: Friday, October 2, 2015 at 9:42 AM To: "John K. Smith" <[hidden email]>, Shawn Riley <[hidden email]>, Cory Casanave <[hidden email]> Cc: John Wunder <[hidden email]>, "Jordan, Bret" <[hidden email]>, "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]> Subject: [cti-stix] RE: [cti-users] MTI Binding How does something like JSON-LD fit into the serialization discussion? For the MTI format discussion we are talking about the thing that products
will send to each other (I think, anyway). I did some quick reading on RDF / JSON-LD (complete newbie, forgive my ignorance), and I didn’t get a clear picture on how it would fit. For instance, as a completely trivial example, imagine a tool sending indicators out to sensors: { ‘type’: ‘indicator’, ‘content-type’: ‘snort-signature’, ‘signature’: ‘alert any any’} Would JSON-LD (or something like it) take the place of the JSON listed above? Or would JSON-LD get automagically translated into something that takes
the place of the JSON listed above? Or am I completely off-base in my questions? Thank you. -Mark From: John K. Smith [[hidden email]]
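If the JSON-LD approach described in Sean's reply above were adopted, the answer would mostly be the former: the instance document stays ordinary JSON and gains only a reference to a shared context. A purely illustrative sketch, with a made-up context URL:

    {
      "@context": "http://example.org/contexts/stix.jsonld",
      "type": "indicator",
      "content-type": "snort-signature",
      "signature": "alert any any"
    }

A plain JSON consumer can ignore the @context line entirely; a JSON-LD-aware consumer can dereference it to interpret the same document as RDF.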
From: John K. Smith [[hidden email]]

Just my 2 cents: having used RDF, TTL, etc. for security ontologies, I think leveraging something like JSON-LD will help adoption by a broader group. It seems schema.org is using JSON-LD, but I'm not sure to what extent. Thanks, JohnS

From: [hidden email] [mailto:[hidden email]] On Behalf Of Shawn Riley

Just wanted to share a couple of links that might be of interest here for RDF translation. RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information. There are JSON-LD parser and serializer plugins for RDFLib (Python 2.5+). Here is an online example of an RDF-to-multi-format translator.
Sean, good summary and context. I would disagree with one point. Re:

It should be pointed out that for this option to really be practical we would need to move our ontology/data-model spec from its current UML model with text docs form to a full semantic form with dereferenceable structures (such that the JSON-LD context could do the appropriate mappings in an LD fashion).

The JSON-LD schema, which is essentially RDF, can be generated from well-formed UML just like XML schema can*. If there were semantics you could not fully represent in JSON-LD, the UML model (and/or generated OWL) could also be referenced to add such semantics. There are a few things you can say in full OWL that you can't say directly in UML, and a few things in UML you can't say directly in OWL, but it seems like the fundamental needs of CTI are captured in both, and they can be mapped.

*Here is a product that does so: http://www.nomagic.com/products/magicdraw-addons/cameo-concept-modeler-plugin.html

It would not be much of a stretch to test generating from your current models to a schema that can be used with JSON-LD (I don't know if RSA does this, but Nomagic does). Nomagic copied for comment. -Cory
Thank you for the clarification.
I still think you and I will continue to disagree on whether UML is an appropriate choice for representing the semantics needed for STIX and CybOX. I continue to hold the opinion that UML was not designed to specify the semantics of a language or information representation like the one we are developing. It can be useful for capturing much of what we need, but not all.
In my opinion, UML is an appropriate stepping stone to get us where we need to go but not necessarily the best end solution.
I think we can continue to agree to disagree on this and still make progress. :-)
BTW, I would love to see a test of auto-generating from the current models. I also don't know the extent of RSA's capability on this front. Would you be interested in using your Nomagic tooling to give it a shot?
sean
Sean, Re:

I think we can continue to agree to disagree on this and still make progress. :-)

Yes we can. I have tried to make two separate assertions:
1. You need some high-level model as your foundation.
2. UML works for this, but there are other approaches.

Through personal experience I have confidence the UML approach works. I also know it is far from perfect, but so is everything else I have tried (including OWL). Re:

Would you be interested in using your Nomagic to give it a shot?

Yes, I may even get Nomagic to help. The sticking point today is effective model import from RSA. Some work may be required on the models to make them appropriate for this purpose.
I agree completely with your first assertion.
It is the second one we disagree on. I definitely agree that nothing is perfect (including OWL), but my personal experience leads me to have lower confidence that UML will be adequate.
As we agreed, we can still agree to disagree. :-) How very meta. ;-)
On the model import, are you trying to import the .emx or the more general UML2.2 form? Are you having issues with both?
sean
Has anyone considered PMML? PMML stands for "Predictive Model Markup Language". It is the de facto standard for representing predictive solutions. A PMML file may contain a myriad of data transformations (pre- and post-processing) as well as one or more predictive models. Because it is a standard, PMML allows different statistical and data mining tools to speak the same language. In this way, a predictive solution can be easily moved among different tools and applications without the need for custom coding. For example, it may be developed in one application and directly deployed on another.

Traditionally, the deployment of a predictive solution could take months, since after building it, the data science team had to write a document describing the entire solution. This document was then passed to the IT engineering team, which would recode it into the production environment to make the solution operational. With PMML, that double effort is no longer required, since the predictive solution as a whole (data transformations + predictive model) is simply represented as a PMML file, which is then used as-is for production deployment. What took months before now takes hours or minutes with PMML.

PMML is developed by the Data Mining Group (DMG), a consortium of commercial and open-source data mining companies. The latest version of PMML, version 4.1, was released by the DMG in December 2011. Since PMML is XML-based, it is not rocket science. Its structure follows a set of pre-defined elements and attributes which reflect the inner structure of a predictive workflow: data manipulations followed by one or more predictive models.

What are the benefits of PMML? PMML makes it extremely easy for any predictive solution to be moved from one data mining system to another. For example, once represented as a PMML file, a predictive solution can be operationally deployed right away, without the need for custom code. In this way, PMML transforms predictive analytic solutions into dynamic assets that can be put to work immediately. For big companies with many in-house statistical and data mining tools, PMML works as the common denominator: however the solution is built, it is immediately represented as a PMML file. This allows companies to use best-of-breed tools to build the best possible solutions. Since PMML is a standard, it also fosters transparency and best practices. Transparency comes from the fact that the predictive solution is no longer a black box: by opening the box and understanding what is inside, the analytics team can easily recognize past decisions and establish practices that work.

What kinds of predictive techniques are supported by PMML? PMML defines specific elements for several predictive techniques, including neural networks, decision trees, and clustering models, to name just a few. Recently added techniques are k-Nearest Neighbors and Scorecards, which include reason codes. PMML also defines an element for representing multiple models; that is, PMML can be used to represent model segmentation, composition, chaining, cascading, and ensembles, including Random Forest models. To review all the elements supported by PMML, take a look at the language specification at the DMG website (see Resources below).

Can PMML represent data pre- and post-processing? PMML has several built-in functions, such as IF-THEN-ELSE and arithmetic functions, that allow for extensive data manipulation. It also defines specific elements for the most common pre-processing tasks, such as normalization, discretization, and value mapping. To review all the pre-processing capabilities PMML has to offer, refer to the PMML pre-processing primer. With PMML 4.1, all the capabilities available for data pre-processing were also made available for post-processing. In this case, a PMML file can now also contain a set of business rules that define actions or decisions to be taken based on the outcome of the predictive model. A PMML file can represent the entire predictive solution, from raw data and model to business decisions.

Resources: the PMML language specification at the DMG website; the book "PMML in Action (2nd Edition)", available on Amazon.com.
Help me understand how some of these other lesser-known bindings are going to make things easier and faster for the average million-plus open source, web application, and app developers. Further, are these lesser-known bindings in the development stacks of the majority of networking and security product vendors? If not, then that is a show-stopper. We need to make things easier, not more complex. If JSON-LD or some other binding is going to make things easier, please give real examples so we can all better understand why.

Keep in mind that we are trying to get away from an overly complex and difficult-to-implement XML model that few understand, so we can increase adoption. I do not want to propose that we adopt yet another difficult-to-understand and not widely accepted binding. When I look at the traction that Facebook's ThreatExchange, with its simple JSON structure, is getting, we need to realize that there might be something to that. Developers love JSON; for whatever reason, they do. Further, I do not see why native JSON cannot do everything we need. Intelworks is using it and it seems to be working quite well for their CTI solution, Soltra is using it on the backend of their product for all of their needs, and other companies that I talk to are immediately translating the XML on the wire to JSON so they can "do something with it".

I would love to see some examples of a real indicator with a TTP in JSON-LD so we can better understand what that would mean and look like. I am okay with getting on board with JSON-LD; I just want to make sure it does not hinder broad-scale adoption.

Thanks,

Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."
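For reference, here is a purely illustrative sketch of roughly what an indicator tied to a TTP might look like in JSON-LD. Every IRI, term, and property name below is hypothetical; no such vocabulary has been defined by the TC:

    {
      "@context": "http://example.org/contexts/stix.jsonld",
      "@id": "http://example.org/indicators/indicator-1234",
      "type": "indicator",
      "content-type": "snort-signature",
      "signature": "alert any any",
      "indicated-ttp": {
        "@id": "http://example.org/ttps/ttp-5678",
        "type": "ttp",
        "title": "Spear phishing with malicious attachment"
      }
    }

Strip the @context and @id keys and this is the same plain JSON developers already work with; the linked-data machinery is additive rather than a replacement.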
Warren, I think exchanging PMML models in STIX packages is an excellent way to convey and iterate on higher-level heuristic models (vs. exchanging ephemeral IOCs). It would also be a great way to "show your work" when making higher-level assertions. The context here is, of course, the narrow set of communities with the capability maturity and desire to exchange analytic models.

Patrick Maroney
(609) 841-5104
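As a thought experiment only, such an exchange might look something like the following: a STIX-flavored JSON wrapper that points at a PMML document. All field names here are invented for illustration and correspond to no existing schema:

    {
      "type": "analytic-model",
      "title": "Beaconing-detection scorecard",
      "description": "Scorecard model for flagging periodic C2 callbacks",
      "model-format": "application/xml",
      "model-language": "PMML 4.1",
      "model-ref": "http://example.org/models/beaconing-scorecard.pmml"
    }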