CPE Future Vision

CPE Future Vision

Andrew Buttner
Administrator
All,

Version 2.0 of the CPE Specification was released on September 14th, 2008.  At that time the community wanted to see CPE go through a stabilization period and have the community attempt to use the specification in order to get a better feel for future direction.  The past year has seen a lot of conversation within the community about possible direction with some different ideas about the best future path to take.

I wanted to start discussion on the future vision for CPE.  In the very near term we have a new minor release (2.2) scheduled to be official on March 11.  There is also a huge push currently ongoing to clean up the Official CPE Dictionary.  But where do we go after that?  There is a lot in this email, for that I apologize.  Hopefully some of these points can spark some discussion, as your views will help us better understand where CPE needs to go.

Questions below:

- What are we enumerating?
- Is software inventory THE target technical use case?
- Should the CPE Language be removed?
- What should we do with CPE Matching?
- Should we keep the URI?

-------------------------

By name CPE is an enumeration.  Probably the biggest question to be answered is what are we enumerating?  CPE currently is about enumerating platform types, but this has proved to be a very broad term, and CPE has struggled to address what a platform type is.  More on this in a second.

Based on the research accomplished this past year regarding technical use cases, one option would be for CPE to focus on the software inventory technical use case.  This seems to be the single use case that is shared across all members of the CPE Community.  By narrowing our focus, we can hopefully deliver a solution that works for those users and not get bogged down trying to support fringe cases. Agree?

The software inventory technical use case calls for enumerating platform types based on the underlying software products (either operating systems or applications).  What is a software product?  This could be defined using the following characteristics:

* A user can download or buy it.
* There is a vendor/organization that produces it.
* An enterprise IT administrator can push it out over the enterprise network and install it into their environment.
* It is (or can be) recorded by an asset management tool.

In other words, every CPE Name should have at its root a software product.  CPE would not try to name web pages, code libraries, functional types, etc.  These areas are still important and we as a community need to address them.  The suggestion however is to address them with their own enumerations and enable CPE to focus on its core mission.  A movement toward multiple enumerations brings to light the need for a good expression language to tie everything together and make more complex statements.  This in a way relates to the goal of the CPE Language.  The CPE Language is currently under-used (if at all) and really goes against the idea of simplifying CPE.  Should this be removed from CPE and stood up on its own or merged into an existing initiative?  Thoughts?

As we address the questions above, CPE might need to evolve to meet the technical challenges encountered and to try and solve the issues that have been experienced in version 2. Some of the ideas that have been brought up in the past:

- don't make any major changes, even with its issues CPE is working well enough, focus on some minor tweaking

- the URI is a major problem as the terms used are not permanent (e.g. product name changes) and are not consistent (e.g. 'windows_2003_server' and 'window_server_2003'), thus we should move away from the URI and switch to a numerical id

- matching is the root of CPE's issues (e.g. the version component) and needs to either be removed or completely rewritten (can we leverage an ontology?)

- CPE should not try to be an enumeration but rather should be an expression enabling a user to talk about some combination of vendor, product, and version related to a target
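
To make the URI concern in the list above concrete, here is a minimal Python sketch of how a CPE 2.x URI splits into positional components. The component order (part, vendor, product, version, update, edition, language) follows the CPE 2.2 URI layout; the helper name `parse_cpe_uri` is hypothetical and the two sample names are illustrative only, built from the spellings quoted above. Because the product component is a free-text term, two spellings of the same product parse cleanly but compare as different names.

```python
from urllib.parse import unquote

# Component order of a CPE 2.x URI:
#   cpe:/part:vendor:product:version:update:edition:language
FIELDS = ("part", "vendor", "product", "version", "update", "edition", "language")


def parse_cpe_uri(uri):
    """Split a CPE 2.x URI into a dict of positional components (hypothetical helper)."""
    prefix = "cpe:/"
    if not uri.lower().startswith(prefix):
        raise ValueError(f"not a CPE URI: {uri!r}")
    values = uri[len(prefix):].split(":")
    if len(values) > len(FIELDS):
        raise ValueError(f"too many components: {uri!r}")
    # Trailing components may be omitted; pad them with "" meaning "unspecified".
    values += [""] * (len(FIELDS) - len(values))
    return {name: unquote(value).lower() for name, value in zip(FIELDS, values)}


# Illustrative names only, using the two spellings from the list above.
a = parse_cpe_uri("cpe:/o:microsoft:windows_2003_server")
b = parse_cpe_uri("cpe:/o:microsoft:window_server_2003")
same_vendor = a["vendor"] == b["vendor"]
same_product = a["product"] == b["product"]
print(same_vendor, same_product)
```

A plain string comparison cannot tell that both names are meant to describe one product, which is the consistency problem the second idea points at.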

What is your reaction to the ideas above?  Are there other ideas that need to be considered?  Over the next few weeks I will be putting together a proposal for where to go with CPE and what changes should be considered, but I need your input to make sure that Version 3 is a long-term success.  I thank you in advance for help you can provide on steering this ship.

Thanks
Drew

---------

Andrew Buttner
The MITRE Corporation
[hidden email]
781-271-3515

Re: CPE Future Vision

Wolfkiel, Joseph
I'm somewhat bummed that I haven't seen any discussion on this issue.

I'll go ahead and try to link it to earlier discussions about ontologies.

Basically, I think the URI structure imposes unacceptable limits on our
ability to express the names we need.  I'd like to go to a tagged structure
based on a more informed understanding of what a CPE is.  At the end of the
day, I think CPE should be about names for installable software, legal
relationships between those names, and managing a body of community content
that can be used to derive the consensus name for any given product--so we
can build interoperable tools.

Based on my experience in the DoD over the past 2.5 years, I'm thinking CPE
should break away from the URI structure and go to a tagged structure that
allows users to populate just the elements they want to communicate.  I'm
also thinking CPE should only address installable software inventories and
not try to differentiate between OS and applications.  I don't think vendor
is a good base for products, particularly given the open source community
and the potential to have a single product distributed by multiple vendors.
I think product is a better base.  I'm also wondering if any type of
transport (URI/XML/CSV/JSON.. whatever) really has any business prescribing
how a given vendor must use the text strings, titles, and other information
tracked in CPE, so I'm thinking it may not have any business being part of
the standard.
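
As a rough illustration of the tagged structure described above, the following Python sketch models a name in which only the product is mandatory and every other element is optional. The class name `TaggedName`, its field set, and the sample values are assumptions made for illustration; they are not taken from the attached schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class TaggedName:
    """Hypothetical tagged CPE-like name; product is the base, the rest is optional."""
    product: str
    vendor: Optional[str] = None
    version: Optional[str] = None
    update: Optional[str] = None
    edition: Optional[str] = None

    def tags(self):
        """Return only the elements the sender chose to populate."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


# A sender that knows only the product and version communicates just those tags.
name = TaggedName(product="http_server", version="2.2")
print(name.tags())
```

Unlike a positional URI, nothing here forces a sender to supply a vendor before it can name a product.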

I'm attaching a UML diagram of how I'm thinking an ontology for a single CPE
might look.  I'm still trying to determine if there's a dependency between
version and update, but I'm almost completely sure there isn't a dependency
between edition and version, or between edition and update.  I'm also
thinking we may need to have subordinate elements of version that break out
major, minor, and sub-minor version information.

I also think deprecation should take place on a per-component basis.  Also
that there's no guarantee of uniqueness for the text names of CPE
components, so they should be assigned unique identifiers, which should be
the basis for managing deprecation.
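
One way to picture per-component deprecation keyed on unique identifiers is the Python sketch below. The `Component` record, the `replaced_by` link, and the sample registry are all hypothetical; the attached schema may model this differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Component:
    """One component value (e.g. a product name) carrying its own identifier."""
    comp_id: int                       # hypothetical unique identifier
    kind: str                          # e.g. "vendor", "product", "version"
    text: str                          # text name; not guaranteed to be unique
    replaced_by: Optional[int] = None  # set when this component is deprecated


def resolve(registry: Dict[int, Component], comp_id: int) -> Component:
    """Follow deprecation links until a non-deprecated component is reached."""
    seen = set()
    comp = registry[comp_id]
    while comp.replaced_by is not None:
        if comp.comp_id in seen:
            raise ValueError("deprecation cycle")
        seen.add(comp.comp_id)
        comp = registry[comp.replaced_by]
    return comp


# Illustrative registry: one product spelling deprecated in favour of another.
registry = {
    1: Component(1, "product", "window_server_2003", replaced_by=2),
    2: Component(2, "product", "windows_2003_server"),
}
current = resolve(registry, 1)
print(current.text)
```

Because deprecation hangs off the component identifier rather than the full name, retiring one product spelling does not require touching every name that uses it.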

Let me know what you think.  If we agree on this, we can put an "any" tag at
the end of the standard CPE Component element list and other standards can
expand on cpe core data by bringing in data elements that address function,
family, hash, etc.

That said, the CPE forum is meant to be consensus driven, so I'll bow to the
collective wisdom of contributors to the list.

Also attached an xml schema that implements the ontology as an XML language.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Buttner, Drew [mailto:[hidden email]]
Sent: Wednesday, March 04, 2009 2:07 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] CPE Future Vision


CPE Ontology in UML 5 Mar 09.jpg (94K)
cpe-core.xsd (9K)
smime.p7s (6K)

Re: CPE Future Vision

Ernest Park-2


On Thu, Mar 5, 2009 at 3:50 PM, Wolfkiel, Joseph <[hidden email]> wrote:
I'm somewhat bummed that I haven't seen any discussion on this issue.

I'll go ahead and try to link it to earlier discussions about ontologies.

Basically, I think the URI structure imposes unacceptable limits on our
ability to express the names we need.  I'd like to go to a tagged structure
based on a more informed understanding of what a CPE is.  At the end of the
day, I think CPE should be about names for installable software, legal
relationships between those names, and managing a body of community content
that can be used to derive the consensus name for any given product--so we
can build interoperable tools.

The URI structure does provide a simple API - a known way of communication.

Agreed - it is not a database, and it is hard to define layered interdependencies in a one-dimensional structure.
 

Based on my experience in the DoD over the past 2.5 years, I'm thinking CPE
should break away from the URI structure and go to a tagged structure that
allows users to populate just the elements they want to communicate.  I'm
also thinking CPE should only address installable software inventories and
not try to differentiate between OS and applications.

It is a reasonable distinction - and any distinction allows further filtering of data. Whether everybody needs the "application" information or not, it can be output as a CPE 1.x or 2.x "string" as the result of a basic query.

Once the string is satisfied, the additional metadata tags - open, proprietary, community developed - can also be interrogated, assuming a general output XML schema or database table schema is agreed. 

In this way, it is the overall schema with its complex relationships that defines CPE, not a limited, but valuable, string that gets us to narrow down the choices that resolve a more complex query.
 
 I don't think vendor
is a good base for products, particularly given the open source community
and the potential to have a single product distributed by multiple vendors.
I think product is a better base.  I'm also wondering if any type of
transport (URI/XML/CSV/JSON.. whatever) really has any business prescribing
how a given vendor must use the text strings, titles, and other information
tracked in CPE, so I'm thinking it may not have any business being part of
the standard.

For OSS - vendor in reality needs to be the lowest common denominator, or can be a few. In my database, a single product can be joined to multiple vendors due to changes in publishing, development, etc., over its life. I can therefore "fix" dynamic and historically evolving data to be a flat string to serve the needs of CPE-constrained reporting, while my database is aware of the complex relations that make up a name definition. This problem is not distinct to OSS, as acquisitions, bankruptcies, and lawsuits all change the ownership and the provenance of software over its life.

If a CPE string is a reporting mechanism, then it works. The database has to understand the complexity of the data, but the string output can be simplified and human friendly.

In this way, my CPE-normalized output can be read by another tool that can read CPE, and it can layer additional metadata into a three-dimensional output.

I still think that the basic string elements are valid; as long as we agree to synchronize the highest level of the database, we can all communicate using these strings.

Additionally, these strings may not be unique, or for any given combination, there may be multiple different results. Understanding this, the query that builds the result set needs to contain additional test logic to further qualify the multiple result possibilities for the most likely match.
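
The database-behind-a-flat-string idea above could be sketched in Python roughly as follows. The `Product` and `VendorLink` types, the year-based qualifier, and the vendor and product names are hypothetical stand-ins, not a description of any real database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VendorLink:
    """Hypothetical join row: which vendor published the product, and when."""
    vendor: str
    start_year: int
    end_year: Optional[int] = None  # None means the link is still current


@dataclass
class Product:
    name: str
    vendors: List[VendorLink]

    def vendor_in(self, year: int) -> Optional[str]:
        """Extra test logic: pick the vendor link that was valid in the given year."""
        for link in self.vendors:
            if link.start_year <= year and (link.end_year is None or year <= link.end_year):
                return link.vendor
        return None

    def flat_name(self, year: int) -> str:
        """Flatten the layered record to a simple vendor:product string for reporting."""
        vendor = self.vendor_in(year) or "unknown"
        return f"{vendor}:{self.name}"


# Made-up data: the product changed hands in 2007.
widget = Product("widget", [
    VendorLink("examplecorp", 2000, 2006),
    VendorLink("othercorp", 2007),
])
print(widget.flat_name(2005))
print(widget.flat_name(2009))
```

The flat string stays human friendly, while the record behind it keeps the full vendor history needed to choose the most likely match.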


 

I'm attaching a UML diagram of how I'm thinking an ontology for single CPE
might look.  I'm still trying to determine if there's a dependency between
version and update, but I'm almost completely sure there isn't a dependency
between edition and version, or between edition and update.  I'm also
thinking we may need to have subordinate elements of version that break out
major, minor, and sub-minor version information.

I also think deprecation should take place on a per-component basis.  Also
that there's no guarantee of uniqueness for the text names of CPE
components, so they should be assigned unique identifiers, which should be
the basis for managing deprecation.

Why deprecate at all? I maintain old names - all names, just in case older data is floating around. I can correct the query and I still have all the permutations. I don't care if I have ten different things that resolve to apache:server (made up example), I can drill through a smaller result set of ten records easier than 50,000 and more.



Re: *** renamed attachment *** Re: [CPE-DISCUSSION-LIST] CPE Future Vision

Dawn Adams
In reply to this post by Wolfkiel, Joseph

Hi Joe,

So far this seems like a pretty good idea.

Do you agree with this statement?

CPE should not try to be an enumeration but rather should be an expression enabling a user to talk about some combination of vendor, product, and version related to a target.

By this, would the target also be part of the CPE ID of a product, however the naming is resolved?

How would hardware-based products fit into the CPE standard – if at all?

I agree with your discussion of deprecation and URIs.

Dawn


Re: *** renamed attachment *** Re: [CPE-DISCUSSION-LIST] CPE Future Vision

Wolfkiel, Joseph
The enumeration concept has been a confusing discussion.  When I talk about "enumeration" with respect to CPE, I'm thinking that CPE should "enumerate" all legal combinations of CPE component names (vendor/product/version/update/edition/targetHW/targetSW/language) along with the names.  When we can fully populate all combinations, we have "enumerated" all software names.
 
With respect to the above explanation, I think having the enumeration allows to share a common lexicon when sharing information about names for, and linkages between vendor, product, and version related to a target.  My thought is that providing a combination of vendor, product, and version (or other components) would be how you would express a valid CPE name. 
 
Alternatively, you could express unique IDs (i.e. alpha-numeric IDs like CPE(vend=666 prod=123 vers=658)) for each vendor, product, and version.  I'm a little ambivalent about the numeric identifiers because they assume you actually know the names of all the software you want to describe beforehand.  When we want to use automated discovery tools, any time a new product shows up, it leaves the tool without a way to communicate the previously unseen product or product component.
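
To make the concern about numeric identifiers concrete, here is a minimal Python sketch. The registries, the names in them, and the `to_numeric` helper are all hypothetical; only the ID values echo the CPE(vend=666 prod=123 vers=658) illustration above. It is meant to show that a discovery tool can only emit the ID form for components that someone has already registered.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ID registries (assumption): the names are made up, and the
# numbers simply mirror the CPE(vend=666 prod=123 vers=658) illustration.
VENDOR_IDS = {"examplesoft": 666}
PRODUCT_IDS = {"examplesuite": 123}
VERSION_IDS = {"2.0": 658}


@dataclass(frozen=True)
class NumericCpe:
    vend: int
    prod: int
    vers: int


def to_numeric(vendor: str, product: str, version: str) -> Optional[NumericCpe]:
    """Return the ID form, or None if any component has no assigned ID."""
    try:
        return NumericCpe(VENDOR_IDS[vendor], PRODUCT_IDS[product], VERSION_IDS[version])
    except KeyError:
        # A previously unseen vendor, product, or version cannot be expressed.
        return None


known = to_numeric("examplesoft", "examplesuite", "2.0")     # all IDs registered
unseen = to_numeric("examplesoft", "brand_new_tool", "1.0")  # product not registered
```

In this sketch `unseen` is `None`: the tool found something real but has nothing to report until an ID is issued, whereas a text name could be reported immediately and reconciled later.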
 
Within the DoD, we've seen very limited use for describing hardware.  Generally, when something is described as hardware, we're trying to relate it back to firmware apps or firmware OSs.  I'm not aware of any existing vulnerabilities or settings contained in security guidance that are actually targeted at physical hardware (e.g. setting switches, disconnecting cables, installed power supplies) that can be scanned for with automated tools.  I also think going into that domain will cause any number of problems with naming and ontological relationships.
 
Short answer, I would advocate for dispensing with hardware in CPE for the time being.  However, I would allow hardware names to be used to represent the firmware installed on hardware.  That's one of the reasons I would advocate for doing away with the part type attribute, since I can't really think of a place where it adds value, but many where it sows confusion.
 

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

 


From: Dawn Adams [mailto:[hidden email]]
Sent: Thursday, March 05, 2009 4:21 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] *** renamed attachment *** Re: [CPE-DISCUSSION-LIST] CPE Future Vision

Hi Joe,

So far this seems like a pretty good idea.

Do you agree with this statement?

CPE should not try to be an enumeration but rather should be an expression enabling a user to talk about some combination of vendor, product, and version related to a target.

By this, would the target also be part of the CPE ID of a product, however the naming is resolved?

How would hardware based products fit into the CPE standard – if at all?

I agree with your discussion of deprecation and URIs.

Dawn

 

 




Re: CPE Future Vision

Wolfkiel, Joseph
In reply to this post by Ernest Park-2
Responses in-line. ****
 

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

 


From: Ernest Park [mailto:[hidden email]]
Sent: Thursday, March 05, 2009 4:07 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision



On Thu, Mar 5, 2009 at 3:50 PM, Wolfkiel, Joseph <[hidden email]> wrote:
I'm somewhat bummed that I haven't seen any discussion on this issue.

I'll go ahead and try to link it to earlier discussions about ontologies.

Basically, I think the URI structure imposes unacceptable limits on our
ability to express the names we need.  I'd like to go to a tagged structure
based on a more informed understanding of what a CPE is.  At the end of the
day, I think CPE should be about names for installable software, legal
relationships between those names, and managing a body of community content
that can be used to derive the consensus name for any given product--so we
can build interoperable tools.

The URI structure does provide a simple API - a known way of communicating.

Agreed - it is not a database, and the one-dimensional structure makes it hard to define the layered interdependencies.
 
**** Good point.  I suppose a URI is just a transport format, and, if we can do away with concepts like the "prefix property" and the interpretation of unpopulated spaces to mean something other than "unpopulated," it's probably just as good as any other transport.  However, I find it a little distasteful since it violates the XML concept of self-documenting code, and it requires a tool to parse twice: once for the XML, and a second time to get data out of the URI. ****
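
The "parse twice" point can be sketched in a few lines of Python. The `<inventory>` and `<item name="...">` markup is invented purely for illustration and is not any CPE schema; the component order (part, vendor, product, version, update, edition, language) follows the CPE 2.x URI layout. Treat this as a rough sketch rather than a conforming parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Component order of a CPE 2.x URI: cpe:/part:vendor:product:version:update:edition:language
FIELDS = ("part", "vendor", "product", "version", "update", "edition", "language")


def parse_cpe_uri(uri: str) -> dict:
    """Second parse: split the URI string into named components."""
    if not uri.startswith("cpe:/"):
        raise ValueError(f"not a CPE 2.x URI: {uri!r}")
    parts = uri[len("cpe:/"):].split(":")
    if len(parts) > len(FIELDS):
        raise ValueError(f"too many components: {uri!r}")
    parts += [""] * (len(FIELDS) - len(parts))  # trailing components may be omitted
    return {name: unquote(value) for name, value in zip(FIELDS, parts)}


# Hypothetical wrapper document (assumption): element and attribute names are made up.
DOC = '<inventory><item name="cpe:/o:microsoft:windows_server_2003::sp2"/></inventory>'


def names_from_xml(xml_text: str) -> list:
    """First parse: the XML. Each attribute value then needs the second parse."""
    root = ET.fromstring(xml_text)
    return [parse_cpe_uri(item.get("name")) for item in root.iter("item")]


names = names_from_xml(DOC)
```

Note that the empty `version` slot in the example carries no label of its own; the reader only knows it is the version because of its position, which is the self-documentation complaint.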
 

Based on my experience in the DoD over the past 2.5 years, I'm thinking CPE
should break away from the URI structure and go to a tagged structure that
allows users to populate just the elements they want to communicate.  I'm
also thinking CPE should only address installable software inventories and
not try to differentiate between OS and applications.

It is a reasonable distinction - and any distinction allows further filtering of data. Whether everybody needs the "application" information or not, it can be output as a CPE 1.x or 2.x "string" as the result to a basic query. 

Once the string is satisfied, the additional metadata tags - open, proprietary, community developed - can also be interrogated, assuming a general output XML schema or database table schema is agreed. 

In this way, it is the overall schema with its complex relationships that defines CPE, not a limited, but valuable, string that gets us to narrow down the choices that resolve a more complex query. 
 
**** Okay, I'm not really hard over on that one.  It is a good filter for humans.  It just starts getting sticky when you consider that JRE serves as an OS, but runs on an OS and many similar relationships exist between installable plug-ins and their applications.  I just don't think there's much machine-reasoning that can be built into the distinction between OS and app.  I'm much more comfortable with adding a tag to describe target software architecture -- whether that be JRE, windows, OSX, etc. ****
 
 I don't think vendor
is a good base for products, particularly given the open source community
and the potential to have a single product distributed by multiple vendors.
I think product is a better base.  I'm also wondering if any type of
transport (URI/XML/CSV/JSON.. whatever) really has any business prescribing
how a given vendor must use the text strings, titles, and other information
tracked in CPE, so I'm thinking it may not have any business being part of
the standard.

For OSS - vendor in reality needs to be the lowest common denominator, or can be a few. In my database, a single product can be joined to multiple vendors due to changes in publishing, development, etc., over its life. I can therefore "fix" dynamic and historically evolving data to be a flat string to serve the needs of CPE constrained reporting, while my database is aware of the complex relations that make up a name definition. This problem is not distinct to OSS, as acquisitions, bankruptcies, and lawsuits all change the ownership and the provenance of software over its life.

If a CPE string is a reporting mechanism, then it works. The database has to understand the complexity of the data, but the string output can be simplified and human friendly.

In this way, my CPE normalized output can be read by another tool that can read CPE, and it can layer additional metadata into a three dimensional output.

I still think that the basic string elements are valid. As long as we agree to synchronize the highest level of the database, we can all communicate using these strings.

Additionally, these strings may not be unique; for any given combination, there may be multiple different results. Understanding this, the query that builds the result set needs to contain additional test logic to further qualify the multiple result possibilities for the most likely match.
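
As a rough illustration of a product joined to several vendors but reported as a flat string, here is a small Python sketch. The `Product` class, the vendor and product names, and both helper functions are hypothetical and do not describe any actual database; the flat output simply follows the CPE 2.x URI layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    name: str
    # Vendors that have published the product, oldest first (hypothetical model).
    vendors: List[str] = field(default_factory=list)


def flat_cpe(product: Product, version: str, part: str = "a") -> str:
    """Flatten to one reporting string using the most recent vendor."""
    return f"cpe:/{part}:{product.vendors[-1]}:{product.name}:{version}"


def all_flat_names(product: Product, version: str, part: str = "a") -> List[str]:
    """Every string an older tool might have emitted for the same product."""
    return [f"cpe:/{part}:{v}:{product.name}:{version}" for v in product.vendors]


db_product = Product("exampledb", ["oldco", "newco"])  # made-up acquisition history
current = flat_cpe(db_product, "5.1")
historical = all_flat_names(db_product, "5.1")
```

The database keeps the full history, while consumers that only understand the flat string see one name at a time; a query can still recognize any of the historical strings as the same product.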
 
 **** Again, I don't disagree.  If we can agree to drop the "prefix property" and just note that the URI structure should hold vendor name in the first position, product name in the 2nd, ... then it's just a transport mechanism.  As it is now, with matching and all the added complexity, the CPE URI is difficult to deal with.  I also agree that, in a user interface, giving the ability to sort products by the different vendors that have distributed them is a great capability.  But I don't think that modeling "vendor" as a "distributed-by" relationship to product would prevent you from doing that. ****
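
For readers unfamiliar with the "prefix property," the following Python sketch shows one simplified reading of it: a shorter name matches any longer name that agrees on every populated component, and an empty component matches anything. This is an illustration under those assumptions, not the normative CPE matching algorithm, and the function names are made up.

```python
def components(uri: str) -> list:
    """Split a CPE 2.x URI into its positional components."""
    if not uri.startswith("cpe:/"):
        raise ValueError(f"not a CPE 2.x URI: {uri!r}")
    return uri[len("cpe:/"):].split(":")


def prefix_match(pattern: str, candidate: str) -> bool:
    """True if every populated component of pattern equals the candidate's.

    Simplified assumption: an empty component in the pattern means "any",
    and the candidate must have at least as many components as the pattern.
    """
    p, c = components(pattern), components(candidate)
    if len(c) < len(p):
        return False
    return all(pc == "" or pc == cc for pc, cc in zip(p, c))


installed = "cpe:/o:microsoft:windows_server_2003::sp2"
broad = prefix_match("cpe:/o:microsoft:windows_server_2003", installed)
reverse = prefix_match(installed, "cpe:/o:microsoft:windows_server_2003")
wildcard = prefix_match("cpe:/o:microsoft::", installed)
```

Here `broad` and `wildcard` are true and `reverse` is false. The `wildcard` case is the one that causes trouble: an unpopulated slot is read as "matches anything" rather than simply "unpopulated."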
 

I'm attaching a UML diagram of how I'm thinking an ontology for single CPE
might look.  I'm still trying to determine if there's a dependency between
version and update, but I'm almost completely sure there isn't a dependency
between edition and version, or between edition and update.  I'm also
thinking we may need to have subordinate elements of version that break out
major, minor, and sub-minor version information.

I also think deprecation should take place on a per-component basis.  Also
that there's no guarantee of uniqueness for the text names of CPE
components, so they should be assigned unique identifiers, which should be
the basis for managing deprecation.

Why deprecate at all? I maintain old names - all names, just in case older data is floating around. I can correct the query and I still have all the permutations. I don't care if I have ten different things that resolve to apache:server (made up example); I can drill through a smaller result set of ten records more easily than 50,000 or more.
 
**** I don't equate "deprecate" with "delete."  I would expect any database to maintain old, outdated names in a deprecated status so you can retain historical relationships.  I'm just not comfortable with "same as" as a way to deal with product names that have been changed as part of an acquisition or other process.  This assumes you may have multiple ways for users to select the same product name.**** 
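
A minimal Python sketch of per-component deprecation that retains old names might look like the following. The `ComponentName` record, its `deprecated_by` field, and the registry contents are assumptions made for illustration and are not part of any CPE schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ComponentName:
    id: int                              # unique identifier for this component name
    kind: str                            # "vendor", "product", "version", ...
    text: str                            # human-readable name; not guaranteed unique
    deprecated_by: Optional[int] = None  # id of the replacement, if deprecated


def resolve(registry: Dict[int, ComponentName], component_id: int) -> ComponentName:
    """Follow deprecation links to the current name; old entries stay in place."""
    seen = set()
    entry = registry[component_id]
    while entry.deprecated_by is not None:
        if entry.id in seen:
            raise ValueError("deprecation cycle detected")
        seen.add(entry.id)
        entry = registry[entry.deprecated_by]
    return entry


# Hypothetical registry: the misspelled product name is deprecated, not deleted.
registry = {
    1: ComponentName(1, "product", "window_server_2003", deprecated_by=2),
    2: ComponentName(2, "product", "windows_server_2003"),
}
current_name = resolve(registry, 1)
```

Older data that still carries ID 1 remains resolvable, and the historical relationship between the two names is kept as a one-way link rather than a symmetric "same as."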


Let me know what you think.  If we agree on this, we can put an "any" tag at
the end of the standard CPE Component element list and other standards can
expand on cpe core data by bringing in data elements that address function,
family, hash, etc.

That said, the CPE forum is meant to be consensus driven, so I'll bow to the
collective wisdom of contributors to the list.

Also attached an xml schema that implements the ontology as an XML language.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Buttner, Drew [mailto:[hidden email]]
Sent: Wednesday, March 04, 2009 2:07 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] CPE Future Vision

All,

Version 2.0 of the CPE Specification was released on September 14th, 2008.
At that time the community wanted to see CPE go through a stabilization
period and have the community attempt to use the specification in order to
get a better feel for future direction.  The past year has seen a lot of
conversation within the community about possible direction with some
different ideas about the best future path to take.

I wanted to start discussion on the future vision for CPE.  In the very near
term we have a new minor release (2.2) scheduled to be official on March 11.
There is also a huge push currently ongoing to clean-up the Official CPE
Dictionary.  But where do we go after that?  There is a lot in this email,
for that I apologize.  Hopefully some of these points can spark some
discussion as your views will help us better understand where CPE needs to
go.

Questions below:

- What are we enumerating?
- Is software inventory THE target technical use case?
- Should the CPE Language be removed?
- What should we do with CPE Matching?
- Should we keep the URI?

-------------------------

By name CPE is an enumeration.  Probably the biggest question to be answered
is what are we enumerating?  CPE currently is about enumerating platform
types, but this has proved to be a very broad term, and CPE has struggled to
address what a platform type is.  More on this in a second.

Based on the research accomplished this past year regarding technical use
cases, one option would be for CPE to focus on the software inventory
technical use case.  This seems to be the single use case that is shared
across all members of the CPE Community.  By narrowing our focus, we can
hopefully deliver a solution that works for those users and not get bogged
down trying to support fringe cases. Agree?

The software inventory technical use case calls for enumerating platform
types based on the underlying software products (either operating systems or
applications).  What is a software product?  This could be defined using the
following characteristics:

* A user can download or buy it.
* There is a vendor/organization that produces it.
* An enterprise IT administrator can push it out over the enterprise network
and install it into their environment.
* It is (or can be) recorded by an asset management tool.

In other words, every CPE Name should have at its root a software product.
CPE would not try to name web pages, code libraries, functional types, etc.
These areas are still important and we as a community need to address them.
The suggestion however is to address them with their own enumerations and
enable CPE to focus on its core mission.  A movement toward multiple
enumerations brings to light the need for a good expression language to tie
everything together and make more complex statements.  This in a way relates
to the goal of the CPE Language.  The CPE Language is currently under-used
(if at all) and really goes against the idea of simplifying CPE.  Should
this be removed from CPE and stood up on its own or merged into an existing
initiative?  Thoughts?

As we address the questions above, CPE might need to evolve to meet the
technical challenges encountered and to try and solve the issues that have
been experienced in version 2. Some of the ideas that have been brought up
in the past:

- don't make any major changes, even with its issues CPE is working well
enough, focus on some minor tweaking

- the URI is a major problem as the terms used are not permanent (e.g.
product name changes) and are not consistent (e.g. 'windows_2003_server' and
'window_server_2003'), thus we should move away from the URI and switch to a
numerical id

- matching is the root of CPE's issues (e.g. the version component) and needs
to either be removed or completely rewritten (can we leverage an ontology?)

- CPE should not try to be an enumeration but rather should be an expression
enabling a user to talk about some combination of vendor, product, and
version related to a target

What is your reaction to the ideas above?  Are there other ideas that need
to be considered?  Over the next few weeks I will be putting together a
proposal for where to go with CPE and what changes should be considered, but
I need your input to make sure that Version 3 is a long-term success.  I
thank you in advance for help you can provide on steering this ship.

Thanks
Drew

---------

Andrew Buttner
The MITRE Corporation
[hidden email]
781-271-3515



Re: CPE Future Vision

Tim Keanini
In reply to this post by Wolfkiel, Joseph
If we are really talking about using an ontological approach, then I strongly recommend that we look at representing this domain in RDF/RDFS and maybe OWL although for our purposes RDFS might suffice.

If we are just talking about tagging and adding facets to the data, then XML Schema is all we need and I'm all for using the right tool for the job.  Let me make a bet that if we don't make this move now to RDF/RDFS/OWL, we will be kicking ourselves in a year or less.

Attached is your .xsd as represented in OWL-full.  Again, we don't need to use OWL-full, and the beauty is that we can use only enough OWL as is needed to model the domain.  If you have a tool like Protégé 4 or TopBraidComposer, you can open the .owl file I have attached.

So what?  What is so special about the OWL versus the XSD representation?
Once in RDFS or OWL, we would not only be able to assert RDF triples but also infer them.
Inference is the force multiplier, because anyone who thinks humans are able to perform all the assertions needed to continuously model this complex and changing domain is fooling themselves.  Don't get me started here; let me just end with this: we should be inferring vulnerabilities and higher order concepts, NOT asserting them.

What is the unique value in an ontological representation such as RDF/RDFS/OWL?
1) RDF will finally allow us to model using a graph.
2) RDFS affords us ontological modeling: features that allow us to manage type constraints, instance and class attributes, subClassOf type propagation, binary and n-ary relationships, relation hierarchies, etc.
3) Beyond RDFS, we may need these OWL features: disjoint decomposition, cardinality constraints, and binary functions, and we could use all or some of OWL on an as-needed basis.
4) When it comes time to tie it all together (CPE, CWE, CCE, etc.) or just some of them, this type of federation is simple and easy to manage if we are modeled at the RDFS/OWL level.
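
The difference between asserting and inferring triples can be shown with a small, library-free Python sketch. It applies only two RDFS entailment rules (subclass transitivity and type propagation through rdfs:subClassOf); the `ex:` class and instance names are invented for illustration, and a real deployment would presumably use an RDF toolkit and reasoner instead.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples):
    """Return the asserted triples plus those entailed by two RDFS rules."""
    closed = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in closed:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in closed:
                # Transitivity: (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # Type propagation: (s2 type s), (s subClassOf o) => (s2 type o)
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        changed = not new <= closed  # keep going until nothing new appears
        closed |= new
    return closed


# Hypothetical vocabulary (assumption): three asserted triples.
asserted = {
    ("ex:WebServer", SUBCLASS, "ex:Application"),
    ("ex:Application", SUBCLASS, "ex:SoftwareProduct"),
    ("ex:apache_httpd", TYPE, "ex:WebServer"),
}
entailed = infer(asserted) - asserted
```

Nobody asserted that `ex:apache_httpd` is an `ex:SoftwareProduct`, yet that triple appears in `entailed`; this is the kind of work that would otherwise have to be done by hand for every new name.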

I can go on and on about the benefits of using these higher level W3C standards for our purposes but I'll just leave it at that.  Forgive me for being so passionate about this topic but I probably have more scar tissue than most on this topic.

--tk

Timothy D. Keanini Sr., CTO    nCircle Network Security
Office: +1 (415) 625-5939
www.ncircle.com
blog.ncircle.com

-----Original Message-----
From: Wolfkiel, Joseph [mailto:[hidden email]]
Sent: Thursday, March 05, 2009 2:50 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision

I'm somewhat bummed that I haven't seen any discussion on this issue.

I'll go ahead and try to link it to earlier discussions about ontologies.

Basically, I think the URI structure imposes unacceptable limits on our ability to express the names we need.  I'd like to go to a tagged structure based on a more informed understanding of what a CPE is.  At the end of the day, I think CPE should be about names for installable software, legal relationships between those names, and managing a body of community content that can be used to derive the consensus name for any given product--so we can build interoperable tools.

Based on my experience in the DoD over the past 2.5 years, I'm thinking CPE should break away from the URI structure and go to a tagged structure that allows users to populate just the elements they want to communicate.  I'm also thinking CPE should only address installable software inventories and not try to differentiate between OS and applications.  I don't think vendor is a good base for products, particularly given the open source community and the potential to have a single product distributed by multiple vendors.
I think product is a better base.  I'm also wondering if any type of transport (URI/XML/CSV/JSON.. whatever) really has any business prescribing how a given vendor must use the text strings, titles, and other information tracked in CPE, so I'm thinking it may not have any business being part of the standard.

I'm attaching a UML diagram of how I'm thinking an ontology for single CPE might look.  I'm still trying to determine if there's a dependency between version and update, but I'm almost completely sure there isn't a dependency between edition and version, or between edition and update.  I'm also thinking we may need to have subordinate elements of version that break out major, minor, and sub-minor version information.

I also think deprecation should take place on a per-component basis.  Also that there's no guarantee of uniqueness for the text names of CPE components, so they should be assigned unique identifiers, which should be the basis for managing deprecation.

Let me know what you think.  If we agree on this, we can put an "any" tag at the end of the standard CPE Component element list and other standards can expand on cpe core data by bringing in data elements that address function, family, hash, etc.

That said, the CPE forum is meant to be consensus driven, so I'll bow to the collective wisdom of contributors to the list.

Also attached an xml schema that implements the ontology as an XML language.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office 9800 Savage Rd Ste 6767 Ft Meade, MD 20755-6767 Commercial 410-854-5401 DSN 244-5401 Fax 410-854-6700

-----Original Message-----
From: Buttner, Drew [mailto:[hidden email]]
Sent: Wednesday, March 04, 2009 2:07 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] CPE Future Vision

All,

Version 2.0 of the CPE Specification was released on September 14th, 2008.
At that time the community wanted to see CPE go through a stabilization period and have the community attempt to use the specification in order to get a better feel for future direction.  The past year has seen a lot of conversation within the community about possible direction with some different ideas about the best future path to take.

I wanted to start discussion on the future vision for CPE.  In the very near term we have a new minor release (2.2) scheduled to be official on March 11.
There is also a huge push currently ongoing to clean-up the Official CPE Dictionary.  But where do we go after that?  There is a lot in this email, for that I apologize.  Hopefully some of these points can sparks some discussion as your views will help us better understand where CPE needs to go.

Questions below:

- What are we enumerating?
- Is software inventory THE target technical use case?
- Should the CPE Language be removed?
- What should we do with CPE Matching?
- Should we keep the URI?

-------------------------

By name CPE is an enumeration.  Probably the biggest question to be answered is what are we enumerating?  CPE currently is about enumerating platform types, but this has proved to be a very broad term, and CPE has struggled to address what a platform type is.  More on this in a second.

Based on the research accomplished this past year regarding technical use cases, one option would be for CPE to focus on the software inventory technical use case.  This seems to be the single use case that is shared across all members of the CPE Community.  By narrowing our focus, we can hopefully deliver a solution that works for those users and not get bogged down trying to support fringe cases. Agree?

The software inventory technical use case calls for enumerating platform types based on the underlying software products (either operating systems or applications).  What is a software product?  This could be defined using the following characteristics:

* A user can download or buy it.
* There is a vendor/organization that produces it.
* An enterprise IT administrator can push it out over the enterprise network and install it into their environment.
* It is (or can be) recorded by an asset management tool.

In other words, every CPE Name should have at its root a software product.
CPE would not try to name web pages, code libraries, functional types, etc.
These areas are still important and we as a community need to address them.
The suggestion however is to address them with their own enumerations and enable CPE to focus on its core mission.  A movement toward multiple enumerations brings to light the need for a good expression language to tie everything together and make more complex statements.  This in a way relates to the goal of the CPE Language.  The CPE Language is currently under-used (if at all) and really goes against the idea of simplifying CPE.  Should this be removed from CPE and stood up on its own or merged into an existing initiative?  Thoughts?

As we address the questions above, CPE might need to evolve to meet the technical challenges encountered and to try and solve the issues that have been experienced in version 2. Some of the ideas that have been brought up in the past:

- don't make any major changes, even with its issues CPE is working well enough, focus on some minor tweaking

- the URI is a major problem as the terms used are not permanent (e.g. product name changes) and are not consistent (e.g. 'windows_2003_server' and 'window_server_2003'), thus we should move away from the URI and switch to a numerical id

- matching is the root of CPE's issues (e.g. the version component) and needs to either be removed or completely rewritten (can we leverage an ontology?)

- CPE should not try to be an enumeration but rather should be an expression enabling a user to talk about some combination of vendor, product, and version related to a target
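
To make the numerical-id idea above concrete, here is a minimal sketch, not part of any CPE specification. It assumes a lookup table in which several textual names resolve to one stable id; the `ALIASES` table, the `resolve` helper, the id value 1001, and the `o:microsoft` prefix are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical alias table: inconsistent textual names map to one stable id.
# The id 1001 and the "o:microsoft" prefix are illustrative assumptions.
ALIASES = {
    "cpe:/o:microsoft:windows_2003_server": 1001,
    "cpe:/o:microsoft:window_server_2003": 1001,
}


def resolve(name: str) -> Optional[int]:
    """Return the numeric id for a textual name, or None if it is unknown."""
    return ALIASES.get(name.strip().lower())


if __name__ == "__main__":
    print(resolve("cpe:/o:microsoft:windows_2003_server"))
    print(resolve("cpe:/o:microsoft:window_server_2003"))
```

With a table like this, a product rename only adds an alias row; anything keyed on the id would not need to change.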

What is your reaction to the ideas above?  Are there other ideas that need to be considered?  Over the next few weeks I will be putting together a proposal for where to go with CPE and what changes should be considered, but I need your input to make sure that Version 3 is a long-term success.  I thank you in advance for any help you can provide on steering this ship.

Thanks
Drew

---------

Andrew Buttner
The MITRE Corporation
[hidden email]
781-271-3515

CPE-core.owl (18K) Download Attachment

Re: CPE Future Vision

Wolfkiel, Joseph
I'd like some feedback on this.  

In general, I consider the ontology discussion a human planning stage to
determine exactly what we want to represent in XML, JSON, CSV, URI, or
whatever transport we want to use.  UML Class diagrams or E-R diagrams are
great ways of representing ontological relationships that humans can
understand, debate, and agree on, then easily implementable in XML schema or
Relational Databases.

OWL and RDF are intended (in my understanding-and consistent with your
explanation) to support machine understanding of data relationships in an
attempt to allow machines to "reason" and make "inferences" with the data.
I'm under the impression that most of the vendors and consumers in the CPE
market space aren't planning to do anything with the ontological discussions
we're having other than to ensure the data structures they/we build are able
to represent the data properly (i.e. by developing appropriate XML schemas
or database table structures).  That's the intent of my internal developers.

However, if there is a large demand in the CPE community to express CPE
ontological "knowledge" in RDF and OWL so it is machine-consumable, then by
all means let's go there.

Of course, not being a machine, I'll still want to see it in UML class
diagrams or E-R diagrams.

The request I'd like to make then, is: "If you're a vendor or consumer of
CPE data and you plan to, or would like to use machine-consumable
ontological data in the form of RDF/RDFS/OWL please share that information
with the list."

Of course, if I don't understand the use/value of the RDF/RDFS/OWL or the
way different vendors/consumers would use it, I would like to have that
information too.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Tim Keanini [mailto:[hidden email]]
Sent: Thursday, March 05, 2009 6:21 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision

If we are really talking about using an ontological approach, then I
strongly recommend that we look at representing this domain in RDF/RDFS and
maybe OWL although for our purposes RDFS might suffice.

If we are just talking about tagging and adding facets to the data, then XML
Schema is all we need and I'm all for using the right tool for the job.  Let
me make a bet that if we don't make this move now to RDF/RDFS/OWL, we will
be kicking ourselves in a year or less.

Attached is your .xsd as represented in OWL-full.  Again, we don't need to
use OWL-full, and the beauty is that we can use only enough OWL as is needed
to model the domain.  If you have a tool like Protégé 4 or TopBraidComposer,
you can open the .owl file I have attached.

So what?  What is so special about the owl versus the xsd representation?  
Once in RDFS or OWL, we would not only be able to assert RDF triples but
also infer them.  
Inference is the force multiplier, because anyone who thinks humans are able
to perform all the assertions to continuously model this complex and
changing domain is fooling themselves.  Don't get me started here; let me
just end with: we should be inferring vulnerabilities and higher
order concepts, NOT asserting them.

What is the unique value in an ontological representation such as
RDF/RDFS/OWL?
1) RDF will finally allow us to model using a graph
2) RDFS affords us ontological modeling: features that allow us to manage
type constraints, instance and class attributes, subclassof type
propagation, binary and n-ary relationships, relation hierarchies, etc
3) Beyond RDFS, we may need these OWL features: disjoint-decomposition,
cardinality constraints, binary functions, and we could use all or some of
OWL on an as-needed basis.
4) when it comes time to tie it all together (CPE, CWE, CCE, etc) or just
some of them, this type of federation is simple and easy to manage if we are
modeled at the RDFS/OWL level.

I can go on and on about the benefits of using these higher level W3C
standards for our purposes but I'll just leave it at that.  Forgive me for
being so passionate about this topic but I probably have more scar tissue
than most on this topic.

--tk

Timothy D. Keanini Sr., CTO    nCircle Network Security
Office: +1 (415) 625-5939
www.ncircle.com
blog.ncircle.com

-----Original Message-----
From: Wolfkiel, Joseph [mailto:[hidden email]]
Sent: Thursday, March 05, 2009 2:50 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision

I'm somewhat bummed that I haven't seen any discussion on this issue.

I'll go ahead and try to link it to earlier discussions about ontologies.

Basically, I think the URI structure imposes unacceptable limits on our
ability to express the names we need.  I'd like to go to a tagged structure
based on a more informed understanding of what a CPE is.  At the end of the
day, I think CPE should be about names for installable software, legal
relationships between those names, and managing a body of community content
that can be used to derive the consensus name for any given product--so we
can build interoperable tools.

Based on my experience in the DoD over the past 2.5 years, I'm thinking CPE
should break away from the URI structure and go to a tagged structure that
allows users to populate just the elements they want to communicate.  I'm
also thinking CPE should only address installable software inventories and
not try to differentiate between OS and applications.  I don't think vendor
is a good base for products, particularly given the open source community
and the potential to have a single product distributed by multiple vendors.
I think product is a better base.  I'm also wondering if any type of
transport (URI/XML/CSV/JSON.. whatever) really has any business prescribing
how a given vendor must use the text strings, titles, and other information
tracked in CPE, so I'm thinking it may not have any business being part of
the standard.

I'm attaching a UML diagram of how I'm thinking an ontology for a single CPE
might look.  I'm still trying to determine if there's a dependency between
version and update, but I'm almost completely sure there isn't a dependency
between edition and version, or between edition and update.  I'm also
thinking we may need to have subordinate elements of version that break out
major, minor, and sub-minor version information.

I also think deprecation should take place on a per-component basis.  Also
that there's no guarantee of uniqueness for the text names of CPE
components, so they should be assigned unique identifiers, which should be
the basis for managing deprecation.
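
A minimal sketch of the tagged structure described above, under stated assumptions: the class names `Component` and `TaggedName`, the `deprecated_by` field, and the `current` helper are hypothetical and are not taken from the attached UML diagram or schema. It only illustrates product as the base element, optional tags, and deprecation tracked per component by unique identifier.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Component:
    uid: int                             # unique identifier, independent of the text
    text: str                            # human-readable name; not guaranteed unique
    deprecated_by: Optional[int] = None  # uid of the replacement component, if any


@dataclass(frozen=True)
class TaggedName:
    product: Component                   # product is the base element
    vendor: Optional[Component] = None   # every other tag is optional
    version: Optional[Component] = None
    update: Optional[Component] = None
    edition: Optional[Component] = None

    def populated(self) -> Dict[str, str]:
        """Return only the tags the user chose to communicate."""
        return {tag: c.text for tag, c in vars(self).items() if c is not None}


def current(component: Component, registry: Dict[int, Component]) -> Component:
    """Follow per-component deprecation links to the live component."""
    seen = set()
    while component.deprecated_by is not None and component.uid not in seen:
        seen.add(component.uid)
        component = registry[component.deprecated_by]
    return component
```

Because deprecation hangs off the component's identifier rather than its text, two components with the same text can be deprecated independently.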

Let me know what you think.  If we agree on this, we can put an "any" tag at
the end of the standard CPE Component element list and other standards can
expand on cpe core data by bringing in data elements that address function,
family, hash, etc.

That said, the CPE forum is meant to be consensus driven, so I'll bow to the
collective wisdom of contributors to the list.

Also attached is an XML schema that implements the ontology as an XML language.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office 9800 Savage Rd Ste 6767 Ft Meade, MD 20755-6767 Commercial
410-854-5401 DSN 244-5401 Fax 410-854-6700


Re: CPE Future Vision

Booth, Harold
I apologize for the length of this response but please bear with me.

I feel that perhaps a discussion regarding what CPE 3.0 would look like
without a full discussion of the use cases we wish to support as a community
is perhaps premature.  Drew's initial message on this thread touched on what
should be the primary technical use cases and I would like to add my
thoughts on this topic.

The first use case is the ability to unambiguously identify a product.  In
the most common context this would be the ability to identify a software
product which is installed on a system, or to use this unambiguous
identification as a means to communicate about this product either between
a human and a machine or between two machines.  To be clear, by unambiguous
identification I mean that the name I provide cannot be used to identify
another product.  I don't think it matters whether multiple names could be
used to unambiguously identify a product; I believe it only matters that each
name identifies that product and that product alone.  If we wish, we can
allow co-existence of multiple names, mark one name as preferred, or mark
all but one name as deprecated.  I think it does not matter: as long as
everyone ends up with the same semantic product, the rest is syntax.  I believe
this use case is equivalent to the software inventory technical use case
identified in Drew's message.

The second use case is the ability to associate product information with
other entities (CVE, CCE, Checklists/XCCDF, Other CPEs, etc...).  The type
of product information we wish to associate with other entities is not
necessarily limited in scope.  By this I mean that the specification should
not attempt to bound what product information (domains) can be used, or even
what the range of values for a domain can be.  A specification could (and
probably should) provide a means for software to discover what the current
domains and ranges are.  Examples of the product information would include
the data currently encoded in the components of the current cpe
specification (part, vendor, product, version, update, edition, and
language), but would be expanded to include other areas of interest.
Additional examples of product information would be category type (product
"is a") information like "is a" webserver, "is a" database, or "is an"
operating system.  For products which bundle or include other software (like
operating systems), another example is which products are distributed with
the primary product.  Finally, with respect to versions and updates, there
should be the ability to make statements about version/update ranges of
software.  The types of statements that could be made would be:
   This checklist applies to all webservers
   This vulnerability applies to FooProd versions 3.4 to 5.2
   The product Bar 3.3 distributes Foo 2.2.1
   This vulnerability applies to all versions of BarProd prior to 5.4.5
   This vulnerability applies to any software which distributes Foo 2.24
   This vulnerability applies to Foo 1.4 running on Bar 3.5 and Bar 4.2 update 4
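
The version-range statements above could be evaluated along these lines. This is a rough sketch that assumes purely numeric, dot-separated versions, which real product versions often are not; `parse_version` and `in_range` are hypothetical helper names.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Split a dotted numeric version such as '5.4.5' into integers."""
    return tuple(int(part) for part in text.split("."))


def _compare(a: str, b: str) -> int:
    """Return -1, 0 or 1; missing trailing fields count as zero (5.2 == 5.2.0)."""
    x, y = parse_version(a), parse_version(b)
    width = max(len(x), len(y))
    x += (0,) * (width - len(x))
    y += (0,) * (width - len(y))
    return (x > y) - (x < y)


def in_range(version: str, low: Optional[str] = None,
             high: Optional[str] = None, high_inclusive: bool = True) -> bool:
    """True if version lies within [low, high], or [low, high) when exclusive."""
    if low is not None and _compare(version, low) < 0:
        return False
    if high is not None:
        c = _compare(version, high)
        if c > 0 or (c == 0 and not high_inclusive):
            return False
    return True
```

"FooProd versions 3.4 to 5.2" would then be `in_range(v, "3.4", "5.2")`, and "all versions of BarProd prior to 5.4.5" would be `in_range(v, high="5.4.5", high_inclusive=False)`.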

A third use case joins the first and second use cases.  Assume a set of
entities has been tagged with product information; then, given an
unambiguous product name, determine which entities from the tagged set are
applicable to the given product.

A fourth use case would be something like product discovery.  Some pieces of
information about a piece of software have been determined, but not enough to
make a conclusive identification.  This information is processed against the
current set of product information and a set of matching products is
identified, perhaps with the goal of conducting further tests to make a
positive identification.

I can also see the third and fourth use cases combined to return all of the
possible entities which apply given some set of product information.
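
As an illustration of the product discovery use case, the sketch below filters a small product table by whatever attributes have been observed so far. The `PRODUCTS` table, its identifiers and attribute values, and the `candidates` function are all made up for the example.

```python
from typing import Dict, List

# Hypothetical product information keyed by plain identifiers.
PRODUCTS: Dict[str, Dict[str, str]] = {
    "CPE-0001": {"vendor": "examplecorp", "product": "foo", "version": "2.24"},
    "CPE-0002": {"vendor": "examplecorp", "product": "foo", "version": "1.4"},
    "CPE-0003": {"vendor": "othercorp", "product": "bar", "version": "3.5"},
}


def candidates(observed: Dict[str, str]) -> List[str]:
    """Return ids of products consistent with every observed attribute."""
    return sorted(
        pid for pid, attrs in PRODUCTS.items()
        if all(attrs.get(key) == value for key, value in observed.items())
    )


if __name__ == "__main__":
    # Only the product name is known so far, so two candidates remain.
    print(candidates({"product": "foo"}))
```

Each further test that pins down another attribute shrinks the candidate set, until one identifier remains or none does.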

These are just the use cases I am currently dealing with and there are
others I have seen mentioned on this list that I have not touched on.

To respond to Drew's comment about the CPE Language, the NVD is using it
along with the matching algorithm to associate CPEs to CVEs as a means to
determine applicability in order to satisfy the second and third use cases I
identified above.  An issue we have experienced, though, is that the names
of the products must be normalized correctly in order for the matching
algorithm to work.  I believe decoupling the matching algorithm from the
name of the product would alleviate the pain of coming up with a normalized
name and allow for a wider range of matching possibilities to better satisfy
the second use case.

As for a solution to all of my use cases I tend to believe at the moment
that a CPE with a numerical ID with matching handled through an ontological
solution (RDFS/OWL) would be the best way forward, but I say that with only
my use cases in mind.

At NIST we have been exploring creating an ontology for CPE data that would
allow us to better support the use cases described above.  Our current
modeling efforts have been focused on RDFS because we have not yet found the
need for the additional power (and overhead) of OWL.

Based on our current efforts we feel that the first use case can be
trivially satisfied without the use of any technological solution by the use
of a plain identifier (i.e. CPE-0001).  An ontological model could just use
this identifier as part of a URI uniquely identifying the product.  The URI
in the ontological model would not be the CPE name.

The compositional nature of RDF/RDFS/OWL provides the capabilities to
satisfy the second use case of associating product information with other
entities.  These entities can be modeled outside the scope of CPE, but a CPE
ontology would describe how to make these associations.  This would allow us
to focus all of our attention on our problem domain of capturing all
information related to a unique product while allowing others to focus on
their domains, such as vulnerabilities and checklists.

As TK has mentioned in a previous message, the open world assumption of
RDF/RDFS allows for the expansion of the model in a way that does not change
the underlying schema of the data, since all data is represented in RDF
triples.

The three main benefits we see with using an ontological model are:

1. The inferencing capabilities which would allow us to correctly return
that Bar 3.4 is affected by Vulnerability CVE-2010-0003 given:
Bar 3.4 distributes Foo 2.24
Vulnerability CVE-2010-0003 applies to any software which distributes Foo
2.24

2. The schemaless design allowing for the addition of new relationships over
time without necessitating database schema changes.

3. Ability to query multiple data sets across a federated network since all
data would be represented in the RDF triple format.
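
The inference in benefit 1 can be sketched in plain Python over subject/predicate/object triples. This is only an illustration of the idea, not an RDFS or OWL reasoner; the predicate names `distributes`, `appliesToDistributorsOf`, and `affects` are assumptions invented for the example.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Asserted facts, written as (subject, predicate, object) triples.
ASSERTED: Set[Triple] = {
    ("Bar 3.4", "distributes", "Foo 2.24"),
    ("CVE-2010-0003", "appliesToDistributorsOf", "Foo 2.24"),
}


def infer_affected(triples: Set[Triple]) -> Set[Triple]:
    """Derive (vulnerability, 'affects', product) triples from the rule:
    V appliesToDistributorsOf X and P distributes X implies V affects P."""
    inferred: Set[Triple] = set()
    for vuln, pred, target in triples:
        if pred != "appliesToDistributorsOf":
            continue
        for product, pred2, bundled in triples:
            if pred2 == "distributes" and bundled == target:
                inferred.add((vuln, "affects", product))
    return inferred


if __name__ == "__main__":
    print(infer_affected(ASSERTED))
```

Nobody asserted that Bar 3.4 is affected; that triple exists only because the rule derived it from the two asserted facts.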

These are our current thoughts regarding CPE; we would appreciate any
comments or suggestions.

Thanks,

-Harold

Re: CPE Future Vision

Tim Keanini
In reply to this post by Andrew Buttner

>>>>>>>>>>>>>>>>>
>>In general, I consider the ontology discussion a human planning stage to determine exactly what we want
>>to represent in XML, JSON, CSV, URI, or whatever transport we want to use.  UML Class diagrams or E-R
>>diagrams are great ways of representing ontological relationships that humans can understand, debate,
>>and agree on, then easily implementable in XML schema or Relational Databases.
>>
>>OWL and RDF are intended (in my understanding-and consistent with your
>>explanation) to support machine understanding of data relationships in an attempt to allow machines to
>>"reason" and make "inferences" with the data.
>>I'm under the impression that most of the vendors and consumers in the CPE market space aren't planning
>>to do anything with the ontological discussions we're having other than to ensure the data structures
>>they/we build are able to represent the data properly (i.e. by developing appropriate XML schemas or
>>database table structures).  That's the intent of my internal developers.
>>>>>>>>>>>>>>>>>

 

Unfortunately the term ontology is “semantically” overloaded, and this clarification is helpful. I now understand your ‘human planning stage’ perspective, so thank you.

 

Our design principle at this standards level is to ensure as late a binding as possible.  The attraction of a core representation in RDF/RDFS/OWL is that the source material allows for “as late” a transformation (late binding) as possible to a form that is appropriate for consumption/production by humans and/or machines.

 

To your impressions of vendors and consumers in this CPE market space: the unfortunate situation is that customers care about multi-vendor interoperability much more than vendors do, yet they have no means to articulate this in a technical manner; vendors are biased toward lock-in and technical differentiation and for the most part have not seen multi-vendor interoperability as a priority for which to commit resources.  Sure, you can say that customers can make their demands with their wallets, but remember, vendors are driven by market dynamics and not individual opportunities – at the very least, it is a slow and painful process.  This is a long way of saying that waiting for vendors to lead this process is going to be painful, and translating consumers' needs to RDF/RDFS/OWL is nowhere near obvious.

 

>>>>>>>>>>>>>>>>>
>>However, if there is a large demand in the CPE community to express CPE ontological "knowledge" in RDF and OWL so it is machine-consumable, then by all means let's go there.
>>
>>Of course, not being a machine, I'll still want to see it in UML class diagrams or E-R diagrams.
>>>>>>>>>>>>>>>>>

 

The awkwardness of this discussion is in the fact that at the surface, CPE is all about producing a name or a unique identifier for the thing named.  Just give me the damn name and let me be on my way.  :)

Through all the fits and starts of CPE, I think we can all agree that this is necessary but not sufficient.

Not only do we need globally unique identifiers for the things being named, but we also need the same formalism for the relationships between the things being named.  The ‘R’ in RDF is for resource; before we go about inventing another way to describe resources, we must consider RDF (and thus RDFS and OWL).

 

Please note that this discussion is about the content capabilities of the formalism we choose (XML, RDF, RDFS and OWL).  Even after we choose this, there is still hard work to be done (i.e., we can still screw up).

I’m asking that we move up the semantic stack slowly and carefully so that we can have more firepower to deal with problems we face today and ones that we will face in the future.

 

The requirement is that it be in a form that serves both humans and computers; we cannot compromise one for the other.  My personal feeling is that if we make this investment now, we will be creating many more capability cases for the future.  But to do this, we need to lead.

 

>>>>>>>>>>>>>>>>>
>>The request I'd like to make then, is: "If you're a vendor or consumer of CPE data and you plan to, or would like to use machine-consumable ontological data in the form of RDF/RDFS/OWL please share that information with the list."
>>
>>Of course, if I don't understand the use/value of the RDF/RDFS/OWL or the way different vendors/consumers would use it, I would like to have that information too.
>>>>>>>>>>>>>>>>>

 

The value-prop question for a consumer of RDF/RDFS/OWL is going to be hard to answer unless the explanation includes the entire value chain.  I'll take a shot at this from a few perspectives.

 

To the market (many consumers) in general, value can materialize as:

* Multi-vendor interoperability beyond that of syntax.
* Richer Service Oriented Descriptors

To the content author, value can materialize as:

* More precision in modeling the objects AND their relationships
* Less dependency on a _single_ authoritative structure (federation is built-in)

To the vendor community, it can materialize as:

* Multi-vendor interoperability means less friction for consolidation (and lord knows we need to consolidate!)
* More innovation with less of a barrier to entry && new markets

 

I’ve said too much already so I’ll stop.

Thanks for the discussion. 

 

--tk

Timothy D. Keanini Sr., CTO    nCircle Network Security
Office: +1 (415) 625-5939
www.ncircle.com

blog.ncircle.com

 


Re: CPE Future Vision

Andrew Buttner
Administrator
In reply to this post by Booth, Harold
>-----Original Message-----
>From: Harold Booth [mailto:[hidden email]]
>Sent: Friday, March 06, 2009 10:18 AM


>The first use case is the ability to unambiguously identify a product.
>In the most common context this would be the ability to identify a
>software product which is installed on a system.  Or to use this
>unambiguous identification as a  means to communicate about this
>product either between a human and a machine or between two machines.
>To be clear, by unambiguous identification I mean that the name I
>provide cannot be used to identify another product.

Right on.  I think this aligns with what we have heard and have termed the software inventory technical use case.  In other words ... a product vendor uses CPE Names to tag data elements within their product's data model.




>The second use case is the ability to associate product information with
>other entities (CVE, CCE, Checklists/XCCDF, Other CPEs, etc...).

I agree that this is a valid and needed use case within the community, but is this outside the scope of CPE?  Should this be left up to some other initiative that enables a user to leverage enumerated entities and build a complex description of something?  Should CPE focus on the task of enumerating the different software products and avoid the task of associating those entities with other enumerations?  For example, the statement "Windows XP has vulnerability CVE-1234" is not something that CPE should maintain, right?




>A third use case joins the first and second use cases.  Assume a set of
>entities have been tagged with product information and then given an
>unambiguous product name to determine what entities from the tagged set
>are applicable to the given product.

This is the matching use case, right?  In other words, given two CPE Names, we want to see if one represents a software product that is a subset of the other.  Do others agree that this is something that CPE wants to continue to support?  Maybe the better question is how should CPE support this?  Our discussion on ontologies is a possible direction for CPE in this area.
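
For reference, the subset test described above can be sketched as a component-wise comparison of two URI-style names, where an empty component in the more general name matches anything. This is a simplified illustration, not the official matching algorithm: it ignores percent-encoding and any special handling of versions, and `components` and `covers` are hypothetical helper names.

```python
from typing import List

PREFIX = "cpe:/"
FIELDS = 7  # part, vendor, product, version, update, edition, language


def components(name: str) -> List[str]:
    """Split a URI-style name into its fields, padding missing ones with ''."""
    if not name.startswith(PREFIX):
        raise ValueError("not a cpe:/ name: " + name)
    parts = name[len(PREFIX):].split(":")
    return parts + [""] * (FIELDS - len(parts))


def covers(general: str, specific: str) -> bool:
    """True if every non-empty field of `general` equals the same field
    of `specific`, i.e. `specific` names a subset of `general`."""
    return all(
        g == "" or g == s
        for g, s in zip(components(general), components(specific))
    )


if __name__ == "__main__":
    print(covers("cpe:/o:microsoft:windows_xp",
                 "cpe:/o:microsoft:windows_xp::sp2"))
```

The comparison is purely textual, so it works only if both names spell each component identically, which is the normalization problem raised earlier in the thread.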




>A fourth use case would be something like product discovery.  Some
>pieces of information about a software have been determined but not
>enough to make a conclusive identification.  This information is
>processed against the current set of product information and a set of
>matching products are identified perhaps with the goal to conduct
>further tests to make a positive identification.

Again, our ontology discussion could help here.  I think from the rest of your email that you agree with this.





>As for a solution to all of my use cases I tend to believe at the moment
>that a CPE with a numerical ID with matching handled through an
>ontological solution (RDFS/OWL) would be the best way forward, but I say
>that with only my use cases in mind.

I'm interested to hear counters to this idea.  Especially the move away from the URI and toward a numerical id.  Does pushing the information needed for matching into RDF make things easier or harder for users of CPE?  I will try to explore this concept some more and see if it can help solve some of the current unresolved issues with CPE.


Thanks
Drew

Re: CPE Future Vision

Ernest Park-2


On Mon, Mar 9, 2009 at 8:20 AM, Buttner, Drew <[hidden email]> wrote:
>-----Original Message-----
>From: Harold Booth [mailto:[hidden email]]
>Sent: Friday, March 06, 2009 10:18 AM


>The first use case is the ability to unambiguously identify a product.
>In the most common context this would be the ability to identify a
>software product which is installed on a system.  Or to use this
>unambiguous identification as a  means to communicate about this
>product either between a human and a machine or between two machines.
>To be clear, by unambiguous identification I mean that the name I
>provide cannot be used to identify another product.

Right on.  I think this aligns with what we have heard and have termed the software inventory technical use case.  In other words ... a product vendor uses CPE Names to tag data elements within their product's data model.

I believe that this cannot happen in a name. A single product is an amalgam of other products. With the prevalence of open source, this is increasing rapidly. There are solutions that will identify and inventory what is discovered within an otherwise unknown set of software.

The trick is to have output formatted in a way that is useful to other applications. CPE output would allow additional reporting and policy creation with access only to the data output in CPE format.






>The second use case is the ability to associate product information with
>other entities (CVE, CCE, Checklists/XCCDF, Other CPEs, etc...).

I agree that this is a valid and needed use case within the community, but is this outside the scope of CPE?  Should this be left up to some other initiative that enables a user to leverage enumerated entities and build a complex description of something?  Should CPE focus on the task of enumerating the different software products and avoid the task of associating those entities with other enumerations?  For example, the statement "Windows XP has vulnerability CVE-1234" is not something that CPE should maintain, right?

What??? CPE is one of the main data sources for SCAP and it is the one unifying element around which the other data elements tie together. Without reliable names associated to additional information - OVAL, CCE, CVE, XCCDF, etc, then SCAP is just a wish list.
 



>A third use case joins the first and second use cases.  Assume a set of
>entities have been tagged with product information and then given an
>unambiguous product name to determine what entities from the tagged set
>are applicable to the given product.

This is the matching use case, right?  In other words, given two CPE Names, we want to see if one represents a software product that is a subset of the other.  Do others agree that this is something that CPE want to continue to support?  Maybe the better question is how should CPE support this?  Our discussion on ontologies is a possible direction for CPE in this area.




>A fourth use case would be something like product discovery.  Some
>pieces of information about a software have been determined but not
>enough to make a conclusive identification.  This information is
>processed against the current set of product information and a set of
>matching products are identified perhaps with the goal to conduct
>further tests to make a positive identification.

Again, our ontology discussion could help here.  I think from the rest of your email that you agree with this.





>As for a solution to all of my use cases I tend to believe at the moment
>that a CPE with a numerical ID with matching handled through an
>ontological solution (RDFS/OWL) would be the best way forward, but I say
>that with only my use cases in mind.

I'm interested to hear counters to this idea.  Especially the move away from the URI and toward a numerical id.  Does pushing the information needed for matching into RDF make things easier or harder for users of CPE?  I will try to explore this concept some more and see if it can help solve some of the current unresolved issues with CPE.


Thanks
Drew


Re: CPE Future Vision

Andrew Buttner
Administrator
In reply to this post by Tim Keanini
Response to the following below ...



>The awkwardness of this discussion is in the fact that at the surface,
>CPE is all about producing a name or a unique identifier for the thing
>named.  Just give me the damn name and let me be on my way.  :-)
>
>Through all the fits and starts of CPE, I think we can all agree that
>this is necessary but not sufficient.
>
>Not only do we need globally unique identifiers for the things being
>named, but we also need the same formalism for the relationships between
>the things being named.  The 'R' in RDF is for resource; before we go
>about inventing another way to describe resources, we must consider RDF
>(and thus RDFS and OWL).
>
>Please note that this discussion is about the content capabilities of
>the formalism we choose (XML, RDF, RDFS and OWL).  Even after we choose
>this, there is still hard work to be done. (ie. we can still screw up)
>
>I'm asking that we move up the semantic stack slowly and carefully so
>that we can have more firepower to deal with problems we face today and
>ones that we will face in the future.
>
>The requirement is that it be in a form that facilitates both human and
>computers; we cannot compromise one for the other.  My personal feeling
>is that if we make this investment now, we will be creating much more
>capabilities-cases for the future.  But to do this, we need to lead.



I think the above sums up the current state of CPE very well.  Basically, we have realized that even for the one shared use case we need more than just a common name, and more than what the current matching algorithm can provide.

We have on the table the following idea to consider:

- create vocabularies (numerical based) to enumerate the 'things'
- use RDF Schema to describe the types of info associated with 'things'
- use RDF instance files to represent actual info about the 'things'
- create different views (UML, etc) into the 'things' to support users

We need to hear from others in the community if they think this move away from a URI might have a detrimental effect on their use of CPE and why.  Or might this move have only a positive impact on all of our work?

In the meantime, I will try to put together a more thorough example of the RDF idea so that everyone can take a closer look at how it might be used.

Thanks
Drew

Re: CPE Future Vision

Andrew Buttner
Administrator
In reply to this post by Ernest Park-2
>>>The second use case is the ability to associate product information with
>>>other entities (CVE, CCE, Checklists/XCCDF, Other CPEs, etc...).

>>I agree that this is a valid and needed use case within the community,
>>but is this outside the scope of CPE?  Should this be left up to some
>>other initiative that enables a user to leverage enumerated entities
>>and build a complex description of something?  Should CPE focus on the
>>task of enumerating the different software products and avoid the task
>>of associating those entities with other enumerations?  For example,
>>the statement "Windows XP has vulnerability CVE-1234" is not something
>>that CPE should maintain, right?

>What??? CPE is one of the main data sources for SCAP and it is the one
>unifying element around which the other data elements tie together.
>Without reliable names associated to additional information - OVAL,
>CCE, CVE, XCCDF, etc, then SCAP is just a wish list.


Sorry, what I meant was that CPE shouldn't try to create all the different associations.  CPE SHOULD make the associations possible by correctly enumerating the platform (or software product) space.  CPE needs to focus on the enumeration and let another initiative stand up the cross associations between all the different standards.  Agree?

Thanks
Drew

Re: CPE Future Vision

Tim Keanini
In reply to this post by Andrew Buttner
[Harold Booth]
>>The second use case is the ability to associate product information with
>>other entities (CVE, CCE, Checklists/XCCDF, Other CPEs, etc...).

[Drew wrote this in response to Harold Booth's post:]
>I agree that this is a valid and needed use case within the community,
>but is this outside the scope of CPE?  Should this be left up to some
>other initiative that enables a user to leverage enumerated entities
>and build a complex description of something?  Should CPE focus on the
>task of enumerating the different software products and avoid the task
>of associating those entities with other enumerations?  For example,
>the statement "Windows XP has vulnerability CVE-1234" is not something
>that CPE should maintain, right?

This content interoperability goal must be valid not only for technology
external to SCAP but also within SCAP.  We cannot afford to be building
more and more silos without the means to put Humpty Dumpty back together
again.
This is why choosing the presentation layer is so important.  For instance,
the choice of XML Schema allowed us to not have to be concerned with
serialization and data-types; CPE did not solve this, XML Schema did.
We are now at the point where our needs have gone beyond that which
XML Schema can provide and instead of inventing this on our own, RDF is
our strongest candidate for the job. (and one of the ways to encapsulate
RDF is XML)

While the statement "Windows XP has vulnerability CVE-1234" is not
appropriate within the domain ontology of CPE, these RDF statements are
valid:

cpe:Windows_XP rdf:type  rdfs:Class .
cpe:Operating_System rdf:type rdfs:Class .
cpe:Windows_XP rdfs:subClassOf cpe:Operating_System .

This might not be how we would choose to model it but this is an example of
'type propagation' because now if we asserted the RDF triple:
cpe:WinXP-SP2 rdf:type cpe:Windows_XP .
we could then infer the RDF triple:
cpe:WinXP-SP2 rdf:type cpe:Operating_System .
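
The inference above can be sketched in a few lines of Python without any RDF library. This is only an illustration under simple assumptions: triples are plain tuples of strings, and the single rule applied is the RDFS subclass rule (rdfs9 in the RDF Semantics), repeated until nothing new appears. The function name infer_types is hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# The three statements from the post, plus the asserted instance triple.
TRIPLES = [
    ("cpe:Windows_XP", RDF_TYPE, "rdfs:Class"),
    ("cpe:Operating_System", RDF_TYPE, "rdfs:Class"),
    ("cpe:Windows_XP", SUBCLASS_OF, "cpe:Operating_System"),
    ("cpe:WinXP-SP2", RDF_TYPE, "cpe:Windows_XP"),
]


def infer_types(triples):
    """Return the triples plus everything implied by type propagation.

    Rule: if (x rdf:type C) and (C rdfs:subClassOf D) then (x rdf:type D).
    """
    known = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for (s, p, o) in known if p == SUBCLASS_OF]
        typed = [(s, o) for (s, p, o) in known if p == RDF_TYPE]
        for instance, cls in typed:
            for sub, sup in subclass:
                new = (instance, RDF_TYPE, sup)
                if cls == sub and new not in known:
                    known.add(new)
                    changed = True
    return known


inferred = infer_types(TRIPLES) - set(TRIPLES)
for triple in sorted(inferred):
    print(" ".join(triple), ".")
```

A real RDFS or OWL reasoner would apply many more rules; the point is only that the inferred triple falls out of the data model rather than out of string matching on names.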

Now in the CVE domain ontology, you can make statements that tie
vulnerabilities to applications or shared libraries or the direct
object of that weakness.  CWE's domain ontology can then establish
relationships with CVE's and so on and so on.  

I'm not trying to shove this W3C technology down anyone's throat.  Given
that the SCAP community had no problem choosing W3C's XML Schema, it would
only make sense that we co-evolve with them.  To this degree, the subject
line should really read: SCAP Future Vision. :-)

--tk

Re: CPE Future Vision

Booth, Harold
In reply to this post by Andrew Buttner
Responses are inline.

> -----Original Message-----
> From: Buttner, Drew [mailto:[hidden email]]
> Sent: Monday, March 09, 2009 8:21 AM
> To: [hidden email]
> Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision
>
> >-----Original Message-----
> >From: Harold Booth [mailto:[hidden email]]
> >Sent: Friday, March 06, 2009 10:18 AM

HB) The first use case is the ability to unambiguously identify a product.
HB) In the most common context this would be the ability to identify a
HB) software product which is installed on a system.  Or to use this
HB) unambiguous identification as a  means to communicate about this
HB) product either between a human and a machine or between two machines.
HB) To be clear, by unambiguous identification I mean that the name I
HB) provide cannot be used to identify another product.

DREW)Right on.  I think this aligns with what we have heard and
DREW)have termed the software inventory technical use case.  In
DREW)other words ... a product vendor uses CPE Names to tag data
DREW)elements within their product's data model.

What do you mean by "tag data elements within their product's data model"?
I am not sure what you are saying here; could you please explain further?

HB)The second use case is the ability to associate product information
HB)with other entities (CVE, CCE, Checklists/XCCDF, Other CPEs, etc...).

DREW)I agree that this is a valid and needed use case within the
DREW)community, but is this outside the scope of CPE?  Should this
DREW)be left up to some other initiative that enables a user to
DREW)leverage enumerated entities and build a complex description
DREW)of something?  Should CPE focus on the task of enumerating
DREW)the different software products and avoid the task of
DREW)associating those entities with other enumerations?  For
DREW)example, the statement "Windows XP has vulnerability
DREW)CVE-1234" is not something that CPE should maintain, right?

I would agree that CPE should not maintain information about what entities
relate to what CPEs.  What I do expect from CPE, or a related specification,
is an agreed upon way to describe a relationship between a CPE and another
entity.  It would be advantageous for the description of this relationship
to be expressed using a common model that can be shared across all entity
types.  This is one half of the matching use case.  Information is
associated with an entity and that information is later used to retrieve the
entity based upon the matching scheme.  A standardized means of representing
this data would facilitate sharing of these relationships amongst the
community without resorting to one-off or proprietary solutions.

HB)A third use case joins the first and second use cases.  Assume a set of
HB)entities have been tagged with product information and then given an
HB)unambiguous product name to determine what entities from the tagged set
HB)are applicable to the given product.

DREW)This is the matching use case, right?  In other words, given
DREW)two CPE Names, we want to see if one represents a software
DREW)product that is a subset of the other.  Do others agree that
DREW)this is something that CPE wants to continue to support?
DREW)Maybe the better question is how should CPE support this?  
DREW)Our discussion on ontologies is a possible direction for CPE
DREW)in this area.

Your scope for the matching use case is too narrow here.  I am not just
interested in whether a product is a subset of another.  I am interested in
querying the associations between CPE data and other entity types.  The
entities could also include other CPEs.  So what would this look like?

Assume there is a repository of CPE data that contains meta-data about CPE
names.
Assume also that I have a collection of other entity types, in this case,
CVE, CCE, and XCCDF which have defined relationships to CPE data.

Here are the questions I need to answer:

Given a CPE Name what Vulnerabilities, Configurations or Checklists apply to
this CPE name?
Given some product meta-data what CPEs match?
Given some product meta-data what Vulnerabilities, Configurations or
Checklists apply?
Given a CPE name what other CPE names distribute this product?
Given a CPE name what other CPE names does this product distribute?

And on and on... The only bound on a query is what data is available.

To reiterate from my previous post, product meta-data could include the
basic information of vendor, product, version, update, edition, and
language, but could also include product md5/sha hashes, categorizations,
skus, target platform, etc...  Basically anything that describes a product
in some way.
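
As a rough illustration of the queries listed above, here is a small in-memory sketch in Python. Everything in it is an assumption made for the example: the Repository class, its method names, the metadata keys, and the entity identifiers (which are placeholders, not real CVE or CCE entries). It is not a proposal for how the data should be stored or exchanged.

```python
class Repository:
    """Toy store of CPE meta-data and associations to other entity types."""

    def __init__(self):
        self.metadata = {}       # CPE name -> dict of product meta-data
        self.associations = []   # (entity_type, entity_id, cpe_name)

    def add_product(self, cpe_name, **metadata):
        self.metadata[cpe_name] = dict(metadata)

    def associate(self, entity_type, entity_id, cpe_name):
        self.associations.append((entity_type, entity_id, cpe_name))

    def entities_for(self, cpe_name):
        """Given a CPE name, which entities apply to it?"""
        return sorted((t, i) for (t, i, c) in self.associations if c == cpe_name)

    def match(self, **criteria):
        """Given some product meta-data, which CPE names match?"""
        return sorted(
            name for name, meta in self.metadata.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )

    def entities_for_metadata(self, **criteria):
        """Given some product meta-data, which entities apply?"""
        found = set()
        for name in self.match(**criteria):
            found.update(self.entities_for(name))
        return sorted(found)


repo = Repository()
repo.add_product("cpe:/a:apache:http_server:1.3.30",
                 vendor="apache", product="http_server", version="1.3.30")
repo.add_product("cpe:/o:microsoft:windows_xp::sp2",
                 vendor="microsoft", product="windows_xp", update="sp2")
# Placeholder identifiers, purely illustrative.
repo.associate("CVE", "CVE-EXAMPLE-1", "cpe:/a:apache:http_server:1.3.30")
repo.associate("XCCDF", "checklist-example-1", "cpe:/o:microsoft:windows_xp::sp2")

print(repo.entities_for("cpe:/a:apache:http_server:1.3.30"))
print(repo.match(vendor="microsoft"))
print(repo.entities_for_metadata(vendor="apache"))
```

Because the meta-data is an open dictionary, hashes, SKUs, or categorizations could be added as further keys without changing the query code, which is the kind of extensibility the name alone does not offer.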

Re: CPE Future Vision

Booth, Harold
In reply to this post by Tim Keanini
To summarize my comments in this thread I would like the following from CPE
or one or more other related specifications:

1.  An unambiguous name to communicate about a product.
2.  A standardized way to associate product meta-data information to the
unambiguous name from 1.
3.  A standardized way to associate meta-data information or unambiguous
names with entities (i.e. CVE, CCE, Checklists/XCCDF, CPEs, etc...).
4.  A standardized way to query against the product meta-data specified in 2
and 3.

The current CPE standard attempts (either explicitly or implicitly) to
address all of these concerns in one way or the other.  The points above are
addressed in the current CPE specification respectively:

1. The existence of a CPE name; unfortunately, it is not always unambiguous.
2. The CPE name encodes some product information into it, but it is not
easily extended.  Associating additional information to the product is
possible, but there is no standardized way to share it.
3. Done through the use of a CPE Name or CPE Language.  No standardized way
to use any additional information not already included in a CPE name.
4. This use case is handled on an ad-hoc basis through CPE matching and the
CPE language, but no standardized way to perform this exists.  Currently
queries are limited in scope to only information contained within the name.

Re: CPE Future Vision

Tim Keanini
I would also add that the current URI scheme is problematic when used in RDF.
I can go into more details but the summary is that the %-encoding introduces a
dangerous level of ambiguity for RDF libraries; and what was meant to be a
human readable string becomes very distorted.  Example:
cpe:/a:apache:http_server:1.3.30
becomes
<cpe:/a%3Aapache%3Ahttp_server%3A1.3.30>
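
The distortion can be reproduced with Python's standard library. This is a small sketch, not a statement about how any particular RDF library behaves: it assumes the part after "cpe:/" is percent-encoded with no characters treated as safe. One possible reading of the ambiguity (my assumption) is that the two spellings decode to the same name but compare unequal as plain strings.

```python
from urllib.parse import quote, unquote

name = "cpe:/a:apache:http_server:1.3.30"

# Keep the "cpe:/" prefix as-is and percent-encode everything after it.
scheme, rest = name[:5], name[5:]
encoded = scheme + quote(rest, safe="")

print(encoded)                   # the colons in the name part become %3A
print(unquote(encoded) == name)  # decoding recovers the original name
print(encoded == name)           # yet the two strings themselves differ
```

Any tool that compares identifiers character by character would treat these as two different resources unless every producer and consumer agrees on one spelling.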

--tk

-----Original Message-----
From: Harold Booth [mailto:[hidden email]]
Sent: Monday, March 09, 2009 1:25 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision

To summarize my comments in this thread I would like the following from CPE
or one or more other related specifications:

1.  An unambiguous name to communicate about a product.
2.  A standardized way to associate product meta-data information to the
unambiguous name from 1.
3.  A standardized way to associate meta-data information or unambiguous
names with entities (i.e. CVE, CCE, Checklists/XCCDF, CPEs, etc...).
4.  A standardized way to query against the product meta-data specified in 2
and 3.

The current CPE standard attempts (either explicitly or implicitly) to
address all of these concerns in one way or the other.  The points above are
addressed in the current CPE specification respectively:

1. The existence of a CPE name; unfortunately, it is not always unambiguous.
2. The CPE name encodes some product information into it, but it is not
easily extended.  Associating additional information to the product is
possible, but there is no standardized way to share it.
3. Done through the use of a CPE Name or CPE Language.  No standardized way
to use any additional information not already included in a CPE name.
4. This use case is handled on an ad-hoc basis through CPE matching and the
CPE language, but no standardized way to perform this exists.  Currently
queries are limited in scope to only information contained within the name.