Naming spec issue: syntax of formatted string binding


Naming spec issue: syntax of formatted string binding

Brant Cheikes

There was some discussion at yesterday’s CPE developer day session regarding whether or not the new CPE name binding introduced in the Naming specification should be defined as a “formatted string” (as it is at present, cf. Section 6.3) or instead should continue to be defined as a “percent-encoded URI” consistent with v2.2.  The consensus was that the group preferred the formatted string (for a variety of reasons) but wasn’t wild about the current draft binding syntax.

 

In the current draft, the formatted string binding looks like this:

 

cpe-23:/part:vend:prod:ver:upd:ed:lang:sw_ed:target_sw:target_hw:other

 

In designing the syntax of the name, the following decisions need to be made:

 

·         prefix:  What prefix, if any, should be used to clearly indicate that the string represents a CPE name conformant with v2.3?  In the design above, the prefix is “cpe-23:/”.

·         field separator:  What character should be used to delimit the fields comprising the name?  The design above follows past practice in using the colon for this purpose.

·         metacharacters: In v2.3 we intend to support a single-character wildcard as well as a multi-character wildcard.  At present, we define the ‘?’ metacharacter as the single-character wildcard, and ‘*’ as the multi-character wildcard.  But these choices turn out to be inconsistent with most regular-expression standards, e.g., POSIX, PCRE, and XML Schema regexp.

·         escape character: Given our desire to introduce metacharacters, we need a way to “escape” them when not intended to have their meta meanings.  In the current design, we use a backslash, which appears to be consistent with most regular-expression standards.

·         policy for printable non-alphanumerics: In v2.2, most printable non-alphanumeric characters were required to be percent-encoded if embedded within name components.  The formatted string need not have such requirements, and there is a general sentiment towards making CPE names as directly readable as possible.  More on this shortly.

 

We also heard a request that we try to make the names easy to incorporate into RDF resources, and that we try to maintain compatibility with XML parsers and processors.

 

Regarding the prefix, it seems to us that it continues to be a good idea to have string-based CPE names be clearly distinguishable as such in the first few characters, rather than just depending on proper interpretation in context.  But perhaps we don’t need to embed the CPE version as the current draft does.

 

Regarding the field separator, we heard strong guidance to abandon the colon, as “colon-ized names” cannot be directly handled by many RDF processors.  One alternative is the forward slash (“/”).  But assuming we retain the backslash as the escape character, we could end up with a mix of forward slashes and backslashes in the same string, and that strikes many as hard to read.  Other leading candidates for field separator include: hyphen (if we use a different character for the logical NA), vertical bar, tilde, semicolon.  I’m thinking that the hyphen wouldn’t be a bad choice, if we can find a good alternative for NA.

 

Regarding metacharacters, it turns out that the most common single-character wildcard in use is the period (.), and the most common multi-character wildcard is the two-character sequence period-asterisk (.*).  (See, for example, http://www.regular-expressions.info/tutorial.html.)  But if we adopt the period as our wildcard metacharacter in the formatted string, we will constantly be escaping the periods used to separate version elements.  So we’re inclined to retain the question-mark as our equivalent to the period metacharacter.

 

Regarding the multi-character wildcard, we see that regexp common practice appears to be to use the asterisk following a character to mean “zero or more instances of the preceding token”.  For example, “foob*ar” would match “fooar”, “foobar”, “foobbar”, etc.  We rather like that practice so we think we should adopt it.  This means that our use of the two wildcard metacharacters will be consistent with current usage, even if we’ve chosen to substitute a question-mark for a period.
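The regexp behavior described here can be checked quickly with Python's standard `re` module. This is only an illustration of common regexp usage, not part of the proposal; the sample strings are the ones used above, plus one non-matching string added for contrast.

```python
import re

# In common regexp syntax, "*" means "zero or more of the preceding token".
pattern = re.compile(r"foob*ar")

samples = ["fooar", "foobar", "foobbar", "fooxar"]
matches = {s: pattern.fullmatch(s) is not None for s in samples}
print(matches)

# "." is the usual single-character wildcard, so ".*" matches any run of
# characters, including the empty one.
any_run = re.compile(r"foo.*bar")
print(any_run.fullmatch("foobar") is not None)
```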

 

Putting all this together so far, we’re leaning towards a name syntax as follows:

 

·         Prefix remains “cpe”;

·         Field separator is the hyphen;

·         Metacharacters are question-mark and asterisk, with usage consistent with common regexp usage including XML schema;

·         Escape character is the backslash;

·         Use the hash-mark instead of the hyphen to indicate NA;

·         Use the asterisk alone between hyphens to indicate ANY.

 

So now a v2.3-conformant name would look like this:

 

cpe-o-microsoft-windows7-*-sp1-*-*-home_premium-#-x64-#

 

In this name, part=o, vendor=microsoft, product=windows7, version=ANY, update=sp1, edition=ANY, language=ANY, sw_edition=home_premium, target_sw=NA, target_hw=x64, and other=NA.
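To make the field layout concrete, here is a minimal Python sketch of how a name in the proposed syntax might be split into its eleven attributes. It assumes the rules listed above: the prefix is "cpe", fields are separated by unescaped hyphens, a lone asterisk means ANY, and a lone hash-mark means NA. The function and variable names are hypothetical and are not taken from the draft specification.

```python
FIELDS = ["part", "vendor", "product", "version", "update", "edition",
          "language", "sw_edition", "target_sw", "target_hw", "other"]

def split_fields(name):
    """Split a name on hyphens, leaving backslash-escaped characters alone."""
    fields, current, i = [], "", 0
    while i < len(name):
        ch = name[i]
        if ch == "\\" and i + 1 < len(name):
            current += name[i:i + 2]   # keep the escape pair intact
            i += 2
        elif ch == "-":
            fields.append(current)     # unescaped hyphen ends the field
            current = ""
            i += 1
        else:
            current += ch
            i += 1
    fields.append(current)
    return fields

def parse_name(name):
    """Return a dict of attribute values, with ANY and NA spelled out."""
    parts = split_fields(name)
    if parts[0] != "cpe" or len(parts) != len(FIELDS) + 1:
        raise ValueError("not a name in the proposed syntax: %r" % name)
    special = {"*": "ANY", "#": "NA"}
    return {f: special.get(v, v) for f, v in zip(FIELDS, parts[1:])}

for field, value in parse_name(
        "cpe-o-microsoft-windows7-*-sp1-*-*-home_premium-#-x64-#").items():
    print(field, "=", value)
```

Because escape pairs are copied through unchanged, a field such as `red\*` stays a constant string rather than being read as ANY.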

 

To illustrate some of the other features, consider:

 

cpe-a-*-?*linux?*-5.?-red\*-*-*-home\\premium-*-*-*

 

In this example, the product is a pattern that would match, e.g., “linux”, “foolinux”, “linuxbar”, “foolinuxbar”, etc.  The version is a pattern that would match, e.g., “5.0”, “5.1”, etc., but not “5.10”.  The update is the constant string “red*”, and sw_edition is the constant string “home\premium”.
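One way to read these patterns is to translate each field into an ordinary regular expression. The Python sketch below does that under the semantics described in this message: the question-mark stands for any single character, the asterisk means zero or more of the preceding token, and a backslash makes the next character a literal. The function names are made up for illustration, and treating a lone asterisk as "match anything" is an assumption based on the ANY convention.

```python
import re

def field_to_regex(field):
    """Translate one field of the proposed syntax into a compiled regex."""
    if field == "*":
        return re.compile(".*")    # assumption: lone asterisk (ANY) matches anything
    out, i = [], 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            out.append(re.escape(field[i + 1]))  # escaped character is a literal
            i += 2
            continue
        if ch == "?":
            out.append(".")        # single-character wildcard
        elif ch == "*":
            out.append("*")        # zero or more of the preceding token
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))

def field_matches(field, value):
    """True if the whole value matches the field pattern."""
    return field_to_regex(field).fullmatch(value) is not None

print(field_matches("?*linux?*", "foolinuxbar"))
print(field_matches("5.?", "5.10"))
print(field_matches(r"red\*", "red*"))
```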

 

Finally, let’s address literals vs. non-literals.  “Literals” get to appear without escaping, and certainly include all the letters.  (We’re going to require that only lowercase letters be used.)  We’d also like to include the period and the underscore among the literals, since they’re so commonly used in attribute values.  What about all the other printable non-alphanumeric characters?

 

We already know we need to escape these characters if they are used within fields: hyphen, asterisk, question-mark, hash-mark, backslash.   We also know that a lot of other non-alphanumerics are used as meta-characters in standard regexp algorithms, meaning that somewhere along the way they’ll need to get escaped.  And we suspect that at some future point we may want to use other non-alpha characters for special purposes.  So we’re leaning strongly towards an “escape all” policy for printable non-alphanumeric characters except for the period and the underscore.  This strikes us as a simple rule to implement as well as to remember.  So this means that instead of percent encoding all those characters, you’ll escape them (with the backslash) in the formatted string binding.
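As a rough illustration of how simple the “escape all” rule would be to implement, here is a Python sketch that turns a constant attribute value into its escaped form. It assumes the value contains no intended wildcards, folds letters to lowercase (an assumption about how the lowercase-only requirement might be applied), and rejects anything outside printable ASCII. The function name is hypothetical.

```python
import string

# Characters that may appear unescaped under the proposed policy:
# lowercase letters, digits, period, and underscore.
LITERALS = set(string.ascii_lowercase + string.digits + "._")

def escape_value(value):
    """Backslash-escape every printable non-alphanumeric except '.' and '_'."""
    out = []
    for ch in value.lower():       # assumption: fold letters to lowercase
        if ch in LITERALS:
            out.append(ch)
        elif ch in string.punctuation:
            out.append("\\" + ch)  # escape instead of percent-encoding
        else:
            raise ValueError("character not covered by this sketch: %r" % ch)
    return "".join(out)

print(escape_value("red*"))
print(escape_value("home\\premium"))
print(escape_value("sp-1"))
```

A value that is meant as a pattern would need its metacharacters left unescaped, which this sketch deliberately does not attempt.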

 

In sum, the full proposal for the formatted string binding syntax is this:

 

·         Prefix remains “cpe”;

·         Field separator is the hyphen;

·         Metacharacters are question-mark and asterisk;

·         Escape character is the backslash;

·         Use the hash-mark instead of the hyphen to indicate NA;

·         Use the asterisk alone between hyphens to indicate ANY;

·         Only allow lowercase letters;

·         Include period and underscore among the literals;

·         Require that all other printable non-alphanumeric characters be escaped, except for the metacharacters (which may or may not be escaped depending on intended meaning).

 

Reactions?

 

/Brant

 

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352

 


Re: Naming spec issue: syntax of formatted string binding

Wolfkiel, Joseph

Brant,

 

My primary reaction is that the proposed changes are probably way too dramatic in scope to be meaningfully considered in the time remaining between now and when the spec needs to go final.

 

The introduction of new characters and the repurposing of existing special characters, particularly the hyphen, will require pretty much 100% redevelopment of CPE parsing code.  My opinion on the specific issues:

 

·         Prefix remains “cpe”;

Not sure if this is good or not.  Do we now have to modify some other transport characteristic (such as tags in OVAL/XCCDF/OCIL/ARF) to state whether a platform specifier is CPE 1.0, 2.1, 2.2, or 2.3?

 

·         Field separator is the hyphen;

This is a very dramatic change in that it takes a reserved character and completely changes its meaning.  I’m not sure that the return on investment is clear.

 

·         Metacharacters are question-mark and asterisk;

So the asterisk appears the same, but you’ve re-specified its behavior.  The previous definition of “*” is now conveyed by “?*”.

 

·         Escape character is the backslash;

This makes sense when you state its purpose is to enhance readability.

 

·         Use the hash-mark instead of the hyphen to indicate NA;

Why the hash mark?  It serves several other functions in common usage, so I can’t make a case for using it as a “NA” character.

 

·         Use the asterisk alone between hyphens to indicate ANY;

Why not use “?*” to be consistent with the usage when it’s incorporated in text?  Also, if you allow it to mean what you’ve specified, a tool could potentially take “-*” to be an arbitrary-length string of “-”s.

 

·         Only allow lowercase letters;

This hasn’t changed, so probably shouldn’t be listed as an issue.

 

·         Include period and underscore among the literals;

Okay.  Appears to make sense since they’re in common usage in the earlier 2.x CPE instances.

 

·         Require that all other printable non-alphanumeric characters be escaped, except for the metacharacters (which may or may not be escaped depending on intended meaning).

Believe we agreed to this at the meeting.

 

Bottom line, with the changes you propose, I think the “new” CPE 2.3 is so different from the previous 2.x CPEs that a user couldn’t be expected to look at a prior 2.x CPE and understand how it relates to a 2.3 CPE.  If these changes are actually viable candidates, I propose we convert to an attribute-based CPE and abandon the string representation altogether (or at least relegate it to a co-equal binding alternative).  An attribute-based CPE would be directly human readable and solve many of the existing problems without any special cases.  E.g., “NA” can be stated as “” rather than by some arbitrary special character.  I still like the * as an unrestricted wildcard and ? as a single-character wildcard but don’t see those as critical issues.

 

An attribute-based CPE would be fairly transparent compared to the examples you cite and much easier to parse, as well as being extremely human-reader friendly.

 

E.g.

 

“cpe-o-microsoft-windows7-*-sp1-*-*-home_premium-#-x64-#”

Becomes

<cpe23 part="o" vendor="microsoft" product="windows7" update="sp1" swEdition="home_premium" tgtSW="" tgtHW="x64" other=""/>
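As a sketch of how little parsing an attribute-based form would need, the Python below reads such an element with the standard library's `xml.etree.ElementTree`. The element is written with straight ASCII quotes so that an XML parser accepts it, and with part set to "o" as in the string form. The attribute names for version, edition, and language are assumed, since the example omits them, and reading an omitted attribute as ANY is likewise an assumption. The empty string stands for NA, as suggested above.

```python
import xml.etree.ElementTree as ET

# swEdition, tgtSW, and tgtHW follow the example; "version", "edition", and
# "language" are assumed names because the example leaves them out.
ATTRS = ["part", "vendor", "product", "version", "update", "edition",
         "language", "swEdition", "tgtSW", "tgtHW", "other"]

def read_cpe(xml_text):
    """Return a dict of attribute values for one cpe23 element."""
    elem = ET.fromstring(xml_text)
    result = {}
    for name in ATTRS:
        value = elem.get(name)
        if value is None:
            result[name] = "ANY"   # assumption: omitted attribute means ANY
        elif value == "":
            result[name] = "NA"    # empty string stands for NA
        else:
            result[name] = value
    return result

example = ('<cpe23 part="o" vendor="microsoft" product="windows7" update="sp1" '
           'swEdition="home_premium" tgtSW="" tgtHW="x64" other=""/>')
for name, value in read_cpe(example).items():
    print(name, "=", value)
```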

 

Otherwise, I would propose we limit our discussion to issues that were identified at, or prior to, the meeting yesterday so we have some chance of fully discussing changes that show up in the final CPE specification. 

 

 

 

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 4:07 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding

 

Re: Naming spec issue: syntax of formatted string binding

Ernest Park-2

I agree with the attribute-based expression. If we are going to make significant changes, let's look at a more flexible way to advance this issue.

On Jun 17, 2010 5:55 PM, "Wolfkiel, Joseph" <[hidden email]> wrote:


Re: Naming spec issue: syntax of formatted string binding

Brant Cheikes
I am open to replacing the formatted string with a simple attribute-based expression.  I don't think I have time to get both into the spec; it really needs to be one or the other.  And if I spec out an attribute-based expression, I wouldn't define it as an XML expression (with an XSD and all that, mostly due to limited time to get one put together and tested), though I'm fine with the XML-like syntax.
 
At this point I would need to hear (a) some broader indications of support for an XML-like attribute-based binding instead of the formatted string, and (b) no compelling arguments AGAINST dropping the formatted string as the defined v2.3 binding.
 
/Brant
 
Brant A. Cheikes
The MITRE Corporation
202 Burlington Road M/S K302, Bedford MA 01730-1420
Email: [hidden email]; Tel: 781-271-7505; Fax: 781-271-2352

From: Ernest Park [[hidden email]]
Sent: Thursday, June 17, 2010 6:03 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding


Re: Naming spec issue: syntax of formatted string binding

Keich, Joshua

Brant,

 

If we are going to use an attribute-based expression, I think XML, including the XSD, is the most appropriate choice. It would be time well spent.

 

Joshua

 


From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 6:30 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding

 

I am open to replacing the formatted string with a simple attribute-based expression.  I don't think I have time to get both into the spec; it really needs to be one or the other.  And if I spec out an attribute-based expression, I wouldn't define it as an XML expression (with an XSD and all that, mostly due to limited time to get one put together and tested), though I'm fine with the XML-like syntax.

 

At this point I would need to hear (a) some broader indications of support for an XML-like attribute-based binding instead of the formatted string, and (b) no compelling arguments AGAINST dropping the formatted string as the defined v2.3 binding.

 

/Brant

 

Brant A. Cheikes

The MITRE Corporation

202 Burlington Road M/S K302, Bedford MA 01730-1420

Email: [hidden email]; Tel: 781-271-7505; Fax: 781-271-2352


From: Ernest Park [[hidden email]]
Sent: Thursday, June 17, 2010 6:03 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding

I agree with the attribute based expression. If we are going to make significant changes, let's look at a more flexible way to advance this issue.

On Jun 17, 2010 5:55 PM, "Wolfkiel, Joseph" <[hidden email]> wrote:

Brandt,

 

My primary reaction is that the proposed changes are probably way too dramatic in scope to be meaningfully considered in the time remaining between now and when the spec needs to go final.

 

The introduction of new characters and repurposing of existing special characters, particularly the hyphen will require pretty much 100% redevelopment of CPE parsing code.  My opinion on the specific issues:

 

·         Prefix remains “cpe”;

Not sure if this is good of not.  Do we now have to modify some other transport characteristic (such as tags in OVAL/XCCDF/OCIL/ARF) to state whether a platform specifier is now CPE 1.0, 2.1 or 2.2, or 2.3?



 

·         Field separator is the hyphen;

This is a very dramatic change in that it takes a reserved character and completely changes its meaning.  I’m not sure that the return on investment is clear.

 

·         Metacharacters are question-mark and asterisk;

So the asterisk appears the same, but you’ve re-specified its behavior.  The previous definition of “*” is now conveyed by “?*”.



 

·         Escape character is the backslash;

This makes sense when you state its purpose is to enhance readability.



 

·         Use the hash-mark instead of the hyphen to indicate NA;

Why the hash mark?  It serves several other functions in common usage, so I can’t make a case for using it as a “NA” character.

 

·         Use the asterisk alone between hyphens to indicate ANY;

Why not use “?*” to be consistent with the usage when it’s incorporated in text?  Also, if you allow it to mean what you’ve specified, a tool could potentially take “-*” to be an arbitrary-length string of “-”s.

 

·         Only allow lowercase letters;

This hasn’t changed, so probably shouldn’t be listed as an issue.



 

·         Include period and underscore among the literals;

Okay.  Appears to make sense since they’re in common usage in the earlier 2.x CPE instances.



 

·         Require that all other printable non-alphanumeric characters be escaped, except for t...

Believe we agreed to this at the meeting.
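
To illustrate the ambiguity raised under the ANY bullet above: a tool that applies ordinary regular-expression rules would read “-*” as zero or more hyphens rather than as a field separator followed by ANY. A minimal Python sketch follows; the function name is hypothetical and nothing here comes from the draft spec.

```python
import re

def reads_as_hyphen_run(text: str) -> bool:
    """Return True when `text` fully matches the regex "-*" (zero or more hyphens)."""
    return re.fullmatch(r"-*", text) is not None

# Under regex rules the two-character sequence "-*" is a quantified hyphen,
# so it accepts any run of hyphens, including the empty string.
print(reads_as_hyphen_run("---"))   # True
print(reads_as_hyphen_run(""))      # True
print(reads_as_hyphen_run("-x64"))  # False
```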

 

Bottom line, with the changes you propose below, I think the “new” CPE 2.3 is so different from the previous 2.x CPEs that a user couldn’t be expected to be able to look at a prior 2.x CPE and understand how it relates to a 2.3 CPE.  If these changes are actually viable candidates, I propose we convert to an attribute-based CPE and abandon the string representation altogether (or at least relegate it to a co-equal binding alternative).  An attribute-based CPE would be directly human readable and solve many of the existing problems without any special cases.  E.g. “NA” can be stated as “” versus coming up with some arbitrary special character.  I still like the * as an unrestricted wildcard and ? as single character wildcard but don’t see those as critical issues.

 

An attribute-based CPE would be fairly transparent compared to the examples you cite below and much easier to parse as well as being extremely human-reader friendly. 

 

E.g.

 

“cpe-o-microsoft-windows7-*-sp1-*-*-home_premium-#-x64-#”

Becomes

<cpe23 part="o" vendor="microsoft" product="windows7" update="sp1" swEdition="home_premium" tgtSW="" tgtHW="x64" other=""/>
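
The mapping in this example can be sketched in Python. This is only an illustration under stated assumptions: the field order is taken from the draft binding quoted at the top of the thread; the attribute names `version`, `edition`, and `language` are hypothetical, since the example omits them; ANY (“*”) fields are dropped and NA (“#”) fields become empty strings, as in the example. The split treats a hyphen preceded by a backslash as escaped and does not handle escaped backslashes.

```python
import re
from xml.sax.saxutils import quoteattr

# Field order follows the draft binding; version/edition/language are
# assumed attribute names that the example in the message does not show.
FIELDS = ["part", "vendor", "product", "version", "update", "edition",
          "language", "swEdition", "tgtSW", "tgtHW", "other"]

def draft_string_to_attributes(name: str) -> str:
    """Convert a hyphen-delimited draft name into the attribute-style form."""
    parts = re.split(r"(?<!\\)-", name)  # split on hyphens not preceded by a backslash
    if parts[0] != "cpe" or len(parts) != len(FIELDS) + 1:
        raise ValueError(f"not a hyphen-delimited draft name: {name!r}")
    attrs = []
    for field, value in zip(FIELDS, parts[1:]):
        if value == "*":   # ANY: leave the attribute out
            continue
        if value == "#":   # NA: empty attribute value
            value = ""
        attrs.append(f"{field}={quoteattr(value)}")
    return "<cpe23 " + " ".join(attrs) + "/>"

print(draft_string_to_attributes(
    "cpe-o-microsoft-windows7-*-sp1-*-*-home_premium-#-x64-#"))
```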

 

Otherwise, I would propose we limit our discussion to issues that were identified at, or prior to, the meeting yesterday so we have some chance of fully discussing changes that show up in the final CPE specification. 

 

 

 

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 4:07 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding



 

There was some discussion at yesterday’s CPE developer day session regarding whether or not the...


Re: Naming spec issue: syntax of formatted string binding

Wolfkiel, Joseph
In reply to this post by Brant Cheikes

After thinking about this overnight, I’m suggesting we limit discussion to 2 courses of action:

 

1.        Do something that retains backward interoperability (regular-expression-wise) with CPE 2.1 and 2.2; or

2.       Go to an attribute-based binding so we can support a broader range of use cases and leverage the flexibility that XML Schema provides.

 

In case 1, I think we can use the work already done by your team on defining well-formed names and thinking through some of the advanced concepts, but at the same time, just limit changes to extending the existing “edition” field.  If we say that CPE2.3 names will use the tilde separator in the existing edition field, then we can build a 2.2 schema legal CPE format that looks like the following:

 

cpe:/o:vendor:product:version:update:edition~swEdition~tgtHW~tgtSW~other:language

 

To support the new values in the edition field, we can simply deprecate the current matching rules, and explain what values should be populated in the new tilde-separated fields as well as how and which ones should be populated (I think providing guidance that the “edition” component be the concatenation of the swEdition, tgtHW, and tgtSW with spaces separating them would be a good compromise).  We should consider deprecating existing 2.2 CPEs to the 2.3 CPE format, which would even support direct string matching.
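
A rough Python sketch of the tilde-packed edition field described above. The sub-field order follows the example name in this message; the function names are hypothetical, and the sketch ignores percent-encoding and any question of whether “~” needs escaping inside a sub-field.

```python
EDITION_SUBFIELDS = ["edition", "swEdition", "tgtHW", "tgtSW", "other"]

def pack_name(part, vendor, product, version, update, edition_fields, language):
    """Build a 2.2-style name whose edition component carries five tilde-separated values."""
    packed = "~".join(edition_fields.get(key, "") for key in EDITION_SUBFIELDS)
    return "cpe:/" + ":".join([part, vendor, product, version, update, packed, language])

def unpack_edition(name):
    """Split the edition component of a packed name back into its five sub-fields."""
    if not name.startswith("cpe:/"):
        raise ValueError("missing cpe:/ prefix")
    components = name[len("cpe:/"):].split(":")
    if len(components) != 7:
        raise ValueError("expected 7 colon-separated components")
    values = components[5].split("~")
    if len(values) != len(EDITION_SUBFIELDS):
        raise ValueError("edition component is not tilde-packed")
    return dict(zip(EDITION_SUBFIELDS, values))

name = pack_name("o", "microsoft", "windows7", "", "sp1",
                 {"swEdition": "home_premium", "tgtHW": "x64"}, "")
print(name)
print(unpack_edition(name))
```

Because the packed value sits entirely inside the existing edition component, a parser that only splits on colons still sees seven components.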

 

In case 2, if we’re going to make everyone rebuild their XML parsers, let’s fix it for real.

 

In either case, it looks like we should limit the use of the “*” symbol to either the beginning or end of a component name.  We’ve been using the “*” for over a year now and haven’t discovered a compelling use case for allowing the “*” wildcard in the middle of text.  The problems with unrestrained wildcards that surfaced during the CPE discussion seem to indicate that allowing the “*” in the middle of CPE component names is a bad idea.
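
The restriction could be checked along these lines. This is a minimal Python sketch with a hypothetical function name; it assumes the backslash escape from the draft formatted-string binding, so an escaped “\*” counts as a literal rather than a wildcard.

```python
import re

def wildcard_positions_ok(component: str) -> bool:
    """Accept a component only if every unescaped '*' is its first or last character."""
    # Assumption: a '*' preceded by a backslash is a literal, not a wildcard.
    star_indexes = [m.start() for m in re.finditer(r"(?<!\\)\*", component)]
    last = len(component) - 1
    return all(index in (0, last) for index in star_indexes)

print(wildcard_positions_ok("win*"))      # True
print(wildcard_positions_ok("*dows"))     # True
print(wildcard_positions_ok("wi*ws"))     # False
print(wildcard_positions_ok("wi\\*ws"))   # True (escaped, so not a wildcard)
```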

 

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 9:30 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding

 

I am open to replacing the formatted string with a simple attribute-based expression.  I don't think I have time to get both into the spec; it really needs to be one or the other.  And if I spec out an attribute-based expression, I wouldn't define it as an XML expression (with an XSD and all that, mostly due to limited time to get one put together and tested), though I'm fine with the XML-like syntax.

 




Re: Naming spec issue: syntax of formatted string binding

Vladimir Giszpenc

Community,

 

If there is a tool that takes XML and produces CPE 2.2-compliant CPEs, haven’t we achieved both 1 and 2?  Why is this 1 or 2?  XML certainly lends itself to better readability than any escaped delimited string.  However, if it is a choice, my vote is for XML.

 

Have a nice weekend,

 

Vladimir Giszpenc

Armadillo Technical Lead

DSCI Contractor Supporting

US Army CERDEC S&TCD IAD Tactical Network Protection Branch

(732) 532-8959

 



Re: Naming spec issue: syntax of formatted string binding

Brant Cheikes
In reply to this post by Brant Cheikes
If we pursue course #1, how do we deal with percent-encoding rules? How can we allow asterisk and question-mark to be embedded as wildcards, and how do we block their interpretation as metacharacters?
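
One conceivable way to frame the question, offered purely as an illustration and not as a proposal from the thread: percent-encode literal occurrences of “*” and “?”, and treat only the unencoded characters as wildcards. The Python sketch below uses hypothetical function names and says nothing about what CPE 2.2 percent-encoding rules actually permit.

```python
import re
from urllib.parse import quote, unquote

def encode_literal(text: str) -> str:
    """Percent-encode a literal value so that '*' and '?' lose any wildcard meaning."""
    return quote(text, safe="")

def component_to_regex(component: str) -> "re.Pattern[str]":
    """Treat raw '*' and '?' as wildcards; decode and escape everything else."""
    pieces = []
    for token in re.split(r"([*?])", component):
        if token == "*":
            pieces.append(".*")   # multi-character wildcard
        elif token == "?":
            pieces.append(".")    # single-character wildcard
        else:
            pieces.append(re.escape(unquote(token)))
    return re.compile("".join(pieces))

# A literal question mark followed by a trailing wildcard.
pattern = component_to_regex(encode_literal("what?") + "*")
print(bool(pattern.fullmatch("what?now")))   # True
print(bool(pattern.fullmatch("whatXnow")))   # False
```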

/Brant


--
Brant A. Cheikes
Cell. 617-694-8180
Sent using BlackBerry


Re: Naming spec issue: syntax of formatted string binding

Thomas Jones
In reply to this post by Vladimir Giszpenc
Most definitely the XML implementation!! Wholeheartedly.

Furthermore, the inevitable question of elements or attributes should be addressed. I have been working on OVAL and CPE content modeling and this question keeps arising.

Sent from my iPhone



Re: Naming spec issue: syntax of formatted string binding

Brant Cheikes
Wouldn't attributes make the most sense in CPE's case?

/Brant
--
Brant A. Cheikes
Cell. 617-694-8180
Sent using BlackBerry


From: Thomas R. Jones <[hidden email]>
To: cpe-discussion-list CPE Community Forum
Sent: Fri Jun 18 15:09:23 2010
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding

Most definitely the xml implementation!! Whole heartedly.

Furthermore, the inevitable question of elements or attributes should be addressed. I have been working on oval and cpe content modeling and this question keeps arising.  

Sent from my iPhone

On Jun 18, 2010, at 12:59 PM, Vladimir Giszpenc <[hidden email]> wrote:

Community,

 

If there is a tool that takes XML and produces CPE 2.2 compliant CPEs haven’t we achieved 1 and 2.  Why is this 1 or 2?  XML certainly lends itself to better readability than any escaped delimited string.  However, if it is a choice, my vote is for XML.

 

Have a nice weekend,

 

Vladimir Giszpenc

Armadillo Technical Lead

DSCI Contractor Supporting

US Army CERDEC S&TCD IAD Tactical Network Protection Branch

(732) 532-8959

 

From: Wolfkiel, Joseph [mailto:[hidden email]]
Sent: Friday, June 18, 2010 1:46 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding

 

After thinking about this overnight, I’m suggesting we limit discussion to 2 courses of action:

 

1.       Do something that retains backward interoperability (regular expression-wise) with CPE 2.1 and 2.2; or

2.      Go to an attribute-based form so we can support a broader range of use cases and take advantage of the flexibility that XML Schema provides.

 

In case 1, I think we can use the work already done by your team on defining well-formed names and thinking through some of the advanced concepts, but at the same time, just limit changes to extending the existing “edition” field.  If we say that CPE2.3 names will use the tilde separator in the existing edition field, then we can build a 2.2 schema legal CPE format that looks like the following:

 

cpe:/o:vendor:product:version:update:edition~swEdition~tgtHW~tgtSW~other:language

 

To support the new values in the edition field, we can simply deprecate the current matching rules and explain which of the new tilde-separated fields should be populated, and with what values (I think providing guidance that the “edition” component be the concatenation of the swEdition, tgtHW, and tgtSW with spaces separating them would be a good compromise).  We should also consider deprecating existing 2.2 CPEs in favor of the 2.3 CPE format, which would even support direct string matching.
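A minimal Python sketch of the tilde-packed edition component described above. It assumes five sub-fields in the order shown in the sample name (edition, swEdition, tgtHW, tgtSW, other) and that no sub-field contains a tilde or colon. The helper names are hypothetical and do not come from any CPE specification.

```python
# Hypothetical helpers for the extended "edition" component.
SUBFIELDS = ("edition", "swEdition", "tgtHW", "tgtSW", "other")

def pack_edition(values: dict) -> str:
    """Join the five sub-fields with tildes; missing sub-fields become empty."""
    parts = [values.get(name, "") for name in SUBFIELDS]
    for part in parts:
        if "~" in part or ":" in part:
            raise ValueError(f"sub-field may not contain '~' or ':': {part!r}")
    return "~".join(parts)

def unpack_edition(component: str) -> dict:
    """Split an edition component back into named sub-fields.

    A value without any tilde is treated as a legacy 2.2 edition string.
    """
    parts = component.split("~")
    if len(parts) == 1:
        return {"edition": parts[0]}
    if len(parts) != len(SUBFIELDS):
        raise ValueError(f"expected {len(SUBFIELDS)} sub-fields: {component!r}")
    return dict(zip(SUBFIELDS, parts))
```

Because the packed value still sits between two colons, a 2.2 parser that splits on colons would see it as an ordinary edition string, which is the compatibility property this option relies on.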

 

In case 2, if we’re going to make everyone rebuild their XML parsers, let’s fix it for real.

 

In either case, it looks like we should limit the use of the “*” symbol to either the beginning or end of a component name.  We’ve been using the “*” for over a year now and haven’t discovered a compelling use case for allowing the “*” wildcard in the middle of text.  The problems with unrestrained wildcards that surfaced during the CPE discussion suggest that allowing the “*” in the middle of CPE component names is a bad idea.
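The placement restriction can be checked mechanically. The sketch below is one possible reading, assuming that a literal asterisk is always encoded and therefore never appears as a bare “*” inside a component. The function name is hypothetical.

```python
def star_placement_ok(component: str) -> bool:
    """Return True if every bare '*' is the first or last character."""
    # Everything strictly between the first and last character must be star-free.
    inner = component[1:-1]
    return "*" not in inner
```

Under this reading “windows*” and “*soft*” are accepted, while “win*7” is rejected.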

 

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 9:30 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding

 

I am open to replacing the formatted string with a simple attribute-based expression.  I don't think I have time to get both into the spec; it really needs to be one or the other.  And if I spec out an attribute-based expression, I wouldn't define it as an XML expression (with an XSD and all that, mostly due to limited time to get one put together and tested), though I'm fine with the XML-like syntax.

 

At this point I would need to hear (a) some broader indications of support for an XML-like attribute-based binding instead of the formatted string, and (b) no compelling arguments AGAINST dropping the formatted string as the defined v2.3 binding.

 

/Brant

 

Brant A. Cheikes

The MITRE Corporation

202 Burlington Road M/S K302, Bedford MA 01730-1420

Email: [hidden email]; Tel: 781-271-7505; Fax: 781-271-2352


From: Ernest Park [[hidden email]]
Sent: Thursday, June 17, 2010 6:03 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding

I agree with the attribute based expression. If we are going to make significant changes, let's look at a more flexible way to advance this issue.

On Jun 17, 2010 5:55 PM, "Wolfkiel, Joseph" <[hidden email]> wrote:

Brant,

 

My primary reaction is that the proposed changes are probably way too dramatic in scope to be meaningfully considered in the time remaining between now and when the spec needs to go final.

 

The introduction of new characters and repurposing of existing special characters, particularly the hyphen, will require pretty much 100% redevelopment of CPE parsing code.  My opinion on the specific issues:

 

·         Prefix remains “cpe”;

Not sure if this is good or not.  Do we now have to modify some other transport characteristic (such as tags in OVAL/XCCDF/OCIL/ARF) to state whether a platform specifier is now CPE 1.0, 2.1, 2.2, or 2.3?



 

·         Field separator is the hyphen;

This is a very dramatic change in that it takes a reserved character and completely changes its meaning.  I’m not sure that the return on investment is clear.

 

·         Metacharacters are question-mark and asterisk;

So the asterisk appears the same, but you’ve re-specified its behavior.  The previous definition of “*” is now conveyed by “?*”.



 

·         Escape character is the backslash;

This makes sense when you state its purpose is to enhance readability.



 

·         Use the hash-mark instead of the hyphen to indicate NA;

Why the hash mark?  It serves several other functions in common usage, so I can’t make a case for using it as a “NA” character.

 

·         Use the asterisk alone between hyphens to indicate ANY;

Why not use “?*” to be consistent with the usage when it’s incorporated in text?  Also, if you allow it to mean what you’ve specified, a tool could potentially take “-*” to be an arbitrary-length string of “-”s.

 

·         Only allow lowercase letters;

This hasn’t changed, so probably shouldn’t be listed as an issue.



 

·         Include period and underscore among the literals;

Okay.  Appears to make sense since they’re in common usage in the earlier 2.x CPE instances.



 

·         Require that all other printable non-alphanumeric characters be escaped, except for t...

Believe we agreed to this at the meeting.

 

Bottom line, with the changes you propose below, I think the “new” CPE 2.3 is so different from the previous 2.x CPEs that a user couldn’t be expected to be able to look at a prior 2.x CPE and understand how it relates to a 2.3 CPE.  If these changes are actually viable candidates, I propose we convert to an attribute-based CPE and abandon the string representation altogether (or at least relegate it to a co-equal binding alternative).  An attribute-based CPE would be directly human readable and solve many of the existing problems without any special cases.  E.g. “NA” can be stated as “” versus coming up with some arbitrary special character.  I still like the * as an unrestricted wildcard and ? as single character wildcard but don’t see those as critical issues.

 

An attribute-based CPE would be fairly transparent compared to the examples you cite below and much easier to parse as well as being extremely human-reader friendly. 

 

E.g.

 

“cpe-o-microsoft-windows7-*-sp1-*-*-home_premium-#-x64-#”

Becomes

<cpe23 part="o" vendor="microsoft" product="windows7" update="sp1" swEdition="home_premium" tgtSW="" tgtHW="x64" other=""/>
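The mapping in this example can be sketched in Python. The field order is taken from the draft binding quoted at the start of the thread. The attribute names “version”, “edition”, and “language” are assumptions, since the example omits those attributes. The sketch treats “*” (ANY) as an omitted attribute and “#” (NA) as an empty one, and the function name is hypothetical.

```python
import re
from xml.etree import ElementTree as ET

# Field order from the draft binding; some attribute names are assumed.
FIELDS = ("part", "vendor", "product", "version", "update", "edition",
          "language", "swEdition", "tgtSW", "tgtHW", "other")

def to_attribute_form(name: str) -> str:
    """Convert a hyphen-delimited draft name into a cpe23 element string."""
    # Split on hyphens not preceded by a backslash. This simple rule does not
    # handle an escaped backslash directly before a separator.
    prefix, *values = re.split(r"(?<!\\)-", name)
    if prefix != "cpe" or len(values) != len(FIELDS):
        raise ValueError(f"unexpected name: {name!r}")
    elem = ET.Element("cpe23")
    for field, value in zip(FIELDS, values):
        if value == "*":  # ANY: leave the attribute out entirely
            continue
        elem.set(field, "" if value == "#" else value)  # '#' (NA) becomes ""
    return ET.tostring(elem, encoding="unicode")
```

Calling `to_attribute_form` on the hyphen-delimited name above yields an element whose attributes can be read back with `ET.fromstring(...).attrib`.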

 

Otherwise, I would propose we limit our discussion to issues that were identified at, or prior to, the meeting yesterday so we have some chance of fully discussing changes that show up in the final CPE specification. 

 

 

 

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 4:07 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding



 

There was some discussion at yesterday’s CPE developer day session regarding whether or not the...


Re: Naming spec issue: syntax of formatted string binding

Wolfkiel, Joseph

That’s certainly my preference.  I think the cost of an element approach (i.e. 9 potential element lines of text per software name) would make an element-based approach both less human readable and higher bandwidth.  I’m curious about what would drive an element-based approach.  Using an attribute-based approach also frees up the use of subordinate elements to give other related software characteristics that aren’t part of the name (e.g. product family, license, MD5 hash, common use, etc.)

 

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700




Re: Naming spec issue: syntax of formatted string binding

Wolfkiel, Joseph
In reply to this post by Brant Cheikes
I was assuming those would be added in as reserved characters as well.
Sorry if I wasn't clear on that.  A "*" or "?" that isn't percent-encoded
would be used as a multi-character or single-character wildcard,
respectively, and their percent-encoded versions would be treated as
literal characters that are part of the product names.
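One way to picture that rule is a small Python sketch that turns a component into a regular expression: a bare "*" or "?" becomes a wildcard, and a percent-encoded one is decoded and matched literally. The function name is hypothetical, and restricting "*" to the start or end of a component is not enforced here.

```python
import re
from urllib.parse import unquote

def component_to_regex(component: str) -> re.Pattern:
    """Build a regex in which bare '*' and '?' are wildcards."""
    pieces = []
    # The capturing group keeps each bare '*' or '?' as its own token.
    for token in re.split(r"([*?])", component):
        if token == "*":
            pieces.append(".*")   # multi-character wildcard
        elif token == "?":
            pieces.append(".")    # single-character wildcard
        else:
            # Decode %2a, %3f and the like, then match the result literally.
            pieces.append(re.escape(unquote(token)))
    return re.compile("".join(pieces))
```

With `fullmatch`, the pattern for "win*" accepts "windows7", while the pattern for "win%2a" accepts only the literal string "win*".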

I think this is consistent in the minor-revision trail since we added the
"-" as a reserved character in version 2.2.  Of course, it isn't really
backwards compatible with 2.2 and 2.1, but we've been using the "*" in ARF
and ASR for a while.  If we add "*" and "?", existing CPE publishers
wouldn't have to change anything--only consumers, since there weren't any
previously allowable wildcards.  I don't think there are very many existing
CPE consumers that would be impacted by this change.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Friday, June 18, 2010 2:30 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding

If we pursue course #1, how do we deal with percent-encoding rules? How can
we allow asterisk and question-mark to be embedded as wildcards, and how do
we block their interpretation as metacharacters?

/Brant


--
Brant A. Cheikes
Cell. 617-694-8180
Sent using BlackBerry


Re: Naming spec issue: syntax of formatted string binding (UNCLASSIFIED)

Moore, Scott D CIV DISA PEO-MA
Classification:  UNCLASSIFIED
Caveats: NONE

I'd also add that I'm not sure we need "-" as a special character for NA if we're now allowing wildcards, etc.
Does the empty string "" have any logical meaning other than NA?
V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405
https://jeds.gds.disa.mil/jeds/searchAffiliates.action?id=31333835333433313132




Re: Naming spec issue: syntax of formatted string binding (UNCLASSIFIED)

Wolfkiel, Joseph
It's not clear.  In a formatted string, like the current URI, the component
is always present, so an empty value alone can't tell you whether the
component wasn't assessed or was assessed and found to be null.  The current
spec assumes it is "unknown" and should be treated as an "any" value.  If we
use an "any" tag (i.e. "*"), I suppose it would be reasonable to treat a
"::" as a null.
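The two readings of an empty component can be set side by side in a short Python sketch. The names `Logical` and `interpret` and the `explicit_any_tag` switch are hypothetical, used only to contrast the current behaviour with the one suggested here.

```python
from enum import Enum

class Logical(Enum):
    ANY = "any"    # matches every value
    NULL = "null"  # assessed, and no value applies

def interpret(component: str, explicit_any_tag: bool):
    """Map a raw component to a logical value or return it as a literal."""
    if explicit_any_tag:
        # Suggested reading: "*" means ANY, so "" is free to mean null.
        if component == "*":
            return Logical.ANY
        if component == "":
            return Logical.NULL
    elif component == "":
        # Current reading: an empty component is unknown and treated as ANY.
        return Logical.ANY
    return component
```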

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Moore, Scott D CIV DISA PEO-MA [mailto:[hidden email]]
Sent: Friday, June 18, 2010 3:32 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding (UNCLASSIFIED)

Classification:  UNCLASSIFIED
Caveats: NONE

I'd also add that I'm not sure we need "-" as a special character for NA if
we're now allowing wildcards, etc.
Does the empty string "" have any logical meaning other than NA?
V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405
https://jeds.gds.disa.mil/jeds/searchAffiliates.action?id=313338353334333131
32

-----Original Message-----
From: Wolfkiel, Joseph [mailto:[hidden email]]
Sent: Friday, June 18, 2010 3:29 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding

I was assuming those would be added in as reserved characters as well.
Sorry if I wasn't clear on that.  A "*" or "?" that isn't percent encoded
would be used as multiple or single wildcard values respectively and their
percent-encoded versions would be treated so that the characters would be
part of the product names.

I think this is consistent in the minor-revision trail since we added the
"-" as a reserved character in version 2.2.  Of course, it isn't really
backwards compatible with 2.2 and 2.1, but we've been using the "*" in ARF
and ASR for a while.  If we add "*" and "?", existing CPE publishers
wouldn't have to change anything--only consumers, since there weren't any
previously allowable wildcards.  I don't think there are very many existing
CPE consumers that would be impacted by this change.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Friday, June 18, 2010 2:30 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding

If we pursue course #1, how do we deal with percent-encoding rules? How can
we allow asterisk and question-mark to be embedded as wildcards, and how do
we block their interpretation as metacharacters?

/Brant


--
Brant A. Cheikes
Cell. 617-694-8180
Sent using BlackBerry
------Original Message------
From: Wolfkiel, Joseph
To: cpe-discussion-list CPE Community Forum
ReplyTo: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding
Sent: Jun 18, 2010 1:46 PM

After thinking about this overnight, I'm suggesting we limit discussion to 2
courses of action:

 

1.        Do something that retains backward interoperability (Regular
Expression-wise) with CPE 2.1 and 2.2 or;

2.       Go to attribute-based so we can support a broader range of use
cases and leverage the flexibility provided by leveraging XML schema

 

In case 1, I think we can use the work already done by your team on defining
well-formed names and thinking through some of the advanced concepts, but at
the same time, just limit changes to extending the existing "edition" field.
If we say that CPE2.3 names will use the tilde separator in the existing
edition field, then we can build a 2.2 schema legal CPE format that looks
like the following:

 

cpe:/o:vendor:product:version:update:edition~swEdition~tgtHW~tgtSW~other:language

 

To support the new values in the edition field, we can simply deprecate the
current matching rules, and explain what values should be populated in the
new tilde-separated fields as well as how and which ones should be populated
(I think providing guidance that the "edition" component be the
concatenation of the swEdition, tgtHW, and tgtSW with spaces separating them
would be a good compromise).  We should consider deprecating existing 2.2
CPEs to the 2.3 CPE format, which would even support direct string matching.
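
As a rough illustration of case 1 (a sketch only, not spec text), the
tilde-packed name above could be unpacked along these lines. The field
names, the function name, and the rule that a legacy edition with no tilde
stays in "edition" are all assumptions made for this example.

```python
from urllib.parse import unquote

# Assumed field names; the order follows the example name above.
URI_FIELDS = ["part", "vendor", "product", "version", "update", "edition", "language"]
PACKED_FIELDS = ["edition", "swEdition", "tgtHW", "tgtSW", "other"]


def unpack_cpe_uri(name):
    """Split a cpe:/ URI into components, unpacking a tilde-packed edition."""
    prefix = "cpe:/"
    if not name.startswith(prefix):
        raise ValueError("not a CPE URI: %r" % (name,))
    parts = name[len(prefix):].split(":")
    if len(parts) > len(URI_FIELDS):
        raise ValueError("too many components: %r" % (name,))
    # Trailing components may be omitted; treat them as empty.
    parts += [""] * (len(URI_FIELDS) - len(parts))
    fields = dict(zip(URI_FIELDS, parts))

    edition = fields.pop("edition")
    if "~" in edition:
        packed = edition.split("~")
        if len(packed) != len(PACKED_FIELDS):
            raise ValueError("expected %d packed values" % len(PACKED_FIELDS))
        fields.update(zip(PACKED_FIELDS, packed))
    else:
        # Legacy 2.2-style edition: keep it as-is, leave the new fields empty.
        fields.update(dict.fromkeys(PACKED_FIELDS, ""))
        fields["edition"] = edition

    # Percent-decode only after splitting, so encoded separators stay literal.
    return {key: unquote(value) for key, value in fields.items()}
```

For instance, unpack_cpe_uri("cpe:/o:microsoft:windows_7:-:sp1:~home_premium~x64~~:en")
gives swEdition "home_premium" and tgtHW "x64", with the other packed
fields left empty.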

 

In case 2, if we're going to make everyone rebuild their XML parsers, let's
fix it for real.

 

In either case, it looks like we should limit the use of the "*" symbol to
either the beginning or end of a component name.  We've been using the "*"
for over a year now and haven't discovered a compelling use case for
allowing the "*" wildcard in the middle of text.  The problems with allowing
unrestrained wildcards surfaced during the CPE discussion seem to indicate
that allowing the * in the middle of CPE component names is a bad idea.
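
A minimal sketch of that restriction, assuming an unescaped "*" is the
only wildcard and ignoring any escaping rules (which are still under
discussion); the function name is hypothetical:

```python
def wildcard_placement_ok(component):
    """Return True if '*' appears only as the first and/or last character."""
    inner = component
    if inner.startswith("*"):
        inner = inner[1:]
    if inner.endswith("*"):
        inner = inner[:-1]
    # Anything left over is the "middle" of the component.
    return "*" not in inner
```

Under this check "*server", "server*" and "*server*" pass, while "win*7"
is rejected.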

 

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 9:30 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding

 

I am open to replacing the formatted string with a simple attribute-based
expression.  I don't think I have time to get both into the spec; it really
needs to be one or the other.  And if I spec out an attribute-based
expression, I wouldn't define it as an XML expression (with an XSD and all
that, mostly due to limited time to get one put together

------Original Message Truncated------
Classification:  UNCLASSIFIED
Caveats: NONE



Re: Naming spec issue: syntax of formatted string binding

Thomas Jones
In reply to this post by Wolfkiel, Joseph
On Fri, 2010-06-18 at 15:20 -0400, Wolfkiel, Joseph wrote:
> That’s certainly my preference.  I think the cost of an element
> approach (i.e. 9 potential element lines of text per software name)
> would make an element-based approach both less human readable and
> higher bandwidth.  I’m curious about what would drive an element-based
> approach.  Using an attribute-based approach also frees up the use of
> subordinate elements to give other related software characteristics
> that aren’t part of the name (e.g. product family, license, MD5 hash,
> common use, etc.)

Actually, that is incorrect. I will try to be as brief as possible. We
are beginning to delve into the XML developmental realm. Although, that
may not be a bad idea at this time.

MEMORY USAGE
In a DOM implementation, XML elements are represented internally as
nodes. This uses more memory than the strings commonly used when
processing attributes. The implementation of the data model also
comes into question. For instance, if either an element-based or an
attribute-based API is constructed within classes, then the memory
footprint grows for both.

When an XML processor receives data during streaming, it arrives one
element at a time. So processing logic may be short-circuited, either
to defer for further contextual processing or to break out early.
With attributes, by contrast, all attributes and their values are
returned at once. This costs more memory because the application
cannot short-circuit its decision logic during processing.

SGML was much more complex, but with XML all elements and attributes are
escaped so this lends nothing to the design criteria.

If the same attribute must be given several times with different values,
then elements use memory more efficiently, e.g.:

<cpe23 part="a" vendor="microsoft" product="windows7" update="sp1"
swEdition="home_premium" tgtHW="x64" />
<cpe23 part="a" vendor="microsoft" product="windows7" update="sp2"
swEdition="home_premium" tgtHW="x64" />
<cpe23 part="a" vendor="microsoft" product="windows7" update="sp3"
swEdition="home_premium" tgtHW="x64" />

versus

<cpe23 part="a">
<vendor>microsoft</vendor>
<product>windows7</product>
<update>sp1</update>
<update>sp2</update>
<update>sp3</update>
<swedition>home premium</swedition>
<targethw>x64</targethw>
</cpe23>

Each token must be read repeatedly with an attribute-based approach.
The XML processor reads the "vendor" attribute 3 times to process the
example data object. With an element-based approach, each element is
read once and only once. The XML processor reads the "vendor" element
once and can infer that all of its siblings under the same parent
element are associated with its content/value.

Obviously the above example becomes more important as the data
object continues to increase in size. Imagine a 1,000-record data object
constructed with the CPE XML language model. That is an increase of 999
reads of a common token with an attribute-based approach, versus 1 read
for an element-based approach. The logic of the processing argues for
elements in this example.
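
To make the two layouts concrete, here is a small Python sketch that
reads both with the standard library's xml.etree.ElementTree and shows
that they carry the same three records. The <names> wrapper element and
the function names are assumptions added for the example, and it makes
no measurement of memory use or speed.

```python
import xml.etree.ElementTree as ET

# Attribute layout: one element per name, every field repeated each time.
# The <names> root is an assumption; three siblings need some parent.
ATTRIBUTE_FORM = """<names>
<cpe23 part="a" vendor="microsoft" product="windows7" update="sp1"
 swEdition="home_premium" tgtHW="x64" />
<cpe23 part="a" vendor="microsoft" product="windows7" update="sp2"
 swEdition="home_premium" tgtHW="x64" />
<cpe23 part="a" vendor="microsoft" product="windows7" update="sp3"
 swEdition="home_premium" tgtHW="x64" />
</names>"""

# Element layout: shared fields appear once, <update> repeats.
ELEMENT_FORM = """<cpe23 part="a">
<vendor>microsoft</vendor>
<product>windows7</product>
<update>sp1</update>
<update>sp2</update>
<update>sp3</update>
<swedition>home premium</swedition>
<targethw>x64</targethw>
</cpe23>"""


def records_from_attributes(xml_text):
    """Return (vendor, product, update) for each <cpe23> element."""
    root = ET.fromstring(xml_text)
    return [
        (e.get("vendor"), e.get("product"), e.get("update"))
        for e in root.iter("cpe23")
    ]


def records_from_elements(xml_text):
    """Read the shared fields once, then expand one record per <update>."""
    root = ET.fromstring(xml_text)
    vendor = root.findtext("vendor")
    product = root.findtext("product")
    return [(vendor, product, u.text) for u in root.findall("update")]
```

Both functions return the same list of three (vendor, product, update)
tuples for the inputs above.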

EASE-OF-USE
Attributes are more restrictive than elements. All XML data models must
have at least one element, and most consist mainly of elements, so an
all-element design is simplest. At least to the end-user it is. Don't
read this as saying it's the best -- as we will see later, the context
of the data means much more than ease-of-use. Programmatically, the
differences in whether elements or attributes are easier to develop and
process are trivial. This is obviously language dependent and would be
at the discretion of the application and system engineers, depending on
how they feel personally one way or the other. To the CPE standard, this
should be irrelevant.

We as professional developers and contributors should be able to easily
read both approaches. I think this is a non-issue. However, common
end-user usage of the CPE language lends itself to elements. Someday
that may be a use-case.

DATA MODELING
A common rule of thumb in XML development states that if the data is to
be presented to the end-user, then use elements. If not, and the majority
of the data is to be machine-readable only, then use attributes. This is
not a concrete decision model, but applies uniformly in most cases. The
application of an entire XML document object to a particular platform or
product is an essential part of the data representation. With this in
mind, it follows that the CPE data is crucial. Principles of
XML design state that if the data is core content then it should
be easily presented and readable by both machine and human alike.

Another consideration is whether multiple whitespace-separated tokens are
to be supported. Element names cannot contain whitespace. They must
use an alternative convention, usually camel-casing or an underscore in
place of whitespace. Attribute values allow whitespace in various
positions -- before the data, within it, and even after it.

NATURAL LANGUAGE
The data currently provided within the CPE data dictionary is in a
natural language format. This lends itself to inclusion of the xml:lang
attribute so that, for i18n compliance, CPE authors can label the
language used. Japanese is a natural language
that would require appropriate annotations that are generally
represented using child elements. This cannot be done with
attributes. R-to-L languages such as Hebrew and Arabic require a
bidirectionality property. We should ensure that we do not inadvertently
bottleneck future developments by requiring only en_US locale and
language additions.

ELEMENTAL USE
If data might appear more than once in a data model, use an element
construct rather than introducing attributes with names like version1,
version2, version3, etc.

If order matters between two pieces of data, use elements for them;
because attributes are inherently unordered. The attribute ordering may
be influenced with normalization transformation application, but this
has not been addressed within the CPE community to my knowledge.

If data has, or might have, its own substructure, as with wildcards, then
it is greatly beneficial to represent it programmatically within element
nodes rather than attributes. The processing of attribute1="value*" is
going to present XML processors a huge hurdle to overcome. And we cannot
expect upstream toolkit developers to acquiesce to the CPE
language's improper data modeling. Furthermore, if the data is a
constituent part of a larger construct of data, then it should be placed
within an element node, following general data modeling logic.

ATTRIBUTE USE
I won't go into too much here due to the previous posts about
attributes. However, I will note alternate reasoning not presented
already.

The use of attributes lends itself easily to an enumerated list. By
using a controlled vocabulary of allowed values, XML developers can
require XML authors to conform to CPE standard representations. The
context of the data is most important. If the data is to be considered
metadata, then it is preferable to implement it as an attribute. Some
examples of this may be: a reference to or representation of a class or
role of the parent node, or the method of processing the parent node's
contents. This lends itself to the logic that if data applies to
descendant elements of a parent node then it would/should be constructed
as an attribute.

PRODUCTIVITY AND MAINTAINABILITY
Attributes are designed for expressing simple properties of the data
represented in an element. If you work against the basic architecture of
XML by shoehorning structured information into attributes you may gain
some specious terseness and convenience, but you will probably pay in
maintenance costs.

NOTE: Many of these points have been made time and time again, and many
have already been discussed thoroughly on XML development
resources. I have simply paraphrased and presented what I believe is
most relevant to the CPE community's needs.

Cheers.
Thomas

Re: Naming spec issue: syntax of formatted string binding

Ernest Park-2
Could I suggest a "pause" and review?

If we are discussing a change as substantial in implementation as it seems, we should make sure that a good long term decision is made. 


The elemental approach mentioned herein is reasonable - actually looks like JSON, versus the attribute/XML type.


I think either way is an improvement, but needs clear articulation and more input from the community. This would require a wholesale change in the way that we express SCAP and vulnerability information. 


Further, we would be forking our data management tree. Would we need to maintain parallel data sets to support older stuff? What will be done for backward compatibility?

What are the thoughts from the NVD team?





Ernie

On Fri, Jun 18, 2010 at 10:28 PM, Thomas R. Jones <[hidden email]> wrote:
> [...]


Re: Naming spec issue: syntax of formatted string binding

Brant Cheikes

Yes, let’s pause and review.

 

First, an announcement: the pre-release draft specifications are now available here: http://cpe.mitre.org/specification/new_work.html.  These are the documents I posted to the cpe-list on June 9.

 

Second, everyone needs to understand that the CPE v2.3 Naming Specification must be put into near-final form by close of business *this Wednesday*, that is, 23 June.  Next, the document needs to be reviewed by NIST management and is scheduled to be posted for public comment by 30 June.  The public comment period (on Naming only; the other three specs are on a slightly delayed schedule) is scheduled to run from 6/30-7/21.  We will then have a short time to address comments and prepare a final draft, due 7/30.  If the public comment period reveals that the spec is not “stable”—if there are too many significant changes called for, then the whole effort fails and CPE 2.2 will remain the standard for at least another year.  If the spec is found to be stable (if we can satisfactorily address the comments without major changes) then it is expected to become part of SCAP 1.2 and NIST will plan to support the new dictionary format.

 

Given all that, I’ve concluded that introducing an XML or XML-like binding at the 11th hour is off the table.  Last week I thought perhaps it would be doable if it were quick and easy to implement, and if there was widespread support with no compelling counter-arguments.  Judging from the traffic, and after further reflection, it’s clear to me that there’s neither the time to make such a change nor a clear consensus of merit from a sufficiently broad, engaged community.  In addition we have an open question of whether such a binding should be attribute-based or element-based.  Thus I am compelled to declare that CPE 2.3 will continue to use a string-based binding.

 

The question then returns to: what syntax should the string-based binding have?  I’ll take that up in my next message.

 

/Brant

 

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352

 

From: Ernest Park [mailto:[hidden email]]
Sent: Friday, June 18, 2010 10:56 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted string binding

 

[...]

 


Re: Naming spec issue: syntax of formatted string binding (UNCLASSIFIED)

Brant Cheikes
In reply to this post by Wolfkiel, Joseph
I'll use this part of the thread to return to the topic of binding syntax.
Let's review where the discussion stands.

As drafted, the new Naming spec defines a "formatted string" binding for CPE
that looks somewhat like a URI but isn't defined to be a URI, precisely so
we can define our own syntax rules rather than be bound by the rules of RFC
3986.  We went this route because we (thought we) heard a clear community
consensus against the URI form and its requirements for percent-encoding.
In the end, we discovered we still needed a mechanism for "escaping" a wide
variety of printable non-alphanumeric characters, as well as a way to
distinguish certain characters used as meta-characters (e.g., asterisk and
question-mark used as wildcard characters for pattern-matching) from those
same characters used as part of field values.  So we introduced a simple
escaping mechanism--a single backslash used as an escape character (as in
"\+" to embed a plus-sign in a field value).

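To illustrate the escaping mechanism just described, here is a minimal
Python sketch, assuming "*" and "?" are the only metacharacters and that
a backslash makes the following character literal. The function name and
the token labels are hypothetical, not taken from the draft spec.

```python
def tokenize_field(value):
    """Classify each character of a field value as 'literal' or 'wildcard'."""
    tokens = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\":
            # A backslash escapes the next character, whatever it is.
            if i + 1 >= len(value):
                raise ValueError("dangling escape at end of %r" % (value,))
            tokens.append(("literal", value[i + 1]))
            i += 2
        elif ch in "*?":
            # Unescaped asterisk or question-mark keeps its meta meaning.
            tokens.append(("wildcard", ch))
            i += 1
        else:
            tokens.append(("literal", ch))
            i += 1
    return tokens
```

With this sketch, the value c\+\+* is read as the literal text "c++"
followed by a multi-character wildcard, while what\? ends in a literal
question-mark.
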
As we continued to develop the "formatted string" binding concept, we began
questioning why we should even keep it looking like a URI at all.  We heard
concerns expressed about the use of colons as separators (for some
applications).  We heard a desire expressed that we try to use a standard
escaping mechanism, if we abandon percent encoding.  All this brought me to
propose a cosmetically different formatted string binding (which started
this thread) which looks different from a CPE 2.2 URI, but addresses all the
concerns I heard expressed AND looks more like CVE and CCE ids because it
uses a hyphen as a separator rather than a colon.

The main objection now is that this new form will demand some redevelopment.
That's a legitimate concern--though I have to admit that I have a hard time
imagining a competent programmer taking more than a day or two to make the
necessary changes.  So now we have a proposal for CPE 2.3 to retain the URI
form as the only allowed binding, with percent encoding and all that
baggage, introduce a new rule allowing asterisk and question-mark to appear
with or without percent encoding, and pack the newly-introduced attributes
into the edition field (as the spec already does when binding to a backward
compatible form).

I'm not enthusiastic about that proposal, especially after all the talk
about getting away from URIs.  I'm certainly concerned about failing to
syntactically distinguish 2.3-conformant URIs (which might have embedded
wildcards) from 2.2-conformant URIs.  Would a 2.2-conformant tool break if
it encountered a 2.3-conformant name with embedded wildcards "in the wild"?


On the other hand, I'm beginning to think that too few members of the
community have followed this rather rushed v2.3 effort closely enough for us
to make a big change in the binding.  That argues in favor of the "retain
the URI" proposal, as it runs the least risk of surprise.

As of this moment, the best solution is not fully clear to me.

/Brant

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352

-----Original Message-----
From: Wolfkiel, Joseph [mailto:[hidden email]]
Sent: Friday, June 18, 2010 3:36 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding (UNCLASSIFIED)

It's not clear.  In a formatted string, like the current URI, the component
is always present, so an empty value alone can't tell you whether the
component wasn't assessed or was assessed and a null value was found.  The
current spec assumes it is "unknown" and should be treated as an "any"
value.  If we use an "any" tag (i.e. "*"), I suppose it would be reasonable
to treat a "::" as a null.
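
The two readings of an empty component can be contrasted in a small Python
sketch. This is only an illustration of the suggestion in this message, not
settled spec behavior. The names ANY, NULL, and interpret are hypothetical.

```python
ANY = "ANY"
NULL = "NULL"


def interpret(value, any_tag_in_use):
    """Map a raw component to a logical value.

    any_tag_in_use=False models the current spec: empty means "any".
    any_tag_in_use=True models the suggestion: "*" means "any", so an
    empty component is free to mean null.
    """
    if any_tag_in_use and value == "*":
        return ANY
    if value == "":
        return NULL if any_tag_in_use else ANY
    return value
```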

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Moore, Scott D CIV DISA PEO-MA [mailto:[hidden email]]
Sent: Friday, June 18, 2010 3:32 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding (UNCLASSIFIED)

Classification:  UNCLASSIFIED
Caveats: NONE

I'd also add that I'm not sure we need "-" as a special character for NA if
we're now allowing wildcards, etc.
Does the empty string "" have any logical meaning other than NA?
V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405
https://jeds.gds.disa.mil/jeds/searchAffiliates.action?id=31333835333433313132

-----Original Message-----
From: Wolfkiel, Joseph [mailto:[hidden email]]
Sent: Friday, June 18, 2010 3:29 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding

I was assuming those would be added in as reserved characters as well.
Sorry if I wasn't clear on that.  A "*" or "?" that isn't percent encoded
would be used as multiple or single wildcard values respectively and their
percent-encoded versions would be treated so that the characters would be
part of the product names.
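
A minimal Python sketch of that rule follows. It assumes the value being
matched is already decoded, and it translates a bare "*" or "?" into a
wildcard while a percent-encoded "%2a" or "%3f" stays a literal character.
The function names are hypothetical.

```python
import re


def component_pattern(component):
    """Compile a URI component into a regex under the rule above."""
    parts = []
    i = 0
    while i < len(component):
        ch = component[i]
        hex_pair = component[i + 1:i + 3]
        if ch == "%" and re.fullmatch(r"[0-9A-Fa-f]{2}", hex_pair):
            # Percent-encoded character: always a literal.
            parts.append(re.escape(chr(int(hex_pair, 16))))
            i += 3
        elif ch == "*":
            parts.append(".*")   # multi-character wildcard
            i += 1
        elif ch == "?":
            parts.append(".")    # single-character wildcard
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def matches(pattern_component, value):
    """True if the decoded value matches the whole pattern component."""
    return component_pattern(pattern_component).fullmatch(value) is not None
```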

I think this is consistent in the minor-revision trail since we added the
"-" as a reserved character in version 2.2.  Of course, it isn't really
backwards compatible with 2.2 and 2.1, but we've been using the "*" in ARF
and ASR for a while.  If we add "*" and "?", existing CPE publishers
wouldn't have to change anything--only consumers, since there weren't any
previously allowable wildcards.  I don't think there are very many existing
CPE consumers that would be impacted by this change.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Friday, June 18, 2010 2:30 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding

If we pursue course #1, how do we deal with percent-encoding rules? How can
we allow asterisk and question-mark to be embedded as wildcards, and how do
we block their interpretation as metacharacters?

/Brant


--
Brant A. Cheikes
Cell. 617-694-8180
Sent using BlackBerry
------Original Message------
From: Wolfkiel, Joseph
To: cpe-discussion-list CPE Community Forum
ReplyTo: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding
Sent: Jun 18, 2010 1:46 PM

After thinking about this overnight, I'm suggesting we limit discussion to 2
courses of action:

 

1.        Do something that retains backward interoperability (Regular
Expression-wise) with CPE 2.1 and 2.2 or;

2.       Go to an attribute-based form so we can support a broader range of
use cases and take advantage of the flexibility of XML schema

 

In case 1, I think we can use the work already done by your team on defining
well-formed names and thinking through some of the advanced concepts, but at
the same time, just limit changes to extending the existing "edition" field.
If we say that CPE2.3 names will use the tilde separator in the existing
edition field, then we can build a 2.2 schema legal CPE format that looks
like the following:

 

cpe:/o:vendor:product:version:update:edition~swEdition~tgtHW~tgtSW~other:language

 

To support the new values in the edition field, we can simply deprecate the
current matching rules, and explain which of the new tilde-separated fields
should be populated and with what values (I think providing guidance that
the "edition" component be the concatenation of the swEdition, tgtHW, and
tgtSW with spaces separating them would be a good compromise).  We should
consider deprecating existing 2.2 CPEs in favor of the 2.3 CPE format, which
would even support direct string matching.
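
As a rough Python sketch, the tilde-separated edition layout shown above
could be packed and unpacked as follows. The subfield order is taken from
the example in this message. The function names, and the choice to treat a
field with no tilde as a plain 2.2 edition, are assumptions made for
illustration.

```python
# Subfield order as written in the example name above.
EDITION_SUBFIELDS = ("edition", "swEdition", "tgtHW", "tgtSW", "other")


def pack_edition(**values):
    """Join the subfields with tildes; missing subfields become empty."""
    return "~".join(values.get(name, "") for name in EDITION_SUBFIELDS)


def unpack_edition(field):
    """Split an edition field back into named subfields."""
    parts = field.split("~")
    if len(parts) == 1:
        # No tilde: assume a plain 2.2-style edition value.
        return {"edition": parts[0]}
    if len(parts) != len(EDITION_SUBFIELDS):
        raise ValueError("expected %d subfields, got %d"
                         % (len(EDITION_SUBFIELDS), len(parts)))
    return dict(zip(EDITION_SUBFIELDS, parts))
```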

 

In case 2, if we're going to make everyone rebuild their XML parsers, let's
fix it for real.

 

In either case, it looks like we should limit the use of the "*" symbol to
either the beginning or end of a component name.  We've been using the "*"
for over a year now and haven't discovered a compelling use case for
allowing the "*" wildcard in the middle of text.  The problems with allowing
unrestrained wildcards surfaced during the CPE discussion seem to indicate
that allowing the * in the middle of CPE component names is a bad idea.
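
A check for that restriction is short. The Python sketch below assumes the
URI form, where a literal asterisk would appear percent-encoded and so never
shows up as a bare "*". The function name is hypothetical.

```python
def star_placement_ok(component):
    """True if a bare '*' appears only as the first or last character."""
    # Everything strictly between the first and last characters.
    inner = component[1:-1]
    return "*" not in inner
```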

 

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 9:30 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Naming spec issue: syntax of formatted
string binding

 

I am open to replacing the formatted string with a simple attribute-based
expression.  I don't think I have time to get both into the spec; it really
needs to be one or the other.  And if I spec out an attribute-based
expression, I wouldn't define it as an XML expression (with an XSD and all
that, mostly due to limited time to get one put together

------Original Message Truncated------
Classification:  UNCLASSIFIED
Caveats: NONE

