Re: Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)

Re: Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)

WOLFKIEL, JOSEPH L CIV DISA PEO-MA
Classification:  UNCLASSIFIED
Caveats: NONE

1.  There does need to be some way to tell a 2.3 CPE from a pre-2.3 or 2.3 URI-bound CPE.  I think, implementation-wise, that trying to use namespaces will make documents, such as the NIST CPE Dictionary, much too big.  As I play with parsing routines to deal with 2.2 and 2.3 CPEs, it looks like the business logic will be different, so you need to know early in the parsing process what you're dealing with.  However, unless we really do get free from the formatted string/URI-type CPE and go with attribute-based names, we may ultimately need to be able to differentiate between 2.2, 2.3, and later versions of CPEs, so I suppose I would support building the version number into the CPE prefix.  Seems like separating the system and system version with a well-understood separator would also be a good idea.  Maybe system:system_version:datafield:datafield:datafield...?  So you'd have cpe:2.3:a:vendor:product...

2.  I agree with the ISO 646 choice.  To be postured to go international, we'd want to be able to support a more international character set, but then the rules for legal/encoded characters would get correspondingly more complex.  It'd be nice to put that off to CPE 3.x.

3.  I like having the wildcard characters.  Not sure it really matters which ones we use.  I also note that many of the CPEs in the current NVD dictionary are abstractions and could really use the "*"-equivalent wildcard symbol at the end of the version number to allow people/tools to understand that they are really intended to cover all releases/builds of software with those first version fields.
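
To illustrate the early dispatch described in item 1, here is a minimal sketch in Python that classifies a name by its prefix alone. The function name and the returned labels are hypothetical; "cpe:2.3:" is only the separator idea floated above, "cpe23:" is the prefix in the current draft, and the slash-less "cpe:" form is the alternative under discussion. None of these is confirmed as the final syntax.

```python
def classify_cpe(name: str) -> str:
    """Guess which binding a CPE name uses, judged by its prefix alone."""
    # Order matters: the more specific prefixes must be tested first.
    if name.startswith("cpe:/"):
        return "uri"                            # v2.2-style URI binding
    if name.startswith("cpe:2.3:"):
        return "formatted-string-versioned"     # version carried as its own field
    if name.startswith("cpe23:"):
        return "formatted-string-draft"         # prefix used in the current draft
    if name.startswith("cpe:"):
        return "formatted-string-unversioned"   # slash-less "cpe:" proposal
    raise ValueError(f"not a CPE name: {name!r}")


if __name__ == "__main__":
    for sample in (
        "cpe:/a:microsoft:internet_explorer:8.0.6001:beta",
        "cpe23:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*",
        "cpe:2.3:a:vendor:product",
    ):
        print(sample, "->", classify_cpe(sample))
```

With a versioned prefix the parser can pick its business logic after reading one or two fields, rather than inferring the version from the shape of the rest of the name.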

- Joe Wolfkiel


On Wed, Nov 10, 2010 at 6:36 PM, Cheikes, Brant A. <[hidden email]> wrote:


        CPE Community,

         

        We're preparing final drafts of the CPE v2.3 specification suite.  We expect them to become part of SCAP 1.2.

         

        In the Naming specification, there are three decisions I would appreciate quick feedback on.

         

        1. The Naming spec supports both a v2.2-style URI binding and a new "formatted string" (FS) binding.  We need a syntactic way to easily distinguish the two.  At present, the spec has the FS binding prefixed with "cpe23:", but also drops the URI slash, as in:

         

        cpe23:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*

         

        It seems like the presence or absence of the URI slash could be sufficient to distinguish the two forms, so I'm considering using "cpe:" as the prefix for both forms, e.g.,

         

        URI: cpe:/a:microsoft:internet_explorer:8.0.6001:beta

        FS:  cpe:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*

         

        The advantage of the proposal is that the two naming forms are (subtly but reliably) distinguishable, and have a consistent prefix which avoids an ugly version-specific precedent.  I don't really see any disadvantages, but I thought I'd put the question to the community.

         

        2. The draft Naming spec requires that "Value strings assigned to attributes of WFNs SHALL be non-empty contiguous strings of bytes encoded using the American Standard Code for Information Interchange (US-ASCII, also known as ANSI_X3.4-1968)."  The v2.2 specification was silent on the subject of character sets.  We have an opportunity to be clear and precise as we go forward with 2.3 and beyond.  At the very least, we ought to use a more up-to-date reference, such as ISO/IEC 646 ("ISO 7-bit coded character set for information interchange").  That's probably the easiest thing to do at this point.  Is there any reason we should consider something broader, e.g., ISO/IEC 8859-1 (Latin-1)?  That would admit a wider but still limited range of product/vendor/etc. names expressed in languages other than English.  But it raises the risk of starting us down an I18N path that we haven't explored at all well within CPE to date.  At this point I'm leaning towards the conservative approach (ISO 646), but am open to arguments for alternatives.

         

        3. A feature of the v2.3 approach is that we've added support for single- and multi-character wildcards.  In the well-formed name and in the FS binding, the "?" is the single-character wildcard, and the "*" is the multi-character wildcard.  The specification allows these characters to be escaped with a backslash when they are to be interpreted as regular characters.  The FS binding allows these characters to appear directly, with or without escaping.  The problem is that in the URI form, these characters are required to be percent encoded, so there's no obvious way to distinguish whether they are "escaped" or "unescaped".

         

        In the spec as currently written, we simply lose the distinction in the URI: unescaped wildcards are dropped.  The argument is that the URI exists only for backward compatibility with v2.2, which doesn't support wildcards in matching.  However, there's a community of CPE users which has invested in code to process URIs, but would still like the ability to express wildcards in the URI format, even if that requires a proprietary matching algorithm.

         

        There seems to be a technical solution in the offing that would help these users, and would also eliminate the "lossiness" when binding names that contain wildcards to URIs.  The proposed solution is to map the wildcards to otherwise-prohibited percent-encoding forms.  Recall that, in general, all characters in CPE names must be printable alphanumeric.  Whitespace and all non-printing characters are prohibited, and almost all punctuation/special characters must be percent-encoded when embedded in CPE name components.

         

        We could modify the spec to permit two otherwise prohibited percent-encoding forms to be used in URIs, and to represent the unescaped single- and multi-character wildcards.  For example, we might reserve, say, "%09" (horizontal tab) and "%0b" (vertical tab), as the single- and multi-character wildcards, respectively, in URI forms.  The choice is relatively arbitrary.  If we did this, we could allow CPE names such as:

         

        cpe:/a:big%2astar:foobar:8.%0b

         

        This names a product from vendor "big*star" (where an asterisk x2A is embedded in the name), with product name "foobar", and whose version is "8.*", but here the asterisk actually represents the multi-character wildcard.

         

        The advantage of this technique is that we can now fully translate URIs to formatted strings and vice versa, with no loss in meaning or capability.  Of course, only CPE v2.3 conformant implementations would be able to fully consume and process these forms; CPE v2.2 conformant implementations would be unaffected.  The disadvantage is that there is a potential new requirement to special-case the percent-decoding of the two re-purposed forms.  Thoughts?
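
The special-casing involved can be sketched in a few lines of Python. This is only an illustration under the assumptions of the example above: "%09" and "%0b" are the tentatively reserved forms, every other percent-encoded form decodes to a backslash-escaped literal, and characters that are not percent-encoded pass through unchanged. The function and table names are hypothetical, and the sketch handles one name component at a time.

```python
import re

# Assumption: the two reserved forms from the example above.
WILDCARD_FORMS = {"%09": "?", "%0b": "*"}
FORM_FOR_WILDCARD = {char: form for form, char in WILDCARD_FORMS.items()}

_PERCENT = re.compile(r"%[0-9a-fA-F]{2}")


def uri_to_fs(component: str) -> str:
    """Translate one URI component into formatted-string syntax."""
    def repl(match):
        form = match.group(0).lower()
        if form in WILDCARD_FORMS:
            return WILDCARD_FORMS[form]            # unescaped wildcard
        return "\\" + chr(int(form[1:], 16))       # literal, backslash-escaped
    return _PERCENT.sub(repl, component)


def fs_to_uri(component: str) -> str:
    """Translate one formatted-string component back into URI syntax."""
    out = []
    i = 0
    while i < len(component):
        ch = component[i]
        if ch == "\\" and i + 1 < len(component):
            # Escaped character: percent-encode the character that follows.
            out.append("%{:02x}".format(ord(component[i + 1])))
            i += 2
            continue
        # Unescaped wildcard maps to its reserved form; anything else is copied.
        out.append(FORM_FOR_WILDCARD.get(ch, ch))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for part in ("big%2astar", "foobar", "8.%0b"):
        fs = uri_to_fs(part)
        print(part, "->", fs, "->", fs_to_uri(fs))
```

Because each direction has exactly one output for each input form, a component survives the round trip unchanged, which is the "no loss in meaning" property claimed above.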

         

        Thanks,

        /Brant

         

        Brant A. Cheikes
        The MITRE Corporation
        202 Burlington Road, M/S K302
        Bedford, MA 01730-1420
        Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352

         




--
Joe Wolfkiel










Classification:  UNCLASSIFIED
Caveats: NONE



Re: Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)

Thomas Jones


On Thu, Nov 11, 2010 at 12:02 PM, WOLFKIEL, JOSEPH L CIV DISA PEO-MA <[hidden email]> wrote:
Classification:  UNCLASSIFIED
Caveats: NONE

1.  There does need to be some way to tell a 2.3 CPE from a pre-2.3 or 2.3 URI-bound CPE.  I think, implementation-wise, that trying to use namespaces will make documents, such as the NIST CPE Dictionary, much too big.  As I play with parsing routines to deal with 2.2 and 2.3 CPEs, it looks like the business logic will be different, so you need to know early in the parsing process what you're dealing with.  However, unless we really do get free from the formatted string/URI-type CPE and go with attribute-based names, we may ultimately need to be able to differentiate between 2.2, 2.3, and later versions of CPEs, so I suppose I would support building the version number into the CPE prefix.  Seems like separating the system and system version with a well-understood separator would also be a good idea.  Maybe system:system_version:datafield:datafield:datafield...?  So you'd have cpe:2.3:a:vendor:product...
 

Versioning is important: different versions of a language may specify different application semantics. In practice, there are two general ways to do versioning in XML languages in a given document. The first is to mimic XML itself and use a version attribute on root or arbitrary elements; the other is to provide a richer mechanism with links to specify the previous versions. This rich approach is exemplified by the mechanism provided by OWL ontologies to specify prior versions (using the priorVersion predicate), which can also specify backwards compatibility, incompatibility, and deprecated classes and properties. Note that in OWL, "if owl:backwardCompatibleWith is not declared, then compatibility should not be assumed" [OWL Guide]. While RDDL provides a prior-version purpose, it does not let one specify versions in detail. For example, the nature URI for XML Schema (http://www.w3.org/2001/XMLSchema) does not distinguish whether version 1.0 or 1.1 of XML Schema is being used. In fact, neither does the namespace document of XML Schema, as its related resources include only the 1.0 2nd Edition normative references.

The approach of using the value of the version attribute in the root element can become problematic. What about the case in which one wants to use names from two versions of a language that use the same namespace? Should one qualify both elements with differing version attributes? One could specify that every version has its own URI, but this is often not the case: minor revisions often want to keep the same namespace, with new namespaces reserved for major revisions. In one case that has attracted attention in the Web Services community, applications may want to revert to a previous version of a language if they do not have a relevant schema or other resource to process the newest version of the language, even if the document specifies that the processor should use the latest version, in order to "scrape some information out".

One would hope you can just put the version number in the URI, perhaps by writing the year of the specification in the namespace URI. This is done by the W3C in the namespace of XHTML: http://www.w3.org/1999/xhtml. Regardless, the approach of trying to throw all the relevant versioning information in the URI does not solve the problem cleanly. This approach violates the rule of URI Opacity given by the W3C TAG: "Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource". The problem of figuring out more information about a language can be solved more easily by letting the URI dereference a namespace document that provides such information.

This statement from the W3C shows that CPE is currently in non-compliance with international standards regarding the structure of the CPE URI grammar. Much of the current implementation does in fact infer properties from the CPE URI.

The minimalist reading of namespaces states that anyone can mint a new name by just adding a local name in a namespace in a document they produce. The power of defining the "names in a namespace" is in the hands of the user, not the owner of the namespace URI. Against the intuitions of many people unfamiliar with XML, a namespace sets absolutely no constraints on the number and kinds of names in a language. XML parsers do not attempt to retrieve anything at all from a namespace URI. The number of distinct local names that may be attached to a single namespace is infinite. This is the reading sanctioned by the XML Namespaces specification. As noted by Henry Thompson, "The minimalist reading is the only one consistent with actual usage -- people mint new namespaces by simply using them in an expanded name or namespace declaration, without thereby incurring any obligation to define the boundaries of some set". While there has been plenty of vigorous debate about namespaces, the minimalist interpretation is widespread precisely because there is no alternative. This interpretation does not manage the versioning of XML languages or take advantage of the use of namespace documents to check the correctness of the name use in a document.

A maximalist reading of namespaces would state that there is some finite number of names in a language and local names in a namespace with some standard usage. The number of names in a namespace is defined by the owner of the namespace URI as opposed to any user. Furthermore, a true maximalist would prefer that each expanded name in a namespace expand to a unique URI that denotes a secondary resource using some construction rule. Since a URI has a distinct owner, the owner would be the final arbiter of the language, and as such any non-standard usage of the language, such as minting a new expanded name that wasn't given in the namespace document (or was a valid secondary resource of the namespace URI) would be wrong. While this is obviously very restrictive, it is more or less how names of core constructs work in programming languages. However, this is too stringent and incompatible with most existing work.

A balance can be struck between the maximalist and minimalist readings, creating a pragmatic reading of namespaces that gives the owner of XML languages a way of expressing more information about their language in the namespace document. This would give the user more options by allowing them to discover exactly how the owner of the language wants the language to be used. The owner should choose if they prefer a maximalist, minimalist, or some moderate reading of their namespace. The user does not have to be compelled to follow the owner's guidelines, but can at least be aware of them if they so wish. So namespace documents should state whether or not the space of possible names is delimited and whether or not every name in the namespace has a unique expanded name that maps to a URI. A namespace document should state the version of a language, and keep track of version changes over time. For a particular name, it should state what version ranges it can be used in, and attach a human readable description to each name. This would give the user some advantages in exchange for letting them restrict themselves. For example, instead of having to worry about documenting their usage of particular names, by sticking to the namespace document given by the owner of the namespace URI, a user could pass around XML documents to other applications and know that if those applications were not sure how to process a given name, the application could get a namespace document that would tell them how.

For a single common style:
<element xmlns:cpe="http://cpe.nist.gov/namespaces/2010/11">
<cpe:child>content</cpe:child>
...
</element>
 
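For greater than one style:
<element xmlns:cpe1="http://cpe.nist.gov/namespaces/2010/11" xmlns:cpe2="http://cpe.nist.gov/namespaces/2011/2" xmlns:cpe21="http://cpe.nist.gov/namespaces/2011/rc1">
<cpe2:child>content</cpe2:child>
<cpe1:child>content</cpe1:child>
<cpe21:child>content</cpe21:child>
...
</element>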
 
Each namespace declaration is presented only once and is used throughout the document object. As an added benefit, the namespace use allows CPE content to be processed within other XML standards and provides for immediate recognition and removal of element collisions (even the inclusion of CPE within another XML language as an attribute may be namespace aware).
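
As a rough sketch of how a consumer could tell the styles apart, the Python below parses a two-namespace document with the standard library's ElementTree and reports the namespace URI of each child, so processing can be dispatched on the URI rather than on the prefix. The namespace URIs are illustrative placeholders, not published NIST namespaces, and the function name and element contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, declared once on the root element.
DOC = """<element xmlns:cpe1="http://cpe.nist.gov/namespaces/2010/11"
         xmlns:cpe2="http://cpe.nist.gov/namespaces/2011/2">
  <cpe2:child>content-b</cpe2:child>
  <cpe1:child>content-a</cpe1:child>
</element>"""


def children_by_namespace(xml_text):
    """Return (namespace URI, local name, text) for each child of the root."""
    root = ET.fromstring(xml_text)
    found = []
    for child in root:
        # ElementTree expands a prefixed name to "{namespace-uri}local".
        uri, _, local = child.tag[1:].partition("}")
        found.append((uri, local, child.text))
    return found


if __name__ == "__main__":
    for uri, local, text in children_by_namespace(DOC):
        print(local, "from", uri, ":", text)
```

The prefix itself (cpe1, cpe2) never reaches the application; only the URI does, so two documents can use different prefixes for the same style without confusing the parser.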
 




Re: Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)

WOLFKIEL, JOSEPH L CIV DISA PEO-MA
Classification:  UNCLASSIFIED
Caveats: NONE

I'm trying to picture how what you're advocating would work.  For example, in the NVD CPE dictionary (currently 8.3MB and not even close to complete), we'd need to introduce dedicated namespaces for each CPE "type" currently in use.  Arguably, the types include CPE 2.2, which doesn't support tildes, 2.3 URI-type, which does (and may also support wildcards), and the 2.3 structured string with escaping.  Each would get its own REGEX definition in a small schema and its own namespace, then you'd look at a prefix in the XML tag to figure out how to parse it.

Sample namespaces follow:

xmlns:cpe22="http://scap.nist.gov/schema/cpe-uri/2.2"
xmlns:cpe23u="http://scap.nist.gov/schema/cpe-uri/2.3"
xmlns:cpe23s="http://scap.nist.gov/schema/cpe-formatted_string/2.3"

The example below shows how a 2.2 name would be deprecated to a 2.3 URI-formatted name:

<cpe-item deprecated="true" deprecation_date="2009-03-05T12:19:19.047-05:00" cpe22:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe23u:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~">
  <title xml:lang="en-US">Adobe Flash Player 9.0.115.0</title>
  <meta:item-metadata deprecated-by-nvd-id="99110" modification-date="2009-03-05T12:19:19.047-05:00" status="DRAFT" nvd-id="93434" />
</cpe-item>

If this is what you're advocating, then it looks like the additional overhead is pretty much the same.  It just gets tacked onto the tag instead of the name.
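
For what it's worth, a rough Python sketch of the consuming side, assuming the three sample namespaces above and a cut-down version of the example item (the title and meta elements are omitted, and the namespace declarations are placed on the item itself so the fragment parses on its own). The mapping table and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample namespaces from above mapped to the parsing rules they would select.
NS_TO_BINDING = {
    "http://scap.nist.gov/schema/cpe-uri/2.2": "2.2 URI",
    "http://scap.nist.gov/schema/cpe-uri/2.3": "2.3 URI",
    "http://scap.nist.gov/schema/cpe-formatted_string/2.3": "2.3 formatted string",
}

ITEM = """<cpe-item xmlns:cpe22="http://scap.nist.gov/schema/cpe-uri/2.2"
          xmlns:cpe23u="http://scap.nist.gov/schema/cpe-uri/2.3"
          deprecated="true"
          cpe22:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0"
          cpe23u:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~"/>"""


def bindings_of(item_xml):
    """Map each namespaced attribute to (binding label, attribute value)."""
    item = ET.fromstring(item_xml)
    result = {}
    for key, value in item.attrib.items():
        if key.startswith("{"):
            # Namespaced attributes appear as "{namespace-uri}local".
            uri, _, local = key[1:].partition("}")
            result[local] = (NS_TO_BINDING.get(uri, "unknown"), value)
    return result


if __name__ == "__main__":
    for local, (binding, value) in bindings_of(ITEM).items():
        print(local, "uses", binding, "rules:", value)
```

Either way the parser needs one lookup before it can choose its rules; here the lookup key is the attribute's namespace instead of a prefix inside the name.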

Joseph L. Wolfkiel
Engineering Group Lead
DISA PEO MA/IA52
(703) 882-0772
[hidden email]
-----Original Message-----
From: Thomas Jones [mailto:[hidden email]]
Sent: Friday, November 12, 2010 5:54 AM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)



On Thu, Nov 11, 2010 at 12:02 PM, WOLFKIEL, JOSEPH L CIV DISA PEO-MA <[hidden email]> wrote:


        Classification:  UNCLASSIFIED
        Caveats: NONE
       
        1.  There does need to be some way to tell a 2.3 CPE from a pre-2.3 or 2.3 URI-bound CPE.  I think, implementation-wise, that trying to use namespaces will make documents, such as the NIST CPE Dictionary, much too big.  As I play with parsing routines to deal with 2.2 and 2.3 CPEs, it looks like the business logic will be different, so you need to know early in the parsing process what you're dealing with.  However, unless we really do get free from the formatted string/URI-type CPE and go with attribute-based names, we may ultimately need to be able to differentiate between 2.2, 2.3, and later versions of CPEs, so I suppose I would support building the version number into the CPE prefix.  Seems like separating the system and system version with a well-understood separator would also be a good idea.  Maybe system:system_version:datafield:datafield:datafield...?  So you'd have cpe:2.3:a:vendor:product...
       

 

Versioning is important: different versions of a language may specify different application semantics. In practice, there are two general ways to do versioning in XML languages in a given document. The first is to mimic XML itself and use a version attribute on root or arbitrary elements; the other is to provide a richer mechanism with links to specify the previous versions. This rich approach is exemplified by the mechanism provided by OWL ontologies to specify prior versions (using the priorVersion predicate), which can also specify backwards compatibility, incompatibility, and deprecated classes and properties. Note that in OWL, "if owl:backwardCompatibleWith is not declared, then compatibility should not be assumed" [OWL Guide] <http://www.ibiblio.org/hhalpin/homepage/notes/xvspaper.html#owlguide>. While RDDL provides a prior-version purpose, it does not let one specify versions in detail. For example, the nature URI for XML Schema (http://www.w3.org/2001/XMLSchema) does not distinguish whether version 1.0 or 1.1 of XML Schema is being used. In fact, neither does the namespace document of XML Schema, as its related resources include only the 1.0 2nd Edition normative references.

The approach of using the value of the version attribute in the root element can become problematic. What about the case in which one wants to use names from two versions of a language that use the same namespace? Should one qualify both elements with differing version attributes? One could specify that every version has its own URI, but this is often not the case: minor revisions often want to keep the same namespace, with new namespaces reserved for major revisions. In one case that has attracted attention in the Web Services community, applications may want to revert to a previous version of a language if they do not have a relevant schema or other resource to process the newest version of the language, even if the document specifies that the processor should use the latest version, in order to "scrape some information out".

One would hope you can just put the version number in the URI, perhaps by writing the year of the specification in the namespace URI. This is done by the W3C in the namespace of XHTML: http://www.w3.org/1999/xhtml. Regardless, the approach of trying to throw all the relevant versioning information in the URI does not solve the problem cleanly. This approach violates the rule of URI Opacity given by the W3C TAG: "Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource". The problem of figuring out more information about a language can be solved more easily by letting the URI dereference a namespace document that provides such information.

This statement from the W3C shows that CPE is currently in non-compliance with international standards regarding the structure of the CPE URI grammar. Much of the current implementation does in fact infer properties from the CPE URI.

The minimalist reading of namespaces states that anyone can mint a new name by just adding a local name in a namespace in a document they produce. The power of defining the "names in a namespace" is in the hands of the user, not the owner of the namespace URI. Against the intuitions of many people unfamiliar with XML, a namespace sets absolutely no constraints on the number and kinds of names in a language. XML parsers do not attempt to retrieve anything at all from a namespace URI. The number of distinct local names that may be attached to a single namespace is infinite. This is the reading sanctioned by the XML Namespaces specification. As noted by Henry Thompson, "The minimalist reading is the only one consistent with actual usage -- people mint new namespaces by simply using them in an expanded name or namespace declaration, without thereby incurring any obligation to define the boundaries of some set". While there has been plenty of vigorous debate about namespaces, the minimalist interpretation is widespread precisely because there is no alternative. This interpretation does not manage the versioning of XML languages or take advantage of the use of namespace documents to check the correctness of the name use in a document.

A maximalist reading of namespaces would state that there is some finite number of names in a language and local names in a namespace with some standard usage. The number of names in a namespace is defined by the owner of the namespace URI as opposed to any user. Furthermore, a true maximalist would prefer that each expanded name in a namespace expand to a unique URI that denotes a secondary resource using some construction rule. Since a URI has a distinct owner, the owner would be the final arbiter of the language, and as such any non-standard usage of the language, such as minting a new expanded name that wasn't given in the namespace document (or was a valid secondary resource of the namespace URI) would be wrong. While this is obviously very restrictive, it is more or less how names of core constructs work in programming languages. However, this is too stringent and incompatible with most existing work.

A balance can be struck between the maximalist and minimalist readings, creating a pragmatic reading of namespaces that gives the owner of XML languages a way of expressing more information about their language in the namespace document. This would give the user more options by allowing them to discover exactly how the owner of the language wants the language to be used. The owner should choose if they prefer a maximalist, minimalist, or some moderate reading of their namespace. The user does not have to be compelled to follow the owner's guidelines, but can at least be aware of them if they so wish. So namespace documents should state whether or not the space of possible names is delimited and whether or not every name in the namespace has a unique expanded name that maps to a URI. A namespace document should state the version of a language, and keep track of version changes over time. For a particular name, it should state what version ranges it can be used in, and attach a human readable description to each name. This would give the user some advantages in exchange for letting them restrict themselves. For example, instead of having to worry about documenting their usage of particular names, by sticking to the namespace document given by the owner of the namespace URI, a user could pass around XML documents to other applications and know that if those applications were not sure how to process a given name, the application could get a namespace document that would tell them how.

For a single common style:
<element xmlns:cpe="http://cpe.nist.gov/namespaces/2010/11">
<cpe:child>content</cpe:child>
...
</element>
 
For greater than one style:
<element xmlns:cpe1="http://cpe.nist.gov/namespaces/2010/11" xmlns:cpe2="http://cpe.nist.gov/namespaces/2011/2" xmlns:cpe21="http://cpe.nist.gov/namespaces/2011/rc1">
<cpe2:child>content</cpe2:child>
<cpe1:child>content</cpe1:child>
<cpe21:child>content</cpe21:child>
...
</element>
 
Each namespace declaration appears only once and is used throughout the document. As an added benefit, namespaces allow CPE content to be processed within other XML standards and make element-name collisions immediately recognizable and avoidable (even CPE content included in another XML language as an attribute can be namespace-aware).
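As a rough illustration of the multi-style example above, the following Python sketch (standard library only) parses that document and reports which namespace URI each child belongs to. The helper name namespace_of is hypothetical, and the "..." placeholder lines of the original example are omitted so that the document parses.

```python
import xml.etree.ElementTree as ET

# The three-namespace example from the message, minus the "..." placeholder.
DOC = """<element xmlns:cpe1="http://cpe.nist.gov/namespaces/2010/11"
         xmlns:cpe2="http://cpe.nist.gov/namespaces/2011/2"
         xmlns:cpe21="http://cpe.nist.gov/namespaces/2011/rc1">
  <cpe2:child>content</cpe2:child>
  <cpe1:child>content</cpe1:child>
  <cpe21:child>content</cpe21:child>
</element>"""


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag written as '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


root = ET.fromstring(DOC)
# One namespace URI per child, in document order.
namespaces_seen = [namespace_of(child.tag) for child in root]
```

The parser resolves each prefix to its URI, so downstream code can branch on the URI and never needs to look at the prefix itself.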
 


        2.  I agree with the ISO 646 choice.  To be postured to go international, we'd want to be able to support a more international character set, but then the rules for legal/encoded characters would get correspondingly more complex.  It'd be nice to put that off to CPE 3.x.
       
        3.  I like having the wildcard characters.  Not sure it really matters which ones we use.  I also note that many of the CPEs in the current NVD dictionary are abstractions and could really use the "*"-equivalent wildcard symbol at the end of the version number to allow people/tools to understand that they are really intended to cover all releases/builds or software with those first version fields.
       
        - Joe Wolfkiel
       
       
        On Wed, Nov 10, 2010 at 6:36 PM, Cheikes, Brant A. <[hidden email]> wrote:
       
       
               CPE Community,
       
       
       
               We're preparing final drafts of the CPE v2.3 specification suite.  We expect them to become part of SCAP 1.2.
       
       
       
               In the Naming specification, there are three decisions I would appreciate quick feedback on.
       
       
       
               1. The Naming spec supports both a v2.2-style URI binding and a new "formatted string" (FS) binding.  We need a syntactic way to easily distinguish the two.  At present, the spec has the FS binding prefixed with "cpe23:", but also drops the URI slash, as in:
       
       
       
               cpe23:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*
       
       
       
               It seems like the presence or absence of the URI slash could be sufficient to distinguish the two forms, so I'm considering using "cpe:" as the prefix for both forms, e.g.,
       
       
       
               URI: cpe:/a:microsoft:internet_explorer:8.0.6001:beta
       
               FS:  cpe:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*
       
       
       
               The advantage of the proposal is that the two naming forms are (subtly but reliably) distinguishable, and have a consistent prefix which avoids an ugly version-specific precedent.  I don't really see any disadvantages, but I thought I'd put the question to the community.
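A minimal sketch of the slash-based distinction proposed in item 1, assuming both bindings share the "cpe:" prefix as suggested. The function name binding_of and the return labels are hypothetical, chosen only for illustration.

```python
def binding_of(name: str) -> str:
    """Classify a CPE name as URI-bound or formatted-string-bound.

    Assumes the proposal in the message: both forms start with 'cpe:',
    and only the URI form has a slash immediately after the prefix.
    """
    if name.startswith("cpe:/"):
        return "uri"
    if name.startswith("cpe:"):
        return "fs"
    raise ValueError("not a CPE name: %r" % name)


uri_kind = binding_of("cpe:/a:microsoft:internet_explorer:8.0.6001:beta")
fs_kind = binding_of("cpe:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*")
```

The check has to test for the slash first, since every URI-bound name also matches the bare "cpe:" prefix.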
       
       
       
               2. The draft Naming spec requires that "Value strings assigned to attributes of WFNs SHALL be non-empty contiguous strings of bytes encoded using the American Standard Code for Information Interchange (US-ASCII, also known as ANSI_X3.4-1968)."  The v2.2 specification was silent on the subject of character sets.  We have an opportunity to be clear and precise as we go forward with 2.3 and beyond.  At the very least, we ought to use a more up-to-date reference, such as ISO/IEC 646 ("ISO 7-bit coded character set for information interchange").  That's probably the easiest thing to do at this point.  Is there any reason we should consider something broader, e.g., ISO/IEC 8859-1 (Latin-1)?  That would admit a wider but still limited range of product/vendor/etc. names expressed in languages other than English.  But it raises the risk of starting us down an I18N path that we haven't explored at all well within CPE to date.  At this point I'm leaning towards the conservative approach (ISO 646), but am open to arguments for alternatives.
       
       
       
               3. A feature of the v2.3 approach is that we've added support for single- and multi-character wildcards.  In the well-formed name and in the FS binding, the "?" is the single-character wildcard, and the "*" is the multi-character wildcard.  The specification allows these characters to be escaped with a backslash when they are to be interpreted as regular characters.  The FS binding allows these characters to appear directly, with or without escaping.  The problem is that in the URI form, these characters are required to be percent encoded, so there's no obvious way to distinguish whether they are "escaped" or "unescaped".
       
       
       
               In the spec as currently written, we simply lose the distinction in the URI: unescaped wildcards are dropped.  The argument is that the URI exists only for backward compatibility with v2.2, which doesn't support wildcards in matching.  However, there's a community of CPE users which has invested in code to process URIs, but would still like the ability to express wildcards in the URI format, even if that requires a proprietary matching algorithm.
       
       
       
               There seems to be a technical solution in the offing that would help these users, and would also eliminate the "lossiness" when binding names that contain wildcards to URIs.  The proposed solution is to map the wildcards to otherwise-prohibited percent-encoding forms.  Recall that, in general, all characters in CPE names must be printable alphanumeric.  Whitespace and all non-printing characters are prohibited, and almost all punctuation/special characters must be percent-encoded when embedded in CPE name components.
       
       
       
               We could modify the spec to permit two otherwise prohibited percent-encoding forms to be used in URIs, and to represent the unescaped single- and multi-character wildcards.  For example, we might reserve, say, "%09" (horizontal tab) and "%0b" (vertical tab), as the single- and multi-character wildcards, respectively, in URI forms.  The choice is relatively arbitrary.  If we did this, we could allow CPE names such as:
       
       
       
               cpe:/a:big%2astar:foobar:8.%0b
       
       
       
               This names a product from vendor "big*star" (where an asterisk x2A is embedded in the name), with product name "foobar", and whose version is "8.*", but here the asterisk actually represents the multi-character wildcard.
       
       
       
               The advantage of this technique is that we can now fully translate URIs to formatted strings and vice versa, with no loss in meaning or capability.  Of course, only CPE v2.3 conformant implementations would be able to fully consume and process these forms; CPE v2.2 conformant implementations would be unaffected.  The disadvantage is that there is a potential new requirement to special-case the percent-decoding of the two re-purposed forms.  Thoughts?
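The special-casing mentioned above might look roughly like the following Python sketch. It assumes the example reservation from the message ("%09" for the single-character wildcard, "%0b" for the multi-character wildcard), which the message itself calls relatively arbitrary. The function name uri_component_to_fs is hypothetical, and the sketch handles a single name component only.

```python
# Reserved percent-encoded forms, per the example choice in the message.
WILDCARDS = {"09": "?", "0b": "*"}


def uri_component_to_fs(component: str) -> str:
    """Translate one URI-bound component to formatted-string notation.

    Reserved forms become unescaped wildcards; every other percent-encoded
    character becomes a backslash-escaped literal.
    """
    out = []
    i = 0
    while i < len(component):
        if component[i] == "%":
            code = component[i + 1:i + 3].lower()
            if len(code) != 2:
                raise ValueError("truncated percent-encoding in %r" % component)
            if code in WILDCARDS:
                out.append(WILDCARDS[code])             # real wildcard
            else:
                out.append("\\" + chr(int(code, 16)))   # escaped literal
            i += 3
        else:
            out.append(component[i])
            i += 1
    return "".join(out)


vendor = uri_component_to_fs("big%2astar")   # literal asterisk, escaped
version = uri_component_to_fs("8.%0b")       # trailing multi-character wildcard
```

Because an escaped literal and a wildcard map to different percent-encoded forms, the translation can be reversed without losing the distinction.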
       
       
       
               Thanks,
       
               /Brant
       
       
       
               Brant A. Cheikes
               The MITRE Corporation
               202 Burlington Road, M/S K302
               Bedford, MA 01730-1420
               Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352
       
       
       
       
       
       
        --
        Joe Wolfkiel
       
       
       
       
       
       
       


Classification:  UNCLASSIFIED
Caveats: NONE



Re: Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)

Thomas Jones


On Fri, Nov 12, 2010 at 6:34 AM, WOLFKIEL, JOSEPH L CIV DISA PEO-MA <[hidden email]> wrote:
Classification:  UNCLASSIFIED
Caveats: NONE

I'm trying to picture how what you're advocating would work.  For example in the NVD CPE dictionary (currently 8.3MB and not even close to complete), we'd need to introduce dedicated namespaces for each cpe "type" currently in use.  Arguably, the types include CPE 2.2, which doesn't support tildes, 2.3 URI-type, which does (and may also support wildcards), and the 2.3 structured string with escaping.  Each would get its own REGEX definition in a small schema and its own namespace, then you'd look at a prefix in the XML tag to figure out how to parse it.

Sample namespaces follow:

xmlns:cpe22="http://scap.nist.gov/schema/cpe-uri/2.2"
xmlns:cpe23u="http://scap.nist.gov/schema/cpe-uri/2.3"
xmlns:cpe23s="http://scap.nist.gov/schema/cpe-formatted_string/2.3"

Example below showing how a 2.2 name would be deprecated to a 2.3 URI-formatted name.

<cpe-item deprecated="true" deprecation_date="2009-03-05T12:19:19.047-05:00" cpe22:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe23u:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~">
 <title xml:lang="en-US">Adobe Flash Player 9.0.115.0</title>
 <meta:item-metadata deprecated-by-nvd-id="99110" modification-date="2009-03-05T12:19:19.047-05:00" status="DRAFT" nvd-id="93434" />
</cpe-item>

If this is what you're advocating, then it looks like the additional overhead is pretty much the same.  It just gets tacked onto the tag instead of the name.

Joseph L. Wolfkiel
Engineering Group Lead
DISA PEO MA/IA52
(703) 882-0772
[hidden email]
 
Namespaces are associated with a scope. All unqualified children (meaning those written without an explicit prefix: simply "cpe-item" rather than "cpe:cpe-item") may assume the namespace of the parent unless explicitly declared with another namespace prefix. So a simple default declaration (xmlns="...") on the root element immediately applies to all unprefixed children of that element, while a prefixed "xmlns:cpe=..." declaration is in scope for all descendants but qualifies only names written with that prefix. Every cpe-item would then be qualified without any extra work by the dictionary author. All XML parsers should be namespace-aware.
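A small sketch of the scoping behaviour under discussion, using Python's standard xml.etree.ElementTree. One point worth making explicit: under XML Namespaces, a default declaration (xmlns="...") qualifies unprefixed descendant elements, whereas a prefixed declaration qualifies only names written with that prefix. The root element name cpe-list and the namespace URI are used here purely as illustrative assumptions.

```python
import xml.etree.ElementTree as ET

NS = "http://cpe.nist.gov/namespaces/2010/11"  # illustrative namespace URI

# Default declaration: the unprefixed child inherits the namespace.
default_doc = '<cpe-list xmlns="%s"><cpe-item/></cpe-list>' % NS

# Prefixed declaration: only the child written with the prefix is qualified.
prefixed_doc = (
    '<cpe:cpe-list xmlns:cpe="%s"><cpe-item/><cpe:cpe-item/></cpe:cpe-list>' % NS
)

default_tags = [child.tag for child in ET.fromstring(default_doc)]
prefixed_tags = [child.tag for child in ET.fromstring(prefixed_doc)]
```

In the first document the single declaration on the root is enough to qualify every cpe-item, which is the low-overhead case the paragraph above argues for.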
 
Adopting a new version on the back end then becomes much easier. No conditional statements are required except in a single function that dispatches processing based on namespace. Otherwise, as the standard develops, there will be more and more version-specific functions, e.g. parse-cpe-item22, parse-cpe-item23, parse-cpe-item23a. Modular code reuse can further reduce the development overhead of software written to process CPE data.
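The single dispatch function described above could be sketched as follows. This is only an illustration: the two namespace URIs are the sample ones proposed earlier in this thread, while the parser functions and the PARSERS table are hypothetical stand-ins that merely label the value they receive.

```python
import xml.etree.ElementTree as ET

# Sample namespace URIs proposed earlier in the thread.
CPE22 = "http://scap.nist.gov/schema/cpe-uri/2.2"
CPE23U = "http://scap.nist.gov/schema/cpe-uri/2.3"


def parse_cpe22(value):
    return ("2.2 URI", value)      # placeholder for real 2.2 parsing


def parse_cpe23_uri(value):
    return ("2.3 URI", value)      # placeholder for real 2.3 URI parsing


# The only version-aware piece: namespace URI -> parser.
PARSERS = {CPE22: parse_cpe22, CPE23U: parse_cpe23_uri}


def parse_item(item):
    """Parse every namespace-qualified attribute with the matching parser."""
    results = {}
    for key, value in item.attrib.items():
        if not key.startswith("{"):
            continue                           # unqualified attribute
        uri, local = key[1:].split("}", 1)
        parser = PARSERS.get(uri)
        if parser is not None:
            results[local] = parser(value)
    return results


ITEM = ET.fromstring(
    '<cpe-item xmlns:cpe22="%s" xmlns:cpe23u="%s" '
    'cpe22:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" '
    'cpe23u:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~"/>'
    % (CPE22, CPE23U)
)
parsed = parse_item(ITEM)
```

Supporting a further version would then mean adding one entry to the table rather than another branch in the calling code.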
 
Furthermore, namespaces let the CPE standard cover future changes more precisely:
<cpe-item deprecated="true" deprecation_date="2009-03-05T12:19:19.047-05:00" name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe23:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe24:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~" cpe22:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~" cpe27a:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~">
 
Can you imagine the conglomeration of code required to present a deprecation history for the above item in the current standard?
 
Another advantage is that a CPE dictionary can be compiled from content contributed from around the globe using different namespace prefixes but common namespace URIs, e.g. cpe-something=http://www.nist.gov and cpe-new=http://www.nist.gov. XML parsers will treat both as equivalent. This relieves authors of regional or skill-level issues: they simply have to ensure that the namespace URI for a given version is correct; the prefix does not matter at all.
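The prefix-independence point can be checked with a few lines of Python (standard library only). The root element name dictionary is a hypothetical placeholder; the two prefixes and the URI are the ones used in the example above.

```python
import xml.etree.ElementTree as ET

# Two different prefixes bound to the same namespace URI.
DOC = """<dictionary xmlns:cpe-something="http://www.nist.gov"
            xmlns:cpe-new="http://www.nist.gov">
  <cpe-something:cpe-item/>
  <cpe-new:cpe-item/>
</dictionary>"""

# ElementTree reports expanded names of the form '{uri}local'.
tags = [child.tag for child in ET.fromstring(DOC)]
same_name = tags[0] == tags[1]
```

Both children end up with the same expanded name, so a namespace-aware consumer cannot tell, and does not need to tell, which prefix the author used.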
 
-----Original Message-----
From: Thomas Jones [mailto:[hidden email]]
Sent: Friday, November 12, 2010 5:54 AM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)



On Thu, Nov 11, 2010 at 12:02 PM, WOLFKIEL, JOSEPH L CIV DISA PEO-MA <[hidden email]> wrote:


       Classification:  UNCLASSIFIED
       Caveats: NONE

       1.  There does need to be some way to tell a 2.3 CPE from a pre-2.3 or 2.3 URI-bound CPE.  I think, implementation-wise, that trying to use namespaces will make documents, such as the NIST CPE Dictionary, much too big.  As I play with parsing routines to deal with 2.2 and 2.3 CPEs, it looks like the business logic will be different, so you need to know early in the parsing process what you're dealing with.  However, unless we really do get free from the formatted string/URI-type CPE and go with attribute-based names, we may ultimately need to be able to differentiate between 2.2, 2.3, and later versions of CPEs, so I suppose I would support building the version number into the CPE prefix.   Seems like separating the system and system version with a well-understood separator would also be a good idea.  Maybe system:system_version:datafield:datafield:datafield...?  So you'd have cpe:2.3:a:vendor:product...
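A minimal sketch of the system:system_version:datafield... layout suggested above, showing how a parser could learn the version before touching the data fields. The function name split_versioned_prefix is hypothetical, and the sketch deliberately leaves the data fields unsplit, since escaped separators would need real handling.

```python
def split_versioned_prefix(name: str):
    """Split 'system:version:rest' into its three parts.

    Raises ValueError if the name has fewer than three colon-separated parts.
    """
    system, version, rest = name.split(":", 2)
    return system, version, rest


system, version, fields = split_versioned_prefix("cpe:2.3:a:vendor:product")
```

With the version available this early, the caller can select version-specific business logic before parsing the remaining fields.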




Versioning is important: Different versions of a language may specify different application semantics. In practice, there are two general ways to do versioning in XML languages in a given document. The first is to mimic XML itself and use a version attribute on root or arbitrary elements, and the other is to provide a more rich mechanism with links to specify the previous versions. This rich approach is exemplified by the mechanism provided by OWL ontologies to specify prior versions (using the priorVersion predicate) and can specify backwards compatibility, incompatibility, and deprecated classes and properties as well. Note that in OWL, "if owl:backwardCompatibleWith is not declared, then compatibility should not be assumed" [OWL Guide] <http://www.ibiblio.org/hhalpin/homepage/notes/xvspaper.html#owlguide>. While RDDL provides a prior-version purpose, it does not let one specify versions in detail. For example, the nature URI for XML Schema (http://www.w3.org/2001/XMLSchema) does not distinguish if version 1.0 or 1.1 of XML Schema is being used. In fact, neither does the namespace document of XML Schema, as it has as related resources only 1.0 2nd Edition normative references.

The approach of using the value of the version attribute in the root element can become problematic. What about the case in which one wants to use names from two versions of a language that use the same namespace? Should one qualify both elements with differing version attributes? One could specify that every version has its own URI, but this is often not the case: minor revisions often keep the same namespace, and a new namespace is used only for major revisions. In a case that has attracted attention in the Web Services community, applications may want to revert to a previous version of a language if they do not have a relevant schema or other resource to process the newest version of the language, even if the document specifies that the processor should use the latest version, in order to "scrape some information out".

One would hope you can just put the version number in the URI, perhaps by writing the year of the specification in the namespace URI. This is done by the W3C in the namespace of XHTML: http://www.w3.org/1999/xhtml. Regardless, the approach of trying to throw all the relevant versioning information in the URI does not solve the problem cleanly. This approach violates the rule of URI Opacity given by the W3C TAG: "Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource". The problem of figuring out more information about a language can be solved more easily by letting the URI dereference a namespace document that provides such information.

This statement from the W3C shows that CPE is currently in non-compliance with international standards regarding the structure of the CPE URI grammar. Much of the current implementation does in fact infer properties in this way.

The minimalist reading of namespaces states that anyone can mint a new name by just adding a local name in a namespace in a document they produce. The power of defining the "names in a namespace" is in the hands of the user, not the owner of the namespace URI. Against the intuitions of many people unfamiliar with XML, a namespace sets absolutely no constraints on the number and kinds of names in a language. XML parsers do not attempt to retrieve anything at all from a namespace URI. The number of distinct local names that may be attached to a single namespace is infinite. This is the reading sanctioned by the XML Namespaces specification. As noted by Henry Thompson, "The minimalist reading is the only one consistent with actual usage -- people mint new namespaces by simply using them in an expanded name or namespace declaration, without thereby incurring any obligation to define the boundaries of some set". While there has been plenty of vigorous debate about namespaces, the minimalist interpretation is widespread precisely because there is no alternative. This interpretation does not manage the versioning of XML languages or take advantage of the use of namespace documents to check the correctness of the name use in a document.



Re: Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)

WOLFKIEL, JOSEPH L CIV DISA PEO-MA
Classification:  UNCLASSIFIED
Caveats: NONE

** Can you imagine the conglomeration of code required to present a deprecation history for the above item in the current standard? **

Unfortunately, my imagination IS running with this.  Up until now, I've been mentally boycotting the cpe2.3 structured string format because it can't be used with the 2.2/2.3 URI-like REGEX pattern, and the DoD has already got several million dollars of infrastructure built around the 2.x URI REGEX.  However, if the "community" embraces the 2.3 structured string format, we'll have to either translate/mediate to the 2.3 URI REGEX or rebuild everything to accommodate the 2.3 structured string.  It starts to look extremely complex (AKA expensive) when you consider that existing 2.2-type CPEs may need to be deprecated to both 2.3 URI-type and 2.3 String-type names, parsers will have to be written that can seamlessly move among the three, and "something" will have to be deployed to allow some subset of these name types to be used in the data sharing infrastructure.

Of course, to some extent, this is true whether you solve today's problem using namespaces or not.


Joseph L. Wolfkiel
Engineering Group Lead
DISA PEO MA/IA52
(703) 882-0772
[hidden email]
-----Original Message-----
From: Thomas Jones [mailto:[hidden email]]
Sent: Friday, November 12, 2010 9:10 AM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)



On Fri, Nov 12, 2010 at 6:34 AM, WOLFKIEL, JOSEPH L CIV DISA PEO-MA <[hidden email]> wrote:


        Classification:  UNCLASSIFIED
        Caveats: NONE
       
        I'm trying to picture how what you're advocating would work.  For example in the NVD CPE dictionary (currently 8.3MB and not even close to complete), we'd need to introduce dedicated namespaces for each cpe "type" currently in use.  Arguably, the types include CPE 2.2, which doesn't support tildes, 2.3 URI-type, which does (and may also support wildcards), and the 2.3 structured string with escaping.  Each would get its own REGEX definition in a small schema and its own namespace, then you'd look at a prefix in the XML tag to figure out how to parse it.
       
        Sample namespaces follow:
       
        xmlns:cpe22="http://scap.nist.gov/schema/cpe-uri/2.2"
        xmlns:cpe23u="http://scap.nist.gov/schema/cpe-uri/2.3"
        xmlns:cpe23s="http://scap.nist.gov/schema/cpe-formatted_string/2.3"
       
        Example below showing how a 2.2 name would be deprecated to a 2.3 URI-formatted name.
       
        <cpe-item deprecated="true" deprecation_date="2009-03-05T12:19:19.047-05:00" cpe22:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe23u:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~">
         <title xml:lang="en-US">Adobe Flash Player 9.0.115.0</title>
         <meta:item-metadata deprecated-by-nvd-id="99110" modification-date="2009-03-05T12:19:19.047-05:00" status="DRAFT" nvd-id="93434" />
        </cpe-item>
       
        If this is what you're advocating, then it looks like the additional overhead is pretty much the same.  It just gets tacked onto the tag instead of the name.
       
        Joseph L. Wolfkiel
        Engineering Group Lead
        DISA PEO MA/IA52
        (703) 882-0772
        [hidden email]
       

 
Namespaces are associated with a scope. All unqualified(read as meaning not explicitly declared....not "cpe:cpe-item" structure but simply "cpe-item") children may assume the namespace of the parent; unless explicitly declared with another namespace prefix. So a simple "xmlns:cpe=..." declaration within the root element; may then immediately apply to all children of that element. So every cpe-item would then be qualified; without any need of the dictionary author. All XML parsers should be namespace aware.
 
The backend adoption of a new version is much easier to present to the standard. There is no conditional statements required except for a single function that elicits process control based on namespace. Otherwise, as the standard develops; there will eb more and more version specific functions. e.g. parse-cpe-item22, parse-cpe-item23, parse-cpe-item23a The modularity of code reuse can be utlized to further reduce development overhead for source code developed for processing CPE data.
 
Furthermore, namespace utilization provides the ability for the CPE standard to provide a more precise coverage of future changes:
<cpe-item deprecated="true" deprecation_date="2009-03-05T12:19:19.047-05:00" name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe23:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe24:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~" cpe22:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~" cpe27a:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~">
 
Can you imagine the conglomeration of code required to present a deprecation history for the above item in the current standard?
 
Another advantage is that a CPE dictionary can be compiled from content produced around the globe with different namespace prefixes but common namespace URIs, e.g. cpe-something=http://www.nist.gov and cpe-new=http://www.nist.gov. XML parsers will see both of these as equivalent. This eases the burden on authors with regional or skill-level constraints: they simply have to ensure that the namespace URI for a given version is correct; the prefix does not matter at all.
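
A small check of the prefix-equivalence point, again with Python's standard parser. The namespace URI is a stand-in chosen for the example; the two prefixes are the ones mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI shared by both authors.
NS = "http://example.org/cpe/2.3"

# Two authors pick different prefixes for the same namespace URI.
doc_one = f'<cpe-something:cpe-item xmlns:cpe-something="{NS}" name="cpe:/a:vendor:product"/>'
doc_two = f'<cpe-new:cpe-item xmlns:cpe-new="{NS}" name="cpe:/a:vendor:product"/>'

# After parsing, the prefix is gone and only the URI plus local name remain.
tag_one = ET.fromstring(doc_one).tag
tag_two = ET.fromstring(doc_two).tag
print(tag_one == tag_two)
```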
 
-----Original Message-----
From: Thomas Jones [mailto:[hidden email]]
Sent: Friday, November 12, 2010 5:54 AM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)



On Thu, Nov 11, 2010 at 12:02 PM, WOLFKIEL, JOSEPH L CIV DISA PEO-MA <[hidden email]> wrote:


       Classification:  UNCLASSIFIED
       Caveats: NONE

       1.  There does need to be some way to tell a 2.3 CPE from a pre-2.3 or 2.3 URI-bound CPE.  I think, implementation-wise, that trying to use namespaces will make documents such as the NIST CPE Dictionary much too big.  As I play with parsing routines to deal with 2.2 and 2.3 CPEs, it looks like the business logic will be different, so you need to know early in the parsing process what you're dealing with.  However, unless we really do get free from the formatted string/URI-type CPE and go with attribute-based names, we may ultimately need to be able to differentiate between 2.2, 2.3, and later versions of CPEs, so I suppose I would support building the version number into the CPE prefix.   Seems like separating the system and system version with a well-understood separator would also be a good idea.  Maybe system:system_version:datafield:datafield:datafield...?  So you'd have cpe:2.3:a:vendor:product...






        Versioning is important: Different versions of a language may specify different application semantics. In practice, there are two general ways to do versioning in XML languages in a given document. The first is to mimic XML itself and use a version attribute on root or arbitrary elements, and the other is to provide a more rich mechanism with links to specify the previous versions. This rich approach is exemplified by the mechanism provided by OWL ontologies to specify prior versions (using the priorVersion predicate) and can specify backwards compatibility, incompatibility, and deprecated classes and properties as well. Note that in OWL, "if owl:backwardCompatibleWith is not declared, then compatibility should not be assumed" [OWL Guide] <http://www.ibiblio.org/hhalpin/homepage/notes/xvspaper.html#owlguide> . While RDDL provides a prior-version purpose, it does not let one specify versions in detail. For example, the nature URI for XML Schema (http://www.w3.org/2001/XMLSchema) does not distinguish if version 1.0 or 1.1 of XML Schema is being used. In fact, neither does the namespace document of XML Schema, as it has as related resources only 1.0 2nd Edition normative references.
       
        The approach of using the value of the version attribute in the root element can become problematic. What about the case in which one wants to use names from two versions of a language that use the same namespace? Should one qualify both elements with differing version attributes? One could specify that every version has its own URI, but this is often not the case, and often minor revisions may want to use the same namespace, and only use namespaces for major revisions. In a case that has attracted attention in the Web Services community, applications may want to revert to a previous version of a language if they do not have a relevant schema or other resource to process the newest version of the language, even if the document specifies that the processor should use the latest version, in order to "scrape some information out".
       
        One would hope you can just put the version number in the URI, perhaps by writing the year of the specification in the namespace URI. This is done by the W3C in the namespace of XHTML: http://www.w3.org/1999/xhtml. Regardless, the approach of trying to throw all the relevant versioning information in the URI does not solve the problem cleanly. This approach violates the rule of URI Opacity given by the W3C TAG: "Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource". The problem of figuring out more information about a language can be solved more easily by letting the URI dereference a namespace document that provides such information.
       
        This statement from the W3C shows that CPE is currently in non-compliance with international standards regarding the structure of the CPE URI grammar. Much of the current implementation does in fact infer properties regarding the CPE standard.
       
        The minimalist reading of namespaces states that anyone can mint a new name by just adding a local name in a namespace in a document they produce. The power of defining the "names in a namespace" is in the hands of the user, not the owner of the namespace URI. Against the intuitions of many people unfamiliar with XML, any namespace sets absolutely no constraints on the number and kinds of names in a language. XML parsers do not attempt to retrieve anything at all from a namespace URI. The number of distinct local names that may be attached to a single namespace is infinite. This is the reading sanctioned by the XML Namespaces specification. As noted by Henry Thompson, "The minimalist reading is the only one consistent with actual usage -- people mint new namespaces by simply using them in an expanded name or namespace declaration, without thereby incurring any obligation to define the boundaries of some set". While there has been plenty of vigorous debate about namespaces, the minimalist interpretation is widespread precisely because there is no alternative. This interpretation does not manage the versioning of XML languages or take advantage of the use of namespace documents to check the correctness of the name use in a document.
       
        A maximalist reading of namespaces would state that there is some finite number of names in a language and local names in a namespace with some standard usage. The number of names in a namespace is defined by the owner of the namespace URI as opposed to any user. Furthermore, a true maximalist would prefer that each expanded name in a namespace expand to a unique URI that denotes a secondary resource using some construction rule. Since a URI has a distinct owner, the owner would be the final arbitrator of the language, and as such any non-standard usage of the language, such as minting a new expanded name that wasn't given in the namespace document (or was a valid secondary resource of the namespace URI) would be wrong. While this is obviously very restrictive, it is more or less how names of core constructs work in programming languages. However, this is too stringent and incompatible with most existing work.
       
        A balance can be struck between the maximalist and minimalist readings, creating a pragmatic reading of namespaces that gives the owner of XML languages a way of expressing more information about their language in the namespace document. This would give the user more options by allowing them to discover exactly how the owner of the language wants the language to be used. The owner should choose if they prefer a maximalist, minimalist, or some moderate reading of their namespace. The user does not have to be compelled to follow the owner's guidelines, but can at least be aware of them if they so wish. So namespace documents should state whether or not the space of possible names is delimited and whether or not every name in the namespace has a unique expanded name that maps to a URI. A namespace document should state the version of a language, and keep track of version changes over time. For a particular name, it should state what version ranges it can be used in, and attach a human readable description to each name. This would give the user some advantages in exchange for letting them restrict themselves. For example, instead of having to worry about documenting their usage of particular names, by sticking to the namespace document given by the owner of the namespace URI, a user could pass around XML documents to other applications and know that if those applications were not sure how to process a given name, the application could get a namespace document that would tell them how.
       
        For a single common style:
        <element xmlns:cpe="http://cpe.nist.gov/namespaces/2010/11">
        <cpe:child>content</cpe:child>
        ...
        </element>
       
        For greater than one style:
        <element xmlns:cpe1="http://cpe.nist.gov/namespaces/2010/11" xmlns:cpe2="http://cpe.nist.gov/namespaces/2011/2" xmlns:cpe21="http://cpe.nist.gov/namespaces/2011/rc1">
        <cpe2:child>content</cpe2:child>
        <cpe1:child>content</cpe1:child>
        <cpe21:child>content</cpe21:child>
        ...
        </element>
       
        Each namespace declaration is presented once only, and used throughout the document object. Also, as an added benefit, the namespace use allows for the CPE content to be processed within other XML standards and provides for immediate recognition and removal of element collisions (even the inclusion of CPE within another XML language as an attribute may be namespace aware).
       
       
       
               2.  I agree with the ISO 646 choice.  To be postured to go international, we'd want to be able to support a more international character set, but then the rules for legal/encoded characters would get correspondingly more complex.  It'd be nice to put that off to CPE 3.x.
       
               3.  I like having the wildcard characters.  Not sure it really matters which ones we use.  I also note that many of the CPEs in the current NVD dictionary are abstractions and could really use the "*"-equivalent wildcard symbol at the end of the version number to allow people/tools to understand that they are really intended to cover all releases/builds of software with those first version fields.
       
               - Joe Wolfkiel
       
       
               On Wed, Nov 10, 2010 at 6:36 PM, Cheikes, Brant A. <[hidden email]> wrote:
       
       
                      CPE Community,
       
       
       
                      We're preparing final drafts of the CPE v2.3 specification suite.  We expect them to become part of SCAP 1.2.
       
       
       
                      In the Naming specification, there are three decisions I would appreciate quick feedback on.
       
       
       
                      1. The Naming spec supports both a v2.2-style URI binding and a new "formatted string" (FS) binding.  We need a syntactic way to easily distinguish the two.  At present, the spec has the FS binding prefixed with "cpe23:", but also drops the URI slash, as in:
       
       
       
                      cpe23:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*
       
       
       
                      It seems like the presence or absence of the URI slash could be sufficient to distinguish the two forms, so I'm considering using "cpe:" as the prefix for both forms, e.g.,
       
       
       
                      URI: cpe:/a:microsoft:internet_explorer:8.0.6001:beta
       
                      FS:  cpe:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*
       
       
       
                      The advantage of the proposal is that the two naming forms are (subtly but reliably) distinguishable, and have a consistent prefix which avoids an ugly version-specific precedent.  I don't really see any disadvantages, but I thought I'd put the question to the community.
       
       
       
                      2. The draft Naming spec requires that "Value strings assigned to attributes of WFNs SHALL be non-empty contiguous strings of bytes encoded using the American Standard Code for Information Interchange (US-ASCII, also known as ANSI_X3.4-1968)."  The v2.2 specification was silent on the subject of character sets.  We have an opportunity to be clear and precise as we go forward with 2.3 and beyond.  At the very least, we ought to use a more up-to-date reference, such as ISO/IEC 646 ("ISO 7-bit coded character set for information interchange").  That's probably the easiest thing to do at this point.  Is there any reason we should consider something broader, e.g., ISO/IEC 8859-1 (Latin-1)?  That would admit a wider but still limited range of product/vendor/etc. names expressed in languages other than English.  But it raises the risk of starting us down an I18N path that we haven't explored at all well within CPE to date.  At this point I'm leaning towards the conservative approach (ISO 646), but am open to arguments for alternatives.
       
       
       
                      3. A feature of the v2.3 approach is that we've added support for single- and multi-character wildcards.  In the well-formed name and in the FS binding, the "?" is the single-character wildcard, and the "*" is the multi-character wildcard.  The specification allows these characters to be escaped with a backslash when they are to be interpreted as regular characters.  The FS binding allows these characters to appear directly, with or without escaping.  The problem is that in the URI form, these characters are required to be percent encoded, so there's no obvious way to distinguish whether they are "escaped" or "unescaped".
       
       
       
                      In the spec as currently written, we simply lose the distinction in the URI: unescaped wildcards are dropped.  The argument is that the URI exists only for backward compatibility with v2.2, which doesn't support wildcards in matching.  However, there's a community of CPE users which has invested in code to process URIs, but would still like the ability to express wildcards in the URI format, even if that requires a proprietary matching algorithm.
       
       
       
                      There seems to be a technical solution in the offing that would help these users, and would also eliminate the "lossiness" when binding names that contain wildcards to URIs.  The proposed solution is to map the wildcards to otherwise-prohibited percent-encoding forms.  Recall that, in general, all characters in CPE names must be printable alphanumeric.  Whitespace and all non-printing characters are prohibited, and almost all punctuation/special characters must be percent-encoded when embedded in CPE name components.
       
       
       
                      We could modify the spec to permit two otherwise prohibited percent-encoding forms to be used in URIs, and to represent the unescaped single- and multi-character wildcards.  For example, we might reserve, say, "%09" (horizontal tab) and "%0b" (vertical tab), as the single- and multi-character wildcards, respectively, in URI forms.  The choice is relatively arbitrary.  If we did this, we could allow CPE names such as:
       
       
       
                      cpe:/a:big%2astar:foobar:8.%0b
       
       
       
                      This names a product from vendor "big*star" (where an asterisk x2A is embedded in the name), with product name "foobar", and whose version is "8.*", but here the asterisk actually represents the multi-character wildcard.
       
       
       
                      The advantage of this technique is that we can now fully translate URIs to formatted strings and vice versa, with no loss in meaning or capability.  Of course, only CPE v2.3 conformant implementations would be able to fully consume and process these forms; CPE v2.2 conformant implementations would be unaffected.  The disadvantage is that there is a potential new requirement to special-case the percent-decoding of the two re-purposed forms.  Thoughts?
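
A minimal sketch of the special-cased percent-decoding described above, assuming the example mapping of "%09" to the single-character wildcard and "%0b" to the multi-character wildcard. The function name and the choice to emit other decoded characters with a backslash escape are illustrative assumptions, not spec text, and the sketch deals only with percent-encoded forms.

```python
import re

# Assumed mapping, taken from the example in the message above:
# %09 stands for the unescaped "?" wildcard, %0b for the unescaped "*".
WILDCARDS = {"%09": "?", "%0b": "*"}
PCT_RE = re.compile(r"%[0-9a-fA-F]{2}")

def decode_component(component):
    """Decode one URI component (hypothetical helper).

    The two re-purposed forms become unescaped wildcards; every other
    percent-encoded form becomes a backslash-escaped literal character.
    """
    def repl(match):
        form = match.group(0).lower()
        if form in WILDCARDS:
            return WILDCARDS[form]
        return "\\" + chr(int(form[1:], 16))
    return PCT_RE.sub(repl, component)

# Components of the example name cpe:/a:big%2astar:foobar:8.%0b
vendor = decode_component("big%2astar")
version = decode_component("8.%0b")
print(vendor, version)
```

Here the vendor comes out with an escaped asterisk (a literal character), while the version ends in a bare asterisk (the wildcard), so the distinction survives the round trip.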
       
       
       
                      Thanks,
       
                      /Brant
       
       
       
                      Brant A. Cheikes
                      The MITRE Corporation
                      202 Burlington Road, M/S K302
                      Bedford, MA 01730-1420
                      Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352
       
       
       
       
       
       
               --
               Joe Wolfkiel
       
       
       
       
       
       
       
               --
               Joe Wolfkiel
       
       
       
               Classification:  UNCLASSIFIED
               Caveats: NONE
       
       
       
       
       
        Classification:  UNCLASSIFIED
        Caveats: NONE
       
       


Classification:  UNCLASSIFIED
Caveats: NONE



Re: Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)

Brant Cheikes
In reply to this post by Thomas Jones

The discussion about namespaces is interesting, but strikes me as off-topic given my original question.  My question related to a decision about the syntax of the CPE 2.3 formatted-string name.  The question deliberately made no assumptions about contexts in which such a name might appear.  The objective is to design the name syntax such that a name consumer can easily figure out what parsing and interpretation rules apply.

 

Thus far, I see arguments for incorporating CPE version information explicitly into the name string.  I also see arguments for keeping the “system” portion consistent as “cpe:”.  So the trend seems to be towards the syntax illustrated in this example:

 

FS:  cpe:2.3:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*

The benefit here is that, should an eventual CPE 3.x choose to retain a formatted string binding, we will have established a reasonable precedent for distinguishing different flavors of name strings.
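
To illustrate how a name consumer might pick its parsing rules under this syntax, here is a minimal sketch. The function name, the return values, and the major.minor version pattern are assumptions made for the example; none of them come from the draft specification.

```python
import re

# Assumed shape of the version field, e.g. "2.3".
VERSION_RE = re.compile(r"\d+\.\d+")

def classify_cpe_name(name):
    """Return (binding, version) for a CPE name string (hypothetical helper).

    "cpe:/..."      -> ("uri", None)               v2.2-style URI binding
    "cpe:2.3:..."   -> ("formatted-string", "2.3") versioned FS binding
    """
    if name.startswith("cpe:/"):
        return ("uri", None)
    parts = name.split(":", 2)
    if len(parts) == 3 and parts[0] == "cpe" and VERSION_RE.fullmatch(parts[1]):
        return ("formatted-string", parts[1])
    raise ValueError(f"unrecognized CPE name: {name!r}")

uri_kind = classify_cpe_name("cpe:/a:microsoft:internet_explorer:8.0.6001:beta")
fs_kind = classify_cpe_name(
    "cpe:2.3:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*"
)
print(uri_kind, fs_kind)
```

The decision is made from the first two fields alone, before any component is parsed.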

 

/Brant

 

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352

 

From: Thomas Jones [mailto:[hidden email]]
Sent: Friday, November 12, 2010 9:10 AM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)

 

 

On Fri, Nov 12, 2010 at 6:34 AM, WOLFKIEL, JOSEPH L CIV DISA PEO-MA <[hidden email]> wrote:

Classification:  UNCLASSIFIED
Caveats: NONE

I'm trying to picture how what you're advocating would work.  For example in the NVD CPE dictionary (currently 8.3MB and not even close to complete), we'd need to introduce dedicated namespaces for each cpe "type" currently in use.  Arguably, the types include CPE 2.2, which doesn't support tildes, 2.3 URI-type, which does (and may also support wildcards), and the 2.3 structured string with escaping.  Each would get its own REGEX definition in a small schema and its own namespace, then you'd look at a prefix in the XML tag to figure out how to parse it.

Sample namespaces follow:

xmlns:cpe22="http://scap.nist.gov/schema/cpe-uri/2.2"
xmlns:cpe23u="http://scap.nist.gov/schema/cpe-uri/2.3"
xmlns:cpe23s="http://scap.nist.gov/schema/cpe-formatted_string/2.3"

Example below showing how a 2.2 name would be deprecated to a 2.3 URI-formatted name.

<cpe-item deprecated="true" deprecation_date="2009-03-05T12:19:19.047-05:00" cpe22:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe23u:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~">
 <title xml:lang="en-US">Adobe Flash Player 9.0.115.0</title>
 <meta:item-metadata deprecated-by-nvd-id="99110" modification-date="2009-03-05T12:19:19.047-05:00" status="DRAFT" nvd-id="93434" />
</cpe-item>

If this is what you're advocating, then it looks like the additional overhead is pretty much the same.  It just gets tacked onto the tag instead of the name.

Joseph L. Wolfkiel
Engineering Group Lead
DISA PEO MA/IA52
(703) 882-0772
[hidden email]

 

Namespaces are associated with a scope. Unqualified children (that is, elements written simply as "cpe-item" rather than "cpe:cpe-item") take on the default namespace of the parent unless they are explicitly given another namespace prefix. So a single default declaration ("xmlns=...") within the root element applies immediately to all unprefixed children of that element, while a prefixed declaration ("xmlns:cpe=...") is in scope for every descendant that uses the prefix. Either way, every cpe-item can be qualified without extra effort from the dictionary author. All XML parsers should be namespace aware.

 

Backend adoption of a new version of the standard also becomes much easier. No conditional statements are required except for a single function that dispatches processing based on namespace. Otherwise, as the standard develops, there will be more and more version-specific functions, e.g. parse-cpe-item22, parse-cpe-item23, parse-cpe-item23a. Modular code reuse can further reduce the development overhead of source code that processes CPE data.

 

Furthermore, namespace utilization provides the ability for the CPE standard to provide a more precise coverage of future changes:

<cpe-item deprecated="true" deprecation_date="2009-03-05T12:19:19.047-05:00" name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe23:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe24:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~" cpe22:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~" cpe27a:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~">

 

Can you imagine the conglomeration of code required to present a deprecation history for the above item in the current standard?

 

Another advantage is that a CPE dictionary can be compiled from content produced around the globe with different namespace prefixes but common namespace URIs, e.g. cpe-something=http://www.nist.gov and cpe-new=http://www.nist.gov. XML parsers will see both of these as equivalent. This eases the burden on authors with regional or skill-level constraints: they simply have to ensure that the namespace URI for a given version is correct; the prefix does not matter at all.

 

-----Original Message-----
From: Thomas Jones [mailto:[hidden email]]
Sent: Friday, November 12, 2010 5:54 AM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)



On Thu, Nov 11, 2010 at 12:02 PM, WOLFKIEL, JOSEPH L CIV DISA PEO-MA <[hidden email]> wrote:


       Classification:  UNCLASSIFIED
       Caveats: NONE

       1.  There does need to be some way to tell a 2.3 CPE from a pre-2.3 or 2.3 URI-bound CPE.  I think, implementation-wise, that trying to use namespaces will make documents such as the NIST CPE Dictionary much too big.  As I play with parsing routines to deal with 2.2 and 2.3 CPEs, it looks like the business logic will be different, so you need to know early in the parsing process what you're dealing with.  However, unless we really do get free from the formatted string/URI-type CPE and go with attribute-based names, we may ultimately need to be able to differentiate between 2.2, 2.3, and later versions of CPEs, so I suppose I would support building the version number into the CPE prefix.   Seems like separating the system and system version with a well-understood separator would also be a good idea.  Maybe system:system_version:datafield:datafield:datafield...?  So you'd have cpe:2.3:a:vendor:product...



Versioning is important: Different versions of a language may specify different application semantics. In practice, there are two general ways to do versioning in XML languages in a given document. The first is to mimic XML itself and use a version attribute on root or arbitrary elements, and the other is to provide a more rich mechanism with links to specify the previous versions. This rich approach is exemplified by the mechanism provided by OWL ontologies to specify prior versions (using the priorVersion predicate) and can specify backwards compatibility, incompatibility, and deprecated classes and properties as well. Note that in OWL, "if owl:backwardCompatibleWith is not declared, then compatibility should not be assumed" [OWL Guide] <http://www.ibiblio.org/hhalpin/homepage/notes/xvspaper.html#owlguide> . While RDDL provides a prior-version purpose, it does not let one specify versions in detail. For example, the nature URI for XML Schema (http://www.w3.org/2001/XMLSchema) does not distinguish if version 1.0 or 1.1 of XML Schema is being used. In fact, neither does the namespace document of XML Schema, as it has as related resources only 1.0 2nd Edition normative references.

The approach of using the value of the version attribute in the root element can become problematic. What about the case in which one wants to use names from two versions of a language that use the same namespace? Should one qualify both elements with differing version attributes? One could specify that every version has its own URI, but this is often not the case, and often minor revisions may want to use the same namespace, and only use namespaces for major revisions. In a case that has attracted attention in the Web Services community, applications may want to revert to a previous version of a language if they do not have a relevant schema or other resource to process the newest version of the language, even if the document specifies that the processor should use the latest version, in order to "scrape some information out".

One would hope you can just put the version number in the URI, perhaps by writing the year of the specification in the namespace URI. This is done by the W3C in the namespace of XHTML: http://www.w3.org/1999/xhtml. Regardless, the approach of trying to throw all the relevant versioning information in the URI does not solve the problem cleanly. This approach violates the rule of URI Opacity given by the W3C TAG: "Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource". The problem of figuring out more information about a language can be solved more easily by letting the URI dereference a namespace document that provides such information.

This statement from the W3C shows that CPE is currently in non-compliance with international standards regarding the structure of the CPE URI grammar. Much of the current implementation does in fact infer properties regarding the CPE standard.

The minimalist reading of namespaces states that anyone can mint a new name by just adding a local name in a namespace in a document they produce. The power of defining the "names in a namespace" is in the hands of the user, not the owner of the namespace URI. Against the intuitions of many people unfamiliar with XML, any namespace sets absolutely no constraints on the number and kinds of names in a language. XML parsers do not attempt to retrieve anything at all from a namespace URI. The number of distinct local names that may be attached to a single namespace is infinite. This is the reading sanctioned by the XML Namespaces specification. As noted by Henry Thompson, "The minimalist reading is the only one consistent with actual usage -- people mint new namespaces by simply using them in an expanded name or namespace declaration, without thereby incurring any obligation to define the boundaries of some set". While there has been plenty of vigorous debate about namespaces, the minimalist interpretation is widespread precisely because there is no alternative. This interpretation does not manage the versioning of XML languages or take advantage of the use of namespace documents to check the correctness of the name use in a document.

A maximalist reading of namespaces would state that there is some finite number of names in a language and local names in a namespace with some standard usage. The number of names in a namespace is defined by the owner of the namespace URI as opposed to any user. Furthermore, a true maximalist would prefer that each expanded name in a namespace expand to a unique URI that denotes a secondary resource using some construction rule. Since a URI has a distinct owner, the owner would be the final arbitrator of the language, and as such any non-standard usage of the language, such as minting a new expanded name that wasn't given in the namespace document (or was a valid secondary resource of the namespace URI) would be wrong. While this is obviously very restrictive, it is more or less how names of core constructs work in programming languages. However, this is too stringent and incompatible with most existing work.

A balance can be struck between the maximalist and minimalist readings, creating a pragmatic reading of namespaces that gives the owner of XML languages a way of expressing more information about their language in the namespace document. This would give the user more options by allowing them to discover exactly how the owner of the language wants the language to be used. The owner should choose if they prefer a maximalist, minimalist, or some moderate reading of their namespace. The user does not have to be compelled to follow the owner's guidelines, but can at least be aware of them if they so wish. So namespace documents should state whether or not the space of possible names is delimited and whether or not every name in the namespace has a unique expanded name that maps to a URI. A namespace document should state the version of a language, and keep track of version changes over time. For a particular name, it should state what version ranges it can be used in, and attach a human readable description to each name. This would give the user some advantages in exchange for letting them restrict themselves. For example, instead of having to worry about documenting their usage of particular names, by sticking to the namespace document given by the owner of the namespace URI, a user could pass around XML documents to other applications and know that if those applications were not sure how to process a given name, the application could get a namespace document that would tell them how.

For a single common style:
<element xmlns:cpe="http://cpe.nist.gov/namespaces/2010/11">
<cpe:child>content</cpe:child>
...
</element>

For greater than one style:
<element xmlns:cpe1="http://cpe.nist.gov/namespaces/2010/11" xmlns:cpe2="http://cpe.nist.gov/namespaces/2011/2" xmlns:cpe21="http://cpe.nist.gov/namespaces/2011/rc1">
<cpe2:child>content</cpe2:child>
<cpe1:child>content</cpe1:child>
<cpe21:child>content</cpe21:child>
...
</element>

Each namespace declaration is presented once only, and used throughout the document object. Also, as an added benefit, the namespace use allows for the CPE content to be processed within other XML standards and provides for immediate recognition and removal of element collisions (even the inclusion of CPE within another XML language as an attribute may be namespace aware).



       2.  I agree with the ISO 646 choice.  To be postured to go international, we'd want to be able to support a more international character set, but then the rules for legal/encoded characters would get correspondingly more complex.  It'd be nice to put that off to CPE 3.x.

       3.  I like having the wildcard characters.  Not sure it really matters which ones we use.  I also note that many of the CPEs in the current NVD dictionary are abstractions and could really use the "*"-equivalent wildcard symbol at the end of the version number to allow people/tools to understand that they are really intended to cover all releases/builds of software with those first version fields.

       - Joe Wolfkiel


       On Wed, Nov 10, 2010 at 6:36 PM, Cheikes, Brant A. <[hidden email]> wrote:


              CPE Community,



              We're preparing final drafts of the CPE v2.3 specification suite.  We expect them to become part of SCAP 1.2.



              In the Naming specification, there are three decisions I would appreciate quick feedback on.



              1. The Naming spec supports both a v2.2-style URI binding and a new "formatted string" (FS) binding.  We need a syntactic way to easily distinguish the two.  At present, the spec has the FS binding prefixed with "cpe23:", but also drops the URI slash, as in:



              cpe23:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*



              It seems like the presence or absence of the URI slash could be sufficient to distinguish the two forms, so I'm considering using "cpe:" as the prefix for both forms, e.g.,



              URI: cpe:/a:microsoft:internet_explorer:8.0.6001:beta

              FS:  cpe:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*



              The advantage of the proposal is that the two naming forms are (subtly but reliably) distinguishable, and have a consistent prefix which avoids an ugly version-specific precedent.  I don't really see any disadvantages, but I thought I'd put the question to the community.



              2. The draft Naming spec requires that "Value strings assigned to attributes of WFNs SHALL be non-empty contiguous strings of bytes encoded using the American Standard Code for Information Interchange (US-ASCII, also known as ANSI_X3.4-1968)."  The v2.2 specification was silent on the subject of character sets.  We have an opportunity to be clear and precise as we go forward with 2.3 and beyond.  At the very least, we ought to use a more up-to-date reference, such as ISO/IEC 646 ("ISO 7-bit coded character set for information interchange").  That's probably the easiest thing to do at this point.  Is there any reason we should consider something broader, e.g., ISO/IEC 8859-1 (Latin-1)?  That would admit a wider but still limited range of product/vendor/etc. names expressed in languages other than English.  But it raises the risk of starting us down an I18N path that we haven't explored at all well within CPE to date.  At this point I'm leaning towards the conservative approach (ISO 646), but am open to arguments for alternatives.
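
A minimal sketch of the character-set check implied by the quoted requirement, written in Python. The function name `is_valid_value_string` is hypothetical and does not come from any CPE specification; the range 0x21 to 0x7E is an assumption that combines the 7-bit restriction with the prohibition on whitespace and non-printing characters mentioned elsewhere in this message.

```python
def is_valid_value_string(value: str) -> bool:
    """Return True if value is a non-empty string of printable 7-bit characters."""
    if not value:
        return False
    # 0x21..0x7E are the printable, non-space characters of US-ASCII / ISO 646.
    return all(0x21 <= ord(ch) <= 0x7E for ch in value)


print(is_valid_value_string("internet_explorer"))  # True
print(is_valid_value_string("caf\u00e9"))          # False: e-acute is outside 7-bit range
print(is_valid_value_string(""))                   # False: empty string
```

Widening the set to Latin-1 would mean raising the upper bound and then deciding how the extra characters are encoded in the URI binding, which is the I18N question the message prefers to defer.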



              3. A feature of the v2.3 approach is that we've added support for single- and multi-character wildcards.  In the well-formed name and in the FS binding, the "?" is the single-character wildcard, and the "*" is the multi-character wildcard.  The specification allows these characters to be escaped with a backslash when they are to be interpreted as regular characters.  The FS binding allows these characters to appear directly, with or without escaping.  The problem is that in the URI form, these characters are required to be percent encoded, so there's no obvious way to distinguish whether they are "escaped" or "unescaped".
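
To make the escaped/unescaped distinction concrete, here is a small Python sketch that scans one name component and reports the wildcards that are not preceded by a backslash. The function name `unescaped_wildcards` is hypothetical, and the scanning rule (a backslash escapes exactly the next character) is an assumption based on the description above, not text from the specification.

```python
def unescaped_wildcards(component: str) -> list:
    """Return (index, char) for each '?' or '*' that is not backslash-escaped."""
    found = []
    i = 0
    while i < len(component):
        ch = component[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch in "?*":
            found.append((i, ch))
        i += 1
    return found


print(unescaped_wildcards("8.*"))         # [(2, '*')]
print(unescaped_wildcards(r"big\*star"))  # []
```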



              In the spec as currently written, we simply lose the distinction in the URI: unescaped wildcards are dropped.  The argument is that the URI exists only for backward compatibility with v2.2, which doesn't support wildcards in matching.  However, there's a community of CPE users which has invested in code to process URIs, but would still like the ability to express wildcards in the URI format, even if that requires a proprietary matching algorithm.



              There seems to be a technical solution in the offing that would help these users, and would also eliminate the "lossiness" when binding names that contain wildcards to URIs.  The proposed solution is to map the wildcards to otherwise-prohibited percent-encoding forms.  Recall that, in general, all characters in CPE names must be printable alphanumeric.  Whitespace and all non-printing characters are prohibited, and almost all punctuation/special characters must be percent-encoded when embedded in CPE name components.



              We could modify the spec to permit two otherwise prohibited percent-encoding forms to be used in URIs, and to represent the unescaped single- and multi-character wildcards.  For example, we might reserve, say, "%09" (horizontal tab) and "%0b" (vertical tab), as the single- and multi-character wildcards, respectively, in URI forms.  The choice is relatively arbitrary.  If we did this, we could allow CPE names such as:



              cpe:/a:big%2astar:foobar:8.%0b



              This names a product from vendor "big*star" (where an asterisk x2A is embedded in the name), with product name "foobar", and whose version is "8.*", but here the asterisk actually represents the multi-character wildcard.



              The advantage of this technique is that we can now fully translate URIs to formatted strings and vice versa, with no loss in meaning or capability.  Of course, only CPE v2.3 conformant implementations would be able to fully consume and process these forms; CPE v2.2 conformant implementations would be unaffected.  The disadvantage is that there is a potential new requirement to special-case the percent-decoding of the two re-purposed forms.  Thoughts?
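
The special-casing mentioned as the disadvantage could look roughly like the following Python sketch. It assumes the reservation floated above ("%09" for the single-character wildcard, "%0b" for the multi-character wildcard); the function name `uri_component_to_fs` and the choice to escape only "?", "*" and backslash in the output are illustrative assumptions, not the behavior of any published CPE library.

```python
import re

# Assumed reservation from the proposal: these two forms stand for unescaped wildcards.
WILDCARD_FORMS = {"%09": "?", "%0b": "*"}


def uri_component_to_fs(component: str) -> str:
    """Percent-decode one URI component, keeping wildcards distinct from literals."""
    def decode(match):
        form = match.group(0).lower()
        if form in WILDCARD_FORMS:
            return WILDCARD_FORMS[form]  # reserved form: emit a bare wildcard
        ch = chr(int(form[1:], 16))
        # A literal "?", "*" or backslash is escaped so it is not read as a wildcard.
        return "\\" + ch if ch in "?*\\" else ch

    return re.sub(r"%[0-9A-Fa-f]{2}", decode, component)


print(uri_component_to_fs("big%2astar"))  # big\*star
print(uri_component_to_fs("8.%0b"))       # 8.*
```

Applied to the example name, the vendor component keeps a literal asterisk while the version component ends in a true wildcard, so the round trip between the two bindings loses nothing.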



              Thanks,

              /Brant



              Brant A. Cheikes
              The MITRE Corporation
              202 Burlington Road, M/S K302
              Bedford, MA 01730-1420
              Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352






       --
       Joe Wolfkiel










       Classification:  UNCLASSIFIED
       Caveats: NONE




 



Re: Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)

Thomas Jones
The intent of my proposal is that, by utilizing XML namespaces, the CPE standard can simply use a common system name, such as cpe://, and the structure (and subsequently the parsing and interpretation rules) of the other parts is determined by the namespace declaration for that object.

Namespaces are the key that determines how to process an element's or attribute's content.
 

On Nov 12, 2010, at 9:16 AM, "Cheikes, Brant A." <[hidden email]> wrote:

The discussion about namespaces is interesting, but strikes me as off-topic given my original question.  My question related to a decision about the syntax of the CPE 2.3 formatted-string name.  The question deliberately made no assumptions about contexts in which such a name might appear.  The objective is to design the name syntax such that a name consumer can easily figure out what parsing and interpretation rules apply.

 

Thus far, I see arguments for incorporating CPE version information explicitly into the name string.  I also see arguments for keeping the “system” portion consistent as “cpe:”.  So the trend seems to be towards the syntax illustrated in this example:

 

FS:  cpe:2.3:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*

The benefit here is that, should an eventual CPE 3.x choose to retain a formatted string binding, we will have established a reasonable precedent for distinguishing different flavors of name strings.
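
A short Python sketch of how a name consumer might pick its parsing rules from the prefix alone under this syntax. The function name `classify_cpe_name` and the returned labels are hypothetical; the only facts taken from the thread are the "cpe:/" URI prefix and the "cpe:<version>:" formatted-string prefix.

```python
import re
from typing import Optional, Tuple

# A formatted string carries its version right after the "cpe:" system name.
_FS_PREFIX = re.compile(r"^cpe:(\d+(?:\.\d+)*):")


def classify_cpe_name(name: str) -> Tuple[str, Optional[str]]:
    """Return (binding, version) so the caller can choose its parsing rules."""
    if name.startswith("cpe:/"):
        return ("uri", None)  # v2.2-style URI binding, no version in the name
    match = _FS_PREFIX.match(name)
    if match:
        return ("formatted-string", match.group(1))
    return ("unknown", None)


print(classify_cpe_name("cpe:/a:microsoft:internet_explorer:8.0.6001:beta"))
# ('uri', None)
print(classify_cpe_name("cpe:2.3:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*"))
# ('formatted-string', '2.3')
```

Because the version is captured rather than hard-coded, a later "cpe:3.0:" name would be recognized by the same function without changes.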

 

/Brant

 

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352

 

From: Thomas Jones [mailto:[hidden email]]
Sent: Friday, November 12, 2010 9:10 AM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)

 

 

On Fri, Nov 12, 2010 at 6:34 AM, WOLFKIEL, JOSEPH L CIV DISA PEO-MA <[hidden email]> wrote:

Classification:  UNCLASSIFIED
Caveats: NONE

I'm trying to picture how what you're advocating would work.  For example in the NVD CPE dictionary (currently 8.3MB and not even close to complete), we'd need to introduce dedicated namespaces for each cpe "type" currently in use.  Arguably, the types include CPE 2.2, which doesn't support tildes, 2.3 URI-type, which does (and may also support wildcards), and the 2.3 structured string with escaping.  Each would get its own REGEX definition in a small schema and its own namespace, then you'd look at a prefix in the XML tag to figure out how to parse it.

Sample namespaces follow:

xmlns:cpe22="http://scap.nist.gov/schema/cpe-uri/2.2"
xmlns:cpe23u="http://scap.nist.gov/schema/cpe-uri/2.3"
xmlns:cpe23s="http://scap.nist.gov/schema/cpe-formatted_string/2.3"

Example below showing how a 2.2 name would be deprecated to a 2.3 URI-formatted name.

<cpe-item deprecated="true" deprecation_date="2009-03-05T12:19:19.047-05:00" cpe22:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe23u:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~">
 <title xml:lang="en-US">Adobe Flash Player 9.0.115.0</title>
 <meta:item-metadata deprecated-by-nvd-id="99110" modification-date="2009-03-05T12:19:19.047-05:00" status="DRAFT" nvd-id="93434" />
</cpe-item>

If this is what you're advocating, then it looks like the additional overhead is pretty much the same.  It just gets tacked onto the tag instead of the name.

Joseph L. Wolfkiel
Engineering Group Lead
DISA PEO MA/IA52
(703) 882-0772
[hidden email]

 

Namespaces are associated with a scope. All unqualified children (that is, those written without a prefix: simply "cpe-item" rather than "cpe:cpe-item") assume the default namespace of the parent unless they are explicitly given another namespace prefix. So a single default namespace declaration ("xmlns=...") within the root element immediately applies to all unprefixed children of that element. Every cpe-item is then qualified without any extra effort from the dictionary author. All XML parsers should be namespace aware.
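
A small sketch of the scoping rule, using Python's standard `xml.etree.ElementTree`. Under the XML Namespaces recommendation, an unprefixed child inherits a default declaration (`xmlns="..."`) from its ancestors, whereas a prefixed declaration (`xmlns:cpe="..."`) only qualifies names that carry the prefix. The element names `cpe-list` and `cpe-item` are used purely for illustration, and the namespace URI is one of the sample namespaces from the quoted message.

```python
import xml.etree.ElementTree as ET

NS = "http://scap.nist.gov/schema/cpe-uri/2.2"  # sample namespace from the quoted message

# Default declaration on the root: the unprefixed child is in the same namespace.
default_doc = f'<cpe-list xmlns="{NS}"><cpe-item/></cpe-list>'
# Prefixed declaration on the root: the unprefixed child is in no namespace.
prefixed_doc = f'<cpe:cpe-list xmlns:cpe="{NS}"><cpe-item/></cpe:cpe-list>'

print(ET.fromstring(default_doc)[0].tag)   # {http://scap.nist.gov/schema/cpe-uri/2.2}cpe-item
print(ET.fromstring(prefixed_doc)[0].tag)  # cpe-item
```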

 

Adoption of a new version on the back end is much easier to introduce into the standard. No conditional statements are required except for a single function that directs process control based on namespace. Otherwise, as the standard develops, there will be more and more version-specific functions, e.g. parse-cpe-item22, parse-cpe-item23, parse-cpe-item23a. Modular code reuse can be utilized to further reduce development overhead for source code that processes CPE data.
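
One way to picture the single dispatching function is the Python sketch below. The handler names, the `PARSERS` table and the `parse_names` function are all hypothetical; the namespace URIs are the sample ones from the quoted message, and the "parsing" is reduced to splitting on colons so the dispatch itself stays visible.

```python
import xml.etree.ElementTree as ET

CPE22 = "http://scap.nist.gov/schema/cpe-uri/2.2"
CPE23U = "http://scap.nist.gov/schema/cpe-uri/2.3"


def parse_uri_22(value):
    # Placeholder for the 2.2 rules: strip "cpe:/" and split the components.
    return ("2.2", value[len("cpe:/"):].split(":"))


def parse_uri_23(value):
    # Placeholder for the 2.3 URI rules (tilde-packed fields left untouched here).
    return ("2.3-uri", value[len("cpe:/"):].split(":"))


# The one place where version-specific behavior is selected.
PARSERS = {CPE22: parse_uri_22, CPE23U: parse_uri_23}


def parse_names(item):
    """Parse every namespaced 'name' attribute with the handler for its namespace."""
    results = []
    for qname, value in item.attrib.items():
        if not qname.startswith("{"):
            continue  # attribute in no namespace: nothing to dispatch on
        ns, local = qname[1:].split("}", 1)
        handler = PARSERS.get(ns)
        if handler is not None and local == "name":
            results.append(handler(value))
    return results


doc = (
    f'<cpe-item xmlns:cpe22="{CPE22}" xmlns:cpe23u="{CPE23U}" '
    'cpe22:name="cpe:/a:adobe:flash_player:9.0.115.0" '
    'cpe23u:name="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~"/>'
)
for version, parts in parse_names(ET.fromstring(doc)):
    print(version, parts)
```

Supporting another version then means adding one entry to the table rather than another branch in the calling code.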

 

Furthermore, namespace utilization provides the ability for the CPE standard to provide a more precise coverage of future changes:

<cpe-item deprecated="true" deprecation_date="2009-03-05T12:19:19.047-05:00" name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe23:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" cpe24:name="cpe:/a:adobe:flash_playe_for_linux:9.0.115.0" deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~" cpe22:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~" cpe27a:deprecated_by="cpe:/a:adobe:flash_player:9.0.115.0::~~~~linux~">

 

Can you imagine the conglomeration of code required to present a deprecation history for the above item in the current standard?

 

Another advantage is that a CPE dictionary can be compiled from content from around the globe with different namespace prefixes but common namespace URIs, e.g. cpe-something=http://www.nist.gov and cpe-new=http://www.nist.gov. XML parsers will see both of these as equivalent. This relieves authors of regional or skill-level concerns. They simply have to ensure that the namespace URI for a given version is correct; the prefix does not matter at all.
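
The prefix-independence claim can be checked with a few lines of Python and the standard `xml.etree.ElementTree` parser. The wrapper element `list` and the child name `cpe-item` are illustrative; the two prefixes and the URI are the ones used in the paragraph above.

```python
import xml.etree.ElementTree as ET

doc = """<list xmlns:cpe-something="http://www.nist.gov"
      xmlns:cpe-new="http://www.nist.gov">
  <cpe-something:cpe-item/>
  <cpe-new:cpe-item/>
</list>"""

root = ET.fromstring(doc)
tags = [child.tag for child in root]

# Both children resolve to the same expanded name; the prefixes are gone.
print(tags[0])             # {http://www.nist.gov}cpe-item
print(tags[0] == tags[1])  # True
```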

 

-----Original Message-----
From: Thomas Jones [mailto:[hidden email]]
Sent: Friday, November 12, 2010 5:54 AM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] Questions re finalizing CPE v2.3 Naming specification (UNCLASSIFIED)



On Thu, Nov 11, 2010 at 12:02 PM, WOLFKIEL, JOSEPH L CIV DISA PEO-MA <[hidden email]> wrote:


       Classification:  UNCLASSIFIED
       Caveats: NONE

       1.  There does need to be some way to tell a 2.3 CPE from a pre-2.3 or 2.3 URI-bound CPE.  I think, implementation-wise, that trying to use namespaces will make documents, such as the NIST CPE Dictionary, much too big.  As I play with parsing routines to deal with 2.2 and 2.3 CPEs, it looks like the business logic will be different, so you need to know early in the parsing process what you're dealing with.  However, unless we really do get free from the formatted string/URI-type CPE and go with attribute-based names, we may ultimately need to be able to differentiate between 2.2, 2.3, and later versions of CPEs, so I suppose I would support building the version number into the CPE prefix.   Seems like separating the system and system version with a well-understood separator would also be a good idea.  Maybe system:system_version:datafield:datafield:datafield...?  So you'd have cpe:2.3:a:vendor:product...



Versioning is important: Different versions of a language may specify different application semantics. In practice, there are two general ways to do versioning in XML languages in a given document. The first is to mimic XML itself and use a version attribute on root or arbitrary elements, and the other is to provide a more rich mechanism with links to specify the previous versions. This rich approach is exemplified by the mechanism provided by OWL ontologies to specify prior versions (using the priorVersion predicate) and can specify backwards compatibility, incompatibility, and deprecated classes and properties as well. Note that in OWL, "if owl:backwardCompatibleWith is not declared, then compatibility should not be assumed" [OWL Guide] <http://www.ibiblio.org/hhalpin/homepage/notes/xvspaper.html#owlguide> . While RDDL provides a prior-version purpose, it does not let one specify versions in detail. For example, the nature URI for XML Schema (http://www.w3.org/2001/XMLSchema) does not distinguish if version 1.0 or 1.1 of XML Schema is being used. In fact, neither does the namespace document of XML Schema, as it has as related resources only 1.0 2nd Edition normative references.

The approach of using the value of the version attribute in the root element can become problematic. What about the case in which one wants to use names from two versions of a language that use the same namespace? Should one qualify both elements with differing version attributes? One could specify that every version has its own URI, but this is often not the case, and often minor revisions may want to use the same namespace, and only use namespaces for major revisions. In a case that has attracted attention in the Web Services community, applications may want to revert to a previous version of a language if they do not have a relevant schema or other resource to process the newest version of the language, even if the document specifies that the processor should use the latest version, in order to "scrape some information out".

One would hope you can just put the version number in the URI, perhaps by writing the year of the specification in the namespace URI. This is done by the W3C in the namespace of XHTML: http://www.w3.org/1999/xhtml. Regardless, the approach of trying to throw all the relevant versioning information in the URI does not solve the problem cleanly. This approach violates the rule of URI Opacity given by the W3C TAG: "Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource". The problem of figuring out more information about a language can be solved more easily by letting the URI dereference a namespace document that provides such information.

This statement from the W3C shows that CPE is currently in non-compliance with international standards regarding the structure of the CPE URI grammar. Much of the current implementation does in fact infer properties regarding the CPE standard.

The minimalist reading of namespaces states that anyone can mint a new name by just adding a local name in a namespace in a document they produce. The power of defining the "names in a namespace" is in the hands of the user, not the owner of the namespace URI. Against the intuitions of many people unfamiliar with XML, any namespace sets absolutely no constraints on the number and kinds of names in a language. XML parsers do not attempt to retrieve anything at all from a namespace URI. The number of distinct local names that may be attached to a single namespace is infinite. This is the reading sanctioned by the XML Namespaces specification. As noted by Henry Thompson, "The minimalist reading is the only one consistent with actual usage -- people mint new namespaces by simply using them in an expanded name or namespace declaration, without thereby incurring any obligation to define the boundaries of some set". While there has been plenty of vigorous debate about namespaces, the minimalist interpretation is widespread precisely because there is no alternative. This interpretation does not manage the versioning of XML languages or take advantage of the use of namespace documents to check the correctness of the name use in a document.

A maximalist reading of namespaces would state that there is some finite number of names in a language and local names in a namespace with some standard usage. The number of names in a namespace is defined by the owner of the namespace URI as opposed to any user. Furthermore, a true maximalist would prefer that each expanded name in a namespace expand to a unique URI that denotes a secondary resource using some construction rule. Since a URI has a distinct owner, the owner would be the final arbiter of the language, and as such any non-standard usage of the language, such as minting a new expanded name that wasn't given in the namespace document (or wasn't a valid secondary resource of the namespace URI), would be wrong. While this is obviously very restrictive, it is more or less how names of core constructs work in programming languages. However, this is too stringent and incompatible with most existing work.

A balance can be struck between the maximalist and minimalist readings, creating a pragmatic reading of namespaces that gives the owner of XML languages a way of expressing more information about their language in the namespace document. This would give the user more options by allowing them to discover exactly how the owner of the language wants the language to be used. The owner should choose if they prefer a maximalist, minimalist, or some moderate reading of their namespace. The user does not have to be compelled to follow the owner's guidelines, but can at least be aware of them if they so wish. So namespace documents should state whether or not the space of possible names is delimited and whether or not every name in the namespace has a unique expanded name that maps to a URI. A namespace document should state the version of a language, and keep track of version changes over time. For a particular name, it should state what version ranges it can be used in, and attach a human readable description to each name. This would give the user some advantages in exchange for letting them restrict themselves. For example, instead of having to worry about documenting their usage of particular names, by sticking to the namespace document given by the owner of the namespace URI, a user could pass around XML documents to other applications and know that if those applications were not sure how to process a given name, the application could get a namespace document that would tell them how.

For a single common style:
<element xmlns:cpe="http://cpe.nist.gov/namespaces/2010/11">
<cpe:child>content</cpe:child>
...
</element>

For more than one style:
<element xmlns:cpe1="http://cpe.nist.gov/namespaces/2010/11" xmlns:cpe2="http://cpe.nist.gov/namespaces/2011/2" xmlns:cpe21="http://cpe.nist.gov/namespaces/2011/rc1">
<cpe2:child>content</cpe2:child>
<cpe1:child>content</cpe1:child>
<cpe21:child>content</cpe21:child>
...
</element>

Each namespace declaration appears only once and is used throughout the document object. As an added benefit, namespace use allows CPE content to be processed within other XML standards and provides for immediate recognition and removal of element collisions (even the inclusion of CPE within another XML language as an attribute may be namespace aware).


