Questions re finalizing CPE v2.3 Naming specification


Brant Cheikes

CPE Community,

 

We’re preparing final drafts of the CPE v2.3 specification suite.  We expect them to become part of SCAP 1.2.

 

In the Naming specification, there are three decisions I would appreciate quick feedback on.

 

1. The Naming spec supports both a v2.2-style URI binding and a new “formatted string” (FS) binding.  We need a syntactic way to easily distinguish the two.  At present, the spec has the FS binding prefixed with “cpe23:”, but also drops the URI slash, as in:

 

cpe23:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*

 

It seems like the presence or absence of the URI slash could be sufficient to distinguish the two forms, so I’m considering using “cpe:” as the prefix for both forms, e.g.,

 

URI: cpe:/a:microsoft:internet_explorer:8.0.6001:beta

FS:  cpe:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*

 

The advantage of the proposal is that the two naming forms are (subtly but reliably) distinguishable, and have a consistent prefix which avoids an ugly version-specific precedent.  I don’t really see any disadvantages, but I thought I’d put the question to the community.
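A minimal sketch of how a parser might tell the two forms apart under the proposal above, assuming the shared "cpe:" prefix is adopted as described. The function name classify_binding is hypothetical and does not come from the spec.

```python
def classify_binding(name: str) -> str:
    """Return "uri" or "fs" for a name using the proposed shared "cpe:" prefix.

    Assumption: the only distinguishing feature is the slash that follows
    the prefix, as proposed in this message.
    """
    prefix = "cpe:"
    if not name.startswith(prefix):
        raise ValueError(f"not a CPE name: {name!r}")
    rest = name[len(prefix):]
    # URI binding keeps the slash after the prefix; FS binding drops it.
    return "uri" if rest.startswith("/") else "fs"


if __name__ == "__main__":
    print(classify_binding("cpe:/a:microsoft:internet_explorer:8.0.6001:beta"))
    print(classify_binding("cpe:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*"))
```

The check looks at a single character, which is what makes the distinction subtle for human readers but reliable for code.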

 

2. The draft Naming spec requires that “Value strings assigned to attributes of WFNs SHALL be non-empty contiguous strings of bytes encoded using the American Standard Code for Information Interchange (US-ASCII, also known as ANSI_X3.4-1968).”  The v2.2 specification was silent on the subject of character sets.  We have an opportunity to be clear and precise as we go forward with 2.3 and beyond.  At the very least, we ought to use a more up-to-date reference, such as ISO/IEC 646 (“ISO 7-bit coded character set for information interchange”).  That’s probably the easiest thing to do at this point.  Is there any reason we should consider something broader, e.g., ISO/IEC 8859-1 (Latin-1)?  That would admit a wider but still limited range of product/vendor/etc. names expressed in languages other than English.  But it raises the risk of starting us down an I18N path that we haven’t explored at all well within CPE to date.  At this point I’m leaning towards the conservative approach (ISO 646), but am open to arguments for alternatives.

 

3. A feature of the v2.3 approach is that we’ve added support for single- and multi-character wildcards.  In the well-formed name and in the FS binding, the “?” is the single-character wildcard, and the “*” is the multi-character wildcard.  The specification allows these characters to be escaped with a backslash when they are to be interpreted as regular characters.  The FS binding allows these characters to appear directly, with or without escaping.  The problem is that in the URI form, these characters are required to be percent encoded, so there’s no obvious way to distinguish whether they are “escaped” or “unescaped”.

 

In the spec as currently written, we simply lose the distinction in the URI: unescaped wildcards are dropped.  The argument is that the URI exists only for backward compatibility with v2.2, which doesn’t support wildcards in matching.  However, there’s a community of CPE users that has invested in code to process URIs but would still like the ability to express wildcards in the URI format, even if that requires a proprietary matching algorithm.

 

There seems to be a technical solution in the offing that would help these users, and would also eliminate the “lossiness” when binding names that contain wildcards to URIs.  The proposed solution is to map the wildcards to otherwise-prohibited percent-encoding forms.  Recall that, in general, all characters in CPE names must be printable alphanumeric.  Whitespace and all non-printing characters are prohibited, and almost all punctuation/special characters must be percent-encoded when embedded in CPE name components.

 

We could modify the spec to permit two otherwise prohibited percent-encoding forms to be used in URIs, and to represent the unescaped single- and multi-character wildcards.  For example, we might reserve, say, “%09” (horizontal tab) and “%0b” (vertical tab), as the single- and multi-character wildcards, respectively, in URI forms.  The choice is relatively arbitrary.  If we did this, we could allow CPE names such as:

 

cpe:/a:big%2astar:foobar:8.%0b

 

This names a product from vendor “big*star” (where an asterisk x2A is embedded in the name), with product name “foobar”, and whose version is “8.*”—but here the asterisk actually represents the multi-character wildcard.

 

The advantage of this technique is that we can now fully translate URIs to formatted strings and vice versa, with no loss in meaning or capability.  Of course, only CPE v2.3 conformant implementations would be able to fully consume and process these forms; CPE v2.2 conformant implementations would be unaffected.  The disadvantage is that there is a potential new requirement to special-case the percent-decoding of the two re-purposed forms.  Thoughts?
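To make the round trip concrete, here is a rough sketch of the proposed mapping for a single name component. It assumes the two reserved forms are “%09” and “%0b” (which the message itself calls an arbitrary choice), and it handles only percent-encoded forms and backslash escapes; all other characters pass through unchanged. The function names are hypothetical.

```python
import re

# Assumed reserved forms from the proposal: %09 is "?", %0b is "*".
URI_TO_WILDCARD = {"%09": "?", "%0b": "*"}
WILDCARD_TO_URI = {v: k for k, v in URI_TO_WILDCARD.items()}

_PCT = re.compile(r"%[0-9a-fA-F]{2}")


def uri_component_to_value(component: str) -> str:
    """Decode one URI component; literal specials come back backslash-escaped."""
    def repl(match: re.Match) -> str:
        form = match.group(0).lower()
        if form in URI_TO_WILDCARD:
            # Reserved form: an unescaped wildcard.
            return URI_TO_WILDCARD[form]
        # Any other percent-encoded form: an escaped literal character.
        return "\\" + chr(int(form[1:], 16))
    return _PCT.sub(repl, component)


def value_to_uri_component(value: str) -> str:
    """Encode one value string back into a URI component."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # Escaped character: ordinary percent-encoding.
            out.append(f"%{ord(value[i + 1]):02x}")
            i += 2
        elif ch in WILDCARD_TO_URI:
            # Unescaped wildcard: one of the reserved forms.
            out.append(WILDCARD_TO_URI[ch])
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for part in ("big%2astar", "foobar", "8.%0b"):
        value = uri_component_to_value(part)
        print(part, "->", value, "->", value_to_uri_component(value))
```

The special-casing mentioned as the disadvantage shows up as the dictionary lookup that runs before the ordinary percent-decoding.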

 

Thanks,

/Brant

 

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352

 



Re: Questions re finalizing CPE v2.3 Naming specification

Thomas Jones
Notes inline.


On Nov 10, 2010, at 5:36 PM, "Cheikes, Brant A." <[hidden email]> wrote:

CPE Community,

 

We’re preparing final drafts of the CPE v2.3 specification suite.  We expect them to become part of SCAP 1.2.

 

In the Naming specification, there are three decisions I would appreciate quick feedback on.

 

1. The Naming spec supports both a v2.2-style URI binding and a new “formatted string” (FS) binding.  We need a syntactic way to easily distinguish the two.  At present, the spec has the FS binding prefixed with “cpe23:”, but also drops the URI slash, as in:

 

cpe23:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*

 

It seems like the presence or absence of the URI slash could be sufficient to distinguish the two forms, so I’m considering using “cpe:” as the prefix for both forms, e.g.,

 

URI: cpe:/a:microsoft:internet_explorer:8.0.6001:beta

FS:  cpe:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*

 

The advantage of the proposal is that the two naming forms are (subtly but reliably) distinguishable, and have a consistent prefix which avoids an ugly version-specific precedent.  I don’t really see any disadvantages, but I thought I’d put the question to the community.

I suggest using XML namespaces, an existing technology. They would easily accommodate both forms as well as future versions, provide for version-specific validation within a single XML data object, and give CPE parsers the ability to quickly AND RELIABLY determine the CPE format structure.

 

2. The draft Naming spec requires that “Value strings assigned to attributes of WFNs SHALL be non-empty contiguous strings of bytes encoded using the American Standard Code for Information Interchange (US-ASCII, also known as ANSI_X3.4-1968).”  The v2.2 specification was silent on the subject of character sets.  We have an opportunity to be clear and precise as we go forward with 2.3 and beyond.  At the very least, we ought to use a more up-to-date reference, such as ISO/IEC 646 (“ISO 7-bit coded character set for information interchange”).  That’s probably the easiest thing to do at this point.  Is there any reason we should consider something broader, e.g., ISO/IEC 8859-1 (Latin-1)?  That would admit a wider but still limited range of product/vendor/etc. names expressed in languages other than English.  But it raises the risk of starting us down an I18N path that we haven’t explored at all well within CPE to date.  At this point I’m leaning towards the conservative approach (ISO 646), but am open to arguments for alternatives.

I would suggest full ISO 8859-1 support. The entire XML community has researched and developed around international contributions. Why not leverage all their great work for our own purposes? Any i18n issues would surely be noticed by the thousands of developers. And if a problem arises, the i18n community would determine and adopt a fix. We just need to follow the W3C standards.

Cheers. Thomas

 


 


Re: Questions re finalizing CPE v2.3 Naming specification

Brant Cheikes
In reply to this post by Brant Cheikes

After some more research on character sets, it turns out that ISO 646 wouldn’t be a good choice, since it leaves out a lot of important characters, including the tilde—which the new “packing” scheme depends on.  Instead, the choice would be to specify “printable UTF-8 characters between x00 and x7F”.
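As a rough illustration of that rule, a validity check for a value string might look like the sketch below. It assumes "printable" means code points x21 through x7E, excluding the space character because whitespace is prohibited in CPE names; the function name is hypothetical.

```python
def is_valid_value_string(value: str) -> bool:
    """Check that a value string is non-empty, printable, and 7-bit.

    Assumption: printable means 0x21..0x7E, i.e. space (0x20) and
    DEL (0x7F) are rejected along with all control characters.
    """
    return bool(value) and all(0x21 <= ord(ch) <= 0x7E for ch in value)


if __name__ == "__main__":
    for candidate in ("internet_explorer", "a~b", "", "caf\u00e9", "a b"):
        print(repr(candidate), is_valid_value_string(candidate))
```

The tilde (x7E) passes this check, which is the property the packing scheme needs.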

 

/Brant

 

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352

 

From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Wednesday, November 10, 2010 6:36 PM
To: cpe-discussion-list CPE Community Forum
Subject: [CPE-DISCUSSION-LIST] Questions re finalizing CPE v2.3 Naming specification

 
