CPE substitution table (UNCLASSIFIED)

WOLFKIEL, JOSEPH L CIV DISA PEO-MA
Classification:  UNCLASSIFIED
Caveats: NONE

All,

I spent some time this weekend working out how to convert application names found on a native OS into a CPE_2.2-compliant format so we can use the CPE RegEx to transmit them.  Putting vendor, product, and version info in the correct fields was relatively easy, but converting the component text to CPE-"legal" characters was something of a challenge.  As a result, I put together the attached lookup table to cover all ASCII characters.

It was somewhat challenging since I can't guarantee that non-printing characters won't show up in registry/RPM or equivalent entries.  I'm also not clear on what the percent-encoding should be for something like Japanese/Chinese characters.
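
On the question of non-ASCII characters, one widely used convention (the one RFC 3986 URIs follow) is to encode the text as UTF-8 and percent-encode each resulting byte. Nothing in this thread confirms that CPE 2.2 intends this, so the sketch below is only an illustration of that convention. The helper name `percent_encode_utf8` is made up for this example; `urllib.parse.quote` is the Python standard-library call.

```python
from urllib.parse import quote

def percent_encode_utf8(text):
    """Percent-encode the UTF-8 bytes of text, leaving only letters, digits and "_.-~" as-is."""
    # safe="" removes "/" from the default safe set, so it is encoded too.
    # Assumption: UTF-8 is the byte encoding; the thread does not settle this.
    return quote(text, safe="", encoding="utf-8")

# Each Japanese character becomes three %HH triplets (one per UTF-8 byte).
print(percent_encode_utf8("日本"))
```

Note that `quote` emits uppercase hex digits; whether a CPE name should use upper or lower case there is a separate decision.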

I also noted that there were several "N/A"s in the registries of my computer at home, so you would have to go through first and translate those all into "-" symbols before doing the character-by-character conversion.
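
A minimal sketch of that pre-pass, run on each field before any character-by-character conversion. The function name `normalize_missing` is hypothetical, and treating the match as case-insensitive and whitespace-tolerant is an assumption; only the literal "N/A" is mentioned above.

```python
def normalize_missing(value):
    """Return "-" when value is an "N/A" placeholder; otherwise return value unchanged."""
    # Assumption: ignore case and surrounding whitespace when matching "N/A".
    if value.strip().lower() == "n/a":
        return "-"
    return value

print(normalize_missing("N/A"))      # placeholder becomes "-"
print(normalize_missing("Firefox"))  # real names pass through untouched
```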

If anyone has gone through the effort of building a lookup table like the attached (or has another, better, method), I'm interested in feedback/corrections.

I'm working in Python, so the escaping for hexadecimal is \xHH, with the HH being the two-digit hex characters.
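
The attached table itself is not reproduced in this archive, so the following is only a sketch of what such a lookup table could look like, not the actual attachment. The rules are assumptions for illustration: letters are lowercased, digits and ".", "-", "~" pass through, space and "_" both become "_", control characters (0x00 to 0x1f and 0x7f) are dropped, and every other ASCII character becomes a lowercase `%HH` escape. The names `build_table`, `TABLE`, and `to_cpe_component` are hypothetical.

```python
import string

def build_table():
    """Map each ASCII character (0x00 to 0x7f) to its replacement string."""
    table = {}
    for code in range(0x80):
        ch = chr(code)
        if ch in string.ascii_letters or ch in string.digits:
            table[ch] = ch.lower()            # assumption: names are lowercased
        elif ch in ".-~":
            table[ch] = ch                    # kept as-is
        elif ch in " _":
            table[ch] = "_"                   # space and underscore share one symbol
        elif code < 0x20 or code == 0x7f:
            table[ch] = ""                    # non-printing characters are dropped
        else:
            table[ch] = "%{:02x}".format(code)  # everything else is percent-encoded
    return table

TABLE = build_table()

def to_cpe_component(text):
    """Convert text character by character; non-ASCII input raises KeyError in this sketch."""
    return "".join(TABLE[ch] for ch in text)

print(to_cpe_component("Adobe Reader 9.1"))
print(to_cpe_component("C++\x00Runtime"))  # \x00 is the \xHH escape for a NUL byte
```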


Joseph L. Wolfkiel
Engineering Group Lead
DISA PEO MA/IA52
(703) 882-0772
[hidden email]


Attachments: cpe subs table1.csv (1K), Percent Encoding Lookup Chart.docx (19K)

Re: CPE substitution table (UNCLASSIFIED)

Brant Cheikes
This is a timely message.  We're gearing up to finalize the v2.3 draft
specifications, and I've noted that the v2.3 naming specification does not
currently place any limits on the use of percent encoding in the URI
binding.  This permits, e.g., percent-encoding of non-printable characters.
My preliminary response was to enumerate explicitly all the allowed percent
encodings.  I rather like the comprehensive tabular form you've presented
here; it's complete and unambiguous.  I could see incorporating it directly
into the naming specification.

I hadn't considered all the characters beyond the tilde (x7e), and by
default would have mapped them all to, e.g., underscore (x5f).  Now that I
look at the complete table, I don't see a good reason to be so restrictive.
So unless I hear good counter-arguments, I'm inclined to permit percent
encoding of printable characters beyond the tilde.

What is the rationale for dropping (mapping to "") all the non-printables
(e.g., x00 thru x20, x7f, x81, etc.)?  Would it be better to, say, map them
to underscore rather than just dropping them?  This way we at least leave a
trace of those characters in the resulting name.
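
To make the two options concrete, here is a small Python sketch that applies either policy to the same input. The function name `strip_nonprinting` is hypothetical, and "non-printable" is assumed here to mean the ASCII control characters 0x00 to 0x1f plus 0x7f.

```python
def strip_nonprinting(text, replacement=""):
    """Replace each ASCII control character (0x00 to 0x1f, 0x7f) with replacement."""
    return "".join(
        replacement if (ord(ch) < 0x20 or ord(ch) == 0x7f) else ch
        for ch in text
    )

name = "Acme\x07Tool"                    # product name with an embedded BEL character
print(strip_nonprinting(name))          # drop policy: the character vanishes
print(strip_nonprinting(name, "_"))     # underscore policy: a trace remains
```

The trade-off is that the underscore policy makes the result indistinguishable from a name that originally contained a space or underscore at that position.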

Thanks,
/Brant

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352

-----Original Message-----
From: WOLFKIEL, JOSEPH L CIV DISA PEO-MA [mailto:[hidden email]]
Sent: Monday, October 25, 2010 7:24 AM
To: cpe-discussion-list CPE Community Forum
Subject: [CPE-DISCUSSION-LIST] CPE substitution table (UNCLASSIFIED)


Re: CPE substitution table (UNCLASSIFIED)

WOLFKIEL, JOSEPH L CIV DISA PEO-MA

The rationale for dropping the non-printables is pretty much normal security-related paranoia.  I can't think of any good reason for someone to put non-printing characters into a product name.  It seems safest to me to delete them out of hand.

I also think that, since we've already mapped " " and "_" to the underscore, mapping more characters to it will make it less semantically meaningful.


Joseph L. Wolfkiel
Engineering Group Lead
DISA PEO MA/IA52
(703) 882-0772
[hidden email]
-----Original Message-----
From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Monday, October 25, 2010 9:25 AM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE substitution table (UNCLASSIFIED)
