CPE 2.3 Matching (UNCLASSIFIED)

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

CPE 2.3 Matching (UNCLASSIFIED)

Moore, Scott D CIV DISA PEO-MA
Classification:  UNCLASSIFIED
Caveats: NONE

I'd like to restart the discussion we had during Developer Days about CPE matching in 2.3. I've put a list of specific suggestions at the bottom of this email.

Definitions
-----------
To facilitate this discussion, I'm going to define a few terms here. I'm not suggesting these terms become part of the standard or anything (though they seem to be useful notions).

Lets define a "Fully-Qualified CPE Name" or "FQCPE" as a non-abstract CPE Name; that is, every attribute has either the logical value NA or a sequence of literals with no wildcards. (i.e., a CPE that is actually a unique identifier) (we'll see later that NA could actually be a degenerate case of an attribute with no wildcards)

As the proposal for CPE 2.3 does, we'll then consider that any general CPE name (including the FQCPEs) defines a set of FQCPEs (where an FQCPE defines the singleton set containing itself).


Matching FQCPEs
---------------
Matching FQCPEs is rather straightfoward: an FQCPE matches if and only if two FQCPEs in the same binding lexically match, or alternatively, if every attribute lexically matches between the two FQCPEs in canonical form.


Matching general CPEs that contain ANY but no wildcards
-------------------------------------------------------
To simplify things, I'll start the discussion of matching general CPE names by excluding wildcards, and just considering cases where an attribute (or more) has the logical value ANY (we'll see later this is actually a degenerate case of an attribute with wildcards).

Since we're now talking about general CPE names, we're really talking about matching sets. It seems most useful to expand the notion of match into set relationships (as the current proposal does)--in particular, we're considering the results of intersecting the two sets represented by two CPE names.

The possible results of intersecting two sets are:
the source is EQUAL to the target set
the source is a strict SUBSET of the target set
the source is a strict SUPERSET of the target set
the source and the target set have NO OVERLAP
the source and the target INTERSECT and define a smaller set

Note that I've split the NO OVERLAP/NOT EQUAL case into two cases. This better reflects the semantics and is important to many use cases. For example, it supports the use case of deriving a "best guess" from two guesses provided by different sensors about the same platform. Ie. one can determine 64-bit and the other can determine "professional", so the best guess is 64-bit professional.

This also makes the semantics of "NO OVERLAP/NOT EQUAL" make sense in the context of sets. I suppose if we want to continue ignoring the notion of a non-empty intersection, we could at least call it "NO MATCH".

The "matchness" of each of these cases really then depends on the use case.
For the 2.2-style definition of "matching", you could define EQUAL and either SUBSET or SUPERSET (depending on which you choose to be the source vs target) to mean a match, and the rest to be non-matching.
For other cases, other definitions of matching makes sense... so it may make sense to push "matchness" up to the stack.


Matching general CPEs that contain wildcards
--------------------------------------------
So, what about wildcards?

Adding wildcards doesn't change the possible results of our set intersections, since we're still talking about regular languages of sets.
What it does do is make it more difficult (and somewhat intractable in the general sense) to compute the intersection of an attribute's regular expressions. I won't get into the details here, but you can look up intersecting two finite automata elsewhere.

What are the possible ways to eliminate this problem? Two were presented at Developer Days:
1) Give an error if the regexes for an attribute both contain wildcards
2) If both contain wildcards, do a strict lexical match

Both of these seem very unsatisfactory because they don't match the intuitive semantics of regular expressions/sets. For example, neither fix supports the  use case of recognizing that 9.* should match 9.1.*.

Instead, I suggest that we restrict multi-character wildcards ("*") from occuring anywhere except the end of an attribute. It seems very hard to think of a usecase for wildcards earlier in an attribute, (if anyone can think of one, please let me know), and it captures the very common use cases like version numbers. It also has the advantage of being very computable (ignoring single character wildcards for now). Simply take the shorter string to be a regex and the longer to be the target and do a regex match treating the target as a pure lexical string.

Though I haven't quite worked through it, I think the same approach (restricting their placement) can be taken for single character wildcards. On the other hand, if you don't restrict single character wildcards to the end, you can still easily compute the intersection, but wouldn't be able to use a standard regex library (you wouldn't be able to treat one string as a literal string). I also wonder the particular use cases behind needing a single character wild card versus a multi-character one.


Getting rid of special logical values--removing special cases
-------------------------------------------------------------
Note that now if we support matching on two attributes with wildcards, we can now do away with the distinction between ANY and "*". The only reason we needed a distinction was to at least let you match ANY and a string with wildcards... but now you can do that since "*" (replacing any) is a string with wildcards only at the end.

On a similar note, if the empty string is not considered to have a distinct, useful meaning that is not NA, we can get rid of the "-" representation of NA and just use the empty string.

These two changes have the effect of simplifying the semantics of attribute matching and removing any special cases that are not regular expression matching (though we have introduced the need to pick the "longer" string to treat as literal string if both have wildcards).

This also lets me change my definition of FQCPE vs a general CPE.
Now a FQCPE is just a cpe with no wildcards.


Specific suggestions:
=====================
1. Add an INTERSECT result type to the matching results.
2. Define "matching" in the binary sense much higher in the cpe "stack"
3. Restrict wildcards to occur only at the end of attributes
4. Remove the special representations "-" (NA) and "*" (ANY) from CPE attributes, replacing them with the empty string "" and a wildcard (still "*"), respectively.

Thanks!
V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405
Classification:  UNCLASSIFIED
Caveats: NONE


smime.p7s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Wolfkiel, Joseph
Scott,

On this one, I pretty much just completely agree with you.  We did have some
use cases that required putting a wildcard at the beginning of an attribute
(e.g. match redhat_linux with ubuntu_linux with linux_kernel), but I think
that may be better addressed with more robust naming enforcement.  It also
might be reasonable to think of CPE matching patterns as different
constructs than CPE names.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Moore, Scott D CIV DISA PEO-MA [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 5:38 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

Classification:  UNCLASSIFIED
Caveats: NONE

I'd like to restart the discussion we had during Developer Days about CPE
matching in 2.3. I've put a list of specific suggestions at the bottom of
this email.

Definitions
-----------
To facilitate this discussion, I'm going to define a few terms here. I'm not
suggesting these terms become part of the standard or anything (though they
seem to be useful notions).

Lets define a "Fully-Qualified CPE Name" or "FQCPE" as a non-abstract CPE
Name; that is, every attribute has either the logical value NA or a sequence
of literals with no wildcards. (i.e., a CPE that is actually a unique
identifier) (we'll see later that NA could actually be a degenerate case of
an attribute with no wildcards)

As the proposal for CPE 2.3 does, we'll then consider that any general CPE
name (including the FQCPEs) defines a set of FQCPEs (where an FQCPE defines
the singleton set containing itself).


Matching FQCPEs
---------------
Matching FQCPEs is rather straightfoward: an FQCPE matches if and only if
two FQCPEs in the same binding lexically match, or alternatively, if every
attribute lexically matches between the two FQCPEs in canonical form.


Matching general CPEs that contain ANY but no wildcards
-------------------------------------------------------
To simplify things, I'll start the discussion of matching general CPE names
by excluding wildcards, and just considering cases where an attribute (or
more) has the logical value ANY (we'll see later this is actually a
degenerate case of an attribute with wildcards).

Since we're now talking about general CPE names, we're really talking about
matching sets. It seems most useful to expand the notion of match into set
relationships (as the current proposal does)--in particular, we're
considering the results of intersecting the two sets represented by two CPE
names.

The possible results of intersecting two sets are:
the source is EQUAL to the target set
the source is a strict SUBSET of the target set
the source is a strict SUPERSET of the target set
the source and the target set have NO OVERLAP
the source and the target INTERSECT and define a smaller set

Note that I've split the NO OVERLAP/NOT EQUAL case into two cases. This
better reflects the semantics and is important to many use cases. For
example, it supports the use case of deriving a "best guess" from two
guesses provided by different sensors about the same platform. Ie. one can
determine 64-bit and the other can determine "professional", so the best
guess is 64-bit professional.

This also makes the semantics of "NO OVERLAP/NOT EQUAL" make sense in the
context of sets. I suppose if we want to continue ignoring the notion of a
non-empty intersection, we could at least call it "NO MATCH".

The "matchness" of each of these cases really then depends on the use case.
For the 2.2-style definition of "matching", you could define EQUAL and
either SUBSET or SUPERSET (depending on which you choose to be the source vs
target) to mean a match, and the rest to be non-matching.
For other cases, other definitions of matching makes sense... so it may make
sense to push "matchness" up to the stack.


Matching general CPEs that contain wildcards
--------------------------------------------
So, what about wildcards?

Adding wildcards doesn't change the possible results of our set
intersections, since we're still talking about regular languages of sets.
What it does do is make it more difficult (and somewhat intractable in the
general sense) to compute the intersection of an attribute's regular
expressions. I won't get into the details here, but you can look up
intersecting two finite automata elsewhere.


What are the possible ways to eliminate this problem? Two were presented at
Developer Days:
1) Give an error if the regexes for an attribute both contain wildcards
2) If both contain wildcards, do a strict lexical match

Both of these seem very unsatisfactory because they don't match the
intuitive semantics of regular expressions/sets. For example, neither fix
supports the  use case of recognizing that 9.* should match 9.1.*.

Instead, I suggest that we restrict multi-character wildcards ("*") from
occuring anywhere except the end of an attribute. It seems very hard to
think of a usecase for wildcards earlier in an attribute, (if anyone can
think of one, please let me know), and it captures the very common use cases
like version numbers. It also has the advantage of being very computable
(ignoring single character wildcards for now). Simply take the shorter
string to be a regex and the longer to be the target and do a regex match
treating the target as a pure lexical string.

Though I haven't quite worked through it, I think the same approach
(restricting their placement) can be taken for single character wildcards.
On the other hand, if you don't restrict single character wildcards to the
end, you can still easily compute the intersection, but wouldn't be able to
use a standard regex library (you wouldn't be able to treat one string as a
literal string). I also wonder the particular use cases behind needing a
single character wild card versus a multi-character one.


Getting rid of special logical values--removing special cases
-------------------------------------------------------------
Note that now if we support matching on two attributes with wildcards, we
can now do away with the distinction between ANY and "*". The only reason we
needed a distinction was to at least let you match ANY and a string with
wildcards... but now you can do that since "*" (replacing any) is a string
with wildcards only at the end.

On a similar note, if the empty string is not considered to have a distinct,
useful meaning that is not NA, we can get rid of the "-" representation of
NA and just use the empty string.

These two changes have the effect of simplifying the semantics of attribute
matching and removing any special cases that are not regular expression
matching (though we have introduced the need to pick the "longer" string to
treat as literal string if both have wildcards).

This also lets me change my definition of FQCPE vs a general CPE.
Now a FQCPE is just a cpe with no wildcards.


Specific suggestions:
=====================
1. Add an INTERSECT result type to the matching results.
2. Define "matching" in the binary sense much higher in the cpe "stack"
3. Restrict wildcards to occur only at the end of attributes
4. Remove the special representations "-" (NA) and "*" (ANY) from CPE
attributes, replacing them with the empty string "" and a wildcard (still
"*"), respectively.

Thanks!
V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405
Classification:  UNCLASSIFIED
Caveats: NONE


smime.p7s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Brant Cheikes
These were the specific suggestions:

1. Add an INTERSECT result type to the matching results.

I'll defer this to Mary to consider.  Personally I suspect our use of
set-theory terms is causing more harm than good.  While in principle there
is a set "intersection" when comparing two CPE names, we cannot make such a
determination without reference to a defined universe of individuals.  CPE
matching has always be defined as a comparison of two names.  The dictionary
is never an input.  If it were, you could resolve the two names being
compared against the dictionary then assess the relationship, if any,
between the two sets of returned results.  But when comparing two names, we
don't have that ability and so it isn't possible to distinguish an
"intersection" condition.

2. Define "matching" in the binary sense much higher in the cpe "stack"

Perhaps this means that matching really ought to be above the dictionary so
that it can be entirely reconceived in true set-theoretic terms with the
dictionary as an input.  I agree, but that was not something we could do in
the time we had to prepare 2.3.

3. Restrict wildcards to occur only at the end of attributes

I've heard a proposal to restrict wildcards to only occur at the beginning
or the end.  Are we now saying we can be even more restrictive?

4. Remove the special representations "-" (NA) and "*" (ANY) from CPE
attributes, replacing them with the empty string "" and a wildcard (still
"*"), respectively.

This actually breaks backward compatibility with 2.2.  CPE 2.2 names
explicitly distinguished NA and ANY.  If we drop NA we have by definition
broken backward compatibility with 2.2.

/Brant

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352


-----Original Message-----
From: Wolfkiel, Joseph [mailto:[hidden email]]
Sent: Monday, June 21, 2010 5:26 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

Scott,

On this one, I pretty much just completely agree with you.  We did have some
use cases that required putting a wildcard at the beginning of an attribute
(e.g. match redhat_linux with ubuntu_linux with linux_kernel), but I think
that may be better addressed with more robust naming enforcement.  It also
might be reasonable to think of CPE matching patterns as different
constructs than CPE names.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Moore, Scott D CIV DISA PEO-MA [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 5:38 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

Classification:  UNCLASSIFIED
Caveats: NONE

I'd like to restart the discussion we had during Developer Days about CPE
matching in 2.3. I've put a list of specific suggestions at the bottom of
this email.

Definitions
-----------
To facilitate this discussion, I'm going to define a few terms here. I'm not
suggesting these terms become part of the standard or anything (though they
seem to be useful notions).

Lets define a "Fully-Qualified CPE Name" or "FQCPE" as a non-abstract CPE
Name; that is, every attribute has either the logical value NA or a sequence
of literals with no wildcards. (i.e., a CPE that is actually a unique
identifier) (we'll see later that NA could actually be a degenerate case of
an attribute with no wildcards)

As the proposal for CPE 2.3 does, we'll then consider that any general CPE
name (including the FQCPEs) defines a set of FQCPEs (where an FQCPE defines
the singleton set containing itself).


Matching FQCPEs
---------------
Matching FQCPEs is rather straightfoward: an FQCPE matches if and only if
two FQCPEs in the same binding lexically match, or alternatively, if every
attribute lexically matches between the two FQCPEs in canonical form.


Matching general CPEs that contain ANY but no wildcards
-------------------------------------------------------
To simplify things, I'll start the discussion of matching general CPE names
by excluding wildcards, and just considering cases where an attribute (or
more) has the logical value ANY (we'll see later this is actually a
degenerate case of an attribute with wildcards).

Since we're now talking about general CPE names, we're really talking about
matching sets. It seems most useful to expand the notion of match into set
relationships (as the current proposal does)--in particular, we're
considering the results of intersecting the two sets represented by two CPE
names.

The possible results of intersecting two sets are:
the source is EQUAL to the target set
the source is a strict SUBSET of the target set
the source is a strict SUPERSET of the target set
the source and the target set have NO OVERLAP
the source and the target INTERSECT and define a smaller set

Note that I've split the NO OVERLAP/NOT EQUAL case into two cases. This
better reflects the semantics and is important to many use cases. For
example, it supports the use case of deriving a "best guess" from two
guesses provided by different sensors about the same platform. Ie. one can
determine 64-bit and the other can determine "professional", so the best
guess is 64-bit professional.

This also makes the semantics of "NO OVERLAP/NOT EQUAL" make sense in the
context of sets. I suppose if we want to continue ignoring the notion of a
non-empty intersection, we could at least call it "NO MATCH".

The "matchness" of each of these cases really then depends on the use case.
For the 2.2-style definition of "matching", you could define EQUAL and
either SUBSET or SUPERSET (depending on which you choose to be the source vs
target) to mean a match, and the rest to be non-matching.
For other cases, other definitions of matching makes sense... so it may make
sense to push "matchness" up to the stack.


Matching general CPEs that contain wildcards
--------------------------------------------
So, what about wildcards?

Adding wildcards doesn't change the possible results of our set
intersections, since we're still talking about regular languages of sets.
What it does do is make it more difficult (and somewhat intractable in the
general sense) to compute the intersection of an attribute's regular
expressions. I won't get into the details here, but you can look up
intersecting two finite automata elsewhere.


What are the possible ways to eliminate this problem? Two were presented at
Developer Days:
1) Give an error if the regexes for an attribute both contain wildcards
2) If both contain wildcards, do a strict lexical match

Both of these seem very unsatisfactory because they don't match the
intuitive semantics of regular expressions/sets. For example, neither fix
supports the  use case of recognizing that 9.* should match 9.1.*.

Instead, I suggest that we restrict multi-character wildcards ("*") from
occuring anywhere except the end of an attribute. It seems very hard to
think of a usecase for wildcards earlier in an attribute, (if anyone can
think of one, please let me know), and it captures the very common use cases
like version numbers. It also has the advantage of being very computable
(ignoring single character wildcards for now). Simply take the shorter
string to be a regex and the longer to be the target and do a regex match
treating the target as a pure lexical string.

Though I haven't quite worked through it, I think the same approach
(restricting their placement) can be taken for single character wildcards.
On the other hand, if you don't restrict single character wildcards to the
end, you can still easily compute the intersection, but wouldn't be able to
use a standard regex library (you wouldn't be able to treat one string as a
literal string). I also wonder the particular use cases behind needing a
single character wild card versus a multi-character one.


Getting rid of special logical values--removing special cases
-------------------------------------------------------------
Note that now if we support matching on two attributes with wildcards, we
can now do away with the distinction between ANY and "*". The only reason we
needed a distinction was to at least let you match ANY and a string with
wildcards... but now you can do that since "*" (replacing any) is a string
with wildcards only at the end.

On a similar note, if the empty string is not considered to have a distinct,
useful meaning that is not NA, we can get rid of the "-" representation of
NA and just use the empty string.

These two changes have the effect of simplifying the semantics of attribute
matching and removing any special cases that are not regular expression
matching (though we have introduced the need to pick the "longer" string to
treat as literal string if both have wildcards).

This also lets me change my definition of FQCPE vs a general CPE.
Now a FQCPE is just a cpe with no wildcards.


Specific suggestions:
=====================
1. Add an INTERSECT result type to the matching results.
2. Define "matching" in the binary sense much higher in the cpe "stack"
3. Restrict wildcards to occur only at the end of attributes
4. Remove the special representations "-" (NA) and "*" (ANY) from CPE
attributes, replacing them with the empty string "" and a wildcard (still
"*"), respectively.

Thanks!
V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405
Classification:  UNCLASSIFIED
Caveats: NONE


smime.p7s (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Moore, Scott D CIV DISA PEO-MA
Classification:  UNCLASSIFIED
Caveats: NONE

Comments inline. (>>)

V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405

-----Original Message-----
From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Monday, June 21, 2010 6:30 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

These were the specific suggestions:

1. Add an INTERSECT result type to the matching results.

I'll defer this to Mary to consider.  Personally I suspect our use of
set-theory terms is causing more harm than good.  While in principle there
is a set "intersection" when comparing two CPE names, we cannot make such a
determination without reference to a defined universe of individuals.  CPE
matching has always be defined as a comparison of two names.  The dictionary
is never an input.  If it were, you could resolve the two names being
compared against the dictionary then assess the relationship, if any,
between the two sets of returned results.  But when comparing two names, we
don't have that ability and so it isn't possible to distinguish an
"intersection" condition.

>> I think there is still some use even without the well-defined universe
>> (ie, the dictionary). Since there is not a well-defined universe, we can
>> consider it to be all well-formed CPE names. This supports the use-case
>> I described with two sensors, one reporting 64-bit and the other reporting
>> english. Its useful to be able to construct the best guess, 64-bit and
>> english. On the other hand, that may not be considered "matching".

>> Thoughts anyone?

2. Define "matching" in the binary sense much higher in the cpe "stack"

Perhaps this means that matching really ought to be above the dictionary so
that it can be entirely reconceived in true set-theoretic terms with the
dictionary as an input.  I agree, but that was not something we could do in
the time we had to prepare 2.3.

>> See comment above.

3. Restrict wildcards to occur only at the end of attributes

I've heard a proposal to restrict wildcards to only occur at the beginning
or the end.  Are we now saying we can be even more restrictive?

>> My thinking is that restricting them only to the end gives a _fairly_
>> simple strategy for evaluating matches (picking the longer one to be
>> the "fixed" entry and then doing a regex match), and even that seems
>> like it may rule out very simple SQL implementations; but is probably
>> worth it for having good semantics for a common use case.

>> Having wildcards at the front too would rule out that implemenation, and
>> require a true wildcard-to-wildcard matching algorithm, which does not appear
>> to be widely available.

>> Based on Joe's comment, it seems like maybe this is an acceptable compromise;
>> however, I think the wider community, particularly vendors, need to weigh in.

4. Remove the special representations "-" (NA) and "*" (ANY) from CPE
attributes, replacing them with the empty string "" and a wildcard (still
"*"), respectively.

This actually breaks backward compatibility with 2.2.  CPE 2.2 names
explicitly distinguished NA and ANY.  If we drop NA we have by definition
broken backward compatibility with 2.2.

>> Ah. On re-reading 2.2, you're right. "-" is taken as NA and "" as ANY, so we
>> can't switch them to "" and "*", respectively. Frustrating given that it would
>> mean you could always use the same regex comparison logic. Maybe in 3.0 (if
>> it ends up looking close enough that it would matter).

/Brant

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352


-----Original Message-----
From: Wolfkiel, Joseph [mailto:[hidden email]]
Sent: Monday, June 21, 2010 5:26 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

Scott,

On this one, I pretty much just completely agree with you.  We did have some
use cases that required putting a wildcard at the beginning of an attribute
(e.g. match redhat_linux with ubuntu_linux with linux_kernel), but I think
that may be better addressed with more robust naming enforcement.  It also
might be reasonable to think of CPE matching patterns as different
constructs than CPE names.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Moore, Scott D CIV DISA PEO-MA [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 5:38 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

Classification:  UNCLASSIFIED
Caveats: NONE

I'd like to restart the discussion we had during Developer Days about CPE
matching in 2.3. I've put a list of specific suggestions at the bottom of
this email.

Definitions
-----------
To facilitate this discussion, I'm going to define a few terms here. I'm not
suggesting these terms become part of the standard or anything (though they
seem to be useful notions).

Lets define a "Fully-Qualified CPE Name" or "FQCPE" as a non-abstract CPE
Name; that is, every attribute has either the logical value NA or a sequence
of literals with no wildcards. (i.e., a CPE that is actually a unique
identifier) (we'll see later that NA could actually be a degenerate case of
an attribute with no wildcards)

As the proposal for CPE 2.3 does, we'll then consider that any general CPE
name (including the FQCPEs) defines a set of FQCPEs (where an FQCPE defines
the singleton set containing itself).


Matching FQCPEs
---------------
Matching FQCPEs is rather straightfoward: an FQCPE matches if and only if
two FQCPEs in the same binding lexically match, or alternatively, if every
attribute lexically matches between the two FQCPEs in canonical form.


Matching general CPEs that contain ANY but no wildcards
-------------------------------------------------------
To simplify things, I'll start the discussion of matching general CPE names
by excluding wildcards, and just considering cases where an attribute (or
more) has the logical value ANY (we'll see later this is actually a
degenerate case of an attribute with wildcards).

Since we're now talking about general CPE names, we're really talking about
matching sets. It seems most useful to expand the notion of match into set
relationships (as the current proposal does)--in particular, we're
considering the results of intersecting the two sets represented by two CPE
names.

The possible results of intersecting two sets are:
the source is EQUAL to the target set
the source is a strict SUBSET of the target set
the source is a strict SUPERSET of the target set
the source and the target set have NO OVERLAP
the source and the target INTERSECT and define a smaller set

Note that I've split the NO OVERLAP/NOT EQUAL case into two cases. This
better reflects the semantics and is important to many use cases. For
example, it supports the use case of deriving a "best guess" from two
guesses provided by different sensors about the same platform. Ie. one can
determine 64-bit and the other can determine "professional", so the best
guess is 64-bit professional.

This also makes the semantics of "NO OVERLAP/NOT EQUAL" make sense in the
context of sets. I suppose if we want to continue ignoring the notion of a
non-empty intersection, we could at least call it "NO MATCH".

The "matchness" of each of these cases really then depends on the use case.
For the 2.2-style definition of "matching", you could define EQUAL and
either SUBSET or SUPERSET (depending on which you choose to be the source vs
target) to mean a match, and the rest to be non-matching.
For other cases, other definitions of matching makes sense... so it may make
sense to push "matchness" up to the stack.


Matching general CPEs that contain wildcards
--------------------------------------------
So, what about wildcards?

Adding wildcards doesn't change the possible results of our set
intersections, since we're still talking about regular languages of sets.
What it does do is make it more difficult (and somewhat intractable in the
general sense) to compute the intersection of an attribute's regular
expressions. I won't get into the details here, but you can look up
intersecting two finite automata elsewhere.


What are the possible ways to eliminate this problem? Two were presented at
Developer Days:
1) Give an error if the regexes for an attribute both contain wildcards
2) If both contain wildcards, do a strict lexical match

Both of these seem very unsatisfactory because they don't match the
intuitive semantics of regular expressions/sets. For example, neither fix
supports the  use case of recognizing that 9.* should match 9.1.*.

Instead, I suggest that we restrict multi-character wildcards ("*") from
occuring anywhere except the end of an attribute. It seems very hard to
think of a usecase for wildcards earlier in an attribute, (if anyone can
think of one, please let me know), and it captures the very common use cases
like version numbers. It also has the advantage of being very computable
(ignoring single character wildcards for now). Simply take the shorter
string to be a regex and the longer to be the target and do a regex match
treating the target as a pure lexical string.

Though I haven't quite worked through it, I think the same approach
(restricting their placement) can be taken for single character wildcards.
On the other hand, if you don't restrict single character wildcards to the
end, you can still easily compute the intersection, but wouldn't be able to
use a standard regex library (you wouldn't be able to treat one string as a
literal string). I also wonder the particular use cases behind needing a
single character wild card versus a multi-character one.


Getting rid of special logical values--removing special cases
-------------------------------------------------------------
Note that now if we support matching on two attributes with wildcards, we
can now do away with the distinction between ANY and "*". The only reason we
needed a distinction was to at least let you match ANY and a string with
wildcards... but now you can do that since "*" (replacing any) is a string
with wildcards only at the end.

On a similar note, if the empty string is not considered to have a distinct,
useful meaning that is not NA, we can get rid of the "-" representation of
NA and just use the empty string.

These two changes have the effect of simplifying the semantics of attribute
matching and removing any special cases that are not regular expression
matching (though we have introduced the need to pick the "longer" string to
treat as literal string if both have wildcards).

This also lets me change my definition of FQCPE vs a general CPE.
Now a FQCPE is just a cpe with no wildcards.


Specific suggestions:
=====================
1. Add an INTERSECT result type to the matching results.
2. Define "matching" in the binary sense much higher in the cpe "stack"
3. Restrict wildcards to occur only at the end of attributes
4. Remove the special representations "-" (NA) and "*" (ANY) from CPE
attributes, replacing them with the empty string "" and a wildcard (still
"*"), respectively.

Thanks!
V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405
Classification:  UNCLASSIFIED
Caveats: NONE

Classification:  UNCLASSIFIED
Caveats: NONE


smime.p7s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Thomas Jones
On Tue, 2010-06-22 at 11:12 -0400, Moore, Scott D CIV DISA PEO-MA wrote:
> Classification:  UNCLASSIFIED
> Caveats: NONE
>
> Comments inline. (>>)

A few questions.

1) What advantages are we hoping to achieve by redefining regular
expression matching? There has been a vast amount of research done on
regex both commercially and academically. How are we advancing the
construction or interpretation of regex that we cannot simply utilize
existing resources.

2) Can't we just utilize existing regex defines and use the wildcard and
matching algorithms already constructed? Most pattern matching api's are
already being utilized during software analysis and engineering.

3) What value is the discussion about Discrete Mathematics applying to
CPE? That is exactly what we are discussing. I took DM theory from
DePaul University, and im having a difficult time putting my head around
the direction and purpose of the applicability of the two.

Seems to me that we are just reinventing the wheel here. My 2 cents.

Thomas
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Eric Walker
I have avoided this discussion up to now.  But I'm glad to see that someone else agrees with me here.

We only do ourselves a disservice by attempting to specify the details that go into pattern matching.  Let's be very frank with ourselves:  we're not at the level to be writing RFCs, we're not at the level to be writing HTML or CSS specifications, and we're not at the level to be getting into the details of regular expression syntax.  We demonstrate our competence by refraining from specifying details in topics that we're not clear on and making use of work that other competent people have done.  We raise questions about our competence by venturing into areas that we're not very clear on and trying to reinvent the wheel.

My feeling is that we should adopt an existing regular expression API and stick with it (my vote: PCRE).  This will include escaping.  Escaping in such an API can be idiosyncratic, and it may not simply be a case of adding a backslash to everything non-alphanumeric.  Sometimes you might not add a backslash in such a syntax where a simple rule of "add a backslash to everything" would dictate otherwise.

To add to this, we should specifically not adopt a SQL-equivalent syntax.  Doing so would require that implementers both preserve SQL-equivalent wildcards while removing potential SQL-injection attacks.

Please forgive my frankness.  I'm just trying to help you guys here.

Eric


-----Original Message-----
From: Thomas R. Jones [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 9:14 AM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

On Tue, 2010-06-22 at 11:12 -0400, Moore, Scott D CIV DISA PEO-MA wrote:
> Classification:  UNCLASSIFIED
> Caveats: NONE
>
> Comments inline. (>>)

A few questions.

1) What advantages are we hoping to achieve by redefining regular
expression matching? There has been a vast amount of research done on
regex both commercially and academically. How are we advancing the
construction or interpretation of regex that we cannot simply utilize
existing resources.

2) Can't we just utilize existing regex defines and use the wildcard and
matching algorithms already constructed? Most pattern matching api's are
already being utilized during software analysis and engineering.

3) What value is the discussion about Discrete Mathematics applying to
CPE? That is exactly what we are discussing. I took DM theory from
DePaul University, and im having a difficult time putting my head around
the direction and purpose of the applicability of the two.

Seems to me that we are just reinventing the wheel here. My 2 cents.

Thomas
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Brant Cheikes
In reply to this post by Moore, Scott D CIV DISA PEO-MA
Regarding placement of wildcards, here is my current ABNF for strings which
may appear in the output binding.  This grammar (a) uses escaping instead of
percent encoding, (b) uses "*" as both the multi-character wildcard and the
logical value ANY, (c) uses "?" as the single-character wildcard, (d)
restricts "wildcard sequences" to the end of the string.  A wildcard
sequence is one or more sequential question-marks or a single asterisk.

Options at this point include allowing the wcard_seq to also appear at the
front of the string, or even dropping the single-character wildcard
entirely.  Comments?

value = logical / string
logical = "*" / "-"
string = +(unreserved / quoted) 0*1wcard_seq
unreserved = ALPHA / DIGIT / "_" / "."
quoted = escape (escape / sgl_wcard / multi_wcard / punc)
escape = "\"
wcard_seq = +(sgl_wcard) / multi_wcard
sgl_wcard = "?"
multi_wcard = "*"
punc = "." / "-" / ":" / "/" / "#" / "[" / "]" / "@" / "^"
punc /= "~" / "!" / "$" / "&" / "'" / "(" / ")" / "+"
punc /= "," / ";" / "=" / "{" / "}" / "|" / "`" / "%"
punc /= "<" / ">"
ALPHA = lowercase letters only
DIGIT = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"

/Brant

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352

-----Original Message-----
From: Moore, Scott D CIV DISA PEO-MA [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 11:13 AM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

Classification:  UNCLASSIFIED
Caveats: NONE

Comments inline. (>>)

V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405

-----Original Message-----
From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Monday, June 21, 2010 6:30 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

These were the specific suggestions:

1. Add an INTERSECT result type to the matching results.

I'll defer this to Mary to consider.  Personally I suspect our use of
set-theory terms is causing more harm than good.  While in principle there
is a set "intersection" when comparing two CPE names, we cannot make such a
determination without reference to a defined universe of individuals.  CPE
matching has always be defined as a comparison of two names.  The dictionary
is never an input.  If it were, you could resolve the two names being
compared against the dictionary then assess the relationship, if any,
between the two sets of returned results.  But when comparing two names, we
don't have that ability and so it isn't possible to distinguish an
"intersection" condition.

>> I think there is still some use even without the well-defined universe
>> (ie, the dictionary). Since there is not a well-defined universe, we can
>> consider it to be all well-formed CPE names. This supports the use-case
>> I described with two sensors, one reporting 64-bit and the other
reporting
>> english. Its useful to be able to construct the best guess, 64-bit and
>> english. On the other hand, that may not be considered "matching".

>> Thoughts anyone?

2. Define "matching" in the binary sense much higher in the cpe "stack"

Perhaps this means that matching really ought to be above the dictionary so
that it can be entirely reconceived in true set-theoretic terms with the
dictionary as an input.  I agree, but that was not something we could do in
the time we had to prepare 2.3.

>> See comment above.

3. Restrict wildcards to occur only at the end of attributes

I've heard a proposal to restrict wildcards to only occur at the beginning
or the end.  Are we now saying we can be even more restrictive?

>> My thinking is that restricting them only to the end gives a _fairly_
>> simple strategy for evaluating matches (picking the longer one to be
>> the "fixed" entry and then doing a regex match), and even that seems
>> like it may rule out very simple SQL implementations; but is probably
>> worth it for having good semantics for a common use case.

>> Having wildcards at the front too would rule out that implemenation, and
>> require a true wildcard-to-wildcard matching algorithm, which does not
appear
>> to be widely available.

>> Based on Joe's comment, it seems like maybe this is an acceptable
compromise;
>> however, I think the wider community, particularly vendors, need to weigh
in.

4. Remove the special representations "-" (NA) and "*" (ANY) from CPE
attributes, replacing them with the empty string "" and a wildcard (still
"*"), respectively.

This actually breaks backward compatibility with 2.2.  CPE 2.2 names
explicitly distinguished NA and ANY.  If we drop NA we have by definition
broken backward compatibility with 2.2.

>> Ah. On re-reading 2.2, you're right. "-" is taken as NA and "" as ANY, so
we
>> can't switch them to "" and "*", respectively. Frustrating given that it
would
>> mean you could always use the same regex comparison logic. Maybe in 3.0
(if
>> it ends up looking close enough that it would matter).

/Brant

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352


-----Original Message-----
From: Wolfkiel, Joseph [mailto:[hidden email]]
Sent: Monday, June 21, 2010 5:26 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

Scott,

On this one, I pretty much just completely agree with you.  We did have some
use cases that required putting a wildcard at the beginning of an attribute
(e.g. match redhat_linux with ubuntu_linux with linux_kernel), but I think
that may be better addressed with more robust naming enforcement.  It also
might be reasonable to think of CPE matching patterns as different
constructs than CPE names.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program
Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401 DSN 244-5401
Fax 410-854-6700

-----Original Message-----
From: Moore, Scott D CIV DISA PEO-MA [mailto:[hidden email]]
Sent: Thursday, June 17, 2010 5:38 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

Classification:  UNCLASSIFIED
Caveats: NONE

I'd like to restart the discussion we had during Developer Days about CPE
matching in 2.3. I've put a list of specific suggestions at the bottom of
this email.

Definitions
-----------
To facilitate this discussion, I'm going to define a few terms here. I'm not
suggesting these terms become part of the standard or anything (though they
seem to be useful notions).

Lets define a "Fully-Qualified CPE Name" or "FQCPE" as a non-abstract CPE
Name; that is, every attribute has either the logical value NA or a sequence
of literals with no wildcards. (i.e., a CPE that is actually a unique
identifier) (we'll see later that NA could actually be a degenerate case of
an attribute with no wildcards)

As the proposal for CPE 2.3 does, we'll then consider that any general CPE
name (including the FQCPEs) defines a set of FQCPEs (where an FQCPE defines
the singleton set containing itself).


Matching FQCPEs
---------------
Matching FQCPEs is rather straightfoward: an FQCPE matches if and only if
two FQCPEs in the same binding lexically match, or alternatively, if every
attribute lexically matches between the two FQCPEs in canonical form.


Matching general CPEs that contain ANY but no wildcards
-------------------------------------------------------
To simplify things, I'll start the discussion of matching general CPE names
by excluding wildcards, and just considering cases where an attribute (or
more) has the logical value ANY (we'll see later this is actually a
degenerate case of an attribute with wildcards).

Since we're now talking about general CPE names, we're really talking about
matching sets. It seems most useful to expand the notion of match into set
relationships (as the current proposal does)--in particular, we're
considering the results of intersecting the two sets represented by two CPE
names.

The possible results of intersecting two sets are:
the source is EQUAL to the target set
the source is a strict SUBSET of the target set
the source is a strict SUPERSET of the target set
the source and the target set have NO OVERLAP
the source and the target INTERSECT and define a smaller set

Note that I've split the NO OVERLAP/NOT EQUAL case into two cases. This
better reflects the semantics and is important to many use cases. For
example, it supports the use case of deriving a "best guess" from two
guesses provided by different sensors about the same platform. Ie. one can
determine 64-bit and the other can determine "professional", so the best
guess is 64-bit professional.

This also makes the semantics of "NO OVERLAP/NOT EQUAL" make sense in the
context of sets. I suppose if we want to continue ignoring the notion of a
non-empty intersection, we could at least call it "NO MATCH".

The "matchness" of each of these cases really then depends on the use case.
For the 2.2-style definition of "matching", you could define EQUAL and
either SUBSET or SUPERSET (depending on which you choose to be the source vs
target) to mean a match, and the rest to be non-matching.
For other cases, other definitions of matching makes sense... so it may make
sense to push "matchness" up to the stack.


Matching general CPEs that contain wildcards
--------------------------------------------
So, what about wildcards?

Adding wildcards doesn't change the possible results of our set
intersections, since we're still talking about regular languages of sets.
What it does do is make it more difficult (and somewhat intractable in the
general sense) to compute the intersection of an attribute's regular
expressions. I won't get into the details here, but you can look up
intersecting two finite automata elsewhere.


What are the possible ways to eliminate this problem? Two were presented at
Developer Days:
1) Give an error if the regexes for an attribute both contain wildcards
2) If both contain wildcards, do a strict lexical match

Both of these seem very unsatisfactory because they don't match the
intuitive semantics of regular expressions/sets. For example, neither fix
supports the  use case of recognizing that 9.* should match 9.1.*.

Instead, I suggest that we restrict multi-character wildcards ("*") from
occuring anywhere except the end of an attribute. It seems very hard to
think of a usecase for wildcards earlier in an attribute, (if anyone can
think of one, please let me know), and it captures the very common use cases
like version numbers. It also has the advantage of being very computable
(ignoring single character wildcards for now). Simply take the shorter
string to be a regex and the longer to be the target and do a regex match
treating the target as a pure lexical string.

Though I haven't quite worked through it, I think the same approach
(restricting their placement) can be taken for single character wildcards.
On the other hand, if you don't restrict single character wildcards to the
end, you can still easily compute the intersection, but wouldn't be able to
use a standard regex library (you wouldn't be able to treat one string as a
literal string). I also wonder the particular use cases behind needing a
single character wild card versus a multi-character one.


Getting rid of special logical values--removing special cases
-------------------------------------------------------------
Note that now if we support matching on two attributes with wildcards, we
can now do away with the distinction between ANY and "*". The only reason we
needed a distinction was to at least let you match ANY and a string with
wildcards... but now you can do that since "*" (replacing any) is a string
with wildcards only at the end.

On a similar note, if the empty string is not considered to have a distinct,
useful meaning that is not NA, we can get rid of the "-" representation of
NA and just use the empty string.

These two changes have the effect of simplifying the semantics of attribute
matching and removing any special cases that are not regular expression
matching (though we have introduced the need to pick the "longer" string to
treat as literal string if both have wildcards).

This also lets me change my definition of FQCPE vs a general CPE.
Now a FQCPE is just a cpe with no wildcards.


Specific suggestions:
=====================
1. Add an INTERSECT result type to the matching results.
2. Define "matching" in the binary sense much higher in the cpe "stack"
3. Restrict wildcards to occur only at the end of attributes
4. Remove the special representations "-" (NA) and "*" (ANY) from CPE
attributes, replacing them with the empty string "" and a wildcard (still
"*"), respectively.

Thanks!
V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405
Classification:  UNCLASSIFIED
Caveats: NONE

Classification:  UNCLASSIFIED
Caveats: NONE


smime.p7s (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Moore, Scott D CIV DISA PEO-MA
In reply to this post by Eric Walker
Classification:  UNCLASSIFIED
Caveats: NONE

I think the question comes down to whether or not we want to support matching two CPE names that contain wildcards, or just error out. The issue comes up because as far as I am aware, there are no regular expression implementations that support it. If they do, it is very inefficient in the general case (where wildcards can be anywhere). So, if we want to support it, how can we restrict it so that 1) with a minor hack, you can use an existing regular expression implementation, and 2) it is reasonably efficient.

But ultimately, the deciding factor is whether or not you want 9.* to match 9.1.*. If there are regular expression libraries that do that, then you're right, its not an issue we should worry about.

V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405

-----Original Message-----
From: Eric Walker [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 12:27 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

I have avoided this discussion up to now.  But I'm glad to see that someone else agrees with me here.

We only do ourselves a disservice by attempting to specify the details that go into pattern matching.  Let's be very frank with ourselves:  we're not at the level to be writing RFCs, we're not at the level to be writing HTML or CSS specifications, and we're not at the level to be getting into the details of regular expression syntax.  We demonstrate our competence by refraining from specifying details in topics that we're not clear on and making use of work that other competent people have done.  We raise questions about our competence by venturing into areas that we're not very clear on and trying to reinvent the wheel.

My feeling is that we should adopt an existing regular expression API and stick with it (my vote: PCRE).  This will include escaping.  Escaping in such an API can be idiosyncratic, and it may not simply be a case of adding a backslash to everything non-alphanumeric.  Sometimes you might not add a backslash in such a syntax where a simple rule of "add a backslash to everything" would dictate otherwise.

To add to this, we should specifically not adopt a SQL-equivalent syntax.  Doing so would require that implementers both preserve SQL-equivalent wildcards while removing potential SQL-injection attacks.

Please forgive my frankness.  I'm just trying to help you guys here.

Eric


-----Original Message-----
From: Thomas R. Jones [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 9:14 AM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

On Tue, 2010-06-22 at 11:12 -0400, Moore, Scott D CIV DISA PEO-MA wrote:
> Classification:  UNCLASSIFIED
> Caveats: NONE
>
> Comments inline. (>>)

A few questions.

1) What advantages are we hoping to achieve by redefining regular
expression matching? There has been a vast amount of research done on
regex both commercially and academically. How are we advancing the
construction or interpretation of regex that we cannot simply utilize
existing resources.

2) Can't we just utilize existing regex defines and use the wildcard and
matching algorithms already constructed? Most pattern matching api's are
already being utilized during software analysis and engineering.

3) What value is the discussion about Discrete Mathematics applying to
CPE? That is exactly what we are discussing. I took DM theory from
DePaul University, and im having a difficult time putting my head around
the direction and purpose of the applicability of the two.

Seems to me that we are just reinventing the wheel here. My 2 cents.

Thomas
Classification:  UNCLASSIFIED
Caveats: NONE


smime.p7s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Brant Cheikes
In reply to this post by Thomas Jones
Regarding pattern matching in CPE, a couple of things are going on at the moment.  First, we're coming to some agreement on which regex standard to reference.  That's looking like it will be PCRE.  We have no intention of redefining regex matching behavior or writing our own implementation.  That said, we DO have the intention to restrict allowed pattern-matching meta-characters to just a single-character wildcard and a multi-character wildcard.  We simply don't want to throw open the doors and allow the full power of regex pattern matching into CPE names.  We don't see the use cases for "full monty regex", so we're moving very deliberately and cautiously, introducing just one or two matching capabilities and then seeing how that goes.

Second, we're discussing what meta-characters to use in CPE for the single- and multi-character wildcards, and further, if we abandon percent-encoding, how to handle non-alphanumerics in CPE names.  The PCRE standard defines the dot as the single-character wildcard, and the two-character sequence ".*" as the multi-character wildcard.  We don't feel compelled to mirror these choices in CPE, even though it would make it possible to feed portions of CPE names directly to PCRE implementations.  What we're thinking of doing instead is following the grep standard of question-mark and asterisk, then defining an appropriate transform in the matching spec suitable for whichever regex algorithm we settle on.  So a CPE name might use an embedded asterisk, and this would be replaced with ".*" for PCRE when doing matching.  It's probably a good idea to avoid hard-wiring a particular regex algorithm's meta-character choices into the CPE standard.  Rather, we define our own meta-characters, then transform them as needed to whatever pattern-matching standard we want to use.

I hope that clarifies our thinking.

Cheers,
/Brant

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352


-----Original Message-----
From: Thomas R. Jones [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 12:14 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

On Tue, 2010-06-22 at 11:12 -0400, Moore, Scott D CIV DISA PEO-MA wrote:
> Classification:  UNCLASSIFIED
> Caveats: NONE
>
> Comments inline. (>>)

A few questions.

1) What advantages are we hoping to achieve by redefining regular
expression matching? There has been a vast amount of research done on
regex both commercially and academically. How are we advancing the
construction or interpretation of regex that we cannot simply utilize
existing resources.

2) Can't we just utilize existing regex defines and use the wildcard and
matching algorithms already constructed? Most pattern matching api's are
already being utilized during software analysis and engineering.

3) What value is the discussion about Discrete Mathematics applying to
CPE? That is exactly what we are discussing. I took DM theory from
DePaul University, and im having a difficult time putting my head around
the direction and purpose of the applicability of the two.

Seems to me that we are just reinventing the wheel here. My 2 cents.

Thomas
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Brant Cheikes
One more point on regex matching.  OVAL researched this carefully and built quite a bit of regex matching support into the language, as detailed here:
https://oval.mitre.org/language/about/re_support_5.6.html

In CPE we're not prepared yet to wade in this far.  If we were to follow OVAL practice for our two limited needs (match any single character, and match 0 or more characters), we would choose the dot (rather than the question mark) for the former, and ".*" (rather than just the asterisk) for the latter.  But we should keep in mind that CPE and OVAL are doing different things.  If we want to be consistent for consistency's sake, then we should follow OVAL's lead here.  But I also think it's a reasonable choice to define a pair of single-character alternatives and then transform them appropriately before feeding strings to a standard matching algorithm.

/Brant

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352


-----Original Message-----
From: Cheikes, Brant A. [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 12:56 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

Regarding pattern matching in CPE, a couple of things are going on at the moment.  First, we're coming to some agreement on which regex standard to reference.  That's looking like it will be PCRE.  We have no intention of redefining regex matching behavior or writing our own implementation.  That said, we DO have the intention to restrict allowed pattern-matching meta-characters to just a single-character wildcard and a multi-character wildcard.  We simply don't want to throw open the doors and allow the full power of regex pattern matching into CPE names.  We don't see the use cases for "full monty regex", so we're moving very deliberately and cautiously, introducing just one or two matching capabilities and then seeing how that goes.

Second, we're discussing what meta-characters to use in CPE for the single- and multi-character wildcards, and further, if we abandon percent-encoding, how to handle non-alphanumerics in CPE names.  The PCRE standard defines the dot as the single-character wildcard, and the two-character sequence ".*" as the multi-character wildcard.  We don't feel compelled to mirror these choices in CPE, even though it would make it possible to feed portions of CPE names directly to PCRE implementations.  What we're thinking of doing instead is following the grep standard of question-mark and asterisk, then defining an appropriate transform in the matching spec suitable for whichever regex algorithm we settle on.  So a CPE name might use an embedded asterisk, and this would be replaced with ".*" for PCRE when doing matching.  It's probably a good idea to avoid hard-wiring a particular regex algorithm's meta-character choices into the CPE standard.  Rather, we define our own meta-characters, then transform them as needed to whatever pattern-matching standard we want to use.

I hope that clarifies our thinking.

Cheers,
/Brant

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352


-----Original Message-----
From: Thomas R. Jones [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 12:14 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

On Tue, 2010-06-22 at 11:12 -0400, Moore, Scott D CIV DISA PEO-MA wrote:
> Classification:  UNCLASSIFIED
> Caveats: NONE
>
> Comments inline. (>>)

A few questions.

1) What advantages are we hoping to achieve by redefining regular
expression matching? There has been a vast amount of research done on
regex both commercially and academically. How are we advancing the
construction or interpretation of regex that we cannot simply utilize
existing resources.

2) Can't we just utilize existing regex defines and use the wildcard and
matching algorithms already constructed? Most pattern matching api's are
already being utilized during software analysis and engineering.

3) What value is the discussion about Discrete Mathematics applying to
CPE? That is exactly what we are discussing. I took DM theory from
DePaul University, and im having a difficult time putting my head around
the direction and purpose of the applicability of the two.

Seems to me that we are just reinventing the wheel here. My 2 cents.

Thomas
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Brant Cheikes
In reply to this post by Moore, Scott D CIV DISA PEO-MA
Yes, this is exactly the hard case.  Once we introduce wildcards of any kind
into the language, we have to be prepared for a matching scenario in which
both inputs to the algorithm contain wildcards.  If we restrict regex
meta-characters to the end of a string, we ought to be able to inspect the
constant prefixes to determine that 9.* represents a superset of 9.1.?.
That determination would be made by the matching algorithm, before regex
matching is invoked.

If we truly need to allow meta-characters at the front of the string (or
worse, anywhere within), it's probably not feasible to determine the
relationship between, say, "?.1.?" and "*.1.*".

The only specific example I've heard calling for a wildcard at the front is
the pattern "*linux*" ostensibly to find all products containing "linux" in
the product name.

There's actually a reasonable argument for allowing free use of
meta-characters while defining the wildcard-to-wildcard comparison as an
error condition.  Matching actually can be quite useful if one argument is
always a constant (no wildcards, i.e., a name from a dictionary) and the
other contains a pattern.  When both arguments contain wildcards, one really
needs to compare the sets of results returned by matching each argument
individually against an (implicit) dictionary.  The error condition returned
when both matching arguments are wildcards could really be considered a
useful signal that the two arguments demand a different matching approach.
(An approach, by the way, which would reuse the matching capabilities
defined in the spec.)

/Brant

Brant A. Cheikes
The MITRE Corporation
202 Burlington Road, M/S K302
Bedford, MA 01730-1420
Tel. 781-271-7505; Cell. 617-694-8180; Fax. 781-271-2352


-----Original Message-----
From: Moore, Scott D CIV DISA PEO-MA [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 12:54 PM
To: cpe-discussion-list CPE Community Forum
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

Classification:  UNCLASSIFIED
Caveats: NONE

I think the question comes down to whether or not we want to support
matching two CPE names that contain wildcards, or just error out. The issue
comes up because as far as I am aware, there are no regular expression
implementations that support it. If they do, it is very inefficient in the
general case (where wildcards can be anywhere). So, if we want to support
it, how can we restrict it so that 1) with a minor hack, you can use an
existing regular expression implementation, and 2) it is reasonably
efficient.

But ultimately, the deciding factor is whether or not you want 9.* to match
9.1.*. If there are regular expression libraries that do that, then you're
right, its not an issue we should worry about.

V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405

-----Original Message-----
From: Eric Walker [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 12:27 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

I have avoided this discussion up to now.  But I'm glad to see that someone
else agrees with me here.

We only do ourselves a disservice by attempting to specify the details that
go into pattern matching.  Let's be very frank with ourselves:  we're not at
the level to be writing RFCs, we're not at the level to be writing HTML or
CSS specifications, and we're not at the level to be getting into the
details of regular expression syntax.  We demonstrate our competence by
refraining from specifying details in topics that we're not clear on and
making use of work that other competent people have done.  We raise
questions about our competence by venturing into areas that we're not very
clear on and trying to reinvent the wheel.

My feeling is that we should adopt an existing regular expression API and
stick with it (my vote: PCRE).  This will include escaping.  Escaping in
such an API can be idiosyncratic, and it may not simply be a case of adding
a backslash to everything non-alphanumeric.  Sometimes you might not add a
backslash in such a syntax where a simple rule of "add a backslash to
everything" would dictate otherwise.

To add to this, we should specifically not adopt a SQL-equivalent syntax.
Doing so would require that implementers both preserve SQL-equivalent
wildcards while removing potential SQL-injection attacks.

Please forgive my frankness.  I'm just trying to help you guys here.

Eric


-----Original Message-----
From: Thomas R. Jones [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 9:14 AM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

On Tue, 2010-06-22 at 11:12 -0400, Moore, Scott D CIV DISA PEO-MA wrote:
> Classification:  UNCLASSIFIED
> Caveats: NONE
>
> Comments inline. (>>)

A few questions.

1) What advantages are we hoping to achieve by redefining regular
expression matching? There has been a vast amount of research done on
regex both commercially and academically. How are we advancing the
construction or interpretation of regex that we cannot simply utilize
existing resources.

2) Can't we just utilize existing regex defines and use the wildcard and
matching algorithms already constructed? Most pattern matching api's are
already being utilized during software analysis and engineering.

3) What value is the discussion about Discrete Mathematics applying to
CPE? That is exactly what we are discussing. I took DM theory from
DePaul University, and im having a difficult time putting my head around
the direction and purpose of the applicability of the two.

Seems to me that we are just reinventing the wheel here. My 2 cents.

Thomas
Classification:  UNCLASSIFIED
Caveats: NONE


smime.p7s (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Thomas Jones
In reply to this post by Moore, Scott D CIV DISA PEO-MA
On Tue, 2010-06-22 at 12:54 -0400, Moore, Scott D CIV DISA PEO-MA wrote:
> But ultimately, the deciding factor is whether or not you want 9.* to
> match 9.1.*. If there are regular expression libraries that do that,
> then you're right, its not an issue we should worry about.
>


OK. Here's my quick use-cases:

Match "9.1". Test expression = ^9\..*$
Match "9.1" proof-of-concept code.
<code language="python">
import re

# common variables

rawstr = r"""^9\..*$"""
embedded_rawstr = r"""(?x)^9\..*$"""
matchstr = """9.1.0"""

# method 1: using a compile object
compile_obj = re.compile(rawstr,  re.VERBOSE)
match_obj = compile_obj.search(matchstr)

# method 2: using search function (w/ external flags)
match_obj = re.search(rawstr, matchstr,  re.VERBOSE)

# method 3: using search function (w/ embedded flags)
match_obj = re.search(embedded_rawstr, matchstr)
</code>


Do NOT match "9.1". Test expression = ^((?!9\.1(\.)).)*$
Do NOT match "9.1" proof-of-concept code.
<code language="python">
import re

# common variables

rawstr = r"""^((?!9\.1(\.)).)*$"""
embedded_rawstr = r"""(?x)^((?!9\.1(\.)).)*$"""
matchstr = """8.1.0"""

# method 1: using a compile object
compile_obj = re.compile(rawstr,  re.VERBOSE)
match_obj = compile_obj.search(matchstr)

# method 2: using search function (w/ external flags)
match_obj = re.search(rawstr, matchstr,  re.VERBOSE)

# method 3: using search function (w/ embedded flags)
match_obj = re.search(embedded_rawstr, matchstr)

# Retrieve group(s) from match_obj
all_groups = match_obj.groups()

# Retrieve group(s) by index
group_1 = match_obj.group(1)
group_2 = match_obj.group(2)
</code>

Do NOT match "9.1" substitutions
matchstr = """9.2.0"""
matchstr = """9.11.0"""

PCRE is so flexible and strong. Feel free to run these on your own
systems to verify.

Cheers.
Thomas
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Thomas Jones
In reply to this post by Brant Cheikes
On Tue, 2010-06-22 at 13:25 -0400, Cheikes, Brant A. wrote:
> The only specific example I've heard calling for a wildcard at the
> front is
> the pattern "*linux*" ostensibly to find all products containing
> "linux" in
> the product name.
>
>
^.*linux.*$

matches redhat_linux
matches ubuntu_linux
matches linux_kernel

In fact it will match ANY value that contains "linux" in it.

However, I prefer a little more precise of code:
^((redhat|ubuntu)_linux)|(linux_kernel)$

This matches as well, but provides protection against improper code.
e.g. linuxdistribution

<code language="python">
import re

# common variables

rawstr = r"""^((redhat|ubuntu)_linux)|(linux_kernel)$"""
embedded_rawstr = r"""^((redhat|ubuntu)_linux)|(linux_kernel)$"""
matchstr = """linux_kernel"""

# method 1: using a compile object
compile_obj = re.compile(rawstr)
match_obj = compile_obj.search(matchstr)

# method 2: using search function (w/ external flags)
match_obj = re.search(rawstr, matchstr)

# method 3: using search function (w/ embedded flags)
match_obj = re.search(embedded_rawstr, matchstr)

# Retrieve group(s) from match_obj
all_groups = match_obj.groups()

# Retrieve group(s) by index
group_1 = match_obj.group(1)
group_2 = match_obj.group(2)
group_3 = match_obj.group(3)
</code>

Cheers.
Thomas
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Moore, Scott D CIV DISA PEO-MA
In reply to this post by Thomas Jones
Classification:  UNCLASSIFIED
Caveats: NONE

Is there an example for matching 9.* to 9.1.*?
These examples only show one or other string with a wildcard, if I'm reading them correctly.

Again, if the community doesn't want to support this match, that's fine.
V/r,
Scott Moore
DISA PEO-MA IA5
CND Enclave Security Division
[hidden email]
([hidden email])
703.882.2405

-----Original Message-----
From: Thomas R. Jones [mailto:[hidden email]]
Sent: Tuesday, June 22, 2010 6:19 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE 2.3 Matching (UNCLASSIFIED)

On Tue, 2010-06-22 at 12:54 -0400, Moore, Scott D CIV DISA PEO-MA wrote:
> But ultimately, the deciding factor is whether or not you want 9.* to
> match 9.1.*. If there are regular expression libraries that do that,
> then you're right, its not an issue we should worry about.
>


OK. Here's my quick use-cases:

Match "9.1". Test expression = ^9\..*$
Match "9.1" proof-of-concept code.
<code language="python">
import re

# common variables

rawstr = r"""^9\..*$"""
embedded_rawstr = r"""(?x)^9\..*$"""
matchstr = """9.1.0"""

# method 1: using a compile object
compile_obj = re.compile(rawstr,  re.VERBOSE)
match_obj = compile_obj.search(matchstr)

# method 2: using search function (w/ external flags)
match_obj = re.search(rawstr, matchstr,  re.VERBOSE)

# method 3: using search function (w/ embedded flags)
match_obj = re.search(embedded_rawstr, matchstr)
</code>


Do NOT match "9.1". Test expression = ^((?!9\.1(\.)).)*$
Do NOT match "9.1" proof-of-concept code.
<code language="python">
import re

# common variables

rawstr = r"""^((?!9\.1(\.)).)*$"""
embedded_rawstr = r"""(?x)^((?!9\.1(\.)).)*$"""
matchstr = """8.1.0"""

# method 1: using a compile object
compile_obj = re.compile(rawstr,  re.VERBOSE)
match_obj = compile_obj.search(matchstr)

# method 2: using search function (w/ external flags)
match_obj = re.search(rawstr, matchstr,  re.VERBOSE)

# method 3: using search function (w/ embedded flags)
match_obj = re.search(embedded_rawstr, matchstr)

# Retrieve group(s) from match_obj
all_groups = match_obj.groups()

# Retrieve group(s) by index
group_1 = match_obj.group(1)
group_2 = match_obj.group(2)
</code>

Do NOT match "9.1" substitutions
matchstr = """9.2.0"""
matchstr = """9.11.0"""

PCRE is so flexible and strong. Feel free to run these on your own
systems to verify.

Cheers.
Thomas
Classification:  UNCLASSIFIED
Caveats: NONE


smime.p7s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: CPE 2.3 Matching (UNCLASSIFIED)

Thomas Jones
On Wed, 2010-06-23 at 09:14 -0400, Moore, Scott D CIV DISA PEO-MA wrote:
> Is there an example for matching 9.* to 9.1.*?
> These examples only show one or other string with a wildcard, if I'm
> reading them correctly.
>
>

I think there is some discrepancy on how regular expressions work. The
search facilitated by a regex is not "9.*" but "9.".  The asterisk(*) is
meant to be a wildcard metacharacter. You only require that
metacharacters be in the search values not the input. What value is it
that 9.* is in the code? For instance:

if we were given a record in a CPE dictionary that had
<cpe version="9.*"/>

What is the meaning of the version attributes value? 9.*? This reads
Nine point asterisk. Not Nine point anything. This is the input not the
regex. Rather it should be
<cpe version="9.1"/>

Also, the "9.1" is a literal string of characters. Given that the "."
also can be a metacharacter. Thus to match the literal dot(.) we must
escape its metacharacter meaning from the regex engine.

So to match the "9.1" literal string set, we utilize the following regex

^9\..*$

This reads(in our case using the version attribute value as input into
the PCRE regex engine):
1) match beginning of the line;
2) followed by a numeral 9;
3) followed by a dot(recall that a dot has metacharacter interpretation
as well----we don't want that meaning this time so we escape(prepend) it
with a backslash;
4) followed by any number of characters(a metacharacter dot meaning any
character---immediately followed by a metacharacter asterisk---meaning
any number of the previous character);
5) and finally match the end of line.

Thats it.

I might mention that step 4 has many hidden difficulties such as
embedded control characters but that is another discussion altogether.

Please feel free to discuss with me on or off-list regarding regex. I
truly think we are making this far more difficult than it needs to be. I
would hate to see the standard suffer because we were becoming to
bogged-down in irrelevant and unneeded implementations.

Hopefully, I have maybe brought a different light to the problem and
helped to stabilize our communities efforts in a more productive
direction. That was my intention, not to step on toes.

Cheers.
Thomas