CWE Draft 9 Major Schema Changes, and Outstanding Issues


Steven M. Christey-2
All,

We've significantly modified the schema for Draft 9.  The primary
driver was to improve support for multiple views, and to better
distinguish between the different types of elements that we are
covering in CWE.  Thanks to Sean Barnum for figuring out the bulk of
this.  The MITRE team took his inputs and made some small tweaks here
and there.

As CWE is at a crossroads with respect to the schema, we welcome any
feedback or alternatives to our current approaches.  Specifically,
while we have chosen XML so far, we are open to using other
techniques for storing and working with the data, if those techniques
are more effective.  For example, if it makes sense to store CWE in a
database and use an application server to help present and link
everything together, we are open to pursuing that.  We also plan to
investigate RDF, XGMML, and other languages that might be more
directly supportive of graph-based relationships.

Please note that even if we stay with XML and related technologies, we
expect that the schema will still need to change a little bit.
However, we believe that one requirement for "CWE 1.0" is to have a
stable schema.  In Draft 9, we are definitely a lot closer than we
were.  Bob and I will post our requirements for "CWE 1.0" once they've
been finalized.

For Draft 9, some of the highest level schema changes are covered
here:

  http://cwe.mitre.org/data/reports/diff_xsd_10_3.0.html

The rest of this document assumes that you have read the schema diff
report linked above.

Any and all feedback would be appreciated, especially if there are
still outstanding issues in the schema that prevent you from using CWE
as extensively as you would want to.


Schema Evaluation Criteria
--------------------------

Here are some of the criteria that I think we should be applying while
finalizing the schema:

  - Expressiveness: we should be able to express everything that we
    want to.  In Draft 9, some examples of this are the creation of
    explicit views, and the requirement for relationships to specify
    the views they are part of.  But, we still don't have a way of
    saying things like "this issue theoretically affects any language
    that performs direct memory management, but it's especially common
    in C."  That's important, because if C is not explicitly mentioned
    in an element, then that element won't be part of the C language
    view.

  - Extraction: it should be as easy as possible for CWE users to
    extract the data that they want, using commonly available XML
    parsers and related tools.  In Draft 9, the relevant data for
    named chains are not necessarily easy to extract.

  - Maintenance

    - minimize maintenance costs: the MITRE team, and outside
      contributors, should be able to quickly represent the necessary
      information.

    - minimize preventable errors in data entry: we want to minimize
      errors in the CWE representation that an XML validator cannot
      catch, but that nonetheless break consistency.

    - minimize XML "bloat": this is hopefully self-explanatory.  The
      relationships in Draft 9 might exhibit some bloat, although at
      the same time, there's a major benefit to their increased
      expressiveness.

  - Flexibility: ideally, the schema would remain stable, while
    allowing us to build in additional capabilities.  For Draft 9, we
    believe that we've added flexibility for defining new kinds of
    relationships and views.  The introduction of compound elements
    will hopefully allow us to support other kinds of concepts besides
    chains and composites that might arise in the future; for example,
    some CWE nodes are really talking about multiple distinct issues
    and could be called "loose composites."

In light of these criteria, I wanted to explain some of the rationale
for the schema changes, and what we have left ahead of us for CWE 1.0.



Views
-----

We added a number of views to CWE Draft 9.  For the most part, this
involved converting weakness/"groupings" from Draft 8 into the new
Views type for Draft 9.

See http://cwe.mitre.org/data/index.html for a list of views.

Slices are basically lists of elements, without any relationships
between them.  Membership in a slice can be explicit or implicit.  In
explicit slices, all the relevant entries have some ChildOf
relationship where the View node is the parent; see CWE-630
(Weaknesses Examined by SAMATE) and CWE-635 (Weaknesses Used by NVD)
for examples.
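
As a rough illustration of explicit slice membership, here is a
minimal Python sketch.  The XML layout below (Catalog, Weakness,
Relationship, and the Nature/View_ID/Target_ID attributes) is a
simplified stand-in and not the actual Draft 9 schema, and the entry
IDs are made up.  It only shows the idea: an entry belongs to the
slice if it carries a ChildOf relationship whose parent is the View
node.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML; real Draft 9 element and attribute
# names may differ.  Entry IDs 9001-9003 are invented for the example.
SAMPLE = """
<Catalog>
  <Weakness ID="9001">
    <Relationship Nature="ChildOf" View_ID="635" Target_ID="635"/>
    <Relationship Nature="ChildOf" View_ID="1000" Target_ID="9000"/>
  </Weakness>
  <Weakness ID="9002">
    <Relationship Nature="ChildOf" View_ID="635" Target_ID="635"/>
  </Weakness>
  <Weakness ID="9003">
    <Relationship Nature="ChildOf" View_ID="1000" Target_ID="9000"/>
  </Weakness>
</Catalog>
"""


def explicit_slice_members(root, view_id):
    """Return IDs of entries that are a direct ChildOf the given view node."""
    members = []
    for entry in root:
        for rel in entry.findall("Relationship"):
            if (rel.get("Nature") == "ChildOf"
                    and rel.get("View_ID") == view_id
                    and rel.get("Target_ID") == view_id):
                members.append(entry.get("ID"))
                break  # one matching relationship is enough
    return members


root = ET.fromstring(SAMPLE)
print(explicit_slice_members(root, "635"))
```

Note that the whole catalog has to be scanned to list the members,
because the membership is recorded on each entry rather than on the
view.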

In implicit slices, the slice has some filtering criteria that define
membership, and there aren't any relationships within the XML that are
explicitly defined.  For example, CWE-658 is a slice that covers
weaknesses found in the C language.  This implicit slice has a Filter
that specifies that member entries have "C" under the
Applicable_Platforms field.

The Comprehensive CWE Dictionary view, CWE-2000, is actually an
implicit slice that selects everything from CWE by using a filter that
always returns true.
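
An implicit slice can be thought of as a predicate applied to every
entry.  The following Python sketch assumes a simplified layout in
which Applicable_Platforms contains Language children; that nesting,
the Catalog wrapper, and the entry IDs are assumptions made for the
example, not the real schema or the real Filter syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML; IDs are invented for the example.
SAMPLE = """
<Catalog>
  <Weakness ID="9001">
    <Applicable_Platforms>
      <Language>C</Language>
      <Language>C++</Language>
    </Applicable_Platforms>
  </Weakness>
  <Weakness ID="9002">
    <Applicable_Platforms>
      <Language>Java</Language>
    </Applicable_Platforms>
  </Weakness>
  <Weakness ID="9003"/>
</Catalog>
"""


def languages(entry):
    """Set of language names listed under Applicable_Platforms."""
    return {lang.text for lang in entry.findall("Applicable_Platforms/Language")}


def implicit_slice(root, predicate):
    """IDs of all entries for which the filter predicate holds."""
    return [entry.get("ID") for entry in root if predicate(entry)]


root = ET.fromstring(SAMPLE)

# A "C language" slice: membership depends only on field contents.
c_slice = implicit_slice(root, lambda e: "C" in languages(e))

# A "comprehensive" slice: the filter always returns true.
everything = implicit_slice(root, lambda e: True)

print(c_slice, everything)
```

Entry 9003 lists no platforms at all, so it can never appear in the
C slice even if it applies to C in practice; this is the
expressiveness gap mentioned under the evaluation criteria.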

Views can also be graphs, such as CWE-1000 (Natural Hierarchy).
Currently, graphs are expected to have explicit ChildOf relationships
within the member elements.  Before Draft 9, everything was
effectively under the Natural Hierarchy.  In Draft 9, however, some of
those elements have been removed from the Natural Hierarchy
altogether, like deprecated nodes and the resource-based view.

We suspect that some individual views might be best described as a
combination of slices *and* graphs, with a combination of implicit or
explicit membership.  A view might be best expressed via some set of
explicit relationships (maybe between some implicit slices), then
defaulting to the relationships of a different view at some point.

The most concrete example of this is CWE-631 (Resource-specific
Weaknesses), at:

  http://cwe.mitre.org/data/graphs/631.html

The higher-level nodes have explicit relationships defined within View
631.  The view's children - such as the Category node CWE-632 (Weaknesses
that Affect Files or Directories) - have explicitly specified children
such as CWE-22 (Path Traversal).  That is, Path Traversal has an
explicit "ChildOf CWE-632" relationship.  However, instead of the
explicit relationships, CWE-632 could potentially be defined as an
implicit slice of "all elements that have an Affected_Resource field
of File/Directory."  That would reduce maintenance costs and improve
accuracy, but it is not possible in Draft 9, because CWE-632 is a
Category type - it's *in* a view, but not a view itself.

In addition, the resource-based view, CWE-631, could be more
comprehensive by "view hopping."  In Draft 9, CWE-631 stops at CWE-22
(Path Traversal), but there are several children under CWE-22 that
would also match - except those children are only listed under the
natural hierarchy (view CWE-1000).  It would probably be quite tedious
and error-prone just to copy all the natural hierarchy relationships
over to this new view.  This might be best handled by allowing views
to link to each other, but this is not possible in Draft 9.  In
addition, the "hops" might wind up including elements that were not
intended.
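
One way to picture "view hopping" is a traversal that follows a
view's own ChildOf relationships and, where a node has no children in
that view, falls back to a second view.  This is only a sketch of the
idea, not something the Draft 9 schema supports.  The relationships
are given as plain (child, parent, view) tuples, and every ID other
than 22, 631, 632, and 1000 is made up.

```python
from collections import defaultdict

# (child, parent, view) ChildOf triples.  IDs 9101-9103 and 9999 are
# invented; they stand in for natural-hierarchy entries.
CHILD_OF = [
    ("632", "631", "631"),     # category under the resource view
    ("22", "632", "631"),      # Path Traversal under the category
    ("9101", "22", "1000"),    # child of CWE-22, natural hierarchy only
    ("9102", "9101", "1000"),
    ("9103", "9999", "1000"),  # unrelated natural-hierarchy entry
]


def children_index(triples):
    """Map view -> parent -> list of children."""
    index = defaultdict(lambda: defaultdict(list))
    for child, parent, view in triples:
        index[view][parent].append(child)
    return index


def expand(view, fallback_view, root_id, index):
    """Depth-first walk of `view`; hop to `fallback_view` at its leaves."""
    seen, stack, order = set(), [root_id], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        kids = index[view].get(node) or index[fallback_view].get(node, [])
        stack.extend(reversed(kids))
    return order


index = children_index(CHILD_OF)
print(expand("631", "1000", "631", index))
```

Once the walk hops into the fallback view it never returns, so
everything below the hop point is pulled in; this is how unintended
elements could end up in the result.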

Finally, we have encountered some difficulties in generating a
"Comprehensive Graph" that merges all views together - the natural
hierarchy, the resource-based graph, the language-specific slices,
etc.  So, there isn't a single graph on the CWE web site that covers
the entire CWE.  We do have a PDF file that contains most nodes; it
focuses on the natural hierarchy (CWE-1000), and all other nodes are
effectively "orphans."  We don't necessarily have to solve this
problem for a comprehensive view - after all, it's not clear who would
have a need for such a thing - but I thought it was worthwhile to
mention.


Relationships
-------------

The expression of relationships has changed significantly for Draft 9.
Much of this is covered by the schema diff report listed at the top of
this document, but there are some fields that I wanted to highlight.

Relationship_Type:

  The Draft 8 version of "Relationship_Type" has been renamed to
  "Relationship_Nature".  In Draft 9, "Relationship_Type" instead
  identifies the type of the entry that is being linked to.  Since we
  now have multiple types of entries in CWE, this field might be
  useful in simplifying some extraction and presentation logic in
  XSLT stylesheets.  We did not need this field to generate the web
  site for Draft 9, although it might be convenient for others.
  However, this field is currently maintained manually, and its value
  was often incorrect: we changed the types of a number of elements
  in Draft 9, which immediately invalidated the field in dozens of
  relationships.  We are able to perform a consistency check to
  ensure that these values are correct before release, but it's still
  a little bit of labor.

  As a result, we will be looking at this field more closely, trying
  to balance utility to the community with maintenance costs to the
  CWE team.
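
  A consistency check of this kind could look roughly like the Python
  sketch below.  It assumes, purely for illustration, that an entry's
  type is its element name and that each relationship carries
  Relationship_Type and Target_ID attributes; the real schema may
  represent these differently, and entry 9201 is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML.  The second relationship records a
# stale type on purpose.
SAMPLE = """
<Catalog>
  <Category ID="632"/>
  <Weakness ID="22">
    <Relationship Relationship_Type="Category"
                  Relationship_Nature="ChildOf" Target_ID="632"/>
  </Weakness>
  <Weakness ID="9201">
    <Relationship Relationship_Type="Category"
                  Relationship_Nature="ChildOf" Target_ID="22"/>
  </Weakness>
</Catalog>
"""


def stale_relationship_types(root):
    """List (source, target, recorded type, actual type) mismatches."""
    actual = {entry.get("ID"): entry.tag for entry in root}
    problems = []
    for entry in root:
        for rel in entry.findall("Relationship"):
            target = rel.get("Target_ID")
            recorded = rel.get("Relationship_Type")
            if actual.get(target) != recorded:
                problems.append(
                    (entry.get("ID"), target, recorded, actual.get(target))
                )
    return problems


print(stale_relationship_types(ET.fromstring(SAMPLE)))
```

  Since the recorded type can always be derived from the target entry,
  a check like this could also be turned into a step that fills the
  field in automatically.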

Relationship_View_IDs:

  We anticipate that, in the future, we will have multiple views that
  share a lot of the same structure.  As one example - CWE's Natural
  Hierarchy (CWE-1000) is beginning to diverge more from the Seven
  Pernicious Kingdoms (SPK) way of organizing the world, so it might
  be reasonable to create a view into CWE that's useful for people who
  are knowledgeable about SPK.  The Natural Hierarchy and an SPK view
  would probably have a lot of different elements near the top of the
  tree, but they would share a lot at a lower level.

  With closely overlapping views, this would produce a large number of
  duplicate relationships that might contribute significantly to XML
  bloat.  The MITRE team decided that allowing multiple
  Relationship_View_IDs would be a useful shorthand that might be
  easier to maintain.
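
  To show what the shorthand buys, here is a small Python sketch that
  expands one relationship carrying several view IDs into one edge
  per view.  The nesting of Relationship_View_ID elements inside
  Relationship_View_IDs, the attribute names, and all IDs except 1000
  are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML.  View 9399 stands in for a second,
# closely overlapping view; it is not a real CWE ID.
SAMPLE = """
<Weakness ID="9301">
  <Relationship Relationship_Nature="ChildOf" Target_ID="9300">
    <Relationship_View_IDs>
      <Relationship_View_ID>1000</Relationship_View_ID>
      <Relationship_View_ID>9399</Relationship_View_ID>
    </Relationship_View_IDs>
  </Relationship>
</Weakness>
"""


def edges_by_view(entry):
    """Map each view ID to the (source, nature, target) edges it contains."""
    edges = {}
    for rel in entry.findall("Relationship"):
        edge = (
            entry.get("ID"),
            rel.get("Relationship_Nature"),
            rel.get("Target_ID"),
        )
        for view in rel.findall("Relationship_View_IDs/Relationship_View_ID"):
            edges.setdefault(view.text, []).append(edge)
    return edges


print(edges_by_view(ET.fromstring(SAMPLE)))
```

  One stored relationship yields an edge in both views; without the
  shorthand, the same ChildOf would have to be written out twice.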



Current Challenges
------------------

Here are some of the current challenges that we still face, and plan
to resolve by CWE 1.0.

1) The Draft 9 schema does not have the expressiveness to define the
   more complex views, and there are some associated maintenance
   costs, as outlined in the previous sections.

2) Chains and composites, views, and categories all have some
   overlapping uses that we'd like to clarify and, to the degree
   possible, unify.

   For example, both chains and composites involve a small selection
   of entries from CWE, and dictate relationships between them.  In
   this sense, they can be regarded as views - perhaps micro-views.
   Yet, we expect that they will have a distinct and important role
   throughout CWE.

   As another example, the resource-based view (CWE-631) has children
   that are categories.  These categories might be best described by
   defining what their membership should be, but in Draft 9, this type
   of automatic population is only possible through filters in View
   elements.  So, we had to manually create ChildOf relationships.

3) Relationship Directionality

   Some views, like CWE-635 (Weaknesses Used by NVD), are defined more
   by external criteria than anything that is implicit within
   individual nodes, so these are explicit slices.  In terms of
   maintenance costs and ease of extraction, it might be best for
   CWE-635 to explicitly state what its "members" are.  Instead, each
   member carries its own ChildOf relationship, with View_ID=635, that
   points to CWE-635.  Thus, maintenance of the NVD slice is done not by
   operating on the slice itself, but by operating on its individual
   members.  This proved to be moderately expensive for us to do when
   we changed the membership of the SAMATE view in Draft 8 - it took
   an hour or so to edit some nodes to remove the SAMATE relationship,
   and then edit other nodes to add the SAMATE relationship; if we
   could just edit the SAMATE list directly, it would have been a
   5-minute task.  However, as I understand it, one of the mantras of
   knowledge management is that data is kept as close to individual
   nodes as possible; but relationships "belong" to multiple nodes,
   even though in Draft 9 they are only explicit in one node.

   It would be possible for us to automate some of those maintenance
   tasks, but that would involve additional development.

   Also, we have multiple relationships that are mutual, but only
   expressed in one direction.  For example, "X ChildOf Y" might be
   specified in the XML, which implies "Y ParentOf X" - but we have no
   ParentOf relationships that are explicitly stated.  The same thing
   applies for relationships that support chains and composites.  As a
   result, an entry doesn't explicitly know what its children are, so
   the extraction logic is complicated, hard to maintain, and
   sometimes computationally expensive.  We have encountered this
   problem in various ways while generating web site pages.

   One possibility would be to create separate XML files and
   representations for the relationships (and maybe for views),
   possibly with separate schema.  This might preserve expressiveness
   and simplify maintenance, but it might make it more difficult for
   some people to extract.
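
   As a sketch of how the one-directional ChildOf data from item 3
   can be made easier to query, the Python below inverts it once into
   a "ParentOf" index.  The relationships are plain (child, parent,
   view) tuples rather than real CWE XML, and ID 9401 is made up.

```python
from collections import defaultdict

# (child, parent, view) ChildOf triples; 9401 is an invented entry.
CHILD_OF = [
    ("632", "631", "631"),
    ("22", "632", "631"),
    ("9401", "22", "1000"),
]


def parent_of_index(triples):
    """Invert ChildOf once: (view, parent) -> list of children."""
    index = defaultdict(list)
    for child, parent, view in triples:
        index[(view, parent)].append(child)
    return index


def children(index, view, parent):
    """Look up children without rescanning every entry."""
    return index.get((view, parent), [])


index = parent_of_index(CHILD_OF)
print(children(index, "631", "632"))   # children of CWE-632 in view 631
print(children(index, "1000", "22"))   # children of CWE-22 in view 1000
```

   Building the index is a single pass over all relationships, after
   which each "what are my children?" question is a dictionary
   lookup.  Doing the same inside XSLT generally means either
   rescanning the document per entry or setting up a key.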

4) Named Chains

   There are a couple of issues with named chains.  See
   http://cwe.mitre.org/data/reports/chains_and_composites.html for
   background.

   All the data that's required to determine the links of a named
   chain are within the XML, so there is sufficient expressiveness.
   However, extraction is a little more difficult.  For a named chain
   X, the code has to search all of CWE for every entry that has a
   CanPrecede relationship with a Chain_ID of X, then order those
   entries appropriately.  If you want to know what elements are in a
   named chain, you HAVE to do this navigation throughout CWE - a
   named chain does not explicitly state what its links are.  Just
   like composites explicitly state which items they require, it might
   be reasonable to have named chains explicitly know what their
   starting links are.

   In addition, named chains can be difficult to classify, especially
   under the natural hierarchy.  Because named chains are new, we
   decided not to create a separate view to handle them.  We do have a
   view that lists chain elements (CWE-679), but that view is actually
   an implicit slice for extracting components of all the CanPrecede
   relationships, whether they're related to a Named Chain or not.

   The logic for extracting and presenting chains in general was too
   complicated for us to handle cleanly by the release of Draft 9, so
   the chain presentations are generated by external programs, instead
   of through XSLT.
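
   For the named chain extraction described in item 4, an external
   program might do something like the Python sketch below.  It
   assumes the CanPrecede relationships have already been pulled out
   of the XML as (source, target, chain_id) tuples and that a named
   chain is a single linear sequence; both are assumptions for the
   example, and all IDs are made up.

```python
# (source, target, chain_id) CanPrecede relationships, deliberately
# out of order.  A chain_id of None means "not part of a named chain".
CAN_PRECEDE = [
    ("9502", "9503", "9500"),
    ("9501", "9502", "9500"),
    ("9601", "9602", None),
]


def chain_links(relationships, chain_id):
    """Return the entries of one named chain, ordered from first to last."""
    nxt = {src: dst for src, dst, cid in relationships if cid == chain_id}
    if not nxt:
        return []
    targets = set(nxt.values())
    starts = [src for src in nxt if src not in targets]
    if len(starts) != 1:
        raise ValueError("chain is not a single linear sequence")
    order = [starts[0]]
    while order[-1] in nxt:
        if len(order) > len(nxt):
            raise ValueError("chain contains a cycle")
        order.append(nxt[order[-1]])
    return order


print(chain_links(CAN_PRECEDE, "9500"))
```

   The starting link has to be inferred as the one entry that nothing
   else precedes.  If the named chain recorded its starting link
   explicitly, that inference, and the scan needed to support it,
   would not be necessary.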