Mapping Compiler output to CWE's

Mapping Compiler output to CWE's

Steve Grubb
Hello,

I am going through a large corpus of static analysis warnings from multiple
tools. In order to compare their abilities, I'd like to assign as many
warnings as possible to a CWE. In some cases, I don't find a good fit. So, I
assign it to a category that should probably have a specific CWE for it. This
gets things closer, but categories are not very precise. I'd like to describe
some warnings  and see if there is a better fit that I'm missing or if a new
CWE could be assigned.

* An enumerated type is used in a switch statement for which there is no
explicit case for the enumerated type.

I looked at 478 - Missing Default Case in Switch Statement, but this is not
quite the same issue. In this one, something is going to be handled by the
default case when it probably shouldn't be. It would probably belong under
CWE-1023.
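
To make the case concrete, here is a minimal sketch. The enum and function names are hypothetical and not taken from any tool's output. GCC and Clang report this pattern with -Wswitch-enum, which fires even when a default label is present:

```cpp
#include <cassert>

// Hypothetical enum and function, for illustration only.
enum class Color { Red, Green, Blue };

// Blue has no explicit case, so it silently falls into the default
// branch. -Wswitch-enum flags the missing enumerator despite the default.
int brightness(Color c) {
    switch (c) {
    case Color::Red:
        return 10;
    case Color::Green:
        return 20;
    default:
        return 0;  // Blue lands here, probably unintentionally
    }
}
```

Calling brightness(Color::Blue) returns the default value, which is what distinguishes this from a switch with no default at all.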

* The destination of a call to a raw memory function such as "memset",
"memmove", or "memcpy" is an object of class type.

There are a lot of CWE's around buffer length and improper initialization,
but I don't see anything around direct memory manipulation for a non-trivial
object.
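
As an illustration, assuming a hypothetical Record type, this sketch shows why the raw memory call is a problem and one possible alternative. GCC reports the commented-out pattern with -Wclass-memaccess:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical class type with a non-trivial member.
struct Record {
    std::string name;
    int id = 0;
};

// The std::string member makes Record non-trivially-copyable, so a raw
// byte write such as std::memset(&r, 0, sizeof r) would bypass the
// member's constructor and assignment and is undefined behavior.
static_assert(!std::is_trivially_copyable<Record>::value,
              "Record must not be treated as raw bytes");

// Reset through the type's own operations instead of memset.
void reset(Record &r) {
    r = Record{};
}
```

Whether assignment from a value-initialized temporary is the right fix depends on the class; it is shown here only as one option.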

* A const char * string literal gets assigned to a char * argument where the
function tries to overwrite it - which will cause a segfault since it's
readonly memory.

There are a lot of CWE's related to string manipulation. But I can't find a
good candidate.
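
A small sketch of the pattern, using a hypothetical upcase() helper that modifies its argument in place. The faulty call is left as a comment because executing it is undefined behavior:

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical helper that overwrites the string it is given.
void upcase(char *s) {
    for (; *s; ++s)
        *s = static_cast<char>(std::toupper(static_cast<unsigned char>(*s)));
}

// Faulty pattern: the literal lives in read-only storage on most systems.
//   upcase(const_cast<char *>("test"));  // undefined behavior, often SIGSEGV
//
// One way to avoid it: copy the literal into a writable array first.
bool upcase_writable_copy() {
    char buf[] = "test";  // array initialized from the literal, writable
    upcase(buf);
    return std::strcmp(buf, "TEST") == 0;
}
```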

* A program is compiled with stack smashing protection enabled, but the code
uses a variable length array on the stack with length defined at runtime, so
stack smashing protection becomes disabled for that one function.

I looked at CWE-691 Insufficient Control Flow Management. But the child nodes
all seem to be unrelated. I also looked at CWE-119: Improper Restriction of
Operations within the Bounds of a Memory Buffer. But this is about a security
mitigation being disabled because of the programming style.
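
For reference, a sketch of the coding pattern and one possible rewrite. The function name is hypothetical, and the variable length array appears only in a comment because it is a compiler extension in C++. Whether heap-backed storage is acceptable depends on the code in question:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pattern the warning describes (runtime-sized array on the stack):
//   void fill(std::size_t n) { char buf[n]; /* ... */ }
//
// Hypothetical alternative: std::vector keeps the runtime-sized buffer
// off the stack, so the function has no variable length stack object.
std::size_t fill_and_count(std::size_t n) {
    std::vector<char> buf(n, 'x');
    std::size_t count = 0;
    for (char c : buf) {
        if (c == 'x')
            ++count;
    }
    return count;
}
```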

* A value is shifted by a negative number, or by a calculation that results
in a number larger than the number of bits for that variable.

I looked at CWE-682: Incorrect Calculation which is very close, but has no
child nodes that describe shift operations going wrong.
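
In C and C++, a shift count that is negative, or greater than or equal to the width of the promoted left operand, is undefined behavior. The sketch below uses a hypothetical checked_shl() helper (not part of any library) to show the range check such a warning is asking for:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: performs value << count only when the count is
// in range; otherwise leaves 'out' untouched and reports failure.
bool checked_shl(std::uint32_t value, int count, std::uint32_t &out) {
    const int width = std::numeric_limits<std::uint32_t>::digits;  // 32 bits
    if (count < 0 || count >= width)
        return false;  // shifting here would be undefined behavior
    out = value << count;
    return true;
}
```

A caller that computes the count at runtime can then treat a false return as an error instead of silently getting an unspecified result.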

* C++11 introduced move semantics, but the code in question did not define one
and an assignment operator was found to be used.

There is CWE-480: Use of Incorrect Operator, which might be a good parent
node. And there is CWE-710 Improper Adherence to Coding Standards. Neither
seems like a good fit.
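
One reading of this warning, sketched with a hypothetical Buffer type: the class declares copy operations but no move operations, so an assignment from an rvalue quietly becomes a copy. The counter exists only to make that visible:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type. Declaring the copy operations suppresses the
// implicitly generated move constructor and move assignment operator.
struct Buffer {
    std::string data;
    static int copy_assignments;

    Buffer() = default;
    explicit Buffer(std::string d) : data(std::move(d)) {}
    Buffer(const Buffer &) = default;
    Buffer &operator=(const Buffer &other) {
        ++copy_assignments;
        data = other.data;
        return *this;
    }
};
int Buffer::copy_assignments = 0;

// Returns how many copy assignments an intended "move" triggered.
int assign_from_rvalue() {
    Buffer src("payload");
    Buffer dst;
    const int before = Buffer::copy_assignments;
    dst = std::move(src);  // binds to operator=(const Buffer &): a copy
    return Buffer::copy_assignments - before;
}
```

Adding a move assignment operator to Buffer would make the same statement a real move.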

* A call was made to a function which performs a lookup or calculation, but
the returned value was forgotten to be assigned to a variable. The intended
variable is later used as if it were assigned.

This has different semantics from unchecked return code (252), because the
whole point of calling the function was to obtain the return value.  An
example might be a bare  strdup("test");  The whole point of calling strdup
is to have a character pointer assigned to use later. There are other
examples where a math calculation is being done on floating point numbers and
the returned calculation is not caught but the code pretends it has the
results.
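
A sketch of the floating point variant, with a hypothetical hypotenuse() function. The faulty call is shown as a comment; the C++17 [[nodiscard]] attribute is one way to have the compiler flag it:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical function whose only purpose is its return value.
[[nodiscard]] double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}

// Faulty pattern: the result is dropped, yet 'h' is used afterwards
// as if it had been assigned.
//   double h = 0.0;
//   hypotenuse(3.0, 4.0);
//   use(h);
//
// Intended form: capture the result before using it.
double side_length() {
    const double h = hypotenuse(3.0, 4.0);
    return h;
}
```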

Any discussion would be appreciated.

Thanks,
-Steve



Re: Mapping Compiler output to CWE's

Martin Sebor
On 3/16/21 2:55 PM, Steve Grubb wrote:

> Hello,
>
> I am going through a large corpus of static analysis warnings from multiple
> tools. In order to compare their abilities, I'd like to assign as many
> warnings as possible to a CWE. In some cases, I don't find a good fit. So, I
> assign it to a category that should probably have a specific CWE for it. This
> gets things closer, but categories are not very precise. I'd like to describe
> some warnings  and see if there is a better fit that I'm missing or if a new
> CWE could be assigned.
>
> * An enumerated type is used in a switch statement for which there is no
> explicit case for the enumerated type.
>
> I looked at 478 - Missing Default Case in Switch Statement, but this is not
> quite the same issue. In this one, something is going to be handled by the
> default case when it probably shouldn't be. It would probably belong under
> CWE-1023.
>
> * The destination of a call to a raw memory function such as "memset",
> "memmove", or "memcpy" is an object of class type.
>
> There are a lot of CWE's around buffer length and improper initialization,
> but I don't see anything around direct memory manipulation for a non-trivial
> object.
>
> * A const char * string literal gets assigned to a char * argument where the
> function tries to overwrite it - which will cause a segfault since it's
> readonly memory.
>
> There are a lot of CWE's related to string manipulation. But I can't find a
> good candidate.
>
> * A program is compiled with stack smashing protection enabled, but the code
> uses a variable length array on the stack with length defined at runtime, so
> stack smashing protection becomes disabled for that one function.
>
> I looked at CWE-691 Insufficient Control Flow Management. But the child nodes
> all seem to be unrelated. I also looked at CWE-119: Improper Restriction of
> Operations within the Bounds of a Memory Buffer. But this is about a security
> mitigation being disabled because of the programming style.
>
> * A value is shifted by a negative number, or by a calculation that results
> in a number larger than the number of bits for that variable.
>
> I looked at CWE-682: Incorrect Calculation which is very close, but has no
> child nodes that describe shift operations going wrong.
>
> * C++11 introduced move semantics, but the code in question did not define one
> and an assignment operator was found to be used.
>
> There is CWE-480: Use of Incorrect Operator, which might be a good parent
> node. And there is CWE-710 Improper Adherence to Coding Standards. Neither
> seem like a good fit.
>
> * A call was made to a function which performs a lookup or calculation, but
> the returned value was forgotten to be assigned to a variable. The intended
> variable is later used as if it were assigned.
>
> This has different semantics from unchecked return code (252), because the
> whole point of calling the function was to obtain the return value.  An
> example might be a bare  strdup("test");  The whole point of calling strdup
> is to have a character pointer assigned to use later. There are other
> examples where a math calculation is being done on floating point numbers and
> the returned calculation is not caught but the code pretends it has the
> results.
>
> Any discussion would be appreciated.

The slides below describe the problem and some of the solutions
(work in progress):

   https://samate.nist.gov/SATE6/14.Framework.pdf

Martin

>
> Thanks,
> -Steve
>
>


RE: Mapping Compiler output to CWE's

Steven M Christey
Thank you, Martin and Steve, for raising this.

Outside of a more framework-oriented approach as covered in the SATE document that Martin listed (which I believe has lots of potential), here are a couple additional comments on Steve's original inquiry.

> * An enumerated type is used in a switch statement for which there is
> no explicit case for the enumerated type.
>
> I looked at 478 - Missing Default Case in Switch Statement, but this
> is not quite the same issue. In this one, something is going to be
> handled by the default case when it probably shouldn't be. It would
> probably belong under CWE-1023.

Agreed that CWE-478 is inappropriate.

CWE-1023 doesn't quite seem like a great match either, because that entry is about "missing factors" and here we have something like an "incomplete set of possibilities being considered for one factor," under which CWE-478 would also fall.  That leaves us with CWE-697, which is the Pillar-level "Incorrect Comparison."  So, we might need to create a new entry here.

> * The destination of a call to a raw memory function such as "memset",
> "memmove", or "memcpy" is an object of class type.
>
> There are a lot of CWE's around buffer length and improper
> initialization, but I don't see anything around direct memory
> manipulation for a non-trivial object.

Thinking out loud, I wonder if this can be regarded as equivalent to CWE-843: Access of Resource Using Incompatible Type ('Type Confusion').

> * A const char * string literal gets assigned to a char * argument
> where the function tries to overwrite it - which will cause a segfault
> since it's readonly memory.
>
> There are a lot of CWE's related to string manipulation. But I can't
> find a good candidate.

Nothing springs immediately to mind. We will investigate.

> * A program is compiled with stack smashing protection enabled, but
> the code uses a variable length array on the stack with length defined
> at runtime, so stack smashing protection becomes disabled for that one function.
>
> I looked at CWE-691 Insufficient Control Flow Management. But the
> child nodes all seem to be unrelated. I also looked at CWE-119:
> Improper Restriction of Operations within the Bounds of a Memory
> Buffer. But this is about a security mitigation being disabled because of the programming style.

This is also interesting. I'm not sure we have anything for this; a new entry might live under the Pillar CWE-693: Protection Mechanism Failure.

> * A value is shifted by a negative number, or by a calculation that
> results in a number larger than the number of bits for that variable.
>
> I looked at CWE-682: Incorrect Calculation which is very close, but
> has no child nodes that describe shift operations going wrong.

Thank you for bringing this up. This is a known kind of issue I've wanted to cover for a while; we'll add it to the to-do list.

> * C++11 introduced move semantics, but the code in question did not
> define one and an assignment operator was found to be used.
>
> There is CWE-480: Use of Incorrect Operator, which might be a good
> parent node. And there is CWE-710 Improper Adherence to Coding
> Standards. Neither seem like a good fit.

In general, I think we will need to improve our coverage of assignment.

> * A call was made to a function which performs a lookup or
> calculation, but the returned value was forgotten to be assigned to a
> variable. The intended variable is later used as if it were assigned.
>
> This has different semantics from unchecked return code (252), because
> the whole point of calling the function was to obtain the return
> value.  An example might be a bare  strdup("test");  The whole point
> of calling strdup is to have a character pointer assigned to use
> later. There are other examples where a math calculation is being done
> on floating point numbers and the returned calculation is not caught
> but the code pretends it has the results.
>
> Any discussion would be appreciated.

I think there's a relationship with CWE-456: Missing Initialization of a Variable, but this CWE might be re-evaluated or even deprecated because it is used in multiple ways. One approach would be to specifically talk about "assignment to a variable" as distinct from "initialization of the contents of [memory or other resource that has been assigned]."

Thanks again for the inquiry, as this will help us fill in some additional gaps within CWE.

- Steve


Re: Mapping Compiler output to CWE's

Steve Grubb
In reply to this post by Martin Sebor
On Wednesday, March 17, 2021 11:05:59 AM EDT Martin Sebor wrote:
> > Any discussion would be appreciated.
>
> The slides below describe the problem and some of the solutions
> (work in progress):
>
>    https://samate.nist.gov/SATE6/14.Framework.pdf

That is an interesting read. Thanks for sharing.

But I wonder if the causes and consequences can truly ever be understood? The
cause could simply be that the developer got interrupted by a phone call and
lost their place in the code.

I agree that it would be nice to have a well defined taxonomy that is
structured and describes everything so that any software fault can be
accurately and easily classified.

For my purposes, I need to map everything back to a top level node under
CWE-699 since that seems to have about the right high level abstraction. I
have looked at the CWE XML files and really wished that "resource" in fact
pointed back to that top level node so that CWE's can be easily grouped for
study.
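
The grouping I have in mind could be sketched like this. The child-to-parent table and the IDs in it are placeholders, not real CWE relationships; a real table would have to be built from the relationship data in the CWE XML, and it would need to handle entries with more than one parent, which this sketch ignores:

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical child -> parent table keyed by CWE number.
using ParentMap = std::map<int, int>;

// Follow parent links until an entry has no parent and return that
// entry. The step limit guards against cycles in malformed input.
int top_level(const ParentMap &parents, int id) {
    std::size_t steps = 0;
    auto it = parents.find(id);
    while (it != parents.end() && steps++ < parents.size()) {
        id = it->second;
        it = parents.find(id);
    }
    return id;
}
```

With such a function, every warning's CWE can be bucketed by the value top_level() returns.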

Best Regards,
-Steve



Re: Mapping Compiler output to CWE's

Steve Grubb
In reply to this post by Steven M Christey
Hello,

On Wednesday, March 17, 2021 12:04:50 PM EDT Steven M Christey wrote:
> Thank you, Martin and Steve, for raising this.
>
> Outside of a more framework-oriented approach as covered in the SATE
> document that Martin listed (which I believe has lots of potential), here
> are a couple additional comments on Steve's original inquiry.

Thank you for the discussion below. Much appreciated.

<snip>

> > * The destination of a call to a raw memory function such as "memset",
> > "memmove", or "memcpy" is an object of class type.
> >
> > There are a lot of CWE's around buffer length and improper
> > initialization, but I don't see anything around direct memory
> > manipulation for a non-trivial object.
>
>
> Thinking out loud, I wonder if this can be regarded as equivalent to
> CWE-843: Access of Resource Using Incompatible Type ('Type Confusion').
 
I could see that as a possibility. But I tend to think of type confusion as
something where the software loses track of what kind of object it's dealing
with, whereas this is knowingly copying/clearing classes without taking care to
get all the pieces.  :-)  I'll pencil in type confusion.

<snip>

> > * A program is compiled with stack smashing protection enabled, but
> > the code uses a variable length array on the stack with length defined
> > at runtime, so stack smashing protection becomes disabled for that one
> > function.
 
> > I looked at CWE-691 Insufficient Control Flow Management. But the
> > child nodes all seem to be unrelated. I also looked at CWE-119:
> > Improper Restriction of Operations within the Bounds of a Memory
> > Buffer. But this is about a security mitigation being disabled because of
> > the programming style.
>
> This is also interesting. I'm not sure we have anything for this; a new
> entry might live under the Pillar CWE-693: Protection Mechanism Failure.

There are a couple of other low-level compiler warnings whose home I'm unsure
of. Examples are function attributes getting discarded, breaking strict
aliasing, and macro expansion including "defined". These account for roughly
9% of all warnings. What I'm doing for the moment is defining my own CWE in
the 9000 range to "park" the classification somewhere.
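
As an example of the strict aliasing case, here is a sketch with a hypothetical float_bits() function. The cast that draws the warning is shown as a comment, next to a copy-based alternative. The bit patterns in the usage note assume IEEE 754 single precision:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pattern that breaks strict aliasing (reads a float through an
// unrelated pointer type):
//   std::uint32_t bits = *reinterpret_cast<std::uint32_t *>(&f);
//
// Alternative: copy the object representation with memcpy.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "float is expected to be 32 bits wide");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE 754 platform, float_bits(1.0f) yields 0x3f800000.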

<snip>

> > * A call was made to a function which performs a lookup or
> > calculation, but the returned value was forgotten to be assigned to a
> > variable. The intended variable is later used as if it were assigned.
> >
> > This has different semantics from unchecked return code (252), because
> > the whole point of calling the function was to obtain the return
> > value.  An example might be a bare  strdup("test");  The whole point
> > of calling strdup is to have a character pointer assigned to use
> > later. There are other examples where a math calculation is being done
> > on floating point numbers and the returned calculation is not caught
> > but the code pretends it has the results.
> >
> > Any discussion would be appreciated.
>
>
> I think there's a relationship with CWE-456: Missing Initialization of a
> Variable, but this CWE might be re-evaluated or even deprecated because it
> is used in multiple ways. One approach would be to specifically talk about
> "assignment to a variable" as distinct from "initialization of the
> contents of [memory or other resource that has been assigned]."

I suppose you could see it that way. I'll pencil that in for the moment.

> Thanks again for the inquiry, as this will help us fill in some additional
> gaps within CWE.

I have a whole other discussion I'd like to have around classifying shell
script defects. I find that picking the right CWE when faced with shell script
findings leads to some interesting tradeoffs. I'll start that another day.

Best Regards,
-Steve