Missing CWE category -- Lockup

Jiri Slaby
Hello,

I'm working on a system which contains kernel bugs. It refers to the
types of errors present in the CWE database. Unfortunately, I cannot find
one error type in there.

I call it "inatomic operation in atomic". This one leads to a lockup of
the system. Hence I would expect some kind of "Lockup" error in CWE, but
I could not find that one either.

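The simplest example is as follows:
#include <linux/delay.h>    /* msleep() */
#include <linux/irqflags.h> /* local_irq_save(), local_irq_restore() */

void lockup(void)
{
  unsigned long flags;

  local_irq_save(flags);    /* interrupts off: atomic context starts */
  msleep(10);               /* BUG: sleeping call in atomic context */
  local_irq_restore(flags); /* restore the saved interrupt state */
}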

In that case interrupts (including the timer) are disabled, so nothing
can wake up the sleeping process, and the system locks up. On a
uniprocessor system, no recovery is possible.
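
To make the mechanism concrete, here is a rough userspace model of the
hang. It is only a sketch, not kernel code: every sim_* name is made up
for illustration, and the polling limit exists only so that the model
terminates where a real system would hang.

```c
/* Userspace model only: none of these sim_* names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>

static bool sim_irqs_enabled = true;  /* models the CPU interrupt flag */
static unsigned long sim_jiffies;     /* models the kernel tick counter */

/* Models the timer interrupt: delivered only while IRQs are enabled. */
static void sim_timer_tick(void)
{
	if (sim_irqs_enabled)
		sim_jiffies++;
}

static void sim_local_irq_disable(void) { sim_irqs_enabled = false; }
static void sim_local_irq_enable(void)  { sim_irqs_enabled = true; }

/*
 * Models a sleep that waits for `ticks` timer ticks. A real lockup would
 * wait forever, so the model gives up after `max_polls` attempts.
 * Returns 0 if the sleeper was woken, -1 if no wakeup ever arrived.
 */
static int sim_msleep(unsigned long ticks, unsigned long max_polls)
{
	unsigned long deadline, polls;

	assert(ticks > 0 && max_polls >= ticks);
	deadline = sim_jiffies + ticks;

	for (polls = 0; polls < max_polls; polls++) {
		sim_timer_tick();       /* hardware tries to raise the tick */
		if (sim_jiffies >= deadline)
			return 0;       /* woken up by the timer */
	}
	return -1;                      /* the tick never came: lockup */
}
```

With interrupts enabled, sim_msleep(10, 1000) returns 0. After
sim_local_irq_disable() the tick counter stops advancing, so the same
call returns -1, which stands in for the lockup.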

For instance, a patch fixing such a bug in the kernel is at:
https://lkml.org/lkml/2011/4/20/62

Would "Lockup" be a candidate for adding to the CWE error types database?

thanks,
--
js

Re: Missing CWE category -- Lockup

Kurt Seifried
Agreed, there are quite a few lockup issues in CVE:

http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=lockup

and so on, but we also have ones like:

CVE-2007-0997,Candidate,"Race condition in the tee (sys_tee) system
call in the Linux kernel 2.6.17 through 2.6.17.6 might allow local
users to cause a denial of service (system crash), obtain sensitive
information (kernel memory contents), or gain privileges via
unspecified vectors related to a potentially dropped ipipe lock during
a race between two pipe readers."

So there would appear to be at least a few types of lockups. However,
some may already be covered by things like "race condition", so maybe
update the descriptions for them?

--
Kurt Seifried
[hidden email]
skype: (206) 905-9462

Re: Missing CWE category -- Lockup

Jiri Slaby
On 10/21/2011 10:28 PM, Kurt Seifried wrote:

> Agreed, there's quite a few lockup issues in CVE:
>
> http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=lockup
>
> and so on, but we also have ones like:
>
> CVE-2007-0997,Candidate,"Race condition in the tee (sys_tee) system
> call in the Linux kernel 2.6.17 through 2.6.17.6 might allow local
> users to cause a denial of service (system crash), obtain sensitive
> information (kernel memory contents), or gain privileges via
> unspecified vectors related to a potentially dropped ipipe lock during
> a race between two pipe readers."
>
> So there would appear to be at least a few types of lockups, however
> some may already be covered by things like "race condition", so maybe
> update the descriptions for them?

Sorry, I fail to see the lockup keyword in CVE-2007-0997. Race
conditions are rather connected to (missing) locks/locking.

But a lockup means that the system becomes completely dead, e.g. because
of an infinite loop or the case I described below.

> 2011/10/21 Jiri Slaby <[hidden email]>:
>> Hello,
>>
>> I'm working on a system which contains kernel bugs. It refers to the
>> types of errors present in the CWE database. Unfortunately I cannot find
>> one error type in there.
>>
>> I call it "inatomic operation in atomic". This one leads to a lockup of
>> the system. Hence I would expect some kind of "Lockup" error in CWE. But
>> neither that one I could find.
>>
>> A simplest example is as follows:
>> void lockup(void)
>> {
>>  unsigned long flags;
>>
>>  local_irq_save(flags);
>>  msleep(10);
>>  local_irq_restore(flags);
>> }
>>
>> In that case interrupts are disabled (including timer), so there is
>> nobody to wake up the sleeping process. And system locks up. On
>> uniprocessor, there is no recovery possible.
>>
>> For instance, a patch fixing such bug in the kernel is at:
>> https://lkml.org/lkml/2011/4/20/62
>>
>> Would "Lockup" be a candidate for adding to the CWE error types database?
>>
>> thanks,
--
js

Re: Missing CWE category -- Lockup

Steven M. Christey-3
In reply to this post by Jiri Slaby
On Fri, 21 Oct 2011, Jiri Slaby wrote:

> I call it "inatomic operation in atomic". This one leads to a lockup of
> the system. Hence I would expect some kind of "Lockup" error in CWE, but
> I could not find that one either.

This is a very intriguing example that touches on a few areas of CWE that
are not well-researched.  I agree that there are gaps in CWE that don't
really cover the core problems in this code example.  But, it's also
difficult to decide what the core problems are (which seems to happen with
a lot of logic-related issues.)

Note - I would treat "lockup" as a description of the consequence of
exploiting a weakness - the software basically "hangs" and does not
progress to execute new instructions.  We have CWEs for things like
infinite loops (CWE-835) and excessive iteration (CWE-834); seems like the
*implementation* of msleep has an infinite loop, so you could use that for
your code example.

However, using CWE-835 would only cover the far end of a weakness chain,
i.e. the infinite loop from msleep() is resultant; other "primary" CWEs
might describe the root cause that leads to the lockup.

The concepts you're describing seem to be similar, but not exactly the
same as, CWE-667: Improper Locking (and its children, such as CWE-833:
Deadlock).  Disabling hard interrupts effectively "locks" access to a
resource (e.g. the CPU) - note that here, the "lock" term is a different
concept than "lockup".

There are probably also relationships with CWE-662 (Improper
Synchronization) since disabling interrupts is effectively a form of
synchronization.

CWE does not have much coverage related to interrupts/timers/events,
although I've suspected we need to.  We have a very generic CWE-691:
Insufficient Control Flow Management, which might be appropriate here.
But still not perfect.

CWE also does not really cover atomic/non-atomic operations except in how
they are related to synchronization/race conditions.  This is one area of
CWE that could use some theoretical research.  It's not clear to me why
your example is "inatomic operation in atomic" - atomicity doesn't seem to
be a significant element of this example.

One critical aspect of this example that isn't covered in CWE is when the
software inadvertently restricts access too much.  Much of CWE is about
giving too much access (see the very-general CWE-402, CWE-610, CWE-668,
CWE-669, etc.)  Here, by disabling interrupts, the program is effectively
removing its own "privileges" that are needed for the msleep().  I think
the only CWE we have for "restricting access too much" is CWE-280:
Improper Handling of Insufficient Permissions or Privileges.  But that's
not what's going on with the msleep example.

Moving this to a higher level than interrupts/msleep, it seems we could
create a new CWE entry that involves "excessive restriction of access to a
resource."  That's more appropriate if we want to say that the use of
local_irq_save is the weakness.  If we want to blame the use of msleep,
then that's a different story... maybe something similar to CWE-280:
Improper Handling of Insufficient Permissions or Privileges.

- Steve




Re: Missing CWE category -- Lockup

Jiri Slaby
On 10/21/2011 11:30 PM, Steven M. Christey wrote:

>
> On Fri, 21 Oct 2011, Jiri Slaby wrote:
>
>> I call it "inatomic operation in atomic". This one leads to a lockup of
>> the system. Hence I would expect some kind of "Lockup" error in CWE, but
>> I could not find that one either.
>
> This is a very intriguing example that touches on a few areas of CWE
> that are not well-researched.  I agree that there are gaps in CWE that
> don't really cover the core problems in this code example.  But, it's
> also difficult to decide what the core problems are (which seems to
> happen with a lot of logic-related issues.)

Aha, makes sense. The lockup is only a resulting behavior, agreed.

> Note - I would treat "lockup" as a description of the consequence of
> exploiting a weakness - the software basically "hangs" and does not
> progress to execute new instructions.  We have CWEs for things like
> infinite loops (CWE-835) and excessive iteration (CWE-834); seems like
> the *implementation* of msleep has an infinite loop, so you could use
> that for your code example.
>
> However, using CWE-835 would only cover the far end of a weakness chain,
> i.e. the infinite loop from msleep() is resultant; other "primary" CWEs
> might describe the root cause that leads to the lockup.
>
> The concepts you're describing seem to be similar, but not exactly the
> same as, CWE-667: Improper Locking (and its children, such as CWE-833:
> Deadlock).  Disabling hard interrupts effectively "locks" access to a
> resource (e.g. the CPU) - note that here, the "lock" term is a different
> concept than "lockup".
>
> There are probably also relationships with CWE-662 (Improper
> Synchronization) since disabling interrupts is effectively a form of
> synchronization.
>
> CWE does not have much coverage related to interrupts/timers/events,
> although I've suspected we need to.  We have a very generic CWE-691:
> Insufficient Control Flow Management, which might be appropriate here.
> But still not perfect.
>
> CWE also does not really cover atomic/non-atomic operations except in
> how they are related to synchronization/race conditions.  This is one
> area of CWE that could use some theoretical research.  It's not clear to
> me why your example is "inatomic operation in atomic" - atomicity
> doesn't seem to be a significant element of this example.

In the kernel, spinlocks or disabled interrupts mean you are in an
atomic context. You are allowed to perform atomic (non-sleeping)
operations only.
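
The rule can be sketched as a small userspace model. This is only an
illustration under my own assumptions: the sim_* names are hypothetical
and not kernel API. The kernel has its own debugging aid for this,
might_sleep(), which reports "sleeping function called from invalid
context" when it is enabled.

```c
/* Userspace model only: the sim_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>

/* Greater than 0 while a spinlock is held or interrupts are disabled. */
static int sim_atomic_depth;

static void sim_enter_atomic(void) { sim_atomic_depth++; }

static void sim_leave_atomic(void)
{
	assert(sim_atomic_depth > 0);   /* unbalanced leave is a model bug */
	sim_atomic_depth--;
}

static bool sim_in_atomic(void) { return sim_atomic_depth > 0; }

/* Sleeping operation: legal only outside atomic context. */
static int sim_sleeping_op(void)
{
	if (sim_in_atomic())
		return -1;              /* called from an invalid context */
	return 0;
}

/* Non-sleeping operation: legal in any context. */
static int sim_busy_wait_op(void)
{
	return 0;
}
```

In this model sim_sleeping_op() fails for as long as any
sim_enter_atomic() is outstanding, including nested ones, while
sim_busy_wait_op() always succeeds.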

> One critical aspect of this example that isn't covered in CWE is when
> the software inadvertently restricts access too much.  Much of CWE is
> about giving too much access (see the very-general CWE-402, CWE-610,
> CWE-668, CWE-669, etc.)  Here, by disabling interrupts, the program is
> effectively removing its own "privileges" that are needed for the
> msleep().  I think the only CWE we have for "restricting access too
> much" is CWE-280: Improper Handling of Insufficient Permissions or
> Privileges.  But that's not what's going on with the msleep example.
>
> Moving this to a higher level than interrupts/msleep, it seems we could
> create a new CWE entry that involves "excessive restriction of access to
> a resource."  That's more appropriate if we want to say that the use of
> local_irq_save is the weakness.  If we want to blame the use of msleep,
> then that's a different story... maybe something similar to CWE-280:
> Improper Handling of Insufficient Permissions or Privileges.

Maybe I would call for something like "performing an operation in an
invalid context" for msleep, i.e. some functions are allowed to be
called from one context, but not from another. I found "CONSPEC:
Context-specific Issues", which might be it, but it's not specifically a
CWE, and it's not strictly what I mean. But anyway, it may fit.

thanks,
--
js