Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Wheeler, David A
I propose that a new CWE be created to cover "Machine learning algorithm vulnerable to adversarial input" (or some similar name).  More details below.

On July 18, 2017, I sent an email stating that "classifiers based on machine learning (ML) can be vulnerable to adversarial data, such as adversarial images" & pointed to a number of articles demonstrating this.  As a result, I proposed that a CWE be added to cover this case.  Unfortunately, I may have accidentally sent that email to the wrong email address, so I'm incorporating that email below in case it got lost.

In any case, I think the situation is getting even *worse* than when I wrote that email.  See the paper "Adversarial Patch" by Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, Justin Gilmer, submitted on 27 Dec 2017, <https://arxiv.org/abs/1712.09665>.  In this case, you can just print out an odd-looking patch, place it in a scene, and force the classifier to report the scene as something else entirely.

We're now building vehicles that will *depend* on machine learning, yet these algorithms have *known* issues - especially if they are not designed to counter adversarial images.

I think it is *vital* that CWE add one or more weaknesses that cover "Machine learning algorithm vulnerable to adversarial input" (or some similar name).  This weakness may kill people if unaddressed, and I think putting this into the CWE set will increase the likelihood that people will address it.

--- David A. Wheeler

======

Here's the previous email "RE: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)" I originally sent on July 18, 2017:

There's been spectacular growth in the use of machine learning techniques (e.g., neural networks) to classify images and other data.  These are core to self-driving cars, for example.

Unfortunately, classifiers based on machine learning (ML) can be vulnerable to adversarial data, such as adversarial images.  In many cases, people are training ML on "good" data but not thinking about what happens when an attacker shows up, and the resulting systems are vulnerable.  Sounds familiar :-).

The basic problem is that attackers can create adversarial inputs that fool classifiers based on machine learning.  In years past it was thought that adversarial data would only work in "lab conditions", but unfortunately that appears to be false.  Some additional reading is here, which I hope will be convincing evidence that this is a widely-known weakness that repeatedly appears:
  https://blog.openai.com/robust-adversarial-inputs/
  https://blog.openai.com/adversarial-example-research/
  http://www.kdnuggets.com/2015/07/deep-learning-adversarial-examples-misconceptions.html
  https://www.theverge.com/2017/4/12/15271874/ai-adversarial-images-fooling-attacks-artificial-intelligence
  http://www.evolvingai.org/fooling

Like many problems, there doesn't seem to be an easy perfect solution.  However, part of the solution appears to be that developers using ML classifiers need to track adversarial input research, develop their own adversarial data (such as adversarial images), and train their systems to be less likely to be fooled by them (this sometimes has the side-effect of improving non-adversarial classification).  Having a CWE would make it easier to capture & improve on countermeasure best practices.
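
For concreteness, here's a rough sketch of the kind of adversarial-example generation and adversarial training I have in mind.  This assumes PyTorch and a generic image classifier; "model", "images", "labels", and the epsilon value are placeholders, not any particular system:

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, images, labels, eps=0.03):
        # Craft adversarial inputs by nudging each pixel in the direction
        # that most increases the classifier's loss (the "FGSM" approach).
        images = images.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        return (images + eps * images.grad.sign()).clamp(0.0, 1.0).detach()

    def adversarial_training_step(model, optimizer, images, labels, eps=0.03):
        # Train on a mix of clean and adversarial inputs so the model is
        # less easily fooled by this kind of perturbation.
        adv = fgsm_perturb(model, images, labels, eps)
        optimizer.zero_grad()
        loss = 0.5 * (F.cross_entropy(model(images), labels) +
                      F.cross_entropy(model(adv), labels))
        loss.backward()
        optimizer.step()
        return loss.item()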

Before trying to write up a detailed CWE entry - can people agree that there's a real weakness here?  Not everyone is using ML classifiers, but they've become widespread in important domains, and are likely to become important to human safety.  What's more, countermeasures exist - but people have to know that there's a problem.  Given all that, I think it's time to add them to the CWE list.

Thoughts? Thanks.

--- David A. Wheeler

Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Christey, Steven M.
David,

This is a very interesting suggestion, and maybe a new entry is appropriate.

Here's a walkthrough of some of my thinking on the topic, which will hopefully help demonstrate some of the classification/description process I try to use for new CWE entries.

One might crudely say that this is a problem with "input validation" (CWE-20), but that's rather simplistic.

If the classifiers are used in authentication, then maybe it is just an interesting, domain-specific example of CWE-287 (Improper Authentication).  But, "adversarial inputs" could be used in many different scenarios, not just authentication.

We have CWE-804 (Guessable CAPTCHA), which outlines various CAPTCHA methods, including audio/visual methods; but that weakness involves the generation of *easily recognizable* images, versus *incorrectly-recognizable* images as you're outlining.

Finally, there's CWE-843 ("Type Confusion") which sounds like it could be related, but is explicitly about programming language data types.

So none of these are a good match, and we could write a new entry.

First, we want to look beyond the attack-oriented names and descriptions, and try to identify the underlying weakness(es) based on (potentially-incorrect) behaviors, as they apply to resources, which are expected to have certain properties.  The "Adversarial Patch" paper (and term) is attack-oriented.  What is being attacked?  There's a behavior of "classifying" visual data (an image resource) whose property is incorrectly calculated to be X ("a toaster") when it is actually Y ("a dog").

From the perspective of an affected application, the different techniques for generating an adversarial patch are effectively part of the attack - analogous, perhaps, to how somebody can construct a buffer overflow exploit using a long string of "A" characters, versus a nop sled and shellcode, versus return-oriented programming (ROP) - all those overflow techniques are ways of modifying the inputs to give the attacker greater control after the overflow has occurred (what vuln theory calls "facilitator manipulations").

Where possible, a higher-level CWE should stay away from talking about a specific algorithmic technique.  So, for example, whether a "classifier" technique is implemented using machine learning or some other algorithm is relatively irrelevant.

So, a new entry might look something like:

  Name: "Incorrect Automated Image Classification"

  Summary: the product automatically performs image classification, but it does not correctly classify a modified image, which may have security-relevant impacts.

This obviously would need better verbiage, more detailed explanations, and some specific examples or references like what you've provided.  And we'd want to be careful to ensure that the entry is written in a way that separates it from simplistic, automated type detection such as "the program looks at the magic bytes of a file to determine whether it's a GIF or JPG."  (Or maybe that should be acceptable under this entry?)  And the "PoC||GTFO" publications come to mind.
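
Purely as an illustration (the function below is hypothetical, not part of any proposal), that simplistic magic-byte case is small enough to review line by line, which is exactly what most ML classifiers are not:

    def detect_image_type(path):
        # Decide GIF vs. JPEG from the file's leading "magic" bytes.
        with open(path, "rb") as f:
            header = f.read(4)
        if header.startswith(b"GIF8"):          # "GIF87a" / "GIF89a"
            return "GIF"
        if header.startswith(b"\xff\xd8\xff"):  # JPEG SOI marker
            return "JPEG"
        return "unknown"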

There's an abstraction question here too, since other types of "input stimuli" such as audio might also be used in security-relevant contexts (say, speech recognition).  It's not immediately clear to me how we could term such general phenomena while omitting simple types like strings and integers.  And having a generic parent named something like "Incorrect Classification" seems too likely to be misused since it gets too close to being assumed to be talking about classified data and information assurance.

Hope that's enough to get some more discussion going.  Thanks, David!

- Steve


Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Kurt Seifried
Can I suggest we create an overarching CWE, à la CWE-20, and then add more specific ones as new attacks and so on come out?


-Kurt





Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Wheeler, David A
> From: Kurt Seifried [mailto:[hidden email]]
> Can I suggest we do an overarching CWE ala CWE-20 and then more specific
> ones as new attacks and so on come out.

If there's a reasonable overarching one, great, but I don't think CWE-20 (input validation) is it.
Input validation usually says, "only allow in data that matches certain patterns", but
adversarial inputs don't really fit that model.

--- David A. Wheeler



Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Christey, Steven M.
I agree with David - validation is trying to prove that input is "good."  With adversarial images, the system is drawing the wrong conclusions about what the input even represents.

Kurt -  I already mentioned something like "Incorrect Classification" (perhaps a bad name) as a possible generic pattern.  Why wouldn't such an entry, if created, already count as an appropriate "overarching CWE"?

- Steve



Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Kurt Seifried
When I referenced CWE-20 I didn't mean use CWE-20 specifically; I meant have an overarching "Machine learning algorithm vulnerable to adversarial input" (or whatever) and then maybe more specific ones - e.g., one for attacks during the learning phase and one for attacks after learning - since the types and variety of attacks will obviously grow as time goes on.


--
Kurt Seifried
[hidden email]

Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Wheeler, David A
In reply to this post by Christey, Steven M.
Christey, Steven M. [mailto:[hidden email]]:
> First, we want to look beyond the attack-oriented names and descriptions,
> and try to identify the underlying weakness(es) based on (potentially-
> incorrect) behaviors, as they apply to resources, which are expected to have
> certain properties.  The "Adversarial Patch" paper (and term) is attack-
> oriented.  What is being attacked?
...
> There's a behavior of "classifying" visual
> data (an image resource) whose property is incorrectly calculated to be X
> ("a toaster") when it is actually Y ("a dog").

That's too narrow a scope, I think.

First, adversarial inputs aren't limited to images. Machine Learning
(ML) is also used in many other situations (audio & video in particular),
and adversarial input should work just fine in those cases - or really
in any case where ML is used.

Also, I suspect adversarial inputs are NOT limited to *just* ML classifiers.
I think they also apply to:
* Regression algorithms (e.g., to predict future states).
  These normally produce continuous results, but if an adversary can force
  a wildly-wrong estimate, it can still be a big problem (see the small
  sketch after this list).
* Unsupervised learning algorithms (e.g., for clustering,
  dimensionality reduction and association rule learning).
  This case is a little less clear to me, but I suspect they are *probably*
  also vulnerable to adversarial inputs.
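
To make the regression point concrete, here's a toy sketch; the weights
and numbers are made up, and a real attacker would do the same thing
against a learned model using its gradients:

    import numpy as np

    # Toy linear "regression" model: y = w.x + b (weights are placeholders).
    w = np.array([2.0, -1.5, 0.5])
    b = 0.1
    x = np.array([1.0, 1.0, 1.0])   # legitimate input
    eps = 0.3                       # attacker's per-feature change budget

    # Push each feature in whichever direction raises the prediction most.
    x_adv = x + eps * np.sign(w)

    print(w @ x + b)                # 1.1  (honest estimate)
    print(w @ x_adv + b)            # 2.3  (shifted by eps * sum(|w|) = 1.2)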

If you want to create a "higher-level" CWE, dealing with
adversaries causing misclassification would be a reasonable child
of that higher-level CWE.

> Where possible, a higher-level CWE should stay away from talking about a
> specific algorithmic technique.  So, for example, whether a "classifier"
> technique is implemented using machine learning or some other algorithm
> is relatively irrelevant.

I'm not sure that is true in this case, and "where possible" is key.
CWE already talks about buffers, and buffers are an algorithmic technique.

I think that ML is a key issue here.  Most ML algorithm results
are fundamentally unreviewable, which is qualitatively different than
most systems we've built before with security properties.

In a (hand-written) rule-based system (no ML),
if an adversary created inputs that exploited the rules, we'd say the rule
had a logic error.  We can counter such problems with human review and
if necessary prove certain properties using formal methods.
In short, such systems are transparent.

But in most ML systems there is no practical way for humans to
review, before deployment, the reasons an ML algorithm will do something.
Decision trees are an exception (the reasons *can* be reviewed in them),
and you can provide some information in some other cases, but in general
most ML algorithms are opaque & resist review (!).
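
As a purely illustrative sketch of what that kind of transparency buys
you (this assumes scikit-learn and a toy dataset, not any real system),
a decision tree's learned rules can literally be printed out for review:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    tree = DecisionTreeClassifier(max_depth=3).fit(data.data, data.target)

    # The learned rules come out as readable if/else text, so a reviewer
    # can ask "does this rule make sense?" before deployment - something
    # that isn't practical for a large neural network.
    print(export_text(tree, feature_names=list(data.feature_names)))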

In my opinion, this is an additional reason why so many people who work with ML
are unaware of adversarial inputs.  Not only do they not think of security
(an old problem), but they can't explain the reasons a system
does what it's doing in general - so they're unused to having to justify what they do.

I'm not saying that the new CWE entry would *have* to be focused on ML,
but I think it's worth carefully considering.  These attacks appear,
to me, to be fundamentally based on exploiting ML algorithms'
tendency to create "general rules" that would be considered
completely ridiculous if people could review them.


> So, a new entry might look something like:
>
>   Name: "Incorrect Automated Image Classification"

It's not just images, and it's not just classification.  It could be called
"Incorrect Automated Results", but that sounds like the parent of all other CWEs :-).

"Incorrect Machine Learning results due to malicious inputs" isn't exactly what you want,
I know, but it's more like what I had in mind :-).
I'm not sure how to precisely meet your goals above, but hey,
mailing lists enable discussion right? :-).

>   Summary: the product automatically performs image classification, but it
> does not correctly classify a modified image, which may have security-
> relevant impacts.

How about:

Summary: One or more machine learning (ML) algorithms
produce incorrect results due to adversarial input,
which may have security-relevant impacts.

> we'd want to be careful to ensure that the entry is written in a way that
> separates it from simplistic, automated type detection such as "the
> program looks at the magic bytes of a file to determine whether it's a GIF or
> JPG."  (Or maybe that should be acceptable under this entry?)

I don't think it should be acceptable under this entry.
If nothing else, people can review the code that looks at the magic bytes
and say "correct" or "not correct" before the code is shipped.
In most ML systems, that kind of review is impractical, so you have to
take other measures (train on adversarial & random inputs, select
transparent ML algorithms, etc.).

> Hope that's enough to get some more discussion going.  Thanks, David!

Sure!  I think it'll take a little discussion to carefully define
exactly what the problem is, but I think it's important in this case.

--- David A. Wheeler


Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Christey, Steven M.
> From: Wheeler, David A [mailto:[hidden email]]
> Sent: Thursday, January 11, 2018 6:22 PM
> To: Christey, Steven M. <[hidden email]>; cwe-research-list CWE Research
> Discussion <[hidden email]>
> Subject: RE: Proposed new CWE: Machine learning classifier vulnerable to
> adversarial inputs (adversarial machine learning)
>
>[snip]
>
>
> First, adversarial inputs aren't limited to images. Machine Learning
> (ML) is also used in many other situations (audio & video in particular),
> and adversarial input should work just fine in those cases - or really
> in any case where ML is used.

To that end, I just learned of this paper about audio:

  Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
  https://arxiv.org/abs/1801.01944

Thanks for all your feedback.  We will be discussing this issue internally and will get back to the list with some followup discussion and a proposal for new entry/entries.

- Steve


Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Wheeler, David A
> From: Christey, Steven M. [mailto:[hidden email]]
> [snip]
> To that end, I just learned of this paper about audio:
>
>   Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
>   https://arxiv.org/abs/1801.01944

Here's another set of audio attacks:
https://nicholas.carlini.com/code/audio_adversarial_examples/

There's a lot of research in this area, and it's getting concerning. E.g.:
"Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods"
Nicholas Carlini & David Wagner
ACM Workshop on Artificial Intelligence and Security, 2017.
https://nicholas.carlini.com/papers/2017_aisec_breakingdetection.pdf

Nicholas Carlini has some other interesting papers on this:
https://nicholas.carlini.com/papers/

There may never be easy solutions, but that does *not* mean we should ignore the problem.

--- David A. Wheeler


Re: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)

Christey, Steven M.
Thanks for the latest additional references, David.

We are of the mindset to create a new CWE entry for this.

Detailed CWE classification would be, effectively, an original research problem; this general problem does not yet seem to be fully understood and is likely to change significantly over the next year or two.  As such, it would be too resource-intensive to perform that original research and build an entire sub-hierarchy.  However, a class-level entry seems appropriate at this point.

Below is our current draft for such a new entry, which will also include a number of references as already mentioned in this thread.  Feedback welcome :)

- Steve

Name: Automated Recognition Mechanism with Inadequate Detection or Handling of Adversarial Input Perturbations

[NOTES:
  - "perturbations" is a commonly-used term in this area, so we will use it for consistency with the community that is studying them
  - we say "automated recognition mechanism" to allow for techniques besides ML, and to attempt to side-step the whole debate
    about what constitutes ML or not]


Summary: The product uses an automated mechanism such as machine learning to recognize a complex set of data inputs (e.g. image or audio) as a particular concept, but it does not properly detect or handle inputs that have been modified or constructed in a way that causes the mechanism to incorrectly recognize the input as a different concept than intended.

Extended description:

When techniques such as machine learning are used to automatically classify input streams, and those classifications are used for security-critical decisions, then any mistake in classification can introduce a vulnerability that allows attackers to cause the product to make the wrong security decision.  If the automated mechanism is not developed or "trained" with enough input data, then attackers may be able to craft malicious input that intentionally triggers the incorrect classification.

Targeted technologies include, but are not necessarily limited to:
- automated speech recognition
- automated image recognition


- Steve
