Re: [Non-DoD Source] RE: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning) (UNCLASSIFIED)


Hood, Jonathan W CTR USARMY RDECOM AMRDEC (US)
CLASSIFICATION: UNCLASSIFIED

I agree with Steve. While I'm not opposed to a specific ML-based CWE, and I think David identifies a valid gap in the current CWEs, I think we can broaden it to cover a bit more that is missing from the current CWE list. Below is my slightly modified original reply to David:

Perhaps changing this to "Logic Injection" and making it a child of CWE-74 and CWE-511 would make it apply to more than just machine learning? I see this as a type of logic injection issue, for which there isn't really a good CWE right now.

But this type of issue affects more than just machine learning. Consider a scenario in which cameras are being monitored and facial recognition software is running. Someone finds a vulnerability in the facial recognition software that causes anyone wearing an Alabama Crimson Tide hat to be ignored by the software. This weakness is a logic injection issue: a malicious actor can supply input that injects logic into the system, causing it not to perform as expected, often in disastrous ways.
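
As a rough, hypothetical illustration of why the classifier's output is
effectively attacker-influenceable logic (recognize_face, alert, and
frames are placeholders, not any real API):

  def monitor(frames, recognize_face, alert):
      # The classifier's decision is the only gate on alerting, so anything
      # that steers the classifier (e.g., a hat the model has been induced
      # to ignore) also steers the security-relevant control flow.
      for frame in frames:
          person = recognize_face(frame)   # learned classifier (placeholder)
          if person is not None:
              alert(person)                # never runs if the model is fooled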

Dr. Jonathan Hood
Software Assurance | SED Cyber Solutions Center
(256) 876-0326


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Christey, Steven M.
Sent: Thursday, January 11, 2018 11:16 AM
To: Wheeler, David A CTR (US) <[hidden email]>; cwe-research-list CWE Research Discussion <[hidden email]>
Subject: [Non-DoD Source] RE: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)


David,

This is a very interesting suggestion, and maybe a new entry is appropriate.

Here's a walkthrough of some of my thinking on the topic, which will hopefully help demonstrate some of the classification/description process I try to use for new CWE entries.

One might crudely say that this is a problem with "input validation" (CWE-20), but that's rather simplistic.

If the classifiers are used in authentication, then maybe it is just an interesting, domain-specific example of CWE-287 (Improper Authentication).  But, "adversarial inputs" could be used in many different scenarios, not just authentication.

We have CWE-804 (Guessable CAPTCHA), which outlines various CAPTCHA methods, including audio/visual methods; but that weakness involves the generation of *easily recognizable* images, versus *incorrectly-recognizable* images as you're outlining.

Finally, there's CWE-843 ("Type Confusion") which sounds like it could be related, but is explicitly about programming language data types.

So none of these are a good match, and we could write a new entry.

First, we want to look beyond the attack-oriented names and descriptions, and try to identify the underlying weakness(es) based on (potentially-incorrect) behaviors, as they apply to resources, which are expected to have certain properties.  The "Adversarial Patch" paper (and term) is attack-oriented.  What is being attacked?  There's a behavior of "classifying" visual data (an image resource) whose property is incorrectly calculated to be X ("a toaster") when it is actually Y ("a dog").

From the perspective of an affected application, the different techniques for generating an adversarial patch are effectively part of the attack - analogous, perhaps, to how somebody can construct a buffer overflow exploit using a long string of "A" characters, versus a nop sled and shellcode, versus return-oriented programming (ROP) - all those overflow techniques are ways of modifying the inputs to give the attacker greater control after the overflow has occurred (what vuln theory calls "facilitator manipulations").
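
To make that concrete, here is a rough, hypothetical sketch (PyTorch-style;
load_model/load_image and the epsilon value are placeholders, not anything
taken from the paper) of a gradient-based perturbation that can flip a
classifier's prediction while the image still looks unchanged to a human:

  import torch
  import torch.nn.functional as F

  def adversarial_example(model, image, true_label, epsilon=0.03):
      # Nudge `image` (pixel values assumed in [0, 1]) in the direction that
      # increases the model's loss for its true label; the perturbed copy is
      # often assigned a different label.
      image = image.clone().detach().requires_grad_(True)
      logits = model(image.unsqueeze(0))                # shape: (1, num_classes)
      loss = F.cross_entropy(logits, torch.tensor([true_label]))
      loss.backward()
      perturbed = image + epsilon * image.grad.sign()   # tiny per-pixel change
      return perturbed.clamp(0.0, 1.0).detach()

  # model = load_model(); x, y = load_image()           # hypothetical helpers
  # x_adv = adversarial_example(model, x, y)
  # model(x_adv.unsqueeze(0)).argmax()                  # may now report "toaster"

The specific technique (this is the classic fast-gradient-sign idea, not the
patch construction from the paper) matters less than the point above: the
perturbation is part of the attack, while the weakness is the classifier's
incorrect behavior.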

Where possible, a higher-level CWE should stay away from talking about a specific algorithmic technique.  So, for example, whether a "classifier" technique is implemented using machine learning or some other algorithm is relatively irrelevant.

So, a new entry might look something like:

  Name: "Incorrect Automated Image Classification"

  Summary: the product automatically performs image classification, but it does not correctly classify a modified image, which may have security-relevant impacts.

This obviously would need better verbiage, more detailed explanations, and some specific examples or references like what you've provided.  And we'd want to be careful to ensure that the entry is written in a way that separates it from simplistic, automated type detection such as "the program looks at the magic bytes of a file to determine whether it's a GIF or JPG."  (Or maybe that should be acceptable under this entry?)  And the "PoC||GTFO" publications come to mind.
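
For reference, the kind of simplistic type detection I have in mind is no
more than magic-byte sniffing, e.g. something like this hypothetical snippet:

  def sniff_image_type(data: bytes) -> str:
      # Naive "classification" by file signature alone.
      if data.startswith((b"GIF87a", b"GIF89a")):
          return "gif"
      if data.startswith(b"\xff\xd8\xff"):
          return "jpeg"
      return "unknown"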

There's an abstraction question here too, since other types of "input stimuli" such as audio might also be used in security-relevant contexts (say, speech recognition).  It's not immediately clear to me how we could term such general phenomena while omitting simple types like strings and integers.  And having a generic parent named something like "Incorrect Classification" seems too likely to be misused since it gets too close to being assumed to be talking about classified data and information assurance.

Hope that's enough to get some more discussion going.  Thanks, David!

- Steve


> -----Original Message-----
> From: [hidden email] [mailto:owner-cwe-research-[hidden email]] On Behalf Of Wheeler, David A
> Sent: Wednesday, January 10, 2018 4:01 PM
> To: cwe-research-list CWE Research Discussion <cwe-research-[hidden email]>
> Subject: RE: Proposed new CWE: Machine learning classifier vulnerable to adversarial inputs (adversarial machine learning)
>
> I propose that a new CWE be created to cover "Machine learning
> algorithm vulnerable to adversarial input" (or some similar name).  More details below.
>
> On July 18, 2017, I sent an email stating that "classifiers based on
> machine learning (ML) can be vulnerable to adversarial data, such as adversarial images"
> & pointed to a number of articles demonstrating this.  As a result, I
> proposed that a CWE be added to cover this case.  Unfortunately, I may
> have accidentally sent that email to the wrong email address, so I'm
> incorporating that email below in case it got lost.
>
> In any case, I think the situation is getting even *worse* than when I
> wrote that email.  See the paper "Adversarial Patch" by Tom B. Brown,
> Dandelion Mané, Aurko Roy, Martín Abadi, Justin Gilmer, submitted on
> 27 Dec 2017, <https://arxiv.org/abs/1712.09665>.  In this
> case, you can just print out an odd-looking patch, place it in the
> scene, and it will force the image to be classified as something else.
>
> We're now building vehicles that will *depend* on machine learning,
> yet these algorithms have *known* issues - especially if they are not
> designed to counter adversarial images.
>
> I think it is *vital* that CWE add one or more weaknesses that cover
> "Machine learning algorithm vulnerable to adversarial input" (or some
> similar name).  This weakness may kill people if unaddressed, and I
> think putting this into the CWE set will increase the likelihood that people will address it.
>
> --- David A. Wheeler
>
> ======
>
> Here's the previous email, "RE: Proposed new CWE: Machine learning
> classifier vulnerable to adversarial inputs (adversarial machine
> learning)", that I originally sent on July 18, 2017:
>
> There's been spectacular growth in the use of machine learning
> techniques (e.g., neural networks) to classify images and other data.  
> These are core to self-driving cars, for example.
>
> Unfortunately, classifiers based on machine learning (ML) can be
> vulnerable to adversarial data, such as adversarial images.  In many
> cases, people are training ML on "good" data but not thinking about
> what happens when an attacker shows up, and the resulting systems are vulnerable.  Sounds familiar :-).
>
> The basic problem is that attackers can create adversarial inputs
> that fool classifiers based on machine learning.  In years past it was
> thought that adversarial data would only work in "lab conditions" but
> unfortunately that appears to be false.  Some additional reading is
> here, which I hope will be convincing evidence that this is a
> widely-known weakness that repeatedly
> appears:
>   https://blog.openai.com/robust-adversarial-inputs/
>   https://blog.openai.com/adversarial-example-research/
>   http://www.kdnuggets.com/2015/07/deep-learning-adversarial-examples-misconceptions.html
>   https://www.theverge.com/2017/4/12/15271874/ai-adversarial-images-fooling-attacks-artificial-intelligence
>   http://www.evolvingai.org/fooling
>
> Like many problems, there doesn't seem to be an easy perfect solution.
> However, part of the solution appears to be that developers using ML
> classifiers need to track adversarial input research, develop their
> own adversarial data (such as adversarial images), and train their
> systems to be less likely to be fooled by them (this sometimes has the
> side-effect of improving non-adversarial classification).  Having a
> CWE would make it easier to capture & improve on countermeasure best practices.
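>
> As a rough, hypothetical sketch of what "train their systems to be less
> likely to be fooled" could look like (PyTorch-style; perturb is a
> placeholder for any adversarial-example generator, not a specific
> library's API):
>
>   import torch.nn.functional as F
>
>   def adversarial_training_step(model, optimizer, images, labels, perturb):
>       # Train on the clean batch *and* adversarial copies of it, so the
>       # model is penalized when the adversarial versions are misclassified.
>       adv = perturb(model, images, labels)   # placeholder generator
>       optimizer.zero_grad()
>       loss = (F.cross_entropy(model(images), labels)
>               + F.cross_entropy(model(adv), labels))
>       loss.backward()
>       optimizer.step()
>       return loss.item()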
>
> Before trying to write up a detailed CWE entry - can people agree that
> there's a real weakness here?  Not everyone is using ML classifiers,
> but they've become widespread in important domains, and are likely to
> become important to human safety.  What's more, countermeasures exist
> - but people have to know that there's a problem.  Given all that, I think it's time to add this weakness to the CWE list.
>
> Thoughts? Thanks.
>
> --- David A. Wheeler
>

CLASSIFICATION: UNCLASSIFIED

To unsubscribe, send an email message to [hidden email] with SIGNOFF CWE-RESEARCH-LIST in the BODY of the message. If you have difficulties, write to [hidden email].
