Taxonomy talk


Taxonomy talk

heinbockel
After talking with multiple end users and spending
some time trying to organize my thoughts, I am
looking for some opinions on the CEE Taxonomy.

From the beginning, the discussion has centered on
a subject-verb-object-result model for describing the
type of event being logged.

While I still think there is validity to this approach,
we tend to approach the log space from multiple perspectives.
For example, IDS events: a passive device sends an alert
about traffic matching a signature. We conceptually think
about this differently than audit logs (e.g. OS). Shouldn't
an event taxonomy reflect this?

At the level of pure taxonomic categorization, there might
not be much difference. However, if we start looking at syntax
and implementation, we expect to see different types of data
based on the event type. Therefore, it seems to make sense
to introduce another taxonomy layer.

What I am currently thinking is something like:

- Signature/Rule match event
        Subject: IDS, HIDS, IPS, Firewall, A/V
        Verb: allowed, blocked, quarantined, removed
        Object: Packet, file
        Result: success, fail

- Audit (maybe break down further?)
        Subject: user, file, host, application, service
        Verb: add, delete, modify, start, stop
        Object: account, password, config, service

- Web
        ...
- E-mail
        ...
- DHCP
- NAC
- NAT
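
One way to picture the extra layer is as a lookup from event category to the values each field may take. The following is only a rough Python sketch: the names `TAXONOMY` and `is_valid`, the category keys, and the lower-cased spellings are assumptions made for illustration, and only the two categories filled in above are included.

```python
# Hypothetical sketch: a category layer on top of subject-verb-object-result.
# Field values are copied from the list above; all names are illustrative.
TAXONOMY = {
    "signature_match": {
        "subject": {"ids", "hids", "ips", "firewall", "a/v"},
        "verb": {"allowed", "blocked", "quarantined", "removed"},
        "object": {"packet", "file"},
        "result": {"success", "fail"},
    },
    "audit": {
        "subject": {"user", "file", "host", "application", "service"},
        "verb": {"add", "delete", "modify", "start", "stop"},
        "object": {"account", "password", "config", "service"},
    },
}


def is_valid(category: str, event: dict) -> bool:
    """Return True if every field the category defines has an allowed value."""
    allowed = TAXONOMY.get(category)
    if allowed is None:
        return False  # unknown category
    return all(event.get(field) in values for field, values in allowed.items())
```

With this shape, each partition can grow its own vocabulary without touching the others, which is the point of splitting the event space.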


Instead of adding everything to one top-level "event"
category, it seems easier to break the event space into
some logical partitions. I think that this would make
discussion, support, and implementation a lot easier and
more straightforward.

Trying to describe the log universe in a single taxonomic
manner seems too unwieldy. When we start talking about data
types, it doesn't make sense to treat OS-level events the same
as IDS or DHCP events.

What are your opinions?
Does something like this make sense?


William Heinbockel
Infosec Engineer, Sr.
The MITRE Corporation
202 Burlington Rd. MS S145
Bedford, MA 01730
[hidden email]
781-271-2615




Re: Taxonomy talk

Chris Koutras

Bill,

I have been toying with the audit logs at my company. We came to something similar to what is below but with a few added fields.
subject - A broad category that describes the class of the person or process taking or receiving the action such as User or data store.
verb - An action word that describes what was done to the subject such as data input or command output.
object - The entity receiving the actions such as database name or filename.
adjective - Further describe what action is being taken such as buy, upload or query.
result - Either success or failure
info - Free form text about the logged action such as somefile.txt or 10 records returned.
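
As a rough illustration of the six fields above, a record might be modelled like this in Python. This is a sketch only: the class names, the `obj` spelling (chosen to avoid shadowing Python's built-in `object`), and the sample value `orders_db` are assumptions, not part of the format described.

```python
from dataclasses import dataclass
from enum import Enum


class Result(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class AuditRecord:
    subject: str    # class of the actor, e.g. "User" or "data store"
    verb: str       # what was done, e.g. "data input" or "command output"
    obj: str        # entity receiving the action, e.g. a database or file name
    adjective: str  # refinement of the action, e.g. "buy", "upload", "query"
    result: Result  # success or failure only
    info: str = ""  # free-form text, e.g. "10 records returned"


# Hypothetical example record; "orders_db" is an invented database name.
record = AuditRecord("User", "data input", "orders_db", "query",
                     Result.SUCCESS, "10 records returned")
```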

Just a few thoughts on the audit format for consideration.

Thanks,
Chris


"Heinbockel, Bill" <[hidden email]>

10/24/2008 12:52 PM

Please respond to
"Heinbockel, Bill" <[hidden email]>

To
[hidden email]
cc
Subject
[CEE-DISCUSSION-LIST] Taxonomy talk






Re: Taxonomy talk

Raffael Marty-3
In reply to this post by heinbockel
On this, Bill,

I need to correlate audit events and IDS events. I need them to fall
into the same taxonomy. Otherwise my correlations are going to be just
unworkable. And it would go against the idea of an overarching
taxonomy. Having said that, I think we need to add a component to the
taxonomy:

object
action
status
devicetype

Whatever we call the latter. You have a point about the type of event.  
I used to have a node for this at ArcSight and it was a failure, but  
thinking about it, it can be used in this way to let you query and  
figure out how much you actually trust an event. It is sometimes  
useful to find resource failures on just the NIDSs or failed attacks  
reported by a HIDS. So, I think we should add this node.
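
The kind of query described here could look roughly like the Python sketch below. The event values and the `select` helper are invented for illustration; only the four node names come from the list above.

```python
# Hypothetical events using the four nodes listed above; the values are
# invented for illustration only.
events = [
    {"object": "resource", "action": "allocate", "status": "failure", "devicetype": "nids"},
    {"object": "resource", "action": "allocate", "status": "failure", "devicetype": "os"},
    {"object": "attack", "action": "attempt", "status": "failure", "devicetype": "hids"},
    {"object": "attack", "action": "attempt", "status": "success", "devicetype": "nids"},
]


def select(events, **criteria):
    """Return the events whose fields equal every keyword criterion."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


# Resource failures on just the NIDSs.
nids_resource_failures = select(events, object="resource", status="failure", devicetype="nids")
# Failed attacks reported by a HIDS.
hids_failed_attacks = select(events, object="attack", status="failure", devicetype="hids")
```

Because `devicetype` sits beside the other nodes instead of replacing them, audit and IDS events stay in one taxonomy and can still be correlated.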

Actually, while I am at it... I believe I raised this with Anton at
some point: IDSs really need a relevance or confidence measure on
their alerts. How tightly is a signature written? What's the likelihood
of false positives? And this could be added to other log types as
well. How certain are you about what you are reporting? Especially in
the security devices this would be invaluable.

I am going to shut up now...

   -raffy


Re: Taxonomy talk

Tina Bird
> Actually, while I am at it... I believe I raised this with Anton at  
> some point: IDSs really need a relevance or confidence measure to  
> their alerts. How tight is a signature written? What's the
> likelihood  
> of false positives. And this could be added to other log types as  
> well. How certain are you about what you are reporting.
> Especially in  
> the security devices this would be invaluable.
>
> I am going to shut up now...

It might be invaluable, but isn't it impossible? How can the device itself
grade its own performance?

In any event, surely this is outside the current scope...there's plenty of
existing data to make sense of before we start creating new data to deal
with  ;-)

t.

Re: Taxonomy talk

Eric Fitzgerald
I need time to grok most of Raffy's avalanche of comments today, but I wanted to chime in on this one.
 
There is a hierarchy of levels of semantic meaning in events, and at the base of this hierarchy are the events that we care about in CEE.  These events have no relevance or confidence measurement associated with them, because they are always absolutely 100% true from the point of view of the system raising the event: they just record something that the system did or saw.
 
After an event is analyzed by a system such as an IDS or even a health monitoring or management system, then the analysis engine may issue something event-like but which is at a higher semantic level (this thing might still be considered an event as far as CEE is concerned but it would be elsewhere in the taxonomy).  These higher level semantic event-like records might have a probability-type metric associated with them, but that is an application-specific parameter and probably not broadly applicable, at least not enough to include in the base schema for events.
 
I completely agree with the idea of a probability metric and in fact have a use case, but I disagree that it is part of the base event schema for all events.
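
A minimal Python sketch of that split, assuming hypothetical class and field names (`BaseEvent`, `DerivedEvent`, `probability`): the base schema has no confidence field at all, and the metric appears only on the higher-level record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseEvent:
    """Something the reporting system did or saw; carries no confidence field."""
    observer: str
    action: str
    status: str


@dataclass
class DerivedEvent(BaseEvent):
    """Event-like record issued by an analysis engine such as an IDS."""
    # Application-specific extension, deliberately absent from BaseEvent.
    probability: Optional[float] = None

    def __post_init__(self):
        if self.probability is not None and not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")


# Hypothetical examples; host and device names are invented.
login = BaseEvent(observer="host-a", action="login", status="success")
alert = DerivedEvent(observer="ids-1", action="attack", status="detected", probability=0.7)
```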
 
At least that is my opinion.  I'll be happy to expound at length on our next con call.
 
Best regards,
Eric
 

From: Tina Bird [[hidden email]]
Sent: Friday, December 05, 2008 7:36 PM
To: [hidden email]
Subject: Re: [CEE-DISCUSSION-LIST] Taxonomy talk


Re: Taxonomy talk

David Corlette
In reply to this post by Tina Bird
>>> On 05.12.2008 at 22:36, in message
<1FA584E6B2BD46AD902DB0D7CAD00A25@lindesfarne>, Tina Bird
<[hidden email]> wrote:

>>  Actually, while I am at it... I believe I raised this with Anton at  
>> some point: IDSs really need a relevance or confidence measure to  
>> their alerts. How tight is a signature written? What's the
>> likelihood  
>> of false positives. And this could be added to other log types as  
>> well. How certain are you about what you are reporting.
>> Especially in  the security devices this would be invaluable.
> It might be invaluable, but isn't it impossible? How can the device itself
> grade its own performance?
>
> In any event, surely this is outside the current scope...there's plenty of
> existing data to make sense of before we start creating new data to deal
> with  ;-)

I think this definitely is in the current scope, but you've hit on an important issue. Some devices like IDSs do create a locally-defined "relevance" or "likelihood" field, but this doesn't necessarily express how much the enterprise should worry about that event or react to it.

In the Linux world, everything that's not the kernel is essentially considered untrusted, for example.

But I think the calculation of a "business relevance" (which might incorporate all sorts of things like: "likelihood", "relevance", information about what assets were affected, trustworthiness of the event observers, etc) is something that is outside the scope of CEE, more something that a SIEM tool would calculate based on observed environmental factors.  So it's probably good enough for an event observer to report what *it* thinks, and then our analysis tools can use that as only one factor in the calculation.
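
To make the division of labour concrete, here is a small Python sketch of how an analysis tool might fold the observer's own value into a wider score. The function name, the three inputs, and the plain product are all assumptions chosen for illustration; nothing here is a proposed CEE formula.

```python
def business_relevance(reported_likelihood: float,
                       asset_value: float,
                       observer_trust: float) -> float:
    """Combine factors, each in [0, 1], into one score in [0, 1].

    The plain product is an arbitrary illustrative choice, not a
    recommended formula.
    """
    for name, value in (("reported_likelihood", reported_likelihood),
                        ("asset_value", asset_value),
                        ("observer_trust", observer_trust)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # The observer's opinion is only one factor among several.
    return reported_likelihood * asset_value * observer_trust
```

The event itself would carry only `reported_likelihood`; the other two inputs come from the environment the analysis tool knows about.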

Re: Taxonomy talk

David Corlette
In reply to this post by Eric Fitzgerald
Eric, I think you've made an assumption about where the event is generated, which is why the initiator/target/observer separation is important in XDAS.  I agree that IF the observer (that generates the event) is ALSO the target (affected by the event), then we can be sure that it's 100% accurate in reporting the effects of the activity.  But there are plenty of cases where the observer is NOT the target, as you point out with IDSs and other types of systems.  In those cases there's always a risk that the observed behavior is not what actually happened.  But I don't see those as a separate class of events.
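
A small Python sketch of the distinction, for illustration only. The three role names follow the separation mentioned above, but the class, the `is_first_hand` method, and the sample values are hypothetical and not taken from the XDAS specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedEvent:
    initiator: str  # who or what started the activity
    target: str     # what the activity affected
    observer: str   # what generated the event record
    action: str

    def is_first_hand(self) -> bool:
        """True when the observer is also the target of the activity."""
        return self.observer == self.target


# Same class of event in both cases; only the observer differs.
os_login = ObservedEvent("alice", "host-a", "host-a", "login")
ids_alert = ObservedEvent("10.0.0.5", "host-a", "nids-1", "login")
```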

>>> On 06.12.2008 at 01:19, in message
<A0392052-1559-48C7-98A1-E4D4391F1771@mimectl>, Eric Fitzgerald
<[hidden email]> wrote:


Re: Taxonomy talk

Sanford Whitehouse-2
In reply to this post by Raffael Marty-3
I think so, too.  For those not interested in the discussion, drop down
to "Summary".

When looking at the taxonomy I try to compare it to how a manual
analysis of logs would be done using the taxonomy in my head.  In the
filtering part of the process I either make an assumption about the
event context or use information I have to determine if an event is
relevant.  The filtering includes what type of product or component
generated the message, what type of message is it, does it fall into the
class of events I'm interested in, and can the message be seen more than
once.  All this before determining what the event was.  Manually, I have
to know or assume the answers.

In the case of a general taxonomy, it is best not to make assumptions.
It's also highly desirable to minimize the information that must be
provided by the vendor.  When thinking about the taxonomy structure I
ask whether enough information is provided in the message, or whether I
have to go by experience or a priori information.

There are guidelines that apply to this problem:
(Convenience definitions: System = a single identifiable source of
messages, such as a host or device.  Component = Any part of the system,
non-native product installed on the system, or component/subsystem of
either.)

+ The event is independent of the system it came from or the component
that generated it.

- From a taxonomy perspective (the description of the event) a login can
be reported from anywhere by anything.

+ The same event can be reported by multiple components on the same
system, or by multiple systems.

- The first is anything that can run on the system.  In the case of
login it can be the operating system component that supported the login,
an audit component watching the operating system, and an HIDS installed
on the same system. (Each of these can have a definition supporting why
they may exist on the same system.)

- There can be no expectation that desired components will be available
on a system.

Obviously, a system audit component may not run on a system.  The
distinctions about what can report an event must be recognized.

+ Messages can come from multiple systems not directly related.

- Such as the OS and an external IDS system.

+ There are too many products to define what one product does simply
by the name of the process, the log, some external system
classification, or other method requiring external information.

- Ideally, analysis of an event can begin as soon as the message or
messages are received.  It should not have to wait for someone to sit in
front of an analysis product and manage some type of component
classification for each message.

And finally...

+ Make decisions about the message as quickly as possible.

- Given the mantra about having to analyze ba-zillions of
messages/second, speed can be improved by reducing and simplifying the
set of tests used to filter and determine relevance.

Summary

All of this adds up to splitting the taxonomy into two parts: the event
context and the event description.  (There are actually four or five
pieces to any event.  I'll list those after.)

Event context contains information about what reported the event.  It
should have:

+ Event source name:  The process or other component that generated the
message.  (This might be considered convenience information.  I tend to
use it to help with dealing with external information that can't be
avoided.)

+ Event source type:  What does it do?  Security, auditing, user
management, etc.  This allows for distinctions between multiple
components on the same system.  The alternative is to know that auditd
and useradd may report on the same event.

+ Event class (or function): What functional area did the event occur
in?  A login may have an event source type of "user management",
"audit", or "NIDS".  The class may be "system access".  Pick which you
want to use.

+ Event type:  These are application, administration, and operation.
From these it is possible to know the perspective one should take on
the event.  Did someone change something, is the product working well,
or is the event something the product did?
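
The four context fields could be carried as a small structure like the Python sketch below. The class and attribute names are hypothetical, the mapping of the three event types to the three questions is one reading of the text, and the "account management" class value is invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    APPLICATION = "application"        # something the product did
    ADMINISTRATION = "administration"  # someone changed something
    OPERATION = "operation"            # how well the product is working


@dataclass(frozen=True)
class EventContext:
    source_name: str       # process or component that generated the message
    source_type: str       # what the source does, e.g. "audit"
    event_class: str       # functional area in which the event occurred
    event_type: EventType  # perspective to take on the event


# The same account change reported by two components on one system.
from_auditd = EventContext("auditd", "audit", "account management",
                           EventType.ADMINISTRATION)
from_useradd = EventContext("useradd", "user management", "account management",
                            EventType.ADMINISTRATION)
```

Here the two reports share a class and type but differ in source type, so a consumer can tell them apart without knowing in advance that auditd and useradd overlap.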

From these I can do the following:

+ Filter which messages are useful when reported by multiple components
on the same system.

+ Quickly filter desired messages by avoiding or minimizing tests of
the event description.

- This includes determining analysis to be done (op, admin, app) earlier
in message processing.

+ Assist with correlating event messages reported by more than one system.

+ Combine information when reported by multiple components or systems
for a larger picture.
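
As a rough Python sketch of the uses listed above: messages are selected on context fields alone, then reports of the same event class are grouped per system. The message layout, field names, and sample values are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical messages: context fields plus an unparsed description.
messages = [
    {"system": "host-a", "source_type": "audit", "event_class": "system access",
     "event_type": "application", "description": "login user=alice"},
    {"system": "host-a", "source_type": "hids", "event_class": "system access",
     "event_type": "application", "description": "login user=alice"},
    {"system": "host-a", "source_type": "hids", "event_class": "system health",
     "event_type": "operation", "description": "agent restarted"},
]


def prefilter(messages, event_class, event_type):
    """Select on context fields only; descriptions are never inspected."""
    return [m for m in messages
            if m["event_class"] == event_class and m["event_type"] == event_type]


def group_by_system(messages):
    """List which component types reported each event class on each system."""
    grouped = defaultdict(list)
    for m in messages:
        grouped[(m["system"], m["event_class"])].append(m["source_type"])
    return dict(grouped)


access = prefilter(messages, "system access", "application")
reporters = group_by_system(access)
```

The description strings are left untouched until after filtering, which is where the speed gain would come from.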

The first part of any analysis is filtering the messages in the
direction of desired analysis and use.  The fields assist both the log
analysis product and the administrator in efficiency and focus.

To accomplish the same points without event context will require a
method to add the same information.  Adding it will require analyzing
each message and message source, and the existence of a classification
system.  These would be defined on a vendor/user and product case-by-case
basis, and may be ad hoc.

Without the fields, analysis and filtering would be roughly equal to
doing a person's address/phone lookup on the web using just the
person's name.  There can be any number of matches.  Filtering is done by
reading the results and selecting the best fit based on external
criteria.  Each time the best-fit criteria change, such as where the
person lives, the process has to be done again.  If the results don't
match the criteria, the criteria must be modified, additional
information must be supplied, or it must be accepted that no suitable
result exists.  Analysis is modified to support it, essentially creating
an exception that must be managed.

Underlying use cases for an event standard must include the dynamic IT
environment, the ongoing changes to products, and the volume and rate of
messages.  To do that, manageability and analysis should be considered.
I think the additional fields will help with that.

As usual, apologies for the long email.

Sanford


Re: Taxonomy talk

Sanford Whitehouse-2
In reply to this post by Raffael Marty-3
Wrt relevance: what confidence would there be in a vendor-supplied
value?  At best it is a starting or reference point later refined by the
vendor and customer.  If it's not a solid value, how would its presence
in an alert be evaluated over time?

Over time means as new or modified event characteristics are seen, and
as the vendor makes adjustments to the signature or releases changes to
the IDS product.

The former can only be determined over time and adapted to.  Addressing
the latter requires information from the vendor, such as product or
signature version, and correlation with the installed IDS base.
Otherwise, effective use of adjusted values will fail.  Immediate use of
new product/signatures will invalidate current adjusted values.
Applying the values across time, such as applying them to archived
alerts, will produce interesting results without saving the adjusted
values at archive time.

All of that complexity is, of course, tied to confidence in the initial
number, practical estimates of its effectiveness over time and the
ability to adapt, and the level of cost in dealing with issues that arise
from confidence, positive and negative, that does not match reality.

My preference would be more cross-system correlation.  Trust, but verify.

Sanford


Re: Taxonomy talk

Sanford Whitehouse-2
In reply to this post by Sanford Whitehouse-2
I was going to add the four/five pieces of an event/message to this.
I'll keep it quick.

These exist for all messages, even if not included in the message
itself.  They exist because any use of an event message must consider
them when managing or using the message or event.

Behind each of these is an assumption that they are separate from each other.
Basically, any information that exists in one piece can exist for any
set of information in the other pieces.  A port number in one part of
the message can be present regardless of where it came from, what
product it applies to, and the event it is related to.  This
independence allows focus on the characteristics of each piece without
giving up completeness or sacrificing use and meaning.

1.  System context

The obvious information about where and when the event occurred. (When
could be another piece.  I'm leaving it here for now.)  It also includes
information that allows one to make use of external information.  This
might be the OS type, version, etc.

2.  Log source context

Where the record of the event came from.  Necessary when an event can be
recorded in a product or system log and sent as an alert, or when
multiple collection methods pull the information from the same log but
report separately.

3.  Event context

The source and information about the source that allows clear
understanding of what the product was doing when the event occurred.

4.  Event description

I think this is what is currently viewed as the taxonomy.

5.  Event detail

The ports, addresses, users, devices, and so on.
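
To make the five pieces concrete, here is a minimal Python sketch of a
record that keeps them separate.  All class and field names are
placeholders invented for illustration, not anything agreed for CEE, and
the duplicate check is only one guess at how the log source context might
be used to recognize the same event arriving by two routes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SystemContext:
    where: str                        # host or device the event occurred on
    when: str                         # kept here for now, as in the list above
    os_type: Optional[str] = None     # hooks for external information
    os_version: Optional[str] = None


@dataclass
class LogSourceContext:
    origin: str                       # e.g. "product log", "system log", "alert"


@dataclass
class EventContext:
    source: str                       # what produced the event (hypothetical field)
    activity: str                     # what the product was doing at the time


@dataclass
class EventMessage:
    system: SystemContext
    log_source: LogSourceContext
    context: EventContext
    description: str                  # the part currently viewed as the taxonomy
    detail: Dict[str, str] = field(default_factory=dict)


def same_event(a: EventMessage, b: EventMessage) -> bool:
    """True when two records differ at most in their log source context."""
    return (
        a.system == b.system
        and a.context == b.context
        and a.description == b.description
        and a.detail == b.detail
    )


# One login, recorded in the system log and also sent as an alert.
login = dict(
    system=SystemContext(where="10.0.0.5", when="2008-12-06T15:39:00Z"),
    context=EventContext(source="sshd", activity="authenticating a user"),
    description="login success",
    detail={"user": "alice", "port": "22"},
)
from_syslog = EventMessage(log_source=LogSourceContext("system log"), **login)
from_alert = EventMessage(log_source=LogSourceContext("alert"), **login)
```

Here same_event(from_syslog, from_alert) holds even though the two
records are not equal, because only the log source context differs.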

Some of the information is found outside the message, such as through
collection.  Generally the system IP address is used to convey where the
event happened.  Other information would be necessary if TCP/IP weren't
available as a foundation or weren't reliable.

Though the information is necessary it does not need to come from the
same place.  For example, system context can be a separate message
included at start up or periodically while the system is running.

The remainder must be included, in some form.  The log source context
enables analysis for uniqueness of a message.  It's not always possible
to use only a product log, system log, or alert.  System logs contain
general events from many products.  A product log can contain the event,
as well as other events.  An alert may contain additional information
about an event, but is not complete.  Any of the logs may be combined
for completeness.

The standard may define where and what to log to help prevent this.

Sanford


Sanford Whitehouse wrote:

> I think so, too.  For those not interested in the discussion, drop down
> to "Summary".
>
> When looking at the taxonomy I try to compare it to how a manual
> analysis of logs would be done using the taxonomy in my head.  In the
> filtering part of the process I either make an assumption about the
> event context or use information I have to determine if an event is
> relevant.  The filtering includes what type of product or component
> generated the message, what type of message is it, does it fall into the
> class of events I'm interested in, and can the message be seen more than
> once.  All this before determining what the event was.  Manually, I have
> to know or assume the answers.
>
> In the case of a general taxonomy, it is best not to make assumptions.
> It's also highly desirable to minimize the information that must be
> provided by the vendor.  When thinking about the taxonomy structure I
> ask if enough information is provided in the message, or I have to go by
> experience or a priori information.
>
> There are guidelines that apply to this problem:
> (Convenience definitions: System = a single identifiable source of
> messages, such as a host or device.  Component = Any part of the system,
> non-native product installed on the system, or component/subsystem of
> either.)
>
> + The event is independent of the system it came from or the component
> that generated it.
>
> - From a taxonomy perspective (the description of the event) a login can
> be reported from anywhere by anything.
>
> + The same event can be reported by multiple components on the same
> system, or by multiple systems.
>
> - The first is anything that can run on the system.  In the case of
> login it can be the operating system component that supported the login,
> an audit component watching the operating system, and an HIDS installed
> on the same system. (Each of these can have a definition supporting why
> they may exist on the same system.)
>
> - There can be no expectation that desired components will be available
> on a system.
>
> Obviously, a system audit component may not run on a system.  The
> distinctions about what can report an event must be recognized.
>
> + Messages can come from multiple systems not directly related.
>
> - Such as the OS and an external IDS system.
>
> + There are too many products to define what one product does simply
> by the name of the process, the log, some external system
> classification, or other method requiring external information.
>
> - Ideally, analysis of an event can begin as soon as the message or
> messages are received.  It should not have to wait for someone to sit in
> front of an analysis product and manage some type of component
> classification for each message.
>
> And finally...
>
> + Make decisions about the message as quickly as possible.
>
> - Given the mantra about having to analyze ba-zillions of
> messages/second, speed can be improved by reducing and simplifying the
> set of tests used to filter and determine relevance.
>
> Summary
>
> All of this adds up to splitting the taxonomy into two parts; the event
> context and the event description.  (There are actually four or five
> pieces to any event.  I'll list those after.)
>
> Event context contains information about what reported the event.  It
> should have:
>
> + Event source name:  The process or other component that generated the
> message.  (This might be considered convenience information.  I tend to
> use it to help with dealing with external information that can't be
> avoided.)
>
> + Event source type:  What does it do?  Security, auditing, user
> management, etc.  This allows for distinctions between multiple
> components on the same system.  The alternative is to know that auditd
> and useradd may report on the same event.
>
> + Event class (or function): What functional area did the event occur
> in?       A login may have an event source type of "user management",
> "audit", and "NIDS".  The class may be "system access".  Pick which you
> want to use.
>
> + Event type:  These are application, administration, and operation.
>  From these it is possible to know the perspective one should take on
> the event.  Did someone change something, is the product working well,
> or is the event something the product did.
>
>  From these I can do the following:
>
> + Filter which messages are useful when reported by multiple components
> on the same system.
>
> + Quickly filter desired messages by avoiding or minimizing tests of the
> event description.
>
> - This includes determining analysis to be done (op, admin, app) earlier
> in message processing.
>
> + Assist with correlating event messages reported by more than one system.
>
> + Combine information when reported by multiple components or systems
> for a larger picture.
>
> The first part of any analysis is filtering the messages in the
> direction of desired analysis and use.  The fields assist both the log
> analysis product and the administrator in efficiency and focus.
>
> To accomplish the same points without event context will require a
> method to add the same information.  Adding it will require analyzing
> each message and message source, and the existence of a classification
> system.  This would be defined on a vendor/user and product case-by-case
> basis, and may be ad hoc.
>
> Without the fields, analysis and filtering would be roughly equal to
> doing a person's address/phone lookup on the web and just using the
> person's name.  There can be any number of matches.  Filtering is done by
> reading the results and selecting the best fit based on external
> criteria.  Each time the best fit criteria change, such as where the
> person lives, the process has to be done again.  If the results don't
> match the criteria, the criteria must be modified, additional
> information must be supplied, or it must be accepted that no suitable
> result exists.  Analysis is modified to support it, essentially creating
> an exception that must be managed.
>
> Underlying use cases for an event standard must include the dynamic IT
> environment, the ongoing changes to products, and the volume and rate of
> messages.  To do that, manageability and analysis should be considered.
>  I think the additional fields will help with that.
>
> As usual, apologies for the long email.
>
> Sanford
>
> Raffael Marty wrote:
>> On this, Bill,
>>
>> I need to correlate audit events and IDS events. I need them to fall
>> into the same taxonomy. Otherwise my correlations are going to be just
>> unworkable. And it would go against the idea of an overarching
>> taxonomy. Having said that, I think we need to add a component to the
>> taxonomy:
>>
>> object
>> action
>> status
>> devicetype
>>
>> Whatever we call the latter. You have a point about the type of event.
>> I used to have a node for this at ArcSight and it was a failure, but
>> thinking about it, it can be used in this way to let you query and
>> figure out how much you actually trust an event. It is sometimes
>> useful to find resource failures on just the NIDSs or failed attacks
>> reported by a HIDS. So, I think we should add this node.
>>
>> Actually, while I am at it... I believe I raised this with Anton at
>> some point: IDSs really need a relevance or confidence measure to
>> their alerts. How tight is a signature written? What's the likelihood
>> of false positives. And this could be added to other log types as
>> well. How certain are you about what you are reporting. Especially in
>> the security devices this would be invaluable.
>>
>> I am going to shut up now...
>>
>>   -raffy

Re: Taxonomy talk

Eric Fitzgerald
In reply to this post by David Corlette
No, I've made no such assumption.  Let me restate my argument.

When you start talking about "remote observers" you are not being clear that you are mixing two use cases.  I will differentiate between them.

Use case #1 is "remote instrumentation".  My canonical examples are SCADA- or network-management type equipment, or one of my favorites, NMEA instrumentation.  These systems observe and record what they observe without translation or interpretation (we can argue semantics; I'm sure there are use cases where there is translation/interpretation of the data in such systems but that falls into my next use case).

Use case #2 is "CIDF A-Box".  My canonical example of this is the classic IDS.  These systems observe activity that happened somewhere else (for example, traffic mirrored to a monitoring port on a Cisco switch).  These systems attempt to interpret their observations and their output is an analysis of their input.

I would love to have this discussion in person as it really is an information theory discussion.  My argument is that events from use case #2 occupy a different semantic level than events from use case #1.  Events from use case #2 have a characteristic that they have a "probability" or "accuracy" (or insert your term here) associated with them.  Events from use case #1 do not (or trivially, their value is "1" as there is no chance of having misunderstood the observed phenomenon since the instrumentation is not trying to understand it).

Therefore my argument is that "probability" or whatever we want to call it is not a global property that applies to all events (like timestamp, or taxon, etc.), but is use case/application domain specific.
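
As a rough illustration of that distinction, here is a Python sketch in which timestamp and taxon are global properties while "probability" exists only on the analysis-type event.  The class names, field names, and taxon strings are hypothetical, chosen only for this example, and nothing here is a proposed CEE schema.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: str   # global property: applies to every event
    taxon: str       # global property: applies to every event


@dataclass
class InstrumentationEvent(Event):
    """Use case #1: a raw observation, recorded without interpretation."""
    reading: str


@dataclass
class AnalysisEvent(Event):
    """Use case #2: the output of an analysis of observed activity."""
    conclusion: str
    probability: float   # domain-specific: only meaningful for this use case

    def __post_init__(self) -> None:
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")


def probability_of(event: Event) -> float:
    """Return the stated probability, or the trivial value 1 if none applies."""
    if isinstance(event, AnalysisEvent):
        return event.probability
    return 1.0


depth = InstrumentationEvent("2008-12-06T12:00:00Z", "sensor.reading", "depth=4.2m")
alert = AnalysisEvent("2008-12-06T12:00:01Z", "ids.alert", "possible port scan", 0.7)
```

A consumer that wants a single number can still call probability_of() on anything, but the field itself is never forced onto events that have no use for it.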

I'm not arguing that there is no such thing as "accuracy" or that IDS don't output events; I'm arguing that these are different use cases.

Best regards,
Eric

-----Original Message-----
From: David Corlette [mailto:[hidden email]]
Sent: Saturday, December 06, 2008 12:12 PM
To: [hidden email]
Subject: Re: [CEE-DISCUSSION-LIST] Taxonomy talk

Eric, I think you've made an assumption about where the event is generated, which is why the initiator/target/observer separation is important in XDAS.  I agree that IF the observer (that generates the event) is ALSO the target (affected by the event), then we can be sure that it's 100% accurate in reporting the effects of the activity.  But there are plenty of cases where the observer is NOT the target, as you point out with IDSs and other types of systems.  In those cases there's always a risk that the observed behavior is not what actually happened.  But I don't see those as a separate class of events.

Re: Taxonomy talk

Eric Fitzgerald
In reply to this post by Sanford Whitehouse-2
Hey Sanford,

I agree with all your arguments and reasoning below.  We can debate implementation and terminology but conceptually I agree.

I have only two questions:

Q1: Would it be possible to encode your four classification labels into the taxon?
+ Event source name
+ Event source type
+ Event class
+ Event type

Q2: Would it be desirable to do so?

Best regards,
Eric



-----Original Message-----
From: Sanford Whitehouse [mailto:[hidden email]]
Sent: Saturday, December 06, 2008 3:39 PM
To: [hidden email]
Subject: Re: [CEE-DISCUSSION-LIST] Taxonomy talk

I think so, too.  For those not interested in the discussion, drop down
to "Summary".

When looking at the taxonomy I try to compare it to how a manual
analysis of logs would be done using the taxonomy in my head.  In the
filtering part of the process I either make an assumption about the
event context or use information I have to determine if an event is
relevant.  The filtering includes what type of product or component
generated the message, what type of message is it, does it fall into the
class of events I'm interested in, and can the message be seen more than
once.  All this before determining what the event was.  Manually, I have
to know or assume the answers.

In the case of a general taxonomy, it is best not to make assumptions.
It's also highly desirable to minimize the information that must be
provided by the vendor.  When thinking about the taxonomy structure I
ask if enough information is provided in the message, or I have to go by
experience or a priori information.

There are guidelines that apply to this problem:
(Convenience definitions: System = a single identifiable source of
messages, such as a host or device.  Component = Any part of the system,
non-native product installed on the system, or component/subsystem of
either.)

+ The event is independent of the system it came from or the component
that generated it.

- From a taxonomy perspective (the description of the event) a login can
be reported from anywhere by anything.

+ The same event can be reported by multiple components on the same
system, or by multiple systems.

- The first is anything that can run on the system.  In the case of
login it can be the operating system component that supported the login,
an audit component watching the operating system, and an HIDS installed
on the same system. (Each of these can have a definition supporting why
they may exist on the same system.)

- There can be no expectation that desired components will be available
on a system.

Obviously, a system audit component may not run on a system.  The
distinctions about what can report an event must be recognized.

+ Messages can come from multiple systems not directly related.

- Such as the OS and an external IDS system.

+ There are too many products to define what one product does simply
by the name of the process, the log, some external system
classification, or other method requiring external information.

- Ideally, analysis of an event can begin as soon as the message or
messages are received.  It should not have to wait for someone to sit in
front of an analysis product and manage some type of component
classification for each message.

And finally...

+ Make decisions about the message as quickly as possible.

- Given the mantra about having to analyze ba-zillions of
messages/second, speed can be improved by reducing and simplifying the
set of tests used to filter and determine relevance.

Summary

All of this adds up to splitting the taxonomy into two parts; the event
context and the event description.  (There are actually four or five
pieces to any event.  I'll list those after.)

Event context contains information about what reported the event.  It
should have:

+ Event source name:  The process or other component that generated the
message.  (This might be considered convenience information.  I tend to
use it to help with dealing with external information that can't be
avoided.)

+ Event source type:  What does it do?  Security, auditing, user
management, etc.  This allows for distinctions between multiple
components on the same system.  The alternative is to know that auditd
and useradd may report on the same event.

+ Event class (or function): What functional area did the event occur
in?       A login may have an event source type of "user management",
"audit", and "NIDS".  The class may be "system access".  Pick which you
want to use.

+ Event type:  These are application, administration, and operation.
 From these it is possible to know the perspective one should take on
the event.  Did someone change something, is the product working well,
or is the event something the product did.

 From these I can do the following:

+ Filter which messages are useful when reported by multiple components
on the same system.

+ Quickly filter desired messages by avoiding or minimizing tests of the
event description.

- This includes determining analysis to be done (op, admin, app) earlier
in message processing.

+ Assist with correlating event messages reported by more than one system.

+ Combine information when reported by multiple components or systems
for a larger picture.
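
A minimal sketch of that kind of filtering, assuming the four event
context fields are carried as plain labels on each message.  The class
names, the label values, and the example sources other than auditd are
made up for illustration.  The point is only that the cheap context
tests run before anything looks at the event description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator, Optional


class EventType(Enum):
    APPLICATION = "application"
    ADMINISTRATION = "administration"
    OPERATION = "operation"


@dataclass(frozen=True)
class EventContext:
    source_name: str      # process or component that generated the message
    source_type: str      # what it does: "audit", "user management", ...
    event_class: str      # functional area, e.g. "system access"
    event_type: EventType


@dataclass(frozen=True)
class Message:
    system: str
    context: EventContext
    description: str      # only examined after the context filter passes


def filter_by_context(
    messages: Iterable[Message],
    source_type: Optional[str] = None,
    event_class: Optional[str] = None,
    event_type: Optional[EventType] = None,
) -> Iterator[Message]:
    """Yield messages whose context matches every criterion that was given."""
    for message in messages:
        ctx = message.context
        if source_type is not None and ctx.source_type != source_type:
            continue
        if event_class is not None and ctx.event_class != event_class:
            continue
        if event_type is not None and ctx.event_type != event_type:
            continue
        yield message


# The same login reported by three components on one system, plus an
# unrelated administrative change reported by the audit component.
messages = [
    Message("host-a", EventContext("login", "user management", "system access",
                                   EventType.APPLICATION), "login success"),
    Message("host-a", EventContext("auditd", "audit", "system access",
                                   EventType.APPLICATION), "login success"),
    Message("host-a", EventContext("hids-agent", "HIDS", "system access",
                                   EventType.APPLICATION), "login success"),
    Message("host-a", EventContext("auditd", "audit", "configuration",
                                   EventType.ADMINISTRATION), "config modify"),
]

audit_logins = list(
    filter_by_context(messages, source_type="audit", event_class="system access")
)
```

With these inputs audit_logins holds just the auditd copy of the login,
so the duplicate reports from the other two components never reach the
description tests.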

The first part of any analysis is filtering the messages in the
direction of desired analysis and use.  The fields assist both the log
analysis product and the administrator in efficiency and focus.

To accomplish the same points without event context will require a
method to add the same information.  Adding it will require analyzing
each message and message source, and the existence of a classification
system.  This would be defined on a vendor/user and product case-by-case
basis, and may be ad hoc.

Without the fields, analysis and filtering would be roughly equal to
doing a person's address/phone lookup on the web and just using the
person's name.  There can be any number of matches.  Filtering is done by
reading the results and selecting the best fit based on external
criteria.  Each time the best fit criteria change, such as where the
person lives, the process has to be done again.  If the results don't
match the criteria, the criteria must be modified, additional
information must be supplied, or it must be accepted that no suitable
result exists.  Analysis is modified to support it, essentially creating
an exception that must be managed.

Underlying use cases for an event standard must include the dynamic IT
environment, the ongoing changes to products, and the volume and rate of
messages.  To do that, manageability and analysis should be considered.
  I think the additional fields will help with that.

As usual, apologies for the long email.

Sanford

Raffael Marty wrote:

> On this, Bill,
>
> I need to correlate audit events and IDS events. I need them to fall
> into the same taxonomy. Otherwise my correlations are going to be just
> unworkable. And it would go against the idea of an overarching taxonomy.
> Having said that, I think we need to add a component to the taxonomy:
>
> object
> action
> status
> devicetype
>
> Whatever we call the latter. You have a point about the type of event. I
> used to have a node for this at ArcSight and it was a failure, but
> thinking about it, it can be used in this way to let you query and
> figure out how much you actually trust an event. It is sometimes useful
> to find resource failures on just the NIDSs or failed attacks reported
> by a HIDS. So, I think we should add this node.
>
> Actually, while I am at it... I believe I raised this with Anton at some
> point: IDSs really need a relevance or confidence measure to their
> alerts. How tight is a signature written? What's the likelihood of false
> positives. And this could be added to other log types as well. How
> certain are you about what you are reporting. Especially in the security
> devices this would be invaluable.
>
> I am going to shut up now...
>
>   -raffy

Re: Taxonomy talk

Sanford Whitehouse-2
Hey Eric,

Thanks for reading through it.  I'm not fixed on terminology.  It's a
slippery thing for me.  Gets in the way at times. :)

Other responses in line.

Sanford

Eric Fitzgerald wrote:
> Hey Sanford,
>
> I agree with all your arguments and reasoning below.  We can debate implementation and terminology but conceptually I agree.
>
> I have only two questions:
>
> Q1: Would it be possible to encode your four classification labels into the taxon?
> + Event source name
> + Event source type
> + Event class
> + Event type

I'm sure it can be done.  Is there a description of the current
taxon/taxonomy?
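
Without that description to hand, here is one purely hypothetical way the
four labels could ride along in a single taxon string.  The "/" separator,
the field order, and the function names are assumptions made up for this
sketch and are not taken from any CEE draft.

```python
from typing import NamedTuple, Tuple

SEPARATOR = "/"   # assumed delimiter; any character barred from labels would do


class ContextLabels(NamedTuple):
    source_name: str
    source_type: str
    event_class: str
    event_type: str


def encode_taxon(labels: ContextLabels, description: str) -> str:
    """Join the four context labels and the event description into one string."""
    parts = [*labels, description]
    for part in parts:
        if not part or SEPARATOR in part:
            raise ValueError(f"label {part!r} is empty or contains {SEPARATOR!r}")
    return SEPARATOR.join(parts)


def decode_taxon(taxon: str) -> Tuple[ContextLabels, str]:
    """Split an encoded taxon back into context labels and description."""
    parts = taxon.split(SEPARATOR)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return ContextLabels(*parts[:4]), parts[4]


labels = ContextLabels("auditd", "audit", "system access", "application")
taxon = encode_taxon(labels, "login.success")
```

Decoding the result gives back the same labels and description, so the
encoding is possible.  What it costs is length in every message.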

>
> Q2: Would it be desirable to do so?
>

I believe so, at the risk of becoming complicated and cumbersome.  Has
there been discussion around the practical contents of a message?
Practical in terms of complexity and size, and alternatives for
necessary information that exceeds those limits.  This is a concern without knowing
the current direction.

Sanford

> Best regards,
> Eric
>
>
>
> -----Original Message-----
> From: Sanford Whitehouse [mailto:[hidden email]]
> Sent: Saturday, December 06, 2008 3:39 PM
> To: [hidden email]
> Subject: Re: [CEE-DISCUSSION-LIST] Taxonomy talk
>
> I think so, too.  For those not interested in the discussion, drop down
> to "Summary".
>
> When looking at the taxonomy I try to compare it to how a manual
> analysis of logs would be done using the taxonomy in my head.  In the
> filtering part of the process I either make an assumption about the
> event context or use information I have to determine if an event is
> relevant.  The filtering includes what type of product or component
> generated the message, what type of message is it, does it fall into the
> class of events I'm interested in, and can the message be seen more than
> once.  All this before determining what the event was.  Manually, I have
> to know or assume the answers.
>
> In the case of a general taxonomy, it is best not to make assumptions.
> It's also highly desirable to minimize the information that must be
> provided by the vendor.  When thinking about the taxonomy structure I
> ask if enough information is provided in the message, or I have to go by
> experience or a priori information.
>
> There are guidelines that apply to this problem:
> (Convenience definitions: System = a single identifiable source of
> messages, such as a host or device.  Component = Any part of the system,
> non-native product installed on the system, or component/subsystem of
> either.)
>
> + The event is independent of the system it came from or the component
> that generated it.
>
> - From a taxonomy perspective (the description of the event) a login can
> be reported from anywhere by anything.
>
> + The same event can be reported by multiple components on the same
> system, or by multiple systems.
>
> - The first is anything that can run on the system.  In the case of
> login it can be the operating system component that supported the login,
> an audit component watching the operating system, and an HIDS installed
> on the same system. (Each of these can have a definition supporting why
> they may exist on the same system.)
>
> - There can be no expectation that desired components will be available
> on a system.
>
> Obviously, a system audit component may not run on a system.  The
> distinctions about what can report an event must be recognized.
>
> + Messages can come from multiple systems not directly related.
>
> - Such as the OS and an external IDS system.
>
> + There are too many products to define what one product does simply
> by the name of the process, the log, some external system
> classification, or other method requiring external information.
>
> - Ideally, analysis of an event can begin as soon as the message or
> messages are received.  It should not have to wait for someone to sit in
> front of an analysis product and manage some type of component
> classification for each message.
>
> And finally...
>
> + Make decisions about the message as quickly as possible.
>
> - Given the mantra about having to analyze ba-zillions of
> messages/second, speed can be improved by reducing and simplifying the
> set of tests used to filter and determine relevance.
>
> Summary
>
> All of this adds up to splitting the taxonomy into two parts; the event
> context and the event description.  (There are actually four or five
> pieces to any event.  I'll list those after.)
>
> Event context contains information about what reported the event.  It
> should have:
>
> + Event source name:  The process or other component that generated the
> message.  (This might be considered convenience information.  I tend to
> use it to help with dealing with external information that can't be
> avoided.)
>
> + Event source type:  What does it do?  Security, auditing, user
> management, etc.  This allows for distinctions between multiple
> components on the same system.  The alternative is to know that auditd
> and useradd may report on the same event.
>
> + Event class (or function): What functional area did the event occur
> in?       A login may have an event source type of "user management",
> "audit", and "NIDS".  The class may be "system access".  Pick which you
> want to use.
>
> + Event type:  These are application, administration, and operation.
>  From these it is possible to know the perspective one should take on
> the event.  Did someone change something, is the product working well,
> or is the event something the product did.
>
>  From these I can do the following:
>
> + Filter which messages are useful when reported by multiple components
> on the same system.
>
> + Quickly filter desired messages by avoiding or minimizing tests of the
> event description.
>
> - This includes determining analysis to be done (op, admin, app) earlier
> in message processing.
>
> + Assist with correlating event messages reported by more than one system.
>
> + Combine information when reported by multiple components or systems
> for a larger picture.
>
> The first part of any analysis is filtering the messages in the
> direction of desired analysis an use.  The fields assist both the log
> analysis product and the administrator in efficiency and focus.
>
> To accomplish the same points without event context will require a
> method to add the same information.  Adding it will require analyzing
> each message and message source, and the existence of a classification
> system.  That system would be defined on a vendor/user and product
> case-by-case basis, and may be ad hoc.
>
> Without the fields, analysis and filtering would be roughly equal to
> doing a person's address/phone lookup on the web using just the
> person's name.  There can be any number of matches.  Filtering is done by
> reading the results and selecting the best fit based on external
> criteria.  Each time the best-fit criteria change, such as where the
> person lives, the process has to be done again.  If the results don't
> match the criteria, the criteria must be modified, additional
> information must be supplied, or it must be accepted that no suitable
> result exists.  Analysis is modified to support it, essentially creating
> an exception that must be managed.
>
> Underlying use cases for an event standard must include the dynamic IT
> environment, the ongoing changes to products, and the volume and rate of
> messages.  To do that, manageability and analysis should be considered.
>   I think the additional fields will help with that.
>
> As usual, apologies for the long email.
>
> Sanford
>
> Raffael Marty wrote:
>> On this, Bill,
>>
>> I need to correlate audit events and IDS events. I need them to fall
>> into the same taxonomy. Otherwise my correlations are going to be just
>> unworkable. And it would go against the idea of an overarching taxonomy.
>> Having said that, I think we need to add a component to the taxonomy:
>>
>> object
>> action
>> status
>> devicetype
>>
>> Whatever we call the latter. You have a point about the type of event. I
>> used to have a node for this at ArcSight and it was a failure, but
>> thinking about it, it can be used in this way to let you query and
>> figure out how much you actually trust an event. It is sometimes useful
>> to find resource failures on just the NIDSs or failed attacks reported
>> by a HIDS. So, I think we should add this node.
>>
>> Actually, while I am at it... I believe I raised this with Anton at some
>> point: IDSs really need a relevance or confidence measure for their
>> alerts. How tightly is a signature written? What's the likelihood of false
>> positives? And this could be added to other log types as well. How
>> certain are you about what you are reporting? Especially in the security
>> devices this would be invaluable.
>>
>> I am going to shut up now...
>>
>>   -raffy
>>
>> On Oct 24, 2008, at 9:52 AM, Heinbockel, Bill wrote:
>>
>>> After talking with multiple end users and spending
>>> some time trying to organize my thoughts, I am
>>> looking for some opinions on the CEE Taxonomy.
>>>
>>> From the beginning, all of the talk has been surrounding
>>> a subject-verb-object-result model for describing the
>>> type of event being logged.
>>>
>>> While I still think there is validity to this approach,
>>> we tend to approach the log space from multiple perspectives.
>>> For example, IDS events: a passive device sends an alert
>>> about traffic matching a signature. We conceptually think
>>> about this differently than audit logs (e.g. OS). Shouldn't
>>> an event taxonomy reflect this?
>>>
>>> At the pure taxonomic categorization, there might not be
>>> much difference. However, if we start looking at syntax and
>>> implementation, we expect to see different types of data
>>> based on the event type. Therefore, it seems to make sense
>>> to introduce another taxonomy layer.
>>>
>>> What I am currently thinking is something like:
>>>
>>> - Signature/Rule match event
>>>     Subject: IDS, HIDS, IPS, Firewall, A/V
>>>     Verb: allowed, blocked, quarantined, removed
>>>     Object: Packet, file
>>>     Result: success, fail
>>>
>>> - Audit (maybe break down further?)
>>>     Subject: user, file, host, application, service
>>>     Verb: add, delete, modify, start, stop
>>>     Obj: account, password, config, service
>>>
>>> - Web
>>>     ...
>>> - E-mail
>>>     ...
>>> - DHCP
>>> - NAC
>>> - NAT
>>>
>>>
>>> Instead of adding everything to one top level "event"
>>> category, it seems easier to break the event space into
>>> some logical partitions. I think that this would make
>>> discussion, support, and implementation a lot easier and
>>> more straightforward.
>>>
>>> Trying to describe the log universe in a single taxonomic
>>> manner seems too unwieldy. When we start talking about data
>>> types, it doesn't make sense to treat OS-level events the same
>>> as IDS or DHCP events.
>>>
>>> What are your opinions?
>>> Does something like this make sense?
>>>
>>>
>>> William Heinbockel
>>> Infosec Engineer, Sr.
>>> The MITRE Corporation
>>> 202 Burlington Rd. MS S145
>>> Bedford, MA 01730
>>> [hidden email]
>>> 781-271-2615
>>>
>>>
>
>

Re: Taxonomy talk

David Corlette
In reply to this post by Sanford Whitehouse-2
Hi Sanford,

We call this the "Observer" in the XDAS data model, and consider it to be distinct (in some cases) from the user/host/service that actually caused the event to occur.

>>> On 06.12.2008 at 18:39, in message <[hidden email]>, Sanford
Whitehouse <[hidden email]> wrote:

> Event context contains information about what reported the event.  It
> should have:
>
> + Event source name....

Re: Taxonomy talk

David Corlette
In reply to this post by Eric Fitzgerald
OK, I think we understand each other then.  I guess I would argue that even though these two classes of events might represent different semantic levels, I don't see any reason why they couldn't be contained within a single data model.  That data model might include optional fields like "accuracy" which are only relevant to certain types of message.  I guess we'll see if this works.
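
A minimal sketch of the single-data-model idea, in Python. This is not the XDAS schema: the class name `Event`, the `action` field, and the 0.0-1.0 range for `accuracy` are assumptions made for illustration. The initiator/target/observer split follows the terms used in this thread. Both a direct OS report and a third-party IDS observation fit the same record; only the second fills in the optional field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One record shape for direct reports and third-party observations."""
    initiator: str                    # who or what caused the activity
    target: str                       # what the activity affected
    observer: str                     # what generated the event record
    action: str                       # hypothetical free-form action label
    accuracy: Optional[float] = None  # hypothetical optional field, 0.0-1.0

    def observed_by_target(self) -> bool:
        # True when the system reporting the event is also the one affected.
        return self.observer == self.target


# An OS reporting its own login: no accuracy value is needed.
os_login = Event(initiator="alice", target="host1", observer="host1",
                 action="login")

# A network IDS reporting what it believes happened on host1.
ids_alert = Event(initiator="10.0.0.5", target="host1", observer="nids1",
                  action="login", accuracy=0.7)
```

Consumers that do not care about accuracy can ignore the field, so events from both semantic levels can be stored and queried together.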


>>> On 06.12.2008 at 19:31, in message <[hidden email]v.microsoft.com>, Eric Fitzgerald <[hidden email]> wrote:
> No, I've made no such assumption.  Let me restate my argument.
--SNIP--

> I would love to have this discussion in person as it really is an
> information theory discussion.  My argument is that events from use case #2
> occupy a different semantic level than events from use case #1. ...
>
> -----Original Message-----
> From: David Corlette [mailto:[hidden email]]
> Sent: Saturday, December 06, 2008 12:12 PM
> To: [hidden email]
> Subject: Re: [CEE-DISCUSSION-LIST] Taxonomy talk
>
> Eric, I think you've made an assumption about where the event is generated,
> which is why the initiator/target/observer separation is important in XDAS.  
> I agree that IF the observer (that generates the event) is ALSO the target
> (affected by the event), then we can be sure that it's 100% accurate in
> reporting the effects of the activity.  But there are plenty of cases where
> the observer is NOT the target, as you point out with IDSs and other
> types of systems.  In those cases there's always a risk that the observed
> behavior is not what actually happened.  But I don't see those as a separate
> class of events.

Re: Taxonomy talk

Raffael Marty-3
In reply to this post by Tina Bird
> It might be invaluable, but isn't it impossible? How can the device itself
> grade its own performance?

It's not necessarily the device itself, but the person writing the
rules. You can quite easily judge whether an IDS signature is likely to
generate false positives or whether there is just no way that it will.
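
A minimal sketch of that point, in Python: the confidence value is attached to the signature by whoever writes the rule, and the device only copies it into each alert. The names `Signature` and `make_alert`, the substring matching, and the numeric values are all hypothetical and do not describe any real IDS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    sig_id: str
    pattern: str
    confidence: float  # assigned by the rule author, not computed by the device


def make_alert(sig, payload):
    """Return an alert carrying the author's confidence, or None if no match."""
    if sig.pattern not in payload:
        return None
    return {"sig_id": sig.sig_id, "confidence": sig.confidence}


# A tightly written signature and a loose one (illustrative values only).
tight = Signature("sig-1", "/etc/passwd%00", confidence=0.95)
loose = Signature("sig-2", "admin", confidence=0.30)

payload = "GET /admin/login HTTP/1.1"
alerts = [make_alert(s, payload) for s in (tight, loose)]
alerts = [a for a in alerts if a is not None]
```

Here only the loose signature matches, and the resulting alert carries the low confidence its author gave it.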

Anyways

   -raffy

>
>
> In any event, surely this is outside the current scope...there's plenty of
> existing data to make sense of before we start creating new data to deal
> with  ;-)
>
> t.

Re: Taxonomy talk

Raffael Marty-3
In reply to this post by Eric Fitzgerald
Well...

I am just not sure that we want to add a hierarchy so high up. Again,
correlation needs to be simple, and if I have to correlate events
across two different branches, that might be hard and not really useful.
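
A minimal sketch of why one shared taxonomy keeps correlation simple, in Python. The field names follow the object / action / status / devicetype nodes proposed earlier in the thread; the sample events, the `host` field, and the `correlate` helper are assumptions made for illustration. Because audit and IDS events use the same keys, a single grouping step finds events that more than one device type reported.

```python
from collections import defaultdict

# Illustrative events from different device types, all in one taxonomy.
events = [
    {"devicetype": "os-audit", "host": "web1",
     "object": "account", "action": "login", "status": "failure"},
    {"devicetype": "nids", "host": "web1",
     "object": "account", "action": "login", "status": "failure"},
    {"devicetype": "os-audit", "host": "db1",
     "object": "service", "action": "start", "status": "success"},
]


def correlate(events):
    """Group by host and taxonomy triple; keep groups seen by 2+ device types."""
    groups = defaultdict(set)
    for e in events:
        key = (e["host"], e["object"], e["action"], e["status"])
        groups[key].add(e["devicetype"])
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


corroborated = correlate(events)
```

If audit and IDS events lived in separate branches with different keys, the grouping key would first need a mapping between the branches, which is the extra step being argued against here.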

   -raffy

On Dec 5, 2008, at 10:19 PM, Eric Fitzgerald wrote:

> I need time to grok most of Raffy's avalanche of comments today, but  
> I wanted to chime in on this one.
>
> There is a hierarchy of levels of semantic meaning in events, and at
> the base of this hierarchy are the events that we care
> about in CEE.  These events have no relevance or confidence
> measurement associated with them, because they are always absolutely
> 100% true from the point of view of the system raising the event:
> they just record something that the system did or saw.
>
> After an event is analyzed by a system such as an IDS or even a  
> health monitoring or management system, then the analysis engine may  
> issue something event-like but which is at a higher semantic level  
> (this thing might still be considered an event as far as CEE is  
> concerned but it would be elsewhere in the taxonomy).  These higher  
> level semantic event-like records might have a probability-type  
> metric associated with them, but that is an application-specific  
> parameter and probably not broadly applicable, at least not enough  
> to include in the base schema for events.
>
> I completely agree with the idea of a probability metric and in fact  
> have a use case, but I disagree that it is part of the base event  
> schema for all events.
>
> At least that is my opinion.  I'll be happy to expound at length on  
> our next con call.
>
> Best regards,
> Eric
>
> From: Tina Bird [[hidden email]]
> Sent: Friday, December 05, 2008 7:36 PM
> To: [hidden email]
> Subject: Re: [CEE-DISCUSSION-LIST] Taxonomy talk
>
> > Actually, while I am at it... I believe I raised this with Anton at
> > some point: IDSs really need a relevance or confidence measure for
> > their alerts. How tightly is a signature written? What's the
> > likelihood of false positives? And this could be added to other log
> > types as well. How certain are you about what you are reporting?
> > Especially in the security devices this would be invaluable.
> >
> > I am going to shut up now...
>
> It might be invaluable, but isn't it impossible? How can the device itself
> grade its own performance?
>
> In any event, surely this is outside the current scope...there's plenty of
> existing data to make sense of before we start creating new data to deal
> with  ;-)
>
> t.
>

Re: Taxonomy talk

Sanford Whitehouse-2
In reply to this post by David Corlette
How does the model distinguish between observing an event and
participating in an event?

The XDAS model appears to work at a level above where this type of
context information is intended to fit.  It applies after the message has been
consumed and some form of external information has been provided to make
a distinction between observer and executor.

Working from the top down:

1.  XDAS
2.  External information
3.  Original event message

Event context is, conceptually, part of #3, intended to minimize the
need for #2.  In some cases #1 and #2 are not required.  This depends on
the use case for the message.  Is it some administrative function
(lowest level) or compliance (one of the highest levels)?

If event context is not part of #3 the information provided in #2 must
be larger and more complex.  How will the XDAS level make a distinction?
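
A minimal sketch of event context carried in the original message (#3), in Python. The four context fields are the ones proposed earlier in this thread; the `Message` class, the `select` helper, and the sample values (including the `sshd` entry and the "account management" class) are assumptions made for illustration. The filter looks only at context fields and never parses the description, so no external lookup (#2) is needed for this step.

```python
from dataclasses import dataclass


@dataclass
class Message:
    source_name: str   # the process or component that generated the message
    source_type: str   # e.g. "audit", "user management"
    event_class: str   # e.g. "system access"
    event_type: str    # "application", "administration", or "operation"
    description: str   # event description, parsed only if the context matches


def select(messages, source_type, event_class):
    """Filter on context fields alone, without inspecting the description."""
    return [m for m in messages
            if m.source_type == source_type and m.event_class == event_class]


messages = [
    Message("auditd", "audit", "system access", "operation",
            "login user=alice result=success"),
    Message("useradd", "user management", "account management",
            "administration", "add user=bob"),
    Message("sshd", "user management", "system access", "operation",
            "login user=alice result=success"),
]

# Keep only the audit component's view of system access events.
audit_logins = select(messages, "audit", "system access")
```

The same login is reported by two components; the context fields let the consumer pick one report without a per-product classification table.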

Also, could you describe the cases where the observer may not be
distinct from the observed?

I don't think I've seen the most recent version of the XDAS model.  Is
there some place I can pull it down from?

Sanford


David Corlette wrote:

> Hi Sanford,
>
> We call this the "Observer" in the XDAS data model, and consider it to be distinct (in some cases) from the user/host/service that actually caused the event to occur.
>
>>>> On 06.12.2008 at 18:39, in message <[hidden email]>, Sanford
> Whitehouse <[hidden email]> wrote:
>
>> Event context contains information about what reported the event.  It
>> should have:
>>
>> + Event source name....
>

Re: Taxonomy talk

joël Winteregg-3
Hi Sanford,

>
> I don't think I've seen the most recent version of the XDAS model.  Is
> there some place I can pull it down from?
>

There is an available version of XDAS v2 at: http://12.193.84.139:8080
workspace -> Team workspaces -> Event Standard -> File folder


Joël

Re: Taxonomy talk

Raffael Marty-3
In reply to this post by Sanford Whitehouse-2
I wouldn't be too opposed to that, actually.

HOWEVER! The source name should not be included. The source type, we can talk
about. That's what I meant by device type in one of my posts. The
event class is really similar to the event type. And really, the event
class is the group of object - action - status. I guess for type it's
just a granularity decision.

   -raffy

--
   Raffael Marty
   Chief Security Strategist                           @ Splunk>
   Security Visualization: http://secviz.org       raffy.ch/blog



On Dec 7, 2008, at 4:45 PM, Sanford Whitehouse wrote:

> Hey Eric,
>
> Thanks for reading through it.  I'm not fixed on terminology.  It's a
> slippery thing for me.  Gets in the way at times. :)
>
> Other responses in line.
>
> Sanford
>
> Eric Fitzgerald wrote:
>> Hey Sanford,
>>
>> I agree with all your arguments and reasoning below.  We can debate
>> implementation and terminology but conceptually I agree.
>>
>> I have only two questions:
>>
>> Q1: Would it be possible to encode your four classification labels  
>> into the taxon?
>> + Event source name
>> + Event source type
>> + Event class
>> + Event type
>
> I'm sure it can be done.  Is there a description of the current
> taxon/taxonomy?
>
>>
>> Q2: Would it be desirable to do so?
>>
>
> I believe so, at the risk of becoming complicated and cumbersome.  Has
> there been discussion around the practical contents of a message?
> Practical in terms of complexity and size, and alternatives for
> necessary information that exceeds those limits.  This is a concern
> without knowing the current direction.
>
> Sanford
>
>> Best regards,
>> Eric