CEE Taxonomy: Enumeration or Language?

heinbockel

While there is general agreement that log types need to be consistently
and accurately identified, there has been no agreement as to the best
way to pursue this need.


GOAL: To be able to unambiguously group logs based upon the
representative event type.


For example, if given a log file, I should be able to quickly identify
all logs dealing with events related to "authentication", "privilege
elevation", or "configuration changes". In order to more closely relate
regulatory and policy-level mandates to the actual logs, this is a
necessity for most (all?) users.


There are two ways that this can be handled:

1. Language -- every event can be described as a (possibly unknown)
SUBJECT performing an ACTION on an OBJECT. The process requires a
developer to select the most appropriate word choice from each of the
3 categories.


2. Enumeration -- provide a listing of all of the event types. Each
event matches to exactly one enumerated type.

In the most recent draft (v2.0) of XDAS, there is a multi-level
dotted notation (similar to SNMP OIDs) to enumerate events. The first
level is the registry id, then the provider id, followed by the "event
space" (category), and finally the singleton event. For instance,
"0.0.7.0" identifies the "Create association with data item" event
type in the "Data Item or Resource Element Content Access Events"
category, as provided by OpenGroup and defined in the OpenGroup
registry. Other examples of the current categories are "Account
Management Events", "Trust Management Events", and "Peer Association
Management Events", consisting of singleton event types such as "Backup
datastore", "Invoke service or application", and "Create account".

(Both approaches agree that there must be some expression of the result.)
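
As a rough illustration of the dotted notation described above, the sketch below splits an identifier into its four levels. The names EventId, parse_event_id and KNOWN_EVENTS are made up for this example, and the lookup table holds only the one identifier spelled out in this post; it is not the XDAS registry.

```python
from typing import NamedTuple


class EventId(NamedTuple):
    """The four levels of a dotted event identifier, as described in the post."""
    registry: int
    provider: int
    category: int  # the "event space"
    event: int     # the singleton event within the category


def parse_event_id(dotted: str) -> EventId:
    """Split an identifier such as "0.0.7.0" into its four numeric levels."""
    parts = dotted.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dotted levels, got {len(parts)}: {dotted!r}")
    return EventId(*(int(part) for part in parts))


# Illustrative table only: the single mapping quoted in the post.
KNOWN_EVENTS = {
    EventId(0, 0, 7, 0): "Create association with data item",
}

event_id = parse_event_id("0.0.7.0")
print(event_id.category, KNOWN_EVENTS[event_id])
```

Grouping by category then reduces to comparing the first three fields, which is the property the enumeration approach relies on.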

Now, each of these approaches has merit. What this discussion breaks
down to is that a language, like OVAL, better captures nuances and
allows for more flexibility. An enumeration (e.g., CWE, CVE) is more
precise, requires more "well-defined" boundaries, and is better for
computers.

The CEE Taxonomy problem can be solved with either a language or an
enumeration-based approach. With past standardization efforts, MITRE has
used use-cases as the primary driver. With CVE, the primary use was the
differentiation of vulnerabilities, for which an enumeration works
really well. For OVAL, there are too many different ways of performing
validation/verification across platforms for an enumeration to work,
so a language was the better fit.


Just something to put some thought into over the holiday. I am
interested in hearing any feedback from this group as to which you
think is more appropriate for expressing the type of log.


William Heinbockel
Infosec Engineer, Sr.
The MITRE Corporation
202 Burlington Rd. MS S145
Bedford, MA 01730
[hidden email]
781-271-2615




Re: CEE Taxonomy: Enumeration or Language?

Sanford Whitehouse
1.  Language.

a.  Enumeration can change.

Logs written with enumerations are subject to keeping track of
enumeration dictionary changes.  Each message will need to have a
version, or some other form of relating the dictionary to the log or
message.  In SNMP, the OID fields have a hierarchy that has been
constant for quite a while.  But use of fields, past the enterprise
identifier, is up to the vendor.  It's mapped to the MIB.  They can use
it any way they like.  (Correct me if I'm wrong.  It's been a while...)

That's not the case here.  The taxonomy definition will have a set of
defined fields, and values for those fields that will, presumably, remain
constant.  There will be changes to the definitions until a balance has
been found.  Changes to the enumeration will change the meaning of
archived logs.

b.  The first level users are administrators (system and product.)  The
logs they read should be understandable without translation.  

Administrators and users are not fond of enumeration of any kind, let
alone one that would limit them to guessing what has been logged.

2.  Where can enumeration apply?  At the level where data is exchanged
between log consumers.  

a.  It's a useful shorthand for keeping traffic down and structuring the
representation.  However, I wouldn't use it for long term archival.
Keeping track of dictionaries and log versions is too complex.

b.  Event descriptions and other values are usually enumerated in
databases and database applications, or in complex logging
environments such as i5/OS or z/OS.  It's a natural for them, and very
useful for reporting.  All of those have an environment or
infrastructure that supports use.  One can't use the logs or log access
tools without the environment being complete and up to date.  When the
logs are moved out, the dictionaries have to move with them.  It's a big
pain.

3.  If I can't get to it, it's not much use.

The basic argument is today I can read most logs with just about any
editor.  If taxonomy enumeration is used, that may not be possible.  I
like it simple when it comes to me and me logs.

Sanford

-----Original Message-----
From: Heinbockel, Bill [mailto:[hidden email]]
Sent: Wednesday, July 02, 2008 7:12 AM
To: [hidden email]
Subject: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or Language?



Re: CEE Taxonomy: Enumeration or Language?

David Corlette
In reply to this post by heinbockel
> GOAL: To be able to unambiguously group logs based upon the representative event type.


My 2c:

One of the great benefits we get from a multi-level taxonomy (which in Sentinel we've been applying for years by dint of lots of manual effort applied to the chaotic log messages we get from a wide variety of vendors) is the ability to filter and group things easily.  So for example I can say "show me all account management activity (i.e. 0.0.0.* in the current proposed XDAS taxonomy)" and "show me all data access activity (i.e. 0.0.7.*)".  If I only want file/table writes, then I can get more specific "show me only data writes (0.0.7.5)".

Whether this hierarchic taxonomy is expressed as dotted numbers or as words is probably unimportant; with XDAS we went with numbers because they are more compact and can be matched/filtered more easily (and when processing thousands of events per second this becomes important). But there's a one-to-one correspondence with verbs built into the taxonomy already, so conversion is trivial if you want to read the logs manually.

The point being, based on our experience we feel that a true hierarchic taxonomy is critical to a proper event standard.
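
A minimal sketch of the kind of wildcard filtering described above, assuming each event carries its dotted id as a string. The ids reuse the examples from this message; the messages, the EVENTS list and the select helper are invented for illustration and do not reflect how Sentinel implements it.

```python
from fnmatch import fnmatchcase

# (event id, message) pairs. The ids come from the examples in this thread;
# the messages are invented for illustration.
EVENTS = [
    ("0.0.0.0", "account event"),
    ("0.0.7.0", "association created with data item"),
    ("0.0.7.5", "row written to table"),
    ("0.0.7.5", "file written"),
]


def select(events, pattern):
    """Return the events whose dotted id matches a shell-style pattern."""
    return [event for event in events if fnmatchcase(event[0], pattern)]


account_activity = select(EVENTS, "0.0.0.*")  # all account management activity
data_access = select(EVENTS, "0.0.7.*")       # all data access activity
data_writes = select(EVENTS, "0.0.7.5")       # only data writes

print(len(account_activity), len(data_access), len(data_writes))
```

The same selection would work on dotted words instead of dotted numbers, which fits the point that the representation matters less than the hierarchy.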

Re: CEE Taxonomy: Enumeration or Language?

heinbockel

>-----Original Message-----
>From: David Corlette
>[mailto:[hidden email]]
>Sent: Wednesday, 02 July 2008 15:34
>To: cee-discussion-list CEE-Related Discussion
>Subject: Re: [CEE-DISCUSSION-LIST] CEE
>Taxonomy: Enumeration or Language?
>
>GOAL: To be able to unambiguously group logs based upon the
>representative event type.
>
>
>My 2c:
>
>One of the great benefits we get from a multi-level taxonomy (which in
>Sentinel we've been applying for years by dint of lots of manual effort
>applied to the chaotic log messages we get from a wide variety of
>vendors) is the ability to filter and group things easily.  So for
>example I can say "show me all account management activity (i.e.
>0.0.0.* in the current proposed XDAS taxonomy)" and "show me all data
>access activity (i.e. 0.0.7.*)".  If I only want file/table writes,
>then I can get more specific "show me only data writes (0.0.7.5)".
>
>Whether this hierarchic taxonomy is expressed as dotted numbers or as
>words is probably unimportant; with XDAS we went with numbers because
>they are more compact and can be matched/filtered more easily (and when
>processing thousands of events per second this becomes important). But
>there's a one-to-one correspondence with verbs built into the taxonomy
>already, so conversion is trivial if you want to read the logs manually.

Actually, I am not convinced there is a one-to-one correspondence with
the language verbs.

For example, take something simple like authentication. Is it important
to distinguish in the taxonomy the difference between a user
authenticating to an operating system, a web service, or su/sudo? What
about things like SSO, where an application authenticates on your
behalf?


>
>The point being, based on our experience we
>feel that a true hierarchic taxonomy is
>critical to a proper event standard.

The problem here is that any "true hierarchic" taxonomy is based on a
single use case. With enumerations, it is easy to create a hierarchy.
By strictly limiting the scope to security audit, the case can be made
to support a security audit taxonomy.



Re: CEE Taxonomy: Enumeration or Language?

Tina Bird
Sanford's response incorporated much of what I was going to say. I strongly
prefer a language-based approach to an enumerative approach, primarily
because as a system administrator, I don't want to be dependent on some sort
of translator that takes me from a numeric event to words (a la SNMP).
Current log messages are a mess, as far as useability goes, but I'd still
rather see "login failed" (even without a failure reason, or a host IP
address) than anything that looks like an OID.

> For example take something simple like authentication. Is it important
> to distinguish in the taxonomy the difference between a user
> authenticating to an operating system, a web service, or su/sudo? What
> about things like SSO, where an application authenticates on your
> behalf?

It's both simpler *and* more complex than this (although it's a complexity I
think we can clarify). From the subject-verb-object point of view that Bill
described originally, the user is the *object*, not the subject. The user
provides credentials, but in each case, the *subject* (i.e. the entity that
is causing the log message to be created) is whichever application or system
process the user is trying to access: login on a UNIX box, the Local
Security Authority on a Windows system, or an application like sudo or a
database. So each of these authentication examples boils down to

subject: system process or application requiring authentication
verb: auth succeeded or auth failed
object: user (or process or application) from which authentication is
required

This is good, because it unifies how we format and interpret all
authentication events, no matter which entities (users, processes, apps) are
involved.
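
The breakdown above can be sketched as a small record type. This is only an illustration under assumptions: the Event class, the field name obj (used to avoid shadowing Python's built-in object), and the sample subjects and users are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    subject: str  # system process or application requiring authentication
    verb: str     # "auth succeeded" or "auth failed"
    obj: str      # user, process or application asked to authenticate


AUTH_VERBS = {"auth succeeded", "auth failed"}

# Hypothetical events: the same shape regardless of which entity is involved.
events = [
    Event(subject="login", verb="auth failed", obj="alice"),
    Event(subject="sudo", verb="auth succeeded", obj="alice"),
    Event(subject="database", verb="auth failed", obj="backup-job"),
]


def is_authentication(event: Event) -> bool:
    """An event counts as authentication purely by its verb."""
    return event.verb in AUTH_VERBS


failed = [event for event in events if event.verb == "auth failed"]
for event in failed:
    print(f"{event.subject}: {event.verb} for {event.obj}")
```

Because every authentication event has the same three slots, a single filter on the verb finds failures from UNIX login, sudo and a database alike.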

It's complex because it's unintuitive, as demonstrated by the familiar
phrase "the user authenticating" (which implies that the user is the subject
of the action, which is not true) or even worse, "log on to our Web site"
(for a site that does not require authentication), which usually means "use
your Web browser to access our nifty content" and has nothing to do with
authentication whatsoever.

Using a subject-verb-object model will require us to provide very clear and
specific instructions on how to *identify* the subject and object correctly,
but that's Merely a Matter of Documentation :-) I'm especially fond of it
because using this format strongly encourages programmers, analysts and
whoever else ends up influencing logging to think rigorously about the
"workflow" for the given event. [I was going to say "forces" rather than
"strongly encourages," but then common sense kicked in.]

> The problem here is that any "true hierarchic" taxonomy is based on a
> single use case. With enumerations, it is easy to create a hierarchy.
> By strictly limiting the scope to security audit, the case can be made
> to support a security audit taxonomy.

I think I agree with this, if I understand what Bill is saying. Defining a
single hierarchy that will incorporate all the various types of logs out
there seems, uh, implausible. But for particular situations -- credit card
transactional data, user management, system and application updates -- we
can probably provide meaningful hierarchies.

cheers -- tbird

Re: CEE Taxonomy: Enumeration or Language?

Tina Bird
One small correction:

> So each of these authentication examples boils down to
>
> subject: system process or application requiring authentication
> verb: auth succeeded or auth failed
> object: user (or process or application) from which authentication is
> required

It's more precise to say that the subject is the process or application
requesting or mediating the authentication...

Re: CEE Taxonomy: Enumeration or Language?

David Corlette
In reply to this post by Tina Bird
> Current log messages are a mess, as far as useability goes, but I'd still
> rather see "login failed" (even without a failure reason, or a host IP
> address) than anything that looks like an OID.

I think it's important to note that this is only one particular use case.  Most of our customers are far more interested in automated processing and analysis than in spending their time poring through logs manually (although of course the latter is necessary on occasion).

>> Is it important to distinguish in the taxonomy the
>> difference between a user authenticating to an
>> operating system, a web service

I say no, the action is still an authentication, but the "target" is different.


>> su/sudo?

We call this privilege escalation, which is fundamentally different, although maybe "escalation" makes too many assumptions.


>> What about things like SSO, where an application authenticates on
>> your behalf?

Still authentication. The act of creating an authorized session is a separate event in my opinion.


> It's both simpler *and* more complex than this (although it's a complexity I
> think we can clarify). From the subject-verb-object point of view that Bill
> described originally, the user is the *object*, not the subject. The user
> provides credentials, but in each case, the *subject* (ie. the entity that
> is causing the log message to be created) is whichever application or system
> process the user is trying to access: login on a UNIX box, the Local
> Security Authority on a Windows system, or an application like sudo or a
> database. So each of these authentication examples boils down to
>
> subject: system process or application requiring authentication
> verb: auth succeeded or auth failed
> object: user (or process or application) from which authentication is
> required

We find the semantic confusion caused by the above disheartening. This is a separate topic from taxonomy, but XDAS defines three different "objects" that relate to any event:

Initiator: The user, service, and/or system that *causes* an event to occur
Target: The user, service, system, trust, or data object that is *affected* by an event
Observer: The service or system that *detected* the event and generates a log message reporting that fact

The above covers the authentication case easily even if there's a third-party authentication engine involved and if the event is actually reported by an IDS or similar system.
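
To make the three roles concrete, here is a minimal sketch. The ObservedEvent class, its field names and the sample values are assumptions for illustration; they are not taken from the XDAS specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedEvent:
    action: str     # what happened, e.g. "authenticate"
    initiator: str  # who or what caused the event
    target: str     # who or what was affected by the event
    observer: str   # who or what detected and reported the event


# The same authentication, once reported by the service itself and once
# reported by a separate monitoring system. All values are hypothetical.
reported_by_service = ObservedEvent(
    action="authenticate",
    initiator="workstation-17",
    target="account alice",
    observer="sso-service",
)
reported_by_ids = ObservedEvent(
    action="authenticate",
    initiator="workstation-17",
    target="account alice",
    observer="ids-sensor",
)


def same_occurrence(a: ObservedEvent, b: ObservedEvent) -> bool:
    """Two reports describe the same occurrence if only the observer differs."""
    return (a.action, a.initiator, a.target) == (b.action, b.initiator, b.target)


print(same_occurrence(reported_by_service, reported_by_ids))
```

Keeping the observer separate from the initiator and target is what lets a third-party report line up with the report from the authenticating service.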


> It's complex because it's unintuitive, as demonstrated by the familiar
> phrase "the user authenticating" (which implies that the user is the subject
> of the action, which is not true) or even worse, "log on to our Web site"
> (for a site that does not require authentication), which usually means "use
> your Web browser to access our nifty content" and has nothing to do with
> authentication whatsoever.

I don't find this one particularly complex if put in the context of the above structure. If an actual person is authenticating, then that person is the Initiator, but of course real people aren't really represented in computerese (except in Identity Management systems, which we can discuss later).  It should be obvious that that real person is attempting to access an account, which is therefore the Target of the event.

I think the confusion comes about simply because people use the term "user" loosely to represent the actual person, the account, and so forth. If we simply define our terms carefully, the confusion disappears.


> Using a subject-verb-object model will require us to provide very clear and
> specific instructions on how to *identify* the subject and object correctly,
> but that's Merely a Matter of Documentation :-)

I agree with this exactly, I believe we're saying the same thing.  So it just becomes a matter of definition.


>> The problem here is that any "true hierarchic" taxonomy is based on a
>> single use case. With enumerations, it is easy to create a hierarchy.
>> By strictly limiting the scope to security audit, the case can be made
>> to support a security audit taxonomy.
>
> I think I agree with this, if I understand what Bill is saying. Defining a
> single hierarchy that will incorporate all the various types of logs out
> there seems, uh, implausible. But for particular situations -- credit card
> transactional data, user management, system and application updates -- we
> can probably provide meaningful hierarchies.

I think I agree with this too, but the way this is stated sounds a bit limiting. What we have found in working with and taxonomizing event logs from many vendors over many years is that *most*  events fall into pretty obvious taxonomic categories that are very useful for a quite broad set of use cases including most forensic analysis, enterprise reporting, compliance, etc etc.  In many cases the events that "don't fit" are really just a different viewpoint from the vendor - they could easily be rewritten into an equally valid form that would conform to an event standard.

On the other hand, there are always use cases where the events that are produced really don't quite fit the model that we've set up.  In those cases I think it might be perfectly valid to simply come up with a different model.  So for example imagine we have three event models:

1) One model that expresses interactions between domain objects using the Initiator, Target, Observer, Action described above, e.g. XDAS
  - this would cover the vast majority of compliance, enterprise reporting, and most forensic use cases

2) Another model that covers "current state" events, i.e. how much bandwidth, disk space, what state a variable is in, etc
  - This would cover many operational use cases where you want to track statistics and such

3) A third model which covers "debug" events - e.g. stack traces and component failures and that sort of thing
  - This would be more for debug and deep forensic analysis

I don't see a reason why three very simple models couldn't cover virtually all the use cases we are aware of and be flexible enough to adapt to new ones.  But I think trying to force-fit one model from the above list into one of the other models may be tricky, which is what we seem to be trying to do.  If the models follow the same basic expressive structure, a simple flag could tell us which model was in use, and therefore how to parse it (and therefore the overarching model could be extensible).  Finally, the transport and other recommendations that are being defined in CEE could simply say "what you transport over this mechanism is one of the defined event models" (where the definition of those models comes from XDAS and possibly other standards).  So CEE becomes "IP" and XDAS, debug, etc become "UDP", "TCP", etc  ;-)
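
A minimal sketch of the "simple flag" idea, assuming each record carries a model name and a dictionary of fields. The model names, field names and describe_* helpers are all hypothetical; nothing here is defined by CEE or XDAS.

```python
def describe_interaction(fields):
    """Model 1: Initiator, Target, Observer, Action."""
    return (f"{fields['initiator']} {fields['action']} {fields['target']} "
            f"(observed by {fields['observer']})")


def describe_state(fields):
    """Model 2: a current-state measurement."""
    return f"{fields['metric']} = {fields['value']}"


def describe_debug(fields):
    """Model 3: a debug event such as a component failure."""
    return f"{fields['component']} failed: {fields['detail']}"


# The flag selects the parser; adding a model means adding one entry here.
PARSERS = {
    "interaction": describe_interaction,
    "state": describe_state,
    "debug": describe_debug,
}


def describe(record):
    """Dispatch on the record's model flag."""
    parser = PARSERS.get(record["model"])
    if parser is None:
        raise ValueError(f"unknown event model: {record['model']!r}")
    return parser(record["fields"])


print(describe({"model": "state",
                "fields": {"metric": "disk_free_mb", "value": 512}}))
```

The transport layer never needs to look inside the fields; it only has to carry the flag and the payload.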


The overall message here is that what I saw at the SIG was mostly people raising exceptions that wouldn't fit in the proposed models.  To which I say, let's make the model simpler, but have more models (within a common framework), rather than making a ridiculously complex model that no one can understand.

Thoughts?

Re: CEE Taxonomy: Enumeration or Language?

John Calcote-2

David Corlette wrote:

> ...
> The above covers the authentication case easily even if there's a
> third-party authentication engine involved and if the event is
> actually reported by an IDS or similar system.
>
>> It's complex because it's unintuitive, as demonstrated by the
>> familiar phrase "the user authenticating" (which implies that the
>> user is the subject of the action, which is not true) or even worse,
>> "log on to our Web site" (for a site that does not require
>> authentication), which usually means "use your Web browser to access
>> our nifty content" and has nothing to do with authentication
>> whatsoever.
>> ...

It seems to me there are a couple of desirable goals being discussed here:

1) Administrators like to read human language based log messages.
2) Taxonomy must be well-defined so there's no confusion about what is
meant by any given message.

But on further analysis, these two goals are VERY difficult to
reconcile. Human language introduces ambiguity -- almost by definition
-- but numeric or enumerated (whether hierarchical or flat) taxonomy is
difficult to read by an administrator, without the aid of translation tools.

Now, let's look at our motivation for these two goals:

1) Why use text-based values? Because we want to be able to look at a
log without the aid of translation tools. Are there any other reasons? I
can't really conceive of them.

2) Why define a standardized taxonomy? Because we want the entire world
to understand that a given event is a proxy-authentication-by-a-service
event, regardless of the event source.

XDAS (v2.0) chose the hierarchical approach because it's very flexible,
allowing entire sub-taxonomies to be inserted where required.
Additionally, as David mentioned, it's very easy to parse. Given the
explosive growth in volume of events to be processed that we will no
doubt see in the near future (we've already seen some of this growth),
this is VERY important for analysis system scalability. And that
explosive growth creates the very need to target support for such
automated analysis systems.

Another important benefit of a hierarchical taxonomy is that it allows
for future refinement. We can do our best to define a "global" taxonomy,
but we will always make someone angry at our lack of foresight, with
respect to their particular use cases. I picture it as an ongoing
incremental process, allowing individual organizations to introduce
corrections into their own systems by hooking into the existing standard
taxonomy at reasonable points, and then approaching future standards
committee meetings with these changes when they feel they've matured
enough to be accepted by the community.

To use the example already mentioned in this thread, suppose we provide
a standard "authentication" event, and later (probably shortly after the
standard is released), some group feels that there really should be
various sub-types of authentication. They define those sub-types as a
sub-hierarchy beneath the existing (now more generic) authentication event.

As with other ongoing standards processes, the committee members would
then review the proposed additions, modify them so they are more
palatable to the community, and then amend the existing standard with
updates that include a new sub-hierarchy beneath the existing
authentication event type. This is a well-understood -- and more
importantly -- a well-accepted methodology.

Existing analysis software continues to work (albeit, with less
event-type granularity for authentication events), but newer software
can then take advantage of the newer authentication event sub-classes.
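
The backward-compatibility argument can be sketched as a lookup that falls back to the nearest known ancestor. The identifiers "1.2.3" and "1.2.3.1" and both tables are hypothetical; they are not real XDAS values.

```python
def classify(event_id, known):
    """Return the name of the most specific known ancestor of event_id."""
    parts = event_id.split(".")
    while parts:
        name = known.get(".".join(parts))
        if name is not None:
            return name
        parts.pop()  # drop the most specific level and try the parent
    return "unknown"


# What older software knows, and what newer software knows after the
# standard gains a sub-hierarchy beneath the authentication event.
OLD_TAXONOMY = {"1.2.3": "authentication"}
NEW_TAXONOMY = {**OLD_TAXONOMY, "1.2.3.1": "proxy authentication by a service"}

new_event = "1.2.3.1"
print(classify(new_event, OLD_TAXONOMY))
print(classify(new_event, NEW_TAXONOMY))
```

Older software still files the new event under authentication, only with less granularity, while newer software sees the sub-type.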

Now, all of that said, I fully realize that CEE was founded on the very
key concept of language-based event logging. It's a really neat idea --
*IF* it can be done efficiently. Such a standard would necessarily have
to include VERY specific rules for how human language is interpreted by
log analysis engines. The amount of processing power required to
*properly* analyze such a log file would be tremendous.

And to what end? So administrators don't have to use a translation tool
when glancing at a log file, which is quite frankly a secondary form of
analysis in today's world.

For heaven's sake! A high-school student could write a filter in a
half-hour in Bourne shell script (using sed or awk) that would convert
OIDs to human-readable text - using his favorite vernacular, no less!

$ cat event.log | oids2text | more
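
For illustration, a filter like the hypothetical oids2text above might look roughly like this in Python. It assumes the log labels the identifier as event=<id>, which is an invented format; a bare four-part dotted number would otherwise be indistinguishable from an IPv4 address. The NAMES table holds only the one mapping quoted earlier in this thread.

```python
import re

# Illustrative table only: the single mapping quoted in this thread.
NAMES = {"0.0.7.0": "Create association with data item"}

# Match "event=" followed by exactly four dotted numeric levels.
EVENT_FIELD = re.compile(r"\bevent=(\d+(?:\.\d+){3})(?!\.?\d)")


def oids_to_text(line):
    """Replace known event ids with quoted names; leave unknown ids untouched."""
    def replace(match):
        name = NAMES.get(match.group(1))
        if name is None:
            return match.group(0)
        return f'event="{name}"'
    return EVENT_FIELD.sub(replace, line)


sample = "2008-07-02 host=10.0.0.5 event=0.0.7.0 result=ok"
print(oids_to_text(sample))
```

Unknown ids pass through unchanged, so a stale dictionary degrades readability rather than losing data.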

John


Re: CEE Taxonomy: Enumeration or Language?

Sanford Whitehouse
The root of the taxonomy is a message-level schema.  It's a very simple
schema.  It presumes nothing about use; it only describes the event.  Any
other use, above this root level, is the classification/categorization
aspect of the taxonomy.

The first level might look like ...
        action=login object=user
        action=login object=service

(Note:  this is not to suggest the form or names at this level.  It's
only an example for the language/enumeration discussion).

The classification level looks at all events with the same taxon values
or values that share a similar characteristic.  "login" might be
classified as an Authentication event.  Any Kerberos tickets and file
access controls can be put into an Authorization event classification.
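
A minimal sketch of the two levels described above, assuming whitespace-separated key=value pairs as in the example lines. The CLASSES table is hypothetical: "login" mapping to Authentication comes from the text, while the "grant-ticket" action name is invented to stand in for a Kerberos ticket event.

```python
def parse_taxons(line):
    """Root level: split 'action=login object=user' into a dict of taxons."""
    return dict(pair.split("=", 1) for pair in line.split())


# Classification level: group root-level actions under broader classes.
CLASSES = {
    "login": "Authentication",
    "grant-ticket": "Authorization",  # hypothetical action name
}


def classify(line):
    """Return the class for a root-level event line, or 'Unclassified'."""
    action = parse_taxons(line).get("action")
    return CLASSES.get(action, "Unclassified")


for line in ("action=login object=user", "action=login object=service"):
    print(parse_taxons(line)["object"], "->", classify(line))
```

The root-level line stays readable in any editor, and the classification is a separate table that can change without rewriting archived logs.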

Categorization, by my definition (wholly incorrect), is anything someone
judges to fit within a group.  Risk management might treat system
monitoring and badge use at building entrances as the same thing.
It's up to them.

On to the language part.

At the root level, the action/object and other terms can be enumerations
or words.  Ultimately, enumerations have language equivalents.
Interpretation applies to both, with all the risks of transition and
existing localized definitions.  The way to address it is the same one
other professions use: a simple, unambiguous definition with little or
no overlap.

This has other issues, such as architectural or domain common use.  A
port in a shipping management system and a TCP/IP port use the same term.
(This also touches a number of other issues, such as the desired event
logging level.)

Anyway, the word and the enumeration are synonyms.  Words are easy to
read.

Sanford



 

-----Original Message-----
From: John Calcote [mailto:[hidden email]]
Sent: Wednesday, July 02, 2008 5:23 PM
To: [hidden email]
Subject: Re: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or
Language?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Corlette wrote:

> ...
> The above covers the authentication case easily even if there's a
> third-party authentication engine involved and if the event is
> actually reported by an IDS or similar system.
>
>> It's complex because it's unintuitive, as demonstrated by the
>> familiar phrase "the user authenticating" (which implies that the
>> user is the subject of the action, which is not true) or even worse,
>> "log on to our Web site" (for a site that does not require
>> authentication), which usually means "use your Web browser to access
>> our nifty content" and has nothing to do with authentication
>> whatsoever.
>> ...

It seems to me there are a couple of desirable goals being discussed
here:

1) Administrators like to read human language based log messages.
2) Taxonomy must be well-defined so there's no confusion about what is
meant by any given message.

But on further analysis, these two goals are VERY difficult to
reconcile. Human language introduces ambiguity -- almost by definition
- -- but numeric or enumerated (whether hierarchical or flat) taxonomy
is difficult to read by an administrator, without the aid of translation
tools.

Now, let's look at our motivation for these two goals:

1) Why use text-based values? Because we want to be able to look at a
log without the aid of translation tools. Are there any other reasons? I
can't really conceive of them.

2) Why define a standardized taxonomy? Because we want the entire world
to understand that a given event is a proxy-authentication-by-a-service
event, regardless of the event source.

XDAS (v2.0) chose the hierarchical approach because it's very flexible,
allowing entire sub-taxonomies to be inserted where required.
Additionally, as David mentioned, it's very easy to parse. Given the
explosive growth in volume of events to be processed that we will no
doubt see in the near future (we've already seen some of this growth),
this is VERY important for analysis system scalability. And that
explosive growth creates the very need to target support for such
automated analysis systems.

Another important benefit of a hierarchical taxonomy is that it allows
for future refinement. We can do our best to define a "global" taxonomy,
but we will always make someone angry at our lack of foresight, with
respect to their particular use cases. I picture it as an ongoing
incremental process, allowing individual organizations to introduce
corrections into their own systems by hooking into the existing standard
taxonomy at reasonable points, and then approaching future standards
committee meetings with these changes when they feel they've matured
enough to be accepted by the community.

To use the example already mentioned in this thread, suppose we provide
a standard "authentication" event, and later (probably shortly after the
standard is released), some group feels that there really should be
various sub-types of authentication. They define those sub-types as a
sub-hierarchy beneath the existing (now more generic) authentication
event.

As with other ongoing standards processes, the committee members would
then review the proposed additions, modify them so they are more
palatable to the community, and then amend the existing standard with
updates that include a new sub-hierarchy beneath the existing
authentication event type. This is a well-understood -- and more
importantly -- a well-accepted methodology.

Existing analysis software continues to work (albeit, with less
event-type granularity for authentication events), but newer software
can then take advantage of the newer authentication event sub-classes.
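
The fallback behaviour described above can be sketched as a longest-prefix lookup: a tool that only knows the parent type still classifies a newer, more specific identifier under that parent. This is only an illustration in Python; the dotted identifiers and labels below are invented for the example and are not real XDAS assignments.

```python
# Hypothetical taxonomy known to an "old" analysis tool. The dotted IDs and
# labels are made up for this example; they are not real XDAS assignments.
OLD_TAXONOMY = {
    "0.0.2": "account management",
    "0.0.2.5": "authentication",
}

def classify(event_id, taxonomy):
    """Return (matched_id, label) for the longest known prefix of event_id."""
    parts = event_id.split(".")
    # Try the full ID first, then drop trailing components one at a time.
    for end in range(len(parts), 0, -1):
        candidate = ".".join(parts[:end])
        if candidate in taxonomy:
            return candidate, taxonomy[candidate]
    return None, "unknown"

# A newer source emits a sub-type the old tool has never heard of.
print(classify("0.0.2.5.1", OLD_TAXONOMY))
print(classify("0.0.2.5", OLD_TAXONOMY))
print(classify("9.9", OLD_TAXONOMY))
```

A newer tool would simply carry a larger table that includes the sub-types; the lookup itself does not change.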

Now, all of that said, I fully realize that CEE was founded on the very
key concept of language-based event logging. It's a really neat idea --
*IF* it can be done efficiently. Such a standard would necessarily have
to include VERY specific rules for how human language is interpreted by
log analysis engines. The amount of processing power required to
*properly* analyze such a log file would be tremendous.

And to what end? So administrators don't have to use a translation tool
when glancing at a log file, which is quite frankly a secondary form of
analysis in today's world.

For heaven's sake! A high-school student could write a filter in a
half-hour in bourne shell script (using sed or awk) that would convert
OIDs to human-readable text - using his favorite vernacular, no less!

$ cat event.log | oids2text | more
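
For what it's worth, the same filter is only a few lines in Python as well. The sketch below is not an existing tool: the name oids_to_text, the assumption that event IDs show up as four dot-separated integers, and the one-entry table (taken from the "0.0.7.0" example earlier in this thread) are all illustrative.

```python
import re

# Tiny illustrative table. "0.0.7.0" is the example given earlier in this
# thread; a real table would be loaded from the published registry.
OID_NAMES = {
    "0.0.7.0": "Create association with data item",
}

# Four dot-separated integers, e.g. 0.0.7.0 (assumed shape of an event ID).
OID_PATTERN = re.compile(r"(?<![\d.])\d+(?:\.\d+){3}(?![\d.])")

def oids_to_text(line, names=OID_NAMES):
    """Replace every known OID in line with its name; leave the rest alone."""
    return OID_PATTERN.sub(lambda m: names.get(m.group(0), m.group(0)), line)

# Hypothetical log lines, only to show the effect.
for sample in ["host1 event=0.0.7.0 user=jc", "host1 src=10.0.0.1"]:
    print(oids_to_text(sample))
```

Anything that merely looks like an OID (an IPv4 address, say) passes through untouched, because only identifiers present in the table are rewritten. Wrapping the function in a loop over standard input would give the pipeline shown above.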

John

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iEYEARECAAYFAkhsG+YACgkQdcgqmRY/OH9zWQCfXQql1elwgVUqygTIvEOzsxPl
xLAAn1nJyGT8CtJWuwHAAXEPbkYWQrYR
=eL+N
-----END PGP SIGNATURE-----

Re: CEE Taxonomy: Enumeration or Language?

David Corlette
In reply to this post by John Calcote-2
Actually I'm not sure we have a problem here at all.  Part of what we've discussed doing with XDAS is defining a number of expressive formats, with well-defined translations between them.

We were thinking about this in the context of JSON, XML, field-delimited, binary, and so forth, with the idea that:

...<initiator><account><name>dcorlette</name><id>130</id><domain>AD-DOMAIN</domain></account></initiator>...

is exactly equivalent to:

...{ Initiator: { account: { name: "dcorlette", id: 130, domain: "AD-DOMAIN" } } }...

is exactly equivalent to:

...|INIT|dcorlette|130|AD-DOMAIN|...


BUT - there's nothing to prevent us from defining equivalencies between compact and readable versions of certain normalized *data* fields too, so in the compact form you use:

...|ACTION|0.0.8.1|...

and in the verbose version you use:

...{ Action: { Taxonomy: "OpenGroup.XDAS.System.Shutdown" } }...


Since these translations would be pre-defined in the standard itself, one would imagine that most tools would provide trivial methods to convert between them - tools like what John mentioned or simply different "representations" depending on whether it's displayed, on the wire, automatically processed, stored in a DB, or stored in a text file.
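
A rough sketch of what such a conversion might look like, in Python. The taxonomy pair is the one used in this message; the field layout of the compact record (INIT fields followed by an ACTION field in a single record) and the function names are assumptions made for the example, not part of XDAS.

```python
import json

# The single taxonomy pair quoted in this message; a real table would come
# from the standard itself.
COMPACT_TO_VERBOSE = {"0.0.8.1": "OpenGroup.XDAS.System.Shutdown"}
VERBOSE_TO_COMPACT = {v: k for k, v in COMPACT_TO_VERBOSE.items()}

def compact_to_verbose(record):
    """'|INIT|name|id|domain|ACTION|oid|' -> JSON text (assumed layout)."""
    tag1, name, ident, domain, tag2, oid = record.strip("|").split("|")
    if (tag1, tag2) != ("INIT", "ACTION"):
        raise ValueError("unexpected record layout")
    return json.dumps({
        "Initiator": {"account": {"name": name, "id": int(ident), "domain": domain}},
        "Action": {"Taxonomy": COMPACT_TO_VERBOSE[oid]},
    })

def verbose_to_compact(text):
    """Inverse of compact_to_verbose for the same assumed layout."""
    event = json.loads(text)
    acct = event["Initiator"]["account"]
    oid = VERBOSE_TO_COMPACT[event["Action"]["Taxonomy"]]
    fields = ["INIT", acct["name"], str(acct["id"]), acct["domain"], "ACTION", oid]
    return "|" + "|".join(fields) + "|"

compact = "|INIT|dcorlette|130|AD-DOMAIN|ACTION|0.0.8.1|"
verbose = compact_to_verbose(compact)
print(verbose)
print(verbose_to_compact(verbose))
```

Because both directions are driven by the same table, a record survives the round trip unchanged, which is the property the "exactly equivalent" wording above depends on.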



>>> On Wed, Jul 2, 2008 at  8:23 PM, in message <[hidden email]>, John
Calcote <[hidden email]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> David Corlette wrote:
>> ...
>> The above covers the authentication case easily even if there's a
>> third-party authentication engine involved and if the event is
>> actually reported by an IDS or similar system.
>>
>>> It's complex because it's unintuitive, as demonstrated by the
>>> familiar phrase "the user authenticating" (which implies that the
>>> user is the subject of the action, which is not true) or even worse,
>>> "log on to our Web site" (for a site that does not require
>>> authentication), which usually means "use your Web browser to access
>>> our nifty content" and has nothing to do with authentication
>>> whatsoever.
>>> ...
>
> It seems to me there are a couple of desirable goals being discussed here:
>
> 1) Administrators like to read human language based log messages.
> 2) Taxonomy must be well-defined so there's no confusion about what is
> meant by any given message.
>
> But on further analysis, these two goals are VERY difficult to
> reconcile. Human language introduces ambiguity -- almost by definition
> -- but numeric or enumerated (whether hierarchical or flat) taxonomy is
> difficult to read by an administrator, without the aid of translation tools.
>
> Now, let's look at our motivation for these two goals:
>
> 1) Why use text-based values? Because we want to be able to look at a
> log without the aid of translation tools. Are there any other reasons? I
> can't really conceive of them.
>
> 2) Why define a standardized taxonomy? Because we want the entire world
> to understand that a given event is a proxy-authentication-by-a-service
> event, regardless of the event source.
>
> XDAS (v2.0) chose the hierarchical approach because it's very flexible,
> allowing entire sub-taxonomies to be inserted where required.
> Additionally, as David mentioned, it's very easy to parse. Given the
> explosive growth in volume of events to be processed that we will no
> doubt see in the near future (we've already seen some of this growth),
> this is VERY important for analysis system scalability. And that
> explosive growth creates the very need to target support for such
> automated analysis systems.
>
> Another important benefit of a hierarchical taxonomy is that it allows
> for future refinement. We can do our best to define a "global" taxonomy,
> but we will always make someone angry at our lack of foresight, with
> respect to their particular use cases. I picture it as an ongoing
> incremental process, allowing individual organizations to introduce
> corrections into their own systems by hooking into the existing standard
> taxonomy at reasonable points, and then approaching future standards
> committee meetings with these changes when they feel they've matured
> enough to be accepted by the community.
>
> To use the example already mentioned in this thread, suppose we provide
> a standard "authentication" event, and later (probably shortly after the
> standard is released), some group feels that there really should be
> various sub-types of authentication. They define those sub-types as a
> sub-hierarchy beneath the existing (now more generic) authentication event.
>
> As with other ongoing standards processes, the committee members would
> then review the proposed additions, modify them so they are more
> palatable to the community, and then amend the existing standard with
> updates that include a new sub-hierarchy beneath the existing
> authentication event type. This is a well-understood -- and more
> importantly -- a well-accepted methodology.
>
> Existing analysis software continues to work (albeit, with less
> event-type granularity for authentication events), but newer software
> can then take advantage of the newer authentication event sub-classes.
>
> Now, all of that said, I fully realize that CEE was founded on the very
> key concept of language-based event logging. It's a really neat idea --
> *IF* it can be done efficiently. Such a standard would necessarily have
> to include VERY specific rules for how human language is interpreted by
> log analysis engines. The amount of processing power required to
> *properly* analyze such a log file would be tremendous.
>
> And to what end? So administrators don't have to use a translation tool
> when glancing at a log file, which is quite frankly a secondary form of
> analysis in today's world.
>
> For heaven's sake! A high-school student could write a filter in a
> half-hour in bourne shell script (using sed or awk) that would convert
> OIDs to human-readable text - using his favorite vernacular, no less!
>
> $ cat event.log | oids2text | more
>
> John
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.9 (GNU/Linux)
> Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org
>
> iEYEARECAAYFAkhsG+YACgkQdcgqmRY/OH9zWQCfXQql1elwgVUqygTIvEOzsxPl
> xLAAn1nJyGT8CtJWuwHAAXEPbkYWQrYR
> =eL+N
> -----END PGP SIGNATURE-----

Re: CEE Taxonomy: Enumeration or Language?

Eric Fitzgerald
In reply to this post by Tina Bird
I agree with you Tina.

In general I would like to avoid the unqualified use of the term "user" in events.

I have identified several different use cases for a "User" in an event record:

Subject/Actor
  * primary [e.g. user account identity associated with a running process]
  * impersonated/on-behalf-of [e.g. on whose behalf a task is being performed]
  * caller-of-interface
Object/Target
  * user as account object
  * user as object of authentication/logon



-----Original Message-----
From: Tina Bird [mailto:[hidden email]]
Sent: Wednesday, July 02, 2008 2:15 PM
To: [hidden email]
Subject: Re: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or Language?

One small correction:

> So each of these authentication examples boils down to
>
> subject: system process or application requiring authentication
> verb: auth succeeded or auth failed
> object: user (or process or application) from which authentication is required

It's more precise to say that the subject is the process or application
requesting or mediating the authentication...

Re: CEE Taxonomy: Enumeration or Language?

heinbockel
In reply to this post by John Calcote-2

After having this discussion with several coworkers,
I would like to focus this conversation on two points.


Should CEE support:

1. Hierarchical/structured taxonomies

My opinion:
     Any structure implied to data is based on a particular
     view or use case. There are definite benefits to this
     as both Dave and John have pointed out. However, without
     narrowing the scope of logs & CEE I do not see how
     all event types can be represented in a universally
     applicable hierarchy, nor do I see any value in defining
     multiple hierarchies.



2. Enumerated taxonomy choices

My opinion:
     I think that CEE needs to provide some direction as to
     the words (subject, object, action) people should choose.
     Too many words are overloaded ('user', 'logon') and others
     have many synonyms ('logon', 'login', 'authentication',
     'password accepted', etc.).
     I don't think it matters whether these are expressed in
     word lists or numbers/indices -- maybe this is a syntax-level
     declaration.

     However, what I do think is important, is that there be:
     (1) a way to express event types not defined within the current
     'official' taxonomy, and
     (2) a way to express specific details/names relating to each word
     choice (e.g., 'account' == Joe, 'file' == /etc/passwd)
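
To make the two requirements concrete, here is a small Python sketch. The choice of 'authentication' as the approved term for the listed synonyms, the "x-" prefix for terms outside the official list, and the function names are all invented for illustration; nothing here is a CEE decision.

```python
# Hypothetical word list: the synonyms are the ones listed above; picking
# "authentication" as the approved term and "x-" for local terms is invented.
SYNONYMS = {
    "logon": "authentication",
    "login": "authentication",
    "password accepted": "authentication",
    "authentication": "authentication",
}

def normalize_action(word):
    """Map a vendor's word to the approved term, or mark it as local."""
    key = word.strip().lower()
    if key in SYNONYMS:
        return SYNONYMS[key]
    return "x-" + key.replace(" ", "-")

def make_event(subject, action, obj, **details):
    """Build an event from three word choices plus specific names/values."""
    return {
        "subject": subject,
        "action": normalize_action(action),
        "object": obj,
        "details": dict(details),
    }

print(make_event("sshd", "Login", "account", account="Joe"))
print(make_event("backupd", "tape rotated", "file", file="/etc/passwd"))
```

The first event lands on the approved term; the second keeps a locally defined action (requirement 1), and both carry the specific names in a separate details section (requirement 2).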



Re: CEE Taxonomy: Enumeration or Language?

Tina Bird
Nicely worded, Bill. I concur. A couple of minor comments below:

> Should CEE support:
>
> 1. Hierarchical/structured taxonomies
>
> My opinion:
>      Any structure implied to data is based on a particular
>      view or use case. There are definite benefits to this
>      as both Dave and John have pointed out. However, without
>      narrowing the scope of logs & CEE I do not see how
>      all event types can be represented in a universally
>      applicable hierarchy, nor do I see any value in defining
>      multiple hierarchies.

Perhaps what CEE should incorporate is a method for building these
hierarchies, to maximize the chances that users (and vendors) will create
hierarchies that other organizations would be able to use?

> 2. Enumerated taxonomy choices
>
> My opinion:
>      I think that CEE needs to provide some direction as to
>      the words (subject, object, action) people should choose.
>      Too many words are overloaded ('user', 'logon') and others
>      have many synonyms ('logon', 'login', 'authentication',
>      'password accepted', etc.).
>      I don't think it matters whether these are expressed in
>      word lists or numbers/indices -- maybe this is a syntax-level
>      declaration.

I agree.

>      However, what I do think is important, is that there be:
>      (1) a way to express event types not defined within the current
>      'official' taxonomy, and
>      (2) a way to express specific details/names relating to each word
>      choice (e.g., 'account' == Joe, 'file' == /etc/passwd)

Since there's no way we'll be able to capture every kind of data that all
the humans, machines and software out there are likely to use, we *have* to
have a defined mechanism for defining types locally, whether or not they
would ever be added to the "official" list; as well as having some kind of
mechanism for nominating new event types to the official ones.

Item 2 suggests a specific "structure" that identifies attribute/value pairs
within a given dataset? That makes sense to me.

cheers -- tbird

Re: CEE Taxonomy: Enumeration or Language?

Tina Bird
In reply to this post by David Corlette
> The semantic confusion caused by the above we find to be
> disheartening. This is a separate topic than taxonomy, but
> XDAS defines three different "objects" that relate to any event:
>
> Initiator: The user, service, and/or system that *causes* an
> event to occur
> Target: The user, service, system, trust, or data object that
> is *affected* by an event
> Observer: The service or system that *detected* the event and
> generates a log message reporting that fact

> I don't find this one particularly complex if put in the
> context of the above structure. If an actual person is
> authenticating, then that person is the Initiator, but of
> course real people aren't really represented in computerese
> (except in Identity Management systems, which we can discuss
> later).  It should be obvious that that real person is
> attempting to access an account, which is therefore the
> Target of the event.

As you say below, I think we're mostly in violent agreement here :-) If I
understand your definitions above, in most cases, the "observer" will be the
process or app that is performing the action, which it then records, right?
Are you aware of any counter-examples?

> I think the confusion comes about simply because people use
> the term "user" loosely to represent the actual person, the
> account, and so forth. If we simply define our terms
> carefully, the confusion disappears.

It's ALL about defining the terms :-)

> I think I agree with this too, but the way this is stated
> sounds a bit limiting. What we have found in working with and
> taxonomizing event logs from many vendors over many years is
> that *most*  events fall into pretty obvious taxonomic
> categories that are very useful for a quite broad set of use
> cases including most forensic analysis, enterprise reporting,
> compliance, etc etc.  In many cases the events that "don't
> fit" are really just a different viewpoint from the vendor -
> they could easily be rewritten into an equally valid form
> that would conform to an event standard.

Or to put it another way, the vast majority of events recorded within one
infrastructure will be the same (or similar enough to be counted as the same
type of event) in another infrastructure roughly composed of the same types
of systems.

I no longer have access to the data for which I can validate that statement
on "today's" enterprise network, but I did a rough statistical analysis 7
years ago (when I was at Counterpane) that showed that once you had the
"right" (read: a sufficient sample of events from the most widely deployed
devices and apps, in an enterprise sense) set of event types, you would be
able to parse 85-90% of all log data on any new network you started
monitoring; and that except for really peculiar cases (there are always a
few), you could achieve 95+% coverage in a new environment by writing no
more than 10 new signatures/filters/regexes.
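
For anyone who wants to repeat that kind of measurement, the calculation itself is simple: count the lines matched by at least one signature and divide by the total. A minimal Python sketch follows; the signatures and log lines are made up and have nothing to do with the Counterpane data.

```python
import re

# Made-up signatures and log lines, only to show how coverage is computed.
SIGNATURES = [
    re.compile(r"Accepted password for \S+"),
    re.compile(r"Failed password for \S+"),
    re.compile(r"session (opened|closed) for user \S+"),
]

def coverage(lines, signatures):
    """Fraction of lines matched by at least one signature."""
    if not lines:
        return 0.0
    matched = sum(
        1 for line in lines if any(sig.search(line) for sig in signatures)
    )
    return matched / len(lines)

sample = [
    "Accepted password for alice from 192.0.2.10",
    "Failed password for bob from 192.0.2.11",
    "session opened for user alice",
    "kernel: eth0 link up",
]
print(f"{coverage(sample, SIGNATURES):.0%} of lines parsed")
```

Adding one signature for the unmatched line and re-running shows how many new signatures a given environment needs to reach a target percentage.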

> On the other hand, there are always use cases where the
> events that are produced really don't quite fit the model
> that we've set up.  In those cases I think it might be
> perfectly valid to simply come up with a different model.  So
> for example imagine we have three event models:
>
> 1) One model that expresses interactions between domain
> objects using the Initiator, Target, Observer, Action
> described above, e.g. XDAS
>   - this would cover the vast majority of compliance,
> enterprise reporting, and most forensic use cases
>
> 2) Another model that covers "current state" events, i.e. how
> much bandwidth, disk space, what state a variable is in, etc
>   - This would cover many operational use cases where you
> want to track statistics and such
>
> 3) A third model which covers "debug" events - e.g. stack
> traces and component failures and that sort of thing
>   - This would be more for debug and deep forensic analysis
>
> I don't see a reason why three very simple models couldn't
> cover virtually all the use cases we are aware of and be
> flexible enough to adapt to new ones.  But I think trying to
> force-fit one model from the above list into one of the other
> models may be tricky, which is what we seem to be trying to
> do.  If the models follow the same basic expressive
> structure, a simple flag could tell us which model was in
> use, and therefore how to parse it (and therefore the
> overarching model could be extensible).  Finally, the
> transport and other recommendations that are being defined in
> CEE could simply say "what you transport over this mechanism
> is one of the defined event models" (where the definition of
> those models comes from XDAS and possibly other standards).  
> So CEE becomes "IP" and XDAS, debug, etc become "UDP", "TCP", etc  ;-)

I concur.
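
The "simple flag" idea can be sketched as a dispatch table: the flag selects a parser, and adding a model means adding one entry. The Python below is only an illustration; the flag letters, the 'key=value;key=value' payload layout, and the function names are assumptions, and only the three models themselves come from the quoted text.

```python
# Hypothetical flags and parsers; the three models come from the quoted text,
# but the flag values and the 'key=value' payload layout are assumptions.
def parse_pairs(payload):
    return dict(item.split("=", 1) for item in payload.split(";") if item)

def parse_interaction(payload):
    fields = parse_pairs(payload)
    return {k: fields.get(k) for k in ("initiator", "target", "observer", "action")}

def parse_state(payload):
    return {name: float(value) for name, value in parse_pairs(payload).items()}

def parse_debug(payload):
    return {"trace": payload.split(";")}

PARSERS = {"I": parse_interaction, "S": parse_state, "D": parse_debug}

def parse_event(message):
    """Text before the first ':' is the model flag; the rest is the payload."""
    flag, _, payload = message.partition(":")
    try:
        parser = PARSERS[flag]
    except KeyError:
        raise ValueError(f"unknown event model {flag!r}") from None
    return flag, parser(payload)

print(parse_event("I:initiator=alice;target=acct42;action=login"))
print(parse_event("S:disk_free=12.5;bandwidth=3"))
print(parse_event("D:main;read_config;open"))
```

The transport only has to carry the flag and the payload; what the payload means is left to whichever model the flag names.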

> The overall message here is that what I saw at the SIG was
> mostly people raising exceptions that wouldn't fit in the
> proposed models.  To which I say, let's make the model
> simpler, but have more models (within a common framework),
> rather than making a ridiculously complex model that no one
> can understand.
>
> Thoughts?

I think I would greatly prefer to be Copernicus (or Occam) than Ptolemy,
even if we can't aspire to Newtonian levels of accuracy ;-) Simple and
elegant is definitely preferable to ridiculously complex.

t.

Re: CEE Taxonomy: Enumeration or Language?

Tina Bird
In reply to this post by John Calcote-2
 

> And to what end? So administrators don't have to use a
> translation tool
> when glancing at a log file, which is quite frankly a
> secondary form of
> analysis in today's world.
>
> For heaven's sake! A high-school student could write a filter in a
> half-hour in bourne shell script (using sed or awk) that would convert
> OIDs to human-readable text - using his favorite vernacular, no less!
>
> $ cat event.log | oids2text | more

Ah, but as has been said, you're considering only the analysis use case. If
I'm trying to trouble-shoot a mission critical system and, for some
completely inexplicable reason, none of the sys admins who preceded me in my
job ever bothered to look at their logfiles (inconceivable!) -- and
therefore never bothered to write the script (or document the dictionaries
that relate "OID" digits to words) -- the inability to read the data without
having to write a script is a significant barrier. Especially if it's 3am.

I think we can do both, and improve the situation for both the analysis
tools *and* the people who are doing real-time maintenance and
trouble-shooting. "Merely" standardizing an event format for authenticating
a user to a system would be a huge first step (the format of Microsoft's
logs for the Internet Authentication Server, its RADIUS implementation,
might as well be written in Klingon). I don't think we are going to be able
to avoid creating "approved vocabulary"; at that point, assigning numeric
identifiers to attributes is pretty easy.

t.

Fwd: Re: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or Language?

David Corlette
In reply to this post by heinbockel
Hi guys,

Some thoughts inline:

> Should CEE support:
>
> 1. Hierarchical/structured taxonomies
>
> My opinion:
>      Any structure implied to data is based on a particular view or use case. There are definite benefits to this as both Dave and John have pointed out.
> However, without narrowing the scope of logs & CEE I do not see how all event types can be represented in a universally applicable hierarchy, nor do I see any value in
> defining multiple hierarchies.

Uh...
Logical problem here: at some point we need to define what is in scope and what is not in scope. This should be driven by what we define as the important use cases. It's not possible to say "we won't restrict what we can express" because then someone will come along and say "express this car."  Hard to do with bits and bytes.

For example, I'm pretty sure we can all agree that we have no interest in expressing the entire contents of a core dump.

The point being, there are definitely things that are in scope and things that are out of scope.  There's been a lot of talk in the abstract worrying about things we can't express - can we have some examples?  What are the use cases where people (or automated systems) would need to know those things?  

Let's *start* with the use cases and then move to how we will implement them.  So far the only examples given have been easily expressible in terms of the current XDAS schema and taxonomy, although there's definitely room for improvement there.

BTW, I don't think it's true that any structure on data is based on a particular view or use case.  Natural languages, for example, are definitely structured, and yet I wouldn't say that there's only one or even a small set of particular "use cases" that define the structure of the English language.

> 2. Enumerated taxonomy choices
>
> My opinion:
>      I think that CEE needs to provide some direction as to the words (subject, object, action) people should choose.
>      Too many words are overloaded ('user', 'logon') and others have many synonyms ('logon', 'login', 'authentication', 'password accepted', etc.).  I don't think it matters whether these are
> expressed in word lists or numbers/indices -- maybe this is a syntax-level  declaration.
>
>      However, what I do think is important, is that there be:
>      (1) a way to express event types not defined within the current 'official' taxonomy, and
>      (2) a way to express specific details/names relating to each word choice (e.g., 'account' == Joe, 'file' == /etc/passwd)

I think basically I agree with this, although I think we're confusing taxonomy (classification) and schema (or syntax/expression) here.

I would say:
1) The base taxonomy needs to cover 80%+ of the use cases that we define as essential, i.e. >80% of the events we want to express based on the use cases should be classifiable in the base taxonomy
2) The taxonomy should further be extensible to cover other types of events that we choose to leave out of the base, or to handle other, unspecified use cases.
3) The taxonomy should unambiguously define the meaning of the terms or numbers it uses to classify things. So for example we might have:
 "Authentication = the process of presenting a set of credentials to establish the identity of the requestor - typically username/password, but could also be certificates etc."

On a completely separate note, the concept of having a set of commonly-used words to identify data elements within the expressed event is very important.  It is also important, as Bill points out, that this syntax be extensible, but:
1) this has nothing to do with taxonomy, except insofar as the taxonomy might *imply* a structure to the data
2) within XDAS, we envision a "core" set of common tags that have defined meanings, and then a separate event section where vendors can put in additional details as name/value pairs with their own meanings.
Again, the intent is to cover 80% of the critical use cases, but to allow for extension for additional vendor- or domain-specific use cases.
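
As a rough picture of the "core tags plus vendor section" split, here is a short Python sketch. The core tag names are an assumption borrowed from the Initiator/Target/Observer/Action roles discussed in this thread; the actual core set is not specified here, and split_event is an illustrative name.

```python
# Assumed core tag names, loosely based on the XDAS roles named in this
# thread; the real core set is not specified here.
CORE_TAGS = {"initiator", "target", "observer", "action"}

def split_event(raw):
    """Separate well-known core tags from vendor-specific name/value pairs."""
    core = {k: v for k, v in raw.items() if k in CORE_TAGS}
    extension = {k: v for k, v in raw.items() if k not in CORE_TAGS}
    missing = sorted(CORE_TAGS - core.keys())
    return {"core": core, "extension": extension, "missing_core": missing}

# Hypothetical vendor event with one field outside the core set.
print(split_event({
    "initiator": "alice",
    "action": "login",
    "vendor_session": "abc",
}))
```

A generic tool can work from the core section alone, while a vendor-aware tool can also read the extension section.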


To wrap this up, I think we're facing a problem here in that we're talking in very abstract terms and making claims about what is and what is not expressible without really having any concrete examples to draw upon.  I would suggest that we refocus our efforts on the use cases that we put together for the Burton SIG and work through them one by one, defining what requirements are implied by each use case in turn and then and only then moving to how we will implement those requirements.  Only if we find that certain use cases have conflicting requirements will we need to worry about separate hierarchies or different syntaxes or whatever.

How's that sound?

Re: CEE Taxonomy: Enumeration or Language?

David Corlette
In reply to this post by Tina Bird
>>> On Mon, Jul 7, 2008 at  1:11 PM, in message
<C9CEC0BEDC70446D907CA9C641DB8B7F@lindesfarne>, Tina Bird
<[hidden email]> wrote:

> Perhaps what CEE should incorporate is a method for building these
> hierarchies, to maximize the chances that users (and vendors) will create
> hierarchies that other organizations would be able to use?

We spoke about this at the SIG - The Open Group is ready and willing to set up a registry where such hierarchies could be reviewed and approved, at least for what's currently called XDAS, FWIW.  Of course MITRE also has a long history serving as a registry as well.

Re: Fwd: Re: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or Language?

Tina Bird
In reply to this post by David Corlette
 

> To wrap this up, I think we're facing a problem here in that
> we're talking in very abstract terms and making claims about
> what is and what is not expressible without really having any
> concrete examples to draw upon.  I would suggest that we
> refocus our efforts on the use cases that we put together for
> the Burton SIG and work through them one by one, defining
> what requirements are implied by each use case in turn and
> then and only then moving to how we will implement those
> requirements.  Only if we find that certain use cases have
> conflicting requirements will we need to worry about separate
> hierarchies or different syntaxes or whatever.
>
> How's that sound?

I **completely** agree. And thank you for inserting some practicality into
the discussion.

Are the Burton SIG use cases on line somewhere? Oddly enough, I have strong
opinions about use cases...

cheers -- t.

Re: CEE Taxonomy: Enumeration or Language?

David Corlette
In reply to this post by Tina Bird
>>> On Mon, Jul 7, 2008 at  1:23 PM, in message
<03046E3B67AC45C1B4976DFF24F48F01@lindesfarne>, Tina Bird
<[hidden email]> wrote:
>>
>> Initiator: The user, service, and/or system that *causes* an
>> event to occur
>> Target: The user, service, system, trust, or data object that
>> is *affected* by an event
>> Observer: The service or system that *detected* the event and
>> generates a log message reporting that fact

>  If I
> understand your definitions above, in most cases, the "observer" will be the
> process or app that is performing the action, which it then records, right?
> Are you aware of any counter-examples?

Yes, absolutely.  An IDS, for example, "observes" actions on other systems without performing any action itself.
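
To illustrate the difference, here is a small Python sketch of the three roles. The Party/Event classes, the idea of comparing host names to decide whether the observer took part, and the sample hosts are all assumptions for the example; XDAS does not define them this way.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Party:
    system: str  # host the party lives on
    name: str    # account, service, or sensor name

@dataclass(frozen=True)
class Event:
    initiator: Party
    target: Party
    observer: Party
    action: str

    def observed_by_third_party(self):
        """True when the reporting system took no part in the event."""
        return self.observer.system not in (
            self.initiator.system,
            self.target.system,
        )

# Hypothetical login reported by the service that handled it.
login = Event(Party("laptop7", "alice"), Party("web01", "alice-account"),
              Party("web01", "sshd"), "authentication")
# Hypothetical probe reported by an IDS that only watched the traffic.
probe = Event(Party("laptop7", "scanner"), Party("web01", "httpd"),
              Party("ids01", "sensor"), "port scan")

print(login.observed_by_third_party())
print(probe.observed_by_third_party())
```

Keeping the Observer as its own field means the IDS case needs no special handling: it is the same record with a different value in that slot.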

 

> I no longer have access to the data for which I can validate that statement
> on "today's" enterprise network, but I did a rough statistical analysis 7
> years ago (when I was at Counterpane) that showed that once you had the
> "right" (read: a sufficient sample of events from the most widely deployed
> devices and apps, in an enterprise sense) set of event types, you would be
> able to parse 85-90% of all log data on any new network you started
> monitoring; and that except for really peculiar cases (there are always a
> few), you could achieve 95+% coverage in a new environment by writing no
> more than 10 new signatures/filters/regexes.

That's good news.  I think our initial target should be to cover 80% or so of the events necessary to support the most critical use cases - if your theory holds true we should be in good shape.

 
> I think I would greatly prefer to be Copernicus (or Occam) than Ptolemy,
> even if we can't aspire to Newtonian levels of accuracy ;-) Simple and
> elegant is definitely preferable to ridiculously complex.

Nicely stated.  As long as we're not being Nietzsche, I'm happy.

Re: CEE Taxonomy: Enumeration or Language?

Sanford Whitehouse
Let's see if I can get this out succinctly.

First, there are a number of assumptions I'm not comfortable with.
Overstating it a bit, it seems like we're reaching for an uber standard
that covers all forms of logs and information.  There has been no
breakdown of the logging process, of who the stakeholder/customer is at a
given stage, or of the requirements necessary to do their job.  A systems
administrator doesn't care much about standards or compliance other than
the minimum necessary to keep his job.  The SA speaks the language of
the system/product.  The compliance manager doesn't give a hoot about
systems or networks as long as the information can be mapped to an
agnostic policy and set of reports.  Both are involved in the stream,
both have very different requirements.  They aren't the only players.
All can be satisfied.

Second, while a logging standard would benefit systems groups, network
groups, and security groups, the driver behind it is business.  Table
inserts and logins are important at a number of levels.  The key level
for business needs is 'who looked at the social security numbers,' 'did
someone print a check for a non-existent customer,' and 'are changes to
a system in accordance with change control and correct for the class of
system?'  This means examining logs in terms of the product, not the
system it's running on or the resources it uses.

Tina - The 95% number applies if system and security logs are the
targets.  The most problematic issues I've seen are products.  Financial
transactions, point-of-sale, HR products, and other products riding on
systems and networks.  One can track all kinds of activity in a
financial database and never know what the financial product is doing.
Products seldom use the system logging facility for their information.
They seldom have a log export capability.

The problem is multi-layered and complex.  It's not unsolvable.  

Wrt the taxonomy, it has several levels.  The first is defining a set of
characteristics, how to  identify them, and the common terms used for
both.  Teeth, fur, wings, a tail, and so on.  This level is the model.
The next is to determine how to group the characteristics in a
meaningful way.   What characteristics describe a bear, a shark, a
human, a duck?  The final is how to group them into a useful category.
All of the previous might be considered dangerous animals, by someone's
definition of dangerous.
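
As a rough illustration of those three levels, here is a minimal Python
sketch built on the same animal analogy.  The names (CHARACTERISTICS,
KINDS, CATEGORIES, identify, categories_for) and the trait assignments
are made up for illustration; nothing here comes from a CEE draft.

```python
# Level 1 (the model): the agreed vocabulary of characteristics.
CHARACTERISTICS = frozenset({"teeth", "fur", "wings", "tail"})

# Level 2: group characteristics into recognizable kinds.
# The trait sets are illustrative assumptions, not a real classification.
KINDS = {
    "bear": frozenset({"teeth", "fur", "tail"}),
    "shark": frozenset({"teeth", "tail"}),
    "human": frozenset({"teeth"}),
    "duck": frozenset({"wings", "tail"}),
}

# Level 3: group kinds into categories that matter to some audience.
# "dangerous" follows someone's definition of dangerous.
CATEGORIES = {
    "dangerous": {"bear", "shark", "human", "duck"},
}


def identify(observed):
    """Return the kind whose traits exactly match the observed set, or None."""
    observed = frozenset(observed)
    unknown = observed - CHARACTERISTICS
    if unknown:
        raise ValueError(f"not in the model: {sorted(unknown)}")
    for name, traits in KINDS.items():
        if traits == observed:
            return name
    return None


def categories_for(kind):
    """Return the sorted names of every category that contains this kind."""
    return sorted(c for c, members in CATEGORIES.items() if kind in members)
```

The point of the split is that the vocabulary can stay stable while the
groupings above it change: a different audience only has to supply a
different CATEGORIES table.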

The first level is what administrators and vendors are concerned about.
It requires a structure, actually a couple.  The second usually leads to
the third.  Those two are what the compliance manager is interested in.

If that direction is correct, the implication is that first-level users
need the messages in one form and second-level users in another.  That
implies a transition layer.  The transition layer is what everyone in
log analysis and management deals with.  It can't be bypassed.  It can
be made simpler.

Sanford

-----Original Message-----
From: David Corlette [mailto:[hidden email]]
Sent: Monday, July 07, 2008 11:11 AM
To: [hidden email]
Subject: Re: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or
Language?

>>> On Mon, Jul 7, 2008 at  1:23 PM, in message
<03046E3B67AC45C1B4976DFF24F48F01@lindesfarne>, Tina Bird
<[hidden email]> wrote:
>>
>> Initiator: The user, service, and/or system that *causes* an event to
>> occur
>> Target: The user, service, system, trust, or data object that is
>> *affected* by an event
>> Observer: The service or system that *detected* the event and
>> generates a log message reporting that fact

> If I understand your definitions above, in most cases, the "observer"
> will be the process or app that is performing the action, which it
> then records, right?
> Are you aware of any counter-examples?

Yes, absolutely.  An IDS, for example, "observes" actions on other
systems without performing any action itself.
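
To make the distinction concrete, here is a minimal Python sketch of the
three roles as a record.  The class name EventRoles and all of the host
and service names are hypothetical and used only for illustration; this
is not a proposed CEE format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRoles:
    initiator: str  # user, service, or system that causes the event
    target: str     # user, service, system, trust, or data object affected
    observer: str   # service or system that detected and logged the event

    def observer_is_initiator(self):
        """True when the component that acted also wrote the log message."""
        return self.observer == self.initiator


# Common case: the app performing the action records it itself.
self_reported = EventRoles(
    initiator="backup-service",
    target="archive-store",
    observer="backup-service",
)

# Counter-example: an IDS reports traffic it did not take part in.
ids_alert = EventRoles(
    initiator="host-a",
    target="web-server",
    observer="ids-sensor",
)
```

Keeping the observer as its own field means the IDS case needs no
special handling; it is simply a record where observer and initiator
differ.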

 

> I no longer have access to data with which I can validate that
> statement on "today's" enterprise network, but I did a rough
> statistical analysis 7 years ago (when I was at Counterpane) that
> showed that once you had the "right" (read: a sufficient sample of
> events from the most widely deployed devices and apps, in an
> enterprise sense) set of event types, you would be able to parse
> 85-90% of all log data on any new network you started monitoring; and
> that except for really peculiar cases (there are always a few), you
> could achieve 95+% coverage in a new environment by writing no more
> than 10 new signatures/filters/regexes.

That's good news.  I think our initial target should be to cover 80% or
so of the events necessary to support the most critical use cases - if
your theory holds true we should be in good shape.
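
For anyone who wants to check a coverage figure like that against their
own data, here is a minimal Python sketch.  The function name coverage,
the sample log lines, and the regexes are all invented for illustration;
real signatures would come from whatever parser set is being evaluated.

```python
import re


def coverage(lines, signatures):
    """Return the fraction of log lines matched by at least one signature."""
    if not lines:
        return 0.0
    compiled = [re.compile(s) for s in signatures]
    matched = sum(1 for line in lines if any(p.search(line) for p in compiled))
    return matched / len(lines)


# Invented sample data, not taken from any real network.
sample_lines = [
    "sshd[411]: Accepted password for alice from 10.0.0.5",
    "sshd[411]: Failed password for bob from 10.0.0.9",
    "su: session opened for user root by alice",
    "myapp: cache flushed",
]
sample_signatures = [
    r"sshd\[\d+\]: (Accepted|Failed) password for \S+",
    r"su: session opened for user \S+",
]

print(f"{coverage(sample_lines, sample_signatures):.0%} of lines parsed")
```

Running the same function before and after adding a handful of new
signatures shows how much each one contributes on a new network.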

 
> I think I would greatly prefer to be Copernicus (or Occam) than
> Ptolemy, even if we can't aspire to Newtonian levels of accuracy ;-)
> Simple and elegant is definitely preferable to ridiculously complex.

Nicely stated.  As long as we're not being Nietzsche, I'm happy.