Structured vs. Free-form Event Messages

Structured vs. Free-form Event Messages

heinbockel
I have had this conversation many times over the past years, let me have it
once more just so I can get this monkey off of my shoulders.

Free-form event logs should not be the basis of any type of system-focused
event management. Heck, free-form text should only be minimally present in
event data. By "free-form", I mean any event message that uses natural
language to describe an event and uses no (or very minimal) parsable
structure. For example, it is much easier to parse and understand name-value
pairs, XML, CSV, or JSON than it is to have a computer understand this
email.

Why? Natural language processing is easy for humans, but hard for computers.
Conversely, structured data processing is trivial for computers and more
difficult for humans.


I argue that we should design events for the lowest common denominator, and
in this case that means we use structured event descriptions that are
optimized for machines.

The basis of my argument is two-fold:

1. Structured data can always be easily converted into a more free-form,
human readable message via message templates. Performing the opposite
conversion (from free-form to structured) is much more difficult and lossy
(PCRE anyone?). (I bet that most vendors already do this to some extent to
support internationalization.)

2. Humans use machines as the interface to event logs. Machines are used to
filter, select, correlate, aggregate, and perform other actions on events.
The average user will only see a small fraction of most recorded events. For
the more skilled user, Argument 1 still holds -- the events can be
translated into a consistent, more readable format.
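
A minimal Python sketch of the asymmetry in argument 1. The event fields, the template, and the regular expression are hypothetical and chosen only for illustration; they are not taken from any CEE specification.

```python
import re

# Hypothetical structured event; the field names are made up for illustration.
event = {"action": "login", "status": "failure", "user": "alice", "src_ip": "10.0.0.5"}

# Structured -> free-form: a message template is all that is needed.
TEMPLATE = "User {user} {action} {status} from {src_ip}"

def render(evt):
    """Turn a structured event into a human-readable line."""
    return TEMPLATE.format(**evt)

# Free-form -> structured: a hand-written pattern tied to one exact wording.
PATTERN = re.compile(
    r"User (?P<user>\S+) (?P<action>\S+) (?P<status>\S+) from (?P<src_ip>\S+)"
)

def parse(line):
    """Recover the fields from a free-form line, or None if the wording differs."""
    match = PATTERN.fullmatch(line)
    return match.groupdict() if match else None

message = render(event)
print(message)
print(parse(message))
print(parse("Login failed for alice (10.0.0.5)"))  # same event, different wording
```

Rendering needs one template per message, while parsing needs one pattern per wording, and the pattern silently stops matching as soon as a developer rephrases the message.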


So, why are we still recording events using free-form strings instead of
structured data?

My best answer is that strings are easier for developers to use and are the
default for most log libraries: Syslog, log4j, log4c, etc.



William Heinbockel
Infosec Engineer, Sr.
The MITRE Corporation
202 Burlington Rd. MS S145
Bedford, MA 01730
[hidden email]
781-271-2615



Re: Structured vs. Free-form Event Messages

Burnes, James - NRCS, Fort Collins, CO
Bill Heinbockel said...

-----Original Message-----
From: Heinbockel, Bill [mailto:[hidden email]]
Sent: Wednesday, January 25, 2012 2:20 PM
To: [hidden email]
Subject: [CEE-DISCUSSION-LIST] Structured vs. Free-form Event Messages

I have had this conversation many times over the past years, let me have it
once more just so I can get this monkey off of my shoulders.

Free-form event logs should not be the basis of any type of system-focused
event management. Heck, free-form text should only be minimally present in
event data. By "free-form", I mean any event message that uses natural
language to describe an event and uses no (or very minimal) parsable
structure. For example, it is much easier to parse and understand name-value
pairs, XML, CSV, or JSON than it is to have a computer understand this
email.

---

I'm surprised you get asked this question.   I understand that backend log processing companies have to process arbitrary format data and make do as best they can.   But structuring the event at the programmer's side only takes a little more time, can be marked-up automatically in context, saves significant amounts of CPU and delivers actionable intelligence instead of something that needs an oracle and tea leaves to determine its meaning.

It's the standard entropy argument.  It's always more difficult to reconstruct something once the information has been destroyed (or never included at all).

On the flip side, some sort of plugin into common IDEs that minimizes the hassle of finding the correct dictionary name and type for event messages will aid adoption in the developer community.  In the same way that IDEs have smart editors that determine language syntax and class members as you type, why couldn't CEE have syntax helpers for Visual Studio and Eclipse?  Something to think about.

Regards,

Jim Burnes
Application Security Engineer
USDA/NRCS/ITC/Fort Collins

Re: Structured vs. Free-form Event Messages

Moehrke, John (GE Healthcare)
In reply to this post by heinbockel
Bill,

I agree. In Healthcare we have been promulgating the use of an XML schema we
defined in RFC-3881, and further refined in ISO-12052. The added advantage
is that the whole event is defined in one message, rather than having to
knit together the system startup event, with the login event, with the
object selection, with the object update event...

John


Re: Structured vs. Free-form Event Messages

Stephan Buys
In reply to this post by heinbockel
Hi Bill,

My 2c,

I think your assertion that it is easier for developers and that it is the default for most log libraries definitely has merit. Your argument for structured data makes absolute sense, and I support it, but in reality a lot of logging takes place with basic architectures and tools. Until there is a robust, common framework with widely adopted tools to parse all this structured data, I think free-form logging will remain the status quo for a long time.

Natural language, free-form logs are easy. They are easy to code, easy to use during debugging and development and they are usable without the need for any special tools or software (all you need is some grep, sed, awk, notepad, etc). It might not be practical, or easy, or sensible, but it is feasible today.

What's more, tools are catching up with this problem and making it much less of an issue. We use a lot of Splunk in our day-to-day logging operations and it makes it really easy. Google doesn't require structured data to search the web; Splunk doesn't need structure to read logs. What's more, as you said "PCRE anyone?", tools like Splunk allow the consumer to extract/normalise only the absolute minimum of fields/data, and weave together transactions after the fact - with PCRE, but at a manageable scale. My own research is focussing on using natural language processing in event classification - it's early days, but technology such as Apple's Siri shows the way. In the security space some of my fellow students are making strides in creating extensive security ontologies, further facilitating NLP.  In summary, I guess I'm of the mind that for now the problem will be solved by tools, and not by formats.

(Terrible illustrative attempt)
NoSQL > SQL
Syslog/FreeForm > SNMP
Google > Yahoo(Original Directory approach)

Kind regards,
Stephan Buys
South African Log Guy







Re: Structured vs. Free-form Event Messages

Burnes, James - NRCS, Fort Collins, CO
Stephan Buys said:

>I think your assertion that it is easier for developers and that it is the default for most log libraries definitely has merit. Your argument for structured data makes absolute sense, and I support it, but in reality a lot of logging takes place with basic architectures and tools.

Interesting viewpoint.  I think there are two basic viewpoints when considering a new logging standard:

1. From the viewpoint of a log consumer / processor.
2. From the viewpoint of a developer generating log events that add intelligence to a stream.

From the viewpoint of a log consumer and processor (especially commercial software), you have every financial incentive to read, massage, process, determine the format and attempt to extract intelligence out of a free-form log event.  The overhead of gathering information about what some free-form event might mean, as well as debugging the parsers for it (not to mention the ad-hoc changes to it), is significant.

From the viewpoint of a developer or the development organization, the marginal overhead to mark or tag event fields is really pretty small and the payoff is big.  That doesn't mean that some low-level trace event needs to be serialized into CEE markup, but it could be.  In fact, the design of a debug library could simply generate minimal markup for the field auto-magically.  How about the logging call allows you to specify the field name for each value included and it generates the missing pieces?
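
One way such a logging call could look, as a minimal Python sketch. The function name `log_event` and the automatically added field names (`time`, `p_sys`, `p_proc_id`) are assumptions made for illustration, not an existing CEE library or API.

```python
import json
import os
import socket
import time

def log_event(**fields):
    """Serialize caller-named fields as one JSON event, adding common context."""
    # The "missing pieces" the library can fill in without the developer's help.
    event = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "p_sys": socket.gethostname(),
        "p_proc_id": os.getpid(),
    }
    event.update(fields)  # caller-supplied values win over the defaults
    line = json.dumps(event, sort_keys=True)
    print(line)
    return line

# The developer only names each value; no format string is involved.
log_event(action="connect", status="failure", server="acld2", port=7137)
```

The developer's extra effort is limited to choosing a field name per value, which is where a shared dictionary of names would come in.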

So maybe a couple light conclusions:

(a) From the developer's side, use something that allows you to add minimal tagging or markup.  Something predictable which stems from a known dictionary and serialization format.

(b) From the log ingester's side, do whatever it takes to add intelligence back into the stream and emit it in a structured, predictable serialization against a known dictionary and taxonomy (perhaps CEE?)

> Until there is a robust, common, framework with widely adopted tools to
>parse all this structured data I think it will continue to be the status quo for a long time.

Ummm... XML and JSON anyone?  Tools everywhere.   Parse the whole event and peek into it or callback/dispatch code on specific fields.  Pretty basic.
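
A minimal Python sketch of "parse the whole event and dispatch on specific fields", using only the standard `json` module. The event fields and handler names are hypothetical.

```python
import json

def on_failure(event):
    """Handler for events whose status is 'failure'."""
    return "alert: {action} failed for {user}".format(**event)

def on_success(event):
    """Handler for events whose status is 'success'."""
    return "ok: {action} by {user}".format(**event)

# Dispatch table keyed on the value of a single field.
HANDLERS = {"failure": on_failure, "success": on_success}

def handle(raw):
    """Parse one JSON event and dispatch on its 'status' field."""
    event = json.loads(raw)
    handler = HANDLERS.get(event.get("status"))
    return handler(event) if handler else None

print(handle('{"action": "login", "status": "failure", "user": "alice"}'))
```

Events without a recognised `status` simply fall through and return `None`, so unknown fields or event types do not break the consumer.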

Regarding:

>Natural language, free-form logs are easy. They are easy to code, easy to use during debugging and development and they are usable without the need for any special tools or software (all you need is some grep, sed, awk, notepad, etc). It might not be practical, or easy, or sensible, but it is feasible today.

>What's more, tools are catching up with this problem and making it much less of an issue.

Of course they're easy to code.  You're just spewing data.  Easy to use during debugging .. sort of agree.  Debuggers could use contextual information if it were provided.  (I would especially suggest JSON serialization in this case.)

The question isn't whether tools are available to do the job.  If that were the case we could bundle up log events, use free tools from NASA and route them via the Moon and back.   The question is whether it makes sense to do it that way.

I used to write natural language parsers and I agree that they aren't that difficult.  Zork did a damn good job with imperative sentence structure back in the 1970s in a limited "adventure" frame of context.  But Zork didn't need to infer the context; it knew it was running an adventure game.

Inferring a frame of context in a log stream is something you'd like to avoid at all costs.  Back to the 2nd law of thermo again --- if the information is available, include it in the event.  If it's destroyed or never included ... well entropy reversal tends to be expensive, unreliable and temporary.

Now if your intent is to examine a specific event field containing human language from which you're trying to extract meaning, then natural language processing makes a lot of sense.  IOW, if you have an event that represents a Facebook post in an individually structured field named 'COMMENT', this would be an excellent candidate for NLP.

Is this the primary use of your NLP system?  Extracting meaning from human language fields?  

In this case I don't think it's logging proper, it's data mining for fun and profit.  :-)

In the end, I don't think it's a dichotomy.   How about CEE JSON serialization events and NLP pattern recognizers when called for?

FWIW

Jim Burnes
Application Security Engineer
USDA/NRCS/ITC/Fort Collins

Re: Structured vs. Free-form Event Messages

Keith Robertson
In reply to this post by heinbockel
Bill,

Glad that you brought our discussion to the list, but I am still not
convinced that the base <Event/> shouldn't allow for some sort of
free-form message string.  If we make all of the basic elements of the
<Event/> (e.g. id, p_app, p_proc, etc.) required and non-optional then I
see no degradation in the standard in allowing for an optional
element(s) consisting of a free-form message.  Specifically, I am
proposing something like [1].

Again, I would like to point you to the TMS [2] standard from IBM.  The
TMS standard is a structured logging format that has been in use for
over 10 years by many IBM products.  The TMS standard strikes a good
balance between structured events and unstructured
programmer data.  I'm not suggesting that the IBM model is perfect or
that it would do everything CEE is intended to do but I am suggesting
that it should be looked at to see how they successfully solved a very
similar set of problems.  I will briefly run through the standard at the
bottom of this email for the benefit of the list.

Cheers,
Keith


-Overview of the TMS Standard:
  -- The TMS standard has been in use by IBM for over 10 years.
  -- It supports both plaintext and XML output.
  -- It is a structured logging format with required fields that are
used as meta-data to describe the event so that it can be easily
consumed by automation.
  -- It includes free-form element(s) (i.e. <LogText/> and <Param/>)
that allows for a developer message and a structured list of arguments.
  -- It can be easily consumed by automation.  In fact, IBM has various
sets of tools designed to consume and diagnose the logs and they work
very well.
  -- It supports i18n and can be easily read and understood without the
aid of tooling.

-How a Developer Would Actually use the Standard:
A developer employing the TMS standard in their project would find that
it is quite similar to the popular log4j/log4c frameworks; hence, it is
easy to use.  It is important to note that IBM was farsighted enough to
build i18n support into TMS; hence, the tooling supports key look-ups in
globalized resource bundles.

   // Resource bundle containing the textual message.
   1354a41e=Could not connect to the server {0}, on port {1}

   // Example call in some bit of Java code
   logger.msg(Level.ERROR, ... , "1354a41e", "HPDCO1054E", ..., "acld2",
"7137", ... ); // See [3] for output from this.
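
The key look-up idea can be sketched in a few lines of Python. This is not IBM's tooling: the `CATALOGS` dictionary stands in for per-locale resource bundles, the `localize` helper is hypothetical, and the German text is invented for the example.

```python
# Hypothetical in-memory catalogs standing in for per-locale resource bundles.
CATALOGS = {
    "en": {"1354a41e": "Could not connect to the server {0}, on port {1}"},
    "de": {"1354a41e": "Verbindung zum Server {0} auf Port {1} fehlgeschlagen"},
}

def localize(msg_key, params, locale="en"):
    """Look up the template for msg_key and fill in the positional parameters."""
    catalog = CATALOGS.get(locale, CATALOGS["en"])
    template = catalog.get(msg_key)
    if template is None:
        # Unknown key: fall back to the raw key and parameters, losing nothing.
        return " ".join([msg_key, *params])
    return template.format(*params)

print(localize("1354a41e", ["acld2", "7137"]))
print(localize("1354a41e", ["acld2", "7137"], locale="de"))
```

Because the event stores the key and the parameters rather than the rendered sentence, the same record can be displayed in any language for which a catalog exists.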

- What the Output Looks Like:
As I said earlier, TMS output can either be plaintext or XML though it
is usually XML.  If you open a TMS XML log file you will find sequenced
XML message blocks like those in [3].  A user looking at the XML without
the aid of a tool would generally gravitate to the "Id" and the
"LogText" elements whereas automation would generally consume everything.


//-----

[1]
<xs:element block="#all" final="#all" name="Event">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="1" ref="crit" maxOccurs="1"/> <!-- These should be *required* -->
<xs:element minOccurs="1" ref="id" maxOccurs="1"/>
<xs:element minOccurs="1" ref="p_app" maxOccurs="1"/>
<xs:element ref="p_proc" maxOccurs="1" minOccurs="1"/>
<xs:element minOccurs="1" ref="p_proc_id" maxOccurs="1"/>
<xs:element ref="p_sys" maxOccurs="1" minOccurs="1"/>
<xs:element minOccurs="1" ref="pri" maxOccurs="1"/>
<xs:element ref="time" maxOccurs="1" minOccurs="1"/>
<xs:element minOccurs="1" name="Type" maxOccurs="1">
<xs:element minOccurs="0" name="LogMessage" maxOccurs="1"
type="xs:string"/> <!-- New and optional.  Maybe even a bounded string. -->
       [snip rest of event remains the same]

[2]
http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/index.jsp?topic=%2Fcom.ibm.itame3.doc_5.1%2Fam51_pdg54.htm 


[3]
<Message Id="HPDCO1054E" Severity="ERROR">
<Time Millis="1067220550984">2003-10-26-20:09:10.984</Time>
<Component>ivc/socket</Component>
<LogAttribs><KeyName><![CDATA[Message Number]]></KeyName>
<Value><![CDATA[0x1354A41E]]></Value></LogAttribs>
<Source FileName="e:\am510\src\mts\mtsclient.cpp" Method="unknown"
Line="1832"></Source>
<Process>pdmgrd</Process>
<Thread>0x000001c4</Thread>
<TranslationInfo Type="XPG4" Catalog="pdbivc.cat" SetId="1"
MsgKey="1354a41e">
<Param><![CDATA[acld2]]></Param>
<Param><![CDATA[7137]]></Param>
</TranslationInfo>
<LogText><![CDATA[HPDCO1054E   Could not connect to the server acld2, on
port 7137. ]]></LogText>
</Message>
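
A minimal Python sketch of the two ways of reading a message block like [3], using the standard `xml.etree.ElementTree` module. It uses a shortened copy of the sample and assumes the CDATA sections are closed with the standard `]]>`; the `record` layout is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample message, for illustration only.
RAW = """<Message Id="HPDCO1054E" Severity="ERROR">
<Time Millis="1067220550984">2003-10-26-20:09:10.984</Time>
<Component>ivc/socket</Component>
<Process>pdmgrd</Process>
<TranslationInfo Type="XPG4" Catalog="pdbivc.cat" SetId="1" MsgKey="1354a41e">
<Param><![CDATA[acld2]]></Param>
<Param><![CDATA[7137]]></Param>
</TranslationInfo>
<LogText><![CDATA[HPDCO1054E   Could not connect to the server acld2, on port 7137. ]]></LogText>
</Message>"""

msg = ET.fromstring(RAW)

# What a person skimming the log tends to look at: the Id and the LogText.
summary = (msg.get("Id"), msg.findtext("LogText").strip())

# What automation can consume: every field, already separated.
record = {
    "id": msg.get("Id"),
    "severity": msg.get("Severity"),
    "millis": int(msg.find("Time").get("Millis")),
    "component": msg.findtext("Component"),
    "process": msg.findtext("Process"),
    "msg_key": msg.find("TranslationInfo").get("MsgKey"),
    "params": [p.text for p in msg.iter("Param")],
}

print(summary)
print(record)
```

The server name and port are available as separate `Param` values, so a consumer never has to pick them out of the `LogText` sentence.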


Re: Structured vs. Free-form Event Messages

Burnes, James - NRCS, Fort Collins, CO
Keith Robertson said,

>If we make all of the basic elements of the
><Event/> (e.g. id, p_app, p_proc, etc.) required and non-optional then I
>see no degradation in the standard in allowing for an optional
>element(s) consisting of a free-form message.  

I can see this as being very useful.  Maybe a free-form and optional field called "trace".  This would allow debugging the logging system itself as well as provide an avenue for program development trace information that is usually quick and dirty.  That way listeners to the event stream could use it or throw it away.  Developers and debug systems wouldn't have to parse through a huge DOM to extract a simple debug field.

Debuggers of the logging system itself would have an easy way to track event progress.
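
A small Python sketch of how a listener might treat such an optional field. JSON is used here for brevity, and the event layout and the `strip_trace` helper are hypothetical; only the field name "trace" comes from the suggestion above.

```python
import json

def strip_trace(raw):
    """Return the event as a dict without its optional free-form 'trace' field."""
    event = json.loads(raw)
    event.pop("trace", None)  # an absent trace is fine; the field is optional
    return event

raw = json.dumps({
    "id": "HPDCO1054E",
    "p_app": "pdmgrd",
    "trace": "retrying socket connect, attempt 3",
})

print(strip_trace(raw))                    # a listener that ignores trace data
print(json.loads(raw).get("trace", ""))    # a debugger that reads only trace data
```

Both consumers touch a single known field, so neither has to walk the rest of the event to use or discard the free-form text.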

Good idea there, Keith or IBM or whomever.

Jim Burnes
App Sec
USDA/NRCS

Re: Structured vs. Free-form Event Messages

william.leroy
In reply to this post by Burnes, James - NRCS, Fort Collins, CO
Dear all,

I have seen some very unstructured data being reported as a DATA field in
some IPS and IDS products, for example chat or IM conversations that are
collected because of their content. Then there is the trap data collected
by an IDS or IPS, which is usually presented in hex format. This morning I
was talking about other metadata that could be incorporated as an
attachment to an event, or referenced from within the event by a pointer
to another database from which it can be extracted.

I don't know if this is relevant to this conversation on event data, but
with two-way communication from the SIEM to many other external systems,
unstructured data can either be gathered on demand or be included as
comment data or EXT data.



On Thu, Jan 26, 2012 at 12:48 PM, Burnes, James - NRCS, Fort Collins,
CO <[hidden email]> wrote:

> Stephan Buys said:
>
>> I think your assertion that it is easier for developers and that it is the default for most log libraries definitely has merit. Your argument for structured data makes absolute sense, and I support it, but in reality a lot of logging takes place with basic architectures and tools.
>



--
Bill LeRoy


[hidden email]

Re: Structured vs. Free-form Event Messages

Keith Robertson
Can you post valid XML that demonstrates this?

On 01/26/2012 02:38 PM, [hidden email] wrote:

> Dear all,
>
> I have seen some very unstructured data being reported as a DATA field in
> some IPS and IDS products, for example chat or IM conversations that are
> collected because of their content. Then there is the trap data collected
> by an IDS or IPS, which is usually presented in hex format. This morning I
> was talking about other metadata that could be incorporated as an
> attachment to an event, or referenced from within the event by a pointer
> to another database from which it can be extracted.
>
> I don't know if this is relevant to this conversation on event data, but
> with two-way communication from the SIEM to many other external systems,
> unstructured data can either be gathered on demand or be included as
> comment data or EXT data.
>
>
>
> On Thu, Jan 26, 2012 at 12:48 PM, Burnes, James - NRCS, Fort Collins,
> CO<[hidden email]>  wrote:
>> Steven Buys said:
>>
>>> I think you're assertion that it is easier for developers and that it is the default for most log libraries definitely>has merit. Your argument for structured data makes absolute sense, and I support it, but in reality a lot of logging>takes place with basic architectures and tools.
>> Interesting viewpoint.  I think there are two basic viewpoints when considering a new logging standard:
>>
>> 1. From the viewpoint of a log consumer / processor.
>> 2. From the viewpoint of an developer generating log events that add intelligence to a stream.
>>
>>  From the viewpoint of a log consumer and processor (especially commercial software), you have every financial incentive to read, massage, process, determine the format and attempt to extract intelligence out of a free-form log event.  The amount of  overhead in gathering information about what some free-form event might mean as well as debugging the parsers for it (not to mention the ad-hoc changes to it) are significant.
>>
>>  From the viewpoint of a developer or the development organization, the marginal overhead to mark or tag event fields is really pretty small and the payoff is big.  That doesn't mean that some low-level trace event needs to be serialized into CEE markup, but it could be.  In fact, the design of a debug library could simply generate minimal markup for the field auto-magically.  How about the logging call allows you to specify the field name for each value included and it generates the missing pieces?
>>
>> So maybe a couple light conclusions:
>>
>> (a) From the developer's side, use something that allows you to add minimal tagging or markup.  Something predictable which stems from a known dictionary and serialization format.
>>
>> (b) From the log ingester's side, do whatever it takes to add intelligence back into the stream and emit it in a structured, predictable serialization against a known dictionary and taxonomy (perhaps CEE?)
>>
>>> Until there is a robust, common, framework with widely adopted tools to
>>> parse all this structured data I think it will continue to be the status quo for a long time.
>> Ummm... XML and JSON anyone?  Tools everywhere.   Parse the whole event and peek into it or callback/dispatch code on specific fields.  Pretty basic.
>>
>>> Regarding natural language: free-form logs are easy. They are easy to code, easy to use during debugging and development, and they are usable without the need for any special tools or software (all you need is some grep, sed, awk, notepad, etc). It might not be practical, or easy, or sensible, but it is feasible today.
>>> What's more, tools are catching up with this problem and making it much less of an issue.
>> Of course they're easy to code.  You're just spewing data.  Easy to use during debugging .. sort of agree.  Debuggers could use contextual information if it were provided.  (I would especially suggest JSON serialization in this case.)
>>
>> The question isn't whether tools are available to do the job.  If that were the case we could bundle up log events, use free tools from NASA and route them via the Moon and back.   The question is whether it makes sense to do it that way.
>>
>> I used to write natural language parsers and I agree that they aren't that difficult.  Zork did a damn good job with imperative sentence structure back in the 1970s in a limited "adventure" frame of context.  But Zork didn't need to infer the context, it knew it was running an adventure game.
>>
>> Inferring a frame of context in a log stream is something you'd like to avoid at all costs.  Back to the 2nd law of thermo again --- if the information is available, include it in the event.  If it's destroyed or never included ... well entropy reversal tends to be expensive, unreliable and temporary.
>>
>> Now if your intent is to examine a specific event field containing human language from which you're trying to extract meaning, then natural language processing makes a lot of sense.  IOW, if you have an event that represents a Facebook post in an individually structured field named 'COMMENT', this would be an excellent candidate for NLP.
>>
>> Is this the primary use of your NLP system?  Extracting meaning from human language fields?
>>
>> In this case I don't think it's logging proper, it's data mining for fun and profit.  :-)
>>
>> In the end, I don't think it's a dichotomy.   How about CEE JSON serialization events and NLP pattern recognizers when called for?
>>
>> FWIW
>>
>> Jim Burnes
>> Application Security Engineer
>> USDA/NRCS/ITC/Fort Collins
>
>
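One way to picture the suggestion quoted above, that the logging call names each value and the library generates the missing pieces, is the minimal Python sketch below. The helper names make_event and log_event are hypothetical, the generated fields (time, p_sys, p_proc_id) borrow names used elsewhere in this thread, and one-JSON-object-per-line output is an assumption rather than anything CEE mandates.

```python
import json
import os
import socket
import time


def make_event(fields, *, clock=time.time):
    """Return a structured event dict, filling in fields the caller omitted."""
    event = {
        "time": clock(),                # seconds since the epoch
        "p_sys": socket.gethostname(),  # host that produced the event
        "p_proc_id": os.getpid(),       # id of the producing process
    }
    event.update(fields)  # caller-supplied names win over generated defaults
    return event


def log_event(stream, **fields):
    """Serialize one event as a single JSON line on the given stream."""
    stream.write(json.dumps(make_event(fields), sort_keys=True) + "\n")
```

A call such as log_event(sys.stderr, action="login", status="failure") costs the developer little more than a format string, yet every value arrives already tagged with its field name.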

Re: Structured vs. Free-form Event Messages

Keith Robertson
In reply to this post by Burnes, James - NRCS, Fort Collins, CO
On 01/26/2012 01:14 PM, Burnes, James - NRCS, Fort Collins, CO wrote:
> Keith Robertson said,
>
>> If we make all of the basic elements of the
>> <Event/>  (e.g. id, p_app, p_proc, etc.) required and non-optional then I
>> see no degradation in the standard in allowing for an optional
>> element(s) consisting of a free-form message.
> I can see this as being very useful.  Maybe a free-form and optional field called "trace".  This would allow debugging the logging system itself as well as provide an avenue for program development trace information that is usually quick and dirty.  That way listeners to the event stream could use it or throw it away.  Developers and debug systems wouldn't have to parse through a huge DOM to extract a simple debug field.
Exactly, without something like a <LogMessage/> element it will be very
hard to get buy-in from the development community because they'll be
forced to create a Profile/Extension from the base <Event/>.  If you
really want CEE to gain widespread adoption you've got to make the bar
for adoption very low and it must have a feature set that is equal to or
greater than what is currently available.

I think that it is unreasonable to think that people will stop reading
log files and immediately reach for a tool that can parse and display
CEE XML/JSON.  As such, what is needed is a compromise, like TMS, that
does both.
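As a rough illustration of a format that "does both", the Python sketch below renders one event two ways: a JSON line for tools and a single readable line for someone paging through a log file. It is only a sketch under assumptions: the field names come from the <Event/> elements discussed in this thread, LogMessage is the optional free-form element proposed here, and the helper names and the ordering of the readable line are made up for illustration.

```python
import json

# Fields shown, in order, at the front of the human-readable line (assumed order).
HEADER_FIELDS = ("time", "p_sys", "p_app", "pri", "id")


def to_machine(event):
    """Structured form: one JSON object per line, for automation."""
    return json.dumps(event, sort_keys=True)


def to_human(event):
    """Readable form: header fields first, optional free-form text last."""
    prefix = " ".join(str(event[name]) for name in HEADER_FIELDS)
    message = event.get("LogMessage", "")
    return f"{prefix} {message}".rstrip()
```

Both views come from the same record, so nothing is lost by also offering the readable one.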
> Debuggers of the logging system itself would have an easy way to track event progress.
>
> Good idea there, Keith or IBM or whomever.
Thanks, and no, I didn't invent the standard. I did, however, use it in
all of the products that I helped to send out the door while I was there.
> Jim Burnes
> App Sec
> USDA/NRCS

Re: Structured vs. Free-form Event Messages

william.leroy
Bill,

I was just wondering (it may or may not be part of the specification):
what is the max size of a CEE event?

Best Regards,
Bill LeRoy


Re: Structured vs. Free-form Event Messages

heinbockel
In reply to this post by Keith Robertson
To clarify, I have no problem with permitting free-form text in event
messages.

What I do not want to have happen is the same thing that is happening with
the RFC5424 Syslog. When you allow for free-form content, there will be much
tendency to use the free-form INSTEAD OF the structured content.

While I do not mind supporting the free-form, I believe there needs to be
some motivation to encourage vendors to eventually migrate their logs to
support structured content.

Maybe there is a way that we can support the free-form text (primarily for
legacy compatibility) and find some way to encourage vendors to use more
structured content -- either coming as conformance requirements or removing
the Text option from a later CEE version.
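A conformance requirement of the kind floated above could be as simple as grading each event. The Python sketch below is purely hypothetical: the level names ("nonconformant", "legacy", "structured") are invented for illustration, the required-field set is a subset of the <Event/> elements mentioned in this thread, and LogMessage stands for whatever free-form text field the standard ends up allowing.

```python
# Assumed subset of the required <Event/> fields discussed in this thread.
REQUIRED_FIELDS = {"id", "p_app", "p_proc", "p_sys", "pri", "time"}
# Assumed name of the optional free-form text field.
FREE_FORM_FIELD = "LogMessage"


def conformance_level(event):
    """Grade an event dict by how much structure it actually carries."""
    names = set(event)
    if not REQUIRED_FIELDS <= names:
        return "nonconformant"  # missing at least one required field
    extra = names - REQUIRED_FIELDS - {FREE_FORM_FIELD}
    if FREE_FORM_FIELD in names and not extra:
        return "legacy"         # all detail is buried in the free-form text
    return "structured"         # detail is expressed as named fields
```

A vendor whose events all grade as "legacy" would still interoperate, but a certification program could require a minimum share of "structured" events, or a later version could stop accepting "legacy" altogether.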

Does anyone have any suggestions?

William Heinbockel
The MITRE Corporation


>-----Original Message-----
>From: Keith Robertson [mailto:[hidden email]]
>Sent: Thursday, 26 January, 2012 12:56
>To: Heinbockel, Bill
>Cc: cee-discussion-list CEE-Related Discussion
>Subject: Re: [CEE-DISCUSSION-LIST] Structured vs. Free-form Event Messages
>
>Bill,
>
>Glad that you brought our discussion to the list, but I am still not
>convinced that the base <Event/> shouldn't allow for some sort of
>free-form message string.  If we make all of the basic elements of the
><Event/> (e.g. id, p_app, p_proc, etc.) required and non-optional then I
>see no degradation in the standard in allowing for an optional
>element(s) consisting of a free-form message.  Specifically, I am
>proposing something like [1].
>
>Again, I would like to point you to the TMS [2] standard from IBM.  The
>TMS standard is a structured logging format that has been in use for
>over 10 years by many IBM products.  The TMS standard strikes a good
>balance between structured events and unstructured
>programmer data.  I'm not suggesting that the IBM model is perfect or
>that it would do everything CEE is intended to do but I am suggesting
>that it should be looked at to see how they successfully solved a very
>similar set of problems.  I will briefly run through the standard at the
>bottom of this email for the benefit of the list.
>
>Cheers,
>Keith
>
>
>-Overview of the TMS Standard:
>  -- The TMS standard has been in use by IBM for over 10 years.
>  -- It supports both plaintext and XML output.
>  -- It is a structured logging format with required fields that are
>used as meta-data to describe the event so that it can be easily
>consumed by automation.
>  -- It includes free-form element(s) (i.e. <LogText/> and <Param/>)
>that allows for a developer message and a structured list of arguments.
>  -- It can be easily consumed by automation.  In fact, IBM has various
>sets of tools designed to consume and diagnose the logs and they work
>very well.
>  -- It supports i18n and can be easily read and understood without the
>aid of tooling.
>
>-How a Developer Would Actually use the Standard:
>A developer employing the TMS standard in their project would find that
>it is quite similar to the popular log4j/log4c frameworks; hence, it is
>easy to use.  It is important to note that IBM was farsighted enough to
>build i18n support into TMS; hence, the tooling supports key look-ups in
>globalized resource bundles.
>
>   // Resource bundle containing the textual message.
>   1354a41e=Could not connect to the server {0}, on port {1}
>
>   // Example call in some bit of Java code
>   logger.msg(Level.ERROR, ... , "1354a41e", "HPDCO1054E", ..., "acld2",
>"7137", ... ); // See [3] for output from this.
>
>- What the Output Looks Like:
>As I said earlier, TMS output can either be plaintext or XML though it
>is usually XML.  If you open a TMS XML log file you will find sequenced
>XML message blocks like those in [3].  A user looking at the XML without
>the aid of a tool would generally gravitate to the "Id" and the
>"LogText" elements whereas automation would generally consume everything.
>
>
>//-----
>
>[1]
><xs:element block="#all" final="#all" name="Event">
><xs:complexType>
><xs:sequence>
><xs:element minOccurs="1" ref="crit" maxOccurs="1"/> <!-- These should be *required* -->
><xs:element minOccurs="1" ref="id" maxOccurs="1"/>
><xs:element minOccurs="1" ref="p_app" maxOccurs="1"/>
><xs:element ref="p_proc" maxOccurs="1" minOccurs="1"/>
><xs:element minOccurs="1" ref="p_proc_id" maxOccurs="1"/>
><xs:element ref="p_sys" maxOccurs="1" minOccurs="1"/>
><xs:element minOccurs="1" ref="pri" maxOccurs="1"/>
><xs:element ref="time" maxOccurs="1" minOccurs="1"/>
><xs:element minOccurs="1" name="Type" maxOccurs="1">
><xs:element minOccurs="0" name="LogMessage" maxOccurs="1"
>type="xs:string"/> <!-- New and optional.  Maybe even a bounded string. -->
>       [snip rest of event remains the same]
>
>[2]
>http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/index.jsp?topic=%2Fcom.ibm.itame3.doc_5.1%2Fam51_pdg54.htm
>
>
>[3]
><Message Id="HPDCO1054E" Severity="ERROR">
><Time Millis="1067220550984">2003-10-26-20:09:10.984</Time>
><Component>ivc/socket</Component>
><LogAttribs><KeyName><![CDATA[Message Number]]></KeyName>
><Value><![CDATA[0x1354A41E]]></Value></LogAttribs>
><Source FileName="e:\am510\src\mts\mtsclient.cpp" Method="unknown"
>Line="1832"></Source>
><Process>pdmgrd</Process>
><Thread>0x000001c4</Thread>
><TranslationInfo Type="XPG4" Catalog="pdbivc.cat" SetId="1"
>MsgKey="1354a41e">
><Param><![CDATA[acld2]]></Param>
><Param><![CDATA[7137]]></Param>
></TranslationInfo>
><LogText><![CDATA[HPDCO1054E   Could not connect to the server acld2, on
>port 7137. ]]></LogText>
></Message>
>
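The resource-bundle lookup described in the quoted message (a message key plus ordered parameters) is also the mechanism behind the claim that structured data can always be turned back into readable text. The Python sketch below is an illustration only, not IBM's implementation: the bundle entry is the one quoted above, while the render function and the dict layout of the event are assumptions.

```python
# Message catalog: key -> template, using the entry quoted in this thread.
BUNDLE = {
    "1354a41e": "Could not connect to the server {0}, on port {1}",
}


def render(msg_key, params, bundle=BUNDLE):
    """Look up the template for msg_key and substitute the ordered params."""
    return bundle[msg_key].format(*params)


# Hypothetical in-memory form of the structured event.
event = {"Id": "HPDCO1054E", "MsgKey": "1354a41e", "Params": ["acld2", "7137"]}
text = f'{event["Id"]} {render(event["MsgKey"], event["Params"])}'
```

Swapping BUNDLE for a catalog in another language changes the rendered text without touching the event, which is why the structured form is the safer thing to store.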



Re: Structured vs. Free-form Event Messages

John Calcote-2
Bill Heinbockel is right... and Bill Scherr is right. The problem is that
there are two distinct and separate points of view here - two ways of seeing
the problem space:

1. The old way: log files should be human readable. If they're not, they're
not worth having. Any decent unix system will have grep, awk, and sed, which
is all you need to get a report. A master administrator can answer any
question about the security of your system within 5 minutes.

2. The new way: an event should be structured using a universal schema so it
can be machine read and processed by standardized tools.

People in camp 1 don't want structured events. They're hard to read, and
they're not (ironically) suitable for using with tools such as grep, awk,
and sed. These people are not interested in large event aggregation,
processing and reporting systems. They generally administer a handful of
unix or unix-like systems and have a set of scripts that use simple
pervasive command line tools to process their log files.

People in camp 2 don't want free-form text log messages. They want something
their million-dollar SMI-S or SNMP management infrastructure can process for
them and automatically generate alerts when something alert-worthy happens,
or publish polished security reports with little or no human intervention.

As Bill H. said below, this doesn't mean that no fields of the structured
event can contain free-form text. It simply means that that particular field
will have to be processed by a human reader. Machines can use heuristics to
get *some* data out of these free-form text fields, but even heuristic
search criteria have to be programmed by the system administrator, and
they're a guess (by definition) at best.
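To make the "guess at best" point concrete, here is a small Python sketch of the kind of heuristic an administrator ends up writing. The pattern is hypothetical and targets the wording of the connection-failure message quoted elsewhere in this thread; it is an illustration of the approach, not a recommended parser.

```python
import re

# Heuristic tied to one specific wording of the message.
PATTERN = re.compile(
    r"Could not connect to the server (?P<server>\S+), on port (?P<port>\d+)"
)


def extract(text):
    """Return the fields recovered from free-form text, or None if the guess fails."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None
```

The pattern recovers server and port from the original wording, but returns None as soon as the vendor rephrases the message (for instance "Connection to acld2:7137 failed"), whereas a structured event would still carry the same two named fields.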

Let's just agree to disagree here. These two camps will never come together,
and no one's saying they have to; syslog isn't going anywhere. It'll be
around for as long as unix-like systems are in operation. On the other hand,
having a structured event system in place, next to syslog, costs nothing
(except a little cpu time).

John

> -----Original Message-----
> From: Heinbockel, Bill [mailto:[hidden email]]
> Sent: Thursday, January 26, 2012 4:12 PM
> To: [hidden email]
> Subject: Re: [CEE-DISCUSSION-LIST] Structured vs. Free-form Event
> Messages
>
> To clarify, I have no problem with permitting free-form text in event
> messages.
>
> What I do not want to have happen is the same thing that is happening with
> the RFC5424 Syslog. When you allow for free-form content, there will be
> much tendency to use the free-form INSTEAD OF the structured content.
>
> While I do not mind supporting the free-form, I believe there needs to be
> some motivation to encourage vendors to eventually migrate their logs to
> support structured content.
>
> Maybe there is a way that we can support the free-form text (primarily for
> legacy compatibility) and find some way to encourage vendors to use more
> structured content -- either coming as conformance requirements or
> removing the Text option from a later CEE version.
>
> Does anyone have any suggestions?
>
> William Heinbockel
> The MITRE Corporation
>
>

Re: Structured vs. Free-form Event Messages

Keith Robertson
In reply to this post by heinbockel
On 01/26/2012 06:12 PM, Heinbockel, Bill wrote:
> Maybe there is a way that we can support the free-form text (primarily for
> legacy compatibility) and find some way to encourage vendors to use more
> structured content -- either coming as conformance requirements or removing
> the Text option from a later CEE version.
>
> Does anyone have any suggestions?
I have some suggestions and will start another thread with a
demonstration using the XSD and XML.

Cheers,
Keith

Re: Structured vs. Free-form Event Messages

Burnes, James - NRCS, Fort Collins, CO
In reply to this post by heinbockel
I think the key to allowing free-form is to allow it within a field marked as such, for example a 'trace' field.  Expose it at the outermost layer of the event structure that makes sense.  This way a casual or non-library-assisted developer could code for it, and default logging method calls could wrap the free-form text in a properly structured serialization with very little overhead.  In other words, given a log method call such as:

log.INFO("unknown exception in module xyz.  made it to method xyzzy")

A CEE compliant log4j/log4net etc appender would simply include a 'trace' field with the log string as the value of that field.

No muss, no fuss.

But the event would still be considered CEE compliant and be able to be processed with the same stream tools.
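As a rough sketch of such an appender, the Python snippet below uses the standard logging module as a stand-in for log4j/log4net: a formatter that wraps whatever string the developer logged in a 'trace' field of a JSON event. The class name TraceFieldFormatter is hypothetical, and the mapping of logging attributes onto p_app, p_proc_id and pri is an assumption, not something the CEE drafts define.

```python
import json
import logging


class TraceFieldFormatter(logging.Formatter):
    """Wrap a plain log string in a structured event with a 'trace' field."""

    def format(self, record):
        event = {
            "time": record.created,        # epoch seconds when the record was made
            "p_app": record.name,          # logger name, used as the application
            "p_proc_id": record.process,   # process id recorded by logging
            "pri": record.levelname,       # e.g. "INFO"
            "trace": record.getMessage(),  # the developer's free-form text
        }
        return json.dumps(event, sort_keys=True)
```

Attached with handler.setFormatter(TraceFieldFormatter()), existing calls such as log.info("...") keep working unchanged while the output becomes parseable by the same stream tools.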

The other key is to get log library vendors to create log appenders and IDE vendors to create interactive CEE dictionary assist features in Visual Studio and Eclipse.  Developers are much more likely to add intelligent tags to the event if you make it easy.

Just my 5 cents.

Jim Burnes

________________________________________
From: Heinbockel, Bill [[hidden email]]
Sent: Thursday, January 26, 2012 4:12 PM
To: [hidden email]
Subject: Re: [CEE-DISCUSSION-LIST] Structured vs. Free-form Event Messages

To clarify, I have no problem with permitting free-form text in event
messages.

What I do not want to have happen is the same thing that is happening with
the RFC5424 Syslog. When you allow for free-form content, there will be much
tendency to use the free-form INSTEAD OF the structured content.

...

Re: Structured vs. Free-form Event Messages

Christopher Byrd
I agree with Bill - there is little difference between a log.INFO
method and RFC-5424 syslog. Please consider how infrequently
structured data is used in syslog messages versus dumping everything
into MSG. I'm concerned a log method that dumps into a trace field
would be abused and recreate the same mess we have now.
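For readers who have not looked at RFC 5424 closely, the Python sketch below builds the same event both ways: once with everything dumped into MSG and once using a STRUCTURED-DATA element. The line layout and the escaping of the quote, backslash and closing-bracket characters follow the RFC; the SD-ID "conn@32473" and the parameter names are made up for illustration (32473 is the enterprise number the RFC itself uses in examples), and the function names are hypothetical.

```python
def escape_param(value):
    """Escape the three characters RFC 5424 requires inside a PARAM-VALUE."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def sd_element(sd_id, params):
    """Build one SD-ELEMENT such as [id name="value" ...]."""
    body = "".join(f' {key}="{escape_param(str(val))}"' for key, val in params.items())
    return f"[{sd_id}{body}]"


def syslog_line(pri, timestamp, host, app, procid, msgid, sd="-", msg=""):
    """Assemble HEADER SP STRUCTURED-DATA [SP MSG]; '-' is the nil value."""
    header = f"<{pri}>1 {timestamp} {host} {app} {procid} {msgid} {sd}"
    return f"{header} {msg}" if msg else header


common = (11, "2012-01-27T11:11:00Z", "host1", "pdmgrd", "452", "HPDCO1054E")

# What most senders do today: no structured data, everything in MSG.
free_form = syslog_line(*common, msg="Could not connect to the server acld2, on port 7137")

# The same facts as named parameters that a consumer can read without guessing.
structured = syslog_line(*common, sd=sd_element("conn@32473", {"server": "acld2", "port": 7137}))
```

The structured variant takes only a few more lines to produce, which suggests the problem is habit and library defaults rather than difficulty.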

Christopher


Re: Structured vs. Free-form Event Messages

Eric Fitzgerald
In reply to this post by william.leroy

Here are the limits from the 0.6 version of the specification:

Maximum Event Record Size: 64 KB
Maximum Field Value Size: 32 KB
Maximum Number of Fields in an event: 255
Maximum Number of Values in a multi-valued field: 255
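A producer could check these limits before emitting an event. The Python sketch below is only an approximation under stated assumptions: it treats 1 KB as 1024 bytes, measures sizes on the UTF-8 JSON serialization (how the 0.6 draft measures size is not stated here), and represents a multi-valued field as a Python list. The function name limit_violations is hypothetical.

```python
import json

MAX_RECORD_BYTES = 64 * 1024       # assumed: 1 KB = 1024 bytes
MAX_FIELD_VALUE_BYTES = 32 * 1024
MAX_FIELDS = 255
MAX_VALUES_PER_FIELD = 255


def limit_violations(event):
    """Return a list of human-readable problems; an empty list means within limits."""
    problems = []
    if len(event) > MAX_FIELDS:
        problems.append(f"{len(event)} fields exceeds {MAX_FIELDS}")
    for name, value in event.items():
        values = value if isinstance(value, list) else [value]
        if len(values) > MAX_VALUES_PER_FIELD:
            problems.append(f"field {name!r} has {len(values)} values")
        for item in values:
            if len(str(item).encode("utf-8")) > MAX_FIELD_VALUE_BYTES:
                problems.append(f"field {name!r} has an oversized value")
    if len(json.dumps(event).encode("utf-8")) > MAX_RECORD_BYTES:
        problems.append("serialized record exceeds the maximum event size")
    return problems
```

The 32 KB cap on a single value also bounds how much free-form text an optional trace or LogMessage field could carry.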

 


 

 

 
