Re: Standardizing log contents (aka "taxonomy") for CEE


Lagadec Philippe



I agree with this.

I believe that we can carefully choose event fields and taxonomy so that events can be stored in XML, one-line text, or binary formats. Each format has its own advantages and drawbacks.

XML is an interesting syntax for people working on a new format, because everybody can easily understand the fields and quickly develop test implementations without having to create a new parser/encoder. Then, when it's time for operational use, it can always be converted to simple text or binary for better efficiency.


One useful thing is to keep the XML schema as straightforward as possible. For example each field should be easily extracted with a simple XPath expression such as "/event/action" or "/event/source/address".
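To illustrate the point about simple XPath expressions, here is a minimal sketch using Python's standard xml.etree.ElementTree. The sample event, its values, and the helper name extract are assumptions made for illustration; only the two paths come from the text above. ElementTree supports a limited XPath subset and resolves paths relative to the root element, so the helper strips the leading "/event/" before searching.

```python
import xml.etree.ElementTree as ET

# Hypothetical event document; the values are made up for illustration.
EVENT_XML = """<event>
  <action>login</action>
  <source>
    <address>192.0.2.10</address>
  </source>
</event>"""


def extract(xml_text, path):
    """Return the text of the element at an absolute path such as
    "/event/action", or None if no such element exists."""
    root = ET.fromstring(xml_text)
    prefix = "/" + root.tag + "/"
    if not path.startswith(prefix):
        raise ValueError("path must start with " + prefix)
    # ElementTree paths are relative to the root element.
    node = root.find(path[len(prefix):])
    return None if node is None else node.text


if __name__ == "__main__":
    print(extract(EVENT_XML, "/event/action"))
    print(extract(EVENT_XML, "/event/source/address"))
```

With a flat schema like this, each field maps to exactly one short path, which is what keeps conversion to one-line text or binary straightforward.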





From: Eric Fitzgerald [mailto:[hidden email]]
Sent: 01 October 2007 05:00
To: [hidden email]
Subject: Re: [CEE-DISCUSSION-LIST] Standardizing log contents (aka "taxonomy") for CEE


Hey Daniel,
I am not a huge XML fan- I think people often tend to leap to XML as a solution before they understand the problem- but I don't think it's inappropriate in this case and I will explain why.
I want to separate the concept of log storage from the concept of data exchange.
For data interchange between different systems- for example between a host and a log collector- XML was designed to solve this kind of problem.  By making the data carry its own description (called "infoset" in XMLspeak) and by using a well-understood format, XML minimizes the chance for error between the two systems and minimizes developer burden.
For log storage, XML is horrible.  First, since the log storage format is primarily consumed by the system generating the log, which completely understands the log format, storing an XML infoset (and XML formatting) is a huge waste of space.  Also, the primary file i/o operation on a log is append.  You can't append well-formed XML to well-formed XML and end up with well-formed XML.  QED
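The append problem can be shown in a few lines. This is a sketch using Python's standard library; the record content and the helper name is_well_formed are assumptions for illustration. A single event parses, but two events concatenated (which is what a plain file append would produce) no longer form one well-formed document, because an XML document may have only one root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical log record.
RECORD = "<event><action>login</action></event>"

if __name__ == "__main__":
    print(is_well_formed(RECORD))           # one record on its own
    print(is_well_formed(RECORD + RECORD))  # a record appended to a record
```

A writer that wants a valid XML log has to rewrite the closing root tag on every append, or treat the file as a sequence of fragments rather than one document.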
I don't really even want to discuss local log formats, because there is little likelihood Microsoft would adopt such a format in Windows.  However I would point out that if your log can be easily translated into XML, then XSL makes it easy to express in just about any format.  I think that if any non-XML format is proposed by the working group then my recommendation would be that the format be trivially and unambiguously convertible to XML- that is, that if the WG proposes something else, that we also demonstrate precisely how to convert to XML.
I disagree that XML parsing is expensive.  I would have agreed a couple of years ago, but since then I prototyped SEM using XML representation of events and, done properly, it's only marginally more expensive than parsing any other text format.  If you're doing high-volume work, then you need to use simple schemas and SAX parsing (which validates as a side effect).  DOM is probably not performant enough for most log-related purposes, and DOM with validation is definitely not.  NB There are a number of binary XML representations that make XML much more compact, and parse more quickly.
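As a rough sketch of the streaming approach, the following uses Python's standard xml.sax module to collect leaf fields without building a DOM tree. The class name FieldCollector, the function parse_event, and the sample event are assumptions for illustration, not anything from the SEM prototype. Note that in this library a plain SAX parse reports well-formedness errors as a side effect; checking the input against a schema would need a separate validating step.

```python
import xml.sax


class FieldCollector(xml.sax.ContentHandler):
    """Collects the text of leaf elements, keyed by their absolute path."""

    def __init__(self):
        super().__init__()
        self.path = []    # stack of currently open element names
        self.fields = {}  # e.g. {"/event/action": "login"}
        self._text = []   # character data seen since the last tag

    def startElement(self, name, attrs):
        self.path.append(name)
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        if value:
            self.fields["/" + "/".join(self.path)] = value
        self.path.pop()
        self._text = []


def parse_event(xml_bytes):
    """Parse one event in a single pass; raises SAXParseException
    if the input is not well-formed."""
    handler = FieldCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.fields


if __name__ == "__main__":
    sample = (b"<event><action>login</action>"
              b"<source><address>192.0.2.10</address></source></event>")
    print(parse_event(sample))
```

The handler keeps only a small stack and the current text buffer in memory, which is why this style suits high-volume input better than loading each document into a tree.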
Finally, as a practical matter, another format imposes the burden of writing parsers, and of figuring out how to validate that a piece of formatted data is well-formed.  IMO developers frequently make mistakes in this area.  XML makes this unambiguous and simple- the tools already exist and there is an unambiguous way to describe the data format, XSD.
From: Daniel Cid [[hidden email]]
Sent: Saturday, September 29, 2007 6:21 AM
To: Eric Fitzgerald
Cc: [hidden email]
Subject: Re: [CEE-DISCUSSION-LIST] Standardizing log contents (aka "taxonomy") for CEE

Hi Eric,

I disagree regarding XML. It is expensive to parse, wasteful of
resources (too much duplicated information) and hard to read. I can't
imagine a project switching from a one-line log, like syslog, to XML
(yes, I doubt that OpenSSH, Apache and others would do that).

In addition to that, most admins* will not like the idea that their
"cat/grep" foo will not work anymore with a multi-line log...

I like the way CEF works, with pipes as separators. We could use
something like that, or even tabs or another less-used character...
but let's keep it simple.
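
A minimal sketch of what a pipe-separated one-line format could look like in practice, written in Python. This is not the CEF specification: the field names, the sample line, and the backslash escape for a literal pipe are all assumptions chosen for illustration.

```python
# Hypothetical field order for a one-line event; not taken from any standard.
FIELD_NAMES = ["timestamp", "program", "action", "source_address"]


def split_fields(line, sep="|", esc="\\"):
    """Split a line on sep, treating esc followed by any character
    as that literal character (so an escaped sep stays in the field)."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == esc:
            current.append(next(chars, ""))
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_line(line):
    """Map the split values onto the assumed field names."""
    return dict(zip(FIELD_NAMES, split_fields(line.rstrip("\n"))))


if __name__ == "__main__":
    print(parse_line("2007-09-29T06:21:00|sshd|login|192.0.2.10"))
```

The escape handling is the part that is easy to get wrong by hand: a plain split on "|" would break any field that contains the separator.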

UNLESS we do something like tcpdump does for pcaps, where the logs
could even be in binary format, and we just run "log-dump" and get the
logs line by line... Hum... :)
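
The binary-log idea could be sketched as follows in Python. Everything here is hypothetical: the record layout (a 4-byte big-endian length followed by a UTF-8 payload) and the function names append_record and dump are assumptions for illustration, not an existing tool or format.

```python
import io
import struct

# Assumed record layout: 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct(">I")


def append_record(stream, message):
    """Append one log message to a binary stream."""
    payload = message.encode("utf-8")
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def dump(stream):
    """Yield the stored messages one per record, like a "log-dump" tool
    printing one line per event."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        (length,) = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        yield payload.decode("utf-8")


if __name__ == "__main__":
    log = io.BytesIO()
    append_record(log, "sshd: accepted login from 192.0.2.10")
    append_record(log, "sshd: session closed")
    log.seek(0)
    for line in dump(log):
        print(line)
```

Length-prefixed records keep append cheap, and the dump output can still be piped into grep.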

*Looking from a Unix point of view.


Daniel B. Cid
dcid ( at )

On 9/28/07, Eric Fitzgerald <[hidden email]> wrote:
> One of the 4 core items on the CEE to-do list was to define an interchange format.
> I am no fan of XML, but it seems we have an appropriate use case here- interchange of data between disparate systems.
> Don't forget that we have to make the solution attractive to developers.  "Yet another format == yet another parser" and the solution starts looking less desirable.
> Local storage is another matter; store in whatever format you want.  Just expose the data in a standard format, and I think XML is appropriate for that.
> On a separate note, taxonomy is representable as XML in an unambiguous way.  Not advocating, just observing.
> Eric
> -----Original Message-----
> From: Raffael Marty [mailto:[hidden email]]
> Sent: Friday, September 28, 2007 12:53 PM
> To: [hidden email]
> Subject: Re: [CEE-DISCUSSION-LIST] Standardizing log contents (aka "taxonomy") for CEE
> > Common XML schema for events and nothing else -> see IDMEF failed
> > standard.
> >
> What is this? XML? Don't get me started. And why just a schema for
> events? Who was saying that? We NEED a taxonomy. If anyone wants more
> reasons for having a taxonomy, have a look at the white paper or I
> will paste it here...
>    -raffy