
Fwd: Re: [CEE-DISCUSSION-LIST] Voting period: flat vs. Hierarchical field names

Peter Czanik
Hello,
Gergely Nagy, a colleague, asked me to forward his answer, as he is not on the list.
Bye,
CzP


-------- Original Message --------
Subject: Re: [CEE-DISCUSSION-LIST] Voting period: flat vs. Hierarchical field names
Date: Mon, 06 Aug 2012 12:23:22 +0200
From: Gergely Nagy [hidden email]
Organization: BalaBit IT Security Ltd
To: Peter Czanik [hidden email]


Hi!

While I'm not subscribed to cee-discussion, I do read the list. I do not
believe my vote would be counted, but having implemented JSON parsing
for syslog-ng, and another (as of now, unreleased) CEE consumer, I'd
like to voice my opinion, from an implementor's point of view.

This very same topic has been discussed on the lumberjack list in the
past[1], by the way.

 [1]: https://lists.fedorahosted.org/pipermail/lumberjack-developers/2012-May/000774.html

I don't want to repeat the same arguments again, so I'll make it short:
dotted notation on the wire is stupid. It comes with very little gain
(speed, if and only if, the producer already uses a flat dotted-notation
structure, which I don't think will be the case for most future apps),
but a huge amount of drawbacks.

Dotted notation is great at the API level, but *NOT* on the wire.

With that out of the way, and with my implementor hat on, my vote would be:

>> 2.       There is a conceptual hierarchy and we should REQUIRE people
>> use structured markup to express it. So hierarchical names are
>> allowed, and MUST be represented using structured JSON or XML.

I believe logs have a conceptual hierarchy (a flat structure is better
than no structure at all, but if we can go the whole way, why stop
halfway?), and they should be represented in the most appropriate and
straightforward form: hierarchically.

If we need to transform the conceptual hierarchy, we're doing something
wrong: we're doing work that would not need to be done.

To make my point clearer, let's have a look at the examples:

>> As an example, imagine the source IPv4 address. You could conceptually
>> represent this as a hierarchy, with the parent object being "source"
>> and the field name being "ipv4". You'd have further fields under the
>> "source" object, like "port". You could also have "ipv4" nested under
>> some other parent object, like "destination". For each of the
>> proposals above, you'd have objects like:
>>
>> 1.       {"source_ipv4": "192.168.1.1"}

JSON consumers can understand this, but we'd need extra code to
recognise that this belongs together with the other source_* stuff. In
practice, this is nothing else but dotted notation without the dot.

This can also be forwarded as-is to anything that understands JSON,
without it having to know about CEE too, but it won't be able to group
things together.
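
As a rough illustration of the extra code needed for option 1, here is a
minimal Python sketch. The helper name group_by_prefix and the source_port
field are made up for this example, and splitting on the first underscore
is an assumption that a real field dictionary would have to pin down:

```python
import json

def group_by_prefix(flat, sep="_"):
    """Rebuild groups from flat names by splitting on the first separator."""
    grouped = {}
    for key, value in flat.items():
        prefix, found, rest = key.partition(sep)
        if found:
            # "source_ipv4" ends up as grouped["source"]["ipv4"]
            grouped.setdefault(prefix, {})[rest] = value
        else:
            # No separator in the name: keep the field at the top level
            grouped[key] = value
    return grouped

# Hypothetical option 1 event; "source_port" is invented for the example
event = json.loads('{"source_ipv4": "192.168.1.1", "source_port": 514}')
print(group_by_prefix(event))
```

The consumer has to know the separator convention up front, which is
exactly the kind of format-specific knowledge a plain JSON tool does not
have.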

>> 2.       {"source": {"ipv4": "192.168.1.1"}}

JSON consumers can understand this, and things that belong together, are
together already - no extra work needs to be performed.

Things work out of the box.

Similarly to the first, this can be forwarded as-is, and anything that
understands structures, will be able to group things together, without
having to have the faintest clue about CEE.

>> 3.       {"source": {"ipv4": "192.168.1.1"}} OR {"source.ipv4":
>> "192.168.1.1"}

JSON consumers can understand both, but will have to know about CEE, and
need to do extra work to normalize the input.

It can't be forwarded as-is, only when the receiving end is known to
understand CEE. You're not going to be able to easily push this to
CouchDB or ElasticSearch, or anything with a RESTful API that accepts
plain old JSON.
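
A consumer that accepts both spellings would need a normalization pass
along these lines. This is only a sketch: normalize is a hypothetical
helper, it does not descend into lists, and it does not handle conflicts
such as a scalar "source" next to a "source.ipv4" key:

```python
import json

def normalize(obj):
    """Expand dotted keys so both accepted spellings end up nested."""
    if not isinstance(obj, dict):
        return obj
    result = {}
    for key, value in obj.items():
        # "source.ipv4" -> parents ["source"], leaf "ipv4"
        *parents, leaf = key.split(".")
        target = result
        for part in parents:
            target = target.setdefault(part, {})
        value = normalize(value)
        if isinstance(value, dict) and isinstance(target.get(leaf), dict):
            # The same parent was already created by another key: merge
            target[leaf].update(value)
        else:
            target[leaf] = value
    return result

nested = json.loads('{"source": {"ipv4": "192.168.1.1"}}')
dotted = json.loads('{"source.ipv4": "192.168.1.1"}')
print(normalize(nested) == normalize(dotted))
```

Every hop between the producer and a generic JSON store would have to
run something like this first.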

>> 4.       {"source.ipv4": "192.168.1.1"}

This will confuse the hell out of everything in existence, unless
they're CEE-aware. If you require CEE-awareness, and break JSON
compatibility, it will never be widely adopted. People are lazy, and
like to reuse components.

Any application that accepts JSON, and allows one to manipulate it via
dotted notation, will break. Extra care needs to be taken in this case to
address the field as input["source.ipv4"], instead of the usual
input.source.ipv4 dotted-notation.

And no, input.source.ipv4 cannot refer to "source.ipv4", that breaks
expectations, and is not how JSON dotted notation works.
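
The difference can be sketched in Python, where get_path is a
hypothetical stand-in for the dotted-path lookup that many JSON-handling
tools offer:

```python
import json

def get_path(obj, path):
    """Resolve a dotted path the usual way: one nesting level per dot."""
    for part in path.split("."):
        obj = obj[part]
    return obj

nested = json.loads('{"source": {"ipv4": "192.168.1.1"}}')
dotted = json.loads('{"source.ipv4": "192.168.1.1"}')

# Option 2: the dotted path walks the structure as expected
print(get_path(nested, "source.ipv4"))

# Option 4: the value is only reachable under the literal key
print(dotted["source.ipv4"])

# Option 4 with a path lookup: there is no "source" object to descend into
try:
    get_path(dotted, "source.ipv4")
except KeyError as err:
    print("path lookup failed, missing key:", err)
```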

Summary
=======

Option two would immediately work with existing software, whether they
directly support CEE or not. None of the other options would be able to
achieve the same.

As someone who'd want to create applications that produce CEE output, my
primary concern would be that I need consumers that can deal with
it. There are tons that work with JSON, so if I can leverage that,
awesome. Option two gives me that, none of the others do.

As someone who wrote both consumers and producers, I prefer to represent
hierarchical structures in their native form. Option two grants me that
(so does 3, but then I'd have to support *two* formats - no thanks).

-- 
|8]




Re: Fwd: Re: [CEE-DISCUSSION-LIST] Voting period: flat vs. Hierarchical field names

Evan Rempel
I agree with Peter Czanik but want to add some more details.

Just because you have a great design does not mean that it cannot be used incorrectly.
While I do think that there is a hierarchy and that we should REQUIRE people
to use structured markup to express it, the CEE specification of the naming dictionary can result
in a very difficult breakdown.

Using the source IP address again as an example:

{ "source": {"ipv4": "192.168.1.1"} }

does not provide an easy way to use this as an address without already knowing that it is an IPv4 address.

I would much rather see:

{ "source": {"address": "192.168.1.1", "addressclass": "ipv4"} }

so that anything that uses this stream can get the source address without needing to check whether there is an ipv4, an ipv6, or any other
address that might come up.

In the context of how this information is used, it isn't until the address needs to be manipulated that you need to know what kind of address it is.
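
A small sketch of the consumer side under both layouts, in Python. The
function names and the tuple of known types are assumptions for
illustration; the "address" and "addressclass" fields are the ones
proposed above:

```python
import json

def source_address(event):
    """Generic layout: read the address without inspecting its class."""
    return event["source"]["address"]

def source_address_by_type(event, known_types=("ipv4", "ipv6")):
    """Per-type field names: every known type has to be probed in turn."""
    for name in known_types:
        if name in event["source"]:
            return event["source"][name]
    return None  # an address type this consumer has never heard of

generic = json.loads(
    '{"source": {"address": "192.168.1.1", "addressclass": "ipv4"}}'
)
typed = json.loads('{"source": {"ipv4": "192.168.1.1"}}')

print(source_address(generic))
print(source_address_by_type(typed))
```

With the generic layout, a new address class only matters to code that
actually manipulates the address; with per-type names, every consumer's
list of known types has to be updated.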

When I am following up on a break and providing details, I never get asked for the IPv4 address, I just get asked for the IP address.
The same would be true for hardware addresses: they could be Ethernet, ATM, Fibre Channel, InfiniBand, etc. They are just different classes of
hardware addresses.

For the voting, I choose number 2, with the caveat that some serious thought needs to go into the design of the hierarchy.


Evan Rempel.
