1, 3, and 4 are fairly well understood; from an engineering perspective, the only requirement on storage is that your backend be able to consume events at least as fast as they are produced.
However, the usefulness of your logging system will depend heavily on how you make the information available for search, how fast you can search, and whether your solution scales.
My preference is to go with a high-speed, free-text / tag-oriented search system built on something like Lucene. Such systems build inverted indexes on nearly every field in your logs, and search results are returned via a scoring system. What's more, systems built on Lucene-like technology (such as ElasticSearch) are not encumbered by some of the more significant limitations of SQL storage engines. Vast amounts of log information can be consumed and indexed by sharding with ElasticSearch. Commercial systems like Loggly depend on these kinds of technologies.
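To make the idea concrete, here is a toy sketch of an inverted index in Python (my own illustration, not how Lucene actually stores data; real engines add analyzers, scoring, and sharding on top of this):

```python
from collections import defaultdict

# Toy inverted index: maps each term to the IDs of the log events
# containing it. Real Lucene/ElasticSearch indexes add tokenization,
# relevance scoring, and sharding on top of this basic structure.
index = defaultdict(set)

def ingest(event_id, message):
    for term in message.lower().split():
        index[term].add(event_id)

def search(*terms):
    """Return IDs of events containing all of the given terms."""
    sets = [index[t.lower()] for t in terms]
    return set.intersection(*sets) if sets else set()

ingest(1, "sshd failed login from 10.0.0.5")
ingest(2, "sshd accepted login from 10.0.0.7")
ingest(3, "kernel out of memory")

print(search("sshd", "login"))    # matches both sshd events
print(search("failed", "login"))  # matches only the first event
```

Because lookups go term-to-documents rather than scanning every record, queries stay fast even as the event volume grows.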
I don't know if this is the question you're asking, but that's my two cents. In any case I believe the security, reliability and performance needs of many institutions are fairly diverse so there's probably not a lot of value in specifying the log implementation -- just the way that log systems communicate.
BTW: Structured log standards that specify taxonomy, dictionary and serialization (like CEE) make the events consumed by Lucene-like systems much more useful.
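For illustration, a structured event might serialize like this (the field names here are my own loose sketch of CEE-style conventions, not an exact CEE profile):

```python
import json

# Illustrative only: field names loosely follow CEE-style taxonomy/dictionary
# conventions; an actual CEE profile defines its own required fields.
event = {
    "time": "2011-12-15T07:48:00Z",
    "host": "web01.example.com",
    "app": "sshd",
    "action": "login",
    "status": "failure",
    "src_ip": "10.0.0.5",
}
print(json.dumps(event))
```

Because every field is named and typed, a Lucene-like indexer can index each field individually instead of guessing at free-text message formats.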
Application Security Engineer
From: Heinbockel, Bill [mailto:[hidden email]]
Sent: Thursday, December 15, 2011 7:48 AM
To: [hidden email]
Subject: [CEE-DISCUSSION-LIST] Secure Log Storage
Recently, there have been several discussions concerning secure log storage.
I would say that "secure log storage" systems have been available on the
market for years now, and I'm not sure that creating a common, secure
logging format is the right choice to make.
My understanding is that log storage is an implementation detail
compared to structuring information in log messages.
It doesn't matter if a message is stored in a Hadoop cluster, SQL, MongoDB,
or some kind of encrypted log file format, as long as the information
there can be extracted, searched, and analyzed. From my experience,
people use different technologies for this, and I'm not really sure
whether any of those technologies is superior to the others.
For some, text-based searches are important; others are satisfied with a
rigid structure. One site handles 1,000 msg/sec, another 100,000 msg/sec. It is a
matter of tradeoffs.
I'm not sure CEE wants to delve into this territory; I feel it is much more
important to finally give some structure to messages. That allows a lot
more processing than is available today, regardless of how messages are
stored in the end.
Returning to the main point of the question itself, certainly, I do feel
that secure log storage & access control is important, that being the
driving vision behind BalaBit's syslog-ng Store Box products and a
number of features in syslog-ng Premium Edition.
Our solution to store logs in a secure manner is proprietary but the
format itself is not secret as we've published that to a number of
customers. We would even be willing to publish the format in the future.
I think our implementation is a good approach (though I'm biased :) for
secure log storage. It is a format that:
- is stored in a simple file (one file per logstore), semantically
equivalent to a "text" logfile
- has a sequence ID for each log record within the file and makes
it possible to "seek" to any given ID in O(log(N)) time.
- has a globally unique id for each log record to make it possible
to use data in a distributed environment
- it supports an indexer that implements free-text searches on the log
- separates the "framing" from the "content", e.g. makes it possible
to store a plain text log record but also a binary representation that
keeps the message structure intact (in our terminology, a "serialized" format)
- it supports compression
- it optionally encrypts the log file with a random key that gets
encrypted with an X.509 certificate (more specifically, its public key)
- it uses chained HMAC (in encrypted mode) or chained hashing (in
non-encrypted mode) for integrity
- it supports requesting a timestamp from a standard timestamp
authority (TSA) from time-to-time that authenticates file contents
- all algorithms are configurable, so it can meet FIPS or other crypto requirements
- it supports journaling that keeps the format in sane order even in
the presence of software errors (crashes).
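The chained-HMAC idea above can be sketched in a few lines (my own minimal illustration, not BalaBit's actual on-disk format): each record's MAC covers the record plus the previous MAC, so altering or removing any earlier record invalidates every later one.

```python
import hmac, hashlib

def chain_macs(records, key):
    """Chained HMAC sketch: each MAC covers the record plus the previous
    MAC, so tampering with any record breaks all subsequent MACs."""
    prev = b"\x00" * 32  # fixed initial chaining value
    macs = []
    for rec in records:
        prev = hmac.new(key, prev + rec, hashlib.sha256).digest()
        macs.append(prev)
    return macs

def verify(records, macs, key):
    return macs == chain_macs(records, key)

key = b"secret-key"
records = [b"login ok", b"config changed", b"logout"]
macs = chain_macs(records, key)
assert verify(records, macs, key)

# Tampering with an earlier record invalidates every later MAC:
tampered = [b"login ok", b"config CHANGED", b"logout"]
assert not verify(tampered, macs, key)
```

In non-encrypted mode the same chaining works with plain hashes instead of HMACs; the periodic TSA timestamp then anchors the chain to a trusted point in time.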
Of course, other storage systems, like MongoDB, are possible depending on
the desired security attributes.