I like to balance my time between dealing with the problems of the present and thinking about what we can do better in the future. This is of particular interest when it comes to the problem (or curse) of log files.
My view on this subject is pretty simple. See the image to the left …
I was thinking about how best to express what I think about log files. At first, a picture of a dinosaur came to mind, but that didn’t have quite the right technological twist. Then I remembered my Dad working on some old vacuum tube television sets. That worked well for me, but I figured there aren’t too many people around who know what a vacuum tube is.
It is a pretty good analogy though. They even made computers out of vacuum tubes, like the old ENIAC on the right (vintage 1946).
I think by now you can see where I’m going with this.
We build highly sophisticated business applications that generate billions of dollars of revenue. Entire corporate operations are completely dependent on these apps, yet we still use vacuum tubes (log files) to capture the current health state of the systems.
This has to change … it’s simple economics. A typical scenario involves hundreds of gigabytes of log file output scattered across dozens of servers. Anyone who has used a tool like Splunk knows how expensive it is to provide the horsepower to search for and parse out those few lines of information that really matter.
Currently, we are in the middle of a great revolution in computing … the shift to virtualization. As part of this, many organizations are revisiting how their applications are being monitored. They are attempting not only to upgrade their systems to new virtualized data centers, but also to rework the applications so that the health state can be monitored in a more modern fashion.
Writing to a log file is an extremely inefficient means of communicating information. Unfortunately, it is easy to do. App developers are often under tremendous pressure to get sophisticated application algorithms working and have little time to spare looking for ways to efficiently transmit monitoring information, so they take shortcuts … like writing out unstructured lines of text to a log file. They take a few important metrics, like the number of currently active users, and write them out as lines of text with the numbers formatted as strings. The monitoring process must then open that log file in a “live tail” mode, continually read the lines, and parse the strings of digits back into an internal binary format. Bad idea … billions of CPU cycles wasted.
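To make that round trip concrete, here is a minimal Java sketch of what both sides end up doing. The class name LogRoundTrip, the method names, and the "activeUsers=" line layout are all made up for illustration; a real application's log format will look different, but the shape of the work is the same.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names throughout; this only illustrates the text round trip.
class LogRoundTrip {
    // Assumed log line layout, not taken from any real application.
    private static final Pattern ACTIVE_USERS =
            Pattern.compile("activeUsers=(\\d+)");

    // Application side: a binary int is formatted into a line of text.
    static String formatLine(int activeUsers) {
        return "INFO activeUsers=" + activeUsers;
    }

    // Monitoring side: the line is matched against a regex and the digits
    // are parsed back into a binary int.
    static int parseActiveUsers(String line) {
        Matcher m = ACTIVE_USERS.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no activeUsers field in: " + line);
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        String line = formatLine(42);           // what the app writes
        int recovered = parseActiveUsers(line); // what the monitor must redo
        System.out.println(line);
        System.out.println(recovered);
    }
}
```

The number starts out as an int and ends up as an int. Everything in between (string formatting, disk I/O, tailing, regex matching, parsing) is overhead, repeated for every line on every server.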
Fortunately, many technologies exist for transmitting monitoring information directly from one process to another. For example, JMX (Java Management Extensions) has been around for a while, but a lot of people just don’t know how easy it is to use. I gave a presentation last year and published the PDF on our website; it shows how simple it is to use JMX to make monitoring data directly available to other applications. (View it at: JMX: Get The Most Out Of This Unsung Hero). I also suggested alternatives that, like JMX, use simple data structures. They all have the benefit of communicating information directly instead of as text that must be parsed.
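For a rough idea of how little code this takes, here is a minimal sketch of a standard MBean that exposes the active user count. It is not taken from the presentation; the names JmxSketch, SessionStats, and the "com.example.app" domain are assumptions chosen for illustration. The javax.management calls themselves are part of the standard JDK.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical example classes; only the javax.management API is real.
class JmxSketch {
    // Standard MBean convention: interface name = implementation name + "MBean".
    // The getter getActiveUsers() becomes a readable attribute "ActiveUsers".
    public interface SessionStatsMBean {
        int getActiveUsers();
    }

    public static class SessionStats implements SessionStatsMBean {
        private final AtomicInteger activeUsers = new AtomicInteger();

        void userLoggedIn() { activeUsers.incrementAndGet(); }

        @Override
        public int getActiveUsers() { return activeUsers.get(); }
    }

    // Registers the bean, then reads the attribute back through the MBean
    // server, the same way a monitoring client would (no text parsing).
    static int publishAndRead(int logins) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.example.app:type=SessionStats");
            SessionStats stats = new SessionStats();
            for (int i = 0; i < logins; i++) {
                stats.userLoggedIn();
            }
            server.registerMBean(stats, name);
            try {
                return (Integer) server.getAttribute(name, "ActiveUsers");
            } finally {
                server.unregisterMBean(name); // keep the sketch re-runnable
            }
        } catch (JMException e) {
            throw new IllegalStateException("JMX sketch failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead(3));
    }
}
```

In a long-running application you would register the bean once at startup and leave it registered; a tool such as JConsole can then attach to the process and read ActiveUsers as a typed integer. The value never passes through a log file.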
The goal of this, of course, is to move into a modern era, where efficiency and performance are important criteria in your design. In a world where Information Technology is often a competitive advantage, you want to reduce waste and streamline operations wherever possible, and in the process, reduce your overall cost of operation. Dinosaurs, vacuum tubes, log files … they are all pretty much the same to me.