Enterprise Logging for the Uninitiated
Logging, that great afterthought.
A lot of developers don’t think much beyond the problem at hand and may give little thought to what will happen in the live environment. If it works in the development environment then chances are it’ll be fine. Perhaps it’s ego that says our code is great and will never go wrong. But then it happens: there’s a bug in the live environment, you’ve got no logging, and the data in the database is all you have to work out what the problem was. From there it’s a combination of existing knowledge of the system, stepping through code, and trial and error that will find the source of the problem. Good luck, you might be at it for some time!
If only we had a logging strategy in place.
At its most basic level logging is simply putting text into a file to say what happened. For a simple application where you can easily get at the logs this may be a great solution.
At an enterprise level, logging can provide much more than a simple text file, but also needs much more thought. Below are several things you should be taking into account.
Restricted information
You can log as much or as little information as you want. You can log every method call along with all of the parameter values if you want to. So what happens if that includes plain text passwords, or bank details? Security is an important consideration here. When an issue arises, logs from a live system may get passed around with minimal protection. They could also be an easy target for an intruder in the system.
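One way to reduce that risk is to mask sensitive values before a message ever reaches the log. The sketch below is a minimal illustration rather than a complete solution: the class name LogRedactor and the list of key names (password, cardNumber, sortCode) are assumptions made for the example, and it only catches values written as key=value pairs.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any logging framework.
public static class LogRedactor
{
    // Assumed list of sensitive key names; extend it to suit your own system.
    private static readonly Regex SensitivePair = new Regex(
        @"\b(password|cardNumber|sortCode)=([^\s&]+)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Replaces the value of every sensitive key=value pair with a fixed mask,
    // keeping the key so the log still shows that the field was present.
    public static string Redact(string message)
    {
        return SensitivePair.Replace(message, m => m.Groups[1].Value + "=***");
    }
}
```

For example, `LogRedactor.Redact("user=bob password=hunter2")` returns `user=bob password=***`. Pattern matching like this is only a safety net; not passing the sensitive value to the logger in the first place is the more reliable fix.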
How much to log
Many logging frameworks have the concept of trace levels: a predefined set of levels at which you can output log information. Some log entries might be Critical, for example an OutOfMemoryException, while others might be Verbose, for example entering and exiting a method. The levels are ordered by severity, most severe first, for example:
- Critical
- Error
- Warning
- Information
- Verbose
In the configuration you can then define which trace level you wish to log. Logging Verbose information in a production environment wouldn’t normally be a good idea as it would slow things down; you might aim for Information or Warning instead. Whatever level you choose, the more severe levels above it will also be logged, so choose Warning and anything logged as Warning, Error or Critical will find its way into the logs.
An example of logging a message at the Information level looks like this (taken from Enterprise Library):
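    // Needs: using System.Diagnostics;
    //        using Microsoft.Practices.EnterpriseLibrary.Logging;
    // logWriter is assumed to be an already configured LogWriter instance.
    logWriter.Write(new LogEntry
    {
        Message = "logtest",
        Severity = TraceEventType.Information
    });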
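The threshold rule described above comes down to a single comparison. The following is a rough sketch using a hypothetical Severity enum and LevelFilter class (neither is part of Enterprise Library), with the levels declared most severe first so that a lower value means a more severe entry.

```csharp
// Hypothetical levels, declared most severe first (Critical = 0 ... Verbose = 4).
public enum Severity
{
    Critical,
    Error,
    Warning,
    Information,
    Verbose
}

// Hypothetical filter illustrating how a configured threshold is applied.
public static class LevelFilter
{
    // An entry is written when it is at least as severe as the threshold,
    // which here means its numeric value is less than or equal to it.
    public static bool ShouldLog(Severity entry, Severity threshold)
    {
        return entry <= threshold;
    }
}
```

With the threshold set to `Severity.Warning`, `LevelFilter.ShouldLog(Severity.Error, Severity.Warning)` is true and `LevelFilter.ShouldLog(Severity.Verbose, Severity.Warning)` is false. Real frameworks do this check for you based on configuration; the point is only to show why choosing Warning also lets Error and Critical through.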
Internationalisation
Is logging always going to be in your language? Typically it is, but it’s worth at least a fleeting thought.
How long to keep logs
If you’re doing lots of logging in a busy service then the logs can get very big very quickly. You need a strategy to deal with this. You might therefore have logs that are cleared after a week or even just a day. When a live issue is detected it’s important to have a process in place that ensures you get a copy of those logs before they get removed.
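A retention policy like this can be as simple as a scheduled job that removes old files. The sketch below assumes the logs are *.log files in a single directory and that a file's age is judged by its last write time; the LogRetention class and its method names are hypothetical. Anything needed for an ongoing investigation would have to be copied elsewhere before the job runs.

```csharp
using System;
using System.IO;

// Hypothetical clean-up helper; assumes one directory of *.log files.
public static class LogRetention
{
    // True when the file was last written more than maxAge before nowUtc.
    public static bool IsExpired(DateTime lastWriteUtc, DateTime nowUtc, TimeSpan maxAge)
    {
        return nowUtc - lastWriteUtc > maxAge;
    }

    // Deletes every expired *.log file in logDirectory and returns the number removed.
    public static int DeleteExpired(string logDirectory, TimeSpan maxAge)
    {
        int deleted = 0;
        DateTime nowUtc = DateTime.UtcNow;

        foreach (string path in Directory.GetFiles(logDirectory, "*.log"))
        {
            if (IsExpired(File.GetLastWriteTimeUtc(path), nowUtc, maxAge))
            {
                File.Delete(path);
                deleted++;
            }
        }

        return deleted;
    }
}
```

Calling `LogRetention.DeleteExpired(logDirectory, TimeSpan.FromDays(7))` once a day would keep roughly a week of logs. Many logging frameworks offer rolling or size-limited files that achieve the same result through configuration, which is usually preferable to a hand-rolled job.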
Usability
Can you find just the entries for a specific user? Can you easily identify the source of a recurring error? You should ensure there is enough information in your logs to answer the questions you’ll have when investigating issues, and also consider how you will get that information out of them.
Log file aggregation
If you've got multiple machines running a service and an API call from a user goes to machine 1, then their next call goes to machine 2, you may need a way of merging the two logs so you can see what the user did on the overall API. It may be that a manual process of merging files when it’s needed is enough, or each machine may pass its logs at given intervals to an aggregation service. Alternatively, the destination for logs may always be a central location.
It’s also important to make it easy to aggregate your logs. If they’re all in different formats then gluing them together is going to get complicated.
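When every machine writes entries in the same shape, merging becomes a sort by timestamp. The following is a minimal sketch under that assumption; the LogLine and LogMerger types are hypothetical, and it also assumes the machines' clocks are reasonably well synchronised, otherwise the merged order will be misleading.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical common entry format shared by every machine.
public sealed class LogLine
{
    public DateTime TimestampUtc { get; set; }
    public string Machine { get; set; }
    public string Message { get; set; }
}

// Hypothetical helper that combines per-machine logs into one timeline.
public static class LogMerger
{
    // Flattens all the logs and orders the entries by timestamp.
    // OrderBy is a stable sort, so entries with equal timestamps
    // keep the order in which the logs were passed in.
    public static List<LogLine> Merge(params IEnumerable<LogLine>[] perMachineLogs)
    {
        return perMachineLogs
            .SelectMany(log => log)
            .OrderBy(line => line.TimestampUtc)
            .ToList();
    }
}
```

Calling `LogMerger.Merge(machine1Log, machine2Log)` gives a single list in which a user's calls appear in the order they happened, regardless of which machine served them. Using UTC timestamps avoids surprises when machines sit in different time zones.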
Logging across internal services
Imagine we are using a microservice architecture. A single request can spawn requests to multiple internal services, so one call can require logging on multiple machines. It would be hugely useful if you could link the logs from all of the internal requests back to the original request. This requires attaching an ID to the initial request and ensuring that the same ID is used in logging from all of the services.
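Such an ID is often called a correlation ID. As a rough sketch of the idea, the code below reuses an ID that arrived with the request or creates one at the edge of the system, then prefixes it to every log message. The Correlation class, the X-Correlation-ID header name and the message layout are all assumptions for illustration; any names work as long as every service agrees on them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the header name is an assumed convention.
public static class Correlation
{
    public const string HeaderName = "X-Correlation-ID";

    // Returns the ID that came with the request, or creates a new one
    // and stores it in the headers so it is passed on to internal calls.
    public static string GetOrCreate(IDictionary<string, string> headers)
    {
        string id;
        if (!headers.TryGetValue(HeaderName, out id) || string.IsNullOrWhiteSpace(id))
        {
            id = Guid.NewGuid().ToString("N");
            headers[HeaderName] = id;
        }
        return id;
    }

    // Prefixes the ID so entries from different services can be matched up.
    public static string Format(string correlationId, string service, string message)
    {
        return string.Format("[{0}] {1}: {2}", correlationId, service, message);
    }
}
```

Each service would call `Correlation.GetOrCreate` on the incoming headers, copy the header onto any outgoing internal requests, and include the ID in every entry it writes. Searching the aggregated logs for that one ID then shows the whole journey of the original request.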
Audit Logging
Audit logging is the process of logging important events in the system. This could be when resources are created or updated, or even when restricted information is viewed. The logs are likely to contain information on when the event happened and who or what instigated it. Audit logs would typically be kept for the long term, potentially the life of the system.
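To make the "when" and "who or what" concrete, an audit entry can be modelled as a small immutable record. This is only a sketch: the AuditEntry type, its field names and its text layout are hypothetical, and a real system would more likely write these to a database or a dedicated audit store than to a text line.

```csharp
using System;
using System.Globalization;

// Hypothetical audit record capturing when something happened and who did it.
public sealed class AuditEntry
{
    public AuditEntry(DateTime occurredAtUtc, string actor, string action, string resource)
    {
        OccurredAtUtc = occurredAtUtc;
        Actor = actor;
        Action = action;
        Resource = resource;
    }

    public DateTime OccurredAtUtc { get; private set; }
    public string Actor { get; private set; }      // user or service that instigated the event
    public string Action { get; private set; }     // e.g. "created", "updated", "viewed"
    public string Resource { get; private set; }   // what the action was performed on

    // Uses the round-trip ("o") date format and the invariant culture so the
    // output is the same regardless of the server's regional settings.
    public override string ToString()
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0:o} actor={1} action={2} resource={3}",
            OccurredAtUtc, Actor, Action, Resource);
    }
}
```

Keeping the properties read-only after construction reflects the fact that an audit entry should not be altered once it has been recorded.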
Summary
Logging is important to get right and will save you many headaches when done properly, although there are many considerations when putting together your logging strategy. It can be just as useful while developing and debugging as it is in the live environment. In the live environment it has the potential to dramatically reduce the time it takes to understand the source of faults.
Logging should ideally be one of the first things you add to a new project of any real size. Adding it to an existing project is a time-consuming but worthwhile task; it just tends not to be a task many people volunteer for!
Subscribe to Gavin Johnson-Lynn