Logberry: Structured, Semantic Logging for Go

Monday, June 22, 2015

One of our recent primary contracts involved delivering software components to be compiled into a larger package from another team, hooked up to apps from a third team, and then deployed and field tested by a fourth team, all coming from separate corporations. Suffice to say, considerable time was spent after each field test run just trying to decipher what had broken or not performed optimally. To make that easier in future projects, we’ve developed and are now releasing as open source our library Logberry, providing a lightweight API for structured, semantic logging. The code currently supports Go programs, but the concepts are generally applicable.

Chaos

Logging is critical to complex diagnostic efforts. By the end of that contract of ours, most components were outputting copious amounts of information, and each team had tools to rip logs and reconstruct entire runs. Extensive metadata was also included, e.g., clearly identifying the specific build configuration of the binaries.

However, all of that logging followed the traditional approach of not having any formal scheme or structure. Specific event messages were largely added and extended in an ad hoc fashion as issues came up, and all employed idiosyncratic formatting. A fairly typical message in our components looked like the following, reporting from a query graph traversal:

    VERBOSE("  Collection %s [%d] [%s] hash " KB_HASH "\n",
            mc_kb_table_labels[dest->type],
            subgraphs,
            tj_buffer_getAsString(buff),
            dest->hash);

Pumped out to logcat, that would produce a message like the following:

V/RaptorKB(27353):   Collection group [1] [id, cne, cnety] hash d3d6c0eee55e3d65

There is actually quite a bit of useful information in there, but it’s a mess, just an ad hoc string of arbitrary presentation and data. Unfortunately, while specific logging interfaces and syntaxes might look more or less elegant in various languages and libraries, this basic paradigm of creating an arbitrary string message to be logged through a flat interface is common to most frameworks. It engenders a number of potential problems:

A strawberry, the Logberry logo.

Logberry

That traditional approach is ultimately descendant from a philosophy that logging is something to be done during debugging, an unfortunately necessary side activity, ideally to be compiled out once the program works perfectly. Logberry takes a more intentional approach to logging. Your program is always going to generate logs, and they’re critical to its use and maintenance. It is therefore worth treating logging more like a proper program activity, and in return improving its utility and effectiveness.

Data

At the lowest level, Logberry provides an easy interface to output structured data. A very simple log event might be generated by code such as:

    // Create some structured application data and log it
    var data = struct {
        DataLabel string
        DataInt    int
    }{"alpha", 9}

    logberry.Main.Info("Reporting some data", data)

From this Logberry’s default text output produces:

2015-06-21T16:51:10-04:00 small main         Reporting some data                            info  0:0  DataLabel="alpha" DataInt=9

The package’s included JSON output for this takes the form:

{"TaskID":0,"ParentID":0,"Component":"main","Event":"info","Message":"Reporting some data","Data":{"DataInt":9,"DataLabel":"alpha"},"Timestamp":"2015-06-21T16:54:05.030526771-04:00"}

In this example an existing struct has been thrown at the log, the public fields of which have been captured and published in structured form, ready for easy extraction. Alternatively, the given data could as easily be a map of key/value pairs (for which a built-in convenience type is included), a data array, or a simple literal. The values present in complex objects could in turn have their own structure, recursively captured by the logger.

Tasks

Extending this core mechanism, Logberry events are generated within task contexts. Stepping back momentarily, a traditional idiom in executing some function that might fail is to:

This has a number of problems. In a multi-threaded program it may not be clear if the task ever completed successfully. The error message may not be clearly linked to the initial notification, and in general both may not be unambiguously contextualized in the overall program thread. Having two separate log calls and consequent duplicative marshaling code can also easily result in outputting different data for each, making it easy to leave out important information.

Logberry is built around a different idiom:

A typical example:

    // Execute a task within that component, which may fail
    task := computerlog.Task("Compute numbers", &data)
    res, err := somecomputation()
    if err != nil {
      task.Error(err)
      return
    }
    task.Success(logberry.D{"Result": res})

Assuming success, this reports:

2015-06-21T16:54:52-04:00 small computer     Compute numbers success                     success  2:1  Result=7 DataLabel="alpha" DataInt=9

On failure it reports:

2015-06-21T17:35:15-04:00 small computer     Compute numbers failed                        error  2:1  Error={ Message="Could not compute" } DataLabel="alpha" DataInt=9

Note that in either success or failure, termination of the task has been logged. For long running or multi-step tasks the start is easily logged as well by chaining task.Begin() on its construction, which will also automatically mark the task to be timed and its duration reported. Either outcome reports all data identifying the task, without duplicative marshaling. Data specific to the outcome has also been added, i.e., the result or error, and these have been easily aggregated into the overall data reported with the event. In doing so, the code sequence intrinsically encourages a useful mental process of identifying all of the data going into the task, to feed it into the task construction, and then identifying relevant result data, to pass it to the termination functions.

A simple tag has also been ascribed to the events denoting their class, i.e., success and error. Logberry’s API generates a number of events in common classes capturing basic task lifecycle and other events, but programs may also generate events of their own classes, e.g.:

    computerlog.Event("request", "Received request", logberry.D{"User": "tjkopena"})

Which produces:

2015-06-21T18:14:35-04:00 small computer     Received request                            request  1:0  User="tjkopena"

Further, the context of each event has been recorded. Included in each log statement are program-wide unique identifiers for the task which produced the event as well as its parent task, denoted by the task:parent ID pairs. By instantiating a hierarchy of task objects, programs can easily capture a lightweight representation of the program’s structure over its components, functions, and individual calls. This is then readily reconstructed from the log even among interleaved events in a multi-threaded program.

Conclusion

Through these core mechanisms, Logberry provides an easy to use interface that results in well structured, semantically annotated log data. Combined with structured serialization, e.g., JSON output, those logs may be readily parsed and analyzed with simple software. In addition, Logberry also includes some basic tools for colorized output, an easy interface to attach custom output mechanisms or formats, and useful supporting utilities to capture build environment signatures, e.g., by scanning Git repositories.

The basic concepts and these features are discussed at length in documentation in the Github repository, and full API documentation is available on GoDoc. Please feel free to post issues or pull requests to the Github repository , or to raise any questions in the comments below. Thanks!

A strawberry, the Logberry logo.
Output from Logberry small example program.