Saturday, June 6, 2020

Expanding the Stock Severity Levels

Original Severity Levels

The original Pantheios C/C++ library defined eight severity levels (and their integral values and string forms), as follows:


Manifest constant             Integral value   String form (1)
PANTHEIOS_SEV_EMERGENCY       0                "Emergency"
PANTHEIOS_SEV_ALERT           1                "Alert"
PANTHEIOS_SEV_CRITICAL        2                "Critical"
PANTHEIOS_SEV_ERROR           3                "Error"
PANTHEIOS_SEV_WARNING         4                "Warning"
PANTHEIOS_SEV_NOTICE          5                "Notice"
PANTHEIOS_SEV_INFORMATIONAL   6                "Informational"
PANTHEIOS_SEV_DEBUG           7                "Debug"

(1) As obtained by pantheios_getStockSeverityString()

Anyone who has used Syslog will recognise these level names and values, as they match the Syslog severities exactly:


Syslog severity   Integral value
Emergency         0
Alert             1
Critical          2
Error             3
Warning           4
Notice            5
Informational     6
Debug             7

Experiences and Evolution

However, over a decade and a half of using Pantheios (and its other language bindings: .NET, Ruby, etc.), experience and further theoretical work have raised a number of issues, as discussed throughout the rest of this post.

"Error" -> "Failure"

As discussed in a 2013 instalment of Quality Matters in ACCU's Overload, the term error is fraught with ambiguity: almost the whole industry uses the term incorrectly to mean a failure - "throw an error" - whereas an error is something that a human does in designing or building a system.

So, we've started to use the term failure instead. Newer Pantheios projects, such as Pantheios.Ruby, already make use of this term, and the extant projects will evolve in that direction over the coming months.

"Emergency" -> "Violation"

Just as the term error is ambiguous, so is the term bug. The issues around both terms are dealt with more formally in Contract Programming / Design by Contract, which is beyond the scope of this post, but you can read up on it in Meyer's Object-Oriented Software Construction or Wilson's Imperfect C++.

Contract Programming makes a formal distinction between a failure of a system and a violation of a software contract. When a contract violation occurs, the program has behaved in contradiction to its design so, by definition, it cannot "correct" itself. The only sensible course of action in this case is to (attempt to) terminate the process, yielding control to the operating environment in the hope that the process <-> operating system abstraction can handle the situation.

Over the lifetime of Pantheios we have taken to using the Emergency severity level exclusively for contract violations, and are starting to use the name Violation instead. Again, newer projects already have this, and we're evolving others.

Later in this post we will discuss the meanings we ascribe to the severity levels.

More debug levels

Extensive use in recent years of the Pantheios C/C++ library in sophisticated commercial C++ codebases, with precise runtime controls over enabling/disabling severity levels, has taught us that having a single debug level is insufficient. For example, when handling network traffic it is useful to be able to control the logging of the number of bytes received on a channel separately from the logging of all of those bytes.

As such, we've tended to have a number of debugging levels: Debug0, Debug1, and so on, where each level involves deeper call levels and/or more extensive logged information. In the commercial projects we've provided custom severities with such additional levels, but plan to evolve the main Pantheios project along these lines soon.
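The network-traffic example can be sketched in a few lines of C++. This is only an illustration of per-level runtime control, not Pantheios code: LevelFilter, log_received and the numeric level values are hypothetical names and placeholders chosen for the sketch.

```cpp
#include <bitset>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical level values, for this sketch only.
enum DebugLevel : std::size_t { Debug0 = 8, Debug1 = 9, Debug2 = 10, Debug3 = 11 };

// Hypothetical filter: one on/off switch per level, changeable at runtime.
class LevelFilter
{
public:
    void enable(std::size_t level, bool on = true) { enabled_.set(level, on); }
    bool is_enabled(std::size_t level) const { return enabled_.test(level); }

private:
    std::bitset<16> enabled_; // all levels start disabled
};

// Logs the byte count at Debug0 and the full payload at Debug3, so that
// each can be switched on or off independently of the other.
void log_received(LevelFilter const& filter,
                  std::vector<unsigned char> const& bytes,
                  std::ostream& out)
{
    if (filter.is_enabled(Debug0))
    {
        out << "[Debug-0] received " << bytes.size() << " byte(s)\n";
    }
    if (filter.is_enabled(Debug3))
    {
        out << "[Debug-3] payload:";
        for (unsigned char b : bytes)
        {
            out << ' ' << static_cast<int>(b);
        }
        out << '\n';
    }
}
```

With only Debug0 enabled, a call such as log_received(filter, {1, 2, 3}, out) writes the byte count and nothing else; enabling Debug3 as well adds the payload line.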

"Trace"

There is one (new) particular debugging level that stands out from the others: Trace. This is useful for tracing function/method entry (including arguments received) and exit. Because tracing can lead to heavy amounts of logging, we have found it useful to give it a separate debugging level.

Again, Pantheios.Ruby has this level already, and other projects will be getting it soon. The current tracing facilities provided by the Pantheios C/C++ library, defined in pantheios/trace.h, will be adjusted to use the new severity level.
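To make the idea of entry/exit tracing concrete, here is a minimal C++ sketch that uses a scope guard to emit one statement on entry (with arguments) and one on exit. It does not use the facilities in pantheios/trace.h: the ScopeTrace class, the add function and the output format are all hypothetical, and the output goes to a plain std::ostream rather than to a logging back-end.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical scope guard: writes an entry line when constructed and an
// exit line when destroyed, i.e. when the enclosing function returns.
class ScopeTrace
{
public:
    ScopeTrace(std::ostream& out, std::string function, std::string const& args)
        : out_(out)
        , function_(std::move(function))
    {
        out_ << "[Trace] > " << function_ << '(' << args << ")\n";
    }

    ~ScopeTrace()
    {
        out_ << "[Trace] < " << function_ << '\n';
    }

    ScopeTrace(ScopeTrace const&) = delete;
    ScopeTrace& operator=(ScopeTrace const&) = delete;

private:
    std::ostream& out_;
    std::string   function_;
};

// Example function that traces its own entry and exit.
int add(int a, int b, std::ostream& trace_out)
{
    ScopeTrace trace(trace_out, "add", std::to_string(a) + ", " + std::to_string(b));

    return a + b;
}
```

Every traced call produces at least two statements, which is why it helps to be able to switch this level on and off independently of the other debugging levels.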

"Benchmark"

Finally, there is one (new) level that has emerged, for the exclusive purpose of benchmarking.

Pantheios.X 202x levels

Hence, what we might call the 202x levels for all Pantheios projects have emerged, as follows:

Level name      String form       Former name     Integral value
Violation       "Violation"       Emergency       1
Alert           "Alert"           Alert           2
Critical        "Critical"        Critical        3
Failure         "Failure"         Error           4
Warning         "Warning"         Warning         5
Notice          "Notice"          Notice          6
Informational   "Informational"   Informational   7
Debug0          "Debug-0"         Debug           8
Debug1          "Debug-1"         -               9
Debug2          "Debug-2"         -               10
Debug3          "Debug-3"         -               11
Debug4          "Debug-4"         -               12
Debug5          "Debug-5"         -               13
Trace           "Trace"           -               14
Benchmark       "Benchmark"       -               15


These levels already exist within Pantheios.Ruby and will be included in the next version of the Pantheios C/C++ library (albeit with a compile-time option to use the prior levels) and in the next release of Pantheios.NET. (Each of these is likely to be within a month or two of this post.)
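As an illustration only, the table above could be expressed in C++ along the following lines. The Severity enum and the severity_string function are hypothetical names invented for this sketch (it assumes C++17); they are not the declarations that the Pantheios C/C++ library uses or will use.

```cpp
#include <string_view>

// Hypothetical enum mirroring the 202x table; values are 1-based.
enum class Severity : int
{
    Violation     = 1,
    Alert         = 2,
    Critical      = 3,
    Failure       = 4,
    Warning       = 5,
    Notice        = 6,
    Informational = 7,
    Debug0        = 8,
    Debug1        = 9,
    Debug2        = 10,
    Debug3        = 11,
    Debug4        = 12,
    Debug5        = 13,
    Trace         = 14,
    Benchmark     = 15,
};

// Returns the string form from the table, or an empty view for any
// unrecognised value (including 0).
constexpr std::string_view severity_string(Severity sev) noexcept
{
    switch (sev)
    {
    case Severity::Violation:     return "Violation";
    case Severity::Alert:         return "Alert";
    case Severity::Critical:      return "Critical";
    case Severity::Failure:       return "Failure";
    case Severity::Warning:       return "Warning";
    case Severity::Notice:        return "Notice";
    case Severity::Informational: return "Informational";
    case Severity::Debug0:        return "Debug-0";
    case Severity::Debug1:        return "Debug-1";
    case Severity::Debug2:        return "Debug-2";
    case Severity::Debug3:        return "Debug-3";
    case Severity::Debug4:        return "Debug-4";
    case Severity::Debug5:        return "Debug-5";
    case Severity::Trace:         return "Trace";
    case Severity::Benchmark:     return "Benchmark";
    }
    return {};
}

static_assert(severity_string(Severity::Debug3) == "Debug-3");
static_assert(severity_string(static_cast<Severity>(0)).empty());
```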

Value changes

As well as the addition of new levels, and the renaming of two extant levels, you may also note that the values of the levels corresponding to the original eight have changed. Specifically, they are no longer 0-based, but are now 1-based. The reasons for this are:

  • to help catch programming errors in some languages where a default-initialised level variable/parameter would be interpreted as Violation(/Emergency); and
  • to allow sophisticated front-ends to have an "all levels" value to facilitate group control. (We'll discuss this more fully in a future post.)

Note that all levels still fall within 4 bits, so the extended severity level information mechanism is still supported by the Pantheios C/C++ library. (See this post about extended severity information. Also, the next release will include C and C++ examples of how to use extended severity information.)
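Both points can be illustrated with a small C++ sketch. It assumes, purely for illustration, that the stock level occupies the low 4 bits of a 32-bit severity value and that any extended information sits in the bits above them. The helper names (is_stock_level, make_severity, stock_level, extended_info) are hypothetical and are not part of the Pantheios API.

```cpp
#include <cstdint>

// Assumption for this sketch: the low 4 bits hold the stock level.
constexpr std::uint32_t kStockMask = 0x0f;

// With 1-based levels, a default-initialised (zero) value is never a
// valid stock level, so it can be detected rather than silently being
// treated as the most severe level.
constexpr bool is_stock_level(std::uint32_t level) noexcept
{
    return level >= 1 && level <= 15;
}

// Combines a stock level with application-defined extended information
// held in the remaining upper bits (bits of 'extended' above 28 are lost).
constexpr std::uint32_t make_severity(std::uint32_t level, std::uint32_t extended) noexcept
{
    return (extended << 4) | (level & kStockMask);
}

// Recovers the stock level from a combined severity value.
constexpr std::uint32_t stock_level(std::uint32_t severity) noexcept
{
    return severity & kStockMask;
}

// Recovers the extended information from a combined severity value.
constexpr std::uint32_t extended_info(std::uint32_t severity) noexcept
{
    return severity >> 4;
}

static_assert(!is_stock_level(std::uint32_t{}), "zero is not a level");
static_assert(is_stock_level(15), "Benchmark fits within 4 bits");
static_assert(stock_level(make_severity(15, 0x123)) == 15, "level round-trips");
static_assert(extended_info(make_severity(15, 0x123)) == 0x123, "extended info round-trips");
```

Because the highest level, Benchmark (15), is the largest value that fits in 4 bits, the new levels use that range fully without spilling into the upper bits.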

Ascribing meaning to levels

A previous post on this blog discusses the meanings we tend to ascribe to the 8 Syslog-derived levels. Naturally, if we're expanding the stock levels we need to take care to update this picture. In summary, we can say that we interpret them as follows, and offer it as a guide to you:


Level           Meaning
Violation       Used exclusively in the attempt to record a contract violation
Alert           Used to report a practically-unrecoverable runtime condition, i.e. one that will require imminent process termination but is not a fault
Critical        Serious failure to achieve normative behaviour
Failure         Failure to achieve normative behaviour
Warning         Warning condition
Notice          Information to be logged in normal operation, meaning when there has been no need to increase the logging information above a normal condition
Informational   Logging information that is useful when actively monitoring the health of a system, but that is not necessarily displayed as part of "normal" operation
Debug-0         Highest conceptual level and/or most terse form of debugging logging
Debug-1         Next lower conceptual level and more verbose than Debug-0
Debug-2         Next lower conceptual level and more verbose than Debug-1
Debug-3         Next lower conceptual level and more verbose than Debug-2
Debug-4         Next lower conceptual level and more verbose than Debug-3
Debug-5         Lowest conceptual level and most verbose form of debugging logging
Trace           Used exclusively to record function/method entry/exit (along with, optionally, function arguments)
Benchmark       Used exclusively to emit terse statements - usually only a literal string, to cause least impact to system performance - for the express purpose of tracking system performance
