Logging: Advanced
Dagger's scheduler keeps track of the important and potentially expensive actions it performs, such as moving data between workers or executing thunks, and records how much time and memory these operations consume, among other things. It does this through the TimespanLogging.jl package (which used to be directly integrated into Dagger). Saving this information somewhere accessible is disabled by default, but it's easy to turn on, through two mechanisms.
The first is Dagger.enable_logging!, which provides an easy-to-use interface to both enable and configure logging. The defaults are usually sufficient for most users, but can be tweaked with keyword arguments.
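For example, here is a minimal sketch of the enable_logging! workflow with the default configuration; keyword options are omitted, and disable_logging! is assumed to be available as the counterpart to enable_logging! (consult the docstrings for the full set of options):
# Enable logging with the defaults, run some work, then collect the
# accumulated logs from the eager scheduler's context
Dagger.enable_logging!()

fetch(Dagger.@spawn sum(rand(100)))

logs = Dagger.fetch_logs!(Dagger.Sch.eager_context())

# Disable logging again once done
Dagger.disable_logging!()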
The second is done by setting a "log sink" in the Dagger Context being used, as ctx.log_sink. These log sinks drive how Dagger's logging behaves, and are configurable by the user, without the need to tweak any of Dagger's internal code.
A variety of log sinks are built into TimespanLogging; the NoOpLog is the default log sink when one isn't explicitly specified, and disables logging entirely (to minimize overhead). There are currently two other log sinks of interest. The first and newer of the two is the MultiEventLog, which generates multiple independent log streams, one per "consumer" (details in the next section). This is the log sink that enable_logging! uses, as it's easily the most flexible. The second and older sink is the LocalEventLog, which is explained later in this document. Most users should use the MultiEventLog (ideally via enable_logging!), since it's far more flexible and extensible, and generally more performant.
MultiEventLog
The MultiEventLog is intended to be configurable to exclude unnecessary information, and to include any built-in or user-defined metrics. It stores a set of "sub-log" streams internally, appending a single element to each of them when an event is generated. This element can be called a "sub-event" (to distinguish it from the higher-level "event" that Dagger creates), and is created by a "consumer". A consumer is a function or callable struct that, when called with the Event object generated by TimespanLogging, returns a sub-event characterizing whatever information the consumer represents. For example, the Dagger.Events.BytesAllocd consumer calculates the total bytes allocated and live at any given time within Dagger, and returns the current value when called. Let's construct one:
ctx = Dagger.Sch.eager_context()
ml = TimespanLogging.MultiEventLog()
# Add the BytesAllocd consumer to the log as `:bytes`
ml[:bytes] = Dagger.Events.BytesAllocd()
ctx.log_sink = ml
As we can see above, each consumer gets a unique name as a Symbol that identifies it. Now that the log sink is attached with a consumer, we can execute some Dagger tasks, and then collect the sub-events generated by BytesAllocd:
# These tasks allocate memory for their results
t1 = Dagger.@spawn 3*4
fetch(Dagger.@spawn 1+t1)
log = Dagger.fetch_logs!(ctx)[1] # Get the logs for worker 1
@show log[:bytes]
Note that TimespanLogging.get_logs! clears out the event logs, so that old events don't mix with new ones from future DAGs.
You'll then see that some number of bytes are allocated and then freed during the process of executing and completing those tasks.
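Because a consumer is simply a callable, you can also define your own. Here is a minimal sketch of a hypothetical consumer; the names are illustrative, and it assumes the TimespanLogging Event object exposes a category field:
# Hypothetical user-defined consumer: records the category of each event
struct CategoryConsumer end
(::CategoryConsumer)(ev) = ev.category

# Attach it to the log sink under its own name, alongside other consumers
ml[:categories] = CategoryConsumer()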
There are a variety of other consumers built into TimespanLogging and Dagger, under the TimespanLogging.Events and Dagger.Events modules, respectively; see Dagger Types and TimespanLogging Types for details. A few of them are sketched below.
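For instance, several built-in consumers can be attached side-by-side; the exact set available may vary across TimespanLogging and Dagger versions:
ml = TimespanLogging.MultiEventLog()
ml[:core] = TimespanLogging.Events.CoreMetrics()   # category, kind, and timestamp of each event
ml[:id] = TimespanLogging.Events.IDMetrics()       # IDs of the objects involved in each event
ml[:wsat] = Dagger.Events.WorkerSaturation()       # number of tasks executing on the worker
ml[:bytes] = Dagger.Events.BytesAllocd()           # bytes allocated and live within Dagger
ctx.log_sink = ml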
The MultiEventLog also has a mechanism to call a set of functions, called "aggregators", after all consumers have been executed; these aggregators are passed the full set of log streams as a Dict{Symbol,Vector{Any}}. The only aggregator currently shipped with TimespanLogging directly is the LogWindow, and DaggerWebDash.jl has the TableStorage, which integrates with it; see DaggerWebDash Types for details.
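As a rough sketch, a LogWindow aggregator might be attached as shown below; the constructor arguments (a window length in nanoseconds and the name of the core sub-log) are assumptions based on typical DaggerWebDash usage, so check the LogWindow docstring for your version:
# Keep a rolling window of recent events (here ~20 seconds), keyed off
# the :core sub-log
lw = TimespanLogging.Events.LogWindow(20*10^9, :core)
ml.aggregators[:lw] = lw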