Logging and Graphing
Dagger's scheduler keeps track of the important and potentially expensive actions it takes, such as moving data between workers or executing thunks, and records how much time and memory these operations consume, among other things. It does this through the TimespanLogging.jl package (which used to be integrated directly into Dagger). Saving this information somewhere accessible is disabled by default, but it's easy to enable by setting a "log sink" in the `Context` being used, as `ctx.log_sink`. A variety of log sinks are built into TimespanLogging; the `NoOpLog` is the default log sink when one isn't explicitly specified, and disables logging entirely (to minimize overhead). There are currently two other log sinks of interest. The first and newer of the two is the `MultiEventLog`, which generates multiple independent log streams, one per "consumer" (details in the next section). The second and older sink is the `LocalEventLog`, which is explained later in this document. Most users should use the `MultiEventLog`, since it's far more flexible and extensible, and generally more performant.
MultiEventLog
The `MultiEventLog` is intended to be configurable, able to exclude unnecessary information and to include any built-in or user-defined metrics. It stores a set of "sub-log" streams internally, appending a single element to each of them when an event is generated. This element can be called a "sub-event" (to distinguish it from the higher-level "event" that Dagger creates), and is created by a "consumer". A consumer is a function or callable struct that, when called with the `Event` object generated by TimespanLogging, returns a sub-event characterizing whatever information the consumer represents. For example, the `Dagger.Events.BytesAllocd` consumer calculates the total bytes allocated and live at any given time within Dagger, and returns the current value when called. Let's construct one:
ctx = Context()
ml = TimespanLogging.MultiEventLog()
# Add the BytesAllocd consumer to the log as `:bytes`
ml[:bytes] = Dagger.Events.BytesAllocd()
ctx.log_sink = ml
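Consumers aren't limited to the built-in types; since a consumer is just something callable with the `Event` object, a plain function works too. A minimal sketch (the `:walltime` name is just an example):
# Record the wall-clock time (in nanoseconds) at which each event is
# processed; the return value is appended to the :walltime sub-log
ml[:walltime] = _ev -> time_ns()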
As we can see above, each consumer gets a unique name as a `Symbol` that identifies it. Now that the log sink is attached with a consumer, we can execute some Dagger tasks, and then collect the sub-events generated by `BytesAllocd`:
# Using the lazy API, for explanatory purposes
collect(ctx, delayed(+)(1, delayed(*)(3, 4))) # Allocates 8 bytes
log = TimespanLogging.get_logs!(ctx)[1] # Get the logs for worker 1
@show log[:bytes]
You'll then see that 8 bytes are allocated and then freed during the process of executing and completing those tasks.
Note that the `MultiEventLog` also works perfectly well with Dagger's eager API:
ctx = Dagger.Sch.eager_context()
ctx.log_sink = ml
a = Dagger.@spawn 3*4
b = Dagger.@spawn 1+a
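Once the tasks finish, the logs can be gathered just as in the lazy example above:
fetch(b) # Wait for both tasks to complete
logs = TimespanLogging.get_logs!(ctx)[1]
@show logs[:bytes]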
There are a variety of other consumers built into TimespanLogging and Dagger, under the `TimespanLogging.Events` and `Dagger.Events` modules, respectively. From `TimespanLogging.Events`:
- `TimespanLogging.Events.CoreMetrics`
- `TimespanLogging.Events.IDMetrics`
- `TimespanLogging.Events.TimelineMetrics`
- `TimespanLogging.Events.FullMetrics`
- `TimespanLogging.Events.CPULoadAverages`
- `TimespanLogging.Events.MemoryFree`
- `TimespanLogging.Events.EventSaturation`
And from `Dagger.Events`:

- `Dagger.Events.BytesAllocd`: Tracks memory allocated for `Chunk`s.
- `Dagger.Events.ProcessorSaturation`: Tracks the compute saturation (running tasks) per-processor.
- `Dagger.Events.WorkerSaturation`: Tracks the compute saturation (running tasks).
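Several of these can be attached to the same `MultiEventLog` alongside `:bytes`; for example (a sketch, with the sub-log names chosen arbitrarily):
# Attach a few more consumers under distinct names
ml[:core] = TimespanLogging.Events.CoreMetrics()
ml[:id] = TimespanLogging.Events.IDMetrics()
ml[:timeline] = TimespanLogging.Events.TimelineMetrics()
ml[:wsat] = Dagger.Events.WorkerSaturation()
ml[:loadavg] = TimespanLogging.Events.CPULoadAverages()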
The `MultiEventLog` also has a mechanism to call a set of functions, called "aggregators", after all consumers have executed; each aggregator is passed the full set of log streams as a `Dict{Symbol,Vector{Any}}`. The only aggregator currently shipped with TimespanLogging itself is the `LogWindow`, and DaggerWebDash.jl provides the `TableStorage`, which integrates with it:
- `TimespanLogging.Events.LogWindow`
- `DaggerWebDash.TableStorage`
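As a sketch of attaching an aggregator (assuming the `LogWindow(window_length_ns, core_name)` constructor and the `aggregators` field, as used in the DaggerWebDash examples):
# Cull sub-events older than 20 seconds, using the :core sub-log's
# timestamps to define the window
lw = TimespanLogging.Events.LogWindow(20 * 10^9, :core)
ml.aggregators[:lw] = lw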
LocalEventLog
The `LocalEventLog` is generally only useful when you want combined events (event start and finish combined as a single unit), and only care about a few simple built-in generated events. Let's attach one to our context:
ctx = Context()
log = TimespanLogging.LocalEventLog()
ctx.log_sink = log
Now anytime `ctx` is used as the context for a scheduler, the scheduler will log events into `log`.
Once sufficient data has been accumulated into a `LocalEventLog`, it can be gathered to a single host via `TimespanLogging.get_logs!(log)`. The result is a `Vector` of `TimespanLogging.Timespan` objects, each describing some metadata about an operation that occurred and was logged by the scheduler. These events may be introspected directly, or rendered to a DOT-format string:
logs = TimespanLogging.get_logs!(log)
str = Dagger.show_plan(logs)
`Dagger.show_plan` can also be called as `Dagger.show_plan(io::IO, logs)` to write the graph to a file or other `IO` object. The string generated by this function may be passed to an external tool like Graphviz for rendering. Note that this method doesn't display input arguments to the DAG (non-`Thunk`s); you can call `Dagger.show_plan(logs, thunk)`, where `thunk` is the output `Thunk` of the DAG, to render argument nodes.
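For example, to write the graph to a file and render it with Graphviz (file names here are illustrative):
open("dag.dot", "w") do io
    Dagger.show_plan(io, logs) # Write the DOT-format graph
end
# Then render from a shell: dot -Tsvg dag.dot -o dag.svg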
Note that `TimespanLogging.get_logs!` clears out the event logs, so that old events don't mix with new ones from future DAGs.
As a convenience, it's possible to set `ctx.log_file` to the path of an output file, and then calls to `compute(ctx, ...)`/`collect(ctx, ...)` will automatically write the graph in DOT format to that path. This approach also has a benefit over manual calls to `get_logs!` and `show_plan`: DAGs which aren't `Thunk`s (such as operations on the `Dagger.DArray`) will be properly rendered with input arguments (which normally aren't rendered, because a `Thunk` is dynamically generated from such operations by Dagger before scheduling).
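A minimal sketch of this convenience (the output path is illustrative, and a `LocalEventLog` sink is assumed as above):
ctx = Context()
ctx.log_sink = TimespanLogging.LocalEventLog()
ctx.log_file = "dag.dot" # The graph is written here in DOT format
collect(ctx, delayed(+)(1, delayed(*)(3, 4)))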
FilterLog
The `FilterLog` exists to allow writing events to a user-defined location (such as a database, file, or network socket). It is not currently tested or documented.
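Use it with that caveat in mind; as a rough sketch of the idea (the constructor signature and the event's `category` field are assumptions, given the lack of documentation):
# Hypothetical: forward only :compute-category events to an inner sink
inner = TimespanLogging.MultiEventLog()
inner[:bytes] = Dagger.Events.BytesAllocd()
ctx.log_sink = TimespanLogging.FilterLog(ev -> ev.category == :compute, inner)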