Usage examples
The following examples illustrate the basic functionality of the package and how it can be used to speed up computationally demanding processing pipelines. Although toy problems are used, it should be straightforward to apply the concepts illustrated below to real-world applications. More subtle properties of the caching mechanism are exemplified in the unit tests of the package.
Basics
Let us begin by defining a simple computational task graph with three nodes:
julia> using Dispatcher, DispatcherCache
julia> # Some functions
julia> foo(x) = begin sleep(3); x end;
julia> bar(x) = begin sleep(3); x+1 end;
julia> baz(x,y) = begin sleep(2); x-y end;
julia> op1 = @op foo(1);
julia> op2 = @op bar(2);
julia> op3 = @op baz(op1, op2);
julia> G = DispatchGraph(op3)
DispatchGraph({3, 2} directed simple Int64 graph,NodeSet(DispatchNode[Op(DeferredFuture at (1,1,3),Main.ex-index.baz,"Main.ex-index.baz"),Op(DeferredFuture at (1,1,1),Main.ex-index.foo,"Main.ex-index.foo"),Op(DeferredFuture at (1,1,2),Main.ex-index.bar,"Main.ex-index.bar")]))
Once the dispatch graph G is defined, one can calculate the result for any of the nodes contained in it. For example, for the top (leaf) node op3:
julia> extract(r) = fetch(r[1].result.value); # gets directly the result value
julia> result = run!(AsyncExecutor(), G); # automatically runs op3
julia> println("result (normal run) = $(extract(result))")
result (normal run) = -2
The run! method provided by DispatcherCache caches all intermediary node outputs to a specified directory:
julia> cachedir = mktempdir() # cache temporary directory
"/tmp/tmpFUGI5j"
julia> @time result = run!(AsyncExecutor(), G, [op3], cachedir=cachedir);
9.792003 seconds (10.03 M allocations: 503.198 MiB, 2.94% gc time)
julia> println("result (caching run) = $(extract(result))")
result (caching run) = -2
The run! method with caching support requires the output nodes to be specified explicitly (the Dispatcher one directly executes the leaf nodes of the graph). Through this, one may choose to cache only a subgraph of the full dispatch graph.
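As a hedged illustration of caching only part of the graph, the sketch below rebuilds the same toy graph and requests op1 alone as the output node. It reuses only the calls shown on this page; the names partial_cachedir and partial are introduced here for illustration, and the exact contents of the cache directory are not asserted.

```julia
using Dispatcher, DispatcherCache

# Same toy functions and graph as in the examples on this page
foo(x) = begin sleep(3); x end
bar(x) = begin sleep(3); x + 1 end
baz(x, y) = begin sleep(2); x - y end

op1 = @op foo(1)
op2 = @op bar(2)
op3 = @op baz(op1, op2)
G = DispatchGraph(op3)

extract(r) = fetch(r[1].result.value)  # gets directly the result value

# Illustrative name: a separate temporary cache directory for this run
partial_cachedir = mktempdir()

# Only op1 is listed as an output node, so only the part of the
# graph that op1 depends on is requested
partial = run!(AsyncExecutor(), G, [op1], cachedir=partial_cachedir)
println("result (op1 only) = $(extract(partial))")
```

Listing several nodes in the vector, for example [op1, op2], would presumably request each of them as an output in the same way.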
After the first cached run, one can verify that the cache-related files exist on disk:
julia> readdir(cachedir)
2-element Array{String,1}:
"cache"
"hashchain.json"
julia> readdir(joinpath(cachedir, "cache"))
3-element Array{String,1}:
"86141b1a6a4dd4ab.bin"
"8b7bfeeac5ee1b8d.bin"
"d12ba889c23ef4c7.bin"
Running the computation a second time loads the cached result instead of recomputing it, which is noticeable through the shorter run time:
julia> @time result = run!(AsyncExecutor(), G, [op3], cachedir=cachedir);
1.218582 seconds (2.49 M allocations: 124.720 MiB, 4.44% gc time)
julia> println("result (cached run) = $(extract(result))")
result (cached run) = -2
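The speed-up comes from reading stored outputs from disk rather than re-running the functions. The following is a minimal sketch of that general idea using only the Julia standard library. It is not the implementation used by DispatcherCache (in particular, it ignores dependencies between nodes), and cached_call and slow_inc are hypothetical names introduced for this illustration.

```julia
using Serialization

# Hypothetical helper: run f(args...) once, then reuse the stored result
function cached_call(f, args...; cachedir)
    # Derive a file name from the function name and its arguments
    key = string(hash((nameof(f), args)), base=16)
    path = joinpath(cachedir, key * ".bin")
    if isfile(path)
        return deserialize(path)  # cache hit: skip the computation
    end
    result = f(args...)           # cache miss: compute the value ...
    serialize(path, result)       # ... and store it for later calls
    return result
end

# Stand-in for a long-running computation
slow_inc(x) = begin sleep(1); x + 1 end

dir = mktempdir()
t_first = @elapsed r1 = cached_call(slow_inc, 2; cachedir=dir)
t_second = @elapsed r2 = cached_call(slow_inc, 2; cachedir=dir)
println("first call: $r1, second call: $r2")
println("second call faster: $(t_second < t_first)")
```

The first call pays for the sleep and writes one file to the temporary directory; the second call finds that file and returns the stored value.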
The cache can be cleaned up by simply removing the cache directory.
julia> rm(cachedir, recursive=true, force=true)
If the cache no longer exists, a new call of run!(::Executor, G, [op3], cachedir=cachedir) will re-create the cache by running each node.
In the examples above, the functions foo, bar and baz use the sleep function to simulate longer running computations. This is useful both to illustrate the concept presented and to outweigh the pre-compilation overhead that occurs when calling the run! method.