Usage examples

The following examples illustrate the basic functionality of the package and how it can be used to speed up computationally demanding processing pipelines. Although toy problems are used, it should be straightforward to apply the concepts illustrated below to real-world applications. More subtle properties of the caching mechanism are exemplified in the unit tests of the package.

Basics

Let us begin by defining a simple computational task graph with three nodes:

julia> using Dispatcher, DispatcherCache

julia> # Some functions
       foo(x) = begin sleep(3); x end;

julia> bar(x) = begin sleep(3); x+1 end;

julia> baz(x,y) = begin sleep(2); x-y end;

julia> op1 = @op foo(1);

julia> op2 = @op bar(2);

julia> op3 = @op baz(op1, op2);

julia> G = DispatchGraph(op3)
DispatchGraph({3, 2} directed simple Int64 graph,NodeSet(DispatchNode[Op(DeferredFuture at (1,1,3),Main.ex-index.baz,"Main.ex-index.baz"),Op(DeferredFuture at (1,1,1),Main.ex-index.foo,"Main.ex-index.foo"),Op(DeferredFuture at (1,1,2),Main.ex-index.bar,"Main.ex-index.bar")]))

Once the dispatch graph G is defined, one can calculate the result for any of the nodes contained in it. For example, for the top or leaf node op3,

julia> extract(r) = fetch(r[1].result.value);  # extracts the result value directly

julia> result = run!(AsyncExecutor(), G);  # automatically runs op3

julia> println("result (normal run) = $(extract(result))")
result (normal run) = -2

The run! method provided by DispatcherCache caches all intermediate node outputs to a specified directory:

julia> cachedir = mktempdir()  # cache temporary directory
"/tmp/tmpFUGI5j"

julia> @time result = run!(AsyncExecutor(), G, [op3], cachedir=cachedir);
  9.792003 seconds (10.03 M allocations: 503.198 MiB, 2.94% gc time)

julia> println("result (caching run) = $(extract(result))")
result (caching run) = -2

Note

The run! method with caching support requires the output nodes to be specified explicitly, whereas the Dispatcher method directly executes the leaf nodes of the graph. This makes it possible to cache only a subgraph of the full dispatch graph.

After the first cached run, one can verify that the cache-related files exist on disk:

julia> readdir(cachedir)
2-element Array{String,1}:
 "cache"
 "hashchain.json"

julia> readdir(joinpath(cachedir, "cache"))
3-element Array{String,1}:
 "86141b1a6a4dd4ab.bin"
 "8b7bfeeac5ee1b8d.bin"
 "d12ba889c23ef4c7.bin"
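
A check like the one above can be wrapped in a small helper. The sketch below assumes the layout shown in the listing (a cache directory next to a hashchain.json file). The name has_cache is hypothetical and not a DispatcherCache function, and the layout is simulated in a temporary directory so that the snippet runs on its own.

```julia
# Hypothetical helper (not part of DispatcherCache): returns true if `dir`
# contains the layout listed above, i.e. a "cache" folder and "hashchain.json".
has_cache(dir) = isdir(joinpath(dir, "cache")) &&
                 isfile(joinpath(dir, "hashchain.json"))

demo_dir = mktempdir()                        # empty temporary directory
found_before = has_cache(demo_dir)            # nothing has been written yet

mkpath(joinpath(demo_dir, "cache"))           # simulate the cache folder
touch(joinpath(demo_dir, "hashchain.json"))   # simulate the hash chain file
found_after = has_cache(demo_dir)

println("before = $found_before, after = $found_after")
```

Such a helper could be called with the actual cachedir before deciding whether a run is expected to be fast or slow.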

Running the computation a second time loads the cached result instead of recomputing it, which is noticeable in the reduced run time.

julia> @time result = run!(AsyncExecutor(), G, [op3], cachedir=cachedir);
  1.218582 seconds (2.49 M allocations: 124.720 MiB, 4.44% gc time)

julia> println("result (cached run) = $(extract(result))")
result (cached run) = -2
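
The effect of the second run can be illustrated with a minimal sketch of disk-backed memoization that uses only the Serialization standard library. This is not how DispatcherCache is implemented: cached_call, slow_inc and the key scheme are hypothetical names chosen for illustration, and Base.hash is not guaranteed to be stable across Julia versions, so the keys should not be relied upon beyond this demonstration.

```julia
using Serialization  # standard library

# Hypothetical helper (not part of DispatcherCache): memoize f(args...) on disk.
function cached_call(f, args...; cachedir::AbstractString)
    # Derive the cache file name from the function name and its arguments.
    key = string(hash((nameof(f), args)), base=16)
    path = joinpath(cachedir, key * ".bin")
    if isfile(path)
        return deserialize(path)   # cache hit: load the stored result
    end
    result = f(args...)            # cache miss: compute the result ...
    serialize(path, result)        # ... and store it for later calls
    return result
end

slow_inc(x) = (sleep(1); x + 1)    # stands in for an expensive node

demo_cache = mktempdir()
t_first  = @elapsed cached_call(slow_inc, 2; cachedir=demo_cache)  # computes and stores
t_second = @elapsed cached_call(slow_inc, 2; cachedir=demo_cache)  # loads from disk
value = cached_call(slow_inc, 2; cachedir=demo_cache)

println("value = $value, first call: $t_first s, second call: $t_second s")
```

The first call pays for the sleep, while later calls with the same arguments only read the stored file. Calling cached_call with a different argument creates a separate cache file.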

The cache can be cleaned up by simply removing the cache directory.

julia> rm(cachedir, recursive=true, force=true)

If the cache no longer exists, a new call to run!(::Executor, G, [op3], cachedir=cachedir) will re-create the cache by running each node.

Note

In the examples above, the functions foo, bar and baz use the sleep function to simulate longer-running computations. This is useful both to illustrate the concept and to outweigh the compilation overhead that occurs when calling the run! method.
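
The compilation overhead mentioned in the note is a general property of Julia and can be observed without the package. The following sketch times the first and second call of a freshly defined function. The name square_sum is a hypothetical toy function, and the measured times depend on the machine and the Julia version.

```julia
# Hypothetical toy function; any freshly defined function shows the same effect.
square_sum(xs) = sum(x -> x^2, xs)

data = collect(1:1000)
t_cold = @elapsed square_sum(data)   # first call: includes compilation
t_warm = @elapsed square_sum(data)   # second call: reuses the compiled code
total = square_sum(data)

println("total = $total, cold call: $t_cold s, warm call: $t_warm s")
```

When the actual work is very short, the cold call is typically dominated by compilation, which is why the toy functions above sleep for a few seconds.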