
WARDEN Explained
Javier Sanchez Alvarez
June 26, 2025
Source: vignettes/articles/warden_explained.Rmd
Introduction
This document explains the logic behind WARDEN's main approach to running discrete event simulations, and briefly covers the rationale for certain design decisions.
In a Nutshell
WARDEN's main simulation engine is, at its core, nothing but a nested loop at different levels. However, for this to work the execution of the inputs provided by the user needs to be delayed, so the relevant inputs provided through add_tte, add_item/add_item2 and add_reactevt are substituted for delayed execution and stored as lists.
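To illustrate the general idea of delayed execution (this is not WARDEN's actual internals, just a minimal base-R sketch with made-up names), a user expression can be captured unevaluated and only evaluated later, once the inputs it depends on exist:

```r
# Not WARDEN internals: a minimal base-R sketch of delayed execution.
# The user expression is captured unevaluated and stored in a list, then
# evaluated later, once the inputs it refers to are available.
stored <- list(cost_drug = quote(unit_cost * n_doses))  # captured, not run

inputs <- list(unit_cost = 100, n_doses = 3)            # becomes available later
inputs$cost_drug <- eval(stored$cost_drug, inputs)      # evaluated at run time
inputs$cost_drug
#> [1] 300
```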
The nested loop proceeds as follows:

- Per Analysis (DSA, scenarios) "sens"
  - Load inputs sequentially. If it's an unnamed list, unlist it and assign it to the "sens" input list.
  - Per Simulation (PSA or deterministic) "simulation"
    - Load inputs sequentially. If it's an unnamed list, unlist it and store its components. The "sens" list is integrated into a new list with the "simulation" input list.
    - Per Patient "i"
      - Load inputs sequentially. If it's an unnamed list, unlist it and store its components. The "simulation" list is integrated into a new list with the "i" input list.
      - Per Arm "arm"
        - Load inputs sequentially. If it's an unnamed list, unlist it and store its components. The "i" list is integrated into a new list with the "arm" input list.
        - Load initial times to event. First look into the initial time to event expression declared by the user; if not found, look into the input list already declared; if not found, set it equal to Inf.
        - While curtime (simulation time) is < Inf:
          - Select the next event by checking the event with the minimum time to event; in case of ties, untie using the order declared in add_tte for initial times to event. If there are no events left, set curtime = Inf (end simulation).
          - Evaluate the reaction of the event by looking at the relevant expression from the list of event reactions.
    - Once the specific "simulation" is done, compute outputs vectorized (discount outcomes as relevant based on their type, aggregate data as relevant, obtain timed frequency outputs if needed, etc.)
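The following schematic sketch shows how these nested loops fit together. It is not the actual WARDEN source; all numbers and event times are trivial stand-ins for illustration:

```r
# Schematic sketch of the nested loops (not the actual WARDEN source);
# all inputs and event draws here are trivial stand-ins.
n_sens <- 1; n_sim <- 2; n_patients <- 3; arms <- c("control", "active")

for (sens in seq_len(n_sens)) {                 # per analysis
  for (simulation in seq_len(n_sim)) {          # per simulation
    for (i in seq_len(n_patients)) {            # per patient
      for (arm in arms) {                       # per arm
        # initial times to event: user expression, existing input, or Inf
        evt_list <- c(start = 0, progression = rexp(1, 0.2), death = rexp(1, 0.05))
        curtime <- 0
        while (curtime < Inf) {
          if (length(evt_list) == 0) {
            curtime <- Inf                      # no events left: end simulation
          } else {
            nxt <- which.min(evt_list)          # ties broken by declaration order
            curtime <- unname(evt_list[nxt])
            # ...evaluate the reaction declared for names(evt_list)[nxt]...
            evt_list <- evt_list[-nxt]          # this sketch simply discards fired events
          }
        }
      }
    }
    # once the simulation is done: discount, aggregate, timed frequencies, etc.
  }
}
```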
The debug mode will store in a log the relevant data that is loaded or changed by the event reactions, and the log will be exported when the simulation stops (also on error). WARDEN allows continuing on error (though this is not recommended).
WARDEN handles the random numbers automatically, setting the seeds differently at the simulation, patient and arm level. WARDEN makes sure that the starting seed is cloned for a patient across interventions. However, conditional statements can alter the random state of R if they conditionally trigger random expressions (e.g., if(arm==2){runif(1)}else{5}) that change per intervention. To keep the random numbers cloned as intended, it is recommended to pre-draw random numbers for each type of random object used and to use those (see the Example for a Sick-Sicker-Dead model - Random Number Streams & Luck Adjustment vignette for more information). WARDEN uses the L'Ecuyer-CMRG random number generator.
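The following base-R snippet (independent of WARDEN's API) illustrates why a conditional draw desynchronizes the random streams across arms, and how pre-drawing the number avoids it:

```r
# Not WARDEN code: a base-R illustration of why conditionally drawing
# random numbers breaks the cloning of the random state across arms.
draw_conditional <- function(arm) if (arm == 2) runif(1) else 5

set.seed(1); x1 <- draw_conditional(1); later1 <- rnorm(1)
set.seed(1); x2 <- draw_conditional(2); later2 <- rnorm(1)
later1 == later2   # FALSE: arm 2 consumed an extra uniform, desynchronizing the stream

# Pre-drawing the number once per patient keeps the streams aligned:
set.seed(1); u <- runif(1); y1 <- if (1 == 2) u else 5; later1 <- rnorm(1)
set.seed(1); u <- runif(1); y2 <- if (2 == 2) u else 5; later2 <- rnorm(1)
later1 == later2   # TRUE: both arms consume the same draws in the same order
```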
The way WARDEN processes events through, e.g., modify_item is by looking at what inputs are currently available from the relevant input list, evaluating the passed list (e.g., modify_item_seq(list(a = 1, b = a + 1))) and adding the resulting objects (a and b) to the parent environment as well as to the relevant input list for storage. With the latest update of WARDEN, objects not included in modify_item will also be stored in the relevant input list and will be available for evaluation, e.g.:
```r
add_reactevt(name_evt = "event_1", input = {
  z <- 2 # this will be stored in the main input list, and will be available in the next event reaction!
  modify_item_seq(list(a = 1, b = z + 1, z = z)) # this will work, because z is available
  c <- b + 5
})

# The above expression is equivalent to writing:
add_reactevt(name_evt = "event_1", input = {
  z <- 2
  a <- 1
  b <- z + 1
  modify_item(list(a = a, b = b, z = z))
  c <- b + 5 # b will be available
})

# As well as:
add_reactevt(name_evt = "event_1", input = {
  modify_item_seq(list(a = 1, z = 2, b = z + 1)) # modify_item_seq evaluates sequentially, so the value of z will be available for b
  c <- b + 5 # b will be available
})

# And:
add_reactevt(name_evt = "event_1", input = {
  a <- 1
  z <- 2
  b <- z + 1
  c <- b + 5 # b will be available
})

# But note that this will NOT work!
add_reactevt(name_evt = "event_1", input = {
  modify_item(list(a = 1, z = 2, b = z + 1)) # modify_item is faster but does not evaluate sequentially (all at once), so the value of z will NOT be available for b
})
```
Storing Inputs, Making it Faster
Multiple ways of storing inputs and processing events can be considered; a few of these are 1) data.frames, 2) lists, 3) environments, or 4) a C++ implementation (among others). WARDEN uses lists to store inputs and to process events.
Data.frames can be slow and memory-intensive to manipulate, so they were avoided for this purpose.
Lists and environments behave quite similarly, but environments are modified by reference, which can speed things up and could give the user more freedom to declare how to set their event reactions instead of requiring them to use modify_item.
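As a rough illustration of that difference (not WARDEN code), a list modified inside a function is copied and must be reassigned by the caller, while an environment is updated in place:

```r
# Not WARDEN code: lists are copied when modified inside a function,
# while environments are modified by reference.
update_list <- function(inp) { inp$a <- inp$a + 1; inp }            # returns a modified copy
update_env  <- function(inp) { inp$a <- inp$a + 1; invisible(inp) } # modifies in place

l <- list(a = 0)
l <- update_list(l)   # the caller must capture and reassign the result
l$a
#> [1] 1

e <- new.env()
e$a <- 0
update_env(e)         # no reassignment needed; e is modified by reference
e$a
#> [1] 1
```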
However, using environments and more "natural" R expressions (e.g., a <- 1; b <- a + 1 instead of modify_item_seq(list(a = 1, b = a + 1))) made the debugging mode harder to handle. Some internal experiments showed a 20% to 40% speed increase by switching from lists to environments in the normal mode. [Changed with WARDEN 1.0] With the most recent update, the user no longer needs to use modify_item or modify_item_seq, as the transition to environments has been performed. The limitation with the debugging mode has been handled by extracting the abstract syntax tree of the event reactions and looking for any type of assignment. A limitation of this is that "dynamic" assignments (e.g., assign(paste0("x_", i), 5), where i is created by a loop) are NOT captured by the debugging engine, and therefore will be excluded from the debugging log file. The user should therefore try to assign variables explicitly whenever possible, e.g., x_1 <- 5.
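A minimal sketch of the underlying idea (not WARDEN's actual debugging engine) is to walk the abstract syntax tree of an expression and collect the names that appear on the left-hand side of an assignment; a dynamically built name inside assign() is invisible to such a walk:

```r
# A minimal sketch (not WARDEN's implementation) of finding assigned
# names by walking an expression's abstract syntax tree.
find_assignments <- function(expr) {
  if (is.call(expr)) {
    op <- as.character(expr[[1]])
    found <- if (any(op %in% c("<-", "=", "<<-")) && is.name(expr[[2]])) {
      as.character(expr[[2]])
    } else {
      character(0)
    }
    c(found, unlist(lapply(as.list(expr)[-1], find_assignments)))
  } else {
    character(0)
  }
}

find_assignments(quote({
  x_1 <- 5                     # detected: explicit assignment
  assign(paste0("x_", i), 5)   # NOT detected: the target name is dynamic
}))
#> [1] "x_1"
```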
A C++ implementation was avoided because the purpose of WARDEN is to be user-friendly and to give the user as much freedom as possible in how they define their inputs. While likely much faster, an implementation in C++ would require very careful handling of every user input, likely forcing us to restrict it to very specific data types and functions.
Parallel Engine Approach
Furthermore, a parallel core/thread implementation is also available at the simulation level, i.e., it will perform the "simulation" loop in parallel. The reason to parallelize at the simulation level rather than at the patient level is that each patient normally takes only a small amount of time to run, so the simulation level offers the best balance between set-up overhead and run time.
However, the user should expect it to be only somewhat more efficient (perhaps a 20-40% speed increase for medium to large simulations), as opposed to radically faster. Two factors are important: the number of simulations to be run (n_sim) and the size of each simulation (given by the number of events, patients and arms). If n_sim is small, a parallel approach may not be worth it, as there is a time loss to set up the different cores/threads (normally 2 to 5 seconds); likewise, if each simulation runs fast because it is simple (a couple of seconds or so), it may not be worth it either. Even if n_sim is large and each simulation is complex, the efficiency gain may be ~20-40%, even when using >5 cores. The reason is that RAM use increases quickly as R creates new sessions with duplicated data (it is not shared among the cores/threads), and a medium to large simulation can easily exceed 2 GB of RAM per simulation, so systems with large processing power AND large RAM (e.g., 32 or 64 GB) will benefit the most from this approach.
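As a rough illustration of this trade-off, here is a generic sketch using the base parallel package (not WARDEN's own parallel engine); the per-simulation cost is simulated with a sleep:

```r
# Not WARDEN's parallel engine: a generic sketch of the trade-off when
# parallelizing at the simulation level with the base parallel package.
library(parallel)

run_one_simulation <- function(sim) {  # stand-in for one full "simulation" loop
  Sys.sleep(0.5)                       # pretend each simulation takes 0.5 seconds
  sim
}

n_sim <- 8
system.time(res_seq <- lapply(seq_len(n_sim), run_one_simulation))

cl <- makeCluster(4)                   # set-up cost: typically a few seconds
system.time(res_par <- parLapply(cl, seq_len(n_sim), run_one_simulation))
stopCluster(cl)

# With cheap or few simulations, the cluster set-up (and the RAM duplicated in
# each worker session) can outweigh the gain; with many expensive simulations,
# the speed-up approaches the number of cores until RAM becomes the bottleneck.
```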
The parallel implementation also has limitations in terms of exporting logs if there is an error in a simulation (due to the parallel set-up), so this approach is recommended only when the user is quite confident that the simulation will run without issues.