Data & Uncertainty in System Dynamics

by | Oct 28, 2022

Jay Forrester cautioned that “fitting curves to past system data can be misleading”. Certainly, that can be true, if the model is deficient. But we can have our cake and eat it too: a good model that passes traditional System Dynamics quality checks and fits the data can yield unique insights. This talk discusses how data, calibration optimization, Kalman filtering, Markov Chain Monte Carlo, Bayesian inference, and sensitivity analysis work together. The emphasis is on practical implementation with a few examples from a real project, and pointers to resources.

Using all available information, from informal estimates to time series data, yields the best possible estimate of the state of a system and its uncertainty. That makes it possible to construct policies that are robust not just to a few indicator scenarios, but to a wide variety of plausible futures. Even if you don’t use the full suite of available tools, there’s much to be gained from a simple application of eyeball calibration, traditional reference modes as pseudo-data, and exploratory sensitivity analysis.

About the Speaker

Tom Fiddaman is the CTO of Ventana Systems and part of the development team for Vensim and Ventity. He created the Markov Chain Monte Carlo implementation in Vensim that facilitates Bayesian inference in System Dynamics models. He got his start in environmental models and simulation games, and worked on Fish Banks, updates to Limits to Growth, and early versions of C-ROADS and En-ROADS. Tom worked on data-intensive projects in a variety of settings, including consumer goods supply chains, mental health delivery systems, pharmaceutical marketing, state COVID-19 policy, and recently Chronic Wasting Disease in deer.


Answers by Tom Fiddaman

Before launching into the written items, I’ll mention Jim Hines’ opening question, which was something like,

Q: What are the consequences of “assuming the model is right” when it turns out to be untrue?

I think it’s nearly certain that a policy model will be wrong to a significant extent (despite Not Models Are Wrong). I think the facile answer here is that no model available to us will be perfect, and “no model” is not an option, so the best we can do is try to improve the models we have – and data comparisons help (at some cost).

I think I failed to give the most important part of the answer. When the model is wrong, hopefully, the problem will reveal itself through the poor fit to data, really wide uncertainty interval results, and other diagnostics. However, data by itself may be a weak test. I think the problem of overparameterized models that can fit anything is vastly overblown when the model is dynamic and nonlinear, but it can certainly happen. This is why other tests – units, extreme conditions tests, conservation laws, etc. – are so important.

Q: How to deal with structural uncertainty? (The uncertainty of how the real world could be modeled by us?) Making 100 model variations would take a lot of time 😉

100 variations would definitely be a lot of work, but it would be really cool if we could automate the generation and selection of these variations. One option would be to specify the behavior of stock-flow chains at a more granular level (in terms of the entities within) and then automatically generate different aggregate descriptions in terms of coflows, aging chains, etc.

We can’t do that yet, but in the CWD (Chronic Wasting Disease) project, we did explore a number of variations: infection chains with and without age and sex structure, and with and without spatial detail and diffusion across geographic boundaries. We tried several variations from the 2nd order to the 44th order for the SIR chain. To some extent, you can do this with subscripts (or entities in Ventity) – for example, you can build the model with a “county” subscript populated by real detail, but collapse that to an aggregate “all” county for experiments, without rewriting the equations.

Another facet of this question is that reality always contains some structure that we don’t model. This could be systematic (a missing feedback loop) or random (weather effects on the deer population). Particle filtering, including the special case of Kalman filtering, at least partially addresses this by moving the model state toward the data as the simulation progresses.
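To make "moving the model state toward the data" concrete, here is a minimal sketch of a scalar Kalman filter in Python. It is an illustration only, not the Vensim implementation: the function name `kalman_step`, the noise variances, and the measurement values are all made up for the example.

```python
def kalman_step(x, p, z, a=1.0, q=0.01, r=0.25):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : current state estimate and its variance
    z    : new measurement of the state
    a    : state transition coefficient (x_next = a * x + process noise)
    q, r : process and measurement noise variances (illustrative values)
    """
    # Predict: advance the model and let its uncertainty grow.
    x_pred = a * x
    p_pred = a * p * a + q
    # Update: the gain k weighs model uncertainty against data uncertainty.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)   # move the state toward the data
    p_new = (1.0 - k) * p_pred          # uncertainty shrinks after the update
    return x_new, p_new, k


x, p = 0.0, 1.0                          # deliberately poor initial state
measurements = [1.2, 0.9, 1.1, 1.0, 0.8]  # made-up data
for z in measurements:
    x, p, k = kalman_step(x, p, z)
    print(f"z={z:.1f}  gain={k:.2f}  state={x:.3f}  variance={p:.3f}")
```

The gain starts high, because the initial state is very uncertain, and falls as the filter accumulates evidence. Unmodeled structure shows up as persistent corrections in one direction.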

Q: How did you build a structure in Ventity to assess the evolution and “burn-in” of the parameters with MCMC? 

Ventity doesn’t yet do MCMC, but in Vensim there are at least four options:

  1. Use the built-in PSRF diagnostic, which you can watch in the runtime error reports.
  2. You can load the or file generated as a dataset, and inspect the trajectories of the parameter values as well as the diagnostics.
  3. You can load the same files in other software (Python/pandas, R, Excel, etc.) for inspection, visualization and diagnostics.
  4. You can rerun the analysis with a different random number seed and compare samples.
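For readers inspecting samples outside Vensim, the PSRF (potential scale reduction factor) can be computed by hand. The sketch below uses the textbook Gelman-Rubin formula on a list of chains; it is not necessarily the exact variant Vensim reports, and the chains here are synthetic stand-ins generated with Python's `random` module.

```python
import random
import statistics


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of equal-length lists of samples, one list per chain.
    Values near 1 suggest the chains agree; values well above 1 suggest
    they have not yet mixed (more burn-in or longer runs needed).
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between_over_n = statistics.variance(chain_means)                    # B / n
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


rng = random.Random(42)
# Four chains sampling the same distribution: should look converged.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# Two chains stuck in different places: should look unconverged.
stuck = [[rng.gauss(mu, 1.0) for _ in range(500)] for mu in (0.0, 5.0)]

print("PSRF, mixed chains:", round(psrf(mixed), 3))
print("PSRF, stuck chains:", round(psrf(stuck), 3))
```

Rerunning with a different seed, as in option 4, and comparing the pooled samples is the same idea applied across whole runs.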

We consider this an area of weakness, where the state of the art (e.g., in Stan) has advanced a lot, and expect to make substantial improvements in the coming year.

Q: How can we choose from different methods? Any criteria?

I think it’s hard to give a general answer to this – the answer depends a lot on the data, time available, existing tools you’re familiar with, and other nontechnical features.

Personally, I have a very definite preferred path:

  1. Build a model-data comparison control panel with some key parameters and experiment by hand.
  2. Start doing preliminary calibrations using loosely defined likelihoods and priors pretty early. At this point, just seek the maximum likelihood or posterior using Powell searches, in the interest of time and simplicity.
  3. As you learn about the model and the data, gradually transition to better likelihood and prior definitions and full exploration of the posterior with MCMC.
  4. Even if you don’t calibrate and use an MCMC sample to assess uncertainty, do multivariate sensitivity runs to see the distribution of outcomes from your proposed policies.
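Steps 2 and 3 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Vensim workflow: the model is a hypothetical first-order decay with a single time constant `tau`, the data are synthetic, the likelihood assumes Gaussian errors of 10% of each data point, the prior is a Normal on ln(tau), and the sampler is a plain random-walk Metropolis algorithm rather than whatever MCMC variant a given tool uses.

```python
import math
import random


def simulate(tau, times, x0=100.0):
    """Toy model: a stock draining with time constant tau."""
    return [x0 * math.exp(-t / tau) for t in times]


def log_posterior(theta, times, data, belief=4.0, belief_sd=0.5, error_frac=0.10):
    """Unnormalized log posterior for theta = ln(tau)."""
    tau = math.exp(theta)
    # Loosely defined prior: Normal on ln(tau), centered on a belief of 4.
    log_prior = -0.5 * ((theta - math.log(belief)) / belief_sd) ** 2
    # Likelihood: Gaussian errors with SD equal to 10% of each data point.
    model = simulate(tau, times)
    log_like = sum(-0.5 * ((d - m) / (error_frac * d)) ** 2
                   for d, m in zip(data, model))
    return log_prior + log_like


def metropolis(log_post, theta0, n_steps, step, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    theta, lp = theta0, log_post(theta0)
    samples = []
    for _ in range(n_steps):
        candidate = theta + rng.gauss(0.0, step)
        lp_candidate = log_post(candidate)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_candidate - lp)):
            theta, lp = candidate, lp_candidate
        samples.append(theta)
    return samples


rng = random.Random(1)
times = list(range(11))
# Synthetic "data": true tau = 5 with 10% multiplicative noise.
data = [x * (1.0 + rng.gauss(0.0, 0.10)) for x in simulate(5.0, times)]

samples = metropolis(lambda th: log_posterior(th, times, data),
                     theta0=math.log(4.0), n_steps=5000, step=0.05, rng=rng)
tau_samples = sorted(math.exp(th) for th in samples[1000:])  # drop burn-in
tau_mean = sum(tau_samples) / len(tau_samples)
tau_lo = tau_samples[int(0.025 * len(tau_samples))]
tau_hi = tau_samples[int(0.975 * len(tau_samples))]
print(f"tau: mean {tau_mean:.2f}, 95% interval [{tau_lo:.2f}, {tau_hi:.2f}]")
```

The same `log_posterior` function could be handed to an optimizer first (step 2) and to the sampler later (step 3), which is what makes the gradual transition cheap.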

Q: Question of semantics on “forecasting”: the alternatives are more explicit, but don’t they all involve looking into the future with a modeling approach, which is forecasting by another name? Am I missing something here?

I think the short answer is “yes – it’s all forecasting” or perhaps better to say “prediction.”

Traditionally, forecasting implies that you’ll know the state of the system at some point in the future. If your goal is to predict the future and respond to it, that’s an open-loop strategy, with lots of pitfalls JWF warned against, rightly.

I think we’re seeking prediction more broadly. Even if we can’t know the future state of the system, we can make contingent predictions about the response of the state to our policies. Ideally, we’d like to formulate closed-loop decision rules that perform well under a variety of possible futures, i.e. they improve the system state, regardless of what it is.

Q: Were there any initiatives to create rapid tests and protocols for infected deer?  i.e. decrease prions in the field

Rapid tests would be a big improvement. One problem hunters face, for example, is that by the time test results arrive (a week or two currently), they’ve already invested the trouble and expense of moving and processing the deer. This also means prions have moved, and possibly been consumed. We didn’t test this option in the Phase 1 model, but it’s on the list for the next iteration.

An ideal test would let you spot infected deer on the landscape while they’re still alive, but this is probably a long way off.

Q: Regarding the Bayesian approach: Which distributions should be chosen (as a starting point) for discrete and continuous variables?

There are lots of situation-specific options, so it’s hard to give a general answer here. Probably referring to BDA (Bayesian Data Analysis, by Gelman et al.) is the best option.

By far the most common things I use are:

  • Normal, i.e. -((param-belief)/belief SD)^2/2 for location parameters that can take mixed pos/neg values, or just for convenience
  • LogNormal, i.e. -(LN(param/belief)/belief SD)^2/2 for scale parameters like time constants or fractional rates of change
  • -LN(param) for an improper noninformative prior for scale parameters (a bit lazy usually)
  • Beta for fractions between 0 and 1. The PERT distribution might be an attractive alternative.

You can also use a lookup table to simply draw a distribution.
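The prior expressions in the list above translate directly into log-density penalties that are added to the log-likelihood. The Python sketch below shows one way to write them. The function names are invented for illustration, normalizing constants are dropped, and the log-normal form is assumed to be the squared log ratio (mirroring the Normal), with the belief SD expressed in log units.

```python
import math


def normal_log_prior(param, belief, belief_sd):
    """Normal prior for location parameters (constant terms dropped)."""
    return -0.5 * ((param - belief) / belief_sd) ** 2


def lognormal_log_prior(param, belief, belief_sd):
    """Log-normal-style prior for positive scale parameters.

    belief_sd is in log units, so 0.5 means roughly "within a factor
    of exp(0.5), about 1.6, of the belief" at one standard deviation.
    """
    return -0.5 * (math.log(param / belief) / belief_sd) ** 2


def scale_free_log_prior(param):
    """Improper noninformative prior for a scale parameter: density 1/param."""
    return -math.log(param)


def beta_log_prior(fraction, a, b):
    """Beta(a, b) prior for a fraction strictly between 0 and 1."""
    return (a - 1.0) * math.log(fraction) + (b - 1.0) * math.log(1.0 - fraction)


# A time constant believed to be about 4, give or take a factor of ~1.6:
for tau in (2.0, 4.0, 8.0):
    print(tau, round(lognormal_log_prior(tau, belief=4.0, belief_sd=0.5), 3))
```

Halving and doubling the belief receive the same penalty under the log-normal form, which is usually the behavior wanted for time constants and fractional rates.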

Q: If we use Mean Absolute Percentage Error (MAPE) as model evaluation/validation in comparing the System Dynamics Model output parameter with the historic data, in what % maximum of MAPE the model is good or valid? are 5% good as a limitation?

I think this can’t be answered in general, because the MAPE depends in part on how much measurement error is embedded in the data. If you predict the next roll of a fair six-sided die as 3.5, that means your % error is at best 14% and on average something like 40%. That sounds terrible, but you can’t improve on it without cheating.
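The die example is easy to check by enumeration. The short Python sketch below computes the percentage error two ways, since the result depends on the denominator: the figures quoted above appear to match errors scaled by the 3.5 forecast, while the textbook MAPE divides by the actual outcome and comes out higher.

```python
outcomes = [1, 2, 3, 4, 5, 6]   # equally likely faces of a fair die
forecast = 3.5                  # the best constant prediction of the mean

# Absolute error as a fraction of the forecast.
err_vs_forecast = [abs(x - forecast) / forecast for x in outcomes]
# Absolute error as a fraction of the actual outcome (textbook MAPE).
err_vs_actual = [abs(x - forecast) / x for x in outcomes]

for label, errs in (("relative to forecast", err_vs_forecast),
                    ("relative to actual", err_vs_actual)):
    best = min(errs)
    mean = sum(errs) / len(errs)
    print(f"{label}: best {best:.1%}, mean {mean:.1%}")
```

Either way, the error is irreducible: it reflects the noise in the process, not a flaw in the prediction.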

It’s possible to estimate the scale of the errors in the data, either a priori or as part of the calibration process. In that case, the uncertainty in your parameter and outcome estimates would reflect the quality of the data.

Generally, I would hesitate to rely on the goodness of fit metrics as the final word on model validity. There might be good reasons for the lack of fit to some features (for example, inessential features that you didn’t model) and it might also be possible for a bad model to nevertheless fit the data reasonably well. Still, it’s certainly a reasonable thing to pay attention to.

Even though there isn’t a general rule, I do use something like a rule of thumb in preliminary calibration work. If I don’t know the scale of errors in the data, I just assume it has a standard deviation of 10%. It can’t be 0%, because nothing is perfect. It probably isn’t 50%, because then no one would bother collecting it. Using 10% as a guess is often good enough for getting started.

Q: How did you stratify the SEQUENCE of actions? e.g. some upstream, preventive measures may have a significant impact on downstream outcomes.

For simplicity, most of the policy packages we simulated for stakeholders were “ballistic” in the sense that they don’t respond to changes. This was partly constrained by the 5-year horizon remaining in the current plan, which is fairly short compared to the disease evolution (we did run out to 2040 though).

There’s one important exception. Among the three representative geographies we simulated, one is a newly infected area, where the disease is present but not yet detected. For that situation, we explicitly model the testing process, tracking the composition and prevalence of harvested deer, and sampling them with random Binomial draws. This makes the discovery of the disease stochastic and dependent on the level of surveillance in the area. Other policies – baiting and feeding bans, accelerated harvest, etc. – only commence with discovery. This makes the effectiveness of surveillance dependent on the subsequent response, and the effectiveness of the response package contingent on the adequacy of surveillance, plus some luck.
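The stochastic discovery mechanism can be sketched as follows. This is a simplified Python illustration, not the project model: the prevalence trajectory, growth rate, and sample sizes are hypothetical, and the Binomial draw is built from Bernoulli trials to stay within the standard library.

```python
import random


def binomial_draw(n, p, rng):
    """Number of positives among n tested animals, each positive with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def years_to_discovery(n_tested, prevalence0, growth, horizon, rng):
    """First year in which surveillance finds at least one positive, or None."""
    prevalence = prevalence0
    for year in range(1, horizon + 1):
        if binomial_draw(n_tested, prevalence, rng) > 0:
            return year
        # Undetected disease keeps spreading (hypothetical exponential growth).
        prevalence = min(1.0, prevalence * (1.0 + growth))
    return None


def mean_discovery_year(n_tested, rng, replicates=300):
    """Average discovery year over many stochastic replicates."""
    years = [years_to_discovery(n_tested, 0.005, 0.3, 30, rng)
             for _ in range(replicates)]
    found = [y for y in years if y is not None]
    return sum(found) / len(found)


rng = random.Random(7)
low = mean_discovery_year(n_tested=20, rng=rng)
high = mean_discovery_year(n_tested=200, rng=rng)
print(f"mean discovery year with 20 tests/yr:  {low:.1f}")
print(f"mean discovery year with 200 tests/yr: {high:.1f}")
```

Because the response policies only start at discovery, each replicate's discovery year sets how much the disease has grown before any control begins, which is where the "plus some luck" enters.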

There’s also some feedback from perceived disease prevalence to hunter participation in or compliance with control efforts. This isn’t strictly sequencing, but it does affect the future effectiveness of policies.

Q: What role do predators play in the spread or containment of CWD, and how would this be reflected in the model structure?

This was certainly mentioned, but we didn’t model it explicitly. We do have a proxy, which is the ability to selectively harvest infected animals. There’s some reason to think that predators, and also to some extent hunters and sharpshooters, can do this. It’s extremely effective.

I think the basic challenge is that predator management is not a matter of reason, but rather a quasi-religious debate that’s almost untouchable for resource agencies.

Q: Is there any way we can import the algorithm in Vensim or Ventity?

I’m not sure what it would mean to import the algorithm, but there are some other options for doing this kind of work.

Q: Would the Kalman filtering approach just mentioned potentially run the risk of being misleading if the data referenced are lagged, or distorted in some way?

Certainly, this is always a possibility, and not only for the Kalman filter. Any calibration process that moves the model towards the data is subject to problems with the data. Lags are straightforward – you can model the lag explicitly so the model-data comparison is apples to apples. But often distortions will be unknown to the modeler. However, they’re likely to reveal themselves in poor fit or other distortions to model behavior, and bad uncertainty intervals on the parameter estimate. You can examine the residuals to find and reject data points that are particularly problematic, but of course, this requires a little care because it could be the model that is wrong.

Q: The time lags associated with data collection would, I think, create some distortions that would perhaps need to be accounted for or addressed

This is definitely the case. Reporting of deaths from COVID is a good example – they take weeks to months to trickle into the official statistics. So, you might model this with a stock-flow structure that lags the unobservable instantaneous death flow. Using something like a third-order delay is often a reasonable starting point.
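A third-order delay of this kind is three first-order stocks in series, each with one third of the total delay time. The Python sketch below integrates it with Euler steps; the 21-day delay, the 10-day pulse of deaths, and the function name are assumptions chosen for illustration.

```python
def third_order_delay(inflow, delay_time, dt):
    """Pass a flow through three cascaded first-order stocks (Euler integration).

    inflow     : list of flow values, one per time step
    delay_time : average total delay (each stage gets delay_time / 3)
    dt         : time step, which must be small relative to delay_time / 3
    """
    stage_time = delay_time / 3.0
    s1 = s2 = s3 = 0.0
    outflow = []
    for x in inflow:
        f1, f2, f3 = s1 / stage_time, s2 / stage_time, s3 / stage_time
        outflow.append(f3)              # reported flow leaves the last stock
        s1 += dt * (x - f1)
        s2 += dt * (f1 - f2)
        s3 += dt * (f2 - f3)
    return outflow


dt = 0.25                                    # days
times = [i * dt for i in range(int(200 / dt))]
actual = [10.0 if t < 10.0 else 0.0 for t in times]   # deaths/day, made up
reported = third_order_delay(actual, delay_time=21.0, dt=dt)


def total(flow):
    return sum(flow) * dt


def mean_time(flow):
    return sum(t * f for t, f in zip(times, flow)) / sum(flow)


print(f"actual total {total(actual):.1f}, reported total {total(reported):.1f}")
print(f"average reporting lag {mean_time(reported) - mean_time(actual):.1f} days")
```

Calibration then compares `reported` rather than `actual` against the official statistics, so the model-data comparison stays apples to apples.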

CWD is a bit unusual, in that almost all the testing data arrives in one big annual pulse, during hunting season. This of course corresponds with a big spike in deer mortality. Ideally, the model would capture this, along with the spike in births in the spring. However, we currently gloss over these discrete time events and model things continuously.

Q: How did you generate permutations for the 80 action packages?

The stakeholder participants developed the action list, and the state implementation team did the final assembly into packages. Then we developed a spreadsheet that translated the qualitative descriptions of the action packages into model parameters.

The explosion into 80 packages wasn’t ideal – it arose from the curse of dimensionality: 3 geographies x 5 actions x 2 agents, plus some combinations. I think a purely model-driven process would have led to fewer.

Characterizing a large number of policies was a pain, but it did lead to some good discussions: What does “do nothing” really mean? What are the resource tradeoffs involved in implementing the same policy in regions with different characteristics?

Once we had the parameters describing the policies, it was pretty easy to automate running them all, using VenPy with the Vensim DLL (see the last question).

Q: Is there any data collection of SARS-CoV-2 (all subtypes) seropositivity in the white-tailed deer populations that you are testing for CWD positivity?  Do you have any reason to be suspicious of possible co-seropositivity for both covid and CWD in the deer? 

This didn’t come up, but there are certainly reasons to think that CWD-compromised deer would be more susceptible to other diseases.

Q: If you are modeling just one mode of behavior, instead of all of them, can these methods still be used? (E.g. modeling a cycle of a certain period where the real data has cycles of other periods as well as perhaps exponential adjustment type modes, etc.). Do you filter the data in some way?

I can think of cases where it might be possible to aggregate or filter some dynamics out of the data. For COVID, for example, a lot of states didn’t test on weekends or at least didn’t report on weekends, so there were big gaps on Sat/Sun and a spike on Mon or Tue. If you aggregate to weekly reporting, that noise goes away, at the expense of introducing half a week of lag on average. For a lot of purposes that would be fine.
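The weekend-reporting artifact and its removal by weekly aggregation can be shown with a small Python sketch. The reporting rule here (weekend cases held back and released on Monday) and the constant true case rate are simplifying assumptions for illustration.

```python
def report_with_weekend_gaps(true_daily):
    """Hold back weekend cases and release them with the next weekday report.

    Assumes day 0 is a Monday, so weekday 5 and 6 are Saturday and Sunday.
    """
    reported, backlog = [], 0
    for day, cases in enumerate(true_daily):
        if day % 7 >= 5:
            backlog += cases        # nothing published on the weekend
            reported.append(0)
        else:
            reported.append(cases + backlog)
            backlog = 0
    return reported


def weekly_totals(daily):
    """Sum complete Monday-to-Sunday weeks, dropping any partial final week."""
    n_weeks = len(daily) // 7
    return [sum(daily[7 * w: 7 * w + 7]) for w in range(n_weeks)]


true_daily = [100] * 28                       # four weeks, made-up constant rate
reported = report_with_weekend_gaps(true_daily)

print("daily reported:", reported[:14])
print("weekly true:    ", weekly_totals(true_daily))
print("weekly reported:", weekly_totals(reported))
```

After the first week, the weekly reported totals match the true weekly totals even though the daily series swings between 0 and 300. The shortfall in the first week is the held-back weekend, which is the lag that aggregation introduces.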

Generally, though, my preference would be to introduce the unwanted or unmodeled features to the model as parameterized exogenous inputs. That way the model matches the raw data, and it’s easier to attribute what’s going on explicitly to the exogenous and endogenous features of the model.

Q: It would be good to get some videos or briefings about automating the modeling/simulation/policy analysis process with scripts. This is highly interesting but came short at the ISDC.

I’ll put this on my to-do list. There are some examples in the VenPy repository, like the SDM Consequence Model. Some images are here.
