## Unit 9 - Modeling and Simulation

This unit gives us the occasion to step back and try to grasp the "big picture". In Unit 5 we talked about variables (measurements). The objective of any statistical or mathematical approach to understanding measurements is to discover relationships between variables and, if possible, to infer some kind of causality.

For some time now, "modelling and simulation" has been fashionable in scientific statistics. What does "modelling and simulation" really refer to, and is it different from, or better than, "old-fashioned" statistical testing?

Both modelling and simulation must start with observations of something and some assumptions about relationships among the observations. More concretely, we observe some phenomena (in our usual context, biological or medical) and then describe the phenomena and their relationships with mathematical formulas.

The presumption is that, in a mathematical world, relationships may be understood in terms of the mathematics and that conclusions or predictions may be made. For a more detailed treatment of the relationship between mathematical models and reality, please have a look at this article.

Let's get down to concrete ideas in the video.

Review of and comments on the video:

Deterministic model for the elimination of a drug from the body:

C′(t) = -αC(t), C(0) = C0

The solution to this equation is: C(t) = C(0)exp(-αt)
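For readers who want the missing step, here is a short sketch of how the solution follows from the differential equation by separation of variables, assuming C(t) > 0 for all t. The symbols are those already used above.

```latex
\begin{align*}
\frac{dC}{dt} &= -\alpha\, C(t)
  && \text{the model} \\
\int_{C(0)}^{C(t)} \frac{dC}{C} &= -\alpha \int_0^t ds
  && \text{separate the variables and integrate} \\
\ln C(t) - \ln C(0) &= -\alpha t \\
C(t) &= C(0)\, e^{-\alpha t}
\end{align*}
```

As a check, differentiating the last line gives C′(t) = -αC(0)exp(-αt) = -αC(t), and setting t = 0 gives C(0), so both the equation and the initial condition are satisfied.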

If we take the log (always natural) of this solution, we get:

log(C(t)) = log(C(0)) - αt.

If we set Y = log(C(t)), a=log(C(0)) and b=-α, the equation becomes:

Y = a + bt, which is just a straight line formula. Note that, for the moment, there is no randomness in the model.
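The straight-line claim can be checked numerically. The following is a minimal R sketch; the values of `C0` and `alpha` and the time grid are hypothetical, chosen only for illustration and not taken from the course data.

```r
# Hypothetical parameter values, for illustration only
C0    <- 100                  # assumed initial concentration C(0)
alpha <- 0.3                  # assumed elimination rate
t     <- 0:10                 # assumed observation times

C <- C0 * exp(-alpha * t)     # deterministic solution C(t) = C(0)exp(-alpha t)
Y <- log(C)                   # natural log of the concentration

# On a straight line, the slope between consecutive points is constant
slopes <- diff(Y) / diff(t)
print(slopes)                 # each entry equals -alpha, up to rounding error
print(Y[1])                   # the intercept a = log(C0)
```

Because there is no randomness yet, every slope is the same number b = -α and the intercept is exactly a = log(C(0)).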

To analyse a set of real data, we must incorporate randomness into the model. So, we presume that the observed data is equal to the theoretical value plus some random fluctuation, or error term, error(t).

So the stochastic model is:

Y = a + bt + error(t), a straight line with some random fluctuation built in.
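One way to simulate data from this stochastic model in R is sketched below. It assumes that error(t) is independent normal noise with mean 0 and a fixed standard deviation; that distribution, and all numerical values, are assumptions made for this illustration and may differ from what is used in the unit's own R code.

```r
set.seed(1)                      # make the random draws reproducible

# Hypothetical parameter values, for illustration only
C0    <- 100                     # assumed initial concentration C(0)
alpha <- 0.3                     # assumed elimination rate
t     <- seq(0, 10, by = 0.5)    # assumed sampling times

a <- log(C0)                     # intercept of the straight line
b <- -alpha                      # slope of the straight line

# Assumption: error(t) is independent N(0, 0.1^2) noise
err <- rnorm(length(t), mean = 0, sd = 0.1)

Y_det <- a + b * t               # deterministic model: exact straight line
Y     <- Y_det + err             # stochastic model: line plus random fluctuation

# Show the simulated observations next to the theoretical values
print(head(data.frame(t = t, Y_det = Y_det, Y = Y)))
```

Changing the seed gives a different simulated data set from the same underlying line, which is the basic idea behind simulation.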

The R command to estimate the best values of a and b to fit observed Y and time data is:

lm(Y~t)

"lm" stands for "linear model". Note that a and b are estimated by lm and do not have to appear in the R command.
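As a sketch of how the fitted values can be read back, the following R code simulates data from a known line and then recovers the parameters with lm. The simulated data, the normal error assumption, and the names `fit`, `a_hat`, `b_hat`, `alpha_hat` and `C0_hat` are hypothetical and only serve the illustration.

```r
set.seed(42)                          # reproducible random noise

# Simulated data with known (hypothetical) parameters: C(0) = 100, alpha = 0.3
t <- seq(0, 10, by = 0.5)
Y <- log(100) - 0.3 * t + rnorm(length(t), mean = 0, sd = 0.1)

fit <- lm(Y ~ t)                      # least-squares fit of Y = a + b*t
est <- coef(fit)                      # named vector: "(Intercept)" and "t"

a_hat <- est[["(Intercept)"]]         # estimate of a = log(C(0))
b_hat <- est[["t"]]                   # estimate of b = -alpha

alpha_hat <- -b_hat                   # back to the elimination rate
C0_hat    <- exp(a_hat)               # back to the initial concentration

print(c(alpha = alpha_hat, C0 = C0_hat))
```

Calling `summary(fit)` additionally reports standard errors for the two estimates, which indicate how much they would vary from one simulated data set to the next.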

Here is a zip file with the R-code for our examples, including the simulations, and the outline for this unit's video. Please download it and have a look. It contains more material than is presented in the video. Contact me at: dtudor@germinalknowledge.com