## Unit 12 - Power and Sample Size

### The two types of error

Classical statistical tests start by defining a "null hypothesis", H0. An example: "The mean of a normally distributed variable is 10."

Then we draw our sample, calculate the sample mean, and see how close it is to 10. If the distance is so large that observing such a mean would be a rare event under H0 (rare usually meaning a probability of at most 5%), the null hypothesis is rejected. Rejecting the null hypothesis is the holy grail of statistical testing: when the null hypothesis is rejected, its negation is accepted as true.

The negation of the null hypothesis is called the alternative hypothesis, Ha.
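The procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not a full analysis: it assumes the standard deviation is known (so a z-test applies), and the numbers passed in (a sample mean of 10.8, σ = 2, n = 25) are made up for the example. The function name `z_test_mean` is also just a label chosen here.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, with sigma assumed known."""
    se = sigma / sqrt(n)                 # standard error of the sample mean
    z = (sample_mean - mu0) / se         # distance from mu0 in standard errors
    # Probability of a mean at least this far from mu0 if H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical data: H0 says the mean is 10, we observe 10.8
z, p, reject = z_test_mean(sample_mean=10.8, mu0=10, sigma=2, n=25)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Here the observed mean lies two standard errors from 10, which is rare enough under H0 (p just below 5%) to reject.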

Since we are dealing with measures that fluctuate, we will sometimes make a false conclusion. There are two types of errors we can make:

• Type I: rejecting a true null hypothesis, and
• Type II: accepting a false null hypothesis.

We use alpha (α) and beta (β) to denote these two errors. Thus:

• α = Probability that we reject H0 when it is true.
• β = Probability that we accept H0 when it is false.

These values are depicted in the image above. In this case, we are comparing two means and the null hypothesis is:

H0 : μ1 = μ2

Under this null hypothesis, the two curves overlap. Our test, after drawing our sample, is to look at the difference of the two sample means as the test statistic:

x̄2 - x̄1

Now, if H0 is true, we will wrongly reject it if our test statistic falls in the pink region (for illustration we took α/2 = 0.025 in the upper tail, which gives 1.96 as the rejection threshold).

On the other hand, if H0 is false, then the blue curve is correct, but we will accept the false null hypothesis if our test statistic is below the green line (the area corresponding to β). It is very important to notice that if we move the vertical green line at 1.96 to the left, β decreases but α increases. Likewise, if we move the vertical green line to the right, α decreases but β increases. So we must be careful about balancing the two, depending on which errors are the most important to avoid.
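The trade-off can be checked numerically. The sketch below assumes a standardised test statistic that is N(0, 1) under H0 and N(shift, 1) under the alternative. The value `shift = 3.0` is a hypothetical choice for illustration, not the value used in the figure, and `error_rates` is a name invented here.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, mean 0 and sd 1


def error_rates(threshold, shift):
    """Return (upper-tail rejection area under H0, beta under Ha).

    Assumes the statistic is N(0, 1) under H0 and N(shift, 1) under Ha.
    """
    upper_tail = 1 - std_normal.cdf(threshold)   # area to the right under H0
    beta = std_normal.cdf(threshold - shift)     # area to the left under Ha
    return upper_tail, beta


shift = 3.0  # hypothetical true difference, in standard-error units
for threshold in (1.5, 1.96, 2.5):
    tail, beta = error_rates(threshold, shift)
    print(f"threshold {threshold:4.2f}: tail area {tail:.4f}, beta {beta:.4f}")
```

Moving the threshold from 1.5 to 2.5 shrinks the tail area under H0 while β grows, which is the balance described above.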

We choose α and β in advance (usually α is 0.05, and β was traditionally set at 20%). Since β is the probability of accepting a false null hypothesis, 1 - β is the probability of rejecting a false null hypothesis, which is actually the whole objective of what we are doing (proving true differences). In this sense, 1 - β is our probability of a successful study. 1 - β is called the power.
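To see how power depends on the sample size, here is a rough sketch for comparing two independent means. It assumes a two-sided z-test with a common, known standard deviation in both groups. The function name `power_two_means` and the inputs (a true difference of 5, σ = 10) are illustrative assumptions, not values from the course material.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for two independent means.

    Assumes a common, known standard deviation sigma in both groups.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    se = sigma * sqrt(2 / n_per_group)           # SE of the difference in means
    shift = delta / se                           # true difference in SE units
    # beta: the statistic lands between the two thresholds although Ha is true
    beta = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)
    return 1 - beta


# Hypothetical scenario: true difference 5, standard deviation 10
for n in (16, 32, 64, 128):
    print(f"n per group = {n:3d}: power = {power_two_means(5, 10, n):.3f}")
```

With these made-up inputs, about 64 subjects per group are needed before the power reaches roughly 80%.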

Review of and comments on the video:

Applying the rules presented in Unit 4, var(X-Y) = var(X) + var(Y) - 2•cov(X,Y), and when X and Y are independent, cov(X,Y)=0.
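The variance rule can be verified on any paired data set, because it also holds exactly for sample variances and covariances. The sketch below uses made-up numbers, and `sample_cov` is a small helper written for this example.

```python
from statistics import mean


def sample_cov(a, b):
    """Sample covariance of two equal-length lists (divisor n - 1)."""
    mean_a, mean_b = mean(a), mean(b)
    cross = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    return cross / (len(a) - 1)


# Made-up paired observations
x = [4.1, 5.3, 6.0, 7.2, 8.4]
y = [2.0, 2.9, 3.1, 4.8, 5.0]

diff = [u - v for u, v in zip(x, y)]
lhs = sample_cov(diff, diff)                                       # var(X - Y)
rhs = sample_cov(x, x) + sample_cov(y, y) - 2 * sample_cov(x, y)   # right side
print(f"var(X - Y) = {lhs:.4f}, var(X) + var(Y) - 2 cov(X, Y) = {rhs:.4f}")
```

When the two samples are independent, the covariance term drops out, so the variance of the difference of two sample means is σ1²/n1 + σ2²/n2. That is the quantity that drives the sample size calculation.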

The video shows an example of the type of sample size calculation that is typical.
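For readers without the video at hand, here is a sketch of one common textbook calculation: the normal-approximation sample size per group for comparing two independent means, n = 2σ²(z(1-α/2) + z(1-β))² / δ². This is an assumption about what "typical" means here. It is not necessarily the exact formula used in the video, and it does not include any refinement. The function name and inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for two independent means.

    Assumes a two-sided test and a common, known standard deviation sigma.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # quantile for the type I error
    z_beta = std_normal.inv_cdf(power)            # quantile for power = 1 - beta
    n = 2 * (sigma * (z_alpha + z_beta) / delta) ** 2
    return ceil(n)                                # round up to whole subjects


# Hypothetical inputs: detect a difference of 5 when sigma is 10
print(n_per_group(delta=5, sigma=10))              # 80% power
print(n_per_group(delta=5, sigma=10, power=0.90))  # 90% power
```

Note how sensitive the result is to the inputs: halving the difference to be detected multiplies the required sample size by four.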

Guenther (1981 - see references) published refinements of many sample size calculations. The one we used here is:

Here is a presentation on sample size that I gave recently:

Here is a link to a zip file with the Excel "calculators" featured in the video above.

It is also important to note here that every type of variable, paired with the study design, has its own sample size calculation.

Here are just some of the possibilities for the variable (each is then paired with a study design):

• continuous
• binomial (yes/no, heads/tails, success/failure)
• nominal (categorical, but no ordering)
• ordered categorical
• survival

You get the picture (see the next unit for more detail).

In each of these cases, the first question is whether the variable is measured once or repeatedly in the study. The crossover design is designed to measure treatment differences "within subject". It is often used for equivalence studies. Finally, for this very brief introduction, "time to event" designs study exactly that: whether treatments prolong or reduce the time to the event of interest.

### Multiplicity

If we perform many tests, we increase the probability of making errors. Here is an example:

Suppose we do 6 statistical tests with a 5% probability of type I error. What is the probability that I will make at least one type I error?

The probability of "at least one error" is one minus the probability of no errors. If the six tests are independent and every null hypothesis is true, this is 1 - (0.95)^6.

This calculation gives a probability of 26.5%, which is an enormous increase in the probability of error. So, if I plan to do several tests, I must increase the sample size in order to control the error.
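The same arithmetic is easy to repeat for other numbers of tests. The sketch below assumes the tests are independent and that every null hypothesis is true. The function name `familywise_error` is chosen here for illustration.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one type I error in n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 3, 6, 10, 20):
    print(f"{k:2d} tests: P(at least one type I error) = {familywise_error(k):.3f}")
```

For 6 tests this reproduces the 26.5% above, and the probability keeps climbing as more tests are added.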

G*Power is a good software package for power and sample size calculations: http://www.psychologie.hhu.de

The package "pwr", for power and sample size, is available in R.

Contact me at: dtudor@germinalknowledge.com