## Unit 3 - setting the workspace, loading data, simple analysis of variance, factors

- Setting the workspace
- Data revisited
- Analysis of Variance for a single factor

Review of and comments on the video:

Commands used (note: the two trt commands could have been combined into one, as could the two center commands):

trt <- mydata$trt

center <- mydata$center

trt <- factor(trt, labels=c("a","b","c"))

center <- factor(center, labels=c("center1","center2"))

sbp <- mydata$sbp

dbp <- mydata$dbp

boxplot(sbp ~ center) (note: the tilde is centered, not a superscript)

anova(lm(sbp ~ trt))

Remember that "lm" is an R function that fits the "linear model" specified in the parentheses; "anova" then produces the analysis of variance table for that fit. More later...
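As a minimal sketch of the combined form mentioned in the note on the commands, each factor can be extracted and labeled in a single step. The mydata data frame below is simulated purely for illustration: its size, its values, and the 1/2/3 and 1/2 codings are assumptions, not the data used in the video.

```r
# Hypothetical stand-in for the data frame used in the video
set.seed(1)
mydata <- data.frame(
  trt    = rep(1:3, each = 10),             # assumed treatment codes 1, 2, 3
  center = rep(1:2, times = 15),            # assumed center codes 1, 2
  sbp    = rnorm(30, mean = 130, sd = 10),  # simulated systolic pressures
  dbp    = rnorm(30, mean = 80, sd = 8)     # simulated diastolic pressures
)

# Extract and label each factor in one command instead of two
trt    <- factor(mydata$trt, labels = c("a", "b", "c"))
center <- factor(mydata$center, labels = c("center1", "center2"))
sbp    <- mydata$sbp

# Fit the linear model, then request its analysis of variance table
fit <- lm(sbp ~ trt)
tab <- anova(fit)
print(tab)
```

The row labeled trt in the printed table carries the F test for a treatment effect; the Residuals row carries the within-group variation.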

ANOVA is most appropriate when the observations within each group approximately follow the normal probability distribution (bell-shaped curve). In the video, we looked at a simple treatment effect; many more complications are possible. If the response variable is not a continuous measure, other methods are more appropriate. For example, "yes or no" responses are not best tested with ANOVA; look at chi-squared tests or logistic modeling elsewhere on this site for those kinds of variables. Time-to-event data are also not properly treated by ANOVA: for some observations the event may never occur, so the time to event is missing, and more sophisticated models must be used so that these missing times are not treated as non-informative. More about these cases elsewhere on this site.
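For a "yes or no" response, a rough sketch of the two alternatives named above might look like the following. The data are simulated and the variable name response is hypothetical; this only illustrates the shape of the calls and is not an analysis from the video.

```r
# Hypothetical yes/no outcome for 60 subjects on three treatments
set.seed(2)
trt      <- factor(rep(c("a", "b", "c"), each = 20))
response <- factor(sample(c("no", "yes"), 60, replace = TRUE))

# Cross-tabulate treatment by response and test for association
counts <- table(trt, response)
chi <- chisq.test(counts)
print(chi)

# Logistic model: probability of "yes" as a function of treatment
# (for a factor response, glm treats the first level, "no", as failure)
logit_fit <- glm(response ~ trt, family = binomial)
print(summary(logit_fit))
```

Both approaches treat the response as a category rather than as a continuous measurement, which is why they suit this kind of variable better than ANOVA.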

Contact me at: dtudor@germinalknowledge.com

© Germinal Knowledge. All rights reserved.