While it is important to use visual tools such as the histogram to judge whether your data is normal, it is equally important to test for normality statistically (Razali & Wah, 2011)*.
Using our Social Worker Safety data, we will determine whether the data is normally distributed.
First, we load the data. Change the path to wherever you saved SWsafety.RData on your own computer.
load("C:/Users/pruer/Google Drive/Research_UMKC/5537/SW5537_SP_15/Social Worker Safety/SWsafety.RData")
Next, we plot a histogram of our Perception of Safety scale. Remember, it was measured on a 1-to-10 scale. Is it ordinal or interval-level data? It could be considered either. It is not exactly like a Likert scale, which is ordinal and should not typically be treated as interval, but researchers do it all the time (Jamieson, 2004)**, the sneaky devils.
hist(SWsafety$PerceptionSafety, col='skyblue')
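To judge a histogram against normality, it can help to overlay a normal curve with the same mean and standard deviation as the data. The overlay is an addition here, not part of the original steps. The scores below are simulated, hypothetical stand-ins for SWsafety$PerceptionSafety, and they are deliberately skewed toward the high end of the scale.

```r
# Hypothetical 1-to-10 scores standing in for SWsafety$PerceptionSafety
set.seed(42)
scores <- sample(1:10, size = 64, replace = TRUE,
                 prob = c(1, 1, 2, 3, 4, 6, 9, 12, 10, 8))

# freq = FALSE puts the histogram on the density scale so a curve can be overlaid
h <- hist(scores, col = "skyblue", freq = FALSE,
          main = "Perception of Safety", xlab = "Score (1-10)")

# Normal curve with the same mean and standard deviation as the data
curve(dnorm(x, mean = mean(scores), sd = sd(scores)),
      add = TRUE, col = "red", lwd = 2)
```

If the bars pile up on one side of the red curve, as they should here, that is a visual hint of skew.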
Alternatively, you could use qqnorm() and qqline() to produce a quantile-quantile (Q-Q) plot, which plots your data against a theoretical normal distribution. If the data is normal, the points should fall close to the red line.
qqnorm(SWsafety$PerceptionSafety, col = "blue")
qqline(SWsafety$PerceptionSafety, col = "red")
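What qqnorm() plots is the sorted data against standard normal quantiles taken at the plotting positions ppoints(n). qqline() draws a line through the first and third quartiles of the data and of the normal distribution. The sketch below rebuilds both by hand so you can see the numbers behind the picture. It uses simulated, deliberately skewed data as a hypothetical example.

```r
set.seed(1)
x <- rexp(64)   # hypothetical, deliberately skewed data

# Theoretical quantiles: standard normal quantiles at the plotting positions
n <- length(x)
theoretical <- qnorm(ppoints(n))
sample_q <- sort(x)

# qqnorm() returns the same pairs (in data order) when plot.it = FALSE
qq <- qqnorm(x, plot.it = FALSE)
stopifnot(isTRUE(all.equal(sort(qq$x), theoretical)),
          isTRUE(all.equal(sort(qq$y), sample_q)))

# qqline() passes through the first and third quartiles (data vs. normal)
probs <- c(0.25, 0.75)
yq <- quantile(x, probs, names = FALSE)
xq <- qnorm(probs)
slope <- diff(yq) / diff(xq)
intercept <- yq[1] - slope * xq[1]

plot(theoretical, sample_q, col = "blue",
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(intercept, slope, col = "red")
```

With skewed data like this, the points bend away from the red line at the ends instead of hugging it.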
Two Statistical Tests for Normality of Data
Razali and Wah (2011) compared four statistical procedures that check for normality in your data. Of those four, they found the Shapiro-Wilk test to be the most powerful, followed by the Anderson-Darling test, but even its power is low with small samples. Our Social Worker Safety data has n = 64 for the Perception of Safety variable. Use the length() function to determine how many cases you have. You still have to decide whether that number is 'small' or not. Sixty-four is probably not small.
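A small simulation can show why sample size matters. The sketch below estimates how often shapiro.test() correctly rejects normality at .05 when the population really is skewed. The exponential population, the sample sizes of 10 and 64, the 500 replications and the helper name power_sw() are all assumptions chosen for illustration.

```r
set.seed(2011)

# Hypothetical helper: proportion of samples in which shapiro.test() rejects
# normality at alpha when the population is really skewed (exponential).
# This proportion is an estimate of the test's power.
power_sw <- function(n, reps = 500, alpha = 0.05) {
  rejections <- replicate(reps, shapiro.test(rexp(n))$p.value < alpha)
  mean(rejections)
}

power_small <- power_sw(10)
power_large <- power_sw(64)
round(c(n10 = power_small, n64 = power_large), 2)
```

With only 10 cases the test misses the skew in many samples. With 64 cases it catches it almost every time.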
length(SWsafety$PerceptionSafety)
## [1] 64
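One caution: length() counts every element, including missing values (NA). shapiro.test() drops NAs before testing, so counting the non-missing cases gives the n that is actually tested. The vector below is hypothetical.

```r
# Hypothetical scores with two missing values (NA)
scores <- c(7, 8, NA, 10, 6, 9, 2, NA, 8, 5)

length(scores)          # 10: counts every element, NA included
sum(!is.na(scores))     # 8: counts only the non-missing cases
```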
The Shapiro-Wilk Test in R
Performing a Shapiro-Wilk test in R is easy-peasy. Just type shapiro.test(x), where x is the variable you wish to test. First, state your null hypothesis, H0: Perception of Safety comes from a normal distribution. The level of risk we are willing to take (alpha) will be the typical .05.
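Setting alpha at .05 means we accept about a 5% chance of rejecting H0 when the data really are normal (a Type I error). The sketch below checks this by running shapiro.test() on 1,000 simulated normal samples of n = 64. The simulation settings are illustrative assumptions.

```r
set.seed(5537)
alpha <- 0.05

# H0 is true here: every sample really comes from a normal distribution
p_values <- replicate(1000, shapiro.test(rnorm(64))$p.value)

# Share of samples where we would (wrongly) reject H0; should be near .05
type1_rate <- mean(p_values < alpha)
type1_rate
```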
We run the test. We get a W statistic, which is all well and good, but we are more interested in the p-value, which in this case is 1.653e-05, or .00001653. Small. Because it is below .05, we reject the null hypothesis, which means our data is NOT normally distributed.
shapiro.test(SWsafety$PerceptionSafety)
## 
##  Shapiro-Wilk normality test
## 
## data:  SWsafety$PerceptionSafety
## W = 0.881, p-value = 1.653e-05
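shapiro.test() returns an object whose statistic and p.value fields can be used directly, so the reject-or-not decision does not have to be read off the screen. Below is a sketch of a small wrapper. The name check_normality() is hypothetical, and the data are simulated. It also enforces the rule that shapiro.test() accepts only 3 to 5000 non-missing values.

```r
# Hypothetical helper: run shapiro.test() and turn the p-value into a decision
check_normality <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  # shapiro.test() only accepts 3 to 5000 non-missing values
  if (length(x) < 3 || length(x) > 5000) {
    stop("need 3 to 5000 non-missing values")
  }
  result <- shapiro.test(x)
  list(W = unname(result$statistic),
       p.value = result$p.value,
       reject_H0 = result$p.value < alpha)
}

set.seed(64)
skewed <- rexp(64)      # hypothetical non-normal data
check_normality(skewed)
```

On the real data, check_normality(SWsafety$PerceptionSafety) should report reject_H0 = TRUE, since p = 1.653e-05 is below .05.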
The Anderson-Darling Test
The Anderson-Darling test for normality is run exactly like shapiro.test(), except that you use ad.test(), which is part of the nortest package. You need to load the package first with require(nortest). If it is not installed yet, run install.packages("nortest") once.
require(nortest)
ad.test(SWsafety$PerceptionSafety)
## 
##  Anderson-Darling normality test
## 
## data:  SWsafety$PerceptionSafety
## A = 2.4824, p-value = 2.412e-06
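The A value measures how far the data's cumulative distribution strays from a normal one, with extra weight on the tails. The sketch below computes it by hand from the standard formula, with the data standardized by its own mean and SD:

A = -n - (1/n) Σ (2i - 1)[ln F(z_i) + ln(1 - F(z_(n+1-i)))]

The helper name ad_statistic() and the simulated data are hypothetical. The step that converts A into a p-value is not reproduced here. I am assuming ad.test() reports this same unadjusted A, so the optional comparison runs only if nortest is installed.

```r
# Hypothetical helper: Anderson-Darling A statistic for normality, by hand
ad_statistic <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  z <- (x - mean(x)) / sd(x)   # standardize with the sample mean and SD
  i <- seq_len(n)
  # log F(z_i) and log(1 - F(z_(n+1-i))), with F the standard normal CDF
  log_F <- pnorm(z, log.p = TRUE)
  log_1mF_rev <- rev(pnorm(z, lower.tail = FALSE, log.p = TRUE))
  -n - mean((2 * i - 1) * (log_F + log_1mF_rev))
}

set.seed(7)
normal_data <- rnorm(64)   # hypothetical normal sample
skewed_data <- rexp(64)    # hypothetical skewed sample
c(normal = ad_statistic(normal_data), skewed = ad_statistic(skewed_data))

# Optional cross-check against nortest, only if the package is installed
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(skewed_data)$statistic)
}
```

The bigger A gets, the stronger the evidence against normality. That is why the skewed sample scores much higher than the normal one.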
Again, we see a tiny, tiny, tiny little p-value: 2.412e-06, or .000002412.
Like shapiro.test(), ad.test() compares your data to a theoretical normal distribution. If the p-value is below your level of risk (.05), you reject the null hypothesis, which means your data differs significantly from the normal distribution.
Once you have figured out whether your data is normal or not, you will know whether to use parametric or nonparametric tests.
*Razali, N. M., & Wah, Y. B. (2011). Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics, 2(1), 21-33.
**Jamieson, S. (2004). Likert scales: how to (ab)use them. Medical Education, 38(12), 1217-1218. doi: 10.1111/j.1365-2929.2004.02012.x