Tests for Normality of Data

You can use the information on this page to assess the normality of your quantitative data (ratio or interval level only). Once you have figured out whether your data is normal or not, and therefore whether parametric or nonparametric procedures are appropriate, go back to the start page.

While it is important to use visual tools such as the histogram to ascertain whether your data is normal, it is equally important to test for normality statistically as well (Razali & Wah, 2011)^*.

Using our Social Worker Safety data, we will determine whether the data is normally distributed.

First we load the data:

load("C:/Users/pruer/Google Drive/Research_UMKC/5537/SW5537_SP_15/Social Worker Safety/SWsafety.RData")

Next we plot a histogram of our Perception of Safety scale. Remember it was on a 1 to 10 scale. Is it ordinal or is it interval level data? It could be considered either. It is not exactly like a Likert scale, which is ordinal and typically should not be treated as interval, though researchers do it all the time (Jamieson, 2004)^{**}, the sneaky devils.

hist(SWsafety$PerceptionSafety, col='skyblue')

As you can see, the data is considerably skewed to the left. We would reject normality based on sight alone here.
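If you want a number to go with the picture, one option is the sample skewness: negative values point to a longer left tail, positive values to a longer right tail. The sketch below is not part of the original analysis. It uses simulated scores because the SWsafety data is not reproduced here, and both `scores` and the helper `sample_skewness` are made-up names for illustration.

```r
# Simulated stand-in for a left-skewed 1-to-10 scale (NOT the SWsafety data).
set.seed(42)
scores <- pmax(1, round(10 - rexp(64, rate = 0.5)))

# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 1.5.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]          # ignore missing answers
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2)^1.5)
}

skew <- sample_skewness(scores)
skew  # a negative value means the long tail is on the left
```

To try it on the real variable you would pass `SWsafety$PerceptionSafety` to the helper instead of `scores`.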

Alternatively

You could use qqnorm and qqline to produce a Quantile-Quantile (Q-Q) plot, which plots the quantiles of your data against those of a theoretical normal distribution.

qqnorm(SWsafety$PerceptionSafety, col = "blue")
qqline(SWsafety$PerceptionSafety, col = "red")

Had our data been normally distributed, the small blue circles that represent each of our data points would fall on, or very close to, the red qqline along nearly its whole length.
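To get a feel for what "close to the line" looks like, it can help to compare a sample you know is roughly normal with one you know is skewed. The sketch below uses two simulated samples (not the SWsafety data; all variable names are invented for illustration). It asks qqnorm for the plot coordinates without drawing anything and then measures how straight each set of points is with a plain correlation.

```r
set.seed(1)
normal_sample <- rnorm(64, mean = 5, sd = 2)   # simulated, roughly normal
skewed_sample <- 10 - rexp(64, rate = 0.5)     # simulated, left-skewed

# plot.it = FALSE returns the plot coordinates instead of drawing:
# $x holds the theoretical normal quantiles, $y holds the data values.
qq_normal <- qqnorm(normal_sample, plot.it = FALSE)
qq_skewed <- qqnorm(skewed_sample, plot.it = FALSE)

# Points that hug a straight line give a correlation close to 1.
cor_normal <- cor(qq_normal$x, qq_normal$y)
cor_skewed <- cor(qq_skewed$x, qq_skewed$y)
c(normal = cor_normal, skewed = cor_skewed)
```

The correlation is only a rough summary of the plot, not a formal test; the statistical tests covered on this page are the place to look for a decision.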

Two Statistical Tests for Normality of Data

Razali and Wah (2011) compared four statistical procedures that check your data for normality. Of those four, they found the Shapiro-Wilk test to be superior, although it has problems with smaller data sets. Our Social Worker Safety data has n = 64 for the Perception of Safety variable. Use the length command to determine how many cases you have, then decide whether the result counts as 'small' or not. Sixty-four is probably not small.

length(SWsafety$PerceptionSafety)

## [1] 64
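One caveat worth checking: length counts every element of the vector, including missing values (NA), while shapiro.test works only with the non-missing values and needs between 3 and 5000 of them. The sketch below shows the difference on a small made-up vector (`answers` is hypothetical, not the SWsafety data).

```r
# Hypothetical vector with two missing answers (NOT the SWsafety data).
answers <- c(7, 9, NA, 10, 8, NA, 6, 9)

n_total    <- length(answers)        # counts every element, NAs included
n_observed <- sum(!is.na(answers))   # counts only the non-missing values

c(total = n_total, observed = n_observed)
```

Here the two counts are 8 and 6. If the same check on your own variable gives two equal numbers, there are no missing values and length is telling you the sample size the test will actually use.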

The Shapiro-Wilk Test in R

Performing a Shapiro-Wilk test in R is easy-peasy. Just type in shapiro.test(x), where 'x' is the variable you wish to test.

First state your null hypothesis, H₀: Perception of Safety comes from a normally distributed population. The level of risk (alpha) we are willing to take will be the typical one: not more than .05.

We run the test. We get a W value, which is all well and good, but we are more interested in the p-value, which in this case is 1.653e-05, or .00001653. Small. Because it is below .05, we reject the null hypothesis, which means we conclude that our data is NOT normally distributed.

shapiro.test(SWsafety$PerceptionSafety)

## 
## 	Shapiro-Wilk normality test
## 
## data:  SWsafety$PerceptionSafety
## W = 0.881, p-value = 1.653e-05
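Rather than reading the numbers off the printout, you can pull them out of the object that shapiro.test returns and let R compare the p-value with your level of risk. The sketch below is a minimal illustration on simulated data (not the SWsafety data; `skewed_sample`, `alpha`, and `reject_null` are names chosen for this example).

```r
set.seed(123)
skewed_sample <- 10 - rexp(64, rate = 0.5)  # simulated, NOT the SWsafety data

result <- shapiro.test(skewed_sample)

alpha   <- 0.05                      # level of risk, chosen before testing
w_value <- unname(result$statistic)  # the W statistic as a plain number
p_value <- result$p.value            # the p-value as a plain number

# TRUE means the p-value is below alpha, so we reject the null hypothesis.
reject_null <- p_value < alpha
reject_null
```

With the real data you would swap in `SWsafety$PerceptionSafety` for `skewed_sample`.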

The Anderson-Darling Test

The Anderson-Darling test for normality is run exactly like shapiro.test, only you use ad.test, which is part of the nortest package in R. You need to load the package first by typing in: require(nortest). If the package is not installed yet, install.packages("nortest") takes care of that.

ad.test(SWsafety$PerceptionSafety)

## 
## 	Anderson-Darling normality test
## 
## data:  SWsafety$PerceptionSafety
## A = 2.4824, p-value = 2.412e-06

Again we can see a tiny, tiny, tiny little p-value: 2.412e-06.

Like shapiro.test, ad.test compares your data to a theoretical normal distribution. If the p-value is below your level of risk (.05), you reject the null hypothesis, which means your data differs from the normal distribution.
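Since both tests follow the same decision rule, you can line up their p-values and apply the rule once. The sketch below does this on simulated data (not the SWsafety data; the variable names are invented for this example). It runs ad.test only when the nortest package is actually installed, so it still works on a machine without it.

```r
set.seed(2024)
skewed_sample <- 10 - rexp(64, rate = 0.5)  # simulated, NOT the SWsafety data
alpha <- 0.05                               # level of risk

# shapiro.test ships with base R, so it is always available.
p_values <- c(shapiro = shapiro.test(skewed_sample)$p.value)

# nortest is an add-on package; only call ad.test() if it is installed.
if (requireNamespace("nortest", quietly = TRUE)) {
  p_values["anderson_darling"] <- nortest::ad.test(skewed_sample)$p.value
}

# TRUE means "reject the null hypothesis of normality" for that test.
decisions <- p_values < alpha
decisions
```

When the two tests disagree, that is a signal to look again at the histogram and the Q-Q plot rather than to pick whichever answer you prefer.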


* Razali, N. M., & Wah, Y. B. (2011). Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics, 2(1), 21-33.

** Jamieson, S. (2004). Likert scales: how to (ab)use them. Medical Education, 38(12), 1217-1218. doi: 10.1111/j.1365-2929.2004.02012.x