Visual inspection is essential in data analysis because it reveals the major features of the data distribution and helps determine which tests are applicable to a particular data set. In this case, the data is divided into two groups according to the gender of the students. The variable of interest is the total number of points earned by each student. The data is summarized as absolute frequency histograms, presented below.
The graphs show that the two groups of students follow a similar pattern. There is no single peak representing the most frequent score, so the distribution appears to be multimodal for both male and female students. There seems to be a gap around the value of 100, with the neighboring peaks corresponding to higher frequencies. Neither distribution is symmetric: each has outliers on the left side. Both graphs are negatively skewed, since the bulk of the most frequent scores sits at the higher end and a long tail extends towards the lower scores (Bennet, 2013). The outliers in both groups therefore correspond to low total scores. In both graphs, the value around 50 points is an outlier: one student in each group earned this score, which is the lowest in the sample. The value of skewness is therefore expected to be negative for both female and male students.
Kurtosis measures the shape of a distribution, in particular the heaviness of its tails relative to the normal distribution. SPSS reports excess kurtosis, so a value of 0 corresponds to the normal distribution (UCLA, 2017). From the graphs, the distribution in both groups appears rather leptokurtic, that is, it has a sharper peak and heavier tails, and so produces more extreme outliers than a normal distribution (Lane, 2008). Therefore, the value of kurtosis is expected to be above 0 for both groups of students.
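The expected signs of skewness and kurtosis can be checked numerically. The following is a minimal Python sketch, assuming the bias-adjusted sample formulas for skewness and excess kurtosis that packages such as SPSS commonly report. The function names and the `scores` list are hypothetical illustrations and are not taken from the data set.

```python
import math


def _mean_and_sd(values):
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, sd


def sample_skewness(values):
    """Bias-adjusted sample skewness; negative means a longer left tail."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    mean, sd = _mean_and_sd(values)
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Bias-adjusted excess kurtosis; 0 corresponds to the normal distribution."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 observations")
    mean, sd = _mean_and_sd(values)
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth - correction


# Hypothetical scores with a single low outlier near 50 (assumed data).
scores = [50, 92, 95, 98, 104, 106, 108, 110, 112, 115]
print("skewness:", sample_skewness(scores))
print("excess kurtosis:", sample_excess_kurtosis(scores))
```

With a single low outlier, the cubed deviations are dominated by the left tail, so the skewness comes out negative, which is the pattern anticipated from the histograms. The sketch does not guard against a zero standard deviation, which would occur if all scores were identical.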
The data set under analysis contains variables at several levels of measurement: scale, nominal, and ordinal. Ordinal variables have categories with a meaningful ranking. In this case, there are no ordinal variables of interest. The values of a nominal variable do not represent any ranking and cannot be compared to each other as higher or lower. Nominal variables are often referred to as qualitative measures, since their values are labels on which arithmetic is not meaningful (Geert van den Berg, 2017). The only measure of central tendency that applies to nominal variables is the mode, which the research question does not require. The mean, standard deviation, and shape measures are meaningless for nominal data. Therefore, id, gender, and ethnicity cannot be analyzed using the mean, standard deviation, skewness, or kurtosis.
Scale variables in SPSS take numeric values with a meaningful distance between any two points. In addition, for ratio-level scale variables the zero value is meaningful. Scale variables can be analyzed using measures of central tendency, dispersion, and shape of distribution (Geert van den Berg, 2017). Therefore, the analysis is meaningful for GPA, Quiz 3, and Total.
However, although measures of central tendency apply to all of the scale variables mentioned above, skewness and kurtosis are not always appropriate measures of the shape of a distribution. In this case, GPA and Total are ideal for skewness and kurtosis analysis, since both are continuous variables measured on at least an interval scale. Quiz 3, by contrast, is a discrete variable whose values range from 1 to 10, so it is acceptable but not excellent for skewness and kurtosis analysis. None of the scale variables is unacceptable. However, the nominal variables id, gender, and ethnicity are unacceptable for skewness and kurtosis analysis.
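The selection rule described here can be written down as a small lookup. This is an illustrative sketch only: the names `APPLICABLE_STATS`, `VARIABLE_LEVELS`, `stats_for`, and `shape_analysis_allowed` are hypothetical, and the assignments of variables to levels simply restate the classification given in the text. Ordinal variables are left out because none are of interest in this data set.

```python
# Statistics treated as meaningful at each measurement level (per the text).
APPLICABLE_STATS = {
    "nominal": ["mode"],
    "scale": ["mean", "standard deviation", "skewness", "kurtosis"],
}

# Measurement level of each variable, as classified in the discussion.
VARIABLE_LEVELS = {
    "id": "nominal",
    "gender": "nominal",
    "ethnicity": "nominal",
    "GPA": "scale",
    "Quiz 3": "scale",
    "Total": "scale",
}


def stats_for(variable):
    """Return the statistics that are meaningful for the given variable."""
    return APPLICABLE_STATS[VARIABLE_LEVELS[variable]]


def shape_analysis_allowed(variable):
    """True when skewness and kurtosis are meaningful for the variable."""
    stats = stats_for(variable)
    return "skewness" in stats and "kurtosis" in stats


for name in VARIABLE_LEVELS:
    print(name, "->", "shape analysis" if shape_analysis_allowed(name) else "mode only")
```

The lookup does not capture the finer distinction between ideal and merely acceptable variables, so Quiz 3 is reported alongside GPA and Total even though its discrete values make the shape measures less informative.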
From the table presented above, the skewness of GPA is -0.52, which indicates that the distribution is slightly negatively skewed, with a tail towards the lower values. The value of kurtosis is -0.811, so the distribution is platykurtic, with lighter tails and fewer extreme values than the normal distribution. For Quiz 3, the skewness is -1.177, the most strongly negative of the three variables. Its kurtosis is 0.805, indicating a leptokurtic distribution with a sharper peak and heavier tails than the normal distribution. The distribution of the total scores is moderately negatively skewed, with a skewness of -0.837. It is leptokurtic, with the highest kurtosis value of 0.943.
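The reading of these values follows a simple sign rule, sketched below in Python. The function name `describe_shape` and the `reported` dictionary are hypothetical; the numbers are the skewness and kurtosis values quoted in the text, and the rule assumes that kurtosis is reported as excess kurtosis, with 0 for the normal distribution.

```python
def describe_shape(skewness, kurtosis):
    """Return verbal labels for a skewness value and an excess-kurtosis value."""
    if skewness < 0:
        direction = "negatively skewed"
    elif skewness > 0:
        direction = "positively skewed"
    else:
        direction = "symmetric"

    if kurtosis > 0:
        tails = "leptokurtic"
    elif kurtosis < 0:
        tails = "platykurtic"
    else:
        tails = "mesokurtic"
    return direction, tails


# (skewness, kurtosis) pairs as quoted in the discussion.
reported = {
    "GPA": (-0.52, -0.811),
    "Quiz 3": (-1.177, 0.805),
    "Total": (-0.837, 0.943),
}

for name, (skew, kurt) in reported.items():
    direction, tails = describe_shape(skew, kurt)
    print(f"{name}: {direction}, {tails}")
```

The sketch classifies by sign only. It deliberately avoids cut-offs for "slight" or "strong" skew, since such thresholds are rules of thumb that vary between sources.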
In summary, graphical analysis should be combined with descriptive statistical analysis, since it is necessary to establish whether particular analysis tools are applicable to the variables of interest.
- Abebe, A., Daniels, J., McKean, W. (2001). Statistics and Data Analysis. Western Michigan University.
- Bennet, J., Briggs, W. (2013). Statistical Reasoning for Everyday Life. 4th edition. Pearson.
- Geert van den Berg, R. (2017). Measurement Levels. SPSS Tutorials. Retrieved from: https://www.spss-tutorials.com/measurement-levels/
- Lane, D. (2008). Introduction to Statistics. Developed by Rice University, University of Houston Clear Lake, and Tufts University. Retrieved from: http://onlinestatbook.com/
- UCLA. (2017). Descriptive statistics. SPSS Annotated Output. Retrieved from: http://stats.idre.ucla.edu/spss/output/descriptive-statistics/