Descriptive Analysis: Central Tendency and Variance

By Nathan B. Smith

Quantitative research makes heavy use of descriptive statistics. Descriptive statistics can be subdivided into measures of central tendency and measures of variability (or spread). Measures of central tendency (or center) include the mean, median, and mode. If the dataset under review is perfectly symmetric and unimodal (for example, normally distributed), the mean, median, and mode will be equal. Measures of spread indicate how close together or far apart the observations are. The measure of spread is typically chosen with respect to the measure of center that best characterizes the dataset; common choices include the standard deviation, variance, and range (minimum and maximum values). Skewness and kurtosis are often reported alongside them to describe the shape of the distribution.
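As a minimal sketch of these measures, the following Python snippet uses the standard-library statistics module. The exam scores and the small symmetric dataset are hypothetical values invented for illustration; they do not come from any study discussed here.

```python
import statistics

# Hypothetical exam scores, invented for illustration.
scores = [70, 75, 75, 80, 85, 90, 95]

# Measures of central tendency
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

# Measures of spread
score_range = max(scores) - min(scores)
sample_variance = statistics.variance(scores)  # sample variance, divides by n - 1
sample_std_dev = statistics.stdev(scores)      # square root of the sample variance

# A small, symmetric, single-peaked dataset: mean, median, and mode coincide.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
symmetric_summary = (
    statistics.mean(symmetric),
    statistics.median(symmetric),
    statistics.mode(symmetric),
)

print("center:", mean_score, median_score, mode_score)
print("spread:", score_range, sample_variance, sample_std_dev)
print("symmetric dataset (mean, median, mode):", symmetric_summary)
```

In the first dataset the mean sits above the median because the higher scores pull it upward, which is one reason the median is often preferred as the measure of center for skewed data.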

Discussion

Independent and dependent variables

When conducting a scientific experiment, a researcher must identify two primary types of variables: the independent and the dependent variables. The independent variable is manipulated, controlled, or changed to study its effect on the dependent variable. The dependent variable, on the other hand, is the one being measured and evaluated. In other words, the dependent variable depends on the independent variable. As the researcher manipulates or changes the independent variable, the effect on the dependent variable can be observed or measured (Huck, 2012).

To better understand the concept of independent versus dependent variables, consider a study in which a researcher examines a sample of 500 doctoral students in the United States in order to draw inferences about the academic performance of the overall population of doctoral students in the United States. For this example, the sample can be described according to its makeup (for instance, male/female, military veteran/non-veteran, or online/traditional classroom setting). The outcomes recorded for the sample represent some attribute of academic performance (for instance, grades). In this study, the makeup characteristics are the independent variables, and academic performance is the dependent variable. These variables must be definable and measurable to perform statistical analysis. Variables can have multiple values or levels. For example, male/female may be coded as a binary variable, while level of performance may be measured on a Likert scale (Levine, Ramsey, & Smidt, 2001).

A scenario involving one independent variable (IV) and one dependent variable

In this scenario, the study evaluates the effect of military veteran status on the academic performance of doctoral students.

Independent variable: military veteran status (veteran or non-veteran) 

Dependent variable: Academic performance (Grade Point Average (GPA))

A scenario involving two independent variables (IVs) and one dependent variable

In this scenario, the study evaluates the job performance of a sample of 100 technical writers during the COVID-19 pandemic. These technical writers typically author (write) aircraft engine repair procedures for distribution to maintenance, repair, and overhaul (MRO) facilities.

Independent variable 1: Worker location (on-site or remote (work from home))

Independent variable 2: Level of physical exercise on a Likert scale of 1 to 10

Dependent variable: Job performance (measured by the number of engine repair procedures authored and approved for distribution)

Analysis of variance

Analysis of variance (commonly abbreviated as ANOVA) allows a researcher to understand and make comparisons among groups of study participants. Three basic statistical measures must be understood before considering ANOVA. First, the mean of a dataset represents the average of all values of a given variable. Second, the variance indicates how much the values of a given variable vary. Variance is determined by adding the squared differences between each value and the mean and dividing the resulting sum by the total number of values. Finally, the third measure is the standard deviation, which is the square root of the variance.
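The calculation described above can be sketched in a few lines of Python. The function name population_variance and the data values are hypothetical, chosen so the arithmetic is easy to follow by hand. Note that dividing by the total number of values gives the population variance; when the data are a sample, the sum is usually divided by n - 1 instead, which is what statistics.variance does.

```python
import math
import statistics


def population_variance(values):
    """Sum of squared differences from the mean, divided by the number of values."""
    mean = sum(values) / len(values)
    squared_diffs = [(v - mean) ** 2 for v in values]
    return sum(squared_diffs) / len(values)


# Hypothetical data, invented for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

variance = population_variance(values)   # mean is 5, squared differences sum to 32
std_dev = math.sqrt(variance)            # standard deviation is the square root

print("population variance:", variance)
print("standard deviation:", std_dev)
print("sample variance (n - 1):", statistics.variance(values))
```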

Also related to ANOVA are the terms population and sample. The population represents all elements in a group. For example, doctoral students in the United States represent a population that includes all doctoral students in the United States. Another example may be 30-year-old people in California, a population that includes all people who fit that description. In many cases, it is not feasible or practical to analyze every member of a population because the numbers are unmanageably large. Therefore, a sample, which is a subset of the overall population, is studied instead. For example, a researcher may obtain statistics on 500 doctoral students in the United States, a subset of all doctoral students in the United States.

Two samples (or groups) can be compared using the t-test to determine whether there is a significant difference between the means of the two groups. However, when examining more than two samples (groups), the t-test is not considered adequate: it would have to be applied to every pair of samples, which quickly becomes unmanageable, and each additional test increases the chance of at least one false-positive result.
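To illustrate how quickly pairwise testing grows, the sketch below counts the t-tests needed for k groups and estimates the chance of at least one false positive across them. The function names are hypothetical, the significance level of 0.05 is an assumed convention, and the error-rate formula assumes the tests are independent, which pairwise tests on shared groups are not, so the figure is only a rough approximation.

```python
import math


def pairwise_test_count(num_groups):
    """Number of distinct pairs of groups, i.e. t-tests needed to compare them all."""
    return math.comb(num_groups, 2)


def familywise_error_rate(num_tests, alpha=0.05):
    """Approximate chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests


for k in (2, 3, 5, 10):
    tests = pairwise_test_count(k)
    rate = familywise_error_rate(tests)
    print(f"{k} groups -> {tests} t-tests, approx. false-positive chance {rate:.3f}")
```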

When comparing three or more samples (groups), Analysis of Variance (ANOVA) is a better option. There are two components of variation in ANOVA: variation within groups and variation between groups. The ANOVA test statistic is the F ratio, the ratio of the between-group variation to the within-group variation. The F ratio indicates how large the variation between groups is relative to the variation within groups. If most of the variation is due to variation within groups, the F ratio is small, and the data give little evidence that the group means differ. If most of the variation is due to variation between groups, the F ratio is large. Generally speaking, the larger the F ratio value, the more likely the groups have different means (Stahle & Wold, 1989).
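The F ratio for a one-way ANOVA can be sketched from first principles as follows. The function name one_way_anova_f and the three small groups are hypothetical and chosen for easy hand-checking. This is a bare-bones illustration of the between-group and within-group components, not a full ANOVA with a p-value.

```python
def one_way_anova_f(groups):
    """Return the F ratio: between-group mean square over within-group mean square."""
    all_values = [v for group in groups for v in group]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: group means versus the grand mean,
    # weighted by group size.
    ss_between = sum(
        len(group) * (g_mean - grand_mean) ** 2
        for group, g_mean in zip(groups, group_means)
    )
    # Within-group sum of squares: each value versus its own group mean.
    ss_within = sum(
        (v - g_mean) ** 2
        for group, g_mean in zip(groups, group_means)
        for v in group
    )

    ms_between = ss_between / (k - 1)        # degrees of freedom: k - 1
    ms_within = ss_within / (n_total - k)    # degrees of freedom: N - k
    return ms_between / ms_within


# Hypothetical data: three groups with means 2, 3, and 6.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio = one_way_anova_f(groups)
print("F ratio:", f_ratio)
```

Here the between-group sum of squares is 26 and the within-group sum of squares is 6, so the mean squares are 13 and 1, giving an F ratio of about 13. Whether that value is statistically significant depends on the F distribution with the corresponding degrees of freedom, which this sketch does not evaluate.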

Job performance assumptions considering the variance

This scenario considers the job performance of an aircraft maintenance, repair, and overhaul (MRO) team. The MRO team uses two independent processes to repair structural damage to engine fan cowls resulting from bird strikes. Process A uses “fiberglass repair kit type 1.” Process B uses “fiberglass repair kit type 2.” The performance or quality of the repair is measured on an arbitrary quality scale from 1 to 100 using non-destructive testing (NDT). In this scenario, the researcher measured Process A's effectiveness at 83.2 and Process B's at 80.5.

The MRO executive cannot automatically assume that Process A consistently outperforms Process B. In this study, two sample groups are being evaluated. One sample is drawn from the production stream (population) that uses Process A, and the other is drawn from the production stream (population) that uses Process B. The variance of the quality measure (determined by the NDT score) must be considered in terms of variance within groups and variance between groups. The between-group variation measures how far each group's mean lies from the overall (grand) mean of all observations. The within-group variation measures how far the individual values in each sample (group) lie from their respective group mean.
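The split into between-group and within-group variation can be sketched for this scenario as follows. The individual NDT scores are entirely hypothetical; they were invented so that the group averages equal 83.2 and 80.5, on the assumption that the reported figures are sample means. The function name variation_components is likewise hypothetical.

```python
def variation_components(groups):
    """Return (ss_between, ss_within, ss_total) sums of squares for the groups."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Distance of this group's mean from the grand mean, weighted by group size.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Distance of each score from its own group's mean.
        ss_within += sum((v - group_mean) ** 2 for v in group)

    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    return ss_between, ss_within, ss_total


# Hypothetical NDT quality scores (invented; means are 83.2 and 80.5).
process_a = [81, 82, 83, 84, 86]
process_b = [77, 79, 80, 81, 82, 84]

ss_between, ss_within, ss_total = variation_components([process_a, process_b])
print("between-group sum of squares:", ss_between)
print("within-group sum of squares:", ss_within)
print("total sum of squares:", ss_total)
```

The two components add up to the total variation, so the comparison of Process A and Process B comes down to how much of that total is explained by the difference between the two group means.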

Depending on the F ratio value, more of the variance could be attributed to variance within groups or to variance between groups. Larger F-values indicate greater variation between the sample means relative to the variation within the sample groups. Therefore, larger F-values provide more convincing evidence of a difference between the group means. In this scenario, a large F-value would suggest that the difference between the two processes' quality scores is real and that one repair kit type is more effective than the other. A small F-value, however, would suggest that the observed difference could simply reflect ordinary variation within each process (Zach, 2021).

Conclusion

F-tests compute the ratio of two variances, so one could assume they are only appropriate for testing whether two variances are equal. However, they are capable of that and more. F-tests are highly versatile because they can assess a broad range of properties, depending on which variances are included in the ratio. F-tests may be used to compare the fit of different models, determine the overall significance of regression models, identify the significance of specific variables in linear models, and determine whether a set of means is equal.
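In its simplest use, comparing two variances, the F statistic is just the ratio of two sample variances. The sketch below uses hypothetical scores and a hypothetical function name, and follows the common convention of placing the larger variance in the numerator. It does not compute a p-value, which would require the F distribution.

```python
import statistics


def variance_ratio(sample_1, sample_2):
    """Ratio of the two sample variances, with the larger variance in the numerator."""
    var_1 = statistics.variance(sample_1)
    var_2 = statistics.variance(sample_2)
    return max(var_1, var_2) / min(var_1, var_2)


# Hypothetical quality scores for two groups, invented for illustration.
group_1 = [81, 82, 83, 84, 86]
group_2 = [77, 79, 80, 81, 82, 84]

f_statistic = variance_ratio(group_1, group_2)
print("F statistic (variance ratio):", f_statistic)
```

A ratio close to 1 is consistent with equal variances; the further it rises above 1, the stronger the evidence that the two groups differ in spread.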

References

Huck, S. W. (2012). Reading statistics and research. Boston, MA: Pearson Education, Allyn & Bacon.

Levine, D. M., Ramsey, P. P., & Smidt, R. K. (2001). Applied statistics for engineers and scientists. Upper Saddle River, NJ: Prentice-Hall.

Stahle, L., & Wold, S. (1989). Analysis of variance (ANOVA). Chemometrics and Intelligent Laboratory Systems, 6(4), 259-272. https://doi.org/10.1016/0169-7439(89)80095-4

Zach, P. N. (2021). How to interpret the F-value and P-value in ANOVA. Statology. https://www.statology.org/anova-f-value-p-value/



