Basic Statistical Concepts Used in Analysis
Statistical analysis is usually divided into descriptive and inferential statistics. Let us look at each briefly, using sample data to illustrate the concepts: a set of numbers (1, 2, 3, 4, 4, 4, 5, 6, 6, 8, 9, 10) and a set of letters (A, B, C, D, E, E, E, F, G, G, H).
Descriptive Statistics
Frequency or Count
The number of times an element occurs in the data. In the set of numbers above, "4" occurs three times and "6" occurs twice, so the frequency of "4" is three and the frequency of "6" is two. Likewise, in the set of letters, the frequency of "E" is three.
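As a minimal sketch, the frequencies above can be counted in Python with the standard library's `collections.Counter`. The variable names are chosen here for illustration only.

```python
from collections import Counter

# Sample data from the text above.
numbers = [1, 2, 3, 4, 4, 4, 5, 6, 6, 8, 9, 10]
letters = ["A", "B", "C", "D", "E", "E", "E", "F", "G", "G", "H"]

# Counter maps each element to the number of times it occurs.
number_counts = Counter(numbers)
letter_counts = Counter(letters)

print(number_counts[4])    # 3
print(number_counts[6])    # 2
print(letter_counts["E"])  # 3
```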
Percentage
A percentage expresses a part as a share of a whole, scaled to 100. If we take 10 out of 100, the percentage (%) is 10. In the same way, taking 10 out of 200 gives (10/200) * 100, i.e. 5%.
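The same arithmetic can be sketched as a small Python function. The name `percentage` and its parameters are illustrative assumptions, not something defined in the text; the multiplication is done before the division, which is mathematically the same as (part/whole) * 100.

```python
def percentage(part, whole):
    """Return `part` as a percentage of `whole`."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    # Same as (part / whole) * 100, with the multiplication done first.
    return part * 100 / whole

print(percentage(10, 100))  # 10.0
print(percentage(10, 200))  # 5.0

# Share of the value 4 in the sample numbers: it occurs 3 times out of 12.
print(percentage(3, 12))    # 25.0
```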
Averages (Mean, Mode, Median)
Mean: There are various types of mean (arithmetic, geometric, harmonic). For the first three numbers (1, 2, 3), the arithmetic mean is the sum of the terms (1 + 2 + 3) divided by the number of terms (3), i.e. 2.
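A short sketch of these averages using Python's standard `statistics` module follows. The text works out only the arithmetic mean; the median and mode lines assume the usual definitions (the middle value of the sorted data, and the most frequent value).

```python
import statistics

# Sample numbers from the text above.
numbers = [1, 2, 3, 4, 4, 4, 5, 6, 6, 8, 9, 10]

# Arithmetic mean of the first three numbers: (1 + 2 + 3) / 3.
print(statistics.mean([1, 2, 3]))  # 2

# Median: with 12 values, the average of the 6th and 7th sorted values (4 and 5).
print(statistics.median(numbers))  # 4.5

# Mode: the most frequent value.
print(statistics.mode(numbers))    # 4
```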
Range
The difference between the maximum value and the minimum value. In our set of numbers, the range is 10 - 1 = 9.
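In Python, the range of the sample numbers can be sketched with the built-in `max` and `min` functions (the variable name `data_range` is illustrative):

```python
# Sample numbers from the text above.
numbers = [1, 2, 3, 4, 4, 4, 5, 6, 6, 8, 9, 10]

# Range = maximum value - minimum value.
data_range = max(numbers) - min(numbers)
print(data_range)  # 9
```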
Inferential Statistics
We draw a sample from the universe (population) and formulate hypotheses about it. After testing the hypotheses, we make inferences about the whole population under study. Some of the techniques used in inferential statistics are regression (simple and multivariate), the Z-test, ANOVA, and t-tests.
​
Example: Developing a medicine by testing a drug
Suppose a drug company is developing a tablet (pill) that is meant to shorten the recovery time from a cold. How would the company know whether the pill works? It might take two groups of people from the same population (say, people from a place "A" who have caught a cold), give the pill to one group, and give the other group the usual drug. It could then calculate the mean number of days to recovery for each group. Let's say that the mean recovery time for the group with the new drug was 6.2 days, and that for the other group was 6.7 days.
The question becomes: does taking the pill actually help people recover from the cold faster? A simple comparison of the two means does not answer the question, because a difference of this size could also arise by chance. We therefore conduct further statistical tests by formulating hypotheses.
In inferential statistics, we usually make two hypotheses: a null hypothesis, and the alternative hypothesis.
Null Hypothesis: States that the two groups under study are the same (in terms of means, the means of the two groups are equal).
Alternative Hypothesis: States that the two groups under study are different (in terms of means, the means of the two groups are not equal).
A low p-value from the test we use (e.g. t-test, Z-test) indicates that results like ours would be unlikely if the null hypothesis were true. It therefore provides evidence against the null hypothesis and in favor of the alternative hypothesis.
Remember: A low p-value (conventionally below 0.05) is what allows us to reject the null hypothesis.
​
Meaning of P-value: The p-value obtained from a test is the probability of obtaining results at least as extreme as the ones observed, assuming that the null hypothesis is true. For example, a p-value of 0.05 (or 5%) for the drug experiment would mean that, if the two groups were in fact the SAME, there would be a 5% probability of observing a difference between them at least as large as the one we found.
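To make this definition concrete, here is a rough sketch that computes a p-value directly from it using an exact permutation test. This is a different technique from the t-test and Z-test mentioned above, chosen because it needs only the Python standard library. The individual recovery times are made up for illustration (picked so the group means come out at 6.2 and 6.7 days), and the function name `permutation_p_value` is hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical recovery times in days (made up, NOT real trial data).
new_drug = [5.5, 6.0, 6.0, 6.5, 7.0]    # mean 6.2
usual_drug = [6.0, 6.5, 6.5, 7.0, 7.5]  # mean 6.7

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Under the null hypothesis the group labels are interchangeable, so we
    try every way of splitting the pooled data into two groups of the
    original sizes and count how often the difference in means is at
    least as large as the observed one.
    """
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-9:
            extreme += 1
    return extreme / total

p_value = permutation_p_value(new_drug, usual_drug)
print(f"p-value = {p_value:.3f}")
```

The printed value is the fraction of relabelings that produce a difference at least as large as the observed 0.5 days. With groups this small, a large p-value would not be surprising, which is why real studies use much larger samples.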
​