What Are Variance and Covariance & How to Calculate Both

What is Variance?

Variance measures how far data points lie from their average. More precisely, it quantifies the extent to which a set of numbers is spread out around its mean (average) value.

Formally, variance is the expected value of the squared difference between each actual value and the mean.

The more widely the values are dispersed from the average, the larger the variance; the more tightly they cluster around the average, the smaller the variance.

Consequently, variance is considered a measure of how data are distributed around the mean. It is closely tied to the standard deviation of the data set, which is the square root of the variance.

Types of Variance

There are two major types of variance:

Population Variance
Sample Variance

Population Variance

Population variance, denoted σ², tells you how the data points are dispersed throughout a given population. It is the mean of the squared distances between each data point in the population and the population mean.

The true variance is the population variance, but collecting data for an entire population is often a very lengthy procedure. Instead, a sample may be taken from the population, and the population variance can be estimated using the sample variance.

Sample Variance

Sample variance measures the spread of a sample drawn from a population and serves as an estimate of the population variance. It is computed much like the population variance, except that the sum of squared deviations from the sample mean is divided by n − 1 instead of N.

Sample variance is mostly used when analyzing smaller data sets. A sample might contain, for example, anywhere from 50 to 5,000 items. Working with a sample avoids the lengthy calculations needed to measure the variance of an entire population.
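As a minimal sketch, Python's standard `statistics` module provides both versions: `pvariance` divides by N and `variance` divides by n − 1. The data below are made up purely for illustration.

```python
import statistics

# Hypothetical measurements, used only to illustrate the two formulas.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: sum of squared deviations divided by N.
pop_var = statistics.pvariance(data)

# Sample variance: the same sum divided by n - 1.
samp_var = statistics.variance(data)

print(pop_var)             # 4
print(round(samp_var, 2))  # 4.57
```

Dividing by n − 1 makes the sample variance slightly larger than the population formula applied to the same numbers, which compensates for the sample mean being fitted to the sample itself.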

Is variance always positive?

Variance is the square of the standard deviation, and it is calculated by averaging squared deviations from the mean. Squaring any real number (x²), whether it is positive or negative, gives a result that is never negative; only zero squared gives zero.

Therefore, because the formula squares every deviation, the variance can never be negative: it is always positive, or zero when every value in the data set is identical.
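The reasoning above can be checked with a few arbitrary deviations, chosen here only for illustration: squaring strips the sign from each one, so their average cannot drop below zero.

```python
# Arbitrary example deviations from a mean, some of them negative.
deviations = [-3, -0.5, 0, 2, 7]

# Squaring removes the sign, so every term is zero or positive.
squares = [d ** 2 for d in deviations]
print(squares)  # [9, 0.25, 0, 4, 49]

# The average of non-negative terms is itself non-negative.
variance = sum(squares) / len(squares)
print(variance >= 0)  # True
```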

How to interpret variance?

Typically written as σ², the variance is simply the square of the standard deviation. The formula for the variance of a population data set is σ² = Σ(x – μ)² / N,

where μ is the population mean, x is each element in the data set, N is the population size, and Σ denotes summation.
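The formula translates almost symbol for symbol into code. The helper name `population_variance` below is hypothetical; it is a minimal sketch rather than a library function.

```python
def population_variance(data):
    """Compute σ² = Σ(x − μ)² / N for a list of numbers (illustrative helper)."""
    n = len(data)  # N: the population size
    if n == 0:
        raise ValueError("population_variance requires at least one data point")
    mu = sum(data) / n  # μ: the population mean
    # Σ(x − μ)²: sum the squared deviations, then divide by N.
    return sum((x - mu) ** 2 for x in data) / n


print(round(population_variance([3, 5, 7]), 2))  # 2.67
```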

For instance, if a dataset’s standard deviation is 4, the variance would be 4² = 16. Similarly, if a dataset's standard deviation is 50, the variance would be 50² = 2500.

Likewise, if the standard deviation is a decimal value such as 4.5, the variance is 4.5² = 20.25.
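The three conversions above amount to squaring each standard deviation, which a short loop can confirm:

```python
# Standard deviations taken from the examples in the text.
std_devs = (4, 50, 4.5)

# Variance is the square of the standard deviation.
variances = [s ** 2 for s in std_devs]

for s, v in zip(std_devs, variances):
    print(s, "->", v)
# 4 -> 16
# 50 -> 2500
# 4.5 -> 20.25
```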

The more spread out the values in a data set are, the greater the variance. To interpret variance more concretely, consider three data sets together with their respective variances.

If the data set consists of the value 5 three times, [5, 5, 5], the variance is 0, meaning there is no spread at all.

For the data set [3, 5, 7], the variance comes out to 2.67, which indicates some spread. For [1, 5, 99], the variance is 2,050.67, which indicates a lot of spread.
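These three figures are population variances (division by N). A minimal sketch using Python's `statistics.pvariance` reproduces them:

```python
import statistics

datasets = {
    "no spread": [5, 5, 5],
    "some spread": [3, 5, 7],
    "a lot of spread": [1, 5, 99],
}

# pvariance divides by N, matching σ² = Σ(x − μ)² / N.
results = {label: statistics.pvariance(values) for label, values in datasets.items()}

for label, var in results.items():
    print(f"{label}: {var:.2f}")
# no spread: 0.00
# some spread: 2.67
# a lot of spread: 2050.67
```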

How to Calculate Variance?

Variance can be viewed as the covariance of a variable with itself, and it equals the square of the standard deviation. It is computed with the equation below:

σ² = Σ(x-μ)² / N

In the formula above, μ is the mean of the data points, x is the value of one data point, and N is the total number of data points.

Note that because the method squares each deviation, the variance will always be positive or zero.

When the variance is zero, all entries in the data set have the same value. Conversely, a large variance indicates that the numbers in the collection lie far from the average.

Moreover, if every value in the data set is multiplied by a constant c, the variance is multiplied by the square of that constant, c².
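A quick numerical check of the scaling rule, with an arbitrary data set and an arbitrary constant c = 10 chosen only for illustration:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary example data
c = 10                           # arbitrary scaling constant

original = statistics.pvariance(data)
scaled = statistics.pvariance([c * x for x in data])

# Multiplying every value by c multiplies the variance by c².
print(scaled / original)         # 100.0
print(scaled == c ** 2 * original)  # True
```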

Step by Step Variance Calculation

Now that the formula for calculating the variance is familiar, let's walk step by step through the calculation of both sample and population variance.

The general procedure and the first four calculation steps are the same for sample and population variance; only the last step differs between the two.

First, calculate the arithmetic mean (average) of the data set.
Second, subtract the mean from each number in the data set.
Third, square each value obtained from the subtraction (x²).
Fourth, add all the squared values together.
For the population variance, the last step is to divide this sum by the total number of data points, N.
For the sample variance, the last step is instead to divide the sum by the number of data points minus one, n − 1.
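The steps listed above can be sketched as a single function. The name `variance_step_by_step` and its `sample` flag are hypothetical, introduced only for this illustration; setting `sample=True` switches the final division from N to n − 1.

```python
def variance_step_by_step(data, sample=False):
    """Illustrative helper: population variance by default, sample variance if sample=True."""
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    if sample and n < 2:
        raise ValueError("sample variance needs at least two data points")
    # Step 1: arithmetic mean of the data set.
    mean = sum(data) / n
    # Step 2: subtract the mean from each value.
    deviations = [x - mean for x in data]
    # Step 3: square each deviation.
    squared = [d ** 2 for d in deviations]
    # Step 4: add up the squared deviations.
    total = sum(squared)
    # Step 5: divide by N (population) or by n - 1 (sample).
    return total / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_step_by_step(data))                         # 4.0
print(round(variance_step_by_step(data, sample=True), 3))  # 4.571
```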

What is Covariance

Covariance measures the directional relationship between two random variables, that is, how much the two variables vary together.

The direction of the relationship is indicated by the sign of the covariance, which can be positive or negative.

A positive covariance indicates that the two variables tend to vary in the same direction relative to each other.

Conversely, a negative covariance indicates that the two variables tend to vary in opposite directions.

Types of Covariance

Covariance is commonly divided into two major types according to its sign:

Positive Covariance
Negative Covariance

Positive Covariance

Covariance is positive when both variables tend to move in the same direction; in this situation the variables show comparable behavior.

In other words, when higher values of one variable tend to coincide with higher values of the other, and lower values with lower values, the covariance between them is positive.

Negative Covariance

If the two variables move in opposite directions, their covariance is negative.

With negative covariance, higher values of one variable correspond to lower values of the other variable, and lower values of one variable coincide with higher values of the other.

How to Calculate Covariance?

The following steps are involved in the calculation of covariance:

First, calculate the mean of each variable, µx and µy.
Then calculate the deviation of each value of x and y from its respective mean, i.e. (xi − µx) and (yi − µy).
Next, for each pair of observations, multiply the deviation of x by the corresponding deviation of y, i.e. (xi − µx) × (yi − µy).
Then add up all the products of deviations.
Finally, divide the resulting sum by N, the total number of observations.
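The steps listed above correspond to Cov(x, y) = Σ(xi − µx)(yi − µy) / N. Below is a minimal sketch with a hypothetical helper name; computing the covariance of a variable with itself gives back its population variance.

```python
def covariance_step_by_step(x, y):
    """Illustrative helper: population covariance, dividing by N."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    # Step 1: mean of each variable.
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Step 2: deviations from the respective means.
    dx = [xi - mu_x for xi in x]
    dy = [yi - mu_y for yi in y]
    # Step 3: product of each pair of deviations.
    products = [a * b for a, b in zip(dx, dy)]
    # Steps 4 and 5: sum the products and divide by N.
    return sum(products) / n


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance_step_by_step(x, y))  # 1.2
print(covariance_step_by_step(x, x))  # 2.0, the population variance of x
```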

Conclusion

Variance is used in statistics to describe how the numbers within a data set are spread out relative to one another, rather than relying on other approaches such as organizing the data into quartiles.

Variance treats all deviations from the mean the same way, regardless of their direction. Because the deviations are squared, positive and negative deviations cannot cancel each other out and sum to zero, which would wrongly suggest that the data set has no variability.
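A short example with the data set [3, 5, 7] shows why the squaring matters: the raw deviations cancel out, while the squared deviations do not.

```python
data = [3, 5, 7]
mean = sum(data) / len(data)  # 5.0

raw = [x - mean for x in data]   # [-2.0, 0.0, 2.0]
print(sum(raw))                  # 0.0: positive and negative deviations cancel
print(sum(d ** 2 for d in raw))  # 8.0: squared deviations cannot cancel
```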
