How to Remove Outliers in Google Sheets
Sometimes your spreadsheet can include unusually high or low values that skew your results. Learn how to remove outliers in Google Sheets here.
When you’re analyzing data, one or two results that can throw everything out of whack. Data points that are significantly higher or significantly lower than the majority of your data can skew measures such as the mean average significantly. In some cases, you’re better off removing these outliers from your data before you start your analysis.
If you have some data points that don’t really fit, then learn how to remove outliers in Google Sheets below.
What is an Outlier?
An outlier is a data point that is significantly larger or smaller than the majority of the data in the set.
For example, if you measure the growth of five plants, and the results were 5 cm, 7 cm, 6 cm, 7 cm, and 25 cm, then the final result would be an outlier. The majority of the results are all around the same value, and the final result is significantly higher than the rest.
There are methods to determine what values would be considered outliers that are based on the range or spread of your data.
Why Remove Outliers?
In some cases, outliers in your data can skew the results when you are analyzing your data.
For example, in the case above, if we take the mean average of the growth of all five plants, it would be (5+7+6+7+25)/5 or 10 cm. Four of the plants grew much less than this, with only one plant that grew significantly more, so the mean average isn’t really representative of any of the data points.
However, if we remove the outlier, the mean average would be (5+7+6+7)/4 or 6.25. This is much more representative of the remaining values and gives a more accurate reflection of the data.
Some statistics, such as the median, will be unaffected by outliers, but in many cases, outliers can have a significant impact.
How to Find an Outlier
There are multiple ways to determine outliers, based on the range or spread of data. One of the simplest involves using the interquartile range.
A quartile is simply a data point found by splitting your data into four equal parts. The lower quartile is the number a quarter of the way through your data when listed in order from lowest to highest. The upper quartile is the value three-quarters of the way through your data.
The interquartile range is the difference between these two values. Since it is the difference between the lower quartile and upper quartile, exactly half of your values will lie within this interquartile range.
You can determine the upper limit for outliers by multiplying the interquartile range by 1.5 and adding the result to the value of the upper quartile. Any value above this number is an outlier. Similarly, you can find the lower limit for outliers by multiplying the interquartile range by 1.5 and subtracting this from the value of the lower quartile. Any number below this value will be an outlier.
How to Find an Outlier in Google Sheets
If all of the above sounds like a bit too much math for your liking, never fear. Google Sheets does all the hard work for you. It will help you to calculate the upper quartile, lower quartile, and interquartile range for your data, and you can then create a formula to determine whether each value is an outlier or not.
To find an outlier in Google Sheets:
- Select a cell where you want to calculate the lower quartile.
- Enter the following:
=QUARTILE(
- Select all of your data.
- Type a comma, and then a 1, followed by a closed bracket.
- Press Enter and the lower quartile will calculate.
- Select the next cell down and enter the following:
=QUARTILE(
- Select all of your data.
- Type a comma, then a 3, followed by a closed bracket.
- Press Enter and the upper quartile will calculate.
- Select the next row down and type = then select the cell containing the upper quartile.
- Type a minus (-) and select the cell containing the lower quartile.
- Press Enter and the interquartile range will calculate.
- Select the cell next to the top cell of your data and enter the following formula, replacing C3 with the location of your top cell of data, $C$18 with the cell containing the lower quartile (make sure you put a $ sign before the letter and number), $C$19 with the cell containing your upper quartile, and $C$20 with the cell containing your interquartile range:
=IF(C3<$C$18-1.5*$C$20,"Outlier",IF(C3>$C$19+1.5*$C$20,"Outlier",""))
- If your first data point is an outlier, your formula will return the text outlier. Otherwise, it’ll be blank.
- To apply the formula to the other cells in your data, click and hold the drag handle in the bottom corner of the cell.
- Drag down over the cells where you want to apply the formula.
- Let go, and your formula will be copied and will calculate. You can now clearly see any values that are outliers.
How to Remove an Outlier in Google Sheets
You now know how to find outliers, but what about removing them?
We can do that using a simple IF formula that will remove the data from any cells that are listed as outliers, but return the original value for any that are not.
To remove outliers in Google Sheets:
- Follow the steps in the section above to determine which values are outliers.
- Select the cell two columns along from your top data point.
- Enter the following formula, replacing C3 with the position of your top cell and D3 with the position of the cell to its right:
=IF(D3="Outlier","",C3)
- Press Enter. If the top cell is an outlier, the formula will return a blank. Otherwise, it’ll return the value of your top cell.
- To apply the formula to the rest of your cells, click and hold the drag handle in the bottom corner of the cell.
- Drag down over the cells where you want to apply the formula.
- When you release your mouse, the formula will be copied and Google Sheets will calculate the end result. Any values listed as outliers will be removed.
Manipulating Data in Google Sheets
Learning how to remove outliers in Google Sheets ensures that your data isn’t skewed by values that are significantly outside the range of the majority of your data. In some cases, this can have a big impact on the statistics that you use to analyze your data.
Knowing how to remove these values gives you the option to get rid of them if you wish, or include them if you want to analyze the entire data set. You can even try both and compare the results.
Google Sheets is great for statistical analysis but it can do much, much more. You can collaborate with other users when you’re creating a Google Sheets document and even comment and react to their work. You can insert Google Sheets tables into your Google Documents, and you can even create QR codes in Google Sheets.