How to Remove HTML Tags in Google Sheets
If you copy data from web pages into Google Sheets, it may include a lot of unwanted HTML code. Here’s how to remove HTML tags in Google Sheets.
Google Sheets is a great way to organize and analyze data. Like every good piece of software, however, garbage in equals garbage out. If you’re pulling your data from other sources, you may end up with more than you bargained for; you may inadvertently import the information you want but also a bunch of useless HTML tags. Until you get rid of these tags, you’re not going to be able to do much with your data.
Thankfully, it’s not too tricky to strip away all of those HTML tags and just leave the information that you want. Here’s how to remove HTML tags in Google Sheets.
What Are HTML Tags?
HTML tags are part of HyperText Markup Language (HTML) code that tell a web browser how to display the contents of those tags. For example, an HTML tag might tell your browser to display a section of text in bold or italics, or to align the text to the center, or to create a frame where content can be embedded.
These HTML tags always take the same format. They start with the name of the tag enclosed in angle brackets, such as <header>, and finish with the same tag preceded by a forward slash, such as </header>. The information that you want to extract lies between these tags. To make some text bold, for example, you would use the following: <b>Hello World!</b>.
When you copy data from a web page, as well as the values that you want to copy, you may also inadvertently copy these HTML tags too. These aren’t useful to you in your spreadsheet, so you’ll want to remove them and just leave the information that you actually want.
How to Remove HTML Tags in Google Sheets Using a Formula
You can create a formula that will look at a cell, remove all of its HTML tags, and return everything that’s left. The original cell will remain untouched, but the new cell will have all of the HTML tags removed, leaving just the data that you want. You can create this formula for one cell, and then quickly apply it to multiple cells.
To remove HTML tags using a formula in Google Sheets:
- Select the cell next to your first cell that includes HTML tags.
- Enter the following:
=REGEXREPLACE(
- Select the cell containing your HTML.
- Type a comma, and then:
"<[^<>]+>","")
- Press Enter and the HTML tags should be stripped from your data.
- To apply the formula to other cells, click and hold the drag handle in the bottom corner of the cell.
- Drag downward over the cells that you want to copy the formula to.
- Release and the formula will be copied to those cells and will strip the HTML tags from your remaining data.
This formula works by using the REGEXREPLACE function. This function allows you to replace part of a string with a different string. You provide the text to replace by selecting the cell containing the HTML tags.
You then provide a regular expression that the function will search for. In this case, the regular expression is:
<[^<>]+>
This expression looks for a string that starts with an open angle bracket, contains at least one character that is not an angle bracket, and ends with a closed angle bracket. As we saw earlier, since all HTML tags start and end with angle brackets, this regular expression will find all instances of these tags.
The final part of the formula replaces any instances of matches for the regular expression with nothing at all. Since all of the HTML tags are replaced with nothing, all that is left is the remaining text.
How to Remove HTML Tags in Google Sheets Using Find and Replace
One flaw with this method is that although it will strip the HTML tags from your data, it creates a new column of data, and still leaves you with the original tagged data as well. If you delete the original data, the new stripped values will also disappear, as the formula has nothing to work on.
You can copy and paste your data as values, but this isn’t ideal. An alternative is to use Find and Replace to strip the HTML tags from your cells and leave the result in the same cell.
To remove HTML tags using Find and Replace in Google Sheets:
- Click the Edit menu.
- Select Find and Replace.
- In the Find field, enter the same regular expression that we used in our formula:
<[^<>]+>
- Ensure that Search Using Regular Expressions is checked, as otherwise, Find and Replace will search for the exact text of our regular expression.
- Click Replace All.
- The HTML tags should disappear, leaving only contain the data that you want.
Learn More About Google Sheets
Learning how to remove HTML tags in Google Sheets ensures that you don’t have to waste hours manually editing all of your data to remove the unwanted tags. Using either of the methods above, you can quickly get rid of the tags and just leave the data that you need. The ability to use regular expressions is a real lifesaver here; it’s definitely worth learning more about RegEx, as it can save you a huge amount of time.
There are plenty of other useful Google Sheets tips and tricks you can learn. If you’re trying to track time, you can learn how to use timestamps in Google Sheets. If your formulas aren’t giving you the results you want, you can learn how to show formulas in Google Sheets so you can check them for errors. And if you want to make use of data from a different document, you can learn how to query another sheet in Google Sheets.