[Guide] How to bulk remove undesired URLs from GSC after an attack

SirLouen

Nowadays it is pretty easy for a WordPress installation to get hacked and infected with thousands, if not millions, of spam subpages meant to be indexed. Since the attacker generally gets access to your directories, it's common for them to insert the GSC (Google Search Console) verification code and validate their own user so they can start indexing those pages like mad, just to profit from the hack as soon as possible.

After that, a classic recovery from backup is not enough to make things right. You need to take further action to restore your Google rankings (and you should probably be prepared for a drop, unless you act really, really fast).

Fortunately, GSC offers a tool to deindex URLs manually, and it works pretty well. The problem is that if we have something like 5K URLs to remove, doing this by hand can take ages. We need a tool that performs this task almost automatically.

And after a lot of research, I have found that this free Chrome extension is the key:

There are two versions: the one you can get straight from the Chrome store, which is not free, and one that can be compiled from source from GitHub and is 100% free.

1. We need to compile it, and you can't compile it natively under Windows. We can do it either on Linux or on Windows with WSL2 or another virtualization tool.

2. Now we need a CSV of the URLs we want to remove, to use with the Bulk URL removal tool. There are multiple options to retrieve this data.

Option 1: Scraping the pages that are actually indexed

We could extract the URLs from the SERPs with the site: search operator and some software to help. There are some Chrome extensions that do this job, but the simple bookmarklet by Chris Ainsworth does the task flawlessly (create a bookmark and paste this into the URL field):

https://pastebin.com/raw/Lr27v0js
(I can't paste the code here because the forum throws an error)

Just remember that you can show up to 100 results per page to make things easier.
If you need more because you have been hacked with thousands, if not millions, of indexed pages, you will need to look for other solutions. Comment and maybe we can find other tools that fulfill this task.
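
If you end up copying URLs from several result pages, the list will contain duplicates and probably some URLs from other domains. Here is a minimal Python sketch to clean it up before feeding it to the extension. It assumes you saved the scraped URLs one per line in a text file; the function names and the one-URL-per-row CSV without a header are my own assumptions, so check what the extension actually expects.

```python
import csv
import sys
from urllib.parse import urlsplit


def clean_urls(lines, host):
    """Return unique http(s) URLs whose hostname equals `host`, in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        url = line.strip()
        if not url:
            continue
        parts = urlsplit(url)
        # Exact hostname match: "www.example.com" and "example.com" are different.
        if parts.scheme not in ("http", "https") or parts.hostname != host:
            continue
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


def write_csv(urls, path):
    """Write one URL per row, no header (assumed format, adjust if needed)."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for url in urls:
            writer.writerow([url])


# Hypothetical usage: python clean_urls.py example.com scraped.txt urls.csv
if __name__ == "__main__" and len(sys.argv) == 4:
    host, src, dst = sys.argv[1:4]
    with open(src, encoding="utf-8") as fh:
        write_csv(clean_urls(fh, host), dst)
```

The host check is there so you don't accidentally submit removal requests for URLs that are not yours.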

Option 2: Using the Coverage tool in GSC

Just go to Coverage and check the Valid tab -> Indexed, not submitted in sitemap (we have to assume that, some days after you have recovered, your sitemap contains all the URLs you want indexed; otherwise, just wait until you see recent valid data)

From there you can easily export the list with the export option that GSC offers.

If you caught the attack really fast, you may find some URLs under Crawled - currently not indexed, which means they are not affecting you yet and probably won't get indexed (just check again some days later), since they will return a 404 after you recover.
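
As a safety net for the Coverage export, you can cross-check it against your sitemap so that no legitimate page slips into the removal list. This is only a rough Python sketch. It assumes the export is a CSV with a column named URL (check your file and change the `column` argument if it differs) and that your sitemap is a standard sitemaps.org urlset; the function names are mine.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return the set of <loc> values found in a sitemap XML string."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text}


def urls_to_remove(export_csv_text, keep, column="URL"):
    """Return exported URLs that are NOT in `keep`, in file order.

    `column` is the assumed header name of the URL column in the export.
    """
    reader = csv.DictReader(io.StringIO(export_csv_text))
    out = []
    for row in reader:
        url = (row.get(column) or "").strip()
        if url and url not in keep:
            out.append(url)
    return out
```

Read both files into strings, pass the sitemap through `sitemap_urls` and then call `urls_to_remove` with the export text and that set. What comes back is the list to save as the CSV for the bulk removal tool. The comparison is an exact string match, so a trailing slash or http vs https difference will count as a different URL.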

Conclusions

Just remember to keep your indexing profile as clean as possible, not only from the attack perspective but for your own site too (remove 404s, remove seldom-visited pages or put a noindex tag on them, etc.). It's important to have as few indexed pages as possible if they don't offer any value to your site, so that you give Google the maximum value with the least crawl budget required. This is a classic on-page SEO trick everyone should be taking advantage of.
 