Step 1: Find a Competitor's Sitemap

Basically any site large enough to be worth scraping will have an XML sitemap. It should be located at [domain].com/sitemap.xml

Step 2: Copy the Full Sitemap

The easiest way to do this is a simple Ctrl+A and Ctrl+C.

Step 3: Extract the URLs

Go to https://regex101.com and paste (Ctrl+V) your sitemap into the "Test String" field. Next, paste this into the "Regular Expression" field.

Code: http[^<"'\n\r]*

If you're wondering, this searches for all strings that start with "http" and returns everything from that point until the search runs into a "<", a single or double quote, or a line break. This will match all URLs.

Step 4: Extract the Meta Tags

This is one of various free tools that will do this: http://tools.buzzstream.com/meta-tag-extractor

The tool has no cap and requires no account; it extracted 5,000 URLs in about 1-3 minutes. Scroll to the bottom of the page to download the results as a CSV. For each URL, the CSV will contain the HTML title tag, the meta description, and the meta keywords.

Step 5: Exploit the Data for Profit

At this point, you have the data. Use your imagination -- there's a lot that you can do with it: check search volume, find seed keywords, etc.

Bonus: Quickly Parse Common Data from Common Conventions/Locations

Quick tricks for parsing data in the scraped fields:

Meta Keywords

Some sites use the meta keywords field to keep track of keywords, which gives you a ton of comma-separated keywords in the CSV's "Meta Keywords" field to work with right off the bat. No parsing required.

Title Tags

Title tags are also a great place to start and frequently contain keywords verbatim. It's very common for title tags to be formatted like these:

Some Keyword Here | Non-SEO Comment Here
Buy Green Shoes | We're the #1 Company!
How to Buy a Couch | Top 3 Tricks

You can split those up into multiple cells with this spreadsheet formula:

=SPLIT(B2,"|")

URL

Keywords are often included in the URL, split by dashes.
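If you'd rather skip the manual copy-paste in Steps 2-3, the same URL extraction can be sketched in a few lines of Python. This uses the exact pattern from Step 3; the `fetch_sitemap` helper and `example.com` are just illustrative, not part of the original walkthrough.

```python
import re
import urllib.request

# Same pattern as Step 3: match from "http" up to (but not including)
# a '<', a single or double quote, or a line break.
URL_PATTERN = re.compile(r'http[^<"\'\n\r]*')

def extract_urls(xml_text):
    """Return every URL-like string found in the sitemap text."""
    return URL_PATTERN.findall(xml_text)

def fetch_sitemap(domain):
    """Download /sitemap.xml for a domain (hypothetical helper)."""
    with urllib.request.urlopen(f"https://{domain}/sitemap.xml") as resp:
        return resp.read().decode("utf-8", errors="replace")

# Demo on an inline snippet instead of a live fetch:
sample = "<url><loc>https://example.com/page-1</loc></url>"
print(extract_urls(sample))  # ['https://example.com/page-1']
```

For a real run you'd call `extract_urls(fetch_sitemap("competitor.com"))` and write the result out to a file.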
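The title-tag trick above can also be done outside a spreadsheet. Here's a small Python sketch of the same `=SPLIT(B2,"|")` idea, with whitespace trimming thrown in; the sample titles are taken from the examples above.

```python
def split_title(title):
    """Split a title tag on '|', like the SPLIT spreadsheet formula,
    and strip the leftover whitespace around each piece."""
    return [part.strip() for part in title.split("|")]

titles = [
    "Buy Green Shoes | We're the #1 Company!",
    "How to Buy a Couch | Top 3 Tricks",
]

for t in titles:
    print(split_title(t))
```

The first element of each result is usually the keyword-bearing half; the second is the non-SEO tagline.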
Those can be extracted with these formulas:

=REGEXREPLACE(REGEXEXTRACT(A2, "\.com\/(.*)"), "[^a-zA-Z0-9]", " ")

Removes the base URL and replaces all non-alphanumeric characters in the path with spaces.

=REGEXREPLACE(REGEXEXTRACT(A2, "\.com\/(.*)"), "[^a-zA-Z]", " ")

Same, but removes all non-alphabetical characters (numbers as well).

Note: If the domain you scraped is not a .com (a .net or other TLD), change the ".com" in either formula to match your TLD.
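For anyone working outside a spreadsheet, here's a rough Python equivalent of those two formulas. The `url_keywords` name and the `keep_digits` flag are my own additions; the regexes mirror the formulas above, including the ".com" caveat.

```python
import re

def url_keywords(url, keep_digits=True):
    """Mimic the spreadsheet formulas: pull the path after the TLD,
    then replace non-alphanumeric (or non-alphabetic) characters
    with spaces."""
    match = re.search(r"\.com/(.*)", url)  # change ".com" for other TLDs
    if not match:
        return ""
    path = match.group(1)
    pattern = r"[^a-zA-Z0-9]" if keep_digits else r"[^a-zA-Z]"
    return re.sub(pattern, " ", path)

print(url_keywords("https://example.com/top-3-tricks"))  # top 3 tricks
```

With `keep_digits=False` the "3" is also replaced by a space, matching the second formula's behavior.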