How to scrape all posts from any website with the help of sitemap.xml or an RSS feed

Ranking Boss

Hello seniors, how are you? I hope you are all doing great. I am new on BHW and this is my first post. I need some help with web scraping and web automation, and if anyone can guide me I would be really thankful. My English is not good, so please ignore it :).


The problem is that I want to scrape all the posts of a website with the help of a sitemap or RSS feed and post them to my own site's drafts. Is it possible? For example, here is a website's post sitemap ( https://whatsappinstalling.com/post-sitemap.xml ). This sitemap has 81 posts, and I want to scrape all 81 posts and post them on my site. How can I do this? Do you know any plugin, software, script, etc. that would let me do this easily? Please guide me.


I also have the free CyberSEO Lite plugin and an Octoparse account. I don't know much about Octoparse, but I tried CyberSEO Lite, and when I provide the site's RSS feed ( https://whatsappinstalling.com/feed ) the plugin extracts only the 10 most recently added posts. How can I increase the RSS feed limit so that I can extract all 81 posts?

Please guide me if anyone knows how to extract all of a website's posts using sitemap.xml, or any way to increase the RSS feed limit. I would be really thankful.
 
Only the Pro version can do it. I mean, it can import all published posts from any WordPress site, even if its feed keeps only the 10 most recent ones.
 
Look here, friend. I believe this is exactly what you are searching for:

https://www.blackhatworld.com/seo/how-to-steal-thousands-of-wordpress-articles-all-automatically-making-100s-of-passive-income-using-this-tutorial-festinger-explains.1424441/
 
Well, like you scrape normally...

1) I would suggest Python.
2) Depending on the number of pages and/or domains you want to scrape, get a DB. I prefer NoSQL for this, like MongoDB.
3) Get a parser to read the sitemaps - there are some useful libraries, but it's mostly 5 lines of code.
4) Request the URL.
5) If blocked, try again with a proxy.
6) Parse the HTML.
7) If it's well formatted, use something like Newspaper3k, otherwise get the relevant HTML tags and do some dirty sorting.
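
For step 3, here is a minimal sketch of reading a sitemap with Python's standard library only. It assumes the sitemap follows the sitemaps.org format (a `urlset` with `url`/`loc` entries). The `extract_urls` name and the sample XML are made up for illustration and are not taken from any real site.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml):
    """Return the text of every <loc> found inside a <url> entry."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample; a real sitemap would be downloaded first.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/post-1/</loc></url>
  <url><loc>https://example.com/post-2/</loc></url>
</urlset>"""

urls = extract_urls(SAMPLE)
print(urls)
```

A sitemap index (a file that lists other sitemaps) uses `sitemap`/`loc` entries instead, so this sketch would need one more lookup for that case.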

Once it's written well, with fallbacks, multithreading and so on, a script like this should be able to scrape most of the content of most sites at a rate of a few hundred pages per second, depending on your system.
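
A rough sketch of how the fallback and multithreading parts could fit together, using only the standard library. The fetchers here are stand-ins that simulate a blocked request instead of touching the network, and all function names are hypothetical. In a real script the second fetcher would be whatever alternative route you use in step 5.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_with_fallback(url, fetchers):
    """Try each fetcher in order; return the first result, or None if all fail."""
    for fetch in fetchers:
        try:
            return fetch(url)
        except Exception:
            continue  # this fetcher failed, move on to the next one
    return None


def fetch_all(urls, fetchers, workers=8):
    """Fetch many URLs on a thread pool, keeping the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: fetch_with_fallback(u, fetchers), urls))


# Stand-in fetchers for illustration only (no network access).
def direct(url):
    if "blocked" in url:
        raise ConnectionError("simulated block")
    return "direct:" + url


def fallback(url):
    return "fallback:" + url


results = fetch_all(
    ["https://example.com/a", "https://example.com/blocked"],
    [direct, fallback],
)
print(results)
```

Threads fit here because the work is mostly waiting on the network. The actual throughput depends on the target site and your connection, so treat any pages-per-second figure as something to measure.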
 
Everybody suggests Python nowadays. Here in my country it's extremely popular among newbie programmers, because it's easy to learn even for housewives ;) The thread starter asks for a WordPress solution such as a plugin. WordPress doesn't work with Python, BASIC, Assembler, C or Pascal. It's written in PHP and works with PHP only :)
 
Everybody suggests Python nowadays. Here in my country it's extremely popular among newbie programmers, because it's easy to learn even for housewives ;) The thread starter asks for a WordPress solution such as a plugin. WordPress doesn't work with Python, BASIC, Assembler, C or Pascal. It's written in PHP and works with PHP only :)
Yeah, but if you want to scrape on a serious level, you will have to script. And WP has a database you can feed with data. WP even has a powerful API, so it actually can work with a Python script ;)
But WP is not a scraping system, and scraping with PHP sucks. Yeah, you can basically do it - but it really lacks comfort features. And it shouldn't be done on the web server either ;)
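
To illustrate the API point: WordPress ships a REST API, and a Python script can create a draft on your own site by POSTing to `/wp-json/wp/v2/posts`. The sketch below only builds the request and does not send it. It assumes the site has Application Passwords enabled (available in core since WordPress 5.6) and uses HTTP Basic auth. The site URL, user name, password and the `build_draft_request` helper are placeholders for illustration.

```python
import base64
import json
import urllib.request


def build_draft_request(site_url, username, app_password, title, content):
    """Build (but do not send) a request that would create a draft post."""
    endpoint = site_url.rstrip("/") + "/wp-json/wp/v2/posts"
    payload = json.dumps(
        {"title": title, "content": content, "status": "draft"}
    ).encode("utf-8")
    # Basic auth token built from the user name and application password.
    token = base64.b64encode(
        "{}:{}".format(username, app_password).encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )


# Placeholder values; replace with your own site and credentials.
req = build_draft_request(
    "https://example.com/", "editor", "xxxx xxxx xxxx", "Hello", "<p>Body</p>"
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it; left out on purpose.
```

Setting `status` to `draft` keeps the post unpublished, which matches the "draft box" the OP asked about.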
 
Why don't you use https://codecanyon.net/item/wordpress-automatic-plugin/1904470 ?

Does it scrape ALL existing posts from a given WordPress-powered site? How? :eek:
 
Yeah, let's be honest - PHP tends to be unnecessarily complicated for things Python handles in a line of code. It's not as comfortable when it comes to large amounts of data. Multithreading works great - but you really have to be into PHP. And DB drivers like the MongoDB driver are just not as good and tend to create bottlenecks. There are reasons that basically no one in data science uses PHP, but rather Python, Rust, JS, etc. - even though PHP is basically more performant and can do the same. I didn't want to say PHP is worse - as said, it can even be more performant since PHP 8. It's just really unhandy in comparison.
When it comes to frontend dev, I still use PHP, where JS might even be a better choice, but I hate JS. On a personal level =D

In the end, whatever does the job, does the job.
 
Yeah, let's be honest - PHP tends to be unnecessarily complicated for things Python handles in a line of code.

One line? Show it to me, please. I don't think it will be a problem to post it here if you consider yourself a Python coder. I believe you read the OP's initial post and understood his task.

In short: you have a WordPress-powered site and want to import all existing posts from yet another WordPress-powered site to your own.

So go on, shoot! I'm so excited to see your magic line of code... For real. I've been coding since 1991 - that's been my primary job for the past 30 years, but this will definitely ruin all my programming skills.
 
WP Automatic plugin is the solution for all your needs mate.
 
WP Automatic plugin is the solution for all your needs mate.

Yet another one... Once again: the plugin mentioned above has no tools for that task. So it's definitely not what the OP is asking about.

Why do you guys post your "solutions" w/o reading the initial request? Look at my post right under the OP's one. It gives exactly the correct answer.
 
Yet another one...

Why do you guys post your "solutions" w/o reading the initial request? Look at my post right under the OP's one. It gives exactly the correct answer.

You give one answer and it costs money. Do YOU actually read the thread? WP Automatic Plugin would work perfectly for what OP wants. I don't understand why you are saying it won't.

Take a look at the second reply 'by me' (right under yours). I've linked the full guide on how to copy articles from other sites. THAT is the correct answer as well.
 
You give one answer and it costs money. Do YOU actually read the thread? WP Automatic Plugin would work perfectly for what OP wants. I don't understand why you are saying it won't.

  1. I gave a solution which does exactly what was asked.
  2. I read the initial post and yes, the plugin you suggested is useless for his task. If you think otherwise, just show me a quote from that plugin's description that says it can scrape all the posts from a given WordPress site even if its feed has only the 10 most recent ones, and not all 50, 1000 or 5000 of them. Ah, you can't... So why do you argue?
  3. The plugin you suggested also costs money. Surprised? Or maybe (I hope not, of course) you suggest that he just steal someone's product?
 
In short: you have a WordPress-powered site and want to import all existing posts from yet another WordPress-powered site to your own.
I don't see any point where he said that he owns it. Why should he scrape them at all, if he owns it anyway?
I would argue "ANY website whole posts" would include ANY website and is not even limited to WP sites, and surely not to your own websites, where you can just export/import posts.
 
https://www.blackhatworld.com/seo/free-wp-automatic-free-download-v3-56-2.1424241/

I've worked with WP Automatic and it's easy to configure it to pull all articles from the blog of any WP site (it even supports pagination). Similarly, if a website didn't want to work, I inserted its sitemap into WP Automatic to scrape all posts.

And, this is my last post in this thread. I know, you want to push your own product and it's fine, but I see no more reason to engage in any argument about this. Both products work I guess and possibly it's more work to set up WP Automatic but I have first-hand experience with it and it works just fine for what OP wants.
 
I don't see any point where he said that he owns it.

He doesn't say that he owns the site he wants to scrape. He said he owns a WordPress-powered site, because he needs a plugin for WordPress. Not for Drupal and not for Joomla. But even if that were so, both of these CMSs are also written in PHP, and Python is just useless there. Can you read?
 
And, this is my last post in this thread. I know, you want to push your own product and it's fine, but I see no more reason to engage in any argument about this.

I just don't want you to suggest that someone use "nulled" software. First of all, it's dangerous for the user, and it's a crime, punishable by law.
 