ScrapeBox does not see AJAX?

Can ScrapeBox scrape from AJAX websites?

  • Yes

    Votes: 1 20.0%
  • Well... No

    Votes: 4 80.0%

  • Total voters
    5

callmemaximillian

Regular Member
Joined
Feb 9, 2016
Messages
265
Reaction score
92
I looked through the forum to figure out how to handle websites that use AJAX (where you will not see the data in the source code) and found this thread:

https://www.blackhatworld.com/seo/s...ith-prstorm-mode.129096/page-919#post-9376157

Basically no. Scrapebox uses raw sockets and threads, which don't support javascript, flash and other scripts. So if a page loads via ajax for example Scrapebox can only see what you would see in a browser if you turn scripts off.



In the custom harvester you could set the proxy change interval to 1 and it would change it after every request/page. Settings >> connections timeouts and other settings >> more harvester settings >> proxy change interval.

Has anyone had experience with this? Can ScrapeBox handle AJAX websites and scrape their data? o_O
 
You could rig something up using Selenium pretty easily, although raw sockets are much, much faster.

How much data do you need to scrape?
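
For anyone wondering what "rig something up using Selenium" might look like, here is a minimal sketch. It assumes the `selenium` package and a local Chrome install are available. The function name `fetch_rendered_html`, the URL, and the CSS selector in the usage note are hypothetical placeholders, not anything specific to the sites discussed in this thread.

```python
def fetch_rendered_html(url, css_selector, timeout=15):
    """Load a page in headless Chrome and return the HTML after scripts have run.

    Hypothetical helper: css_selector should match an element that only
    appears once the AJAX content has loaded.
    """
    # Imported here so the module can be loaded even where selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the script-injected element exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source  # DOM serialized after JavaScript ran
    finally:
        driver.quit()  # always close the browser, even on timeout
```

A call such as `fetch_rendered_html("https://example.com/listing", "#results .item")` would return the rendered markup, which you can then parse like any static page. Each call starts a full browser, which is why this is so much slower than socket-based scraping.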
 
Thanks for the reply. Actually it was a test: I wanted to figure out whether ScrapeBox can handle AJAX by itself. It's just a couple of websites, so I will learn Selenium; I think it will be helpful. I also found information about Octoparse (I think it could be useful for small projects).
 
Scrapebox doesn't have the ability to scrape data that is behind JS or loaded using JS. Other than that, it can scrape anything. I don't understand why the Scrapebox team still hasn't updated this after all these years. What exactly do you want to scrape?
 
Scrapebox can't scrape content that is generated on the fly after the page has loaded. So basically, no, it can't scrape AJAX pages.
 
As noted above, Scrapebox uses raw sockets and threads. These fetch the HTML the server returns but do not execute scripts, JavaScript, AJAX, etc.

So Scrapebox sees the internet the way a browser with scripts turned off does. Many sites have no-script versions, but not all do. If the content is rendered with scripts off, then Scrapebox can see it, so to check, turn off JavaScript in your browser, load the page, and look at the source.

Note that often a lot of the content, if not all of it, is still in the source with JavaScript off, yet the browser may display a blank page even though the content is there and Scrapebox can scrape it. So look at the HTML source.
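
The "look at the HTML source" check can be sketched in a few lines of Python using only the standard library. This is only an illustration: the two sample pages, the `/api/prices` endpoint, and the helper names are made up, and the helper only looks at text in the markup itself, ignoring anything inside script or style tags.

```python
from html.parser import HTMLParser


class MarkupTextParser(HTMLParser):
    """Collects text that sits in the markup, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_in_raw_source(html, needle):
    """Return True if needle is readable without running any scripts."""
    parser = MarkupTextParser()
    parser.feed(html)
    parser.close()
    return needle in " ".join(parser.chunks)


# Hypothetical page where the server sends the data in the HTML itself.
STATIC_PAGE = '<html><body><div id="prices">Widget: $10</div></body></html>'

# Hypothetical page where the div is empty until a script fills it in.
AJAX_PAGE = (
    '<html><body><div id="prices"></div>'
    '<script>fetch("/api/prices").then(r => r.text())'
    '.then(t => document.getElementById("prices").textContent = t);'
    "</script></body></html>"
)

print(text_in_raw_source(STATIC_PAGE, "Widget: $10"))  # True
print(text_in_raw_source(AJAX_PAGE, "Widget: $10"))    # False
```

If the text you want shows up in the first kind of page, a socket-based scraper can get it. If the page looks like the second kind, the data only exists after the script runs, so you need a real browser or the underlying request the script makes.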
 