
Scrapebox 2.0 Get Image Url

Discussion in 'Black Hat SEO Tools' started by Rongreu, Jun 3, 2017.

  1. Rongreu

    Rongreu BANNED

    Joined:
    Jun 3, 2017
    Messages:
    10
    Likes Received:
    1
    Gender:
    Female
    Hello everyone. I just started using Scrapebox and have a problem I hope people can help me with. I want to get only the one primary image URL for each product URL. I use the "Grab images from harvested URL list" method, but for a product URL I get a lot of related image links. Is there any way to achieve that? I am very grateful for any help. Thank you very much.
     
  2. SpoonFeeder

    SpoonFeeder Senior Member

    Joined:
    Mar 19, 2017
    Messages:
    985
    Likes Received:
    768
    Gender:
    Male
    Occupation:
    SpoonFeeding & Babysitting the Noobs.
    Location:
    Click the link below if you're new to BHW!
    Home Page:
    Click Settings in the "Grab images from harvested URL list" window, tick the "Dont grab below" and "Dont grab above" checkboxes, and enter your primary image's resolution there. If there are multiple images at the same resolution, there's no way around it within Scrapebox.

    [IMG]

    If you don't mind, paste a single product's URL here and I'll see if there's any other way to pick only the primary image.
     
  3. Rongreu

    I am very thankful for your help. Setting an image resolution is not possible because I want to get the image URLs from a list of product links. For a specific example: I would like to get the image link "hxxp://images/ir/s7product/4WFD4_AS01.jpg" from the product link "hxxp://ams-threaded-regular-auger-heads_36814571/?selectedsku=221080&searchterm=221080". But when I use the "Grab images from harvested URL list" method, I get a lot of image links back. If you can help me, I would be very thankful. Thanks for your attention.
     
  4. SpoonFeeder

    I believe the site you're trying to scrape the image URLs from is benmeadows.com. Also, I noticed you'd like to GET the product image's URL and not download it. Right?

    If that is the case, you shouldn't be using the "Grab images from harvested URL list" option but the Link Extractor addon.

    Install the Link Extractor addon from the Addons menu.

    Open the Link Extractor addon.
    Click Settings, enter /s7product/ in the "Remove URLs not containing:" field, and click OK.

    [IMG]

    Load your product URL list and click Start.
    The addon will now extract the links to each primary product image and save them in a txt file.

    If you want to download those images, simply import the list generated by the Link Extractor addon into Scrapebox's main window and use the "Grab images from harvested URL list" option.
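
    For anyone who wants to reproduce the idea outside Scrapebox, here is a rough Python sketch of the same filter: collect every link on a page and keep only those containing /s7product/. It is not how the Link Extractor addon works internally. The names LinkCollector and extract_links_containing, the example.com host, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links_containing(html, needle):
    """Return the links in html that contain needle, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if needle in link]


# Hypothetical page fragment; a real product page will look different.
sample_html = """
<a href="/sign-in/">Sign in</a>
<img src="http://example.com/images/ir/s7product/4WFD4_AS01.jpg">
<img src="http://example.com/images/banner.png">
"""

primary_images = extract_links_containing(sample_html, "/s7product/")
print(primary_images)
```

    The filter only works as long as the primary image path contains /s7product/ and the other images on the page do not, which is the same assumption the addon setting relies on.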

    Here's a sample product links list and its product image URLs.

    [IMG]
     
  5. Rongreu

    Thank you. I'm trying to collect data from benmeadows.com, and I'm doing it with your help. I want to do it automatically and get only the product links and image links. Is there a way to do it for the whole site rather than for a specific case (as when you added "/s7product/" under "Remove URLs not containing:")? Again, I appreciate your help. Thank you very much.
     
  6. SpoonFeeder

    You'll have to find all the product links first, and you can do that by crawling the site.

    Click the "Grab/Check" button in the main window and select "Grab links by crawling a site".

    Enter "http://www.benmeadows.com/product-index/" in the site box and 5 under Level, click Start, and then have a short nap.

    Import the generated list into Scrapebox, click the "Remove/Filter" button, select "Remove url's not containing", and enter 3. Click the same button again and select "Remove duplicate URL's".

    You should now have all the product URLs on the site. Repeat the steps from post #4 to get each product image's URL.
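
    The two cleanup steps (keep only URLs containing 3, then drop duplicates) can be approximated in a few lines of Python if you ever need to do it on an exported list. This is only a sketch: keep_containing, remove_duplicates, and the example.com URLs are hypothetical and are not Scrapebox functions.

```python
def keep_containing(urls, needle):
    """Keep only the URLs that contain needle (like "Remove url's not containing")."""
    return [url for url in urls if needle in url]


def remove_duplicates(urls):
    """Drop repeated URLs while preserving the original order."""
    # dict keys are unique and keep insertion order in Python 3.7+.
    return list(dict.fromkeys(urls))


# Hypothetical crawl output; a real crawl will return far more URLs.
crawled = [
    "http://example.com/ams-auger-heads_36814571/",
    "http://example.com/sign-in/",
    "http://example.com/ams-auger-heads_36814571/",
    "http://example.com/about-us/",
]

product_urls = remove_duplicates(keep_containing(crawled, "3"))
print(product_urls)
```

    Filtering on a single digit is a blunt filter, so expect some non-product URLs to survive it.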

    That's it!
     
  7. Rongreu

    Thank you very much. Wish you a good work day
     
  8. Rongreu

    Hello. I am very grateful for your help with my problem. I have only just started using Scrapebox and am not yet clear on how it works. I use it to collect website data. I have gone through the Scrapebox 2.0 video tutorials but have not found a way to get the "Prices" from a product URL. Can Scrapebox do that? Again, thank you for all the help.
     
  9. SpoonFeeder

    You could build your own module to get the "Prices" of the product using the Custom Data Grabber under the "Grab/Check" button. You'll need to know regex to extract the prices.

    Here's a regex that picks up all the prices on the page that start with the "$" symbol.

    Code:
    \$\d+[.]?\d*
    There's a catch, though. Let's take this product for example. The product is available in four different sizes, which can be selected from the drop-down list, and the pricing is updated through Ajax, which you can't automate through the Custom Data Grabber module YET. The module is best used for scraping STATIC data that doesn't change after the page is loaded.
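
    If you want to test the pattern before putting it into a module, here is a small Python sketch. It uses the same expression written with single backslashes, which is what Python's re module expects. The find_prices helper and the sample text are made up for illustration.

```python
import re

# A "$", one or more digits, an optional ".", then any further digits.
PRICE_PATTERN = re.compile(r"\$\d+[.]?\d*")


def find_prices(text):
    """Return every substring of text that matches the price pattern."""
    return PRICE_PATTERN.findall(text)


# Hypothetical page text, not taken from a real product page.
sample = "Size A: $12.50, Size B: $7, was $1,299.00"

print(find_prices(sample))  # ['$12.50', '$7', '$1']
```

    Note that the pattern stops at a thousands separator, so "$1,299.00" is picked up as just "$1". Prices formatted that way would need a broader pattern.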
     
  10. Rongreu

    I followed the steps you gave, but the results are not just product URLs; they did not come back as expected. I tried the "Scrapebox Sitemap Scraper" addon. Filtering the product URLs is easier that way, but I'm not sure it finds all the product URLs. Is there a better solution?
     
  11. Rongreu

    You have given me a solution. I will study it. Thank you very much.
     
  12. SpoonFeeder

    Of course you'll get mixed results after crawling. You'll have to clean the list, and that's what this step is about:

    Code:
    Import the generated list into scrapebox and click "Remove/Filter" button and select "Remove url's not containing" and enter 3. Click the same button and select "Remove duplicate URL's"
    If you look closely, almost all the product URLs on that site have the number 3 in them, so we use that as a starting point to filter the product URLs out of the crawled list.

    If the result is not satisfactory, try setting the crawl level to 10 instead of 5 and see if you're able to get more product URLs.
     
  13. Rongreu

    This is the result after filtering. A large number of the URLs are not product URLs. Eliminating them is really very difficult.
    [IMG]
     


  14. SpoonFeeder

    Click the "Remove/Filter" button, select "Remove url's containing", and enter "sign-in" in the Mask field. You'll have to repeat this process several times using the URL footprints that don't look like product URLs.
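
    If repeating the filter by hand gets tedious, the same cleanup can be sketched in Python on an exported list by checking all footprints in one pass. The remove_containing function is hypothetical, and apart from "sign-in" the footprints and URLs below are made-up examples, not a list of what the site actually uses.

```python
def remove_containing(urls, footprints):
    """Drop every URL that contains at least one of the footprints."""
    return [
        url for url in urls
        if not any(footprint in url for footprint in footprints)
    ]


# Only "sign-in" comes from this thread; the other footprint is an assumption.
footprints = ["sign-in", "customer-service"]

# Hypothetical filtered list that still has non-product URLs in it.
urls = [
    "http://example.com/ams-auger-heads_36814571/",
    "http://example.com/sign-in/?return=3",
    "http://example.com/customer-service/page-3/",
]

cleaned = remove_containing(urls, footprints)
print(cleaned)
```

    You would still build the footprint list by eye, by scanning the leftover URLs for patterns that never appear in product links.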
     
  15. Rongreu

    For now that's the only method. I will do it and try to find a better solution. Thank you for helping a new person like me.
     
  16. SpoonFeeder

    Great! I don't mind helping new people. Especially when they have a beautiful girl's profile pic. Have a great weekend! :)
     
  17. Rongreu

    :D:D:D. Thank you. Wish you a happy weekend
     