
How to scrape amaz0n.c0m product reviews?

Discussion in 'Black Hat SEO Tools' started by conjug8, Jun 26, 2009.

  1. conjug8

    conjug8 Newbie

    Joined: Dec 9, 2008
    Messages: 16
    Likes Received: 1
    Ok, I am going to ask a question that I know somebody has an answer to...

    Before I get slammed, let me say I have searched Google and I have searched BHW looking for the answer. After about 3 hours I have tried many things and still do not have what I want (need).

    I can make an RSS feed of "amaz0n product discussions", and I can make an RSS feed from specific "amaz0n reviewers".

    I created one on dapp3r.net (but I don't trust them to always be functional).

    There has to be an easy way to pull all the "product reviews" from Amaz0n, but now I think I am too close to the problem and just can't see the solution.

    Does anyone know a simple way to create an RSS feed to pull all the reviews for a specific product?

    thanks so much.

    Conjug8
     
  2. conjug8

    Anyone know how to do this?
     
  3. doczoidberg

    doczoidberg Junior Member

    Joined: Feb 10, 2008
    Messages: 173
    Likes Received: 55
    Occupation: Student
    Location: Germany
    You can write your own scraper, for example in .NET or with PHP.

    In .NET, the Html Agility Pack does a good job.

    ...I nevertheless think that Amazon has the copyright on the reviews.
     
  4. whitecat

    whitecat Registered Member

    Joined: Dec 5, 2008
    Messages: 95
    Likes Received: 26
    Check out BlackHatter rawasch's plugins. They are awesome.
     
  5. conjug8

    Thanks Doczoidberg... for the tip.

    However, I would not even know where to begin to write my own scraper.

    As for the copyright... well there is always a way around that... wink, wink.

    Thanks for responding though.

    Conjug8
     
  6. conjug8

    Whitecat... Thanks for the recommendation on the plugins... I am already using them and they are awesome...

    It was his plugins that inspired this new path I want to try.

    I just need a way to pull the "reviews" into an RSS feed.

    conjug8
     
  7. chazz

    chazz Newbie

    Joined: Jun 26, 2009
    Messages: 26
    Likes Received: 4
    Doczoidberg mentioned PHP. You have a few options for modules in PHP; I usually just use a series of explodes.

    You need to find where you want to start and where you want to end. That's common to all scraping (VB, PHP, etc.).

    To scrape the title from the sample code below, you could do:

    PHP:
    $title = explode("<title>", $code);      // $code = the page code
    $title = explode("</title>", $title[1]); // keep what follows <title>, split at </title>
    $title = $title[0];                      // the text between the two tags
    Sample Code:
    Code:
    <html>
    <head>
    <title>Scrape Me</title>
    </head>
    <body>Demo</body>
    </html>
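
    The start/end idea above can be wrapped into a small reusable function. This is only a sketch: the helper name text_between is made up for illustration, and it assumes the markers appear literally in the page code. Unlike the bare explodes, it returns null when a marker is missing instead of raising an undefined-index notice.

```php
<?php
// Hypothetical helper (not from the thread): returns the text between the
// first $start marker and the next $end marker, or null if either is missing.
function text_between(string $code, string $start, string $end): ?string
{
    // Split once at the start marker; a single piece means it was not found.
    $parts = explode($start, $code, 2);
    if (count($parts) < 2) {
        return null;
    }
    // Split the remainder once at the end marker.
    $parts = explode($end, $parts[1], 2);
    if (count($parts) < 2) {
        return null;
    }
    return $parts[0];
}

$code = "<html><head><title>Scrape Me</title></head><body>Demo</body></html>";
echo text_between($code, "<title>", "</title>"), "\n"; // prints "Scrape Me"
```

    The same call with "<body>" and "</body>" would pull out the body text, so one helper covers every start/end pair you need.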
     
  8. biffbangpow

    biffbangpow Newbie

    Joined: Apr 18, 2009
    Messages: 13
    Likes Received: 8
    Get an AWS (Amazon Web Services) account. That gives you direct access through their API to their product database including all the other stuff that appears on product pages such as images and product reviews.

    They provide sample code for doing API access in their developer area - there's probably something you can use and modify from there to get the product reviews.

    Anything BH you're intending to do with the data will probably be against their TOS, and they can of course track your usage of their API since you need an account API key to use it. May lead to a ban of your AWS account, but I shouldn't think it's too difficult to get another one with Amazon. AFAIK the AWS and Associate/Affiliate accounts are separate, so getting banned on one wouldn't mean the other getting banned too - but maybe it would.
     
  9. Grizzy

    Grizzy Senior Member

    Joined: Nov 11, 2008
    Messages: 919
    Likes Received: 999
    This is an example REST request that will output reviews for a particular item:
    Code:
    http://webservices.amazon.com/onca/xml?Service=AWSECommerceService&AWSAccessKeyId=[AWS KEY GOES HERE]&Operation=ItemLookup&ItemId=[ITEM ASIN GOES HERE]&ResponseGroup=Reviews
    
    Now it's just a matter of structuring that data into an RSS feed, which is pretty easy stuff if you know a little PHP.
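
    As a rough sketch of that restructuring step, the function below turns a reviews XML document into an RSS 2.0 feed using PHP's DOM extension. The element names Review, Summary and Content, the sample XML, and the function name reviews_to_rss are all assumptions made for illustration; check them against the XML your own request returns.

```php
<?php
// Sketch: convert a reviews XML response into an RSS 2.0 feed.
// ASSUMPTION: each review is a <Review> element with <Summary> and <Content>
// children. Adjust the names to match the real response.
function reviews_to_rss(string $responseXml, string $feedTitle, string $feedLink): string
{
    $in = new DOMDocument();
    $in->loadXML($responseXml);

    $out = new DOMDocument('1.0', 'UTF-8');
    $out->formatOutput = true;
    $rss = $out->createElement('rss');
    $rss->setAttribute('version', '2.0');
    $out->appendChild($rss);
    $channel = $out->createElement('channel');
    $rss->appendChild($channel);

    // Appends <name>text</name>; text nodes are escaped when serialized.
    $add = function (DOMElement $parent, string $name, string $text) use ($out) {
        $el = $out->createElement($name);
        $el->appendChild($out->createTextNode($text));
        $parent->appendChild($el);
    };

    $add($channel, 'title', $feedTitle);
    $add($channel, 'link', $feedLink);
    $add($channel, 'description', $feedTitle);

    foreach ($in->getElementsByTagName('Review') as $review) {
        // Text of the first child element called $name, or '' if absent.
        $field = function (string $name) use ($review): string {
            $nodes = $review->getElementsByTagName($name);
            return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
        };
        $item = $out->createElement('item');
        $channel->appendChild($item);
        $add($item, 'title', $field('Summary'));
        $add($item, 'description', $field('Content'));
    }

    return $out->saveXML();
}

// Made-up sample data standing in for a real response.
$sample = <<<XML
<ItemLookupResponse>
  <Items><Item><CustomerReviews>
    <Review><Summary>Great &amp; cheap</Summary><Content>Works as described.</Content></Review>
    <Review><Summary>Not for me</Summary><Content>Stopped working after a week.</Content></Review>
  </CustomerReviews></Item></Items>
</ItemLookupResponse>
XML;

echo reviews_to_rss($sample, 'Reviews for one product', 'http://example.com/');
```

    Building the feed through DOMDocument rather than string concatenation means characters such as & and < in review text are escaped automatically.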

    GL
     
  10. conjug8

    Chazz... Thanks for the input, however the php route is a road I don't really want to follow on this journey.

    biffbangpow... got the account, and I am looking around inside (it is overwhelming what you can do). I hope to find a solution.

    Grizzy... Thanks for the REST request format... it is exactly what I am looking for. I plugged in an AWS ID and the Product ID and it retrieved the results... only one problem: it only pulls the first five reviews... would you know what the request is to pull more?

    Thanks to all who have participated in this thread... I am almost there.

    Conjug8
     
  11. Grizzy

    Hey you're welcome conjug8! :)

    Hm, you know I haven't worked with reviews for a while, but my guess is that if only 5 results are being returned (and you know there should be more), there should also be a <TotalPages> tag towards the top of the document. If so, you can append "&ItemPage=[PAGE #]" to the end of your REST query to pull a particular page number.

    So what you can do is pull the "TotalPages" value and then loop through all of the pages. In PHP it would look something like this:
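
    A minimal sketch of that loop (an illustration, not the poster's own code). It takes the <TotalPages> tag and the ItemPage parameter from the guess above without verifying them against the API documentation. The function names and the $fetch callable are hypothetical; $fetch is anything that takes a URL and returns the response body, and it is stubbed here so the example runs without a network call.

```php
<?php
// Builds the REST request from the thread, plus a page number.
// ASSUMPTION: "ItemPage" selects the page, as guessed in the post above.
function review_page_url(string $awsKey, string $asin, int $page = 1): string
{
    $params = [
        'Service'        => 'AWSECommerceService',
        'AWSAccessKeyId' => $awsKey,
        'Operation'      => 'ItemLookup',
        'ItemId'         => $asin,
        'ResponseGroup'  => 'Reviews',
        'ItemPage'       => $page,
    ];
    return 'http://webservices.amazon.com/onca/xml?' . http_build_query($params, '', '&');
}

// Reads the page count from the response.
// ASSUMPTION: it appears as <TotalPages>N</TotalPages>; pass another tag name if not.
function total_pages(string $xml, string $tag = 'TotalPages'): int
{
    if (preg_match('~<' . preg_quote($tag, '~') . '>\s*(\d+)\s*</~', $xml, $m)) {
        return (int) $m[1];
    }
    return 1; // tag missing: treat the response as a single page
}

// Fetches page 1, reads the page count, then fetches the remaining pages.
function fetch_all_pages(callable $fetch, string $awsKey, string $asin): array
{
    $pages = [$fetch(review_page_url($awsKey, $asin, 1))];
    $total = total_pages($pages[0]);
    for ($page = 2; $page <= $total; $page++) {
        $pages[] = $fetch(review_page_url($awsKey, $asin, $page));
    }
    return $pages;
}

// Stub fetcher returning a canned response that claims three pages.
$requested = [];
$fakeFetch = function (string $url) use (&$requested): string {
    $requested[] = $url;
    return '<Response><TotalPages>3</TotalPages></Response>';
};

$pages = fetch_all_pages($fakeFetch, 'YOUR_AWS_KEY', 'YOUR_ASIN');
echo count($pages), " pages fetched\n"; // prints "3 pages fetched"
```

    To run it against the live service you would pass a real fetcher (for example 'file_get_contents') in place of $fakeFetch.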
    Let me know if you catch my drift.

    GL man!