Scraping Amazon, Best Tunnel to Start?

Discussion in 'Affiliate Programs' started by yeasin33, May 7, 2012.

  1. yeasin33

    yeasin33 Junior Member

    Joined:
    Mar 1, 2012
    Messages:
    114
    Likes Received:
    16
    Occupation:
    Student, Web Developer
    Location:
    1600, Amphitheatre Parkway Mountain View, CA 94043
    For the last 24+ hours I have been trying to understand the HTML structure of Amazon, but in vain.
    Main target: collect as many ASINs as I can, along with their title, their category, and the largest possible image.

    Problems:
    1. There is no tunnel (entry point) that leads me to more and more ASINs.
    2. The HTML structure is different for different categories.
    3. The API is too poor, and I would have to be Zend certified to understand it. They no longer link API requests, as the conversion is not as expected.
    4. Images are always kept in the shadows because they don't want images to be copied. Finding the largest possible image manually is easy, but I can't rely on doing it programmatically.

    Is there any way to reach the maximum number of ASINs
    and get the title, category, and largest image without scraping?
     
  2. Chris Devon

    Chris Devon Power Member

    Joined:
    Jul 2, 2008
    Messages:
    507
    Likes Received:
    192
    If you are willing to pay for it, I have a gr8 Russian programmer that I can recommend.
     
  3. yeasin33

    yeasin33 Junior Member

    Thanks, but I need a better option. :)
     
  4. Chris Devon

    Chris Devon Power Member

    Then I can only suggest that you use the simplehtmldom class to simplify scraping, so you don't need to deal with regex.
    And definitely use the rollingcurl class so you get something like threading to speed things up, since PHP doesn't have native threading.
    Oh, and maybe ActiveMQ (Windows) or Gearman (Linux) if you want to implement a queue system.
    That's the best advice I can give you.
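
A minimal sketch of the "something like threading" idea above, written in Python rather than PHP, so it is an analogue of what rollingcurl does and not that library's API. The names `fetch` and `fetch_all` and the `max_workers` default are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetcher=fetch, max_workers=5):
    """Fetch many URLs concurrently; return {url: body, or None on failure}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every URL up front; the pool runs max_workers at a time.
        futures = {pool.submit(fetcher, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # One failed request should not stop the whole batch.
                results[url] = None
    return results
```

Passing the downloader in as `fetcher` keeps the pool logic separate from the HTTP details, so a proxy-aware or cached downloader could be swapped in without touching `fetch_all`.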
     
  5. Scripteen

    Scripteen Elite Member

    Joined:
    Sep 19, 2009
    Messages:
    1,811
    Likes Received:
    1,918
    Totally agree with this. simplehtmldom+RollingCurl are god's gift for fast scraping with php.
     
  6. raylless

    raylless Newbie

    Joined:
    May 16, 2012
    Messages:
    1
    Likes Received:
    2
    No, I think the Amazon Product Advertising API can do all of this.
    Just google it. I could help you if you need.
    Or search for "Aba:auto build amazon"; that site is also what you want.
     
  7. yeasin33

    yeasin33 Junior Member

    Ooh dude. That sucks. The Product Advertising API is extremely limited, and that limit only comes into play once I already know the ASINs. So where do I find the ASINs? Again I have to fall back on scraping.
     
  8. HealeyV3

    HealeyV3 Power Member

    Joined:
    Mar 4, 2009
    Messages:
    521
    Likes Received:
    344
    Hi everyone,

    I'm an avid Amazon affiliate (I run CustomAzon.com) and a PHP programmer, and I can tell you a couple of things from experience:

    1. There is no way to crawl Amazon's website itself and get back more than 50% of their products.
    2. The API limits you to 2,000 requests per hour. Amazon has roughly 80,000,000 products. If you were running a single-threaded API scraper that stays within their limits, at one product per request you'd be looking at about 1,667 days (roughly four and a half years) to scrape 80 million products.
    3. The API is hard. Unless you are an experienced programmer, I'm going to tell you right now you won't be able to use it.
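
The estimate in point 2 can be checked with a few lines of Python. The two constants are the figures quoted in the post; the `items_per_request` parameter is a hypothetical knob added for illustration and assumes one product per request by default.

```python
PRODUCTS = 80_000_000        # rough catalogue size quoted in the post
REQUESTS_PER_HOUR = 2_000    # API limit quoted in the post


def days_to_crawl(products, requests_per_hour, items_per_request=1):
    """Days needed to request every product once at the given rate."""
    requests_needed = products / items_per_request
    hours = requests_needed / requests_per_hour
    return hours / 24


print(round(days_to_crawl(PRODUCTS, REQUESTS_PER_HOUR)))  # 1667
```

That is 40,000 hours of requests, or a little over four and a half years, before any product is fetched a second time to refresh its price.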

    Now, if you are SERIOUS about this and want to spend some time and/or money on it, you could do it, but it'll require a lot of work.

    You'll obviously need reliable proxies and a database. You'll also need someone who can parse information from Amazon's actual site, and you'll have to use the proxies to scrape product details. I've scraped Amazon.com before, and as long as you are using some proxies and a reasonable thread rate based on the number of proxies you have, you shouldn't have an issue. With proxies, a DB, and a scraper you can easily scrape the information you want.
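
One way to read "a reasonable thread rate based on the number of proxies" is to rotate through the proxies and space requests so that each proxy rests for a fixed time between uses. The sketch below is only that interpretation: `proxy_schedule`, `crawl`, the `fetcher(url, proxy)` callable, and the 2-second default are all hypothetical, not something the post specifies.

```python
import itertools
import time


def proxy_schedule(urls, proxies, per_proxy_delay=2.0):
    """Yield (url, proxy, wait_seconds) with proxies used in rotation."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    # With N proxies in rotation, pausing delay/N between requests gives
    # each individual proxy roughly per_proxy_delay seconds between uses.
    wait = per_proxy_delay / len(proxies)
    for url, proxy in zip(urls, itertools.cycle(proxies)):
        yield url, proxy, wait


def crawl(urls, proxies, fetcher, per_proxy_delay=2.0, sleep=time.sleep):
    """Fetch each URL through the next proxy in rotation, pausing between requests."""
    pages = {}
    for url, proxy, wait in proxy_schedule(urls, proxies, per_proxy_delay):
        pages[url] = fetcher(url, proxy)  # fetcher is supplied by the caller
        sleep(wait)
    return pages
```

Doubling the number of proxies halves the pause between requests while leaving the load on any single proxy unchanged, which is the scaling the post describes.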

    The HARD part is always getting the ASINs. The only way I know of to get them is to download gigs and gigs of sitemap data directly from Amazon and parse that data into ASINs. It's all in XML, so the parsing by itself is a major complication for those who don't know a lot of programming.
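
A minimal sketch of the sitemap step, assuming the files follow the standard sitemap layout (URLs inside `<loc>` elements) and that product URLs carry the ASIN as a 10-character code after `/dp/`. Both are assumptions; the post does not describe the file layout, and `asins_from_sitemap` and the example.com URLs are made up for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumption: the ASIN is the 10-character code that follows "/dp/".
ASIN_IN_URL = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def asins_from_sitemap(source):
    """Stream a sitemap (path or binary file object) and return the ASINs found."""
    asins = set()
    # iterparse reads incrementally, so a very large file is never
    # loaded as one string.
    for _event, element in ET.iterparse(source, events=("end",)):
        # Tags look like "{namespace}loc"; compare the local name only.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            match = ASIN_IN_URL.search(element.text.strip())
            if match:
                asins.add(match.group(1))
        element.clear()  # drop the element's contents once processed
    return asins


sample = io.BytesIO(
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://www.example.com/Some-Title/dp/B000000001</loc></url>"
    b"<url><loc>https://www.example.com/help/page</loc></url>"
    b"</urlset>"
)
print(asins_from_sitemap(sample))  # {'B000000001'}
```

Collecting into a set removes duplicates when the same product appears under several URLs.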

    If you manage to get past all the road blocks and end up with a Database of Amazon products, it's kind of useless unless you can keep the products up to date. If you have a product with an outdated price on your website that links to Amazon.com, Amazon will ban your affiliate account. This is strictly against their TOS. All pricing must be up to date.

    All said, unless you are REALLY serious about doing what you are talking about, and unless it can make some SERIOUS money, I wouldn't go the route of compiling an Amazon DB.

    If you ARE really serious, and have a budget, I can help :)

    Cheers!
    -HealeyV3
    www.CustomAzon.com
     
  9. maximviper

    maximviper BANNED

    Joined:
    Oct 25, 2010
    Messages:
    338
    Likes Received:
    86
    Plus one for simplehtmldom+RollingCurl.