
Data Scraping

Discussion in 'Cloaking and Content Generators' started by blackbob1, Jul 30, 2013.

  1. blackbob1

    blackbob1 Newbie

    Joined:
    Jul 30, 2013
    Messages:
    5
    Likes Received:
    0
    I need to create a 1000 record database.

    Is there any free software that will let me grab content (i.e. title, description, etc.) from a webpage and store it in a database, or am I going about this the wrong way?
     
  2. TZ2011

    TZ2011 Senior Member

    Joined:
    Jun 26, 2011
    Messages:
    832
    Likes Received:
    864
    Occupation:
    Cleaning servers
    The question is too broad to answer just like that.
    But in general, pretty much everything is doable with curl and regex. There are plenty of tutorials and half-finished scripts on the internet to look at, but you will need to get your hands "dirty" to make it work, since each site has its own structure and you will need to set up the regex and the specific elements to extract.
    You can start from here and see whether that's what you need. If you need something simpler, you can try the Scraper extension for Chrome; if you need something more complex, you can try Python (if you're into programming), and so on. It really depends on your task and the site you need to work on.
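    As a rough illustration of the curl-and-regex approach, here is a minimal Python sketch that fetches a page with the standard library and pulls out the title and meta description. It assumes the page has a <title> tag and a <meta name="description" ...> tag with name written before content; the patterns and the function names fetch and extract are illustrative only, and real sites will need their own patterns.

```python
import re
import urllib.request

# Illustrative patterns only; every site needs its own.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# Assumes name="description" appears before content="..." inside the tag.
DESC_RE = re.compile(
    r'<meta\s+name=["\']description["\']\s+content=["\'](.*?)["\']',
    re.IGNORECASE | re.DOTALL,
)


def fetch(url: str) -> str:
    """Download a page, roughly what `curl URL` does."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract(html: str) -> dict:
    """Pull the title and meta description out of raw HTML with regex."""
    title = TITLE_RE.search(html)
    desc = DESC_RE.search(html)
    return {
        "title": title.group(1).strip() if title else "",
        "description": desc.group(1).strip() if desc else "",
    }


if __name__ == "__main__":
    sample = ('<html><head><title> Blue Widget </title>'
              '<meta name="description" content="A sturdy blue widget."></head></html>')
    print(extract(sample))
```

    In practice you would call extract(fetch(url)) for each page; regex breaks easily when markup changes, which is why each site needs its own tuning.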

    edit: I think I misunderstood your question. Are you looking to scrape specific elements from a website, or to put elements from your site into another database? Can you be more specific?
     
    Last edited: Jul 30, 2013
  3. Web Echo

    Web Echo Regular Member

    Joined:
    Apr 5, 2012
    Messages:
    328
    Likes Received:
    125
    Location:
    Online
    You can use iMacros to extract the data and save it to a .csv file.

    Here is some sample code:

    Code:
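    VERSION BUILD=7601105 RECORDER=FX
    ' Keep going instead of stopping if a step fails
    SET !ERRORIGNORE YES
    
    ' Use the first tab and clear cache/cookies before loading the page
    TAB T=1
    CLEAR
    URL GOTO=http://www.blackhatworld.com/blackhat-seo/cloaking-content-generators/591267-data-scraping.html
    
    ' Extract the text of the page title
    TAG POS=1 TYPE=TITLE ATTR=* EXTRACT=TXT
    
    ' Save the extracted data to C:\Data.csv (FOLDER and FILE are separate parameters)
    SAVEAS TYPE=EXTRACT FOLDER=C:\ FILE=Data.csv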
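    For comparison, a minimal Python sketch of the macro's save step, assuming the title has already been extracted. The helper save_row and the file name are hypothetical; using the csv module means commas and quotes inside the scraped text are escaped properly, which a hand-built line would not guarantee.

```python
import csv
import os


def save_row(path: str, row: dict, fieldnames: list) -> None:
    """Append one record to a CSV file, writing the header only if the file is new or empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerow(row)  # csv handles quoting of commas/quotes in values
```

    Example: save_row("Data.csv", {"url": url, "title": title}, ["url", "title"]) once per scraped page.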
    
     
  4. blackbob1

    blackbob1 Newbie

    Joined:
    Jul 30, 2013
    Messages:
    5
    Likes Received:
    0
    Yeah, I'm looking to scrape very specific product data that is spread over a number of pages.
     
  5. nnsjw702

    nnsjw702 Registered Member

    Joined:
    Jan 3, 2010
    Messages:
    77
    Likes Received:
    1
    It's very tough, as every website is structured differently. You can find a lot of material about it on Google. It will take time to implement, but it's worth the effort.
     
  6. hpv222

    hpv222 Power Member

    Joined:
    Feb 8, 2010
    Messages:
    736
    Likes Received:
    274
    Well, I think you pretty much answered your own question ;) If the data is "very specific," it is unlikely that you will find a ready-made script or software, and you will have to build a custom one. Go to the programming section here and see if someone is willing to help.
     
  7. Debugger

    Debugger Junior Member

    Joined:
    Aug 16, 2009
    Messages:
    174
    Likes Received:
    34
    Location:
    India
    I was searching the net but didn't find tutorials like the one you linked. Do you have more links to tutorials on more complex stuff? I can code in Python or any other language, so if you know of any, please tell me. Thanks.
     
  8. blackbob1

    blackbob1 Newbie

    Joined:
    Jul 30, 2013
    Messages:
    5
    Likes Received:
    0

    Thanks, this looks perfect. I can get parts of it to work, but there seem to be problems with the way the URLs are formed. I'll post the specific question in the programming section.
     
  9. competent123

    competent123 Registered Member

    Joined:
    May 30, 2009
    Messages:
    55
    Likes Received:
    14
    Aren't normal scrapers good enough for that, assuming it's the usual targets (WordPress/forums)?
     
  10. blackbob1

    blackbob1 Newbie

    Joined:
    Jul 30, 2013
    Messages:
    5
    Likes Received:
    0
    No, it's products from an e-commerce site, to populate a Magento store. After looking into it, I don't think it's possible (with my limited knowledge).
     
  11. B0rman

    B0rman Newbie

    Joined:
    Apr 12, 2013
    Messages:
    5
    Likes Received:
    2
    And what would you do with that info? It sounds useless to me.
     
  12. j-pac

    j-pac Newbie

    Joined:
    Sep 6, 2011
    Messages:
    1
    Likes Received:
    0
    I used to scrape sites daily for title, description, price, etc. I developed a methodology that worked very well.

    First, you need to scrape the URLs of all the pages you want data from. As someone mentioned above, iMacros works magic here: create a macro that collects the URLs of all the pages that need to be scraped for the info you mentioned (title, description, etc.) and saves those URLs in a CSV file.
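    The same first step can be sketched in Python, assuming the shop's listing pages are numbered in the URL and that product links share a common path fragment such as "/product/" (both are hypothetical; check the real site). urljoin turns relative links like "/product/123" into full URLs, which is one common source of badly formed URLs when scraping.

```python
import csv
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def product_links(html: str, base_url: str, marker: str) -> list:
    """Return absolute links from one page whose URL contains `marker`, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    seen, links = set(), []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # "/product/1" -> "https://host/product/1"
        if marker in url and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def collect_urls(listing_template: str, pages: int, marker: str, out_path: str) -> None:
    """Step 1: walk numbered listing pages and save product URLs, one per CSV row."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for n in range(1, pages + 1):
            page_url = listing_template.format(n)
            with urllib.request.urlopen(page_url, timeout=30) as resp:
                html = resp.read().decode("utf-8", errors="replace")
            for link in product_links(html, page_url, marker):
                writer.writerow([link])
```

    Example (hypothetical shop): collect_urls("https://shop.example.com/category?page={}", 5, "/product/", "urls.csv").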

    Second, create another iMacro that loops through the CSV file, visits each URL in it, and scrapes the necessary info into another CSV file.
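    A Python sketch of the second step, assuming each product page exposes its fields through markup a regex can match. The PATTERNS below are hypothetical placeholders for one shop's layout and will not match any particular site as written; the get parameter lets you swap in a different fetcher (or cached copies of the pages) for testing.

```python
import csv
import re
import urllib.request

# Hypothetical patterns for one shop's product page; adjust per site.
PATTERNS = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "price": r'<span class="price">(.*?)</span>',
    "description": r'<div class="description">(.*?)</div>',
}


def fetch(url: str) -> str:
    """Download one page as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_page(html: str, patterns: dict) -> dict:
    """Apply each field's regex; fields that are not found come back as empty strings."""
    row = {}
    for field, pattern in patterns.items():
        m = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
        row[field] = m.group(1).strip() if m else ""
    return row


def scrape_all(urls_csv: str, out_csv: str, patterns: dict, get=fetch) -> int:
    """Step 2: loop over the URL list and write one output row per page."""
    count = 0
    with open(urls_csv, newline="", encoding="utf-8") as src, \
         open(out_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=["url", *patterns])
        writer.writeheader()
        for record in csv.reader(src):
            if not record:
                continue  # skip blank lines in the URL file
            url = record[0]
            writer.writerow({"url": url, **scrape_page(get(url), patterns)})
            count += 1
    return count
```

    Example: scrape_all("urls.csv", "products.csv", PATTERNS) returns the number of pages processed.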

    The key is that all the pages you scrape must have the same HTML layout, or your iMacro will fail.
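    One way to soften that failure mode in a script, sketched here as an assumption rather than the method described above: instead of stopping at the first page that differs, check each scraped record for its required fields and set aside the pages that look different, so you can write separate patterns for them later. The function check_layout and the field names are hypothetical.

```python
def check_layout(rows: list, required: list) -> tuple:
    """Split scraped rows into complete ones and (url, missing_fields) for suspect pages."""
    good, suspect = [], []
    for row in rows:
        # A required field that came back empty suggests the page layout differs.
        missing = [field for field in required if not row.get(field)]
        if missing:
            suspect.append((row.get("url", "?"), missing))
        else:
            good.append(row)
    return good, suspect
```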

    PM me if you need more help; I can try to help you out.