Best way to scrape an HTML site for legitimate reasons

Discussion in 'Black Hat SEO Tools' started by MikeHunt00, Aug 31, 2012.

  1. MikeHunt00

    MikeHunt00 Registered Member

    Joined:
    Jul 19, 2012
    Messages:
    51
    Likes Received:
    15
    Hi BHW,

    I seek your wisdom once again (I hope this is the correct section).

    I am busy moving one of our old HTML sites to a new WordPress site I have designed and developed.

    The site itself is a tourism site with over 400 tours.

    Obviously, if I do this manually it will take forever, and since I am the only web dev here I simply don't have the time.

    Basically, I need to scrape the HTML site and then get the data into a spreadsheet so I can upload it to WordPress.

    The hard part is scraping the data. I know how to code, so technical solutions won't be a problem.

    I've tried a few methods already with no luck. As I mentioned, time is of the essence, so if anyone can suggest a script or a relatively cheap program for me to use, it would be appreciated.

    Thanks a lot
     
  2. schwarpitz

    schwarpitz Newbie

    Joined:
    Oct 27, 2008
    Messages:
    24
    Likes Received:
    3
    Location:
    Morocco
    Try to code your own. It's easy to do in Python.
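
    A minimal sketch of that approach using only the Python standard library. It assumes each tour page keeps its title in an <h1> and its text in a <div class="description">; both are hypothetical, so the tag checks would need adjusting to the real markup. The names TourParser, parse_tour and tours_to_csv are made up for this example.

```python
import csv
import io
from html.parser import HTMLParser


class TourParser(HTMLParser):
    """Collects the text of <h1> and <div class="description"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "description": ""}
        self._current = None  # name of the field being captured, if any

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._current = "title"
        elif tag == "div" and dict(attrs).get("class") == "description":
            self._current = "description"

    def handle_endtag(self, tag):
        # Stop capturing at the first closing </h1> or </div>.
        # Nested tags inside the description are not handled in this sketch.
        if tag in ("h1", "div"):
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] += data.strip()


def parse_tour(html):
    """Return a dict with the title and description found in one page."""
    parser = TourParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def tours_to_csv(pages):
    """Return CSV text with a header row and one row per HTML page."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "description"])
    writer.writeheader()
    for html in pages:
        writer.writerow(parse_tour(html))
    return out.getvalue()
```

    Feeding the contents of the old HTML files to tours_to_csv gives CSV text that can be saved and opened as a spreadsheet. How that file gets into WordPress depends on the import tool used.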
     
  3. neutralhatter

    neutralhatter Jr. VIP Premium Member

    Joined:
    Jun 23, 2010
    Messages:
    430
    Likes Received:
    330
    You could code it yourself or use software like ZennoPoster. Either way, you will still have to use regular expressions.
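
    As a rough illustration of the regular-expression route, here is a small Python sketch. The patterns are assumptions: it supposes the page has a <title> tag and a line such as "Price: $120.50", which may not match the real site. Regular expressions are fragile on nested HTML, so this only suits simple, regular pages.

```python
import re

# Hypothetical patterns; adjust them to the actual page markup.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
PRICE_RE = re.compile(r"Price:\s*\$?(\d+(?:\.\d{2})?)")


def extract(html):
    """Return the page title and price, or None for a field that is not found."""
    title = TITLE_RE.search(html)
    price = PRICE_RE.search(html)
    return {
        # Collapse runs of whitespace and newlines inside the title.
        "title": " ".join(title.group(1).split()) if title else None,
        "price": price.group(1) if price else None,
    }
```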
     
  4. SEOWhizz

    SEOWhizz Power Member

    Joined:
    Oct 22, 2011
    Messages:
    606
    Likes Received:
    432
    Location:
    Lat: 38N 43' 11.298" Long: 27W 12' 7.733"
    Here are a few options:

    - webextract.net can automate data extraction.
    - Software like HTTrack, Teleport Pro or Black Widow (sbl.net) can rip complete websites.
     
  5. soklot

    soklot Newbie

    Joined:
    Aug 24, 2012
    Messages:
    19
    Likes Received:
    3
    I have a similar task for this weekend and I intend to use "jsoup".
     
  6. blogbd1

    blogbd1 Jr. VIP Premium Member

    Joined:
    Apr 19, 2008
    Messages:
    551
    Likes Received:
    353
    Location:
    Undetected
    You can create a scraper in VB.NET using HttpWebRequest. That will do the job.