Best way to scrape an HTML site for legitimate reasons

Discussion in 'Black Hat SEO Tools' started by MikeHunt00, Aug 31, 2012.

  1. MikeHunt00

    MikeHunt00 Registered Member

    Hi BHW,

    I seek your wisdom once again (I hope this is the correct section).

    I am busy moving one of our old HTML sites to a new WordPress site I have designed and developed.

    The site itself is a tourism site with over 400 tours.

    Obviously, if I do this manually it will take forever, and since I am the only web dev here I simply don't have the time.

    Basically, I need to scrape the HTML site and then get the data into a spreadsheet so I can upload it to WordPress.

    The hard part is scraping the data. I know how to code, so technical solutions won't be a problem.

    I've tried a few methods already with no luck. As I mentioned, time is of the essence, so if anyone can suggest a script or a relatively cheap program for me to use, it would be appreciated.

    Thanks a lot
     
  2. schwarpitz

    schwarpitz Newbie

    Try to code your own. It's easy to do in Python.
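For anyone wondering what "code your own" might look like, here is a minimal sketch using only the Python standard library. It assumes, purely for illustration, that each tour page has its name in the first <h1> and its text in <p class="description"> elements; the real site's markup is not shown in this thread, so the tag names, the class name, and the helper names (TourParser, parse_tour, tours_to_csv) are all hypothetical and would need adjusting.

```python
import csv
import io
from html.parser import HTMLParser


class TourParser(HTMLParser):
    """Collects the first <h1> and every <p class="description"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description_parts = []
        self._target = None  # field that the current text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and not self.title:
            self._target = "title"
        elif tag == "p" and attrs.get("class") == "description":
            self._target = "description"

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self._target = None

    def handle_data(self, data):
        if self._target == "title":
            self.title += data
        elif self._target == "description":
            self.description_parts.append(data)


def parse_tour(html):
    """Return one spreadsheet row (a dict) for a single tour page."""
    parser = TourParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the HTML layout.
    description = " ".join(" ".join(parser.description_parts).split())
    return {"title": parser.title.strip(), "description": description}


def tours_to_csv(pages):
    """Turn an iterable of HTML strings into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "description"])
    writer.writeheader()
    for html in pages:
        writer.writerow(parse_tour(html))
    return out.getvalue()
```

Since the old site is static HTML, the pages could be read straight from disk (for example with pathlib.Path.read_text) and passed to tours_to_csv, and the resulting text saved as a .csv file that opens in any spreadsheet program.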
     
  3. neutralhatter

    neutralhatter Regular Member

    You could code it yourself or use software like ZennoPoster. Either way, you will still have to use regular expressions.
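As a rough illustration of the regular-expression part, the sketch below pulls prices out of text that has already been extracted from a page. The pattern and the currency symbols are assumptions (the thread does not say how the tour prices are written), and extract_prices is a hypothetical helper name. Regular expressions tend to suit small, regular fields like this better than whole HTML documents.

```python
import re

# Assumed price format: a currency symbol, optional space, digits with
# optional thousands separators, and optional two-digit cents.
PRICE_RE = re.compile(r"[$€£]\s?(\d[\d,]*(?:\.\d{2})?)")


def extract_prices(text):
    """Return every price found in the text as a float."""
    return [float(match.replace(",", "")) for match in PRICE_RE.findall(text)]
```

For example, extract_prices("Adults $1,250.00, children $99") would pick out the two amounts as numbers, ready to go into a spreadsheet column.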
     
  4. SEOWhizz

    SEOWhizz Power Member

    Here are a few options:

    - webextract.net can automate data extraction.
    - Software like HTTrack, Teleport Pro or Black Widow (sbl.net) can rip complete websites.
     
  5. soklot

    soklot Newbie

    I have a similar task for this weekend and I intend to use "jsoup".
     
  6. blogbd1

    blogbd1 Power Member

    You can create a scraper in VB.NET using HttpWebRequest. That will do the job.