Regular expressions

Discussion in 'Black Hat SEO' started by cobra, Jul 30, 2010.

  1. cobra

    cobra Registered Member

    Joined:
    Dec 6, 2007
    Messages:
    82
    Likes Received:
    75
    I'm doing some Amazon scraping, but the data I'm getting is not quite as clean as I'd like, so I'm asking whether anyone here is skilled with regular expressions. Basically, I would like to extract the selected string (.*) EXCEPT the content of these HTML tags: <script>.*</script> and <div>.*</div>. I went through the basic tutorials, but still haven't been able to come up with a regex that works.
     
  2. mazgalici

    mazgalici Supreme Member

    Joined:
    Jan 2, 2009
    Messages:
    1,489
    Likes Received:
    881
    That's so complicated....
     
  3. sionsmith

    sionsmith Registered Member

    Joined:
    Jun 8, 2009
    Messages:
    80
    Likes Received:
    9
    Occupation:
    Professional Forex Trader
    Location:
    London
    Paste or PM me the raw HTML and what you want the result to be, and I'll do it.
     
  4. heavyweight

    heavyweight Junior Member Premium Member

    Joined:
    Aug 10, 2009
    Messages:
    131
    Likes Received:
    65
    You're better off learning regular expressions than requesting a specific regex.
    They may seem difficult at first but you need to find the right tutorial.
    Open several tutorials, start reading and you will get the hang of it.
     
  5. remotedb

    remotedb Registered Member

    Joined:
    May 24, 2010
    Messages:
    92
    Likes Received:
    7
    I'd just do it in JavaScript; that's not really what regex is for. Regex is for formatting and validation, not stripping.
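
For anyone who wants the parser route without leaving PHP, here is a minimal sketch using the built-in DOM extension instead of a regex. The helper name text_without_script_and_div and the sample markup are made up for illustration, and it assumes the scraped fragment is something libxml can parse:

```php
<?php
// Hypothetical helper: returns the text of an HTML fragment
// with every <script> and <div> element (and its content) removed.
function text_without_script_and_div(string $html): string
{
    // Keep parser warnings about sloppy real-world markup out of the output.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Collect the unwanted nodes first, then detach each from its parent.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//script | //div')) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // What is left is the text of the remaining elements.
    return trim($doc->documentElement->textContent);
}

// Made-up sample input, including a nested <div>.
$sample = '<p>Title</p><script>var x = 1;</script>'
        . '<div>Ad <div>nested</div></div><span>Price: $10</span>';
echo text_without_script_and_div($sample), PHP_EOL;
```

Because the parser understands nesting, a div inside a div is removed as a unit, which is the case a simple pattern tends to get wrong.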
     
  6. risefromdeath

    risefromdeath Power Member

    Joined:
    Jul 1, 2009
    Messages:
    650
    Likes Received:
    107
    In PHP:
    PHP:
    preg_match('/<script\b[^>]*>(.*?)<\/script>/is', $html, $match); // $match[1] holds the body of the first <script> block
    Hope that helps.
    Thanks
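
Note that preg_match only captures what is inside the script tag. If the goal is to keep everything except those blocks, a rough sketch with preg_replace might look like the following. The function name strip_blocks and the sample markup are hypothetical, and it assumes the div blocks are not nested, since the lazy .*? stops at the first closing tag it finds:

```php
<?php
// Hypothetical helper: deletes whole <script>...</script> and <div>...</div> blocks.
// "i" ignores case, "s" lets "." match newlines, and "[^>]*" allows attributes on the opening tag.
function strip_blocks(string $html): string
{
    return preg_replace(
        ['/<script\b[^>]*>.*?<\/script>/is', '/<div\b[^>]*>.*?<\/div>/is'],
        '',
        $html
    );
}

// Made-up sample input; real scraped pages will be messier.
$sample = '<p>Title</p><script type="text/javascript">var x = 1;</script>'
        . '<div class="ad">Buy now</div><span>Price: $10</span>';
echo strip_blocks($sample), PHP_EOL;
```

With nested divs this leaves a stray closing tag and some of the outer content behind, so for markup like that a real HTML parser is the safer choice.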
     
  7. SEO20

    SEO20 Elite Member

    Joined:
    Mar 25, 2009
    Messages:
    2,017
    Likes Received:
    2,259
    http://regexlib.com/ is a great starting point and a source for the most commonly used regexes.