
[HELP] Working with Regular Expressions in Dreamweaver | Or content cleaner

Discussion in 'BlackHat Lounge' started by 67731, Oct 25, 2012.

  1. 67731

    67731 Regular Member

    Joined:
    Aug 27, 2011
    Messages:
    231
    Likes Received:
    47
    Occupation:
    SEO TECH - Looking to work for myself. . .
    Location:
    Las Vegas NV
    So I have a program that will scrape content, but the problem is it adds Google ads to it. :(

    I am thinking of just using Dreamweaver to find and replace this junk I don't need, but the regular expressions are killing ME!

    There are two kinds of things I am working on first.

    <script type="text/javascript" src="www.link-to-junk.com/junk.js"></script>

    AND

    <script type="text/javascript"><!--
    google_ad_client = "fgh-pgh-547456856856789006";
    google_ad_slot = "dgfhdfgjfgj";
    google_ad_width = 675;
    google_ad_height = 160;
    google_ui_version = 1;
    google_ad_type = "text";
    google_override_format = true;
    //-->
    </script>

    I am using this,
    <script [\w\W]*</script>

    But it keeps removing 90% of the content as well. I think this is due to the "*": it matches zero or more characters and is greedy, so it grabs as much as it can. Most of the time there is an ad at the top and bottom of the content, so one match runs from the first <script to the last </script>.
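To illustrate the problem with the pattern above, here is a minimal Python sketch comparing it with a lazy variant (`*?`), which stops at the first closing tag instead of the last. The sample string is made up for the demonstration and trimmed down from the two snippets in the post. The lazy pattern should also work in any find-and-replace tool whose regex engine supports lazy quantifiers, but that is an assumption to verify in your own tool.

```python
import re

# Hypothetical sample: an ad at the top, real content, an ad at the bottom.
sample = (
    '<script type="text/javascript" src="www.link-to-junk.com/junk.js"></script>\n'
    'Real content I want to keep.\n'
    '<script type="text/javascript"><!--\n'
    'google_ad_type = "text";\n'
    '//-->\n'
    '</script>\n'
)

# Pattern from the post: greedy, so it runs from the first <script
# to the LAST </script> and swallows everything in between.
GREEDY = re.compile(r'<script [\w\W]*</script>')

# Lazy variant: *? matches as little as possible, so each match
# ends at the nearest </script>.
LAZY = re.compile(r'<script\b[\w\W]*?</script>', re.IGNORECASE)

greedy_result = GREEDY.sub('', sample)
lazy_result = LAZY.sub('', sample)

print(repr(greedy_result))
print(repr(lazy_result))
```

With the lazy version, each script block is removed on its own and the text between the two ads survives.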

    Any pointers would be great. I want to do this with Dreamweaver, as I have somewhere around 900 .txt files to clean, with more coming.
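For a batch of this size, a small script is one alternative to a find-and-replace dialog. The sketch below is only an illustration: the function name `clean_folder`, the folder layout, and the UTF-8 encoding are all assumptions, and it overwrites files in place, so it should be tried on a copy first.

```python
import re
from pathlib import Path

# Lazy match: one <script ...> ... </script> block at a time.
SCRIPT_BLOCK = re.compile(r'<script\b[\w\W]*?</script>', re.IGNORECASE)


def clean_folder(folder, encoding='utf-8'):
    """Remove script blocks from every .txt file directly inside `folder`.

    Returns the number of files that were rewritten. Files without any
    script block are left untouched. The encoding is an assumption;
    undecodable bytes are replaced rather than raising an error.
    """
    changed = 0
    for path in sorted(Path(folder).glob('*.txt')):
        text = path.read_text(encoding=encoding, errors='replace')
        cleaned = SCRIPT_BLOCK.sub('', text)
        if cleaned != text:
            path.write_text(cleaned, encoding=encoding)
            changed += 1
    return changed
```

Calling `clean_folder('scraped')` on a hypothetical `scraped` directory would rewrite only the files that actually contained a script block.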
     
  2. 67731

    DAMN: POSTED TOO SOON!

    So I found the answer, well I found something that worked for me.

    <[^>]*>

    What this does is look for "<", then match that and everything up to the next ">", so replacing the matches with nothing will remove ALL HTML tags! :)
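A quick Python sketch of what `<[^>]*>` does in practice, using made-up sample strings. It removes the ad code in the example from the first post only because that code is wrapped in `<!-- ... -->` and contains no ">" of its own, so the whole comment looks like one long tag. A script without that comment wrapper loses its tags but leaves its JavaScript behind as plain text, and ordinary formatting tags in the content are stripped too.

```python
import re

# Pattern from the post: "<", then anything that is not ">", then ">".
TAG = re.compile(r'<[^>]*>')

# Hypothetical sample: content with markup plus a comment-wrapped ad script.
sample = (
    '<p>Keep <b>this</b> text.</p>\n'
    '<script type="text/javascript"><!--\n'
    'google_ad_width = 675;\n'
    '//-->\n'
    '</script>\n'
)
stripped = TAG.sub('', sample)

# A script WITHOUT the <!-- ... --> wrapper: only the tags are removed,
# the JavaScript source itself is left in the text.
leftover = TAG.sub('', '<script>var x = 1;</script>')

print(repr(stripped))
print(repr(leftover))
```

So this pattern is fine when the goal is plain text and the ads are comment-wrapped, but it is worth spot-checking a few cleaned files for stray JavaScript.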