
Recognize sentences

Discussion in 'General Programming Chat' started by Pfuesch, Sep 2, 2009.

  1. Pfuesch

    Pfuesch Newbie

    Joined:
    Jul 22, 2009
    Messages:
    10
    Likes Received:
    1
     
    Last edited: Sep 2, 2009
  2. heiska

    heiska Junior Member

    Joined:
    Dec 5, 2008
    Messages:
    139
    Likes Received:
    170
    Compare the contents of each <div> or <table> tag against the search query that was used to locate the site. If a tag's text matches the query, you have found the tag that should contain the content. Also remember to strip e.g. JavaScript first, to keep Google ads and other unrelated content out of your article.

    Not a bulletproof solution but the best I could think of in a minute.
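
A rough sketch of that idea in Python, using only the standard library's `html.parser`. The names `BlockCollector` and `find_content_block` are made up for illustration, and two details are assumptions rather than part of the suggestion above: a block "matches" when it contains every query word, and when several nested blocks match, the shortest (innermost) one wins.

```python
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Collects the visible text of every <div> and <table> element."""

    CONTAINERS = {"div", "table"}
    SKIPPED = {"script", "style"}  # never treated as article text

    def __init__(self):
        super().__init__()
        self.open_blocks = []  # one text buffer per currently open container
        self.blocks = []       # text of every container that has been closed
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.CONTAINERS:
            self.open_blocks.append([])

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.CONTAINERS and self.open_blocks:
            self.blocks.append(" ".join(self.open_blocks.pop()))

    def handle_data(self, data):
        if self.skip_depth or not data.strip():
            return
        # Text belongs to every container that is currently open.
        for buffer in self.open_blocks:
            buffer.append(data.strip())


def find_content_block(html, query):
    """Return the smallest div/table text containing all query words, or None."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    terms = query.lower().split()
    matches = [b for b in parser.blocks if all(t in b.lower() for t in terms)]
    return min(matches, key=len) if matches else None
```

Called as `find_content_block(page_html, "flights Rome")`, this returns the text of the innermost block mentioning both words, and ignores anything inside `<script>` or `<style>`. Pages with badly unbalanced tags will confuse the simple stack used here.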
     
  3. Pfuesch

    Pfuesch Newbie

    Thanks for the advice, heiska!

    What I came up with yesterday:

    1. Only allow the characters a-z A-Z 0-9 , ! . ? -
    If there's any other character in it, it's not a sentence! This will filter out some correct sentences but works quite well...

    2. Check for the length and the number of spaces in it.

    3. Only grab content between <p> tags!

    The results are pretty good now...
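
The three steps above could be sketched roughly like this in Python. Everything named here (`looks_like_sentence`, `ParagraphCollector`, `extract_sentences`) is hypothetical, and several details are guesses: the space character is added to the allowed set, the thresholds `min_length=20` and `min_spaces=3` are placeholders since no numbers were given, and paragraphs are split into candidates after `.`, `!` or `?`.

```python
import re
from html.parser import HTMLParser

# Step 1: allowed characters (the space is an assumption, needed for step 2).
ALLOWED = re.compile(r"[A-Za-z0-9,!.?\- ]+")


def looks_like_sentence(text, min_length=20, min_spaces=3):
    """Steps 1 and 2: character whitelist, then length and space count."""
    text = text.strip()
    if not ALLOWED.fullmatch(text):
        return False
    return len(text) >= min_length and text.count(" ") >= min_spaces


class ParagraphCollector(HTMLParser):
    """Step 3: keep only the text found between <p> and </p>."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush()  # an unclosed <p> ends where the next one starts
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.buffer.append(data)

    def flush(self):
        text = " ".join("".join(self.buffer).split())  # collapse whitespace
        if text:
            self.paragraphs.append(text)
        self.buffer = []


def extract_sentences(html):
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up a final <p> that was never closed
    sentences = []
    for paragraph in parser.paragraphs:
        # Split after sentence-ending punctuation followed by whitespace.
        for candidate in re.split(r"(?<=[.!?])\s+", paragraph):
            if looks_like_sentence(candidate):
                sentences.append(candidate.strip())
    return sentences
```

As expected from step 1, a perfectly fine sentence with an apostrophe, colon or quotation mark is rejected, so the whitelist trades recall for cleaner output.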