Recognize sentences

Discussion in 'General Programming Chat' started by Pfuesch, Sep 2, 2009.

  1. Pfuesch

    Pfuesch Newbie

    Joined:
    Jul 22, 2009
    Messages:
    10
    Likes Received:
    1
     
    Last edited: Sep 2, 2009
  2. heiska

    heiska Junior Member

    Joined:
    Dec 5, 2008
    Messages:
    139
    Likes Received:
    170
    Compare the contents of each div/table tag against the search query used to locate the site. If a match is found, you have found your div tag, which should contain the content. Also remember to strip e.g. JavaScript (to avoid Google ads and other unrelated content in your article).

    Not a bulletproof solution but the best I could think of in a minute.
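
A rough sketch of that idea in Python, using only the standard library. The names `BlockFinder` and `find_content_block` are made up for illustration, and two things are assumptions rather than part of the advice above: a "match" means every query word occurs in the block's text, and when several nested blocks match, the shortest one is taken as the most specific. It also assumes reasonably well-nested HTML.

```python
from html.parser import HTMLParser


class BlockFinder(HTMLParser):
    """Collects the text of every <div>/<table>, ignoring <script>/<style>."""

    CONTAINERS = {"div", "table"}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.open_blocks = []  # (tag, text parts) for containers still open
        self.blocks = []       # finished (tag, text) pairs
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.CONTAINERS:
            self.open_blocks.append((tag, []))

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.CONTAINERS and self.open_blocks:
            # Assumes well-nested markup: close the most recently opened block.
            open_tag, parts = self.open_blocks.pop()
            text = " ".join(" ".join(parts).split())  # normalise whitespace
            self.blocks.append((open_tag, text))

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Text counts towards every container that is currently open.
            for _, parts in self.open_blocks:
                parts.append(data)


def find_content_block(html, query):
    """Return the text of the smallest div/table containing all query words."""
    finder = BlockFinder()
    finder.feed(html)
    finder.close()
    words = query.lower().split()
    matches = [text for _, text in finder.blocks
               if all(word in text.lower() for word in words)]
    # Assumption: the shortest matching block is the innermost, most specific one.
    return min(matches, key=len) if matches else None


page = (
    '<div id="page"><div>Menu Home About</div>'
    '<div><script>var ads = "cheap widgets";</script>'
    '<p>Cheap widgets are small tools.</p></div></div>'
)
print(find_content_block(page, "cheap widgets"))
```

The script text is never considered, so a query that only occurs inside an ad script does not produce a match.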
     
  3. Pfuesch

    Thanks for the advice, heiska!

    What I came up with yesterday:

    1. Just allow a-z A-Z 0-9 , ! . ? -
    If there's any other character in it, it's not a sentence! This will filter out some correct sentences, but it works quite well...

    2. Check the length of the text and the number of spaces in it.

    3. Only grab content between HTML <p> tags!

    The results are pretty good now...
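
For anyone who wants to try the three checks above, here is a minimal sketch in Python (standard library only). The thresholds `MIN_LENGTH`, `MAX_LENGTH` and `MIN_SPACES` are placeholder values, not the ones actually used. The names `ParagraphText`, `looks_like_sentence` and `extract_sentences` are hypothetical. Allowing the space character and splitting paragraphs after `.`, `!` or `?` are extra assumptions needed to make the example work.

```python
import re
from html.parser import HTMLParser

# Step 1: only a-z A-Z 0-9 , ! . ? - are allowed (plus spaces, an assumption).
ALLOWED = re.compile(r"[A-Za-z0-9,!.?\- ]+")

# Step 2: placeholder thresholds, tune them for your own data.
MIN_LENGTH = 20
MAX_LENGTH = 300
MIN_SPACES = 3


class ParagraphText(HTMLParser):
    """Step 3: collects only the text found between <p> and </p>."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> starts a new paragraph, even if the last one was unclosed.
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


def looks_like_sentence(text):
    """Apply the character, length and space-count checks to one candidate."""
    text = text.strip()
    return (ALLOWED.fullmatch(text) is not None
            and MIN_LENGTH <= len(text) <= MAX_LENGTH
            and text.count(" ") >= MIN_SPACES)


def extract_sentences(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    sentences = []
    for paragraph in parser.paragraphs:
        # Assumption: a sentence ends at ., ! or ? followed by whitespace.
        for candidate in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if looks_like_sentence(candidate):
                sentences.append(candidate.strip())
    return sentences


page = (
    "<div>Navigation | Home | About us | Contact</div>"
    "<p>This is a plain sentence with enough words. "
    "Price: $5 (only today) for all members!</p>"
    "<p>Too short.</p>"
)
print(extract_sentences(page))
```

As noted in step 1, the character filter is strict: a perfectly good sentence containing an apostrophe or a colon is thrown away too.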