Critique my Content Rewrite Tool

Discussion in 'BlackHat Lounge' started by scoop, May 31, 2008.

  1. scoop

    scoop Newbie

    Joined:
    Feb 10, 2008
    Messages:
    12
    Likes Received:
    2
    Hello...

    I have been working on an online content rewrite tool. It's still a work in progress.

    Can you all try it and critique it for me?

    http://www.contentrewriteengine.com

    Thanks
     
  2. parsibagan

    parsibagan Junior Member

    Joined:
    May 27, 2008
    Messages:
    117
    Likes Received:
    22
    This was my submission and the highlighted words had a change option...

    "There is this friend of mine. He swears by Excel right down to the point of writing his love letters with the same. He has heard of Word and discards the same as trash. I shall not be surprised if he too gets trashed by his girlfriend one day"

    For friend I had options `acquaintance' & `alter ego'... 2nd is not correct

    Options for `swear' are acceptable

    The options for `trash', and there are several, cannot be used without changing the context

    `Girlfriend' was treated as 2 words, and my boss would kill me if I used `baby doll alter-ego' :p

    Software can achieve only up to a certain level, but cannot think as radically as the human brain does.. still I appreciate your effort :)
     
  3. scoop

    scoop Newbie

    Joined:
    Feb 10, 2008
    Messages:
    12
    Likes Received:
    2
    Thanks for the critique. You are spot on with the limitations. It will never think as radically as a human, even though submitted articles help remove absurdities from the database.

    As for the word choices, I have configured the software to allow the original word or phrase to be the first option just in case the other options are unacceptable.

    Thanks again.

    Edit: I'll make edits based on your critique.
     
    Last edited: May 31, 2008
  4. interpro

    interpro Registered Member Premium Member

    Joined:
    Feb 27, 2008
    Messages:
    92
    Likes Received:
    210
    I did a quick check of your script and it basically does about the same as a few other scripts I've seen for 'rewriting' articles, which is to use a thesaurus for word substitution. This approach, while somewhat helpful, does require human intervention in order to produce readable output.

    One of the reasons that word substitution doesn't work in 'full automatic' is that many words have different meanings, depending upon how they're used. For example, take the word 'run'.

    Run could mean to operate, as in 'Do you know how to run this machine?' It could also mean to move quickly on foot, as in 'I'm going to run 20 laps around the gym'. It could also mean to participate in a political campaign, as in 'He's going to run for office'. As you can see, the word 'run' can mean quite different things (and it can be a noun as well as a verb), depending upon how it's used in a sentence.

    If you could make your script more intelligent, perhaps it could work on full automatic - but it would need to evaluate each word and determine the context of its usage. That way, you'd have a good chance to produce good output without manual processing.

    Another approach would be to use phrase replacement instead of just word replacement. By replacing phrases, you'd have a much better chance of maintaining the proper context of usage, as opposed to individual word replacement.
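
    A rough sketch of phrase replacement in Python, just to illustrate the idea (this is not the tool's actual code; the phrase table and the function name are made up for the example):

```python
import re

# Hypothetical phrase table; a usable one would need thousands of entries.
PHRASES = {
    "run for office": "campaign for office",
    "run this machine": "operate this machine",
    "right down to": "all the way to",
}

def replace_phrases(text, phrases=PHRASES):
    # Try longer phrases first so a short phrase never clobbers part of a longer one.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Keys are stored in lowercase, so look up the lowercased match.
    return pattern.sub(lambda m: phrases[m.group(0).lower()], text)

print(replace_phrases("He's going to run for office."))
```

    Because each phrase carries its own context ('run for office' vs. 'run this machine'), the substitute is much less likely to land in the wrong sense than a single-word swap would be.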

    Any chance of you incorporating such changes to your script?
     
  5. scoop

    scoop Newbie

    Joined:
    Feb 10, 2008
    Messages:
    12
    Likes Received:
    2
    You may be on to something. Yes, I could incorporate phrase replacement into the script. In fact, I could create a database that is strictly phrases and give users the option of individual word replacement or phrase replacement.

    If phrase replacement proves to work better than word replacement, then I would abandon the word database entirely.

    Thanks for the excellent critique.
     
  6. interpro

    interpro Registered Member Premium Member

    Joined:
    Feb 27, 2008
    Messages:
    92
    Likes Received:
    210
    I think that doing the phrase replacement is not a technically challenging task. But coming up with a phrase replacement list will take some time and effort. I suspect that in order to be a usable program, the phrases must number into the thousands, don't you think?

    Let's say that you make a list of 2 or 3 thousand phrases, along with 1 or 2 substitutes for each phrase - that's a lot of phrases! I've searched for such a list, but haven't been able to find one. I'd be willing to bet that such a list does exist somewhere.

    If you happen to find a comprehensive phrase list, please let me know!
     
  7. scoop

    scoop Newbie

    Joined:
    Feb 10, 2008
    Messages:
    12
    Likes Received:
    2
    Yep, you are correct. It will take a yeoman's effort. The word list (1600 words) took me about 45 days to construct. The phrase list will take much longer because you actually need to THINK of substitutes.

    I should have the phrase option complete by late August or early September.

    Later
     
  8. heretolearn

    heretolearn Registered Member

    Joined:
    Jul 1, 2008
    Messages:
    97
    Likes Received:
    8
    Wow, that's a really cool idea. If you could tell the script to treat a word as a noun in one place and as a verb in another, that would increase accuracy a great deal.
    [word]^noun
    [word]^verb
    [word]^my synonym list
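
    One way the tagging idea could look in Python, as a sketch only (the synonym entries and the function name are invented for illustration): the lookup is keyed by both the word and its part of speech, and the original word stays first in the list of options.

```python
# Hypothetical synonym table keyed by (word, part of speech).
SYNONYMS = {
    ("run", "noun"): ["score", "dash"],
    ("run", "verb"): ["sprint", "jog"],
}

def options(word, pos):
    # The original word is always the first option, so the user can keep it
    # when none of the suggested substitutes fits.
    return [word] + SYNONYMS.get((word.lower(), pos), [])

print(options("run", "noun"))
print(options("run", "verb"))
```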
     
  9. interpro

    interpro Registered Member Premium Member

    Joined:
    Feb 27, 2008
    Messages:
    92
    Likes Received:
    210
    Determining if a word is a noun or a verb can be a little tricky. For example, take the word 'run'. Run can be a noun, as in 'He just hit a home run!'. It could also be a verb, as in 'I'm going to run around the block'.

    And, as a verb, the word 'run' can have different meanings. For example, in the following sentences, run is a verb -

    1) He's going to run for political office.

    2) Do you know how to run this machine?

    Although 'run' is a verb in both sentences, the word has different meanings.

    In sentence number 2, run could be exchanged with the word 'operate' and that would make sense -> Do you know how to operate this machine?

    But if you used the same substitution in sentence 1, you'd have 'He's going to operate for political office'.

    As you can see, doing direct word substitution doesn't always make sense.

    In order to get the exact meaning of words in a sentence, the software would not only have to decide the 'part of speech' of each word, it would also need to determine the specific meaning based on the context of usage.

    Not an easy problem to solve.
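
    To make the 'run' example concrete, here is a very crude sketch in Python. It is only a cue-word heuristic, not real word-sense disambiguation, and the sense table and function name are assumptions made up for the illustration:

```python
# Hypothetical sense table: each sense lists cue words that tend to appear
# near it, plus the substitute to use when one of those cues is present.
SENSES = {
    "run": [
        ({"machine", "engine", "program"}, "operate"),
        ({"office", "election", "president"}, "campaign"),
    ],
}

def choose_substitute(sentence, target):
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(".,!?'\"").lower() for w in sentence.split()]
    context = set(words) - {target}
    for cues, replacement in SENSES.get(target, []):
        if cues & context:  # at least one cue word appears in the sentence
            return replacement
    return target  # no sense matched: keep the original word

print(choose_substitute("Do you know how to run this machine?", "run"))
print(choose_substitute("He's going to run for political office.", "run"))
```

    A sentence with no cue words (such as 'I'm going to run around the block') falls through and keeps the original word, which is the safe default when the context can't be determined.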
     
  10. scoop

    scoop Newbie

    Joined:
    Feb 10, 2008
    Messages:
    12
    Likes Received:
    2
    Thanks for the input. I have been busy as of late with other projects and haven't had time to address some of the ideas presented.
     
  11. interpro

    interpro Registered Member Premium Member

    Joined:
    Feb 27, 2008
    Messages:
    92
    Likes Received:
    210
    I understand your hesitation to make progress with this project. Compiling a big list of replacement phrases is not a quick or easy task. Writing the code to make everything work is the easy part - I've already done that.

    Building the phrase list is a tedious, time-consuming job that you may want to consider outsourcing.
     
  12. Jcrueger

    Jcrueger Newbie

    Joined:
    Sep 14, 2008
    Messages:
    5
    Likes Received:
    0
    Any chance of getting the source for this? I wouldn't mind helping out in any way I can, possibly with the phrase list or something along those lines.
     
  13. scoop

    scoop Newbie

    Joined:
    Feb 10, 2008
    Messages:
    12
    Likes Received:
    2
    There is no chance of getting the source for this at this time. The source is really simple, but it took a lot of researching on the web and viewing CBTs at home to create it. I am not a web/PHP programmer at all.

    When I develop version 1, I would be willing to give away the beta source.
     
  14. cokely

    cokely Registered Member

    Joined:
    Jun 17, 2008
    Messages:
    64
    Likes Received:
    4
    I tried using it but it split up too many words for me to use, for example:

    researchers: it gave me the option to change the "search" part of researchers, which is just annoying since it did that on a lot of words. I think it should only match whole words.
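
    For what it's worth, whole-word matching is usually done with regex word boundaries. A small Python sketch of the idea (not the tool's code; the function name is made up):

```python
import re

def find_whole_word(text, word):
    """Return (start, end) spans where `word` occurs as a whole word only."""
    # \b matches at a word boundary, so "search" inside "researchers" is skipped.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]

print(find_whole_word("The researchers search daily.", "search"))
```

    Here only the standalone "search" is reported; the "search" buried inside "researchers" is ignored.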
     
  15. Jcrueger

    Jcrueger Newbie

    Joined:
    Sep 14, 2008
    Messages:
    5
    Likes Received:
    0
    I'll keep an eye out for that. IMO, replacing keywords that work in the same context, mixed with key phrases, would definitely be ideal for this tool, so I wouldn't be dumping the keyword list just yet.
    Hopefully I can get back to working on the parser for this thesaurus file I have, because I would definitely like to see this project move forward.
    Thx, Jc
     
  16. scoop

    scoop Newbie

    Joined:
    Feb 10, 2008
    Messages:
    12
    Likes Received:
    2
    That's a legitimate critique and a flaw I am going to eliminate.

    Thanks.
     
  17. loreen

    loreen Jr. VIP Jr. VIP Premium Member

    Joined:
    Feb 6, 2008
    Messages:
    276
    Likes Received:
    58
    Occupation:
    reporter
    Location:
    Neverland
    If you're looking for someone to help you with the phrases, you can contact my friend Elena - ICQ 353013665. She's looking for this kind of job :)
     
  18. headspin

    headspin Regular Member

    Joined:
    Jun 3, 2008
    Messages:
    234
    Likes Received:
    140
    Just had a look at your demo. I'm thinking of making something like this myself, although I have a slightly different idea of how to go about things that I won't discuss here. A few pointers:

    1) You NEED a natural language parser. Visit hxxp://opennlp.sourceforge.net/projects.html to get some ideas on how to go about it the hard way. There is an easy way involving Google's API but it isn't as accurate. You have to reach a balance between the two (or come up with something new altogether).

    2) Thesaurus files are useful but they do tend to give weird results sometimes. A good trick if you're just spinning keywords is to determine whether other websites with similar keywords also have the suggested replacement within their keywords. If they do, it's a good bet that the replacement makes sense, and even if it doesn't make sense to a human at least a search engine will like it.

    3) I appreciate that you may have your reasons for coding this as a web app, but if it's gonna output quality articles in bulk, it will have to plunder server resources. Since investing in private hosting wouldn't really make sense at this point, I recommend coding this in a .NET language and distributing it as an installable. Alternatively, you could always leave what you have on the server and embed the more demanding code as Silverlight, so that it runs client side (which is also why I recommend .NET, so you can always change your mind between the two later).


    EDIT:

    4) About phrase replacement: What you need is an idioms dictionary. There are bound to be one or two in XML format that you can download that will suit your needs perfectly. Alternately, you can always scrape it off idioms.thefreedictionary.com.
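
    The keyword trick from point 2 can be sketched roughly like this in Python. The keyword sets, the threshold, and the function name are all made up for the example; gathering the keywords from other sites is left out:

```python
def replacement_is_plausible(original, candidate, site_keywords, min_sites=2):
    """Accept `candidate` if enough sites that use `original` also use `candidate`."""
    supporting = sum(
        1 for keywords in site_keywords
        if original in keywords and candidate in keywords
    )
    return supporting >= min_sites

# Hypothetical keyword sets collected from sites in the same niche.
sites = [
    {"cheap", "affordable", "flights"},
    {"cheap", "affordable", "hotels"},
    {"cheap", "tacky", "jewelry"},
]

print(replacement_is_plausible("cheap", "affordable", sites))
print(replacement_is_plausible("cheap", "tacky", sites))
```

    With this sample data 'affordable' passes (two sites pair it with 'cheap') and 'tacky' does not (only one site does).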
     
    Last edited: Sep 11, 2008
  19. scoop

    scoop Newbie

    Joined:
    Feb 10, 2008
    Messages:
    12
    Likes Received:
    2
    With respect to #3, I don't know the first thing about .NET, or how to create a client-side executable. That would require extensive research on my part.
     
  20. scoop

    scoop Newbie

    Joined:
    Feb 10, 2008
    Messages:
    12
    Likes Received:
    2

    Yes, I am looking for someone to help with the words and phrases. I have found a way to prevent breaking up words, but the process is tedious and exhausting.