Why you can not build an automated spinner/rewriter.

Discussion in 'Cloaking and Content Generators' started by madoctopus, Feb 11, 2012.

  1. madoctopus

    madoctopus Supreme Member

    Joined:
    Apr 4, 2010
    Messages:
    1,249
    Likes Received:
    3,498
    Occupation:
    Full time IM
    You can NOT build an automated spinner with just grammar checking. TBS already does grammar checking. What you need is more advanced NLP. Even so, it is still impossible to get it to work well because of word senses and context. For example, you can have:


    1. The shoelaces from my boots are dirty. - boot = shoe. It is a noun.
    2. The computer boots quite fast. - boot = start - a process a computer does at startup. It is a verb.
    3. Jack gave him the boot with a satisfied grin on his face. - boot is part of an expression - "to give the boot" - to send him away. Here it is a complement but that is useless information because it is part of an expression.
    4. The Puss in Boots had a great success. - Here it is a proper noun; it is part of the name of a movie.
    Now think of this sentence:


    Once your computer boots and you play The Puss in Boots, you realize it is not a movie about boots but about a pussy.


    Correct and makes sense. Do you think there's a chance to spin that and maintain the message correctly?


    Hence, it is impossible to accurately detect sense. You can detect the POS (part of speech), but that is not enough. Even POS tagging is not entirely accurate without custom training on a specific domain/corpus.
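
    To make the word-sense problem concrete, here is a minimal Python sketch of a naive spinner. The one-entry synonym table and the `naive_spin` function are hypothetical, made up for illustration; they are not taken from TBS or any real tool. It runs on a shortened version of the sentence above.

```python
import re

# Hypothetical one-sense-per-word synonym table, as a naive spinner might use.
SYNONYMS = {"boots": "shoes"}


def naive_spin(text, synonyms=SYNONYMS):
    """Replace every occurrence of a word, ignoring its sense and part of speech."""
    def swap(match):
        word = match.group(0)
        replacement = synonyms.get(word.lower())
        if replacement is None:
            return word
        # Preserve the capitalisation of the first letter.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


sentence = ("Once your computer boots and you play The Puss in Boots, "
            "you realize it is not a movie about boots.")
print(naive_spin(sentence))
# Once your computer shoes and you play The Puss in Shoes,
# you realize it is not a movie about shoes.
```

    Only the last replacement is acceptable. The verb and the movie title are broken, and a POS tagger alone would catch the verb at best.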

    That being said, you can use more advanced NLP to improve the results of automated spinning a bit, but you still won't take it to a readable level. I do know at least two people who managed to build automated spinners that produce very good results. However, they have limitations: they either don't produce completely readable text, or they require a certain amount of human intervention and work well only for articles with sentences structured in a certain way and no figures of speech or unusual expressions. Hence, they work well for spinning articles you write from scratch, but they are not a solution for 100% auto-generating content by spinning all the news/articles/blog posts that were created on the internet on a particular day. By the way, that would be my wet dream in terms of SEO - to be able to rewrite all content produced on the internet in a day, every day.

    So what can you do? It depends on what you want. It is quite easy to build a content generator that does not trigger flags with Google. All you have to do is:

    1. Have mostly correct grammar
    2. Have mostly common n-grams - in other words, NOT have n-grams that do not exist in real text. Take a sentence like "the car eat an apple": the n-gram "car eat an apple" has little or no chance of existing in real text because a car does not eat fruit. This is one of the reasons why Markov chains do not work well.
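
    The n-gram check in point 2 could be sketched roughly as follows in Python. The three-line `REFERENCE_TEXT` is a toy stand-in (an assumption) for a large corpus of real text, and the function names are illustrative only.

```python
def ngrams(tokens, n):
    """Return all consecutive n-token tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unseen_ngrams(sentence, known, n=3):
    """List the n-grams of a sentence that never occur in the reference set."""
    tokens = sentence.lower().split()
    return [g for g in ngrams(tokens, n) if g not in known]


# Toy reference corpus; a real check would need n-gram counts from a huge corpus.
REFERENCE_TEXT = [
    "the car needs a wash",
    "the boy eats an apple",
    "the car is fast",
]
KNOWN = {g for line in REFERENCE_TEXT for g in ngrams(line.split(), 3)}

print(unseen_ngrams("the boy eats an apple", KNOWN))  # []
print(unseen_ngrams("the car eat an apple", KNOWN))
# [('the', 'car', 'eat'), ('car', 'eat', 'an'), ('eat', 'an', 'apple')]
```

    A generated sentence with many unseen n-grams would be rejected or regenerated. With a corpus this small almost everything is unseen, so the sketch only shows the mechanics.
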
    Another thing you may want is a generator for readable and useful text. In this case you need to write a content generator that is based on massive amounts of hand-spun text. I did this personally and the results are impressive. The problem is that doing this takes months of spinning (at least if you want to be able to produce massive amounts of content). Even so, the return on investment of this approach is better than that of any other approach. The cost of an article written by hand would be in the best case about $1-2 (usually it is $5-10, but you can find very cheap and good writers if you look hard). The cost of an article built by my generator, taking into consideration the human work time put into it, comes down to $0.02/article or less.


    The other solution, for those who do not have the knowledge and skills to build a complex generator as I did, is to simply hand spin lots of articles using multi-level spinning. Frankly, considering that you don't need any particular skills (aside from spinning) this is a very doable solution.
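
    As a minimal Python sketch, assuming "multi-level spinning" here means nested spintax of the form `{a|b {c|d}}`, a hand-spun template can be expanded like this. The template text and the function name `spin` are made up for illustration.

```python
import random
import re

# Matches an innermost {a|b|c} group, i.e. one with no braces inside it.
INNERMOST = re.compile(r"\{([^{}]*)\}")


def spin(text, rng=random):
    """Resolve nested spintax from the inside out, picking one option per group."""
    while True:
        text, count = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split("|")), text
        )
        if count == 0:  # no groups left, the text is fully resolved
            return text


# Two levels: phrase-level choices nested inside a sentence-level choice.
template = "{The {quick|fast} car|A {red|blue} bike} is {cheap|affordable}."
print(spin(template))
```

    This small template already yields 8 distinct sentences. Each additional level multiplies the number of variants, which is why hand-spinning at several levels scales so well.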
     
  2. Daisysiegal

    Daisysiegal BANNED BANNED

    Joined:
    Jan 22, 2012
    Messages:
    204
    Likes Received:
    16
    Awesome Idea Octo!
     
  3. yubrew

    yubrew Junior Member

    Joined:
    Oct 9, 2008
    Messages:
    117
    Likes Received:
    48
    MadOctopus, I have a lot of respect for the in-depth analysis and thoughtful posts you have written on unique content and on the methods for measuring it and making it unique on a mass scale.

    However, it seems that members such as ExpertPeon are doing quite well for themselves without making such a complex system as you describe. As someone new to SEO I am probably missing something, but what is the point of being able to rewrite "the entire internet's content" vs producing a ton of unique and pretty readable content?
     
  4. madoctopus

    madoctopus Supreme Member

    The idea is SEO = content. You may say SEO = links, which is true, but links have to exist in content, so you need content. Hence, the more content you can produce, the more money you can make.

    I'm not saying you can't make money with less hassle, but it is more dangerous in the long run. I don't see the point in working my ass off to build 200-300 sites and 10,000 web 2.0/free blogs just so I have to start over because they got penalized or the accounts got suspended. My rule is "if you do something, do it right and make sure you only have to do it once".

    ExpertPeon - any particular post you're referring to that I should read?
     
  5. shoaibahmad9999

    shoaibahmad9999 Power Member

    Joined:
    Mar 16, 2011
    Messages:
    780
    Likes Received:
    170
    Location:
    skype
    Thanks for the post Buddy,

    But what if a person does not have any coding skills to make a content generator ?
     
  6. madoctopus

    madoctopus Supreme Member

    As I just said, you just do multi-level spinning. Or maybe the person should start learning how to code. Life isn't fair. The reason I win and my competitors lose is because I am better, I am smarter, I work harder, I have more money, I have more time, I have cheaper workers, etc. Improve in every aspect a bit and you will improve a lot overall. The whole is greater than the sum of its parts, as you may have heard.
     
  7. bk071

    bk071 Jr. Executive VIP Jr. VIP Premium Member

    Joined:
    Nov 24, 2010
    Messages:
    3,105
    Likes Received:
    7,917
    Occupation:
    I don't have a job
    Location:
    .............
    Heck yeah. There's no shortcuts to life... if you want something, you gotta get off your ass and get it.

    Nice post :)
     
  8. tobeto

    tobeto Junior Member

    Joined:
    Apr 10, 2010
    Messages:
    171
    Likes Received:
    19
    could you share your skills to build a complex generator?
     
  9. madoctopus

    madoctopus Supreme Member

    I could, but it wouldn't help you. Skills are learned, not acquired in the blink of an eye. But anyway, the basic idea is that I implemented in a programming language (PHP) all the "logic" that goes on in your head when you write an article: deductions, implications, etc., all combined with madlib spintax.
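
    The generator itself is not shown in this thread, so the following is only a guess at the general shape of "logic plus madlib spintax", written in Python rather than PHP. The `[slot]` syntax, the price rule, and the product data are all assumptions invented for the example.

```python
import random
import re

INNERMOST = re.compile(r"\{([^{}]*)\}")  # innermost {a|b} spintax group
SLOT = re.compile(r"\[(\w+)\]")          # madlib slot such as [name]


def spin(text, rng):
    """Resolve (possibly nested) spintax by picking one option per group."""
    while True:
        text, count = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split("|")), text
        )
        if count == 0:
            return text


def describe(product, rng):
    """Build one sentence: a rule picks the claim, then slots and spintax fill in."""
    # The "deduction": a hypothetical rule that derives a verdict from a fact.
    if product["price"] < 50:
        verdict = "{a bargain|easy on the wallet}"
    else:
        verdict = "{an investment|a premium choice}"
    template = "{The|This} [name] is " + verdict + " at $[price]."
    # Fill the madlib slots from the data, then resolve the spintax.
    filled = SLOT.sub(lambda m: str(product[m.group(1)]), template)
    return spin(filled, rng)


print(describe({"name": "Acme Kettle", "price": 30}, random.Random()))
```

    The rule decides what is said and the spintax decides how it is worded, so every output stays factually consistent with the input data.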
     
  10. AndrewRed

    AndrewRed Newbie

    Joined:
    Jan 7, 2012
    Messages:
    2
    Likes Received:
    0
    Similar to your method.

    I am thinking of automatically summarizing articles in a profitable niche by paraphrasing them. What would be the best way to monetize this method?

    Let's say I rent three VPSs:
    - the first one scrapes the profitable niche
    - the second one runs an NLP algorithm to produce new unique content by extraction and abstraction/paraphrasing
    - the third one manages the infrastructure, e.g. registering new domains, doing the analytics, running Prosper202, backlinking, and running article submission.

    All footprints must be removed so that each website is isolated from the others, just in case one of them is deindexed. This means I cannot use AdSense, and I cannot share the same affiliate number across sites. For example, if I create 100 websites, I will need 100 unique affiliate numbers to isolate them from one another.

    Any thoughts? How should I monetize 100 websites with 1,000 pages of unique content each? The content is human-readable.


    From the two research papers below, it seems NLP is quite promising. It just needs an implementation.
    www cs cmu edu/~nasmith/LS2/das-martins.07.pdf
    www-connex lip6 fr/~amini/RelatedWorks/ChallengesOfAutSumm_HahnMani2000.pdf
     
    Last edited: Feb 20, 2012
  11. madoctopus

    madoctopus Supreme Member

    I think one server is enough for everything. It is also easier to manage a system on just one server.

    About monetization: besides the Amazon Associates Program, all networks I know of leave footprints because they give you just one affiliate ID. You could also sell links on the sites. I sell/rent some links at $15/PR2.