madoctopus
Supreme Member
- Apr 4, 2010
- 1,387
- 3,707
Hey
Just finished doing 499500 comparisons on 1000 articles spun from the same parent seed. Thought to share the data with you.
Original article word count is 415 words. Spinning was done with TBS in automatic mode: click "replace everyone's favorites" and choose a maximum of 4 synonyms (which, with the original word, gives at most 5 words in each spintax block of the seed). Subtitles and the main title were spun as well.
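For anyone who hasn't worked with spintax directly, here is a minimal sketch of how a seed gets expanded into spins. It is not how TBS does it internally; the seed sentence and the `spin` helper are made up for illustration, and the only assumption is the usual `{option|option}` spintax format.

```python
import random
import re

# Innermost {a|b|c} block: braces with no other braces inside.
SPIN_BLOCK = re.compile(r"\{([^{}]*)\}")

def spin(seed, rng):
    """Expand spintax by replacing the innermost block with one random option, until none are left."""
    while True:
        match = SPIN_BLOCK.search(seed)
        if match is None:
            return seed
        choice = rng.choice(match.group(1).split("|"))
        seed = seed[:match.start()] + choice + seed[match.end():]

# Hypothetical seed: each block holds the original word plus up to 4 synonyms.
seed = "This {article|post|piece|text|write-up} is {good|great|fine|nice|solid}."
rng = random.Random(42)
spins = [spin(seed, rng) for _ in range(5)]
```

With at most 5 options per block, every spin still shares all the unspun words with every other spin, which is worth keeping in mind when reading the numbers.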
I then created 1000 spins of the article.
Then I ran the comparisons using an algorithm similar to the one used by CopyScape. I compared each article with every other article, for a total of 499500 comparisons. If you're curious why it's not 1 million comparisons (1000*1000), it's because you only compare article N with articles N+1 to 1000, so each pair is compared once: 1000*999/2 = 499500. N-gram size for the comparison was set to 3.
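A rough sketch of what such a pairwise comparison can look like. This is not CopyScape's algorithm and not necessarily the exact metric used for this test: the uniqueness formula (share of 3-grams not common to both articles) and all function names are assumptions for illustration.

```python
from itertools import combinations

def ngrams(text, n=3):
    """Return the set of word n-grams (shingles) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def uniqueness(a, b, n=3):
    """Assumed metric: percentage of n-grams NOT shared by the two texts (1 - Jaccard)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return 100.0 * (1 - len(grams_a & grams_b) / len(union))

def compare_all(articles, n=3):
    """Compare article N only with articles N+1..end, so each pair is scored once."""
    return [uniqueness(a, b, n) for a, b in combinations(articles, 2)]

def pair_count(k):
    """Number of unordered pairs among k articles: k * (k - 1) / 2."""
    return k * (k - 1) // 2
```

`pair_count(1000)` gives the 499500 comparisons mentioned above, and `compare_all` returns one score per pair in that same order.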
Data dump (global/comparisons, distinct):
unique 0%: 0 (0%), 0 (0%)
unique 10%: 0 (0%), 0 (0%)
unique 20%: 0 (0%), 0 (0%)
unique 30%: 112 (0%), 93 (9%)
unique 40%: 183659 (37%), 902 (90%)
unique 50%: 313930 (63%), 4 (0%)
unique 60%: 1799 (0%), 0 (0%)
unique 70%: 0 (0%), 0 (0%)
unique 80%: 0 (0%), 0 (0%)
unique 90%: 0 (0%), 0 (0%)
unique 100%: 0 (0%), 0 (0%)
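A quick sanity check on the dump, using only the counts listed above. The first column adds up to the 499500 comparisons and the second column adds up to 999; the rounded percentages come out the same as in the dump. The variable names are just for this sketch.

```python
# Non-zero counts copied from the data dump, keyed by uniqueness bucket (%).
global_counts = {30: 112, 40: 183659, 50: 313930, 60: 1799}
distinct_counts = {30: 93, 40: 902, 50: 4}

def percentages(counts):
    """Return the column total and each bucket's share, rounded to a whole percent."""
    total = sum(counts.values())
    return total, {bucket: round(100 * c / total) for bucket, c in counts.items()}

global_total, global_pct = percentages(global_counts)
distinct_total, distinct_pct = percentages(distinct_counts)
```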
Bottom line:
Basically, whether you do a low number of spins or a high number of spins, uniqueness will be mostly around 40-50%. That means if you think spinning like this "only" 100 times will get you better uniqueness, you're wrong. The uniqueness will probably stay in this interval even for more spins.
How do you get better uniqueness? You spin on multiple levels (paragraph level, sentence level, phrase level) and you change the order of the paragraphs (e.g. in one spin you have the paragraphs in one order, in another spin in a different order; obviously you must write the original article so that it still makes sense when the paragraphs are reordered).
I will do another test soon with a manually spun multi-level article to see how the numbers compare.
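To make the multi-level idea concrete, a small sketch assuming nested `{a|b}` spintax: outer blocks swap whole sentences, inner blocks swap phrases and words, and the paragraphs are shuffled at the end. The seed paragraphs and function names are hypothetical, and this has not been run against the 1000-article test.

```python
import random
import re

# Innermost {a|b|c} block: braces with no other braces inside.
SPIN_BLOCK = re.compile(r"\{([^{}]*)\}")

def spin(text, rng):
    """Resolve nested spintax from the inside out, one random option per block."""
    match = SPIN_BLOCK.search(text)
    while match:
        option = rng.choice(match.group(1).split("|"))
        text = text[:match.start()] + option + text[match.end():]
        match = SPIN_BLOCK.search(text)
    return text

def spin_article(paragraphs, rng):
    """Spin each paragraph seed, then put the paragraphs in a random order."""
    spun = [spin(p, rng) for p in paragraphs]
    rng.shuffle(spun)
    return "\n\n".join(spun)

# Hypothetical seed: sentence-level alternatives containing phrase-level ones.
paragraphs = [
    "{Spinning {helps|works} {a bit|somewhat}.|{A little|Some} spinning {goes a long way|pays off}.}",
    "{Order {matters|counts} less here.|Here the order is {flexible|free}.}",
]
article = spin_article(paragraphs, random.Random(7))
```

Swapping whole sentences and moving paragraphs changes the 3-grams at every boundary, not just around single words, which is why it should break up more of the shared sequences than word-level synonyms alone.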