[Tutorial] How search engines understand which content is unique

Discussion in 'Black Hat SEO' started by naskootbg, Nov 4, 2016.

  1. naskootbg

    naskootbg Power Member

    Joined:
    Nov 8, 2010
    Messages:
    652
    Likes Received:
    221
    Home Page:
    I wrote this post mostly to help people who rewrite or spin content for their backlinks and have limited Copyscape access. Most content spinner tools build articles from paragraphs taken from different articles, and very often the final result does not make sense. If we could spin/rewrite a single article so that it passes Copyscape, that would be great, correct!?

    I think the best plagiarism checker tool is Copyscape. I think it works very similarly to search engine bots, maybe even in the same way. I have thought about this a lot, and I consider its approach the only kind of algorithm that can check in depth whether content is unique.

    Why?
    After trying a lot of plagiarism checker tools (there are tons of free ones), I noticed that most of them just check sentences or parts of the content against g00gle search. Then they do simple math:
    numberUniqueSentences / totalNumberSentences * 100
    In most cases, when such tools return 90%+ uniqueness, the content passes Copyscape. But sometimes it does not! That is why these questions came to mind:
    How does the Copyscape algorithm catch the dupe content?
    Since they do it so fast (Copyscape and search engines), it must be something simple, correct?
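
    The "simple math" above can be sketched in a few lines of Python. This is only an illustration: `uniqueness_percent` and the `is_found_online` callback are hypothetical names, and the lookup is faked with a fixed set of sentences instead of a real search engine.

```python
def uniqueness_percent(sentences, is_found_online):
    """Share of sentences (in %) for which the lookup reports no match."""
    if not sentences:
        return 100.0
    unique = sum(1 for s in sentences if not is_found_online(s))
    return unique / len(sentences) * 100


# Stand-in for a real search lookup: a fixed set of "already indexed" sentences.
indexed = {"the quick brown fox jumps over the lazy dog"}
sample = [
    "the quick brown fox jumps over the lazy dog",
    "my neighbour keeps three stubborn goats",
    "nobody has written this sentence before",
    "spun text often reads strangely",
]
score = uniqueness_percent(sample, lambda s: s in indexed)
print(score)  # 75.0 (3 of 4 sentences were not found)
```

    A score like this only counts exact sentence hits, which may be why a 90%+ result does not always pass Copyscape.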

    So to answer these questions I tried to make a program that tries to find the original source of spun content (like Copyscape does). I consider the only way to do this is in 3 modules:
    1. Split the content and check each part against g00gle search (,;:!?. are the part separators).
    2. If some part returns results, compare word popularity and word count between the checked content and the content of each returned URL. If the popular words and the number of words are nearly the same, then most likely the content is spun/rewritten, so go to module 3.
    3. Calculate how many words match exactly, and if the % of matched words is higher than e.g. 20%, mark as duplicate.
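
    A rough Python sketch of the three modules, for readers who prefer code. This is not the author's program and not Copyscape's actual algorithm. All function names, the `search` callback (which stands in for a real search engine and returns the text of matching pages), and the values for `top_n`, `min_shared` and `length_tolerance` are assumptions made for illustration. Only the separators and the 20% threshold come from the post.

```python
import re
from collections import Counter

SEPARATORS = r"[,;:!?.]"  # the part separators named in the post


def split_parts(text):
    """Module 1 (first half): split the content into parts."""
    return [p.strip() for p in re.split(SEPARATORS, text) if p.strip()]


def words(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def looks_rewritten(article, page, top_n=5, min_shared=3, length_tolerance=0.2):
    """Module 2: similar most-frequent words and a similar word count."""
    a, p = words(article), words(page)
    if not a or not p:
        return False
    top_a = {w for w, _ in Counter(a).most_common(top_n)}
    top_p = {w for w, _ in Counter(p).most_common(top_n)}
    similar_length = abs(len(a) - len(p)) / max(len(a), len(p)) <= length_tolerance
    return len(top_a & top_p) >= min_shared and similar_length


def matched_percent(article, page):
    """Module 3: % of the article's words that also occur in the page."""
    a = words(article)
    if not a:
        return 0.0
    page_counts = Counter(words(page))
    matched = sum(min(n, page_counts[w]) for w, n in Counter(a).items())
    return matched / len(a) * 100


def is_duplicate(article, search, threshold=20.0):
    """Run modules 1-3; `search(part)` is a hypothetical lookup."""
    for part in split_parts(article):
        for page in search(part):
            if looks_rewritten(article, page) and matched_percent(article, page) > threshold:
                return True
    return False


# Fake "index" with one page, used instead of a real search engine.
source_page = "Cats sleep a lot. Cats love warm spots, and cats hunt at night."
fake_search = lambda part: [source_page] if part in source_page else []

spun = "Cats sleep a lot. Cats like warm places, and cats hunt at night."
print(is_duplicate(spun, fake_search))                             # True
print(is_duplicate("Completely different words here", fake_search))  # False
```

    In the spun sample only the part "Cats sleep a lot" is found by the fake search, but that single hit is enough to reach modules 2 and 3.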

    I don't think there is any other way to check this fast.

    So how can we be sure search engine algorithms and Copyscape will not detect that we spun/rewrote an article (well, Copyscape at least)?
    Just be sure each part of the content returns no results from g00gle search. You can use some plagiarism checker from the download section on the forum.

    FAQ
    Q. Why will it pass Copyscape?
    A. Copyscape will have nothing to use to move on to the next steps of the algorithm.

    Q. Well, of course if each part of the article returns no results from G00gle it is unique. Then why did you write this?
    A. Because the most common practice is to rewrite a few sentences if Copyscape returns dupe, or even to skip/remove the paragraph. Now, if you understand my post, with only free tools and very little rewriting (only the parts of the article which return results) you can produce HQ articles that pass Copyscape.
     
  2. W9go

    W9go Jr. VIP Jr. VIP Premium Member

    Joined:
    May 16, 2011
    Messages:
    4,861
    Likes Received:
    1,001
    Gender:
    Male
    Occupation:
    chasing girls
    Location:
    chasing girls
    In which "sizes" do you split the content to check?
     
  3. naskootbg

    naskootbg Power Member

    Joined:
    Nov 8, 2010
    Messages:
    652
    Likes Received:
    221
    Home Page:
    Split the article on these symbols: ,;:!?.
     
  4. webhostingproviders

    webhostingproviders Jr. VIP Jr. VIP Premium Member

    Joined:
    Aug 8, 2013
    Messages:
    1,450
    Likes Received:
    380
    Occupation:
    Internet Marketer
    Location:
    Planet Earth
    Home Page:
    can you please give some live example, I feel little confuse - you want to put the the above symbol between two paragraphs and check it on Google ?
     
  5. naskootbg

    naskootbg Power Member

    Joined:
    Nov 8, 2010
    Messages:
    652
    Likes Received:
    221
    Home Page:
    PART1: can you please give some live example
    PART2: I feel little confuse - you want to put the the above symbol between two paragraphs and check it on Google

    "can you please give some live example" returns 1 (or 2) results from g00gle

    So if "can you please give some live example" is part of the 'article to check', then Copyscape will move on to the second module (word comparison)
     
  6. Donniefred

    Donniefred Registered Member

    Joined:
    Aug 20, 2016
    Messages:
    88
    Likes Received:
    6
    Location:
    BHW
    I still don't understand
     
  7. naskootbg

    naskootbg Power Member

    Joined:
    Nov 8, 2010
    Messages:
    652
    Likes Received:
    221
    Home Page:
    Let's say our article is:

    [​IMG]
    When Copyscape or your favorite tool splits it into parts and checks them on g00gle search, the result will be similar to:
    [​IMG]

    In this example the Copyscape algorithm will not start comparing words, because there are no matching results to compare against.

    Of course the 'article to check' can be split in a different way, but my tests show that splitting it on some of these symbols: [,:!?.] works.

    What can't you understand? Please be more specific.
     
  8. KHer0

    KHer0 Supreme Member

    Joined:
    Mar 22, 2011
    Messages:
    1,340
    Likes Received:
    1,224
    Occupation:
    Architect
    Emm, I believe it's more complicated than that. The same concept, but it might be taking a number of words together.

    Cuz if it's checking sentence by sentence, then even if there are duplicate words it will pass. The only way it would be marked as duplicate is if the two sentences are 100% exact.
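
    One common way to "take a number of words together" is word shingling: compare overlapping runs of N words instead of whole sentences. The sketch below is an assumption about what this could look like, not something the thread or Copyscape confirms. `shingles`, `shingle_overlap` and the window size of 4 are illustrative choices.

```python
import re


def shingles(text, size=4):
    """Set of all overlapping word runs of length `size`."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def shingle_overlap(text_a, text_b, size=4):
    """Jaccard overlap of the two shingle sets, between 0.0 and 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


original = "the cat sat on the mat and looked at the door"
rewritten = "the cat sat on the rug and looked at the door"

# The sentences are not 100% identical, yet several 4-word runs still match.
overlap = shingle_overlap(original, rewritten)
print(original == rewritten, overlap > 0)  # False True
```

    With this kind of check, changing one word in a sentence lowers the overlap but does not bring it to zero, which fits the point that exact sentence matching alone is too easy to pass.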
     
  9. naskootbg

    naskootbg Power Member

    Joined:
    Nov 8, 2010
    Messages:
    652
    Likes Received:
    221
    Home Page:
    Maybe the way it finds the possible dupe targets is more complex, but my tests show that when each part of an article passes g00gle search, Copyscape returns unique, even if it is a single spun article.
    Another way to look up the possible duplicate targets is some LSI engine like g00gle suggest, but it would require much more time to check each variant. There would be millions of variants for each article.

    Keep in mind that I'm not claiming Copyscape uses g00gle to check - I even think they have their own spider, because creating a spider is really simple - it just requires resources.
     
  10. abhi007

    abhi007 Jr. VIP Jr. VIP

    Joined:
    Aug 31, 2010
    Messages:
    5,795
    Likes Received:
    3,917
    Location:
    Theatre of dreams :)
    OP I am confused :(
     
  11. tokstesla

    tokstesla Power Member

    Joined:
    Aug 21, 2016
    Messages:
    666
    Likes Received:
    122
    Occupation:
    .....!
    Location:
    Cloud9
    Please, can someone explain this properly?
     
  12. naskootbg

    naskootbg Power Member

    Joined:
    Nov 8, 2010
    Messages:
    652
    Likes Received:
    221
    Home Page:
  13. blogzandstuff

    blogzandstuff Elite Member

    Joined:
    Jan 1, 2015
    Messages:
    5,732
    Likes Received:
    2,656
    Occupation:
    blog creator
    Location:
    UK
    If I were you I'd invest in a spell checker first lol
     
  14. seotechlab

    seotechlab Regular Member

    Joined:
    Mar 18, 2014
    Messages:
    334
    Likes Received:
    38
    Location:
    Internet
    nice... i like it... thanks for the info...
     
  15. bloggerzone02

    bloggerzone02 Junior Member

    Joined:
    Jun 2, 2016
    Messages:
    154
    Likes Received:
    19
    Gender:
    Male
    Well, I am using CopyScape to check articles, and it's a familiar and great tool as well. ;) I feel you made it complicated by explaining CopyScape in brief. Anyway, thanks for sharing the info.
     
  16. naskootbg

    naskootbg Power Member

    Joined:
    Nov 8, 2010
    Messages:
    652
    Likes Received:
    221
    Home Page:
    As I wrote, I think Copyscape is the best.
    I wrote this because Copyscape is limited or paid.
     
  17. naskootbg

    naskootbg Power Member

    Joined:
    Nov 8, 2010
    Messages:
    652
    Likes Received:
    221
    Home Page:
    Now this will help you all understand better (even with my bad grammar :) ):
    PROOF
    This is a paragraph from ezinearticles:
    [​IMG]
    When I check it against google search I get this:
    [​IMG]
    Which means only 2 parts of the paragraph return results from g00gle search. If you check this content with Copyscape, you will see it is a duplicate.

    Now I rewrite the parts that returned results. They are actually in 1 sentence.
    You can easily see what I changed. When checked against g00gle search, it returned no results for all parts of the paragraph:
    [​IMG]
    When I checked this on free Copyscape (you can check too), it was returned as unique.
    ;)
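
    The check used in this proof can be sketched like this: split the paragraph on the separators and list only the parts that return search results. The author's own PHP tool is not shown in the thread, so this is an independent illustration. `parts_with_results`, `fake_hits` and the sample texts are all made up, and the search is faked with a single "indexed" page.

```python
import re


def parts_with_results(text, search_hits):
    """Return the parts of `text` for which the lookup reports at least one hit."""
    parts = [p.strip() for p in re.split(r"[,;:!?.]", text) if p.strip()]
    return [p for p in parts if search_hits(p) > 0]


# Stand-in for a real search engine: one page that is already "indexed".
indexed_page = "Regular exercise improves your mood and helps you sleep better at night."


def fake_hits(phrase):
    return 1 if phrase.lower() in indexed_page.lower() else 0


paragraph = (
    "Regular exercise improves your mood and helps you sleep better at night. "
    "I noticed this myself last spring, after a month of daily walks!"
)
flagged = parts_with_results(paragraph, fake_hits)
print(flagged)
# ['Regular exercise improves your mood and helps you sleep better at night']
```

    Only the first part is flagged here, so it is the only one that returns results in this made-up example.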
     
  18. blogtaufiq

    blogtaufiq Junior Member

    Joined:
    Mar 22, 2011
    Messages:
    147
    Likes Received:
    10
    Thanks for your example. Did you use the Copyscape service to check the sentences, like in the image below? I have never used Copyscape before.

    [​IMG]


    I'm using the sample on Grammarly, without editing, and it shows no plagiarism. Does that mean this ezine article is not indexed on Google?
     
    Last edited: Nov 6, 2016
  19. We Bring Rank

    We Bring Rank Jr. VIP Jr. VIP

    Joined:
    Aug 21, 2016
    Messages:
    592
    Likes Received:
    65
    Gender:
    Male
    Occupation:
    Digital Marketing Analyst
    Home Page:
    Thanks for your information. It will be very useful!
     
  20. naskootbg

    naskootbg Power Member

    Joined:
    Nov 8, 2010
    Messages:
    652
    Likes Received:
    221
    Home Page:
    This is from a self-developed PHP plagiarism checker, which can't be shared. There are free plagiarism checker tools in the download section, and I posted a link earlier in this thread.

    The ezine article is indexed - its ID is 9457999.
     
    Last edited: Nov 6, 2016