
The Foreign Language Copycat Catcher

Discussion in 'Black Hat SEO' started by the-kashish, Sep 18, 2013.


    the-kashish Junior Member

    Nov 15, 2011
    Bay Area
    A Spanish university has come up with a technique for detecting translated and rewritten text.
    I guess it should be no surprise that Google can detect rewritten/spun text too, unless it's spun so hard using Markov chains that it's totally unreadable and makes no sense.

    "LAZY students take note: lifting an article off the internet, translating it into another language and presenting it as your own work won't necessarily go unnoticed.
    It used to be really tough to spot this kind of plagiarism, thanks to creativity on the part of online translators. Not any more.

    A team led by Alberto Barron-Cedeno at the Polytechnic University of Catalonia, Spain, used a number of statistical methods to analyse suspicious-looking documents.
    One involved breaking each text down into fragments that were five sentences long and looking for elements of words that were similar in two languages.

    Another method used a bilingual dictionary to automatically check how many words in each text were the same.
    The documents could also be translated into a language with a common root to make the analysis easier.

    The results surprised even them: their technique showed "remarkable performance" not only in identifying entire documents that had been copied, but in spotting tracts that made use of excessive paraphrasing, too (Knowledge Based Systems, doi.org/nqc). If a document is flagged by the system as being similar to another, then human experts can take a closer look."

    Source: NewScientist.com
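    Here's a rough sketch of how the article's fragment method could work: break each text into five-sentence fragments and compare "elements of words" across languages. This is my own guess, not the researchers' code. I'm assuming "elements of words" means character trigrams (which overlap a lot between related languages like Spanish and English), that sentences can be split naively on . ! ?, and that fragments are compared with cosine similarity. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter (assumption): break after . ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def fragments(text, size=5):
    # Non-overlapping windows of `size` sentences (the article mentions five).
    sents = split_sentences(text)
    return [" ".join(sents[i:i + size]) for i in range(0, len(sents), size)]

def char_ngrams(text, n=3):
    # Lowercase, collapse punctuation/whitespace runs to one space,
    # then count every overlapping character n-gram.
    cleaned = re.sub(r"[\W_]+", " ", text.lower()).strip()
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))

def cosine(a, b):
    # Cosine similarity between two n-gram count vectors (assumed measure).
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_matches(suspicious, source, size=5, n=3):
    # For each suspicious fragment, find the most similar source fragment.
    src = [(f, char_ngrams(f, n)) for f in fragments(source, size)]
    results = []
    for frag in fragments(suspicious, size):
        grams = char_ngrams(frag, n)
        score, best = max(((cosine(grams, g), f) for f, g in src), default=(0.0, None))
        results.append((frag, best, score))
    return results
```

    A Spanish sentence like "La información es importante para la universidad." shares many trigrams with its English translation ("inf", "por", "uni", "ers"...) and very few with an unrelated English sentence, so its score against the translation comes out higher. Fragments scoring above some threshold would be the ones handed to a human to check.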
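    The bilingual-dictionary check from the article could be sketched roughly like this. The article doesn't say how "the same" is measured, so I'm assuming it's the fraction of source words (that the dictionary knows) whose translation shows up in the other text. TOY_DICT is a tiny made-up Spanish-to-English dictionary just for illustration; a real system would need a full lexicon.

```python
import re

# Hypothetical toy Spanish -> English dictionary; each word maps to a set of
# candidate translations. A real system would use a large bilingual lexicon.
TOY_DICT = {
    "la": {"the"}, "el": {"the"}, "información": {"information"},
    "es": {"is"}, "importante": {"important"}, "para": {"for"},
    "universidad": {"university"}, "gato": {"cat"},
}

def tokens(text):
    # Lowercased word tokens; \w matches accented letters in Python 3 strings.
    return re.findall(r"\w+", text.lower())

def dictionary_overlap(source_es, target_en, bilingual=TOY_DICT):
    """Fraction of dictionary-known source words with a translation in the target."""
    target = set(tokens(target_en))
    known = [w for w in tokens(source_es) if w in bilingual]
    if not known:
        return 0.0
    hits = sum(1 for w in known if bilingual[w] & target)
    return hits / len(known)
```

    Words missing from the dictionary are just skipped here, which is a design choice; counting them as misses would make the score stricter.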