Scammers stuff PDF documents with junk to help with SEO

Discussion in 'Black Hat SEO' started by Asif WILSON Khan, Jul 12, 2015.

  1. Asif WILSON Khan

    Asif WILSON Khan Executive VIP Jr. VIP

    Joined:
    Nov 10, 2012
    Messages:
    12,602
    Likes Received:
    34,744
    Gender:
    Male
    Occupation:
    Fun Lovin' Criminal
    Location:
    London
    Home Page:
    Google search poisoning - old dogs learn new tricks


    These days, every company knows that having its website appear at the top of Google's results for relevant keyword searches makes a big difference in traffic and helps the business. Numerous search engine optimization (SEO) techniques have existed for years and provided marketers with ways to climb up the PageRank ladder. In a nutshell, to be popular with Google, your website has to provide content relevant to specific search keywords and also to be linked to by a high number of reputable and relevant sites. (These act as recommendations, and are rather confusingly known as "back links," even though it's not your site that is doing the linking.)
    Google's algorithms are much more complex than this simple description, but most of the optimization techniques still revolve around those two goals. Many of the optimization techniques that are being used are legitimate, ethical and approved by Google and other search providers. But there are also other, and at times more effective, tricks that rely on various forms of internet abuse, with attempts to fool Google's algorithms through forgery, spam and even hacking.
    One of the techniques used to mislead Google's page indexer is known as cloaking. A few days ago, we identified what we believe is a new type of cloaking that appears to work very well in bypassing Google's defense algorithms.
    The idea of cloaking is to tell Google's search engine one thing when it comes looking, but show something completely different to human visitors.
    This is possible because search engines give away their presence by setting a special field inside the web request that asks for content. Where your browser might put text like "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3)" into its web request, Google identifies itself as "Googlebot."
    A cloaked page would serve the Googlebot with content that is stuffed with keywords to suggest that your site is relevant to specific search terms. In the past, this technique was heavily used in malware attacks, so that searching for "Justin Bieber" and then following a link found in search results could actually take you to an exploit-ridden malicious website instead. (This Naked Security article explains how these attacks work.)
    But regular visitors would see a regular page, so everything would look normal and no one would realize that there was a problem worth reporting.
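From the defender's side, the description above suggests a simple check: request the same URL once as a crawler and once as a browser, then compare what comes back. The sketch below is only an illustration under assumptions. The `Response` record and the function names are hypothetical, and a difference is a hint rather than proof, since legitimate sites also vary content by User-Agent (for example for mobile devices).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    """Hypothetical summary of one HTTP response (not a real library type)."""
    status: int
    content_type: str
    location: Optional[str] = None  # redirect target, if any


def differing_fields(bot: Response, browser: Response) -> List[str]:
    """Return the names of the summary fields that differ between the two responses."""
    return [
        name
        for name in ("status", "content_type", "location")
        if getattr(bot, name) != getattr(browser, name)
    ]


def looks_cloaked(bot: Response, browser: Response) -> bool:
    """Flag a URL for manual review when the crawler and the browser saw different things."""
    return bool(differing_fields(bot, browser))


# Example: the crawler gets a PDF, the browser gets redirected elsewhere.
as_bot = Response(200, "application/pdf")
as_browser = Response(302, "text/html", "http://tds.example/")
print(differing_fields(as_bot, as_browser))
```

In practice the two `Response` values would be filled in from real requests made with different User-Agent strings; here they are hard-coded so the comparison logic can be read on its own.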
    The second most important part of search result manipulation is to ensure that Googlebot sees other relevant and well-ranked sites that include links to yours. This lets Googlebot assume that your website isn't just relevant to those keywords, but is also popular and recognized by other Internet users. To make this happen, legitimate marketers rely on generating attractive content, building cross-linking agreements, promoting sites on social networks and paying for advertisements. On the other side, rogue SEO marketers spam their links on blogs and forums by posting fake comments, create dedicated websites to form a "link farm" and, in the worst case, hack into legitimate sites to plant pages that link to theirs. This technique is known as link spamming.
    In response to this, the engineers at Google made a number of improvements to their page-ranking algorithms (notably the Panda engine releases). Those improvements aimed to make it difficult and expensive to achieve high page ranks using malicious methods. Today's fine-tuned version is doing a good job against known techniques, but this doesn't stop rogue actors from trying to find loopholes and weaknesses in the algorithm.
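To make the idea of back links concrete, here is a toy sketch that counts how many distinct sites link to each target. It is not how Google ranks pages (the article itself says the real algorithms are far more complex). The site names and the `count_back_links` helper are made up for illustration.

```python
from collections import Counter


def count_back_links(links):
    """Count distinct inbound links per site.

    `links` is an iterable of (source_site, target_site) pairs.
    Duplicate pairs and self-links are ignored.
    """
    unique_links = set(links)
    return Counter(
        target for source, target in unique_links if source != target
    )


# Hypothetical link farm: three sites link to each other in a ring,
# and all three also link to the promoted site.
links = [
    ("a.example", "b.example"),
    ("b.example", "c.example"),
    ("c.example", "a.example"),
    ("a.example", "promoted.example"),
    ("b.example", "promoted.example"),
    ("c.example", "promoted.example"),
    ("c.example", "promoted.example"),  # duplicate, counted once
]
counts = count_back_links(links)
print(counts.most_common(1))
```

A raw count like this is exactly what link spamming tries to inflate, which is why real ranking systems also weigh the reputation and relevance of the linking sites.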
    Our discovery of a new search poisoning method came from a Sophos Antivirus detection that Jason Zhang of SophosLabs created based on a suspicious-looking PDF file. In short order, we received hundreds of thousands of unique PDF documents per day that triggered this detection.
    After a quick inspection, we realized that someone was using cloaking techniques to poison search results, but instead of feeding fake HTML pages to the Googlebot, they were feeding it PDFs.
    As far as we can tell, Google's cloaking-detection algorithms, which aim to spot web pages that have been artificially (and unrealistically) loaded with keywords, aren't quite so strict when the bogus content is supplied in a document. It seems that Google implicitly trusts PDFs more than HTML, in the same way that it trusts links on .edu and .gov sites more than those on commercial web pages.
    When doing a Google search for keywords found inside those PDFs, we found a large number of similar documents on a number of legitimate, but unrelated and likely compromised, websites. In addition to the heavy use of specific keywords, the PDFs include links to documents planted on other websites, forming a so-called "back link wheel."
    [​IMG]
    (Image source: Wikipedia)
    This appears to have been enough to trick Google into giving the documents an artificially high search ranking.
    The final step in the scenario was to redirect the unsuspecting users who click on a PDF link to a promoted website.
    We suspect that this technique could be used for a variety of purposes, including the distribution of malware. So far, however, we have only seen it in a marketing campaign to promote so-called "binary trading" broker services.
    Here is an example of the first page of poisoned search results:
    [​IMG]
    Almost every link that we see on the results page belongs to this campaign. It is particularly successful and obvious when you search for a combination of lower-frequency keywords like "Austria" and "binary trading" as in the example above.
    When clicked, the PDF links redirect to the website for a "binary options" trading broker:
    [​IMG]
    At a later stage the same links pointed to a seemingly different get-rich-fast scheme:
    [​IMG]
    In order to see the actual PDF document, we need to select its cached version in Google's search result, in the menu next to the link:
    [​IMG]
    A document that looks legitimate at first glance turns into complete nonsense when you start reading it. Also, you can clearly see the hyperlinks placed throughout the document. Those are the links that, when followed, expose the whole link farm to the Googlebot.
    Many other phrases and keyword combinations within the document give us a good idea of what else we could search for. A quick analysis reveals that many three-word combinations found in the document would lead to the same PDFs when searched. Even a fairly broad search, like "safe stock trade US", would bring those links to the very top of the results:
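The "three-word combinations" mentioned above can be pulled out of a document's text with a few lines of code. This is a rough sketch of one way to do it, not the method SophosLabs used. The tokenizing rule (lowercase letters, digits and apostrophes) and the function name are assumptions.

```python
import re


def three_word_phrases(text):
    """Return every run of three consecutive words in `text`, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of three words across the list; short texts yield nothing.
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]


phrases = three_word_phrases("Safe stock trade US")
print(phrases)
```

Each phrase could then be used as a search query to see whether the same planted documents keep turning up.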
    [​IMG]
    In order to see what happens when Google's crawler visits the link, we can run a web client program with the User-Agent header string set to "Googlebot":
    $ curl -is --user-agent "Googlebot" "http://www.[WEBSITE].com/?index.php?id=[ARGS]"
    HTTP/1.1 200 OK
    Date:
    Server: Apache
    Transfer-Encoding: chunked
    Content-Type: application/pdf
    %PDF-1.3
    1 0 obj
    << /Type /Catalog
    /Outlines 2 0 R
    [...]
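The same check as the curl command above can be sketched with Python's standard library. This is an illustrative equivalent, not part of the original research. The helper names are made up, and `fetch_summary` should only be pointed at URLs you are allowed to test. Note that, unlike `curl -is`, `urllib` follows redirects automatically and raises an exception on 4xx/5xx responses.

```python
import urllib.request


def build_request(url, user_agent):
    """Build a GET request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def looks_like_pdf(body):
    """A PDF file starts with the bytes '%PDF-', as in the curl output above."""
    return body.startswith(b"%PDF-")


def fetch_summary(url, user_agent, timeout=10):
    """Fetch `url` and report status, Content-Type and whether the body is a PDF.

    Performs a real network request; redirects are followed silently.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read(1024)  # the first bytes are enough for the check
        return response.status, response.headers.get("Content-Type"), looks_like_pdf(body)


# No network needed to see the header being set or the PDF check working:
request = build_request("http://www.example.com/", "Googlebot")
print(request.get_header("User-agent"))
print(looks_like_pdf(b"%PDF-1.3\n1 0 obj\n"))
```

Calling `fetch_summary` twice on the same URL, once with "Googlebot" and once with a browser-style User-Agent string, would show whether the server answers the two differently.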
    But to observe what unsuspecting users would see if they clicked on what they thought was a link to a PDF document, we can simply use a web browser with developer tools. Here is an example of the redirection chain that takes place:
    [​IMG]
    Not surprisingly, the redirection involves some TDS sites (Traffic Distribution Systems) that pass along a unique ID of the affiliate marketer responsible for this campaign.
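A redirection chain like the one described can be reconstructed by following each `Location` header until a non-redirect response is reached. The sketch below works on canned data so that it stays self-contained. The URLs, the `id` parameter and the `lookup` callback are all hypothetical, and it only models HTTP redirects; JavaScript or meta-refresh redirects would not show up.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_chain(start_url, lookup, max_hops=10):
    """Follow HTTP redirects and return the list of URLs visited.

    `lookup(url)` must return a (status, location) pair for that URL.
    `max_hops` guards against redirect loops.
    """
    chain = [start_url]
    url = start_url
    for _ in range(max_hops):
        status, location = lookup(url)
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
        chain.append(url)
    return chain


# Canned responses standing in for real servers (all names are made up).
canned = {
    "http://site.example/doc.pdf": (302, "http://tds.example/go?id=123"),
    "http://tds.example/go?id=123": (302, "/landing"),
    "http://tds.example/landing": (200, None),
}
chain = redirect_chain("http://site.example/doc.pdf", canned.get)
print(chain)
```

With a `lookup` that makes real requests without auto-following redirects, the same loop would list each intermediate hop, including any traffic distribution hosts.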
    We provided detailed information about our findings to Google, along with notice about our intent to publish. Google acknowledged our communication but chose not to comment further. We trust that the necessary measures are being taken to counter these search result poisoning attempts.

    https://blogs.sophos.com/2015/07/07/google-search-poisoning-old-dogs-learn-new-tricks/
    http://www.csoonline.com/article/29...-bypass-google-filters-with-pdf-cloaking.html
    http://searchengineland.com/it-secu...chnique-through-cloaking-pdf-documents-224941
     
    • Thanks Thanks x 23
  2. welly_59

    welly_59 Power Member

    Joined:
    Aug 30, 2011
    Messages:
    698
    Likes Received:
    258
    Proper blackhat :)
     
  3. asap1

    asap1 BANNED BANNED

    Joined:
    Mar 25, 2013
    Messages:
    4,961
    Likes Received:
    3,185
    This is long, bookmarked for later lol :)

    Edit: Had nothing better to do so I read it.

    Now for everyone saying content is king this is a big penis slap in the face.

    Google is ranking that garbage 110% spun crap AND they are using a link wheel LMAO.

    Just had to say that lol, anyway this is a well thought of blackhat technique.
     
    • Thanks Thanks x 2
    Last edited: Jul 12, 2015
  4. Ch3Mik

    Ch3Mik Registered Member

    Joined:
    Apr 10, 2015
    Messages:
    99
    Likes Received:
    111
    Occupation:
    Survivor
    Location:
    Spain
    That was an amazing read, thank you. I do enjoy seeing that there will always be methods to make G sad :D
     
    • Thanks Thanks x 1
  5. Angilarie

    Angilarie Junior Member

    Joined:
    Aug 4, 2013
    Messages:
    148
    Likes Received:
    2
    Home Page:
    do you have one complete URL for me to check ??? i want to check out his set up please :)



    EDIT: and all the pieces fit now :)


    QUESTION: why PDF?????????????


    EDIT: PDF links are heavier ????????????
     
    Last edited: Jul 12, 2015
  6. Lunatic Call

    Lunatic Call Regular Member

    Joined:
    May 6, 2015
    Messages:
    276
    Likes Received:
    57
    Occupation:
    SEO
    Location:
    Heaven
    Read through. Thank you for information and GJ.
     
  7. satyr85

    satyr85 Jr. VIP Jr. VIP

    Joined:
    Aug 7, 2011
    Messages:
    633
    Likes Received:
    481
    Location:
    Poland
    This
    Bullshit content that rank in google.
    This method with PDFs is proof that money spent on hand-written content is mostly money down the toilet.
     
    • Thanks Thanks x 1
  8. bahus

    bahus Regular Member

    Joined:
    Jun 4, 2014
    Messages:
    353
    Likes Received:
    97
    Gender:
    Male
    And the "don't link or post on unrelated websites" rule goes out of the window as well.

     
  9. THUNDERELVI

    THUNDERELVI Elite Member

    Joined:
    Sep 12, 2009
    Messages:
    2,547
    Likes Received:
    2,200
    Gender:
    Male
    Location:
    W3
    I tried the search queries mentioned in the article and for "binary trading austria", 5 out of 10 results on first page are still PDF-s which redirect to a website LMFAO!
    This is one of the most blackhat techniques I have ever read, amazing. That's what I call grabbing Google by the balls lol...
     
    Last edited: Jul 12, 2015
  10. netcelal

    netcelal Senior Member

    Joined:
    Jul 12, 2009
    Messages:
    953
    Likes Received:
    376
    Location:
    7/24 Internet
    This is really amazing....
     
  11. Scraper9

    Scraper9 Jr. VIP Jr. VIP

    Joined:
    Feb 8, 2015
    Messages:
    609
    Likes Received:
    709
    Location:
    Evropa bro
    Good, good I enjoyed it very much!

    P.S
    ''-What do you do for a living?
    -I am a scammer rogue SEO marketer, how about you?''
     
  12. lagger

    lagger Power Member

    Joined:
    May 10, 2011
    Messages:
    511
    Likes Received:
    183
    OMG, why you outing my techniques??
    😉
     
  13. Sombrero

    Sombrero Supreme Member

    Joined:
    Feb 28, 2011
    Messages:
    1,209
    Likes Received:
    1,032
    Occupation:
    Driver
    Location:
    On The Road
    Google is giving too much power to files and videos instead of websites. They should focus on what they know before experimenting with video, photos and stuff. A lot of useless videos are at TOP for good keywords just because YouTube is the favorite son.
     
  14. luisk

    luisk Regular Member

    Joined:
    Feb 12, 2014
    Messages:
    441
    Likes Received:
    127
    Occupation:
    Entrepreneur
    Location:
    Some place in South america
    Wonderful reading. An article that honoured the name of this site "BlackHatWorld", it's hard to see something like this in this forum nowadays.
     
  15. KHer0

    KHer0 Supreme Member

    Joined:
    Mar 22, 2011
    Messages:
    1,364
    Likes Received:
    1,246
    Occupation:
    Architect
    Wow, Long time no see proper blackhat technique :D


    Sadly, most people manually spinning their articles think they are Blackhatter and Ask if it's Legal :D
     
  16. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    12,767
    Likes Received:
    11,424
    Occupation:
    COINZ
    Location:
    BUYAH
    Home Page:
    Holy ssshh!! That's great research there W130SN, thanks for this great post.
     
    • Thanks Thanks x 2
  17. Asif WILSON Khan

    Asif WILSON Khan Executive VIP Jr. VIP

    Joined:
    Nov 10, 2012
    Messages:
    12,602
    Likes Received:
    34,744
    Gender:
    Male
    Occupation:
    Fun Lovin' Criminal
    Location:
    London
    Home Page:
    Not my research, just copy/pasted from the sophos blog.
    Although I have noticed PDFs ranking well for a number of terms for some time.
     
    • Thanks Thanks x 1
  18. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    12,767
    Likes Received:
    11,424
    Occupation:
    COINZ
    Location:
    BUYAH
    Home Page:
    PDFs are everywhere in the SERPs. I noticed that since PDF.js, written in 100% JavaScript, became a native part of browsers, PDF is no longer such an inconvenient format. PDFs rank really really well lately, maybe because Googlebot itself may be running PDF.js and better "understanding" PDF.
     