
How social media networks can combat spam, bots and digital theft (part 1)

Discussion in 'Programming' started by healzer, May 17, 2017.

  1. healzer

    healzer Jr. VIP Jr. VIP

    Joined:
    Jun 26, 2011
    Messages:
    2,707
    Likes Received:
    2,366
    Gender:
    Male
    Location:
    Somewhere in Europe
    Home Page:
    Spam, botting and theft are serious problems, and social networks do their best to combat them.
    In this post we look at some basic techniques that social media sites use (or could use) to combat spam and bots quite effectively.

    Detecting spam and bot-built profiles is not a trivial task. Genuine users often spam or advertise abusively too, but as long as no damage or harm is done, their accounts are not flagged or banned. Furthermore, manually reviewing every potentially fake or botted profile is very labor-intensive, so it is only done in more serious cases (such as for bigger profiles).

    You may have noticed that some social media sites are utilizing much more intelligent approaches to combat spam and bots.
    Before we delve deeper into this statement, we must understand that spam and bots primarily exist for economic reasons (such as profits and monetary rewards). So, if we were a "social media site" (e.g. Tumblr, Pinterest, Instagram, ...) then we could introduce a system that attempts to separate spammers from legitimate users.

    As an example, assume our social media site has a feed of the most popular uploads of the past 24 hours (based on the number of likes). A network of bots (under the supervision of the bot owner) may then attempt to "rank" certain images on that top feed, resulting in a lot of new followers, traffic, leads and/or sales. It is in our best interest to make sure this does not happen. As developers/engineers we could propose various (hypothetical) solutions, such as detecting accounts that are likely botting and/or associated with many other accounts that are botting/spamming.

    We can establish a ranking for all profiles on our social media site and then assign a weight to all of their posts/activities. The higher an account's spam rate, the lower the chance that its uploads appear on the "most popular feed".
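
    A minimal sketch of that weighting idea, under stated assumptions: the names Like, Upload, weighted_score and most_popular are hypothetical, spam rates are taken to be numbers between 0.0 and 1.0, and the "1 - spam rate" weighting is just one simple choice, not something a real network is known to use.

```python
from dataclasses import dataclass, field


@dataclass
class Like:
    # Hypothetical: spam rate (0.0 to 1.0) of the account that gave the like.
    liker_spam_rate: float


@dataclass
class Upload:
    upload_id: str
    # Hypothetical: spam rate (0.0 to 1.0) of the account that posted the upload.
    owner_spam_rate: float
    likes: list = field(default_factory=list)


def weighted_score(upload: Upload) -> float:
    """Popularity score in which spammy activity counts for less."""
    # Each like is worth less the spammier the liking account is.
    like_weight = sum(1.0 - like.liker_spam_rate for like in upload.likes)
    # The uploader's own spam rate scales the total down once more.
    return like_weight * (1.0 - upload.owner_spam_rate)


def most_popular(uploads, top_n=10):
    """Return the top_n uploads ordered by weighted score, highest first."""
    return sorted(uploads, key=weighted_score, reverse=True)[:top_n]


# Example: 100 likes from bot-like accounts versus 20 likes from normal ones.
botted = Upload("botted", 0.9, [Like(0.9) for _ in range(100)])
genuine = Upload("genuine", 0.1, [Like(0.1) for _ in range(20)])
feed = most_popular([botted, genuine])
```

    With this weighting the upload with 20 genuine likes ranks above the one with 100 botted likes, which is the effect the "most popular feed" needs.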

    Have a look at this figure:

    [​IMG]

    Let's introduce a spam-detection function, F_spam(user), which calculates how "spammy" a certain user/profile is based on various criteria.
    Assume A is the collection of all profiles of legitimate users, such that for each user F_spam(user) <= 50%.
    Collection B consists of all spammy profiles, such that F_spam(user) >= 50%.

    * All profiles/accounts whose spam ratio is exactly 50% fall in both group A and group B, and these users need to be reviewed manually to determine their behavior and intentions (whether they are spammers or not).
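
    The grouping above could be sketched like this. The function names and the optional "margin" parameter are my own assumptions: with margin=0.0 only a score of exactly 50% goes to manual review, and a wider margin would send near-borderline accounts to review as well.

```python
def classify(spam_rate: float, threshold: float = 0.5, margin: float = 0.0) -> str:
    """Return "A" (legitimate), "B" (spammy) or "review" (borderline)."""
    # Borderline scores are handed to a human instead of being decided here.
    if abs(spam_rate - threshold) <= margin:
        return "review"
    return "A" if spam_rate < threshold else "B"


def partition(spam_rates: dict, threshold: float = 0.5, margin: float = 0.0) -> dict:
    """Split a {username: spam_rate} mapping into the three groups."""
    groups = {"A": [], "B": [], "review": []}
    for user, rate in spam_rates.items():
        groups[classify(rate, threshold, margin)].append(user)
    return groups


# Hypothetical scores, as if produced by some F_spam function.
groups = partition({"alice": 0.1, "bot_4711": 0.95, "maybe_spam": 0.5})
```

    Here "alice" lands in group A, "bot_4711" in group B, and "maybe_spam" is queued for manual review.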

    The beauty of this strategy, as the figure shows, is that we have two distinct groups inside one big social media network. This allows us to ignore all accounts from group B when it comes to our "most popular feed", "best uploads of the year", etc. This leaves the botters/spammers in the dark with very little revenue/profit.

    I have searched through various research sites, such as ScienceDirect and IEEE Xplore, but haven't found many articles that explicitly discuss this method. The following article is most likely the closest one to my proposed approach:
    http://ieeexplore.ieee.org/document/7509326/

    You may ask: how can we detect spam and establish an F_spam function?
    Luckily, I have found an article explaining how: http://ieeexplore.ieee.org/document/7920623/
    It's a very short paper, but it contains valuable information. The authors mention that posts are detected as spam when they:
    • contain advertisement words such as "buy" or "#buy", or website links. A post is almost immediately spam if the link is an affiliate URL.
    • have many repetitive and/or duplicate words.
    • contain watermarks of website links, which can be detected in the images using OCR (optical character recognition).
    An interesting and highly experimental technique discussed in the paper is detecting what the image itself shows (using machine learning). For instance, if the image was taken in clear daylight but the caption/description contains "#night", then something is clearly off.
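
    To make the text-based signals above more concrete, here is a rough sketch of a caption scorer. It is not the paper's method: the function name, the weights (0.3, 0.3, 0.4, 0.3), the affiliate markers and the repetition cutoff are all assumptions picked for illustration, and the OCR and image-recognition checks are left out entirely.

```python
import re
from collections import Counter

# Advertisement words taken from the examples mentioned above.
AD_WORDS = {"buy", "#buy"}
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
# Hypothetical markers for affiliate links; real ones depend on the network.
AFFILIATE_HINTS = ("ref=", "aff=", "affiliate")


def caption_spam_score(caption: str) -> float:
    """Return a rough spam score between 0.0 and 1.0 for a caption."""
    text = caption.lower()
    urls = URL_PATTERN.findall(text)
    # Tokenize the caption with the URLs removed; keep a leading '#'.
    words = re.findall(r"#?\w+", URL_PATTERN.sub(" ", text))

    score = 0.0
    if any(word in AD_WORDS for word in words):
        score += 0.3  # advertisement word present
    if urls:
        score += 0.3  # website link present
    if any(hint in url for url in urls for hint in AFFILIATE_HINTS):
        score += 0.4  # link looks like an affiliate URL
    if len(words) >= 4:
        # Repetition: a single word makes up more than half of the caption.
        top_count = Counter(words).most_common(1)[0][1]
        if top_count / len(words) > 0.5:
            score += 0.3
    return min(score, 1.0)


clean = caption_spam_score("Sunset at the beach with friends")
spammy = caption_spam_score("Buy now http://example.com/?ref=123")
repetitive = caption_spam_score("cheap cheap cheap cheap shoes")
```

    A score like this could be one input to F_spam(user), for example by averaging it over a user's recent posts, but how to combine the signals is a design choice the paper summary above does not settle.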

    Now that we have looked at how social media sites protect themselves against bots and spam, let's look at this method from a spammer's perspective. One may argue that spam is becoming very intelligent, but when we look at some of the existing tools and software on the market, they are not intelligent at all. These tools simply emulate user behavior (such as automatically uploading and liking content). The real power of spam lies in the hands of the spammer, not the tools themselves. Creative and smart social engineering can lure users into a trap and make them buy/click wherever the spammer wants.


    I have re-uploaded both articles, in case they ever get removed:
    http://docdro.id/usFcZkX
    http://docdro.id/2G9suNt

    Hope you learned something from this, or that it at least gave you some inspiration :)
     
    Last edited: May 17, 2017
  2. mnunes532

    mnunes532 Supreme Member

    Joined:
    Jan 21, 2014
    Messages:
    1,438
    Likes Received:
    468
    Gender:
    Male
    Location:
    Portugal
    Awesome post :) It would be awesome to read or hear something from someone who has worked or is working on these anti-bot systems.
     
  3. zigzagtech

    zigzagtech Regular Member

    Joined:
    Jan 2, 2014
    Messages:
    320
    Likes Received:
    38
    Gender:
    Male
    Occupation:
    Get Custom Automation Tool - Cheap
    Location:
    india
    nice. ha ha ha .
     
  4. healzer

    healzer Jr. VIP Jr. VIP

    Exactly, I'm pretty sure those people are lurking on forums such as BHW,
    unfortunately they are not going to join this discussion :)
     
  5. Maks.KV

    Maks.KV Registered Member

    Joined:
    Jun 13, 2010
    Messages:
    53
    Likes Received:
    3
    Great info, thanks for sharing!
    Interesting introduction to the big social players' algos