
Would Twitter scraper software be interesting?

Discussion in 'Twitter' started by Tazdingo1, May 1, 2015.

  1. Tazdingo1

    Tazdingo1 Registered Member

    Joined:
    May 1, 2015
    Messages:
    51
    Likes Received:
    2
    I'm new here,

    I am just wondering:

    Would there be any general interest in software that scans 250 000 000 Twitter users for chosen features and writes the matching accounts to a text file?

    For instance if you want to:
    - Output all accounts that have a following/follower ratio of 1.32 and a bio containing the words "word1, word2, word3".
    - Output all accounts that can be DMed by anyone.
    - Output all accounts that have an English/Spanish/[INSERT LANG] bio.
    ...etc
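
A minimal sketch of how filters like the ones listed above could be expressed, assuming the account data is already stored locally. The `Account` fields, the `matches` and `write_matches` names, and the choice to treat the ratio as a small range rather than an exact value are all assumptions made for illustration, not details of the actual software.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical record layout; the real data format is not described.
    handle: str
    followers: int
    following: int
    bio: str
    open_dms: bool  # True if anyone can DM this account
    bio_lang: str   # language code of the bio, e.g. "en"


def ff_ratio(acc):
    """Following/follower ratio; infinity when the account has no followers."""
    return acc.following / acc.followers if acc.followers else float("inf")


def matches(acc, ratio_range=None, bio_words=(), open_dms=None, bio_lang=None):
    """Return True if the account passes every criterion that was supplied."""
    if ratio_range is not None:
        lo, hi = ratio_range
        if not lo <= ff_ratio(acc) <= hi:
            return False
    bio = acc.bio.lower()
    # Assumption: the bio must contain all of the given words.
    if not all(word.lower() in bio for word in bio_words):
        return False
    if open_dms is not None and acc.open_dms != open_dms:
        return False
    if bio_lang is not None and acc.bio_lang != bio_lang:
        return False
    return True


def write_matches(accounts, path, **criteria):
    """Write the handle of each matching account to a text file, one per line."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for acc in accounts:
            if matches(acc, **criteria):
                out.write(acc.handle + "\n")
                count += 1
    return count
```

For example, `write_matches(accounts, "out.txt", ratio_range=(1.30, 1.34), bio_words=("word1", "word2"))` would write every account whose ratio is close to 1.32 and whose bio contains both words. This version scans every record, so it is linear in the number of accounts.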

    You can filter on the fly by any criteria you choose, across a quarter of a billion Twitter accounts.
    Any filtering would be done in seconds, since I use optimized algorithms for each task that run in O(log(n)) time.
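
One common way to get O(log(n)) lookups over a fixed dataset is a sorted index searched with binary search. The sketch below illustrates that general technique only; the post does not say which algorithms are actually used, and the `SortedIndex` class and the `(handle, followers)` record layout are hypothetical.

```python
import bisect


class SortedIndex:
    """Index records by one numeric key so range queries can use binary search."""

    def __init__(self, records, key):
        # Sort once up front: O(n log n). Ties are broken by original position.
        pairs = sorted((key(rec), pos) for pos, rec in enumerate(records))
        self._keys = [k for k, _ in pairs]
        self._rows = [records[pos] for _, pos in pairs]

    def between(self, lo, hi):
        """Return records with lo <= key <= hi, ordered by key."""
        # Finding each boundary is O(log n); copying k matches adds O(k).
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return self._rows[start:end]


# Hypothetical records: (handle, follower count)
records = [("a", 10), ("b", 500), ("c", 120), ("d", 120), ("e", 9000)]
by_followers = SortedIndex(records, key=lambda rec: rec[1])
mid_sized = by_followers.between(100, 1000)
```

Only locating the range boundaries is logarithmic. Returning the matches still costs time proportional to how many there are, and free-text criteria such as bio keywords would need a different structure, for example an inverted index.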

    Right now I am doing it for fun, but would anyone be interested in this?
     
  2. Repulsor

    Repulsor Power Member

    Joined:
    Jun 11, 2013
    Messages:
    770
    Likes Received:
    278
    Location:
    PHP Scripting ;)
    I am more interested in how you handle the API calls. Or is this based on curl/proxies? Does this include a UID scraper as well?

    Well, I have been making such scrapers and filters for clients for a while, but with the numbers you are talking about, and at the speed you describe, I am afraid it would not be reliable. I mean, it can be done once we have all the data, but to grab the data, I mean the follower count and so on, we need to call Twitter one way or another.
     
  3. Tazdingo1

    Tazdingo1 Registered Member

    Joined:
    May 1, 2015
    Messages:
    51
    Likes Received:
    2
    No, that's the catch: the software does not interact with Twitter's API at all. I use another method, one that lets you fetch as many accounts as you want, ad infinitum.
    These 250 000 000 accounts took only 3 days to gather on my PC, with a bandwidth of 100 MB/S up and down.

    This means there is no need to worry about Twitter's API rate limits, which are ridiculous.

    Analyzing the data is very easy too, and involves no interaction with Twitter's API.

    So in summary, a very solid scraper for a multitude of purposes.
    And with a repo of 250 000 000 accounts and the ability to filter on literally any feature you want, I am wondering if anyone would be interested in this?
     
  4. Repulsor

    Repulsor Power Member

    Joined:
    Jun 11, 2013
    Messages:
    770
    Likes Received:
    278
    Location:
    PHP Scripting ;)
    Good luck if it works as promised. There may be people who are interested. Hope this isn't a presale thread.

    And if you have a solid way to filter the accounts without having to fetch from Twitter, then that's cool, because all this time I have been using queuing to fetch account details in chunks of 100 per request and then extracting the necessary details. That would be really cool.

    Good luck mate! :)
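
The queued, chunked fetching mentioned in this post could look roughly like the sketch below. The batch size of 100 comes from the post; `fetch_batch` is a hypothetical callable standing in for whatever request actually retrieves the details, assumed here to take a list of user IDs and return a dict keyed by ID. No real API call is made.

```python
from collections import deque

BATCH_SIZE = 100  # chunk size mentioned in the post


def chunked(ids, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_all(user_ids, fetch_batch, size=BATCH_SIZE):
    """Queue the IDs in batches and collect the details of each batch.

    `fetch_batch` is a placeholder supplied by the caller (an assumption):
    it receives a list of IDs and returns {user_id: details}.
    """
    queue = deque(chunked(list(user_ids), size))
    details = {}
    while queue:
        batch = queue.popleft()
        details.update(fetch_batch(batch))
    return details
```

A real version would also need to respect rate limits and re-queue failed batches, which is where the queue earns its keep.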
     
  5. Tazdingo1

    Tazdingo1 Registered Member

    Joined:
    May 1, 2015
    Messages:
    51
    Likes Received:
    2
    No, this is only a thread to get public opinion.

    Thanks man!
     
  6. riktubrs

    riktubrs Regular Member

    Joined:
    Dec 8, 2010
    Messages:
    263
    Likes Received:
    68
    Occupation:
    Software Developer
    What do you mean you've been queuing to fetch the details?