could someone explain scraping to me?

Discussion in 'Proxies' started by eqpaisley, Nov 25, 2013.

  1. eqpaisley

    eqpaisley Junior Member

    Joined:
    Oct 16, 2012
    Messages:
    175
    Likes Received:
    48
    Occupation:
    EMT
    Location:
    Brooklyn
    So I get that there are public proxies out there, and I see that some websites collect and post them. My question is: how does one come up with these lists? I have some scripting skills and would love to create a scraper, but I don't even know where to begin.

    Do you spam IP ranges and wait for a connection? If these sites that aggregate proxies didn't exist, how would one know that these public proxies were available?

    EQP
     
    Last edited: Nov 25, 2013
  2. boombap

    boombap Junior Member

    Joined:
    Sep 20, 2012
    Messages:
    186
    Likes Received:
    108
    Occupation:
    IM
    Location:
    UK
    Home Page:
    Well, in short, you wouldn't. Most people scrape from proxy lists like the ones you mention. Testing billions of IP and port combos would be insane and pointless.
     
  3. SnakePliskin

    SnakePliskin BANNED

    Joined:
    Nov 21, 2012
    Messages:
    401
    Likes Received:
    439
    One way is to type the IP string into Google. See every website that posted this IP as a proxy, then take all of the proxy lists each website has. Type a few other IPs from the list into Google, then take all of those proxies, test them in ScrapeBox, and remove dupes.
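    To make that concrete: a minimal Python sketch of the scrape-and-dedupe step, assuming you have already googled an IP string and collected a couple of proxy-list page URLs (the URLs below are placeholders). It pulls every ip:port off those pages and removes duplicates the same way ScrapeBox's dedupe step would.

    Code:
    import re
    import urllib.request

    # Placeholder URLs -- in practice, the proxy-list pages you found
    # by googling an IP string from an existing list.
    list_pages = [
        "http://example.com/proxy-list-1.html",
        "http://example.com/proxy-list-2.html",
    ]

    # Matches ip:port pairs, e.g. 12.34.56.78:8080
    proxy_re = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{2,5}\b")

    proxies = set()  # a set removes duplicates automatically
    for url in list_pages:
        try:
            page = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue  # skip pages that fail to load
        proxies.update(proxy_re.findall(page))

    print(len(proxies), "unique proxies scraped")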
     
    • Thanks Thanks x 2
  4. SPPChristian

    SPPChristian Jr. VIP

    Joined:
    Oct 20, 2012
    Messages:
    1,304
    Likes Received:
    265
    Gender:
    Male
    Occupation:
    www.sslprivateproxy.com
    Location:
    www.sslprivateproxy.com
    Home Page:
    Scan every IPv4 address from 1.0.0.1 to 255.255.255.255 to see if they are open as proxies. There are a lot of free proxy port-scanner scripts out there.

    Code:
    Ex.: http://funoverip.net/2010/11/socks-proxy-servers-scanning-with-nmap/
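    Those scanner scripts all boil down to the same per-host check: connect to a candidate ip:port and see whether it will relay a request. A rough Python sketch of that single probe for HTTP proxies, with a placeholder address (a real scanner, like the nmap approach in the link, also covers SOCKS):

    Code:
    import socket

    def looks_like_http_proxy(host, port, timeout=5):
        """Ask the candidate to proxy a request and check whether it
        answers with an HTTP status line."""
        probe = (
            "GET http://example.com/ HTTP/1.1\r\n"
            "Host: example.com\r\n"
            "Connection: close\r\n\r\n"
        )
        try:
            with socket.create_connection((host, port), timeout=timeout) as s:
                s.sendall(probe.encode("ascii"))
                reply = s.recv(64)
            return reply.startswith(b"HTTP/")
        except OSError:
            return False

    print(looks_like_http_proxy("203.0.113.7", 8080))  # placeholder address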
     
    • Thanks Thanks x 1
  5. eqpaisley

    eqpaisley Junior Member

    Joined:
    Oct 16, 2012
    Messages:
    175
    Likes Received:
    48
    Occupation:
    EMT
    Location:
    Brooklyn
    I get this, makes sense. But where do those LISTS come from? Someone, somewhere is busy pinging away, right? My goal is to find a source of fresh proxies, since these public lists are so heavily trafficked.
     
  6. boombap

    boombap Junior Member

    Joined:
    Sep 20, 2012
    Messages:
    186
    Likes Received:
    108
    Occupation:
    IM
    Location:
    UK
    Home Page:
    You can buy private proxy lists here on BHW.
     
  7. zacatictac

    zacatictac Power Member

    Joined:
    May 2, 2010
    Messages:
    654
    Likes Received:
    767
    Occupation:
    SEO
    Location:
    Metaverse
    Create or find a regex to match IPs. Create a script that downloads the source code of a website, finds any proxy IP with your regex, and puts it in an array. Then run each of these IPs through a variety of tests with whatever language you are using. In C# or VB.NET I would make a web request through each IP, check the timeout, record the response time, and catch any errors to see whether the proxies work or not. To test anonymity, use a PHP proxy judge that you can find online and host yourself. My 2 cents.
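    The poster works in C#/VB.NET; the same test loop in Python, with the test URL and proxy addresses as placeholders, could look roughly like this: time a request made through each scraped proxy and treat any error or timeout as a dead one.

    Code:
    import time
    import urllib.request

    TEST_URL = "http://example.com/"  # placeholder test target

    def test_proxy(proxy, timeout=10):
        """Return the response time through `proxy` (ip:port), or None if it fails."""
        handler = urllib.request.ProxyHandler({"http": "http://" + proxy})
        opener = urllib.request.build_opener(handler)
        start = time.time()
        try:
            opener.open(TEST_URL, timeout=timeout)
            return time.time() - start
        except Exception:
            return None  # timed out, refused connection, bad response, etc.

    working = {}
    for proxy in ["203.0.113.7:8080", "198.51.100.9:3128"]:  # placeholders
        elapsed = test_proxy(proxy)
        if elapsed is not None:
            working[proxy] = elapsed
    print(working)

    The anonymity check is the same request pointed at a proxy judge page that echoes your headers back: if your real IP shows up in a header like X-Forwarded-For, the proxy is transparent rather than anonymous.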
     
  8. akoan

    akoan Newbie

    Joined:
    Nov 26, 2013
    Messages:
    1
    Likes Received:
    0
    Location:
    /dev/null
    As stated, one way to find open proxies is with a tool like nmap. However, you won't get very far unless you have a distributed scanning network, like what was done recently in an interesting project (google internetcensus2012), but you may still get some hits.

    As to your question, scraping generally means taking an HTML page as input and using an HTML parser (e.g. JSoup, Enlive) to extract the information you want from the page via XPath expressions. With XPath, each element on the page has a unique expression. Given a list of sites which have proxy lists, one *could* scrape through them and test programmatically, producing a real-time list of working proxies... hmmm... sounds pretty useful if you need proxies... PM me if you have any questions.
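    To make the XPath part concrete, here is a small Python sketch using lxml (a different parser than the JSoup/Enlive mentioned above) against a made-up proxy-list table; the expressions only match this invented layout, so a real site needs its own.

    Code:
    from lxml import html

    PAGE = """
    <html><body><table id="proxylist">
      <tr><th>IP</th><th>Port</th></tr>
      <tr><td>203.0.113.7</td><td>8080</td></tr>
      <tr><td>198.51.100.9</td><td>3128</td></tr>
    </table></body></html>
    """

    tree = html.fromstring(PAGE)
    # These XPath expressions match only the made-up layout above.
    ips = tree.xpath('//table[@id="proxylist"]//tr/td[1]/text()')
    ports = tree.xpath('//table[@id="proxylist"]//tr/td[2]/text()')
    proxies = [ip + ":" + port for ip, port in zip(ips, ports)]
    print(proxies)  # ['203.0.113.7:8080', '198.51.100.9:3128']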
     
  9. EmailMaster

    EmailMaster Jr. VIP

    Joined:
    May 28, 2011
    Messages:
    2,017
    Likes Received:
    562
    Occupation:
    Proxy & Account Seller
    Location:
    Canada
  10. bartosimpsonio

    bartosimpsonio Jr. VIP Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    13,685
    Likes Received:
    12,299
    Occupation:
    MACHIN LURNIN
    Location:
    TUVALU
    Home Page:
    That's 4 billion hosts with about 64 thousand ports per host, so using that method you should have a list built by the year 2199 or something.
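    Back-of-envelope, with an assumed (made-up) probe rate:

    Code:
    hosts = 2 ** 32          # IPv4 address space, ~4.3 billion
    ports = 2 ** 16          # 65,536 ports per host
    probes = hosts * ports   # ~2.8e14 port checks

    rate = 50_000            # assumed probes per second -- pick your own number
    years = probes / rate / (60 * 60 * 24 * 365)
    print(round(years), "years")  # ~179 years at this rate, i.e. the poster's ballpark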
     
  11. SPPChristian

    SPPChristian Jr. VIP

    Joined:
    Oct 20, 2012
    Messages:
    1,304
    Likes Received:
    265
    Gender:
    Male
    Occupation:
    www.sslprivateproxy.com
    Location:
    www.sslprivateproxy.com
    Home Page:
    Yes, of course I know that, but I gave him a simple solution :))