
could someone explain scraping to me?

Discussion in 'Proxies' started by eqpaisley, Nov 25, 2013.

  1. eqpaisley

    eqpaisley Junior Member

    Joined:
    Oct 16, 2012
    Messages:
    175
    Likes Received:
    48
    Occupation:
    EMT
    Location:
    Brooklyn
    So I get that there are public proxies out there and I see that some websites collect and post them. My question is, how does one come up with these lists? I have some scripting skills and would love to create a scraper but I don't even begin to know where to start.

    Do you spam IP ranges and wait for a connection? If these sites that aggregate proxies didn't exist, how would one know that these public proxies were available?

    EQP
     
    Last edited: Nov 25, 2013
  2. boombap

    boombap Junior Member

    Joined:
    Sep 20, 2012
    Messages:
    186
    Likes Received:
    107
    Occupation:
    IM
    Location:
    UK
    Home Page:
    Well, in short, you wouldn't. Most people scrape from proxy lists like the ones you mention. Testing billions of IP and port combos would be insane and pointless.
     
  3. SnakePliskin

    SnakePliskin BANNED

    Joined:
    Nov 21, 2012
    Messages:
    401
    Likes Received:
    439
    One way is to type an IP string into Google. See every website that posted that IP as a proxy, then take all of the proxy lists each website has. Type a few other IPs from the list into Google, grab all of those proxies too, test them in ScrapeBox, and remove the dupes.
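
    For illustration, a rough Python sketch of the "grab all the lists, remove dupes" step, assuming you have already saved a few proxy-list pages to text files (the file names below are placeholders):

    Code:
    # Minimal sketch: merge several scraped proxy-list text files and drop
    # duplicate ip:port entries. The file names are placeholders.
    import re

    PROXY_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{2,5}\b")

    def merge_lists(paths):
        seen = set()
        for path in paths:
            with open(path, encoding="utf-8", errors="ignore") as fh:
                # findall returns the full ip:port match strings
                seen.update(PROXY_RE.findall(fh.read()))
        return sorted(seen)

    if __name__ == "__main__":
        print("\n".join(merge_lists(["list_site_a.txt", "list_site_b.txt"])))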
     
    • Thanks x 2
  4. SPPChristian

    SPPChristian Jr. VIP Premium Member

    Joined:
    Oct 20, 2012
    Messages:
    1,221
    Likes Received:
    239
    Location:
    United States
    Home Page:
    Scan every IPv4 address from 1.0.0.1 to 255.255.255.255 to see if it is open as a proxy. There are a lot of free proxy port scanner scripts out there.

    Code:
    Ex.: http://funoverip.net/2010/11/socks-proxy-servers-scanning-with-nmap/
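
    If you'd rather not run nmap, here is a minimal pure-Python check along the same lines for HTTP-style proxies (the article above covers SOCKS); the address in the usage line is just a placeholder:

    Code:
    # Ask a host:port to fetch a page as if it were an HTTP proxy and see
    # whether it answers with an HTTP status line.
    import socket

    def looks_like_http_proxy(host, port, timeout=5):
        request = (
            "GET http://example.com/ HTTP/1.1\r\n"
            "Host: example.com\r\n"
            "Connection: close\r\n\r\n"
        ).encode()
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(request)
                reply = sock.recv(1024)
        except OSError:
            return False  # refused, unreachable, or timed out
        return reply.startswith(b"HTTP/")

    if __name__ == "__main__":
        print(looks_like_http_proxy("203.0.113.10", 8080))  # placeholder address

    Anything that answers with an HTTP status line is worth keeping for a fuller test later.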
     
    • Thanks x 1
  5. eqpaisley

    eqpaisley Junior Member

    Joined:
    Oct 16, 2012
    Messages:
    175
    Likes Received:
    48
    Occupation:
    EMT
    Location:
    Brooklyn
    I get this, makes sense. But where do those LISTS come from? Someone, somewhere is busy pinging away, right? My goal is to find a source of fresh proxies, since these public lists are so heavily trafficked.
     
  6. boombap

    boombap Junior Member

    Joined:
    Sep 20, 2012
    Messages:
    186
    Likes Received:
    107
    Occupation:
    IM
    Location:
    UK
    Home Page:
    You can buy private proxy lists here on BHW.
     
  7. zacatictac

    zacatictac Power Member

    Joined:
    May 2, 2010
    Messages:
    598
    Likes Received:
    755
    Occupation:
    SEO
    Location:
    Metaverse
    Create or find a regex to match IPs. Write a script that downloads the source code of a website, finds any proxy IP with your regex, and puts it in an array. Then run each of those IPs through a variety of tests in whatever language you are using. In C# or VB.NET I would make a web request using each IP, record the response time, and catch any errors to see whether the proxies work or not. To test anonymity, use a PHP proxy judge that you can find online and host yourself. My 2 cents.
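
    For illustration, here is a rough Python version of that same workflow (the post above describes it in C#/VB.NET); the list URL is a placeholder and the proxy-judge step is left out:

    Code:
    # Pull ip:port pairs out of a page's source with a regex, then try a
    # request through each one and record how long it took.
    import re
    import time
    import urllib.request

    PROXY_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{2,5})\b")

    def scrape_proxies(list_url):
        html = urllib.request.urlopen(list_url, timeout=10).read().decode("utf-8", "ignore")
        return ["{}:{}".format(ip, port) for ip, port in PROXY_RE.findall(html)]

    def test_proxy(proxy, test_url="http://example.com/", timeout=8):
        handler = urllib.request.ProxyHandler({"http": "http://" + proxy})
        opener = urllib.request.build_opener(handler)
        start = time.perf_counter()
        try:
            opener.open(test_url, timeout=timeout).read(512)
        except Exception:
            return None  # dead, refused, or too slow
        return time.perf_counter() - start

    if __name__ == "__main__":
        for proxy in scrape_proxies("http://example.com/proxy-list"):  # placeholder URL
            elapsed = test_proxy(proxy)
            print(proxy, "FAILED" if elapsed is None else "{:.2f}s".format(elapsed))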
     
  8. akoan

    akoan Newbie

    Joined:
    Nov 26, 2013
    Messages:
    1
    Likes Received:
    0
    Location:
    /dev/null
    As stated, one way to resolve open proxies is a tool like nmap. However, you won't get very far unless you have a distributed scanning network, like what was done recently in this interesting project (google internetcensus2012), but you may still get some hits. As to your question, scraping generally means taking an HTML page as input and using an HTML parser (e.g. JSoup, Enlive) to extract the information you want from the page via XPath expressions. With XPath, each element on the page has a unique expression. Given a list of sites that have proxy lists, one *could* scrape through them and test programmatically, producing a real-time list of working proxies... hmm, sounds pretty useful if you need proxies... PM me if you have any questions.
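
    As a sketch of the XPath extraction step in Python with lxml (the post names JSoup/Enlive; the URL and the IP/port table layout here are assumptions, not a real site):

    Code:
    # Parse a proxy-list page and pull ip:port out of a table whose first two
    # cells are assumed to be IP and port.
    import urllib.request
    from lxml import html

    def extract_proxies(list_url):
        page = urllib.request.urlopen(list_url, timeout=10).read()
        doc = html.fromstring(page)
        proxies = []
        for row in doc.xpath("//table//tr"):
            cells = row.xpath("./td/text()")
            # crude sanity check that the first cell looks like a dotted quad
            if len(cells) >= 2 and cells[0].count(".") == 3:
                proxies.append("{}:{}".format(cells[0].strip(), cells[1].strip()))
        return proxies

    if __name__ == "__main__":
        print(extract_proxies("http://example.com/proxy-list"))  # placeholder URL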
     
  9. EmailMaster

    EmailMaster Jr. VIP Premium Member

    Joined:
    May 28, 2011
    Messages:
    1,587
    Likes Received:
    528
    Occupation:
    Proxy & Account Seller
    Location:
    Canada
  10. bartosimpsonio

    bartosimpsonio Jr. VIP Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    8,886
    Likes Received:
    7,480
    Occupation:
    ZLinky2Buy SEO Services
    Location:
    ⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩
    Home Page:
    That's about 4 billion hosts with roughly 64 thousand ports each, so using that method you should have a list built by the year 2199 or something.
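
    Rough math on that, assuming a single machine probing, say, 10,000 address/port combinations per second (the rate is an assumption):

    Code:
    # Back-of-the-envelope: the full IPv4 space times the full port range is
    # far too many probes for one machine to cover.
    addresses = 2 ** 32            # ~4.3 billion IPv4 addresses
    ports = 65535                  # ports 1-65535
    combos = addresses * ports     # every address/port combination

    probes_per_second = 10_000     # assumed scanning rate for a single machine
    seconds = combos / probes_per_second
    years = seconds / (365 * 24 * 3600)
    print(f"{combos:.3e} combinations, roughly {years:,.0f} years at {probes_per_second}/s")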
     
  11. SPPChristian

    SPPChristian Jr. VIP Premium Member

    Joined:
    Oct 20, 2012
    Messages:
    1,221
    Likes Received:
    239
    Location:
    United States
    Home Page:
    Yes, of course I know that, but I gave him a simple solution :))