aishahriar
- Jan 7, 2010
Hi folks,
I've been asked to write a detailed guide on how to set up your own proxy server for your scrapeboxing and other needs (and maybe XR as well, if your host is fast/bulletproof enough). I'll use tinyproxy as it has a small memory footprint and doesn't need much tinkering out of the box. If you'd prefer a caching proxy server, go for Squid. So here goes...
[Warning: Big wall of text up ahead]
Preamble - All of this requires some basic server admin knowledge, specifically of Linux servers (I prefer CentOS). So if you've never done anything like this and would like to give it a go, I'd recommend you pick up a good book first and familiarize yourself with the basics. A good one to get started with is "The Definitive Guide to CentOS" from Apress. Just do a Google search (or search on 4shared dot com) to find a copy - buy it if you can. Also make sure you set up your firewall properly, as your proxy server is only as secure as your whole VPS. Another good resource from Apress, although a bit dated, is "Hardening Linux" by James Turnbull. I would seriously recommend a firewall configuration that allows access to the SSH and proxy server ports only from your home IP address (if you have a static IP address from your ISP).
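The firewall recommendation above could look roughly like the sketch below. It only prints iptables commands so you can review them before running anything. The function name, the default ports and the example address 203.0.113.10 are my own placeholders, not something from a stock CentOS setup.

```shell
#!/bin/sh
# Sketch: PRINT (do not apply) iptables rules that admit SSH and the proxy
# port only from one trusted address. print_fw_rules is a hypothetical
# helper; the ports default to 8080 (proxy) and 22 (SSH).
print_fw_rules() {
    home_ip=$1
    proxy_port=${2:-8080}
    ssh_port=${3:-22}

    # Keep loopback traffic and already-established sessions working.
    echo "iptables -A INPUT -i lo -j ACCEPT"
    echo "iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT"

    # Allow new SSH and proxy connections only from the trusted address.
    for port in "$ssh_port" "$proxy_port"; do
        echo "iptables -A INPUT -p tcp -s $home_ip --dport $port -j ACCEPT"
    done

    # Drop every other inbound packet. Printed last so the ACCEPT rules
    # are already in place when the policy changes.
    echo "iptables -P INPUT DROP"
}
```

Run `print_fw_rules 203.0.113.10 8080` with your own address, read the output, and only then paste the lines into a root shell. On CentOS, `service iptables save` should keep the rules across reboots.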
Step ONE - So, we've got our CentOS VPS up and running, and would like to add a proxy server. The first thing to do is add the EPEL repository to your yum repo database. Here's what to do:
- Log in to your VPS as root (use an SSH client; if you're on Windows, PuTTY is an excellent and free SSH client)
- Type this command to check that yum is working properly
Code:
yum update
- Hit y for any updates and install them. This also confirms your net access is working properly; if it isn't, check your DNS configuration.
- Now we'll install the EPEL repo as follows
Code:
rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
- That should install the repo. Verify by typing this command
Code:
yum repolist
- and see if it shows epel in the list. Assuming it does, I'd also suggest setting up the yum priorities plugin (but that's optional)
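Step TWO - Now install tinyproxy from the EPEL repo.
Code:
yum install tinyproxy
Too easy? Now, this just installed an older version of tinyproxy, which is OK for our purposes. But if you want the latest and greatest version you're going to have to download and compile the source (which I don't suggest newcomers to Linux try, at least not without help - you can seriously stuff up your server). Now that I've scared you enough, I plan to provide a follow-up guide to get you compiling and installing tinyproxy from source code.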
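If you'd like to double-check the repo and the package before moving on to configuration, something like this sketch might help. The helper names are made up for this post, and matching on the first column of the `yum repolist` output is an assumption about its layout.

```shell
#!/bin/sh
# Sketch: confirm the EPEL repo and the tinyproxy package are in place.

# has_epel (hypothetical helper): reads `yum repolist` output on stdin and
# succeeds if a repo id starting with "epel" appears in the first column.
# A leading "!" or "*" marker on the repo id is tolerated.
has_epel() {
    awk 'tolower($1) ~ /^[!*]?epel/ { found = 1 } END { exit found ? 0 : 1 }'
}

# check_install (hypothetical helper): queries the live system, so it needs
# a box with yum and rpm. It is defined here but not run automatically.
check_install() {
    if yum repolist 2>/dev/null | has_epel; then
        echo "EPEL repo: present"
    else
        echo "EPEL repo: missing"
    fi
    # rpm -q prints the installed version, or fails if the package is absent.
    rpm -q tinyproxy || echo "tinyproxy is not installed yet"
}
```

Call `check_install` on the VPS. If it reports the repo as missing, go back to the rpm -Uvh step.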
Step THREE - what, another step? I thought it was installed already! Ah but we are yet to configure it for our use.
- Your tinyproxy.conf file should be installed in the /etc folder, but you can find out with this command
Code:
whereis tinyproxy
- Open up the conf file in a text editor; nano is simple and good enough for this
Code:
nano /etc/tinyproxy.conf
- This should bring up the configuration file. Now I'll paste my current conf file and mark the changes that you need to make within [square brackets]. DO NOT paste these comments into your conf file. Here's the conf file
Code:
##
## tinyproxy.conf -- tinyproxy daemon configuration file
##
## This example tinyproxy.conf file contains example settings
## with explanations in comments. For descriptions of all
## parameters, see the tinyproxy.conf(5) manual page.
##

#
# User/Group: This allows you to set the user and group that will be
# used for tinyproxy after the initial binding to the port has been done
# as the root user. Either the user or group name or the UID or GID
# number may be used.
#
User tiny [user and group might be different in your installation]
Group tiny [as long as it's not root it doesn't matter, but it's better to have a user and group that does not have shell access]

#
# Port: Specify the port which tinyproxy will listen on. Please note
# that should you choose to run on a port lower than 1024 you will need
# to start tinyproxy using root.
#
Port 8080 [my preference, you can set it to any other port of your choice, but don't go for a port below 1024]

#
# Listen: If you have multiple interfaces this allows you to bind to
# only one. If this is commented out, tinyproxy will bind to all
# interfaces present.
#
#Listen 192.168.0.1

#
# Bind: This allows you to specify which interface will be used for
# outgoing connections. This is useful for multi-home'd machines where
# you want all traffic to appear outgoing from one particular interface.
#
#Bind 192.168.0.1

#
# BindSame: If enabled, tinyproxy will bind the outgoing connection to the
# ip address of the incoming connection.
#
BindSame yes [important to set to yes, easier management of multi IP proxies]

#
# Timeout: The maximum number of seconds of inactivity a connection is
# allowed to have before it is closed by tinyproxy.
#
Timeout 600

#
# ErrorFile: Defines the HTML file to send when a given HTTP error
# occurs. You will probably need to customize the location to your
# particular install. The usual locations to check are:
#   /usr/local/share/tinyproxy
#   /usr/share/tinyproxy
#   /etc/tinyproxy
#
#ErrorFile 404 "/var/tinyproxy/share/tinyproxy/404.html"
#ErrorFile 400 "/var/tinyproxy/share/tinyproxy/400.html"
#ErrorFile 503 "/var/tinyproxy/share/tinyproxy/503.html"
#ErrorFile 403 "/var/tinyproxy/share/tinyproxy/403.html"
#ErrorFile 408 "/var/tinyproxy/share/tinyproxy/408.html"

#
# DefaultErrorFile: The HTML file that gets sent if there is no
# HTML file defined with an ErrorFile keyword for the HTTP error
# that has occurred.
#
DefaultErrorFile "/var/tinyproxy/share/tinyproxy/default.html" [might vary in your case, my installation prefix was "/var/tinyproxy" and I installed from source, you can find your installation directory from the whereis tinyproxy command]

#
# StatHost: This configures the host name or IP address that is treated
# as the stat host: Whenever a request for this host is received,
# Tinyproxy will return an internal statistics page instead of
# forwarding the request to that host. The default value of StatHost is
# tinyproxy.stats.
#
#StatHost "tinyproxy.stats"
#

#
# StatFile: The HTML file that gets sent when a request is made
# for the stathost. If this file doesn't exist a basic page is
# hardcoded in tinyproxy.
#
StatFile "/var/tinyproxy/share/tinyproxy/stats.html" [your file path may vary, common directories are /usr/share/tinyproxy/ and /usr/local/share/tinyproxy/]

#
# LogFile: Allows you to specify the location where information should
# be logged to. If you would prefer to log to syslog, then disable this
# and enable the Syslog directive. These directives are mutually
# exclusive.
#
#LogFile "/var/tinyproxy/var/log/tinyproxy.log"

#
# Syslog: Tell tinyproxy to use syslog instead of a logfile. This
# option must not be enabled if the Logfile directive is being used.
# These two directives are mutually exclusive.
#
Syslog On

#
# LogLevel:
#
# Set the logging level. Allowed settings are:
#   Critical (least verbose)
#   Error
#   Warning
#   Notice
#   Connect (to log connections without Info's noise)
#   Info (most verbose)
#
# The LogLevel logs from the set level and above. For example, if the
# LogLevel was set to Warning, then all log messages from Warning to
# Critical would be output, but Notice and below would be suppressed.
#
LogLevel Warning [might want to set this to info at the beginning to see all the connections, but remember to change back to warning level, otherwise your log files will be cluttered up]

#
# PidFile: Write the PID of the main tinyproxy thread to this file so it
# can be used for signalling purposes.
#
PidFile "/var/tinyproxy/var/run/tinyproxy.pid" [again, find your directory, most probably would be under /var/run/; you will know the pid file location by opening up the startup script in nano, it is at /etc/init.d/tinyproxy]

#
# XTinyproxy: Tell Tinyproxy to include the X-Tinyproxy header, which
# contains the client's IP address.
#
XTinyproxy No

#
# Upstream:
#
# Turns on upstream proxy support.
#
# The upstream rules allow you to selectively route upstream connections
# based on the host/domain of the site being accessed.
#
# For example:
#  # connection to test domain goes through testproxy
#  upstream testproxy:8008 ".test.domain.invalid"
#  upstream testproxy:8008 ".our_testbed.example.com"
#  upstream testproxy:8008 "192.168.128.0/255.255.254.0"
#
#  # no upstream proxy for internal websites and unqualified hosts
#  no upstream ".internal.example.com"
#  no upstream "www.example.com"
#  no upstream "10.0.0.0/8"
#  no upstream "192.168.0.0/255.255.254.0"
#  no upstream "."
#
#  # connection to these boxes go through their DMZ firewalls
#  upstream cust1_firewall:8008 "testbed_for_cust1"
#  upstream cust2_firewall:8008 "testbed_for_cust2"
#
#  # default upstream is internet firewall
#  upstream firewall.internal.example.com:80
#
# The LAST matching rule wins the route decision. As you can see, you
# can use a host, or a domain:
#  name     matches host exactly
#  .name    matches any host in domain "name"
#  .        matches any host with no domain (in 'empty' domain)
#  IP/bits  matches network/mask
#  IP/mask  matches network/mask
#
#Upstream some.remote.proxy:port

#
# MaxClients: This is the absolute highest number of threads which will
# be created. In other words, only MaxClients number of clients can be
# connected at the same time.
#
MaxClients 9 [if you will be running more than 9 concurrent threads using your proxy server set this higher]

#
# MinSpareServers/MaxSpareServers: These settings set the upper and
# lower limit for the number of spare servers which should be available.
#
# If the number of spare servers falls below MinSpareServers then new
# server processes will be spawned. If the number of servers exceeds
# MaxSpareServers then the extras will be killed off.
#
MinSpareServers 1
MaxSpareServers 1

#
# StartServers: The number of servers to start initially.
#
StartServers 1

#
# MaxRequestsPerChild: The number of connections a thread will handle
# before it is killed. In practise this should be set to 0, which
# disables thread reaping. If you do notice problems with memory
# leakage, then set this to something like 10000.
#
MaxRequestsPerChild 0

#
# Allow: Customization of authorization controls. If there are any
# access control keywords then the default action is to DENY. Otherwise,
# the default action is ALLOW.
#
# The order of the controls are important. All incoming connections are
# tested against the controls based on order.
#
Allow XXX.XXX.XXX.XXX [Important: set this to your home IP address, this will complement our firewall security measure. If your firewall does not block access to your proxy port AND you don't specify any IP address here this will be an open proxy i.e. anyone can get access to your not-so-private proxy]

#
# AddHeader: Adds the specified headers to outgoing HTTP requests that
# Tinyproxy makes. Note that this option will not work for HTTPS
# traffic, as Tinyproxy has no control over what headers are exchanged.
#
#AddHeader "X-My-Header" "Powered by Tinyproxy"

#
# ViaProxyName: The "Via" header is required by the HTTP RFC, but using
# the real host name is a security concern. If the following directive
# is enabled, the string supplied will be used as the host name in the
# Via header; otherwise, the server's host name will be used.
#
ViaProxyName "tinyproxy"

#
# DisableViaHeader: When this is set to yes, Tinyproxy does NOT add
# the Via header to the requests. This virtually puts Tinyproxy into
# stealth mode. Note that RFC 2616 requires proxies to set the Via
# header, so by enabling this option, you break compliance.
# Don't disable the Via header unless you know what you are doing...
#
DisableViaHeader Yes [this option might be missing from your copy, it's available in the recent versions though. This turns the proxy server into more anonymous mode allowing it to pass whatismyipaddress dot com proxy tests and others]

#
# Filter: This allows you to specify the location of the filter file.
#
#Filter "/var/tinyproxy/etc/filter"

#
# FilterURLs: Filter based on URLs rather than domains.
#
#FilterURLs On

#
# FilterExtended: Use POSIX Extended regular expressions rather than
# basic.
#
#FilterExtended On

#
# FilterCaseSensitive: Use case sensitive regular expressions.
#
#FilterCaseSensitive On

#
# FilterDefaultDeny: Change the default policy of the filtering system.
# If this directive is commented out, or is set to "No" then the default
# policy is to allow everything which is not specifically denied by the
# filter file.
#
# However, by setting this directive to "Yes" the default policy becomes
# to deny everything which is _not_ specifically allowed by the filter
# file.
#
#FilterDefaultDeny Yes

#
# Anonymous: If an Anonymous keyword is present, then anonymous proxying
# is enabled. The headers listed are allowed through, while all others
# are denied. If no Anonymous keyword is present, then all headers are
# allowed through. You must include quotes around the headers.
#
# Most sites require cookies to be enabled for them to work correctly, so
# you will need to allow Cookies through if you access those sites.
#
#Anonymous "Host"
#Anonymous "Authorization"
#Anonymous "Cookie"
Anonymous "Accept"
Anonymous "Accept-Charset"
Anonymous "Accept-Encoding"
Anonymous "Accept-Language"
Anonymous "Authorization"
Anonymous "Cache-Control"
Anonymous "Connection"
Anonymous "Content-Length"
Anonymous "Content-Type"
Anonymous "Cookie"
Anonymous "Date"
Anonymous "Expect"
Anonymous "Host"
Anonymous "If-Match"
Anonymous "If-Modified-Since"
Anonymous "If-None-Match"
Anonymous "If-Range"
Anonymous "If-Unmodified-Since"
Anonymous "Pragma"
Anonymous "Range"
Anonymous "TE"
Anonymous "Upgrade"

#
# ConnectPort: This is a list of ports allowed by tinyproxy when the
# CONNECT method is used. To disable the CONNECT method altogether, set
# the value to 0. If no ConnectPort line is found, all ports are
# allowed (which is not very secure.)
#
# The following two ports are used by SSL.
#
#ConnectPort 443
#ConnectPort 563

#
# Configure one or more ReversePath directives to enable reverse proxy
# support. With reverse proxying it's possible to make a number of
# sites appear as if they were part of a single site.
#
# If you uncomment the following two directives and run tinyproxy
# on your own computer at port 8888, you can access Google using
# http://localhost:8888/google/ and Wired News using
# http://localhost:8888/wired/news/. Neither will actually work
# until you uncomment ReverseMagic as they use absolute linking.
#
#ReversePath "/google/" "http://www.google.com/"
#ReversePath "/wired/" "http://www.wired.com/"

#
# When using tinyproxy as a reverse proxy, it is STRONGLY recommended
# that the normal proxy is turned off by uncommenting the next directive.
#
#ReverseOnly Yes

#
# Use a cookie to track reverse proxy mappings. If you need to reverse
# proxy sites which have absolute links you must uncomment this.
#
#ReverseMagic Yes

#
# The URL that's used to access this reverse proxy. The URL is used to
# rewrite HTTP redirects so that they won't escape the proxy. If you
# have a chain of reverse proxies, you'll need to put the outermost
# URL here (the address which the end user types into his/her browser).
#
# If not set then no rewriting occurs.
#
#ReverseBaseURL "http://localhost:8888/"
- After you make the changes in nano text editor, hit Ctrl+X to exit and press 'y' to save the file
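If you pasted my conf file with the [bracketed notes] still in it, or just want a quick sanity check before starting the daemon, a rough sketch like the one below may save some head-scratching. All three function names are made up for this post. The check only looks at the Port and Allow lines, since a missing Allow line is what turns this into an open proxy.

```shell
#!/bin/sh
# Sketch: clean up and sanity-check a tinyproxy.conf.
# All helper names here are hypothetical, not part of tinyproxy.

# strip_annotations: remove a trailing "[...]" note from directive lines.
# Lines that start with "#" are comments and are left untouched.
strip_annotations() {
    sed -e '/^[[:space:]]*#/!s/[[:space:]]*\[[^]]*][[:space:]]*$//'
}

# conf_value FILE KEY: print the first value set for KEY (case-insensitive).
# Commented-out directives such as "#Listen" never match.
conf_value() {
    awk -v key="$2" 'tolower($1) == tolower(key) { print $2; exit }' "$1"
}

# check_conf FILE: fail if there is no Port, or no Allow line at all.
check_conf() {
    conf=$1
    port=$(conf_value "$conf" Port)
    allow=$(conf_value "$conf" Allow)
    if [ -z "$port" ]; then
        echo "no Port directive found"
        return 1
    fi
    if [ -z "$allow" ]; then
        echo "WARNING: no Allow line; port $port may be an open proxy"
        return 1
    fi
    echo "Port $port, access allowed from $allow"
}
```

For example, `strip_annotations < pasted.conf > /etc/tinyproxy.conf` followed by `check_conf /etc/tinyproxy.conf`, where pasted.conf is whatever file you saved the annotated copy to. Keep a backup of the original conf first.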
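Step FOUR - Now start tinyproxy with
Code:
service tinyproxy start
or
Code:
/etc/init.d/tinyproxy start
You should see new entries in your running processes. Also verify that tinyproxy is listening on the port you specified in the conf file by issuing this command
Code:
netstat -pnatu
In my case this line shows up
tcp 0 0 :::8080 :::* LISTEN 3806/tinyproxy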
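Rather than eyeballing the netstat output, you could filter it with a small sketch like this one. The function name is mine. It assumes the usual column layout (local address in column 4, state in column 6, PID/program in column 7), and netstat has to be run as root for the program name to show up.

```shell
#!/bin/sh
# Sketch: check netstat output for a listening tinyproxy.

# is_listening PORT [PROCESS] (hypothetical helper): reads `netstat -pnatu`
# output on stdin and succeeds if PROCESS (default: tinyproxy) has a TCP
# socket in LISTEN state on PORT.
is_listening() {
    awk -v port="$1" -v proc="${2:-tinyproxy}" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The port is whatever follows the last ":" in the local
            # address, which covers both 0.0.0.0:8080 and :::8080.
            n = split($4, parts, ":")
            if (parts[n] == port && index($7, "/" proc) > 0) found = 1
        }
        END { exit found ? 0 : 1 }'
}
```

Usage would be along the lines of `netstat -pnatu | is_listening 8080 && echo "tinyproxy is up"`. If it stays silent, check the Port line in the conf file and look in syslog for startup errors.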
Step FIVE - Hopefully the proxy server is up and running now. Go and point your browser proxy setting to the IP address and port of the VPS/proxy, and navigate to whatismyipaddress dot com. It should show your proxy address instead of your home address. Also check their advanced proxy check page at whatismyipaddress dot com / proxy-check, the results should be all False (no proxy detected).
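The same check can be done from a shell with curl instead of a browser. This is only a sketch. The helper names are made up, and IP_ECHO_URL is an assumption: any service that returns the caller's address as plain text will do, and api.ipify.org is used purely as an example.

```shell
#!/bin/sh
# Sketch: command-line version of the browser check in Step FIVE.
# ASSUMPTION: IP_ECHO_URL returns the caller's public address as plain
# text. Swap in any similar service you trust.
IP_ECHO_URL=${IP_ECHO_URL:-http://api.ipify.org}

# fetch_ip [PROXY_URL] (hypothetical helper): print the address the echo
# service sees, optionally going through a proxy given as http://host:port.
fetch_ip() {
    if [ -n "$1" ]; then
        curl -s --max-time 15 -x "$1" "$IP_ECHO_URL"
    else
        curl -s --max-time 15 "$IP_ECHO_URL"
    fi
}

# verdict DIRECT_IP PROXIED_IP EXPECTED_PROXY_IP (hypothetical helper):
# compare the address seen without and with the proxy.
verdict() {
    direct=$1
    proxied=$2
    expected=$3
    if [ -z "$proxied" ]; then
        echo "FAIL: no response through the proxy"
        return 1
    elif [ "$proxied" = "$direct" ]; then
        echo "FAIL: home address is still visible"
        return 1
    elif [ "$proxied" = "$expected" ]; then
        echo "OK: site sees the proxy address"
    else
        echo "WARN: site sees $proxied, expected $expected"
    fi
}
```

From your home machine, something like `verdict "$(fetch_ip)" "$(fetch_ip http://203.0.113.10:8080)" 203.0.113.10`, with 203.0.113.10 swapped for your VPS address. This only compares addresses. It does not replace the header-based proxy check on the whatismyipaddress page.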
Congrats! You've just configured a highly anonymous, secure, and private proxy for yourself. If it didn't work for any reason, not to worry, just post here or PM me and we'll sort it out together.