
[METHOD] Clone any site!! [With bonuses!]

Discussion in 'Black Hat SEO' started by royserpa, Aug 22, 2013.

  1. royserpa

    royserpa Jr. VIP Premium Member

    Joined:
    Sep 28, 2011
    Messages:
    4,649
    Likes Received:
    3,494
    Gender:
    Male
    Occupation:
    Negative Options aka Rebills!
    Location:
    Royserpa
    Home Page:
    Hey bhw members,

    How are ya?

    Hopefully banking hard :)

    Ok enough talk.

    I have read so many threads about how to clone a site or people asking how to do it.

    Many have gone with HTTrack and other software. Personally, I prefer a good, robust command-line tool even if it has no GUI.

    Well, many of you already know it, but for the ones that don't, I present to you WGET!

    Basically, wget is:

    Code:
    GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive commandline tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
    
    And the best part is that it can be run on many OSs such as Linux and Windows (it works on Mac too).

    The Method:

    First, you have to decide where you want to run WGET. It could be your server or your PC.

    Of course, if you are going to clone/copy a site to make it your own, you want to use your server instead of your PC.

    First, let's start with Windows:

    You will want to download the wget installer from http://superb-dca3.dl.sourceforge.net/project/gnuwin32/wget/1.11.4-1/wget-1.11.4-1-setup.exe (or from http://gnuwin32.sourceforge.net/packages/wget.htm), save it to C:\wget and install it into that same directory.

    Now you will have the following folder on C:\

    Code:
    C:\wget\GnuWin32
    
    Now you want to start the Windows terminal, which you can do in either of these ways:

    1) Press Win+R, type "cmd" without quotes and press Enter
    Or
    2) Start => Run => type "cmd" without quotes => press Enter.

    Now you will see something like this:

    [screenshot: Windows command prompt window]

    And you are ready to go!

    Now let's start the fun part!

    First, type the following and press Enter:

    Code:
    cd C:\wget\GnuWin32\bin
    
    From there you can use wget any way you want!
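
    By the way, if you don't want to cd into that folder every time, one common way is to add it to your PATH. This is just a sketch and assumes you installed to the default C:\wget\GnuWin32 location from above (new terminals will pick it up):

    Code:
    setx PATH "%PATH%;C:\wget\GnuWin32\bin"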

    Now let's download a whole site using wget!

    Just type:

    Code:
    wget -r -m http://www.cpa10k.com/
    
    And voilà! You just downloaded that whole site!
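
    If you also want the copy to be browsable offline, a variant like this should work (just a sketch; -p pulls the images/CSS/JS each page needs and -k rewrites links to point at your local copies, both are listed in "wget --help"):

    Code:
    wget -r -m -p -k http://www.cpa10k.com/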

    Now, let's go with Linux Servers!

    First, make sure you have SSH or terminal access and connect to your VPS/server/Linux machine/etc.

    Now that you are connected, make sure you have wget installed, by typing:

    Code:
    wget
    If it says something like "command not found", simply install it (if you have root access) with the command:

    Code:
    yum install wget
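    If your server runs Debian/Ubuntu instead of a yum-based distro like CentOS, the equivalent command is:

    Code:
    apt-get install wget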
    Once you are sure wget is installed, we can proceed with a very easy example of how to use it.

    First, let's go to the temp folder by typing:

    Code:
    cd /tmp
    Now let's copy a good little site by typing:

    Code:
    mkdir wtest; cd wtest; wget -m -r http://gnuwin32.sourceforge.net/packages/wget.htm; ls
    Now you will see all of the downloaded files from that URL like:

    [screenshot: listing of the downloaded files]
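
    If the site you are mirroring starts blocking or throttling you, you can slow wget down and change how it identifies itself. This is a rough sketch using the -w, --random-wait and -U options (all in "wget --help"); the exact values are up to you:

    Code:
    wget -m -r -w 2 --random-wait -U "Mozilla/5.0" http://gnuwin32.sourceforge.net/packages/wget.htm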

    Now go and clone any site you want!!!

    Of course there is software out there that does this, but its usage is limited, so I suggest following the steps above :)

    Bonus!!

    As a "bonus", I will clone/copy 10-20 sites from the first people that post what site they would like to clone!

    If you need any help, DON'T PM ME!! Just post below and I will solve your problems asap! It's easier for everyone!

    So that's it, BHW!

    Enjoy cloning/copying your sites!
     
    • Thanks x 5
  2. CosmicSoundz

    CosmicSoundz BANNED

    Joined:
    Apr 30, 2012
    Messages:
    1,230
    Likes Received:
    1,296
    What can this do that HTTrack can't?
     
    • Thanks x 1
  3. dinkish

    dinkish Power Member

    Joined:
    Apr 19, 2013
    Messages:
    689
    Likes Received:
    159
    Probably a good reason I block this sort of remote request with .htaccess.
     
  4. fmOzilla

    fmOzilla Power Member

    Joined:
    Nov 11, 2011
    Messages:
    714
    Likes Received:
    384
    Location:
    C:\Windows\System32
    Gonna try this... can it only do HTML sites?
     
  5. royserpa

    royserpa Jr. VIP Premium Member

    Joined:
    Sep 28, 2011
    Messages:
    4,649
    Likes Received:
    3,494
    Gender:
    Male
    Occupation:
    Negative Options aka Rebills!
    Location:
    Royserpa
    Home Page:
    I like it better :D
    And I can tell wget what I want, how I want it, and where I want it, ALL in a single command line :)
     
  6. dinkish

    dinkish Power Member

    Joined:
    Apr 19, 2013
    Messages:
    689
    Likes Received:
    159
    It's only requesting information, so yes, you'll get whatever the server gives you, not the server-side scripts.
     
  7. bmills

    bmills Junior Member

    Joined:
    Jan 14, 2010
    Messages:
    141
    Likes Received:
    78
    Location:
    Southern California
    Home Page:
    File -> Save As works well too if you just want a single page. But yeah, for entire sites this is a good method :) wget is a great tool I use all the time on my elementary OS partition. For non-terminal peeps HTTrack is good and works.
     
  8. dinkish

    dinkish Power Member

    Joined:
    Apr 19, 2013
    Messages:
    689
    Likes Received:
    159
    Heh, got paranoid for a second there, had to double-check my file.

    Not foolproof, but it makes me feel better knowing this one's on the list. Good reminder!
     
  9. pxoxrxn

    pxoxrxn Supreme Member

    Joined:
    Dec 21, 2011
    Messages:
    1,397
    Likes Received:
    2,066
    I don't understand what this does that copy+pasting the page source doesn't. Can I copy a whole website and its file structure?

    Also, using those commands in the Windows cmd prompt is so shit. Linux has a much better command-line system.
     
  10. dinkish

    dinkish Power Member

    Joined:
    Apr 19, 2013
    Messages:
    689
    Likes Received:
    159
    Copy paste won't download any files at all actually, it'll simply transfer plain text to memory.
     
  11. pxoxrxn

    pxoxrxn Supreme Member

    Joined:
    Dec 21, 2011
    Messages:
    1,397
    Likes Received:
    2,066
    I tried this method and it just got the index.html and that's it. Is there a way to get more? Or is it just good for informational websites and not so much interactive websites?
     
  12. dinkish

    dinkish Power Member

    Joined:
    Apr 19, 2013
    Messages:
    689
    Likes Received:
    159
    Did you type in both the -r and -m options? You should have downloaded everything that was "linked" internally: HTML/CSS/images/JS/SWF... I believe.
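
    If it still only grabs index.html, the site's robots.txt may be telling wget not to follow the links. Something like this (just a sketch, with example.com standing in for whatever site you're grabbing) tells wget to ignore robots.txt and also pull each page's requisites:

    Code:
    wget -r -m -p -e robots=off http://www.example.com/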
     
  13. dinkish

    dinkish Power Member

    Joined:
    Apr 19, 2013
    Messages:
    689
    Likes Received:
    159
    If you're requesting something that is generated by server-side scripting (PHP, ASP, whatever), you'll only get static HTML. This is just an automated way of caching sites while keeping their directory/file structure intact.
     
  14. rayee

    rayee Power Member

    Joined:
    Jun 21, 2012
    Messages:
    548
    Likes Received:
    66
    Can you clone complicated sites like storenvy.com? Or charge a fee for it?
     
  15. dinkish

    dinkish Power Member

    Joined:
    Apr 19, 2013
    Messages:
    689
    Likes Received:
    159
    I don't see this being effective for AJAX either, as I believe that was meant to tie in server-side scripting in a less obtrusive way for the end user.
     
  16. royserpa

    royserpa Jr. VIP Premium Member

    Joined:
    Sep 28, 2011
    Messages:
    4,649
    Likes Received:
    3,494
    Gender:
    Male
    Occupation:
    Negative Options aka Rebills!
    Location:
    Royserpa
    Home Page:
    The -r and -m options basically mirror the whole URL recursively.

    There are WAAAAY more options and configurations for wget.

    Here's a list of the options you can use (you can see the same list with the command "wget --help", without quotes):

    Code:
    Microsoft Windows [Version 6.1.7601]
    Copyright (c) 2009 Microsoft Corporation. All rights reserved.
    
    
    C:\Users\o>cd C:\wget\GnuWin32\bin
    
    
    C:\wget\GnuWin32\bin>wget --help
    SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
    syswgetrc = C:\wget\GnuWin32/etc/wgetrc
    GNU Wget 1.11.4, a non-interactive network retriever.
    Usage: wget [OPTION]... [URL]...
    
    
    Mandatory arguments to long options are mandatory for short options too.
    
    
    Startup:
      -V,  --version           display the version of Wget and exit.
      -h,  --help              print this help.
      -b,  --background        go to background after startup.
      -e,  --execute=COMMAND   execute a `.wgetrc'-style command.
    
    
    Logging and input file:
      -o,  --output-file=FILE    log messages to FILE.
      -a,  --append-output=FILE  append messages to FILE.
      -d,  --debug               print lots of debugging information.
      -q,  --quiet               quiet (no output).
      -v,  --verbose             be verbose (this is the default).
      -nv, --no-verbose          turn off verboseness, without being quiet.
      -i,  --input-file=FILE     download URLs found in FILE.
      -F,  --force-html          treat input file as HTML.
      -B,  --base=URL            prepends URL to relative links in -F -i file.
    
    
    Download:
      -t,  --tries=NUMBER            set number of retries to NUMBER (0 unlimits).
           --retry-connrefused       retry even if connection is refused.
      -O,  --output-document=FILE    write documents to FILE.
      -nc, --no-clobber              skip downloads that would download to
                                     existing files.
      -c,  --continue                resume getting a partially-downloaded file.
           --progress=TYPE           select progress gauge type.
      -N,  --timestamping            don't re-retrieve files unless newer than
                                     local.
      -S,  --server-response         print server response.
           --spider                  don't download anything.
      -T,  --timeout=SECONDS         set all timeout values to SECONDS.
           --dns-timeout=SECS        set the DNS lookup timeout to SECS.
           --connect-timeout=SECS    set the connect timeout to SECS.
           --read-timeout=SECS       set the read timeout to SECS.
      -w,  --wait=SECONDS            wait SECONDS between retrievals.
           --waitretry=SECONDS       wait 1..SECONDS between retries of a retrieval.
    
    
           --random-wait             wait from 0...2*WAIT secs between retrievals.
           --no-proxy                explicitly turn off proxy.
      -Q,  --quota=NUMBER            set retrieval quota to NUMBER.
           --bind-address=ADDRESS    bind to ADDRESS (hostname or IP) on local host.
    
    
           --limit-rate=RATE         limit download rate to RATE.
           --no-dns-cache            disable caching DNS lookups.
           --restrict-file-names=OS  restrict chars in file names to ones OS allows.
    
    
           --ignore-case             ignore case when matching files/directories.
      -4,  --inet4-only              connect only to IPv4 addresses.
      -6,  --inet6-only              connect only to IPv6 addresses.
           --prefer-family=FAMILY    connect first to addresses of specified family,
    
    
                                     one of IPv6, IPv4, or none.
           --user=USER               set both ftp and http user to USER.
           --password=PASS           set both ftp and http password to PASS.
    
    
    Directories:
      -nd, --no-directories           don't create directories.
      -x,  --force-directories        force creation of directories.
      -nH, --no-host-directories      don't create host directories.
           --protocol-directories     use protocol name in directories.
      -P,  --directory-prefix=PREFIX  save files to PREFIX/...
           --cut-dirs=NUMBER          ignore NUMBER remote directory components.
    
    
    HTTP options:
           --http-user=USER        set http user to USER.
           --http-password=PASS    set http password to PASS.
           --no-cache              disallow server-cached data.
      -E,  --html-extension        save HTML documents with `.html' extension.
           --ignore-length         ignore `Content-Length' header field.
           --header=STRING         insert STRING among the headers.
           --max-redirect          maximum redirections allowed per page.
           --proxy-user=USER       set USER as proxy username.
           --proxy-password=PASS   set PASS as proxy password.
           --referer=URL           include `Referer: URL' header in HTTP request.
           --save-headers          save the HTTP headers to file.
      -U,  --user-agent=AGENT      identify as AGENT instead of Wget/VERSION.
           --no-http-keep-alive    disable HTTP keep-alive (persistent connections).
    
    
           --no-cookies            don't use cookies.
           --load-cookies=FILE     load cookies from FILE before session.
           --save-cookies=FILE     save cookies to FILE after session.
           --keep-session-cookies  load and save session (non-permanent) cookies.
           --post-data=STRING      use the POST method; send STRING as the data.
           --post-file=FILE        use the POST method; send contents of FILE.
           --content-disposition   honor the Content-Disposition header when
                                   choosing local file names (EXPERIMENTAL).
           --auth-no-challenge     Send Basic HTTP authentication information
                                   without first waiting for the server's
                                   challenge.
    
    
    HTTPS (SSL/TLS) options:
           --secure-protocol=PR     choose secure protocol, one of auto, SSLv2,
                                    SSLv3, and TLSv1.
           --no-check-certificate   don't validate the server's certificate.
           --certificate=FILE       client certificate file.
           --certificate-type=TYPE  client certificate type, PEM or DER.
           --private-key=FILE       private key file.
           --private-key-type=TYPE  private key type, PEM or DER.
           --ca-certificate=FILE    file with the bundle of CA's.
           --ca-directory=DIR       directory where hash list of CA's is stored.
           --random-file=FILE       file with random data for seeding the SSL PRNG.
           --egd-file=FILE          file naming the EGD socket with random data.
    
    
    FTP options:
           --ftp-user=USER         set ftp user to USER.
           --ftp-password=PASS     set ftp password to PASS.
           --no-remove-listing     don't remove `.listing' files.
           --no-glob               turn off FTP file name globbing.
           --no-passive-ftp        disable the "passive" transfer mode.
           --retr-symlinks         when recursing, get linked-to files (not dir).
           --preserve-permissions  preserve remote file permissions.
    
    
    Recursive download:
      -r,  --recursive          specify recursive download.
      -l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
           --delete-after       delete files locally after downloading them.
      -k,  --convert-links      make links in downloaded HTML point to local files.
      -K,  --backup-converted   before converting file X, back up as X.orig.
      -m,  --mirror             shortcut for -N -r -l inf --no-remove-listing.
      -p,  --page-requisites    get all images, etc. needed to display HTML page.
           --strict-comments    turn on strict (SGML) handling of HTML comments.
    
    
    Recursive accept/reject:
      -A,  --accept=LIST               comma-separated list of accepted extensions.
      -R,  --reject=LIST               comma-separated list of rejected extensions.
      -D,  --domains=LIST              comma-separated list of accepted domains.
           --exclude-domains=LIST      comma-separated list of rejected domains.
           --follow-ftp                follow FTP links from HTML documents.
           --follow-tags=LIST          comma-separated list of followed HTML tags.
           --ignore-tags=LIST          comma-separated list of ignored HTML tags.
      -H,  --span-hosts                go to foreign hosts when recursive.
      -L,  --relative                  follow relative links only.
      -I,  --include-directories=LIST  list of allowed directories.
      -X,  --exclude-directories=LIST  list of excluded directories.
      -np, --no-parent                 don't ascend to the parent directory.
    
    
    Mail bug reports and suggestions to <bug-wget@gnu.org>.
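
    For example, you could combine a few of those options to keep a mirror inside one directory, skip big binaries and go easy on your bandwidth. Just a sketch (the URL is made up), adjust it to whatever you need:

    Code:
    wget -m -np -k -R zip,exe --limit-rate=200k http://www.example.com/blog/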
    
     
  17. formosa

    formosa Regular Member

    Joined:
    Apr 30, 2008
    Messages:
    309
    Likes Received:
    27
    Home Page:
    looks awesome ... i would like a site :)
     
  18. DebtFreeMe

    DebtFreeMe Regular Member

    Joined:
    Mar 14, 2010
    Messages:
    418
    Likes Received:
    363
    Occupation:
    Military
    Location:
    Earth
    Do you know any way to also get .php files with the copy and not just the output?
     
  19. dinkish

    dinkish Power Member

    Joined:
    Apr 19, 2013
    Messages:
    689
    Likes Received:
    159
    You can't without server access. There's nothing lawful you could do to achieve this, really.

    PHP is a server-side script, meaning what you get client-side is already its output; everything that runs before that output stays on the server.
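
    A quick way to see that for yourself (a sketch; example.com/page.php is made up, swap in whatever page you're curious about): dump the response to your terminal with -O - and you'll only ever see the rendered HTML, never the PHP source.

    Code:
    wget -O - http://www.example.com/page.php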
     
  20. khanhdum

    khanhdum Regular Member Premium Member

    Joined:
    Jul 29, 2009
    Messages:
    260
    Likes Received:
    64
    Can you please clone me Facebook? Thanks.
     
    • Thanks x 3