How to download your website using WGET for Windows

By richardbaxterseo

Ever had that terrifying feeling you’ve lost your blog? Perhaps your WordPress installation got hacked, or your web hosts royally screwed up with a “database upgrade”. Either way, there’s an almost infinite array of reasons to download and back up a copy of your website, and precisely zero reasons to neglect doing it.

If you’re a Linux user, there are lots of guides out there on how to use WGET, the free network utility for retrieving files from the World Wide Web over HTTP and FTP, but I couldn’t find any for doing so on Windows. Unless you fancy installing Ubuntu or Crunchbang, here’s a handy guide to downloading your site using WGET in Windows.

1) Download WGET

Download wget.exe and save it to your desktop. You can get wget.exe here. I recommend downloading WGET for Windows (win32) from Ugent.be as it’s the most up-to-date version I could find. For info, you can also get WGET from Brothersoft, but avoid the WGET for Windows download page, because its installer doesn’t work with Windows Vista.

2) Make WGET a command you can run from anywhere in the Command Prompt

If you want to be able to run WGET from any directory inside the command terminal, you’ll need to learn about the path command to work out where to copy your new executable.

First, open a command terminal by selecting “run” in the start menu (if you’re using Windows XP) and typing “cmd”. If you’re running Windows Vista go to “All Programs > Accessories > Command Prompt” from the start bar. You’ll see something that looks like this:

[Image: terminal]

We’re going to move wget.exe into a Windows directory that will allow WGET to be run from anywhere. First, we need to find out which directory that should be. Type path into the command prompt to find out:

[Image: command prompt path command]

The “Path” environment variable lists the directories Windows searches for executables, so copying wget.exe into any one of them makes WGET runnable from anywhere. On a typical installation that means either the C:\Windows\System32\ directory or the C:\Windows\ directory. Go ahead and copy wget.exe to one of the directories you see in your Command Prompt.
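If you'd rather not read the path output by eye, here's a rough Python sketch of the same check. It assumes Python is installed, and the helper names path_directories and directories_containing are made up for this example; they are not part of WGET or Windows.

```python
import os


def path_directories(path_value=None):
    """Split a PATH-style string into its directories, skipping empty entries."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(os.pathsep) if entry]


def directories_containing(filename, directories):
    """Return the directories from the list that hold a file with this name."""
    return [d for d in directories if os.path.isfile(os.path.join(d, filename))]


if __name__ == "__main__":
    # Print every PATH directory that already contains wget.exe.
    for directory in directories_containing("wget.exe", path_directories()):
        print(directory)
```

If nothing is printed, wget.exe isn't in any directory on your path yet.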

3) Restart terminal and test WGET

To check that WGET is working properly, restart your terminal and type:

wget -h

If you’ve copied the file to the right place, you’ll see the help text appear, listing all of the available options.
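The same test can be scripted. This is only a sketch, assuming Python is installed; wget_is_runnable is a name I've invented for the example. It uses WGET's --version option rather than -h simply because the output is shorter.

```python
import shutil
import subprocess


def wget_is_runnable():
    """Return True if a wget executable is on PATH and exits cleanly for --version."""
    executable = shutil.which("wget")
    if executable is None:
        return False
    try:
        result = subprocess.run(
            [executable, "--version"], capture_output=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("wget is ready" if wget_is_runnable() else "wget was not found on PATH")
```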

4) Make a directory to download your site to

Since we’ll be working in Command Prompt, let’s create a download directory just for WGET downloads. (If you’re familiar with Command Prompt basics, just skip this step.) Change to C:\ (cd \) and use md (make directory) to create it, for example md site-download:

[Image: make directory]

Change (cd site-download) to your new directory and you’re ready to do some downloading!
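If you end up scripting your backups, the directory step could look something like this in Python. It's a sketch only: make_download_dir is a hypothetical helper, and site-download is just the directory name used in this post.

```python
from pathlib import Path


def make_download_dir(parent, name="site-download"):
    """Create parent/name if it does not already exist and return its path."""
    target = Path(parent) / name
    target.mkdir(parents=True, exist_ok=True)
    return target
```

On Windows you might call it as make_download_dir("C:/"). Running it twice is harmless because an existing directory is left alone.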

5) Download your site using WGET

Ok, the fun bit begins. Once you’ve got WGET installed and you’ve created a new directory, all you have to do is learn some of the finer points of WGET arguments to make sure you get what you need.

I found two particularly useful resources for WGET usage: the Gnu.org WGET Manual and About.com’s Linux WGET guide are definitely the best.

After some research I came up with a set of WGET options to recursively mirror your site, download all the images, CSS and JavaScript, localise all of the URLs (so the site works on your local machine), and save all the pages as .html files.

To mirror your site:

wget -r http://www.yoursite.com

To mirror the site and localise all of the urls:

wget --convert-links -r http://www.yoursite.com

To mirror the site and save the files as .html:

wget --html-extension -r http://www.yoursite.com
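The three commands above each add one option, and they can be combined into a single run. Here's a hedged Python sketch that assembles the combined command; build_mirror_command and run_mirror are names invented for this example. Note one assumption: the commands above don't include --page-requisites (-p), which is the WGET option for fetching the images, CSS and scripts a page needs, so I've added it as an optional extra.

```python
import subprocess


def build_mirror_command(url, page_requisites=True):
    """Assemble a wget argument list that combines the options shown above."""
    command = ["wget", "--recursive", "--convert-links", "--html-extension"]
    if page_requisites:
        # Assumption: not used in the commands above. --page-requisites (-p)
        # asks wget to also fetch the images, CSS and scripts each page needs.
        command.append("--page-requisites")
    command.append(url)
    return command


def run_mirror(url, download_dir):
    """Run the assembled command inside download_dir; raises if wget fails."""
    subprocess.run(build_mirror_command(url), cwd=download_dir, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_mirror() to actually download.
    print(" ".join(build_mirror_command("http://www.yoursite.com")))
```

Passing the options as a list rather than one long string means you don't have to worry about Command Prompt quoting rules.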

6) Is your WGETting getting you blocked?

See what I did there? Some webservers are set up to deny WGET’s default user agent, for obvious bandwidth-saving reasons. You could try changing your user agent to get round this. Try, er, pretending to be Googlebot:

wget --user-agent="Googlebot/2.1 (+http://www.googlebot.com/bot.html)" -r http://www.yoursite.com
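If you're building the command from a script, the user agent is just one more argument. A minimal sketch, with add_user_agent being a made-up helper name; the user agent string is the one from the command above.

```python
def add_user_agent(command, user_agent):
    """Return a copy of a wget argument list with --user-agent inserted
    straight after the program name."""
    return [command[0], f"--user-agent={user_agent}", *command[1:]]


if __name__ == "__main__":
    base = ["wget", "-r", "http://www.yoursite.com"]
    agent = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
    # Print one argument per line so the unquoted user agent is easy to see.
    for argument in add_user_agent(base, agent):
        print(argument)
```

Because each argument is a separate list item, the user agent doesn't need the surrounding quotes it needs in Command Prompt.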

And finally, here’s WGET downloading my website:

[Image: downloading seogadget]

On that last note, lots of hosting companies block WGET. Mine included! It took me a while to be able to back my own site up, but now I feel pretty safe that I have backups of the database, the plugins, the images and even the HTML of the site itself. Happy WGETting! :-)

15 Responses to “How to download your website using WGET for Windows”

  1. Posted February 4, 2009 at 8:40 pm | Permalink

    That’s quite a twist on site backup. :) I use a less inventive way – Cobian Backup to back up folders from my FTP account. It fetches only the folders I need (theme, plugins, images).

  2. Matthew
    Posted February 4, 2009 at 11:04 pm | Permalink

    nice article and wget is very useful tool to be aware of. You might also like Unison.

  3. Posted February 4, 2009 at 11:24 pm | Permalink

    Nifty technique and unique guide here Richard. Way to contribute to the community.

    2 points:
    1) Recursively mirror your site – huh? Try that again in English please, Mr. Englishman ;P
    2) How do Mac users do this?

  4. Posted February 6, 2009 at 9:21 am | Permalink

    Hey Gab – ok – the -r option is what does the mirroring. It works recursively: WGET follows all the links it finds and downloads each page. As for the Mac – no idea dude! I keep meaning to get hold of a Mac to learn how. If I come across the answer, I’ll post it here. Thanks for dropping by!

  5. Dan
    Posted March 23, 2009 at 2:08 am | Permalink

    This works for me: wget -e robots=off -E -r -k -l inf -p --restrict-file-names=windows -H -K -D [Your Blog Name].wordpress.com,[Your Blog Name].files.wordpress.com --random-wait http://[Your Blog Name].wordpress.com

    • randomstranger
      Posted May 13, 2010 at 12:16 pm | Permalink

      Thanks Dan, that totally worked!

  6. Posted April 7, 2009 at 9:24 am | Permalink

    Hey can anyone help. Trying to download files using wget v 1.10.2 from the command prompt gives this (filenames blanked for commercial reasons):

    --2009-04-07 07:53:52-- http://www.medistat-software.net/*******/******.***
    Resolving http://www.medistat-software.net... seconds 0.00, 77.92.81.1
    Caching http://www.medistat-software.net => 77.92.81.1
    Connecting to http://www.medistat-software.net|77.92.81.1|:80… seconds 0.00, Closed fd 1936
    failed: Connection timed out.
    Releasing 0x009259d8 (new refcount 1).
    Retrying.

    This actually works from about 75% of my clients but the other 25% get this error.

    Help – what does it mean? What is ‘Closed fd 1936’?

    Cheers Owen E

  7. Ivan
    Posted August 30, 2009 at 8:16 pm | Permalink

    Hello,

    very nice article indeed. However, it does not help me download my site or any site. I copied and pasted all the commands here with the same result. Only the index file and a js file (http://jscook.sourceforge.net/JSCookMenu/). Why can't I download the site?

    Please help.

    • Posted August 30, 2009 at 8:39 pm | Permalink

      I've found that some sites don't respond correctly unless you add a user agent to the request. Have you tried that?

    • Ivan
      Posted August 30, 2009 at 9:32 pm | Permalink

      How do I add a user agent and what is it?

      I cannot download my site only with wget. WinHTTrack did it. Other sites can be downloaded with wget, but mine nada.

      Here is the url: http://www.all-e-services.com (still in development phase)

      Thank you!

    • Posted August 30, 2009 at 9:43 pm | Permalink

      No problem Ivan,

      I suggest you follow point 6 in the post, as it may be your web host is blocking WGET's standard user agent.

      Good luck!

  8. Darius
    Posted May 6, 2010 at 7:17 pm | Permalink

    Suggested wget download website
    is corrupt

    http://users.ugent.be/~bpuype/wget/#download

    Hidden installation under Vista
    no apps, no interface

    Can you check it under Vista ?

  9. Manali
    Posted July 14, 2010 at 9:07 am | Permalink

    It helped me in all ways… Details are informative…
    Thanks!

  10. thank
    Posted July 19, 2010 at 12:33 pm | Permalink

    i’m saving a site that’s about to be deleted as i type!

    thanks for sharing.

  11. Cameron Fraser
    Posted August 1, 2010 at 3:05 pm | Permalink

    the site I’m trying to archive has “?” in the links and wget saves files on windows replacing
    the ? with “@” – but it leaves the links with “?” so the links don’t match the filenames.
    Also “@” is problematic since it looks like an email link to the browser. Any workaround
    for this?
    thanks
