3/1/2018 - 8:16 AM

download whole page with wget

Wget is a command-line utility that retrieves files over HTTP, HTTPS, and FTP. Because websites are served over HTTP(S) and most web media files are reachable over HTTP or FTP, Wget is an excellent tool for ripping websites.

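While Wget is typically used to download single files, it can also recursively download every page and file reachable from an initial page. The -r option turns on recursion, and -p also fetches the images, stylesheets, and other files each page needs to display properly: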

wget -r -p //
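
By default, recursion goes five levels deep and can wander up into parent directories, so a plain -r may fetch more than intended. Here is a minimal sketch of a more contained download. The helper name rip_site, the WGET override, and the example.com address in the usage note are assumptions for illustration and are not from the original commands. The long options are the spelled-out forms of -r, -l, -p, -np, and -k.

```shell
#!/bin/sh
# Hypothetical helper, not part of Wget itself.
# Set WGET=echo to preview the options instead of downloading anything.
rip_site() {
    url=$1
    depth=${2:-2}   # recursion depth for this sketch; Wget's own default is 5
    "${WGET:-wget}" \
        --recursive \
        --level="$depth" \
        --page-requisites \
        --no-parent \
        --convert-links \
        "$url"
}
```

Calling rip_site https://example.com/docs/ 3 would stay at or below /docs/, follow links three levels deep, and rewrite links in the saved pages so they work when opened from disk.
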
However, some sites may detect and prevent what you’re trying to do because ripping a website can cost them a lot of bandwidth. To get around this, you can disguise yourself as a web browser with a user agent string:

wget -r -p -U Mozilla //
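
The -U option is short for --user-agent. Some servers may look for more than the single word Mozilla, so it can help to send a full browser-style string. A rough sketch follows. The string in UA is only an illustrative value, and the helper name rip_as_browser is made up for this example.

```shell
#!/bin/sh
# Illustrative browser-style identifier (an assumption, not from the
# original note); substitute the string your own browser sends.
UA='Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0'

# Hypothetical helper: recursive download that identifies itself with $UA.
# Set WGET=echo to preview the options instead of downloading anything.
rip_as_browser() {
    "${WGET:-wget}" --recursive --page-requisites --user-agent="$UA" "$1"
}
```

The quotes around $UA matter: the string contains spaces, and without them Wget would treat the pieces as separate arguments.
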
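If you want to be polite, you should also limit your download speed (so you don’t hog the web server’s bandwidth) and pause between downloads (so you don’t overwhelm the web server with too many requests). Here, --wait=10 pauses ten seconds between retrievals and --limit-rate=35K caps the transfer at about 35 KB per second: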

wget -r -p -U Mozilla --wait=10 --limit-rate=35K //
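
If you rip sites often, the polite settings can be wrapped in a small function so they are never forgotten. This is only a sketch. The name polite_rip and the WAIT, RATE, and WGET variables are assumptions for illustration. The extra --random-wait option is a real Wget flag that varies each pause around the --wait value, so the requests look less mechanical.

```shell
#!/bin/sh
# Hypothetical wrapper; WAIT and RATE default to the values in the
# command above. Set WGET=echo to preview instead of downloading.
polite_rip() {
    "${WGET:-wget}" \
        --recursive --page-requisites \
        --user-agent=Mozilla \
        --wait="${WAIT:-10}" \
        --random-wait \
        --limit-rate="${RATE:-35K}" \
        "$1"
}
```

Setting WAIT=2 and RATE=100K before the call loosens the limits for a server you know can handle it.
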
Wget comes bundled with most Unix-based systems. On a Mac, you can install it with a single Homebrew command: brew install wget. On Windows, you’ll need to use a ported version instead.
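
To find out whether Wget is already on your system before installing anything, a quick check along these lines should work in any POSIX-style shell. The helper name have is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the named program is on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

if have wget; then
    # Print only the first line of the version banner.
    wget --version | head -n 1
else
    echo "wget not found; install it first (for example, brew install wget on a Mac)" >&2
fi
```
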