wget --random-wait -r -p -e robots=off -U mozilla http://yoursite.com
OR, as a bash function added to ~/.bash_profile (macOS) or ~/.bashrc (Linux):
# Download an entire website
# -p --page-requisites: get all the elements that compose the page (images, CSS and so on)
# -e robots=off: don't let wget obey the robots.txt file
# -U mozilla: identify as Mozilla to the server
# --random-wait: let wget choose a random number of seconds to wait between requests, to avoid being blacklisted
# Other useful wget parameters (see the example below):
# -k --convert-links: convert links so that they work locally, off-line
# --limit-rate=20k: limit the download rate
# -b: run wget in the background, so it keeps going after you log out
# -o $HOME/wget_log.txt: log the output to a file

getwebsite() {
    wget --random-wait -r -p -e robots=off -U mozilla "$1"
}
To use the bash function:
getwebsite http://websitelink.com
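For reference, a sketch that folds in the optional flags listed in the comments above (convert links, cap bandwidth, run in the background, log to a file); the URL and log path are placeholders to adjust:

# Mirror a site with local link rewriting, a 20 KB/s rate cap, background execution, and a log file
wget --random-wait -r -p -k -e robots=off -U mozilla \
     --limit-rate=20k -b -o $HOME/wget_log.txt http://yoursite.com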
--random-wait
This option causes the time between requests to vary between 0.5 and 1.5 * wait seconds, where wait is the value given with --wait.
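If you also pass --wait, the random delay is derived from that value; a minimal sketch (yoursite.com is a placeholder):

# With --wait=2, --random-wait makes each pause between requests last roughly 1 to 3 seconds
wget --wait=2 --random-wait -r -p -e robots=off -U mozilla http://yoursite.com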
-r
recursive retrieval: follow links and download the whole site
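Recursion depth can be capped with -l (--level); a minimal sketch using a depth of 2 and a placeholder URL:

# Limit recursion to two levels of links below the start page
wget -r -l 2 -p -e robots=off -U mozilla http://yoursite.com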
-p
for --page-requisites: download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.
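-p also works without -r when you only need a single page; a sketch, assuming http://yoursite.com/page.html as a placeholder, with -k added so the saved copy displays off-line:

# Grab one page plus the images, CSS and scripts it needs, rewriting links for local viewing
wget -p -k http://yoursite.com/page.html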
-e
for executing commands; -e robots=off is the command being executed in this instance.
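Because -e takes a .wgetrc-style command, the persistent equivalent is to put the same setting in your startup file; a minimal sketch, assuming ~/.wgetrc as the per-user config file:

# Make robots=off permanent by adding it to ~/.wgetrc instead of passing -e each time
echo "robots = off" >> ~/.wgetrc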
-U
for --user-agent, equal to --user-agent=mozilla. Identify as Mozilla to the HTTP server.
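Some servers check for a full browser-style user-agent string rather than just the word "mozilla"; a sketch with one arbitrary example value:

# Send a complete browser-style user-agent string (example value only)
wget --random-wait -r -p -e robots=off -U "Mozilla/5.0 (X11; Linux x86_64)" http://yoursite.com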