Introduction to wget command
wget is a popular command-line utility for downloading files from the internet. It can use the HTTP, HTTPS, and FTP protocols to download files.
Since wget is non-interactive, it can keep working in the background even after the user logs out. With wget, you can download multiple files, resume interrupted downloads, limit the bandwidth, mirror a website, download files in the background, download files from an FTP server, and more.
Syntax to use wget command
The syntax for the wget command is as follows:
$ wget [option] URL
Examples of using the wget command
1. Download a webpage
You can simply pass a website's URL to download its webpage. For example, the following command fetches the HTML content of the Ubuntu homepage:
$ wget https://ubuntu.com
As you can see, wget shows the file that is being downloaded along with the progress bar, file size, transfer speed, estimated time, etc.
2. Download a file
Similarly, you can use the wget command followed by a file URL to download that file.
$ wget url
The following example downloads the tar archive file of Sublime Text.
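The download URL below is illustrative (it points to a specific Sublime Text build and may differ for newer versions):
$ wget https://download.sublimetext.com/sublime_text_build_4126_x64.tar.xz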
3. Save a file with a different name
The --output-document (-O) option allows you to save the downloaded file under a name of your choice.
$ wget -O filename url
$ wget --output-document=filename url
In this example, the original filename is sublime_text_build_4126_x64.tar.xz, which we save as sublime.tar.xz with the help of the -O option.
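A sketch of the command, assuming the same illustrative download URL as above:
$ wget -O sublime.tar.xz https://download.sublimetext.com/sublime_text_build_4126_x64.tar.xz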
4. Download a file to a specific directory
By default, wget downloads a file to the current directory. To download a file to a specific directory instead, you can use the --directory-prefix (-P) option.
$ wget -P dir url
$ wget --directory-prefix=dir url
The following example downloads an RPM package to a directory of your choice.
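A minimal sketch, using a hypothetical package URL and /tmp as the target directory:
$ wget -P /tmp https://example.com/packages/sample.rpm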
5. Download a file in the background
The --background (-b) option downloads a file in a background process. The download progress can be viewed by opening the wget-log file that wget creates in the working directory.
$ wget -b url
$ wget --background url
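To follow the progress of a background download, you can tail the log file:
$ tail -f wget-log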
6. Resume a previous download
The --continue (-c) option allows you to resume a partially downloaded file.
$ wget -c url
$ wget --continue url
Here, we are resuming the download of an Ubuntu ISO image.
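A sketch with an illustrative release URL; substitute the file you were actually downloading:
$ wget -c https://releases.ubuntu.com/22.04/ubuntu-22.04-desktop-amd64.iso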
7. Limit the download speed
You can limit the download speed of wget using the --limit-rate option. The rate is expressed in bytes per second by default; you can append k for kilobytes, m for megabytes, and g for gigabytes.
The following example downloads the Ubuntu ISO file at a speed of 1 MB per second.
$ wget --limit-rate=1m url
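For instance, with the same illustrative Ubuntu ISO URL as above:
$ wget --limit-rate=1m https://releases.ubuntu.com/22.04/ubuntu-22.04-desktop-amd64.iso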
8. Use URLs from a file to download
The --input-file (-i) option lets wget read URLs from a specified file. You can use it to pass a file containing the list of URLs to be downloaded; each URL in the file must be on a separate line.
$ wget -i file
$ wget --input-file=file
We have created a url_list.txt file with the following URLs.
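As an illustration (the URLs here are hypothetical placeholders), the file might look like this:
https://example.com/file1.iso
https://example.com/file2.tar.gz
Then pass the file to wget:
$ wget -i url_list.txt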
This option is helpful for downloading multiple files with the wget command.
9. Create a mirror of the website
The --mirror (-m) option turns on settings suitable for mirroring: it enables recursion and time-stamping, sets infinite recursion depth, and keeps FTP directory listings.
$ wget -m url
$ wget --mirror url
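For a mirror that is browsable offline (a sketch; example.com is a placeholder), it is common to also convert links and fetch page requisites:
$ wget --mirror --convert-links --page-requisites --no-parent https://example.com/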
10. Set the number of retries
You can set the number of retries for a failed download. The default is 20 retries, but fatal errors like "connection refused" or "404 not found" are not retried.
$ wget -t num url
$ wget --tries=num url
For infinite retries, you can use 0 or inf as the value:
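$ wget -t 0 url
$ wget --tries=inf url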
11. Download from a web server using user credentials
If your web server is protected by a username and password, you will have to specify the login credentials to access it. Specify the username with --user and the password with --password; these work for both FTP and HTTP retrieval. They can be overridden with the --ftp-user and --ftp-password options for FTP connections and the --http-user and --http-password options for HTTP connections.
$ wget --user=(put username here) --password='(put password here)' http://example.com/
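Passing --password on the command line exposes it in the process list and shell history; the --ask-password option prompts for it interactively instead:
$ wget --user=(put username here) --ask-password http://example.com/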
12. Turn off wget’s output
You can hide the output of wget using the --quiet (-q) option.
$ wget -q url
$ wget --quiet url
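A handy scripting pattern combines -q with -O - to write a page to standard output instead of a file (example.com is a placeholder):
$ wget -qO- https://example.com/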
13. Download files and directories recursively
To download files and directories recursively, you have to pass the --no-parent option to wget (in addition to --recursive, of course); otherwise it will follow the parent-directory link in the directory index and ascend above the directory you asked for.
~]# wget --recursive --no-parent https://dl.fedoraproject.org/pub/epel/9/Everything/
~]# ls -l dl.fedoraproject.org/pub/epel/9/Everything/
total 60
drwxr-xr-x. 6 root root 4096 Oct 26 12:45 aarch64
-rw-r--r--. 1 root root 1207 Oct 26 12:45 index.html
-rw-r--r--. 1 root root 1207 Oct 26 12:45 index.html?C=D;O=A
-rw-r--r--. 1 root root 1207 Oct 26 12:45 index.html?C=D;O=D
-rw-r--r--. 1 root root 1207 Oct 26 12:45 index.html?C=M;O=A
-rw-r--r--. 1 root root 1207 Oct 26 12:45 index.html?C=M;O=D
-rw-r--r--. 1 root root 1207 Oct 26 12:45 index.html?C=N;O=A
-rw-r--r--. 1 root root 1207 Oct 26 12:45 index.html?C=N;O=D
-rw-r--r--. 1 root root 1207 Oct 26 12:45 index.html?C=S;O=A
-rw-r--r--. 1 root root 1207 Oct 26 12:45 index.html?C=S;O=D
drwxr-xr-x. 6 root root 4096 Oct 26 12:45 ppc64le
drwxr-xr-x. 2 root root 4096 Oct 26 12:45 s390x
drwxr-xr-x. 2 root root 4096 Oct 26 12:45 source
-rw-r--r--. 1 root root 58 Oct 24 20:11 state
drwxr-xr-x. 2 root root 4096 Oct 26 12:45 x86_64
14. Reject or skip specific files during download
We also have an option to skip or reject certain files during download using the --reject (-R) option. For example, in the previous command you can see that multiple index.html files were downloaded, so let us skip all files named index.html when downloading recursively:
]# wget --recursive --no-parent -R "index.html*" https://dl.fedoraproject.org/pub/epel/9/Everything/
...
--2022-10-26 12:45:48--  https://dl.fedoraproject.org/pub/epel/9/Everything/?C=N;O=D
Reusing existing connection to dl.fedoraproject.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 1207 (1.2K) [text/html]
Saving to: ‘dl.fedoraproject.org/pub/epel/9/Everything/index.html?C=N;O=D’
100%[====================================================================================================================================================================>] 1,207 --.-K/s in 0s
2022-10-26 12:45:48 (157 MB/s) - ‘dl.fedoraproject.org/pub/epel/9/Everything/index.html?C=N;O=D’ saved [1207/1207]
Removing dl.fedoraproject.org/pub/epel/9/Everything/index.html?C=N;O=D since it should be rejected.
....
As you can see, the index.html files were removed during the wget download. Let's check the downloaded content:
]# ls -l dl.fedoraproject.org/pub/epel/9/Everything/
total 24
drwxr-xr-x. 6 root root 4096 Oct 26 12:45 aarch64
drwxr-xr-x. 4 root root 4096 Oct 26 12:45 ppc64le
drwxr-xr-x. 2 root root 4096 Oct 26 12:45 s390x
drwxr-xr-x. 2 root root 4096 Oct 26 12:45 source
-rw-r--r--. 1 root root 58 Oct 24 20:11 state
drwxr-xr-x. 2 root root 4096 Oct 26 12:45 x86_64
As expected, we don't have any index.html files anymore.
15. Download without the whole directory structure
You may have observed in our previous examples that even though we wanted to download from the "Everything" directory in the URL https://dl.fedoraproject.org/pub/epel/9/Everything/, the downloaded content was placed under the hostname followed by the full path, i.e. dl.fedoraproject.org/pub/epel/9/Everything/. We can skip recreating those hostname and path directories and keep just the target content by using --cut-dirs, which takes the number of directory components to cut from the downloaded path.
In this case we would like to cut 4 directories, i.e. pub, epel, 9, and Everything, from the URL path. Additionally, to avoid generating a host-prefixed directory, we use --no-host-directories (-nH). So, let's try this command:
]# wget -r -nH --cut-dirs=4 --no-parent --reject="index.html*" https://dl.fedoraproject.org/pub/epel/9/Everything/
]# ls -l
total 20
drwxr-xr-x. 2 root root 4096 Oct 26 12:54 aarch64
drwxr-xr-x. 2 root root 4096 Oct 26 12:54 ppc64le
drwxr-xr-x. 2 root root 4096 Oct 26 12:54 s390x
drwxr-xr-x. 2 root root 4096 Oct 26 12:54 source
-rw-r--r--. 1 root root 58 Oct 24 20:11 state
As you can see, this time the downloaded content starts directly from the "Everything" directory.
16. Define the depth of recursive download
We know we can use --recursive to turn on recursive download, but we can also control the maximum recursion depth using --level (-l). You have to specify the depth to which wget should download the content. For example, let's download up to 2 levels below the path specified in the download URL:
# wget -r -nH --cut-dirs=4 --no-parent --reject="index.html*" --level 2 https://dl.fedoraproject.org/pub/epel/9/Everything/
As you can see in the output, only files and directories up to 2 levels deep were downloaded.
~]# tree .
.
├── aarch64
│   ├── debug
│   ├── drpms
│   ├── Packages
│   └── repodata
├── ppc64le
│   ├── debug
│   ├── drpms
│   ├── Packages
│   └── repodata
├── s390x
│   ├── debug
│   ├── drpms
│   ├── Packages
│   └── repodata
├── source
│   └── tree
├── state
└── x86_64
    ├── debug
    ├── drpms
    ├── Packages
    └── repodata

22 directories, 1 file
17. Execute commands during download
We can use -e to execute a command as if it were a part of .wgetrc. A command thus invoked will be executed after the commands in .wgetrc, thus taking precedence over them. If you need to specify more than one wgetrc command, use multiple instances of -e.
For example, to turn off the robot exclusion we can execute:
$ wget -e robots=off http://www.example.com/
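And to pass several wgetrc commands at once (robots and wait are both documented wgetrc settings; example.com is a placeholder):
$ wget -e robots=off -e wait=1 -r http://www.example.com/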
By now, you should have understood how to use wget commands and download files from the terminal. wget is a good alternative to curl on Linux.
If you have any questions or feedback, please let us know in the comment section.