How to count occurrences of word in file using shell script in Linux


 

In my last article I shared some sample scripts to find and remove duplicate files and directories in Linux or Unix. In this article I will share some commands and scripts, with examples, to count the occurrences of a word in a file.

 

Script to count occurrences of word in file

We can use awk's associative arrays to solve this problem in several ways. For our purposes a word is a run of alphabetic characters, delimited by a space or a period. First we parse all the words in the given file, and then we count how many times each word appears. Words can be parsed using a regular expression with tools such as sed, awk, or grep.
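As a rough illustration of the two steps described above (parse the words, then count each one), here is a minimal awk-only sketch. The sample text and the variable name `counts` are made up for this example, and the pattern `[^A-Za-z]+` assumes plain ASCII letters.

```shell
#!/bin/bash
# Hypothetical sample text, not taken from the article's dummy file.
sample='the cat sat. the dog sat'

# Step 1: split each line on runs of non-letters to get the words.
# Step 2: use the word itself as the key of an associative array.
counts=$(printf '%s\n' "$sample" |
  awk '{
         n = split($0, words, /[^A-Za-z]+/)
         for (i = 1; i <= n; i++)
           if (words[i] != "") count[words[i]]++
       }
       END { for (w in count) print w, count[w] }' |
  sort)

printf '%s\n' "$counts"
```

The trailing sort is only there to make the output order predictable, since awk visits array keys in an unspecified order.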

 

Sample shell script

Below is a sample shell script which counts the occurrences of every word in a file and prints each word together with its count.

# cat /tmp/count_words.sh
#!/bin/bash
#Desc: Find out frequency of words in a file

if [ $# -ne 1 ]
then
  echo "Usage: $0 filename"
  exit 1
fi

filename=$1

# Print every alphabetic word on its own line, then tally each word in awk
grep -E -o "\b[[:alpha:]]+\b" "$filename" |
awk '{ count[$0]++ }
END { printf("%-14s%s\n", "Word", "Count")
  for (ind in count)
    printf("%-14s%d\n", ind, count[ind])
}'

Next I will create dummy_file.txt with some content, which we will use to count all the words in the file:

# cat /tmp/dummy_file.txt
count occurrences of word in file linux
shell script to count number of words in a file
count occurrences of all words in file linux
shell script to count number of lines in a file without using wc command
shell script to counts number of lines and words in a file
find count of string in file linux
shell script to counting number of lines words and characters in a file
count number of lines in a file linux

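Now we will run the script against this file. As you can see, the script prints the number of occurrences of each word in the file. It also treats similar words such as counts, count, and counting as separate words. The rows appear in no particular order because awk does not guarantee the order in which a for (ind in count) loop visits the array keys.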

# /tmp/count_words.sh /tmp/dummy_file.txt
Word          Count
script        4
linux         4
words         4
counts        1
counting      1
without       1
count         6
lines         4
of            8
and           2
using         1
a             5
to            4
characters    1
number        5
in            8
command       1
shell         4
file          8
find          1
wc            1
string        1
all           1
word          1
occurrences   2
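If you would rather see the most frequent words first, one possible approach is to replace the awk tally with sort and uniq -c and then sort on the count. This is only a sketch: the sample input and the variable names `tmpfile` and `top_words` are hypothetical.

```shell
#!/bin/bash
# Hypothetical sample input written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' 'count the words' 'count the lines' 'count' > "$tmpfile"

# One word per line -> group identical words -> count each group ->
# print "word count" -> highest count first, ties ordered by word.
top_words=$(grep -oE '[[:alpha:]]+' "$tmpfile" |
  sort | uniq -c |
  awk '{ print $2, $1 }' |
  sort -k2,2nr -k1,1)

printf '%s\n' "$top_words"
rm -f "$tmpfile"
```

The same final sort could also be appended to the output of count_words.sh, provided the header line is handled separately.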

 

One liner command

You can also count the occurrences of a word in a file with various one-liner commands using grep, sed, tr, python, etc. I will show some more examples here:

 

Using grep command

With grep -E (the modern spelling of egrep) we can use different options to count occurrences of a word in a file. For example, to count the word "count" in /tmp/dummy_file.txt:

# grep -E -c '\<count\>' /tmp/dummy_file.txt
6

Here '\<count\>' makes sure that we only match the whole word: \< asserts the start of a word and \> asserts the end of a word. Note that -c counts matching lines rather than individual matches, so the result equals the number of occurrences here only because no line of the dummy file contains the word more than once. If we search for just "count" without the word boundaries, the output changes:

# grep -E -c 'count' /tmp/dummy_file.txt
8

This is because the pattern now also matches the lines containing "counts" and "counting".
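Keep in mind that grep -c reports the number of matching lines, so it undercounts when a word appears more than once on a line. A small sketch of the difference, using a made-up two-line sample and hypothetical variable names:

```shell
#!/bin/bash
tmpfile=$(mktemp)
# Hypothetical sample: the first line contains the word "count" twice.
printf '%s\n' 'count and count again' 'counting is not a count' > "$tmpfile"

# -c reports how many lines contain at least one whole-word match.
matching_lines=$(grep -c '\<count\>' "$tmpfile")

# -o prints every match on its own line, so wc -l counts occurrences.
occurrences=$(grep -o '\<count\>' "$tmpfile" | wc -l)

echo "lines: $matching_lines, occurrences: $occurrences"
rm -f "$tmpfile"
```

With this sample the two numbers differ (2 lines, 3 occurrences), whereas on the article's dummy file they happen to coincide.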

 

Using tr command

Similarly, we can use the tr (translate) command to put every word on its own line and then count the lines that match the word:

# tr ' ' '\n' < /tmp/dummy_file.txt | grep '\<count\>' | wc -l
6

You can also use sed and other tools to list the word count from a file in Linux or Unix.
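As one example of such an alternative, awk can do the whole job on its own by comparing every field with the target word. The sketch below recreates the dummy file in a temporary location; the awk variable `word` and the shell variables `tmpfile` and `n` are names chosen for this example, and it assumes words are separated by whitespace with no attached punctuation, as in the dummy file.

```shell
#!/bin/bash
# Recreate the article's dummy file in a temporary location.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
count occurrences of word in file linux
shell script to count number of words in a file
count occurrences of all words in file linux
shell script to count number of lines in a file without using wc command
shell script to counts number of lines and words in a file
find count of string in file linux
shell script to counting number of lines words and characters in a file
count number of lines in a file linux
EOF

# Walk every whitespace-separated field and count exact matches.
# "n + 0" forces a numeric 0 when the word never appears.
n=$(awk -v word="count" '
  { for (i = 1; i <= NF; i++) if ($i == word) n++ }
  END { print n + 0 }' "$tmpfile")

echo "$n"
rm -f "$tmpfile"
```

Because the comparison is an exact string match per field, "counts" and "counting" are not included, which agrees with the result of 6 shown earlier.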

 

References:
Linux Shell Scripting Cookbook
