Earlier I shared an article with the steps to improve your disk I/O performance in Linux. Now let me give you some tips to run shell scripts in parallel and collect the exit status of each background process in Linux.

How to run shell scripts in parallel & collect exit status of process in Linux

 

Why should you run shell scripts in parallel?

Computing power constantly increases not only because processors have higher clock cycles but also because they have multiple cores. This means that in a single hardware processor there are multiple logical processors. It’s like having several computers, instead of just one.

However, multiple cores are useless unless the software makes use of them. For example, a program that does huge calculations may only run on one core while the others will sit idle. The software has to be aware and take advantage of the multiple cores if we want it to be faster.

 

Let us start with our sample script

Below I have created three child scripts and one wrapper script which we will use for demonstration: we will run the shell scripts in parallel and then collect the exit status of each of these processes (scripts).

# cat /tmp/myscripts/script1.sh
#!/bin/bash

sleep 5
exit 5
# cat /tmp/myscripts/script2.sh
#!/bin/bash

sleep 10
exit 10
# cat /tmp/myscripts/script3.sh
#!/bin/bash

sleep 12
exit 12

 

As you can see, these scripts do nothing but sleep for some time and exit with different exit codes. The main idea is to make sure that scripts finishing at different times do not affect our ability to collect each one's exit status.

First, here is a simple script which calls these scripts sequentially:

# cat /tmp/run_sequential.sh
#!/bin/bash

for script in /tmp/myscripts/*; do
   sh "$script"
   exit_status=$?
   script_name=$(basename "$script")
   echo "$script_name exit status: $exit_status"
done

So here I am calling the individual scripts under /tmp/myscripts one by one, and extracting the script name from each path (not strictly required for this tutorial) to get a cleaner output.
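For reference, the script-name extraction can be done either with the standard basename command or with pure Bash parameter expansion; both are equivalent here:

```shell
# Two equivalent ways to strip the directory from a script path:
script=/tmp/myscripts/script1.sh
basename "$script"     # prints: script1.sh
echo "${script##*/}"   # prints: script1.sh
```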

 

Let us execute the script and monitor the time taken for the execution:

# time /tmp/run_sequential.sh
script1.sh exit status: 5
script2.sh exit status: 10
script3.sh exit status: 12

real 0m27.044s
user 0m0.011s
sys 0m0.031s

So, as expected with the sequential approach, the script took 5 + 10 + 12 = 27 seconds to execute.

 

Now let us try the parallel approach.

To run scripts in parallel in bash, you send each script to the background with &. The loop then does not wait for one process to exit before starting the next, so all the scripts launch immediately. Afterwards you need some logic to collect the exit status of every process you pushed to the background, so you can print a proper exit status for each.

# cat /tmp/run_parallel.sh
#!/bin/bash

tmp_file=$(mktemp /tmp/file.XXX)
for script in /tmp/myscripts/*; do
   sh "$script" &
   pid=$!
   echo "$pid:$script" >> "$tmp_file"
   PID_LIST+=("$pid")
done

for pid in "${PID_LIST[@]}"; do
   wait "$pid"
   exit_status=$?
   script_name=$(basename "$(grep "^$pid:" "$tmp_file" | cut -d ":" -f 2)")
   echo "$script_name exit status: $exit_status"
done

rm -f "$tmp_file"

 

How does this work?

We exploit the Bash control operator &, which instructs the shell to run the command in the background and continue with the script. However, this means our script would exit as soon as the loop completes, while the child processes are still running in the background. To prevent this, we capture the PID of each process using $!, which in Bash holds the PID of the most recent background process. We append these PIDs to a list and then use the wait builtin to wait for each of them to finish.
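This mechanism can be seen in miniature with a single background job; the subshell below is just a stand-in for one of the sample scripts:

```shell
# A subshell sleeps briefly and exits with code 3 in the background.
( sleep 1; exit 3 ) &
pid=$!            # $! holds the PID of the last background job
wait "$pid"       # block until that PID exits
echo "status: $?" # prints: status: 3
```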

Here we store the PID of every process we send to the background and map it to the script name, so that later we can match each PID to its respective script. Once all the processes are in the background, we call wait on each PID from PID_LIST and then capture and print the respective exit status.
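As an alternative sketch, a Bash 4+ associative array can hold the PID-to-name mapping in memory instead of a temporary file. The subshells and names below are stand-ins for the sample scripts, used only for illustration:

```shell
#!/bin/bash
declare -A NAME_BY_PID              # maps PID -> script name

for n in 5 10 12; do
   ( sleep 1; exit "$n" ) &         # stand-in for a child script
   NAME_BY_PID[$!]="script (exit $n)"
done

for pid in "${!NAME_BY_PID[@]}"; do
   wait "$pid"                      # wait returns that PID's exit status
   echo "${NAME_BY_PID[$pid]} exit status: $?"
done
```

Note that Bash does not guarantee the iteration order over array keys, but wait still reports the correct status for each PID.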

 

Now let us execute our script and monitor the execution time:

# time /tmp/run_parallel.sh
script1.sh exit status: 5
script2.sh exit status: 10
script3.sh exit status: 12

real 0m12.028s
user 0m0.030s
sys 0m0.044s

As you can see, for the same set of scripts the wrapper now took only 12 seconds: the runtime of the slowest script, rather than the sum of all three.
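If you only care about the exit statuses and not which script produced them, Bash 4.3+ also offers wait -n, which returns as soon as any background job finishes, so a slow script does not delay the report of a faster one. This is a side sketch, not part of the wrapper above:

```shell
# Two stand-in jobs finishing at different times.
( sleep 2; exit 12 ) &
( sleep 1; exit 5 ) &

# Reap jobs in completion order, not launch order:
# the sleep-1 job's status (5) is printed first.
for _ in 1 2; do
   wait -n
   echo "a job finished with status $?"
done
```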

 

Lastly, I hope the steps from this article to run shell scripts in parallel and collect the exit status of each process on Linux were helpful. Let me know your suggestions and feedback in the comment section.
