Earlier I shared an article with the steps to improve disk I/O performance in Linux. Now let me give you some tips to run shell scripts in parallel and collect the exit status of each background process in Linux.
Why should you run shell scripts in parallel?
Computing power constantly increases, not only because processors have higher clock speeds but also because they have multiple cores. This means that a single hardware processor contains multiple logical processors. It is like having several computers instead of just one.
However, multiple cores are useless unless the software makes use of them. For example, a program that does huge calculations may only run on one core while the others will sit idle. The software has to be aware and take advantage of the multiple cores if we want it to be faster.
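As a quick sanity check, you can see how many logical processors your Linux system exposes (the numbers will of course vary from machine to machine):

```shell
# Count the logical processors available to this system
nproc

# Cross-check against the kernel's view in /proc/cpuinfo
grep -c ^processor /proc/cpuinfo
```

Both commands should report the same count on a typical Linux box.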
Let us start with our sample scripts
Below I have created three child scripts and one wrapper script which we will use to demonstrate running shell scripts in parallel, after which we will collect the exit status of each of these processes (scripts).
[root@node1 myscripts]# cat /tmp/myscripts/script1.sh
#!/bin/bash
sleep 5
exit 5
[root@node1 myscripts]# cat /tmp/myscripts/script2.sh
#!/bin/bash
sleep 10
exit 10
[root@node1 myscripts]# cat /tmp/myscripts/script3.sh
#!/bin/bash
sleep 12
exit 12
As you can see, these scripts do nothing but sleep for some time and then return different exit codes. The main idea here is to make sure that scripts ending at different times do not affect our ability to collect each one's exit status.
First I have a simple wrapper script which calls these scripts sequentially:
# cat /tmp/run_sequential.sh
#!/bin/bash
for script in /tmp/myscripts/*; do
    sh "$script"
    exit_status=$?
    script_name=$(basename "$script")
    echo "$script_name exit status: $exit_status"
done
Here I call the individual scripts under /tmp/myscripts one by one, and then extract just the script name (not strictly required for this tutorial) to get more readable output.
Let us execute the wrapper script and measure the execution time:
# time /tmp/run_sequential.sh
script1.sh exit status: 5
script2.sh exit status: 10
script3.sh exit status: 12
real 0m27.044s
user 0m0.011s
sys 0m0.031s
So, as expected with the sequential approach, the script took 5 + 10 + 12 = 27 seconds to execute.
Now let us try the parallel approach.
To run scripts in parallel in bash, you must send the individual scripts to the background. The loop then does not wait for each process to exit and instead immediately launches all the scripts. Next you need some way to collect the exit status of every process you pushed to the background, so that you can print the proper exit status for each one.
# cat /tmp/run_parallel.sh
#!/bin/bash
tmp_file=$(mktemp /tmp/file.XXX)
PID_LIST=()
for script in /tmp/myscripts/*; do
    sh "$script" &
    PID="$!"
    echo "$PID:$script" >> "$tmp_file"
    PID_LIST+=("$PID")
done
for process in "${PID_LIST[@]}"; do
    wait "$process"
    exit_status=$?
    script_name=$(basename "$(grep "^$process:" "$tmp_file" | awk -F ":" '{print $2}')")
    echo "$script_name exit status: $exit_status"
done
rm -f "$tmp_file"
How does this work?
We exploit the Bash operator &, which instructs the shell to send the command to the background and continue with the script. However, this means our wrapper would exit as soon as the loop completes, while the child processes are still running in the background. To prevent this, we capture the PID of each process using $!, which in Bash holds the PID of the most recent background process. We append these PIDs to a list and then use the wait command to wait for these processes to finish.
We also store the PID of every process we send to the background and map it to the script name in a temporary file, so that later we can map each PID back to its respective script. Once all the processes have been sent to the background, we call wait on each PID from PID_LIST and then capture and print the respective exit status.
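The mechanism can be reduced to a minimal sketch (using a toy command rather than the tutorial scripts):

```shell
#!/bin/bash
# Toy background job that exits with a known status
(sleep 1; exit 42) &

pid=$!        # $! holds the PID of the most recent background job
wait "$pid"   # block until that job finishes
echo "PID $pid exit status: $?"   # $? is now the job's exit status (42 here)
```

Bash remembers the exit status of finished background jobs until you wait on them, so it is safe to call wait even after the job has already terminated.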
Now let us execute our script and monitor the execution time
# time /tmp/run_parallel.sh
script1.sh exit status: 5
script2.sh exit status: 10
script3.sh exit status: 12
real 0m12.028s
user 0m0.030s
sys 0m0.044s
As you can see, for the same set of scripts the wrapper now took only about 12 seconds, i.e. the runtime of the longest script rather than the sum of all three.
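As a side note, newer Bash versions (4.3 and later) also offer wait -n, which returns as soon as any one background job finishes. This is a sketch of how you could react to jobs in completion order rather than launch order (toy jobs, not the tutorial scripts):

```shell
#!/bin/bash
# Two toy jobs finishing at different times with different exit codes
(sleep 2; exit 5) &
(sleep 1; exit 7) &

# wait -n (bash 4.3+) waits for the NEXT job to finish, whichever it is
wait -n
echo "first job to finish exited with: $?"    # the 1-second job, status 7
wait -n
echo "second job to finish exited with: $?"   # the 2-second job, status 5
```

Note that plain wait -n does not tell you which job finished; bash 5.1 adds wait -n -p VAR to also capture the finishing job's PID.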
Lastly, I hope the steps from this article to run shell scripts in parallel and collect the exit status of each process on Linux were helpful. Let me know your suggestions and feedback in the comment section.
How do we handle the case where the child scripts return error strings, i.e. some child bash script fails and we need to capture the errors in the parent script?
If you just wish to capture the logs, then you can use the same log file for both the child and the parent script, so both write to the same log file.
But if you wish to do some processing in the parent script based on these errors, then you will need a watch operation on the log file, or have each child write to its own log file which the parent inspects afterwards.
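One possible sketch of the second approach is to give every child its own log file and let the parent scan the logs once all children have finished. The child commands below are stand-ins, not the tutorial scripts:

```shell
#!/bin/bash
# Each child logs to its own file; the parent inspects the logs afterwards
log_dir=$(mktemp -d /tmp/childlogs.XXX)

# Stand-in children: one succeeds, one prints an error on stderr
( echo "all good" )             > "$log_dir/child1.log" 2>&1 &
( echo "ERROR: disk full" >&2 ) > "$log_dir/child2.log" 2>&1 &
wait    # let every background child finish

# Parent-side processing based on the logged errors
for log in "$log_dir"/*.log; do
    if grep -qi "error" "$log"; then
        echo "errors found in $(basename "$log")"
    fi
done
rm -rf "$log_dir"
```

Because stderr is redirected into each child's log file (2>&1), error messages end up where the parent can find them, and the per-child files keep interleaved output from different children apart.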
Why … instead of … ?
Because 2 + 2 is equal to 4, and 2 * 2 is also equal to 4.
You could also use GNU Parallel:
parallel ::: /tmp/myscripts/*.sh
GNU Parallel can additionally record each job's exit status for you; for example, parallel --joblog /tmp/joblog ::: /tmp/myscripts/*.sh writes a log file with one line per job, including its exit value.