We often look for new ways to speed up our work, and we end up writing a lot of code to get jobs done quickly.
Sometimes the right way is the old way, especially when the job is repetitive. For this kind of job we can use the xargs command: placed in a pipeline after other commands, it builds and runs command lines from its input, and so automates the work.
For example, to remove all C files from a directory and, recursively, from all of its subdirectories, we can do the following:
root@slax # find ./ -name "*.c" -print0 | xargs -0 rm -f
This one command does a lot of work for us and so speeds the job up. Because find -print0 and xargs -0 pass the names NUL-separated, names containing spaces or quotes are handled too, so we can be sure that every .c file has been removed.
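By default xargs splits its input on blanks and newlines, so a name such as "my file.c" would reach rm as two separate arguments. Passing the names NUL-separated (find -print0 together with xargs -0, which the GNU and BSD tools support) avoids this. A minimal sketch on a scratch directory; the directory layout and file names are made up for the demonstration:

```shell
#!/bin/sh
# Build a throwaway tree so that nothing real gets deleted (names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/src/sub dir"
touch "$demo/src/main.c" "$demo/src/sub dir/my file.c" "$demo/src/keep.h"

# -print0 ends each name with a NUL byte and -0 makes xargs split on NUL,
# so spaces and quotes inside file names are passed through intact.
find "$demo" -name "*.c" -print0 | xargs -0 rm -f

# Count what is left: both .c files are gone, keep.h survives.
remaining_c=$(( $(find "$demo" -name "*.c" | wc -l) ))
remaining_h=$(( $(find "$demo" -name "*.h" | wc -l) ))
echo "C files left: $remaining_c, header files left: $remaining_h"

rm -rf "$demo"
```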
The xargs command has two interesting options that, used together, let it run many jobs in parallel:
--max-args=max-args, -n max-args
    Use at most max-args arguments per command line. Fewer than
    max-args arguments will be used if the size (see the -s option)
    is exceeded, unless the -x option is given, in which case xargs
    will exit.
-P max-procs
    Run up to max-procs processes at a time; the default is 1. If
    max-procs is 0, xargs will run as many processes as possible at
    a time. Use the -n option with -P; otherwise chances are that
    only one exec will be done.
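The effect of the two options is easy to see with harmless commands. The sketch below uses echo and sleep as stand-ins for real work; the one-second sleeps are arbitrary values chosen for the demonstration:

```shell
#!/bin/sh
# -n 2: at most two arguments per echo invocation, so five inputs
# produce three command lines ("a b", "c d", "e").
grouped=$(printf '%s\n' a b c d e | xargs -n 2 echo)
echo "$grouped"

# -n 1 -P 4: one argument per sleep, up to four sleeps running at once.
# Four one-second sleeps then take about one second instead of four.
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep
elapsed=$(( $(date +%s) - start ))
echo "elapsed: ${elapsed}s"
```

Without -n 1, xargs would put all four values on a single sleep command line, and -P would have nothing to run in parallel.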
Now imagine you want to convert all .pdf files from the RFC-all.tgz archive into .ps files:
root@slax # time find ./ -name "*.pdf" | wc -l
122
root@slax # time find ./ -name "*.pdf" | xargs -Istr pdf2ps str
real 6m22.734s
user 5m41.345s
sys 0m38.990s
This command does the work for you and you could be happy with it. To be happier still, let xargs run several conversions at once (-I already starts one pdf2ps per file, so it is -P that adds the parallelism):
root@slax # time find ./ -name "*.pdf" | xargs -P 8 -Istr pdf2ps str
real 4m35.770s
user 6m20.376s
sys 0m46.979s
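As you can see, the wall-clock time drops from 6m22s to 4m35s, which makes the job about 39% faster (a speedup of roughly 1.39x). The user time is now larger than the real time, which shows that several pdf2ps processes were running at once.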
For jobs with higher latency you can get even better results. Imagine you want to ping every IP address on your subnet (192.168.0.0 to 192.168.0.254):
#!/bin/bash
# Write one address per line: 192.168.0.0 through 192.168.0.254.
COUNTER=0
while [ "$COUNTER" -lt 255 ]; do
    echo "192.168.0.$COUNTER"
    COUNTER=$((COUNTER + 1))
done > subnet.txt
This simple script builds the file subnet.txt, with one line for each IP address in the subnet.
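The same list can also be produced without an explicit loop. A minimal sketch, assuming the seq utility is available (it is not part of POSIX, but ships with GNU coreutils and the BSDs); it overwrites subnet.txt instead of appending, so running it twice does not duplicate the addresses:

```shell
#!/bin/sh
# Print 0..254 and prefix each number with the network part,
# giving 192.168.0.0 through 192.168.0.254, one address per line.
seq 0 254 | sed 's/^/192.168.0./' > subnet.txt

# Show how many addresses were written.
wc -l < subnet.txt
```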
You can use this file to make the "global" ping:
root@slax# time cat subnet.txt | xargs -I{} ping -c 1 {} > log.txt
real 12m32.336s
user 0m0.380s
sys 0m0.660s
root@slax# grep "1 errors" log.txt | wc -l
250
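This command does the work for you and you can be happy. 😉
You could be very happy if you parallelize the work as follows (with -I, xargs already runs one ping per address, so -P 8 is all that is needed to keep eight of them running at once):
root@slax# time cat subnet.txt | xargs -P 8 -I{} ping -c 1 {} > log.txt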
real 1m36.289s
user 0m0.440s
sys 0m0.708s
root@slax# grep "1 errors" log.txt | wc -l
250
This speeds the work up about 7.8 times.
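When many pings write into one log.txt at the same time, output from different processes can end up interleaved. An alternative is to rely on each command's exit status and print only the targets that answered. The sketch below is one way to do that; the function name sweep is made up for this example, and the demonstration uses test -d on directory names as a stand-in probe so that it runs without a network:

```shell
#!/bin/sh
# sweep PROCS PROBE [ARGS...]
# Reads one target per line on stdin, runs "PROBE ARGS... TARGET" for each
# with up to PROCS jobs at once, and prints the targets whose probe exited 0.
sweep() {
    procs=$1
    shift
    xargs -P "$procs" -I{} sh -c '
        target=$1
        shift
        if "$@" "$target" >/dev/null 2>&1; then echo "$target"; fi
    ' sh {} "$@"
}

# Stand-in demonstration: only "/" is an existing directory.
found=$(printf '%s\n' / /no-such-dir-demo-1 /no-such-dir-demo-2 | sweep 4 test -d)
echo "$found"
```

With real hosts this could be called as sweep 8 ping -c 1 -W 1 < subnet.txt > alive.txt, where -W 1 (a one-second reply timeout) is the Linux iputils spelling and is an assumption about your ping.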
In this specific case (ping is a very high-latency job that spends nearly all of its time waiting) you can be happier still by raising the level of parallelism:
root@slax# time cat subnet.txt | xargs -P 256 -I{} ping -c 1 {} > log.txt
real 0m4.643s
user 0m0.544s
sys 0m2.168s
root@slax# grep "1 errors" log.txt | wc -l
250
The results are the same, but the run is about 162 times faster than the serial one. As you can see, we often already have the solution in our hands; we only need to approach the problem in the right way.
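The speedup figures can be re-derived from the real times printed by time in the transcripts above. A small sketch; the helper names to_seconds and speedup are made up for this example:

```shell
#!/bin/sh
# to_seconds: convert a "real" value such as 6m22.734s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'
}

# speedup BEFORE AFTER: how many times faster AFTER is than BEFORE,
# rounded to one decimal place.
speedup() {
    awk -v b="$(to_seconds "$1")" -v a="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f", b / a }'
}

pdf_gain=$(speedup 6m22.734s 4m35.770s)    # PDF conversion, serial vs parallel
ping_gain_8=$(speedup 12m32.336s 1m36.289s)  # ping sweep, serial vs -P 8
ping_gain_256=$(speedup 12m32.336s 0m4.643s) # ping sweep, serial vs -P 256
echo "$pdf_gain $ping_gain_8 $ping_gain_256"   # prints: 1.4 7.8 162.0
```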
Gg1