Archive for December, 2009

I just discovered a really simple way to create a work queue on Linux that someone else might find useful. If you have a bunch of jobs to run that all require different parameters and may take different amounts of time to complete, it’s difficult to schedule them so that all the available cores stay busy without resorting to some sort of batch scheduling system, which is overkill for a lot of prototyping. It turns out that the xargs command has a built-in work queue that is really easy to use: the -P option caps the number of processes running at once, and -n sets how many arguments each invocation receives.

Basic syntax (assuming you want to run the program ‘command’ with one parameter per invocation, and that you want to have four processes running at any one time):

echo param1 param2 param3 param4 param5 param6 | xargs -n1 -P4 command

This will start the first four jobs in parallel, roughly as if you had typed:

command param1 &      # & => background
command param2 &
command param3 &
command param4 &

Then, as soon as any one of those four completes, it will run

command param5 &

and then command param6 & once another slot frees up, and so on until the parameters run out.
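
To watch the queue in action, here is a minimal sketch that uses sleep as a stand-in for ‘command’. The durations (in seconds) are made up, and the sh -c wrapper is only there so that each job can print a line when it finishes:

```shell
# Six hypothetical jobs; each parameter is a sleep duration in seconds.
# xargs starts four at once and launches the next as soon as a slot frees up.
# The trailing _ fills $0 of the inline script, so the parameter lands in $1.
out=$(echo 2 1 1 1 1 1 | xargs -n1 -P4 sh -c 'sleep "$1"; echo "finished $1"' _)
echo "$out"
```

This should take roughly two seconds instead of the seven a serial run would need. The order of the output lines can vary from run to run.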

If your command requires two parameters, do:

echo param1a param1b param2a param2b [etc.] | xargs -n2 -P4 command
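
A quick way to check how the parameters get grouped is to put echo in front of the command, so that each invocation just prints what it would have run. This is only a sketch, and the three parameter pairs are placeholders:

```shell
# echo stands in for actually running 'command'; -n2 hands two arguments
# to each invocation. sort makes the order stable, since parallel jobs
# may finish in any order.
pairs=$(echo param1a param1b param2a param2b param3a param3b | xargs -n2 -P4 echo command | sort)
echo "$pairs"
# Prints:
#   command param1a param1b
#   command param2a param2b
#   command param3a param3b
```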

If you have a quad-core processor with hyperthreading, you could do -P8, etc.
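
Rather than hard-coding the number, you could ask the system for it. The sketch below assumes nproc from GNU coreutils is installed (it counts hyperthreads as separate processing units) and falls back to 4 if it isn't. The job parameters are placeholders, and echo stands in for the real command:

```shell
# Number of processing units available to this process, or 4 as a fallback.
cores=$(nproc 2>/dev/null || echo 4)

# Run six placeholder jobs with as many parallel slots as there are cores,
# and count how many of them reported back.
njobs=$(echo 1 2 3 4 5 6 | xargs -n1 -P"$cores" echo job | grep -c '^job')
echo "ran $njobs jobs with up to $cores in parallel"
```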

You can also store the params in a file and do cat file | xargs … (or skip the extra process with xargs … < file).
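
As a small sketch of the file-based variant: the file names params.txt and ran.txt are made up for this example, and echo again stands in for the real command.

```shell
# Work in a scratch directory so the example leaves nothing behind in $PWD.
workdir=$(mktemp -d)

# One parameter per line; xargs splits on any whitespace by default.
printf '%s\n' param1 param2 param3 param4 param5 param6 > "$workdir/params.txt"

# Feed the file to xargs on stdin and collect what each job printed.
xargs -n1 -P4 echo running < "$workdir/params.txt" > "$workdir/ran.txt"
sort "$workdir/ran.txt"
```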

The nice thing about this approach over batch scheduling for prototyping is that if you hit Ctrl-C, it kills all the child processes.

I haven’t experimented yet to find the best way to generate a separate logfile for each child process. I also just discovered PPSS, a more powerful system for achieving the same thing as xargs that supports separate logfiles: http://code.google.com/p/ppss/
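
One possible approach to the logfile problem with plain xargs (a sketch, not necessarily the optimal way) is to wrap the command in sh -c and redirect its output to a file named after the parameter. The log directory and the naming scheme are assumptions for this example, and echo stands in for the real command:

```shell
# Scratch directory for the logs; exported so the inline script can see it.
logdir=$(mktemp -d)
export logdir

# Each job writes stdout and stderr to its own file, e.g. param3.log.
# Replace the echo with: command "$1"
echo param1 param2 param3 param4 param5 param6 |
    xargs -n1 -P4 sh -c 'echo "output of job $1" > "$logdir/$1.log" 2>&1' _

ls "$logdir"
```

This only works cleanly when the parameters are safe to use as file names.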

I hope this is as useful to someone else as it is going to be to me!!

UPDATE 2010-06-07: Ole Tange left me a message in response to this post alerting me to the existence of the project he maintains, GNU Parallel.  Looks like an awesome tool.
