Simpler Dataflow Language
In my year at TAI, I learned something very important: bootstrap all new
designs off existing technology because, frankly, your design could use some
work. There is no substitute for working code, especially when your goal is to
refine that code.
Well, I took that advice yesterday and made a little language that allows
you to set up a pipeline between *NIX commands where the commands can have
multiple input and output streams. I believe this is equivalent to a Hartmann
pipeline if one is clever enough with xargs. The utility is called 'lace' and
its source code, containing the following example, can be found at the bottom
of this post.
So why all this dataflow business? Well, pipelines in the shell always bugged
me, since many problems have a split/join attribute to them. To solve this in
a script, you have to use temporary files, explicit file descriptors, or named
pipes. All of these have downsides. Temporary files are a pain to manage and
clean up. File descriptors require cunning in your command layout and are
difficult to maintain since numbers are being hard-coded. Named pipes are
also a pain to manage and have strange blocking semantics in the shell,
causing elusive deadlocks.
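To make the first of those workarounds concrete, here is a minimal sketch of a split/join done with temporary files in plain sh. The function name line_lengths and the sample input are made up for illustration; nothing here comes from the lace source.

```sh
#!/bin/sh
# line_lengths (hypothetical): prefix each input line with its length.
# Branch one computes the lengths; branch two is the untouched text.
line_lengths() {
    text=$(mktemp) || return 1
    lens=$(mktemp) || { rm -f "$text"; return 1; }
    # Split: tee keeps a copy of stdin while awk measures each line.
    tee "$text" | awk '{ print length($0) }' > "$lens"
    # Join: pair each length with the line it came from.
    paste -d ' ' "$lens" "$text"
    # Cleanup is manual, and is skipped if the script is interrupted above.
    rm -f "$text" "$lens"
}

printf 'localhost\n!door\n' | line_lengths
```

Even for a two-branch problem, most of the function is bookkeeping for the temporary files rather than the work itself.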
Anyway, below is a Lace script. At the top, a computer lab map is defined.
This computer lab is totally unrealistic because all the machines are named
localhost. One can just as easily substitute real machine names into the map.
The script visits each host, checks what user is signed on, and replaces the
host name with the user name in the map. This happens for each host and the
modified map is output at the end. Note that some of these steps happen
simultaneously.
I wrote this script to hand back assignments faster since I TA an introductory
programming lab. At this point, I know most people's names, but am tired of
gawking over the shoulders of those I don't know.
## This code replaces all the computer names in /lab/
## with the user id on each machine!
## Non-hostnames in the map must start with exclamation points.
## You should have auto-login to these hosts.
$(H lab)
!door! localhost 127.0.0.1 !door
! localhost localhost !
! localhost localhost !
! !
! !projector !
! !
! localhost localhost !door
$(H lab)
echo $(H lab) $(O a)
$(H clientcmd)
who | awk '{ print "host " $0 ; }'
$(H clientcmd)
sed $(X a) $(O a) \
-e 's/^[[:space:]]\+/ /' \
-e 's/^ //' \
-e 's/ $//'
tr $(X a) ' ' '\n' $(O a)
grep $(X a) -v '^!' $(O a)
#cat $(XF a)
#$(H comment)
xargs $(X a) -n 1 -ihost ssh -o StrictHostKeyChecking=no host \
sh -c $(H clientcmd) $(O users)
$(H subst)
{ printf ("s/\\<%s\\>/%-*s/\n", $1, length($1), $2); }
$(H subst)
awk $(X users) $(H subst) $(O a)
echo $(H lab) $(O b)
sed $(X b) -f $(XF a)
#$(H comment)
Here's the rundown.
The "$(H name)" directive defines a Here document (like in shell scripting)
when it appears at the start of a line. The Here document ends when
"$(H name)" is encountered again on its own line. When "$(H name)" appears
anywhere else on a line, it is treated as a variable: the Here document text
expands into a single argument.
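For comparison, the closest plain-sh equivalent is probably to capture a here document in a variable and pass it quoted. The sketch below assumes a simplified version of the subst program (no word boundaries or padding), and count_args is a hypothetical helper that exists only to show the argument count.

```sh
#!/bin/sh
# A simplified stand-in for the "subst" here document. The quoted
# delimiter ('EOF') stops the shell from expanding $1 and $2 itself.
subst=$(cat <<'EOF'
{ printf("s/%s/%s/\n", $1, $2) }
EOF
)

# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

# Quoting the expansion passes the whole text, spaces included,
# as one argument.
count_args "$subst"

# awk therefore receives the text as its program argument.
printf 'hostA alice\n' | awk "$subst"
```

In sh the single-argument behavior depends on remembering the double quotes, whereas the description above suggests Lace does it by default.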
The "$(X name)" and "$(O name)" directives specify standard in and standard
out for a command. If unspecified, Lace's standard in/out are used. Note that
these can appear anywhere on a command line; placement doesn't matter, and they
are removed from the final command arguments. Also note that an output
must appear before the input that reads from it. Outputs and inputs have a
1-to-1 correspondence, so no output can be used as two inputs (use tee to get
around this).
The "$(XF name)" and "$(OF name)" directives also define inputs and outputs for
a command, but these expand to file names (/dev/fd/XX). Note that these can be
matched up with the previous two ways of defining inputs and outputs (i.e.,
they are part of that 1-to-1 correspondence). A real difference from standard
*NIX pipelining begins to show with these directives. The final Sed command is
sed $(X b) -f $(XF a)
which takes the lab map on standard input and a dynamically generated Sed
script via a file descriptor, named as a file in /dev/fd/. This is doable in
standard shell programming, but is more difficult to conjure. More complex
routing is possible, of course.
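For a sense of what "doable but more difficult" means, here is one way the same routing might look in plain sh. It is only a sketch under several assumptions: the system provides /dev/fd, the user list is passed in as text instead of being gathered over ssh, the generated substitutions are simplified (no word boundaries or padding), and substitute_users is a made-up name.

```sh
#!/bin/sh
# substitute_users MAP USERS   (hypothetical helper)
#   MAP:   the lab map text
#   USERS: "host user" lines, one per host
substitute_users() {
    # awk turns each "host user" line into a sed substitution command.
    # Inside the braces, fd 0 is that generated script; 3<&0 parks it on
    # fd 3 so that sed can take the map on its own standard input.
    printf '%s\n' "$2" |
        awk '{ printf("s/%s/%s/\n", $1, $2) }' |
        {
            printf '%s\n' "$1" | sed -f /dev/fd/3
        } 3<&0
}

substitute_users '!door hostA hostB !door' 'hostA alice
hostB bob'
```

The hard-coded 3 and the brace group around the second pipeline are the sort of layout cunning that named streams avoid.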
All in all, I actually found this code easier to write than if I had tried
writing a big pipeline within standard shell script syntax. The problem
was easier to dissect without worrying where data would have to travel.
That said, the two scripting languages are not at odds. Lace scripts can be
called from shell scripts, much like Awk and Sed.
The two-days-in source code of lace is here:
lace-orig.tar.gz
It's in the public domain. Enjoy.
-- Alex
2011.10.17 - last edit >> Thu, 10 Nov 2011 00:00:25 -0500