A Dataflow Language Prototype (pt 1)

We work around name collisions with ad-hoc methods: strange verbose prefixes and suffixes, packages, namespaces, readtables, class and function scopes... and we still have problems with collisions (hygienic macros, "using namespace" may find ambiguities) while our code, in many cases, just acquires unneeded cruft. Indeed, it is hard to specify exactly what we want in a non-colliding way when languages we use are based on strings for directives. It is clear that the association between what we want a name to mean and what it actually does mean cannot be specified within the system without sacrificing concision.

Another major problem persists: Efficiency. We are at a point when computers are _REALLY_ fast, so fast that some programmers sacrifice efficiency for clarity. On the other end of the spectrum, there are those that use explicit tricks to optimize, reducing clarity. Of course, one may find herself in both camps, perhaps combating a stack overflow, finding the correct bit of recursive call to loopify... and strictly using loops thereafter. There is a level of balance between the two camps which most are scattered amongst. Here, the compiler is not a mystery, nor is it trusted blatantly. The more one knows her compiler, the clearer her code... without sacrificing a performance hit.

I've been playing with a language idea which makes an effort to solve these two problems. The naming problem is solved because there is no syntax tied to the language - references to functions and variables can be absolute (assigned a key held within the IDE). References which usually require symbols do not need names. In fact, the prototype syntax I will present has nameless variables... which is not a new concept. However, I envision equivalent "syntaxes" to not require naming, keeping the program's reference graph local to the IDE, allowing it to present each reference to the programmer in unique ways (color + name + shape + luminosity + it doesn't matter).

The efficiency problem is more challenging. In fact, a naive way of using my approach surely gives an exponential compilation time. I allow the programmer to specify equivalent representations of patterns in the program. These equivalent representations need not be limited to the language. Though if in a foreign language, an equivalent representation cannot be optimized further through this mode of transformation.

John Backus made a very convincing case for programming languages with associated algebras (see: FP and FL). Reasoning about a language is important, and I believe that the "equivalent statements" which the programmer may specify in this language can be used to prove aspects of a program... so long as the programmer is accurate in her assertions.

Let us get to the code, I'm certain you'll hate it. At present, the syntax models a circuit-like graph, with the lowercase 'o' being a source, and 'x' a sink. For example, defining the completely useless function "add" (using the + operator), goes like this:

add  o  o
     |  |
     |  |
     x  x  +  o
              |
              |
--------------/


Isn't it terrible? Here we see the declaration of add, and two sources next to it. Following each "wire" down, we see they go into sinks on the same line as the + operator and another source. This means that the + function operates on the two values carried to it along the wires and returns the value through the source. The flow then follows the wire down and to the left, which implies the return value.

Moving on, there's a powerful facility called dispatch/collect. It is similar to a for/each loop, though constrained, and lends itself to parallelism. Let's see the definition of vector addition.

vec+  o    o
      |    |
    o x  o x  dispatch
    |    |
    x o  x +
      |
      x collect
      |
------/


A dispatch/collect pair is essentially mapping a function, which is defined between "dispatch" and "collect", over the input values. An equivalent Lisp expression is

(define (vec+ a b)
  (map + a b))


So hey, why not create a map function in this language? Now you're talkin'! But, as we will see in the next addition to this series, it is desirable to have a map function which takes 3 arguments (map2), like the one automatically provided in Lisp. For fun, the regular map definition is included as well.

map  f  o
     |  |
     |  x o dispatch
     |    |
     f  o x
        |
      o x collect
      |
------/

map2  f    o    o
      |    |    |
      |  o x  o x  dispatch
      |  |    |
      f  x    x o
                |
        collect x o
                  |
------------------/


Here, for the map function, we can read: "For each element in the second argument, apply the function specified by the first argument and collect the result into a sequence." For the map2 function, the flow is trivially similar. Note the new 'f' notation. Argument lists for functions (in this syntax) are ordered by number and can be accessed by keyword. The keyword 'f', or keychar in this case, makes sense for a function. Let's see what the vec+ function looks like using our new definition for map2.

vec+  o   o
      |   |
      |   | + o
      |   |   |
 map2 x o x   f
        |
--------/


Here, the + function is lambdafied by applying it to no arguments. It is then used as the function to map2, as specified by the 'f' keyword. We can see that keywords can be very useful - imagine the placement of the + function is not so trivial, it is often impossible to direct a wire to a certain parameter position without crossing another wire. Because of this necessity, numeric keywords are always available. An equivalent definition is given by

vec+  o   o
      |   | + o
 map2 2 o 3   1
--------/


Since the arguments start at 1, we can imply the same ordering with

vec+  o   o
      |   | + o
 map2 x o x   1
--------/


That's all for tonight. While I'm still struggling with major issues, I chose the previous examples in confidence that they will remain valid.

-- Alex

2009.11.24 - last edit >> Tue, 24 Nov 2009 01:53:19 -0500