"Values associated with a data type usually have a shape, too. For example, the shape of a matrix is its size. Shape refers to data structures into which data can be inserted at various positions, or ``holes''. For example, the shape of a labelled graph is its underlying, unlabelled graph; that of a record is its set of field names. Shapes, like types, are important in the specification and semantics of programs, and should support tools for static error detection and improved compilation techniques. This position paper will outline the semantic underpinnings of shape theory, and suggest how it might be exploited in a variety of computational settings."
Separating Shape from Data is the summary of an invited address emphasising the semantic aspects of shape. It begins:
"Shape theory gives a precise categorical account of how data is
stored within data structures, or shapes..."
Types
Novel type systems designed to support shape-based programming can be
found in the various programming languages described below. For
example, Functorial ML supports
a class of functors used to support new forms of polymorphism, which
go by the name of functorial or shape polymorphism, or polytypy. Also,
FISh is an Algol-like language
that supports data types of arrays, by having a distinct class of
shape types.
Work on the semantics of shape lead to the discovery of the covariant types which form a sub-system
of system F of polymorphic lambda calculus based on a type constructor
that represents (natural) transformations. This can be used to
construct an alternative presentation of F that uses transformations
instead of universal quantification:
These two equivalent views, using a shape-data decomposition, and using a class of functors and natural transformations, are both important, and powerful. The original account is in our joint paper Shapely Types and Shape Polymorphism . The full picture can be found in A semantics for shape.
If F and G are functors shapely over lists then one can (typically)
construct an object F => G of natural transformations from F to G.
The theory of functors shapely over lists extends to functors of
several variables. So now we can consider transformations
Rather, they are data functors . Their properties are
developed in Data Categories
though there are some unsolved existence problems waiting to be sorted
out. In addition to the question of existence of initial algebras for
data functors, ther is the problem of establishing that they form data
functors themselves. In the paper I claimed to have solved this latter
problem (Theorem 5.7) but Bart Jacobs was kind enough to point out an
error, so the question is still open.
Polymorphism (P2, FML)
The ability to abstract over shapes suggests the possibility of shape
polymorphism, i.e. of writing programs which are independent of the
shape of their arguments.
Functors, Types and Shapes states the shape approach to
polymorphism, and explores its relationship to polytypism, emphasising
that shape polymorphic algorithms are parametrically
polymorphic.
Initially, it was not at all clear that shape polymorphic algorithms exist. The first construction was given in the experimental language P2 described in Polynomial Polymorphism .
Eugenio Moggi then proposed that shape polymorphism be viewed as
quantification over functors, or functorial polymorphism
. This was incorporated in an extension of ML called Functorial ML or FML written by us with
Gianna Belle. It supports an additional layer of functors underneath
the types which supports most of the structure. Then the types T are
either variables X, obtained by application of functors F to existing
types, or function types:
Most recently, this approach has been applied to the study of traversals which generalise mapping by
performing side-effects when each datum is found. More generally, we
can use a monad to represent the computation at each
node. Representing the shape by a functor F and letting M be the
monad, a traversal can be captured as natural transformation
Array Programming (FISh)
This approach was further developed in the FISh language. FISh
supports both the functional and imperative programming in the style
of an Algol-like language. Such languages divide their types into data
types (representing storeable values) and phrase types (representing
meaningful program fragments). Traditionally the data types have
simply been atomic datum types of integers, booleans, etc. FISh
further divides the data types into array types, generated by the
datum types and closed under array formation, and the shape types,
used to represent the array shapes. The presence of polymorphic,
nested arrays is made possible by the shape analysis.
The array type constructor has been studied in two forms. In the early version of FISh The functional imperative: shape! it represents one dimensional arrays. In the later version Poly-dimensional regular arrays in FISh it represents finite dimensional regular arrays.
Paul Steckler and I have implemented both versions for the SunOS operating system, or using the ocaml byte-code interpreter. You can download either. The implementation works by first translating to a simple imperative language Turbot, and translating this into C. Since shape analysis has determined the exact storage requirements for every array, there is no need to box data. The C code produced is easily read, and efficient. Benchmarking against Ocaml shows a speed-up of about 4-8 times when higher-order functions appear in loops over arrays.
FISh now has a formal
semantics in which shape analysis and evaluation of Turbot
programs are described formally.
Parallel Programming (GoldFISh)
Knowledge of shapes is essential for a rational approach to data
distribution in a parallel setting. First thoughts about the
possibilities can be found in Shape
Analysis for Parallel Computing by CBJ, D. Clarke and
J. Edwards.
Before choosing between data distributions, one requires a means of comparing the relative costs of the alternatives. Shape information was used to produce a PRAM cost model for VEC in A Monadic Calculus for Parallel Costing in a Functional Language of Arrays written by Murray Cole, Milan Sekanina, Paul Steckler and myself. This paper also introduced the idea of using computational monads to track costs in the parallel setting.
This idea will be used in a parallel version of FISh called GoldFISh.
GoldFISh will be a purely functional language that will use shape
analysis to determine costs, and hence appropriate distributions. Then
the code for each processor will be translated into efficient FISh
code. Latest ideas on this can be found in Costing parallel programs as a function of
shapes .
This an area of active research.
Object-Oriented Programming (Walkabout)
Shape polymorphism has close analogues in object-oriented programming,
such as the use of container classes, or design patterns. The
potential of this approach was discussed with James Noble and written up as Shaping object-oriented programs. Later,
Jens Palsberg
and I developed a parametric, or generic implementation of the
Visitor pattern (a form of graph traversal) as a Walkabout class in
Java 1.1 which is described in The Essence
of the Visitor Pattern . James then produced a Smalltalk
implementation of a generic Visitor, which is included in our joint
position paper Experiments with
Generic Visitors .