Journal Articles
Browse in : |
All
> Journals
> CVu
> 122
(18)
All > Topics > Programming (877) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Pure, functional, lazy ISBNs
Author: Administrator
Date: 03 March 2000 13:15:35 +00:00 or Fri, 03 March 2000 13:15:35 +00:00
Summary:
Sometimes the best thing you can do to improve your productivity is to take a holiday; a fresh perspective can work wonders. I'm going to describe one of my favourite holiday destinations. Haskell
Body:
Sometimes the best thing you can do to improve your productivity is to take a holiday; a fresh perspective can work wonders. I'm going to describe one of my favourite holiday destinations.
Haskell is a pure, functional programming language using lazy evaluation.
-
Pure: it has no assignments
-
Functional: you can treat functions just like any other values
-
Lazy: it only evaluates expressions when needed
I am going to give a brief introduction to the language, using the core of the ISBN example from C Vu 11.5. The central routine takes a sequence of integers corresponding to an ISBN or ISSN, weights each by its position, calculates the total, and checks that the total has remainder zero when divided by eleven.
A simple C++ implementation might be
bool is_valid(const int *a, int n) { int t = 0; for(int i = 0; i < n; ++i) t += (i+1) * a[i]; return (t % 11) == 0; }
This totally ignores the complications introduced by input/output and error handling, but translating this into Haskell is informative.
Haskell functions have two parts: a definition and an optional[1] declaration, so we write
isValid :: [Int]->Bool isValid xs = validateTotal (addNumbers (weightDigits xs))
The first line declares isValid to have type [Int]->Bool. That is, to be a function taking a list of integers ([Int]) and returning (->) a boolean (Bool). The second line says that whenever we see isValid applied to a value, we can replace that by three other functions validateTotal, addNumbers and weightDigits applied to the same value.
Function application is written by simple juxtaposition, so weightDigits xs applies the weightDigits function to the parameter xs which was passed to isValid. We could have used any name for the parameter, but it is conventional to use a name ending in s for a list.
The function validateTotal is the simplest of the three
validateTotal :: Int->Bool validateTotal t = (mod t 11) == 0
where mod x y corresponds to x%y in C++. Apart from this, it is very close to the corresponding C++
bool validate_total(int t) { return (t % 11) == 0; }
While C++ used a loop over an array, Haskell must add up all the elements of a list. Lists in Haskell are either empty, written as [ ], or they can be split into their first element and the rest of the list, written as (x:xs). Functions with lists as arguments can handle these two cases separately, as in
addNumbers :: [Int]->Int addNumbers [] = 0 addNumbers (x:xs) = x + addNumbers xs
The notation may be unfamiliar, but this definition states the obvious: the sum of the numbers in an empty list is zero, and the sum of the numbers in a non-empty list is given by adding the first element to the sum of the rest.
This feature of Haskell is known as pattern-matching. When an expression involves applying addNumbers to a value, it will be evaluated just enough to decide which of the patterns will be used to rewrite the expression.
We can give a similar definition for weightDigits, but we need to pass around another parameter which indicates the current weight, so that
weightDigits :: [Int]->[Int] weightDigits xs = weightDigitsBy 1 xs
where
weightDigitsBy :: Int->[Int]->[Int] weightDigitsBy w [] = [] weightDigitsBy w (x:xs) = (w * x) : weightDigitsBy (w+1) xs
The declaration of weightDigitsBy says that it is a function taking an integer and returning another function ([Int]->[Int]). By applying weightByDigits to the value 1 we get a function which we can in turn apply to the list xs in order to get the result we wanted. Using functions which return other functions, instead of functions with multiple arguments, is known as currying[2]. (STL's function adapter bind1st achieves the same effect for binary functions since bind1st(f,x)(y) is just f(x,y).)
Now
weightDigits xs = weightDigitsBy 1 xs
so applying weightDigits to any value has the same result as applying the function weightDigitsBy 1; that is weightDigits and weightDigitsBy 1 are exactly the same function. It follows that we can write
weightDigits = weightDigitsBy 1
dropping the parameter from both sides.
So far we've turned seven lines of C++ into a dozen longer lines of Haskell. We can redress the balance by making use of some features of the Haskell library.
We did not need to define addNumbers. The library function sum does the same thing, but at first sight its definition is nothing like the one above. Simplifying slightly[3], the library defines
fold :: (t -> u -> u) -> u -> [t] -> u fold f a [] = a fold f a (x:xs) = f x (fold f a xs)
and then defines sum as
sum :: [Int]->Int sum = fold (+) 0
(+) is just shorthand for the addition function; for any binary operator @, (@) is the corresponding curried function. (x @) and (@ y) are shorthands for the corresponding unary functions, so that (1 +) is a function which adds one to its argument, while (/ 3) divides by three.
What does fold do? If the list is empty, then it returns the seed value a which we passed in. If the list is not empty, then it uses the function f to combine the first element of the list with the result of fold on the rest of the list. The lower-case type names in the declaration of fold indicate that it is polymorphic[4] with respect to those types; any type can be used. In the case of sum, both those types are Int.
fold generalises the definition of addNumbers. It captures the shape of the recursion, but allows the seed value and combining function to be passed in. Once fold is defined, we can use it to define similar functions without having to code the recursion explicitly. For example
product :: [Int]->Int product = fold (*) 1
multiplies together the elements of a list.
Those familiar with the Standard Template Library may feel that they've seen this before. fold corresponds almost exactly to the accumulate algorithm from the STL.
template<class InputIterator, class T> T accumulate(InputIterator first, InputIterator last, T init, BinaryFunction binary_op);
accumulate captures the shape of a particular loop, just as fold does for the recursion.
We have simplified the definition of addNumbers to
addNumbers = sum
Finding a direct replacement for weightDigits in the library seems less likely; the recursion in weightDigitsBy involves both a list and an incrementing weighting factor. We can, however, replace that parameter with another list
weightByList :: [Int]->[Int]->[Int] weightByList ws [] = [] weightByList (w:ws) (x:xs) = (w * x) : weightByList ws xs
That is, weightByList takes a list of weights which will be used for the corresponding values in the list of digits. weightDigits can then supply a suitable list of weights
weightDigits xs = weightByList [1..(length xs)] xs
where [1..n] builds the list of numbers from 1 to n inclusive. weightByList just multiplies together corresponding elements of two lists. Generalising away from the multiplication we find that the library contains
zipWith :: (t->u->v)->[t]->[u]->[v] zipWith f xs [] = [] zipWith f [] ys = [] zipWith f (x:xs) (y:ys) = (f x y) : zipWith f xs ys
which combines corresponding elements of the two argument lists using f, giving back the list of results. weightDigits is then just
weightDigits xs = zipWith (*) [1..(length xs)] xs
Finally, since zipWith finishes as soon as either of its argument lists runs out, we can pass in any weight list which is at least as long as xs. In particular, we can pass in [1..] which is the infinite list of integers starting from one. Lazy evaluation guarantees that only the required portion will be evaluated. That means that we can write
weightDigits = zipWith (*) [1..]
Functional programming borrows the idea of composition from mathematics. If f and g are functions, then f.g is the function which applies f after applying g, so that
(f.g) x = f (g x)
The corresponding part of STL has not been standardised. Austern [Austern] calls it unary_compose. According to Josuttis[Josuttis] SGI called it compose1 and Boost[5] uses compose_f_gx.
Using this notation, we can write
isValid = validateTotal.addNumbers.weightDigits
We can even apply the same trick to rewrite validateTotal from
validateTotal t = (mod t 11) == 0
to
validateTotal = (0 ==) . flip mod 11
The operator section (0 ==) compares its argument against zero, and flip mod 11 takes the remainder of its argument by eleven. flip reverses the order of arguments to a function, that is
flip :: (t->u->v)->(u->t->v) flip f y x = f x y
The closest STL analogue is bind2nd since bind2nd(f,y)(x) evaluates to f(x,y).
Eliminating all three of the helper functions, we can write isValid as
isValid :: [Int]->Bool isValid = (0 ==).flip mod 11.sum.zipWith (*) [1..]
Or "is zero equal to the remainder (modulo eleven) of the total of the product of each digit with its position?", which is the specification we started with.
Functional programs are often like this; they have a natural, terse form which is close to the original description. The above one-line definition is what I wrote down straight after the reading the original C Vu article.
Can we use the STL to produce a corresponding one-line C++ program? The last two components of
isValid = (0 ==).flip mod 11.sum.zipWith (*) [1..]
take corresponding values from two lists, multiply them together and add up the results. Rummaging through the STL algorithms, inner_product takes corresponding values from two input iterators, multiplies them together and adds up the results. The simpler version of this algorithm defaults to using operator* and operator+, and is
template<class InputerIterator1, class InputIterator2, class T> T inner_product(InputIterator1 first1, InputIterator1 last1, InputIterator2 first2, T init);
That suggests we want an input iterator corresponding to [1..], the list of integers up from one. Following the skeleton in Matthew Austern's book [Austern], that is
#include <iterator> class upfrom : public std::iterator<std::input_iterator_tag, int, ptrdiff_t, const int *, const int &>{ int n; public: upfrom(int x) : n(x) { } reference operator*() const { return n; } pointer operator->() const { return &n; } upfrom operator++() { return upfrom(++n); } upfrom operator++(int){ return upfrom(n++); } bool operator==(const upfrom &x) const { return n == x.n; } bool operator!=(const upfrom &x) const { return n != x.n; } } ;
This is simplistic: we could certainly extend this to make it a forward iterator, and the int could be replaced by a template parameter. However, it is all we need to write the C++ equivalent of the Haskell as
#include <algorithm> bool is_valid(const int *a, int n) { return (std::inner_product(a, a+n, upfrom(1), 0) % 11) == 0; }
STL does not have a direct analogue for zipWith, although constructing one is an interesting exercise. Examining the above version of is_valid may suggest why: splitting the call to inner_product into separate calls to accumulate and zip_with requires storing the intermediate values. The use of lazy evaluation allows Haskell to avoid this.
I've been using it to tackle a toy problem, but Haskell is far from a toy language. It has user-defined types, concise list-comprehensions, both local and anonymous functions, a module system for structuring projects, a substantial I/O library, and a class system which is somewhere between inheritance and generic programming.
It does have weaknesses. The most widespread version is interpreted, the best compilers lag behind C compilers, and most implementations require garbage collection. Using C-style APIs from Haskell is a major challenge.
The lack of assignment is a problem in that habits established with conventional languages may become a hindrance, but it is also a major benefit. No assignment means no side-effects, so the subtle problems caused by aliasing and evaluation order disappear at a stroke. The easiest C++ classes to reason about are those with value semantics (only const methods). In effect, Haskell permits nothing else.
Reports suggest that programming in Haskell can be more productive and less error-prone than in more conventional languages. I regularly use Haskell for small puzzle-solving programs, and I've had good results using it to prototype tools for analysing other programs; pattern-matching functions are well-suited for performing such symbolic manipulation. I would hesitate before writing an interactive GUI application in Haskell, although I have seen it done.
STL is relatively new and uses parts of C++ fresh from standardisation, but the related ideas behind the Haskell library can be traced back at least twenty years to the work of the Lisp community. As such they have been part of my own toolkit for over a decade; meeting them again in the STL was a pleasant surprise.
Every programming language can give you a different perspective, especially the weird ones. Learn another language; it can be the easiest way to improve your programming and could have benefits you never anticipated.
Francis' talk on being multi-lingual, discussions over coffee at the JACC seminar, Matthew Austern's book [Austern] on STL, and feedback from Phil Bass all helped me to finish an article I've been thinking about for years.
My favourite functional programming textbook [Abelson] uses Scheme. Richard Bird's book [Bird] is a standard introductory text on Haskell. Mark Jones' work on Hugs [Jones-] helped make a free version of Haskell available on a wide range of systems.
[Abelson] Harold Abelson, "Structure and interpretation of computer programs." ISBN 0 262 01077 1, MIT Press.
[Bird] Richard Bird, "Introduction to Functional Programming using Haskell." 0 13 484346 0, Prentice-Hall.
[Jones-] Mark P Jones et al., "Hugs98." Available from http://www.haskell.org.
[1] The declaration is optional since Haskell can deduce the type of any expression. Warnings of mismatches between the declared and deduced types help to catch silly mistakes.
[2] After Haskell B Curry who first described it in the context of the lambda calculus, a mathematical construct which has functions but no other values.
[3] Actually, I'm simplifying a lot. There are issues here to do with the direction of the fold, associativity of f, overloading of +, and forcing non-lazy evaluation.
[4] This is the term is used by the functional programming community. It is only loosely related to dynamic binding in C++.
Notes:
More fields may be available via dynamicdata ..