RE: book review: Mastering Algorithms with Perl

Hokkun Pang (hpang@flycast.com)
Wed, 24 Nov 1999 11:35:21 -0800


I bought this book at the company's expense, so no regrets yet.
When I was searching for Perl books, the ones covering data structures
were particularly lacking. Perl's standard data structures (arrays,
hashes, etc.) have too much overhead and are simply not space efficient.
For example, I tried to build a lookup table of 1 million entries. The
raw data is about 20 MB, but the equivalent Perl hash uses over
100 MB of memory. I tried tied variables, but there are many
restrictions, and they probably save at most 50%.
I wish there were a nice C interpreter with a good set of libraries.
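
As a rough sketch of the tie-based workaround I mean, here is the sort
of thing I tried (DB_File is just one option; it trades lookup speed for
memory, and the file name here is made up):

    #!/usr/bin/perl -w
    use strict;
    use DB_File;
    use Fcntl;

    # Tie the hash to an on-disk Berkeley DB file instead of holding
    # all one million entries in core.  Lookups get slower, but
    # resident memory stays roughly constant.
    my %lookup;
    tie %lookup, 'DB_File', 'lookup.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "cannot tie lookup.db: $!";

    $lookup{'some key'} = 'some value';   # written to disk, not RAM
    print $lookup{'some key'}, "\n";

    untie %lookup;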

-----Original Message-----
From: John Regehr [mailto:jdr8d@cs.virginia.edu]
Sent: Wednesday, November 24, 1999 1:37 PM
To: fork@xent.com
Subject: book review: Mastering Algorithms with Perl

Mastering Algorithms with Perl

by Jon Orwant, Jarkko Hietaniemi, and John Macdonald

O'Reilly
1st Edition, August 1999
1-56592-398-7
704 pages, $34.95
http://www.oreilly.com/catalog/maperl/

review by John Regehr

In The New Hacker's Dictionary under "superprogrammer," we read that
"productivity can vary from one programmer to another by three orders
of magnitude." I would argue that at least one of these factors of ten
comes from the ability to quickly recognize what algorithms should be
used to solve different parts of a problem and to find or write
implementations of those algorithms that will result in an efficient
program, given the available time and the characteristics of the
problem. This ability is developed through experience and by
understanding the highlights of the large body of algorithms and
analysis of algorithms that has been developed to solve problems that
occur over and over again in computer programs.

Mastering Algorithms with Perl is designed to provide the necessary
background. It's structured like a traditional algorithms textbook:
after describing some basic and advanced data structures (linked
lists, trees, heaps, etc.), it has chapters about searching, sorting,
sets, matrices, graphs, strings, and some related topics. After the
introduction and discussion of data structures, the chapters are
relatively independent and could be read in any order. The authors
provide plenty of cross-references as well as pointers to books that
describe individual subjects in more detail.

The intended audience is programmers who don't have a background in
computer science, who know at least some Perl. However, experienced
programmers who don't know Perl should have no trouble picking up the
basics of the language with this book and a copy of Programming
Perl. Also, computer scientists can often use a review of algorithms,
and the CPAN pointers are very useful. So, I would go so far as to say
that this book would enrich any programmer's bookshelf. A stringent
test of the merit of a new technical book is to ask whether it adds
value beyond the best existing books in its area. I think that
Mastering Algorithms with Perl definitely does. It is a well-written
introduction to algorithms that is more accessible, practical, and
entertaining than standard algorithm books. It leverages the strengths
of a powerful language and a large base of reusable code.

The rest of this review will evaluate the strengths and weaknesses of
Mastering Algorithms with Perl in more depth. The central issue that I
will consider is why the reader might or might not prefer an
algorithms book that concentrates on a single language, as opposed to
a general algorithms book. I will try to be up-front about my biases:
as a computer scientist, I consider this book to be a compromise
between an algorithms book and a how-to manual. This compromise makes
it much more useful to Perl programmers, but it sometimes causes the
algorithms content to be too watered down.

It is traditional in algorithms books to describe algorithms in
pseudocode, which often superficially resembles Pascal. The difference
between pseudocode and real code is that pseudocode is not compilable
- it ignores implementation details that are not helpful to
understanding a particular example. This is considered to be an
advantage: without the clutter, the core of the algorithm is easier to
see and understand. At the beginning of the book the authors make the
point that the Perl code for a binary search is actually shorter than
the corresponding pseudocode. And it's true! The advantage of the Perl
program is that we have a readable description of the algorithm, and
it's executable too. (Unfortunately, it's often nontrivial to convert
pseudocode into real source code - the devil is in the details.) The
binary search example is slightly misleading, however, because in this
case a native Perl data structure (the array) matches the semantics of
the problem extremely well, leading to a clear and concise
implementation. Later in the book, particularly in the chapter on
graphs, we see examples where Perl's built-in data structures are less
well suited to the problems. The executable Perl code for graph
operations is much longer than the corresponding pseudocode, and is
often so syntactically cluttered that it is difficult to read. Is
this a flaw in the book or in Perl? No - it's a consequence of giving
examples in runnable code instead of pseudocode. Is the tradeoff worth
it? Probably, but it depends on what you're trying to get out of the
book.
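
For reference, a Perl binary search over a sorted array is roughly this
compact (a sketch in the spirit of the book's example, not the authors'
exact code):

    #!/usr/bin/perl -w
    use strict;

    # Return the index of $target in the sorted array @$aref, or undef.
    sub binary_search {
        my ($aref, $target) = @_;
        my ($low, $high) = (0, $#$aref);
        while ($low <= $high) {
            my $mid = int(($low + $high) / 2);
            if    ($aref->[$mid] < $target) { $low  = $mid + 1 }
            elsif ($aref->[$mid] > $target) { $high = $mid - 1 }
            else                            { return $mid }
        }
        return undef;    # not found
    }

    my @sorted = (2, 3, 5, 7, 11, 13, 17);
    print binary_search(\@sorted, 11), "\n";    # prints 4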

Another consequence of basing an algorithms book on a real language is
that the authors can point readers to existing implementations of the
algorithms, in CPAN. It's hard to overstate how big a win this
is. Perl is a powerful language to begin with, but it becomes far more
powerful when programmers are able to take advantage of the large body
of existing code modules. An unfortunate side effect of the fact that
the book talks about specific versions of Perl and about specific CPAN
packages is that this information will become outdated much more
quickly than the algorithms will. Unless the Perl language and CPAN
are exceptionally stable in the future, I would not expect most of
this information to be valid for more than a few years - hopefully a
new version of the book will be available before this one becomes too
out of date.
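
As a small illustration of what this reuse looks like in practice, here
is a sketch using the Graph module (module and method names are taken
from the Graph distribution; exact interfaces may differ between
versions):

    #!/usr/bin/perl -w
    use strict;
    use Graph::Directed;   # install once with: perl -MCPAN -e 'install Graph'

    my $g = Graph::Directed->new;
    $g->add_edge('a', 'b');
    $g->add_edge('b', 'c');
    my @vertices = $g->vertices;
    print scalar(@vertices), " vertices\n";    # prints "3 vertices"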

Because the book provides executable code for the algorithms, it's
possible to evaluate the performance of the example code (which is
available at the O'Reilly site). The authors benchmark a number of the
algorithms that they present, and compare the results. This is a nice
change from the discussion of asymptotic running times found in
traditional algorithm books, which generally ignore the constant
factors that often determine whether an algorithm is useful in
practice.
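
A minimal example of the kind of comparison Perl's Benchmark module
makes easy (my own sketch, not one of the book's benchmarks; the data
size and iteration count are arbitrary):

    #!/usr/bin/perl -w
    use strict;
    use Benchmark qw(timethese);

    # Compare a linear scan against a hash lookup over the same data;
    # adjust the iteration count to taste for your machine.
    my @list   = (1 .. 1_000);
    my %lookup = map { $_ => 1 } @list;
    my $target = 999;

    timethese(50_000, {
        'linear scan' => sub { my $found = grep { $_ == $target } @list },
        'hash lookup' => sub { my $found = exists $lookup{$target} },
    });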

The design and analysis of algorithms is a highly mathematical
discipline. A sophisticated set of tools has been developed to
evaluate the tradeoffs between various algorithms: How efficiently do
they use memory and processor cycles? What are the best, average, and
worst-case running times of various operations? How does the algorithm
scale as the size of the input grows? As it turns out, programmers
need to understand a few of these formalisms, particularly the "big O"
notation for describing asymptotic running time. I think that
Mastering Algorithms with Perl uses theory in just the right way: as
an aid to programmers' intuition about algorithms, rather than beating
us over the head with formulae and proofs. That said, I think there is
one area of theory that this book should have spent more time on:
NP-completeness. NP-complete problems are solvable, but are believed to
be inherently hard: no efficient algorithm has been discovered to
solve them. There are a wide variety of NP-complete problems, and they
do come up in practice. For programmers, the important thing is first
to recognize that an NP-complete problem has been encountered, and
that it cannot be solved exactly except in small instances. Then, a
heuristic that comes up with a good enough approximation of the
solution needs to be found and implemented. This is a practical and
well-studied part of algorithm design, and in a 650-page book I would
expect more than a page or two to be devoted to it.
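
To make the idea concrete, here is a sketch of one such heuristic: the
classic greedy approximation for the (NP-hard) set cover problem. This
is my illustration, not an example from the book.

    #!/usr/bin/perl -w
    use strict;

    # Greedy set cover: repeatedly pick the set that covers the most
    # still-uncovered elements.  Not guaranteed to be optimal, but
    # simple and usually good enough.
    sub greedy_set_cover {
        my ($universe, %sets) = @_;      # universe: arrayref of elements
        my %uncovered = map { $_ => 1 } @$universe;
        my @chosen;
        while (%uncovered) {
            my ($best, $best_gain) = (undef, 0);
            for my $name (keys %sets) {
                my $gain = grep { $uncovered{$_} } @{ $sets{$name} };
                ($best, $best_gain) = ($name, $gain) if $gain > $best_gain;
            }
            last unless defined $best;   # leftover elements appear in no set
            push @chosen, $best;
            delete @uncovered{ @{ $sets{$best} } };
            delete $sets{$best};
        }
        return @chosen;
    }

    my @cover = greedy_set_cover(
        [ 1 .. 6 ],
        A => [ 1, 2, 3 ],
        B => [ 3, 4 ],
        C => [ 4, 5, 6 ],
        D => [ 1, 4 ],
    );
    print "cover: @cover\n";    # e.g. "cover: A C"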

Several chapters of Mastering Algorithms with Perl are too shallow to
be considered good introductions to the associated areas of
algorithms. For example, the chapter on matrices only shows code for
some of the more trivial matrix operations; for complex tasks, it
tells the reader how to use PDL - the Perl Data Language. Although PDL
looks like a useful and powerful package, readers should not confuse
knowing how to use it with understanding matrix algorithms. In other
words, the matrix chapter is too much of a how-to manual. Other
chapters such as the ones on searching and sorting are excellent and
avoid falling into this trap. Algorithms is a huge area, and it can't
all be covered well in 650 pages. The later chapters are a lot of fun
to read, but some of them should probably have been scrapped in favor
of more depth in core areas.
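
For a sense of the plain-Perl end of that spectrum, here is a naive
matrix multiplication over arrays of array references (my sketch, not
the chapter's code):

    #!/usr/bin/perl -w
    use strict;

    # Multiply two matrices stored as arrays of array references.
    sub matrix_multiply {
        my ($left, $right) = @_;
        my $rows  = @$left;
        my $cols  = @{ $right->[0] };
        my $inner = @$right;
        my @product;
        for my $i (0 .. $rows - 1) {
            for my $j (0 .. $cols - 1) {
                my $sum = 0;
                $sum += $left->[$i][$_] * $right->[$_][$j] for 0 .. $inner - 1;
                $product[$i][$j] = $sum;
            }
        }
        return \@product;
    }

    my $m = matrix_multiply([ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ]);
    print "@$_\n" for @$m;    # prints "19 22" and "43 50"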

In conclusion, this is a well-written, useful book. Viewed as a Perl
book it's superb; it complements the strengths of Programming Perl and
The Perl Cookbook, and I think most or all Perl programmers would
benefit from having a copy. Viewed as a computer science book, it has
made a number of compromises in order to focus on a specific language;
this is not necessarily a problem but it is something that readers
should be aware of.

Acknowledgments: Thanks to Tom Christiansen, Dave Coppit, Bill
Pearson, and Jamie Raymond for helpful comments on previous drafts of
this review.