Is Google really on a completely different plane? [was Re: [FoRK] languages]

Luis Villa luis.villa
Wed Dec 7 19:03:26 PST 2005


On 12/7/05, Justin Mason <jm at jmason.org> wrote:
> Ken Meltsner said:
> > On 12/7/05, Aaron Burt <aaron at bavariati.org> wrote:
> > > * You sell a specialized product that has a specialized language.
> > > Might be an enterprise app environment like Peoplesoft or SAP, might
> > > be a CAD scripting language, might be ladder-logic for industrial
> > > controls, might be an embedded uC language like PICBasic.
> > >
> > > Note that specialized languages also serve as shibboleths, ensuring
> > > that if someone knows the language, they know the problem-domain, too.
> >
> > The trend here has definitely been away from specialized languages
> > towards libraries/add-ons to existing languages.
> >
> > If there is a good way to extend an existing language with app-specific
> > functions -- Java is a counter-example, of course -- a custom language
> > is only going to be viewed as an undesirable attempt at vendor lock-in.
> >
> > Note:  VB-like or C-like or Java-like don't cut it -- it's not the
> > syntax, but the corner cases and other oddities that make users run away
> > from app-specific languages.
>
> Hmm.   It's interesting to note that Google went in the opposite direction
> recently, with Sawzall:
>
> http://labs.google.com/papers/sawzall.html
>
>   Interpreting the Data: Parallel Analysis with Sawzall (Draft)
>   Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan
>
>   Abstract
>
>   Very large data sets often have a flat but regular structure and span
>   multiple disks and machines. Examples include telephone call records,
>   network logs, and web document repositories. These large data sets are
>   not amenable to study using traditional database techniques, if only
>   because they can be too large to fit in a single relational database. On
>   the other hand, many of the analyses done on them can be expressed using
>   simple, easily distributed computations: filtering, aggregation,
>   extraction of statistics, and so on.
>
>   We present a system for automating such analyses. A filtering phase, in
>   which a query is expressed using a new programming language, emits data
>   to an aggregation phase. Both phases are distributed over hundreds or
>   even thousands of computers. The results are then collated and saved to
>   a file. The design -- including the separation into two phases, the form
>   of the programming language, and the properties of the aggregators --
>   exploits the parallelism inherent in having data and computation
>   distributed across many machines.
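
To make the two phases in the abstract concrete, here is a minimal
Python sketch. It is not Sawzall and not Google's implementation: the
record fields, the filter condition, and the names filter_phase,
aggregate_phase, run and SHARDS are all made up for illustration, and a
local thread pool stands in for the hundreds or thousands of machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def filter_phase(shard):
    """Look at one record at a time and emit (key, value) pairs."""
    emitted = []
    for record in shard:
        # Hypothetical query: count server errors per host.
        if record["status"] >= 500:
            emitted.append((record["host"], 1))
    return emitted


def aggregate_phase(emitted_lists):
    """Sum the emitted values per key, across all shards."""
    totals = Counter()
    for emitted in emitted_lists:
        for key, value in emitted:
            totals[key] += value
    return dict(totals)


def run(shards, workers=4):
    """Run the filter over every shard in parallel, then aggregate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        emitted_lists = list(pool.map(filter_phase, shards))
    return aggregate_phase(emitted_lists)


# Toy data: each inner list plays the role of one machine's shard.
SHARDS = [
    [{"host": "a", "status": 200}, {"host": "a", "status": 500}],
    [{"host": "b", "status": 503}, {"host": "a", "status": 502},
     {"host": "b", "status": 404}],
]

if __name__ == "__main__":
    print(run(SHARDS))
```

The filter only ever sees a single record, and summation is commutative
and associative, so in this sketch the shards can be processed in any
order and on any worker without changing the totals.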

This really looks to me like the 'exception proving the rule': the most
interesting specialist language any of us can think of off the top of
our heads is really useful, but only *if* you happen to have thousands
of machines sitting around *and* your data set's size is measured in
the tens or hundreds of terabytes.

Which makes me ask:

* Is Google really operating on that different a plane than the rest
of us, in terms of storage techniques and data manipulation? It seems
like they are currently inventing techniques to solve problems that
(1) only a handful of companies have ever even thought about and
(2) none of that handful has ever actually resolved satisfactorily.
But maybe that's just a naive outsider buying into their PR.

* Will some of Google's techniques eventually trickle down to the rest
of us? For example, from what I know of Google's data storage
techniques, it seems plausible to me that an xoogler[1] from their
storage group could put together a SAN OS company that would easily
destroy every other SAN company on the planet. But as yet I've not
seen any indication that that is happening.

* How did this list end up with no Google people? C'mon, west
coasters, you're slacking...

Luis

[1] see the interesting http://xooglers.blogspot.com/


