Is Google really on a completely different plane? [was Re: [FoRK]
Wed Dec 7 19:03:26 PST 2005
On 12/7/05, Justin Mason <jm at jmason.org> wrote:
> Ken Meltsner said:
> > On 12/7/05, Aaron Burt <aaron at bavariati.org> wrote:
> > > * You sell a specialized product that has a specialized language.
> > > Might be an enterprise app environment like Peoplesoft or SAP, might
> > > be a CAD scripting language, might be ladder-logic for industrial
> > > controls, might be an embedded uC language like PICBasic.
> > >
> > > Note that specialized languages also serve as shibboleths, ensuring
> > > that if someone knows the language, they know the problem-domain, too.
> > The trend here has definitely been away from specialized languages
> > towards libraries/add-ons to existing languages.
> > If there is a good way to extend an existing language with app-specific
> > functions -- Java is a counter-example, of course -- a custom language
> > is only going to be viewed as an undesirable attempt at vendor lock-in.
> > Note: VB-like or C-like or Java-like don't cut it -- it's not the
> > syntax, but the corner cases and other oddities that make users run away
> > from app-specific languages.
> Hmm. It's interesting to note that Google went in the opposite direction
> recently, with Sawzall:
> Interpreting the Data: Parallel Analysis with Sawzall (Draft)
> Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan
> Very large data sets often have a flat but regular structure and span
> multiple disks and machines. Examples include telephone call records,
> network logs, and web document repositories. These large data sets are
> not amenable to study using traditional database techniques, if only
> because they can be too large to fit in a single relational database. On
> the other hand, many of the analyses done on them can be expressed using
> simple, easily distributed computations: filtering, aggregation,
> extraction of statistics, and so on.
> We present a system for automating such analyses. A filtering phase, in
> which a query is expressed using a new programming language, emits data
> to an aggregation phase. Both phases are distributed over hundreds or
> even thousands of computers. The results are then collated and saved to
> a file. The design -- including the separation into two phases, the form
> of the programming language, and the properties of the aggregators --
> exploits the parallelism inherent in having data and computation
> distributed across many machines.
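
For anyone who hasn't read the draft, here is roughly how I picture the two
phases in the abstract. This is a minimal Python sketch, not Sawzall syntax;
the sample records, the function names, and the use of threads to stand in
for separate machines are all my own assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: each shard stands in for a chunk of log records
# stored on a different machine.
SHARDS = [
    ["GET /index 200", "GET /missing 404", "POST /form 200"],
    ["GET /index 200", "GET /index 500"],
]

def filter_phase(shard):
    """Examine each record in isolation and emit (key, value) pairs."""
    emitted = Counter()
    for record in shard:
        method, path, status = record.split()
        if status != "200":      # the "query": keep only failed requests
            emitted[path] += 1   # emit 1 to a sum aggregator keyed by path
    return emitted

def aggregation_phase(partials):
    """Merge per-shard results. Summing is commutative and associative,
    so the order in which shards finish does not matter."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def run(shards):
    # Threads stand in for the many machines of the real system.
    with ThreadPoolExecutor() as pool:
        return aggregation_phase(pool.map(filter_phase, shards))

if __name__ == "__main__":
    print(dict(run(SHARDS)))
```

The point of the split is that the filter never looks at more than one
record at a time, so it can run anywhere, and the aggregator only needs an
operation whose result does not depend on arrival order.
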
This really looks to me like the 'exception proving the rule' -- the most
interesting specialist language any of us can think of off the top of
our heads is really useful, *if* you happen to have thousands of
machines sitting around *and* your data set's size is measured in the
tens or hundreds of terabytes.
Which makes me ask:
* Is Google really operating on that different a plane from the rest
of us, in terms of storage techniques and data manipulation? It seems
like they are currently inventing techniques that solve problems that
(1) only a handful of companies have ever even thought about and (2)
none of that handful has ever actually resolved satisfactorily. But maybe
that's just a naive outsider buying into their PR.
* Will some of Google's techniques eventually trickle down to the rest
of us? For example, from what I know of Google's data storage
techniques, it seems plausible to me that an xoogler from their
storage group could put together a SAN OS company that would
easily destroy every other SAN company on the planet. But as yet
I've not seen any indication that that is happening.
* How did this list end up with no Google people? C'mon, West
Coasters, you're slacking...
 see the interesting http://xooglers.blogspot.com/