[FoRK] PubSub 2.0

J. Andrew Rogers <andrew at ceruleansystems.com> on Fri May 4 16:02:07 PDT 2007

On May 4, 2007, at 3:18 AM, Eugen Leitl wrote:
> On Thu, May 03, 2007 at 07:46:29PM -0700, J. Andrew Rogers wrote:
>> As a semi-related tangent, I have seen multiple interesting and
>> serious proposals in the last couple months for the high-performance
>> routing of exotic and pervasively decentralized and distributed high-
>> dimensionality topologies (e.g. for vanilla 3D and 4D spatio-
> Can you expand? This is an area of dear interest to me, assuming
> I parsed you correctly.

The background:

For all the nascent interest in geospatial data on the Internet, it  
has a number of very significant limitations currently, that in  
aggregate mean that it does not scale in terms of space, time, or  
concurrency.  In other words, it does not scale at all and the  
following problems have not been seriously addressed (some of these  
overlap theoretically):

- Non-trivial data sets have poor search/retrieval performance
- Data sets that contain anything other than points (e.g. polygons)  
are even worse
- Concurrency is relatively poor with even a small dynamic write load
- The underlying algorithms are not really amenable to distributed  
systems or clusters
- The computational costs are considerably higher than, say, text  
search and retrieval even ignoring the other scaling problems

These basic facts are aggravated by the reality that the data sets  
just the private sector want to work with are huge (terabytes of new  
data per day organized in space-time, with near real-time spatially- 
localized analytics), and that the data may be physically organized  
in that space with considerable latency between any two data  
publishers or even aggregation points.  You cannot get there from  
here applying the same old Big Tree abstractions we've always used  
because the physical constraints do not allow it, and even when we  
are not constrained to physical topologies that reflect the data, the  
same old Big Tree abstractions kinda suck regardless when the data is  
neatly collected in one place.

Over a year ago, I became involved (nominally part-time, ha) in a  
multi-org cooperative research effort to address some of the core  
theoretical computer science issues that were holding back massively  
scalable and distributed geospatial systems.  A lot of interesting  
technology has started to emerge from that effort, including very  
interesting new access method algorithms and data structures for  
efficiently and directly representing 2+ dimensional spaces in  
distributed data systems.  It should be generalizable to any n- 
dimensional data topology (with a few conditions and caveats) and so  
is not limited to geospatial, though geospatial is the driving  
application and the only one well-studied.  Importantly, the access  
method was designed to be directly projected upon the physically  
distributed topology of the data while allowing very high concurrency  
e.g. if you had a globally distributed 2-spheroid (like the earth's  
surface) data model with everyone locally maintaining and updating  
their local data you could actually do that very efficiently, and the  
routing and representation would directly reflect the 2-spheroid  
topology with considerable efficiency.  There is no great magic to  
it, the main theoretical trick is pretty simple but so counter- 
intuitive that no one had ever even considered it in literature, the  
rest is pretty standard clever scaling-fu.

These days I am working on more practical things like the core  
routing/metadata protocols you would use to build real distributed  
data networks that use these representations.  People are already  
putting together some really interesting proposals for things you  
could build with it.

I'll probably have a lot more to say about it this summer.


J. Andrew Rogers

More information about the FoRK mailing list