[FoRK] PubSub 2.0

Ian Andrew Bell <hello at ianbell.com> on Sun May 6 13:07:15 PDT 2007

I'm confused.  I don't understand how (below) is related to PubSub,  
so there must be some aspect of the company's goals that my  
postmortem forensic analysis might be missing.

I have noticed that the company is getting raw Tsunami alerting data  
from the USGS <ens at usgs.gov>.   I assume that they had built some  
tools of some sort to cluster/aggregate news and reporting data after  
the Indonesian Tsunami.

-Ian.


On 4-May-07, at 4:02 PM, J. Andrew Rogers wrote:

> On May 4, 2007, at 3:18 AM, Eugen Leitl wrote:
>> On Thu, May 03, 2007 at 07:46:29PM -0700, J. Andrew Rogers wrote:
>>
>>> As a semi-related tangent, I have seen multiple interesting and
>>> serious proposals in the last couple months for the high-performance
>>> routing of exotic and pervasively decentralized and distributed  
>>> high-
>>> dimensionality topologies (e.g. for vanilla 3D and 4D spatio-
>>
>> Can you expand? This is an area of dear interest to me, assuming
>> I parsed you correctly.
>
>
> The background:
>
> For all the nascent interest in geospatial data on the Internet, it  
> has a number of very significant limitations currently, that in  
> aggregate mean that it does not scale in terms of space, time, or  
> concurrency.  In other words, it does not scale at all and the  
> following problems have not been seriously addressed (some of these  
> overlap theoretically):
>
> - Non-trivial data sets have poor search/retrieval performance
> - Data sets that contain anything other than points (e.g. polygons)  
> are even worse
> - Concurrency is relatively poor with even a small dynamic write load
> - The underlying algorithms are not really amenable to distributed  
> systems or clusters
> - The computational costs are considerably higher than, say, text  
> search and retrieval even ignoring the other scaling problems
>
> These basic facts are aggravated by the reality that the data sets  
> just the private sector want to work with are huge (terabytes of  
> new data per day organized in space-time, with near real-time  
> spatially-localized analytics), and that the data may be physically  
> organized in that space with considerable latency between any two  
> data publishers or even aggregation points.  You cannot get there  
> from here applying the same old Big Tree abstractions we've always  
> used because the physical constraints do not allow it, and even  
> when we are not constrained to physical topologies that reflect the  
> data, the same old Big Tree abstractions kinda suck regardless when  
> the data is neatly collected in one place.
>
>
> Over a year ago, I became involved (nominally part-time, ha) in a  
> multi-org cooperative research effort to address some of the core  
> theoretical computer science issues that were holding back  
> massively scalable and distributed geospatial systems.  A lot of  
> interesting technology has started to emerge from that effort,  
> including very interesting new access method algorithms and data  
> structures for efficiently and directly representing 2+ dimensional  
> spaces in distributed data systems.  It should be generalizable to  
> any n-dimensional data topology (with a few conditions and caveats)  
> and so is not limited to geospatial, though geospatial is the  
> driving application and the only one well-studied.  Importantly,  
> the access method was designed to be directly projected upon the  
> physically distributed topology of the data while allowing very  
> high concurrency e.g. if you had a globally distributed 2-spheroid  
> (like the earth's surface) data model with everyone locally  
> maintaining and updating their local data you could actually do  
> that very efficiently, and the routing and representation would  
> directly reflect the 2-spheroid topology with considerable  
> efficiency.  There is no great magic to it, the main theoretical  
> trick is pretty simple but so counter-intuitive that no one had  
> ever even considered it in literature, the rest is pretty standard  
> clever scaling-fu.
>
> These days I am working on more practical things like the core  
> routing/metadata protocols you would use to build real distributed  
> data networks that use these representations.  People are already  
> putting together some really interesting proposals for things you  
> could build with it.
>
> I'll probably have a lot more to say about it this summer.
>
> Cheers,
>
> J. Andrew Rogers
>
> _______________________________________________
> FoRK mailing list
> http://xent.com/mailman/listinfo/fork

  _____________
  www.ianbell.com



More information about the FoRK mailing list