[FoRK] Joyent, cloud service evolution (J. Andrew Rogers)

J. Andrew Rogers andrew at jarbox.org
Wed Jun 22 16:56:42 PDT 2016

> On Jun 21, 2016, at 10:43 PM, Stephen D. Williams <sdw at lig.net> wrote:
> SSDs should have made this kind of thing much easier by removing rotational latency from the equation, leaving mainly generic IOPs
> which should be position insensitive.

For many batch-oriented big data applications, SSDs have fewer advantages than you would think. The higher sequential bandwidth and lack of head seek don’t always address the actual bottlenecks if the I/O scheduler is good.

Real-time workloads with highly concurrent operations are a different matter, though. Even then, there is a minority of workloads where HDDs work just as well in practice.

> It would be interesting to know the specific data, required indexing, and types of queries required for this area.  Are there public
> benchmarks for this type of IoT yet?

What operation would you be benchmarking exactly and in what dimensions? 

The data models are simple. Your primary attributes are space, time, and a unique identifier, with a payload of other types which may or may not be indexed depending on the app. Both space and time may be interval types. Multiple sources of the same type are all stored in the same table, but different source types are stored in different tables. Basically a “layer” model. Ad hoc joins, which may be recursive, on any subset of the primary attributes are an important operation and can span an arbitrary set of tables.
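To make the layer model concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names (Record, overlaps, join, the layer names), the use of 2-D axis-aligned intervals for space, and the naive nested-loop join. It is not how any particular system implements this, and it ignores indexing and recursion entirely.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical types for illustration only.
Interval = Tuple[float, float]  # closed interval (lo, hi); a point is (v, v)


@dataclass
class Record:
    entity_id: str                 # unique identifier
    x: Interval                    # spatial extent, first axis
    y: Interval                    # spatial extent, second axis
    t: Interval                    # temporal extent
    payload: Dict[str, Any] = field(default_factory=dict)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def join(left: List[Record], right: List[Record],
         on: Sequence[str] = ("space", "time")) -> List[Tuple[Record, Record]]:
    """Naive nested-loop join on any subset of {"id", "space", "time"}."""
    out = []
    for l in left:
        for r in right:
            if "id" in on and l.entity_id != r.entity_id:
                continue
            if "space" in on and not (overlaps(l.x, r.x) and overlaps(l.y, r.y)):
                continue
            if "time" in on and not overlaps(l.t, r.t):
                continue
            out.append((l, r))
    return out


# One table per source type (a "layer"); all sources of that type share it.
layers: Dict[str, List[Record]] = {
    "phone_pings": [
        Record("p1", (0, 0), (0, 0), (10, 10)),
        Record("p2", (5, 5), (5, 5), (10, 10)),
    ],
    "store_visits": [
        Record("p1", (0, 1), (0, 1), (9, 12), {"store": "cafe"}),
    ],
}

pairs = join(layers["phone_pings"], layers["store_visits"], on=("space", "time"))
```

The same join function is called with a different "on" subset for each ad hoc question, which is the point of treating space, time, and identifier uniformly as primary attributes.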

Two of the main query types that are used across most apps (and often together):

- Given a triggering event, characterize it using context reconstructed from spatiotemporally proximal data, both from similar sources and unrelated types of sources. This is the “something just happened, we need to understand it” scenario.

- Given an ad hoc set of selected entities, discover and/or analyze relationships between those entities by traveling backward in time and observing their behavior. This is the “how many people in this room went to Starbucks yesterday” query. 
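The two query types above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a real implementation: observations are simplified to points in 2-D space and time, "proximal" is taken to mean within a Euclidean radius and a time window, and all names (Obs, context_for_event, entities_seen_in, the sample data) are hypothetical. A real system would use indexes rather than linear scans.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Set, Tuple


@dataclass
class Obs:
    entity_id: str
    x: float
    y: float
    t: float


def context_for_event(event: Obs, layers: Dict[str, List[Obs]],
                      radius: float, window: float) -> Dict[str, List[Obs]]:
    """Query type 1: per layer, the observations near the event in space and time."""
    return {
        name: [o for o in rows
               if hypot(o.x - event.x, o.y - event.y) <= radius
               and abs(o.t - event.t) <= window]
        for name, rows in layers.items()
    }


def entities_seen_in(entities: Set[str], rows: List[Obs],
                     box: Tuple[float, float, float, float],
                     t_range: Tuple[float, float]) -> Set[str]:
    """Query type 2: which of the selected entities were inside a box during a past interval."""
    x0, y0, x1, y1 = box
    t0, t1 = t_range
    return {o.entity_id for o in rows
            if o.entity_id in entities
            and x0 <= o.x <= x1 and y0 <= o.y <= y1
            and t0 <= o.t <= t1}


# Made-up sample data: two layers of different source types.
layers = {
    "phones": [
        Obs("alice", 0.0, 0.0, 100.0),
        Obs("bob", 0.5, 0.5, 101.0),
        Obs("carol", 50.0, 50.0, 100.0),
        Obs("alice", 10.0, 10.0, 20.0),   # earlier visit near the "cafe"
        Obs("bob", 30.0, 30.0, 20.0),
    ],
    "cameras": [Obs("cam7", 0.2, 0.1, 99.0)],
}

# "Something just happened": reconstruct context around a triggering event.
event = Obs("alarm", 0.0, 0.0, 100.0)
ctx = context_for_event(event, layers, radius=1.0, window=5.0)

# "How many people in this room went to the cafe yesterday?"
in_room = {o.entity_id for o in ctx["phones"]}
visited = entities_seen_in(in_room, layers["phones"],
                           box=(9.0, 9.0, 11.0, 11.0), t_range=(0.0, 50.0))
```

Note how the second query takes its entity set from the first, which is how the two are often used together.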

> If the commodity servers and communication fabric are much
> less expensive, they can be better while being much worse.

There is nothing "much worse" about commodity servers and networking given a fixed budget. The only real downside is that they are more likely to expose poor software design and implementation.
