OCEANS research group at BU

Rohit Khare (khare@pest.w3.org)
Wed, 28 Aug 96 12:00:17 -0400


Looks like a clueful group -- at least they've identified some interesting
problems, mostly by bringing a middleware lens to Web services.

I invited him to give a Friday Tech Talk at MIT... RK


Object Caching Environments for Applications and Network Services

Research Overview



The goal of the OCEANS project is to improve distributed information systems
like the World Wide Web through measurement, analysis, and careful redesign.
The project undertook the first published study of the effectiveness of
caching in the Web; based on that work, members have gone on to study the
benefits of eager document dissemination and speculative prefetching. The
project's early study of the characteristics of client use of the Web has been
widely cited, and recent work has concentrated on statistical
characterization of the properties of reference locality in the Web. OCEANS is
now beginning a full-scale implementation of a number of our techniques in
the context of an experimental high-performance, low-cost distributed Web.

The OCEANS group's research encompasses the mosaic of projects described
below.


Establishing the Self-Similarity of Web Traffic
PIs: Azer Bestavros and Mark Crovella

In this study, Professors Crovella and Bestavros established that Internet
traffic attributed to the Web is statistically self-similar. Furthermore, they
were able to trace the genesis of this self-similarity through a rigorous
analysis of Web file systems and user traces. Previous studies have
established the bursty (self-similar) nature of network traffic only at the
LAN (Ethernet) level. The presence of self-similar characteristics in Internet
traffic has many important implications. First, it implies that currently
adopted Markovian traffic models for analysis and simulation purposes are
inadequate because they allow Internet traffic to be smoothed out through
finite buffering---an impossible outcome in the presence of self-similar
traffic patterns. Second, it implies that current transport
protocols---which treat packet loss as indicative of
congestion---may be too pessimistic and thus may fail to utilize available
bandwidth efficiently. This implication was confirmed in a recent study by
other members of the OCEANS group. Perhaps the most intriguing result from
this study was the attribution of the genesis of self-similarity to the
heavy-tailed nature of the distribution of file sizes in particular, and of
information quanta in general. The significance of this finding is that
traffic self-similarity is related to a universal property of information
representation, which is rooted in the very way humans process information.
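
As a rough sketch of the mechanism the study describes (not the group's
actual model or data), the Python below superposes ON/OFF traffic sources
whose period lengths are Pareto (heavy-tailed) distributed, then estimates
the Hurst parameter of the aggregate with the aggregated-variance method;
all parameters here are illustrative.

```python
import math
import random
from statistics import variance

random.seed(7)

def on_off_source(slots, alpha=1.2):
    """One ON/OFF source with heavy-tailed (Pareto) ON and OFF period
    lengths: emits 1 packet per slot while ON, 0 while OFF."""
    out = []
    on = True
    while len(out) < slots:
        period = int(random.paretovariate(alpha)) + 1
        out.extend([1 if on else 0] * period)
        on = not on
    return out[:slots]

def aggregate_traffic(n_sources, slots):
    """Superpose many ON/OFF sources into per-slot packet counts."""
    series = [0] * slots
    for _ in range(n_sources):
        for i, x in enumerate(on_off_source(slots)):
            series[i] += x
    return series

def hurst_aggregated_variance(series, blocks=(1, 2, 4, 8, 16, 32)):
    """Aggregated-variance estimate of the Hurst parameter H: the
    variance of the m-aggregated series decays like m^(2H-2), so fit
    the slope of log-variance against log-m by least squares."""
    xs, ys = [], []
    for m in blocks:
        agg = [sum(series[i:i + m]) / m
               for i in range(0, len(series) - m + 1, m)]
        xs.append(math.log(m))
        ys.append(math.log(variance(agg)))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) /
             sum((x - mean_x) ** 2 for x in xs))
    return 1 + slope / 2   # slope = 2H - 2

traffic = aggregate_traffic(n_sources=50, slots=4096)
H = hurst_aggregated_variance(traffic)
print(f"estimated Hurst parameter: {H:.2f}")
```

For short-range-dependent (Markovian) traffic, H stays near 0.5; the
heavy-tailed ON/OFF periods push the estimate toward 1, the self-similar
regime.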


Exploiting Locality of Reference Properties
PIs: Azer Bestavros, Mark Crovella and Abdelsalam Heddaya

The first pilot study conducted within the OCEANS group established the
importance of client-based caching in large-scale information retrieval
systems (henceforth referred to as the Web). However, the same study showed
that client-based caching does not scale up and, alone, is not enough to
alleviate the performance problems of the Web. In a sequence of studies,
members of the OCEANS Group have traced the reason for the limited performance
of client-based caching to the absence of strong temporal locality of
reference properties in Web access patterns. Furthermore, they showed that
other forms of locality of reference properties (namely, spatial and
geographical) exist and are strong enough to be exploited efficiently. Based
on this, they proposed and evaluated through extensive trace-driven
simulations two server-initiated protocols for Web information retrieval. The
first protocol is a hierarchical data dissemination mechanism that allows
information to propagate from its producers to servers that are closer to its
consumers. This dissemination reduces network traffic and balances load
amongst servers by exploiting geographic and temporal locality of reference
properties exhibited in client access patterns. The second protocol relies on
speculative service, whereby a request for a document is serviced by sending,
in addition to the document requested, a number of other documents that the
server speculates will be requested in the near future. This speculation
reduces service time by exploiting the spatial locality of reference property.
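
A much-simplified sketch of the speculative-service idea, in Python; the
class and method names are invented for illustration and are not the group's
protocol. The server tracks which documents historically follow each
document within a client's session and bundles the most frequent followers
with a response.

```python
from collections import defaultdict

class SpeculativeServer:
    """Toy speculative service: alongside a requested document, bundle
    documents that historically followed it in client sessions."""

    def __init__(self, bundle_size=2):
        # doc -> {next doc -> how often it followed}
        self.follows = defaultdict(lambda: defaultdict(int))
        self.last_seen = {}          # client -> previously requested doc
        self.bundle_size = bundle_size

    def record(self, client, doc):
        """Update follower counts from one observed request."""
        prev = self.last_seen.get(client)
        if prev is not None:
            self.follows[prev][doc] += 1
        self.last_seen[client] = doc

    def serve(self, client, doc):
        """Return the requested doc plus speculative extras."""
        counts = self.follows[doc]
        extras = sorted(counts, key=counts.get, reverse=True)
        self.record(client, doc)
        return [doc] + extras[:self.bundle_size]

srv = SpeculativeServer()
for d in ["index.html", "toc.html", "ch1.html"]:
    srv.record("clientA", d)
for d in ["index.html", "toc.html", "ch2.html"]:
    srv.record("clientB", d)
print(srv.serve("clientC", "index.html"))  # bundles toc.html speculatively
```

If the speculation hits, the client avoids a full request round trip; if it
misses, the cost is only the extra bytes sent.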


Burstiness-tolerant Transport Protocols
PIs: Azer Bestavros and Mark Crovella

Given that burstiness (due to self-similarity) is to be expected in traffic
no matter how much buffering is performed, it seems reasonable to expect
transport protocols to be tolerant of burstiness. One way of dealing with
burstiness is to expect packet drops (i.e., erasures) and design the transport
protocol in a way that would mask (or otherwise reduce) the impact of packet
erasures. One way of achieving this goal is to use dynamically-adjustable
levels of redundancy. To that end, Professors Bestavros and Crovella are
currently investigating the use of AIDA techniques to incorporate this
adjustable redundancy. Another benefit of using AIDA in the design of
transport protocols is to tolerate the fragmentation of IP packets when
transmitted over ATM networks, whose cells carry 48-byte payloads. Professor
Bestavros and his
students have proposed and analyzed through simulation the performance of an
AIDA-based TCP/IP protocol that they have named TCP Boston.
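
AIDA proper disperses a message into n fragments such that any m of them
suffice for reconstruction, with the n/m ratio adjusted dynamically. The toy
Python below substitutes the simplest possible erasure code (m data
fragments plus one XOR parity fragment, tolerating a single loss) purely to
illustrate the idea of masking packet erasures with redundancy; it is not
the AIDA or TCP Boston algorithm.

```python
from functools import reduce

def make_fragments(data: bytes, m: int):
    """Split data into m equal fragments plus one XOR parity fragment,
    so any single lost fragment can be rebuilt from the rest."""
    padded = data + b"\x00" * ((-len(data)) % m)
    size = len(padded) // m
    frags = [padded[i * size:(i + 1) * size] for i in range(m)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))
    return frags + [parity], len(data)

def reconstruct(frags, orig_len):
    """Rebuild the payload when at most one fragment is missing (None):
    the missing fragment is the XOR of all surviving ones."""
    frags = list(frags)
    missing = [i for i, f in enumerate(frags) if f is None]
    assert len(missing) <= 1, "this toy scheme tolerates one erasure"
    if missing:
        rest = [f for f in frags if f is not None]
        frags[missing[0]] = bytes(
            reduce(lambda a, b: a ^ b, col) for col in zip(*rest))
    return b"".join(frags[:-1])[:orig_len]   # drop parity, strip padding

frags, n = make_fragments(b"mask packet erasures with redundancy", m=4)
frags[2] = None                              # simulate a dropped fragment
print(reconstruct(frags, n))                 # payload recovered intact
```

A real adjustable scheme raises or lowers the number of redundant fragments
as the observed erasure rate changes, trading bandwidth for loss tolerance.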


Tools for Network-Aware Applications
PI: Mark Crovella

The Tools for Network-Aware Applications project is studying ways to improve
the performance of network applications by providing them with measurements of
the current conditions on the Internet. Prof. Crovella and his students have
shown that if an application has access to a small set of simple measurements
of network conditions, it can improve response time dramatically. For example,
current Internet-based applications like the World Wide Web are typically
constrained to retrieve a file only from one specific location. If instead a
small number of alternate locations are provided, then the application can
almost always improve the transfer time of the file by selecting on-the-fly
the location that promises the best performance at the current moment. To
support this approach to performance improvement, Prof. Crovella and his
students have developed tools to measure latency, bottleneck link speed, and
congestion along any path in the Internet. These tools have been shown to
accurately measure properties of the network, and hold promise for improving
the performance of systems like the World Wide Web.
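
A minimal sketch of the selection step, assuming latency and
bottleneck-speed measurements are already in hand; the numbers below are
made up for illustration, not output of the group's tools.

```python
def estimated_transfer_time(size_bytes, latency_s, bottleneck_bps):
    """Crude transfer-time model: one propagation delay plus the file
    size over the bottleneck link speed (ignores slow start, sharing)."""
    return latency_s + size_bytes * 8 / bottleneck_bps

def pick_location(size_bytes, measurements):
    """measurements: {location: (latency_s, bottleneck_bps)}.
    Return the location with the smallest estimated transfer time."""
    return min(measurements,
               key=lambda loc: estimated_transfer_time(
                   size_bytes, *measurements[loc]))

mirrors = {
    "near-slow": (0.010, 128_000),     # 10 ms away, 128 kbit/s link
    "far-fast":  (0.200, 10_000_000),  # 200 ms away, 10 Mbit/s link
}
print(pick_location(500_000, mirrors))  # large file favors the fast link
print(pick_location(100, mirrors))      # tiny file favors the near one
```

The interesting property is that the best choice depends on the file size:
latency dominates small transfers, bottleneck speed dominates large ones.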


Searching the Web by Image Content
PI: Stan Sclaroff

The primary goal of this project is to develop a world wide web image search
tool, for searching web documents based on image content. Unlike keyword-based
search, search by image content allows users to guide a search through the
selection (or creation) of example images. The technical challenges associated
with this project are in part due to the staggering scale of the world wide
web, and in part due to the problem of developing effective image
representations for very fast search based on image content. In addition, this
project will address issues relating to developing user interfaces for a web
search by image content browser.
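
One classic way to realize search by image content is to index images by a
color histogram and rank them by histogram distance; the Python below
sketches that approach with tiny synthetic "images". This is an
illustration of the general technique, not Prof. Sclaroff's actual image
representation.

```python
def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` buckets and count pixels:
    a crude but classic content feature for image search."""
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def l1_distance(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def search(query_pixels, index):
    """index: {url: histogram}. Return urls ranked by similarity to the
    query image (smallest histogram distance first)."""
    q = color_histogram(query_pixels)
    return sorted(index, key=lambda url: l1_distance(q, index[url]))

sky = [(40, 80, 200)] * 100    # mostly blue "image"
lawn = [(30, 180, 40)] * 100   # mostly green "image"
index = {"sky.gif": color_histogram(sky),
         "lawn.gif": color_histogram(lawn)}
print(search([(50, 90, 210)] * 100, index))  # blue query ranks sky.gif first
```

At Web scale the hard parts are exactly the ones the text names: compact
representations and index structures that make this ranking fast over
millions of images.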


The Responsive Web Computing Project
PIs: Azer Bestavros, Marina Chen, Mark Crovella, Abdelsalam Heddaya, and Stan
Sclaroff

The goal of this umbrella project is to use the Web as a medium (within
either the global Internet or an enterprise intranet) for metacomputing in a
reliable way with real-time performance guarantees. We approach this problem
from four different levels: (1) network services and protocol-level
techniques, (2) middleware solutions such as caching, prefetching, and
replication, (3) Web computing resource management models, real-time
scheduling protocols, and services, and (4) an object-oriented framework to
capture these models and associated protocols and services along with
application-specific knowledge and the overall designs of Web computing
applications.

Maintainer: _A.Bestavros_
Created on: 1994.05.02
Updated on: 1996.08.18