Sifting AT&T billing records to measure degrees-of-separation

Rohit Khare (rohit@uci.edu)
Sat, 6 Feb 1999 16:30:46 -0800


John --

In our continuing search for the dimensions of the six-degrees
research area, some fresh reports of empirical study. I know Bill
Cheswick was doing some fine work on visualizing Internet routing
tables -- have you ever seen his giant plots with a purple MCI
backbone as "true north", the one fixpoint from wherever on the
Internet you start tracing routes? -- but Joan's involvement was news
to me. You might remember her from my W3C work on trust management
and the REFEREE student project at MIT.

Still, finding 30-person cliques within 12-hour data? Not even the
most sterotyped claque of teenage girls could manage that. My
esitmation is that it's someone's IPO gone awry as brokers
frantically call each other to reprice a sinking issue...

Best,
Rohit

Needles in giant haystacks
------------------------------------------------------------------------

DATE 30-Jan-99

HOW many handshakes, or phone calls, or acquaintances are you away
from Bill Clinton? According to a common urban myth, probably no more
than six. Testing such a proposition, however, is difficult. But new
mathematical techniques being developed to handle "massive data
sets"-collections of information whose size is measured in millions
of gigabytes-might soon provide an answer, by combing through
billions of telephone records.
The records in question are held by AT&T , which keeps track of the
billing information for roughly 250m phone calls each day. Every
record includes a caller's number, the number of the person called,
the time of day, the duration of the call and so on, resulting in 18m
gigabytes of billing data a year. Although individually meaningless,
these records fit together to form a huge mathematical structure
called a directed multigraph, which represents the interconnectedness
of the 300m phone numbers known to AT&T 's computers.
Careful analysis of this information could help with infrastructure
planning, customer classification and marketing. Do all area codes
call each other equally? Which pairs of codes call each other most
often? Joan Feigenbaum, a researcher at AT&T 's Florham Park
laboratory in New Jersey, says she and her team will get round to
answering these questions eventually. At the moment, though, they are
more interested in using the data to investigate patterns of social
behaviour.
For example, how big is the largest "clique" of phone numbers, all of
which call each other on the same day? Dr Feigenbaum's colleague,
Mauricio Resende, analysed the records from a 12-hour period with a
20-processor supercomputer in order to find out. The resulting
multigraph involved 123m connections among 53m numbers, and the
largest clique was found to contain at least 30 numbers.
This result is, however, necessarily imprecise, because the
mathematical recipes normally used in these sorts of cases break down
when faced with such a large quantity of data. So Dr Feigenbaum and
her colleagues are working on new algorithms that will, they hope, be
of general use in grappling with the world's growing data mountains.
And, sometime later this year, they plan to run a calculation to
determine the "diameter" of one of their multigraphs-in other words,
the maximum number of intermediate acquaintances required to link any
two phone numbers. Will it be six?