Zipf's Law as applied to the Web


From: Rohit Khare (rohit@uci.edu)
Date: Thu Mar 02 2000 - 05:06:57 PST


 From http://www.cs.bu.edu/faculty/crovella/paper-archive/TR-95-010/paper.html

Characteristics of WWW Client-based Traces
Tue Jul 18 10:53:00 EDT 1995

The final hyperbolic distribution in our data is an instance of
Zipf's law [15] (discussed in [11]). Zipf's law was originally
applied to the relationship between a word's popularity in terms of
rank and its frequency of use. It states that if one ranks the words
used in a given text by their frequency of use (denoting a word's
rank by p and its frequency by P), then

        P ~ 1/p

Note that this distribution is parameterless, i.e., p is raised to
exactly -1, so that the nth most popular document is exactly twice as
popular as the (2n)th most popular document. Zipf's law has
subsequently been applied to other examples of popularity in the
social sciences.
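
The factor-of-two claim follows directly from P ~ 1/p, and can be
checked with a few lines of Python. This is only an illustrative
sketch: the helper name zipf_share and its exponent argument are
assumptions made here, not anything from the paper.

```python
from fractions import Fraction

def zipf_share(rank, exponent=1):
    """Relative popularity at a given rank under P ~ 1 / p**exponent.

    Hypothetical helper for illustration; exact rational arithmetic
    avoids any floating-point rounding in the ratios below.
    """
    return Fraction(1, rank ** exponent)

# Ratio between the nth and the (2n)th most popular item for a few n.
ratios = [zipf_share(n) / zipf_share(2 * n) for n in (1, 5, 100)]
print(ratios)
```

With the exponent fixed at 1, every ratio comes out as exactly 2,
whatever n is. With any other exponent a, the ratio would be 2**a
instead, which is why the fixed exponent matters.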

Our data shows that Zipf's law applies quite strongly to documents on
the WWW. This is demonstrated in Figure 8 for all 46,830 documents
referenced in our logs. The figure shows a log-log plot of references
to each document as a function of the document's rank in overall
popularity. The tightness of the fit to a straight line is remarkable
(R^2 = 1.00), as is the slope of the line: -0.986. Thus the exponent
relating popularity to rank for WWW documents is very nearly -1, as
predicted by Zipf's law.
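
A slope and R^2 of this kind can be obtained from an ordinary
least-squares line through the points (log rank, log references).
The sketch below is not the authors' code, and the counts are
synthetic values generated to follow 1/rank exactly, not the data
from the logs. The function name loglog_fit is made up for this
example.

```python
import math

def loglog_fit(counts):
    """Fit a line to (log10 rank, log10 count); return (slope, r_squared).

    Assumes every count is positive, since log10 is undefined otherwise.
    """
    ranked = sorted(counts, reverse=True)  # rank 1 = most referenced
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared

# Synthetic reference counts that follow 1/rank exactly (assumption,
# chosen only so the expected slope is known in advance).
synthetic_counts = [10000.0 / rank for rank in range(1, 1001)]
slope, r_squared = loglog_fit(synthetic_counts)
print(round(slope, 3), round(r_squared, 3))
```

On perfectly Zipfian input the fitted slope is -1 and R^2 is 1, up to
floating-point error. Real reference counts would be passed in the
same way, as a list with one count per document.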

[11] Benoit B. Mandelbrot. The Fractal Geometry of Nature. W. H.
Freeman and Co., New York, 1983.

[15] G. K. Zipf. Human Behavior and the Principle of Least Effort.
Addison-Wesley, Cambridge, MA, 1949.

Carlos R. Cunha, Azer Bestavros, Mark E. Crovella
Computer Science Department
Boston University
111 Cummington St, Boston, MA 02215
{carro,best,crovella}@cs.bu.edu
BU-CS-95-010



This archive was generated by hypermail 2b29 : Thu Mar 02 2000 - 05:08:29 PST