TimBL on integration, semantic Web in Internet World's POV


From: Sally Khudairi (sk@zotgroup.com)
Date: Tue Jan 04 2000 - 19:06:04 PST


excerpt and fun photo at
http://www.internetworldnews.com/idx_povarticle.asp?inc=POV/01.01interview1&issue=01.01

full story follows:

Tim Berners-Lee

An unsentimental look at the medium he helped propel

By James C. Luh

At the dawn of a new century, the most romantic notions about the Internet
business can only take a greater hold: Rules seem less certain, possibilities
seem wider, and it seems less likely that anyone can predict today what the
future holds.

If you want the best possible guess, though, you could do worse than ask Tim
Berners-Lee, the man who created the foundation for the World Wide Web.

Internet World caught up with Berners-Lee in a midtown Manhattan diner while
he was out promoting his recent book, "Weaving the Web," which recounts how
today's Web grew out of technologies the London-born, Oxford-educated
researcher designed in the early '90s at Switzerland's European Laboratory
for Particle Physics (CERN).

The 20th century's answer to Gutenberg was unassuming, energetic, and jovial,
taking advantage of a pause in our interview to snap a few photos of the
scene with a digital camera, stretching out his arm and turning the camera
around to get himself in the picture.

The 44-year-old pioneer isn't just spending his time basking in the glory of
the Web's first decade, though. In his role as director of the World Wide Web
Consortium (W3C), he's also working assiduously for the future of his
invention in the next decade and beyond.

Lately Berners-Lee has found himself talking frequently about his vision for
the next phase of the Web's evolution, what he calls "the semantic Web." The
bits and bytes of the semantic Web contain more than just raw content. In the
semantic Web, meaning itself is embedded in the framework of the Web, and its
infrastructure reflects and communicates the relationships among Internet
resources. The W3C is working hard to create and promote base technologies
for enabling the semantic Web, including XML and the Resource Description
Framework (RDF), a "metadata" framework that allows semantic relationships to
be expressed in structures that can be read and processed by computer
programs.

Exactly what form Berners-Lee's semantic Web might take is still hard to pin
down, but it has the potential to radically change the way people and
machines interact with the Net and with each other. And that, in a way, could
be just what prevents it from taking root.

Net companies boast about being open-minded and embracing change. But huge
segments of today's Web business - the existence of the search engine
industry, for example - depend on the Web remaining mired in its current
limitations and peculiarities. The leaders of the first Web revolution might
not support a second revolution that will render their businesses obsolete.

So you have to wonder: Is this guy thinking a little too optimistically? Is
the semantic Web a little too "out there" for the mainstream to grasp? Is he
setting himself up for disappointment?

And then you remember just what happened the last time Tim Berners-Lee had a
big idea.

IW: How much does the W3C depend on you personally - how much do you actually
get a chance to shape what's going on? Are you satisfied with that level?
Would you rather be doing more? Would you rather be doing less?

Berners-Lee: Yeah, yeah, yeah, respectively. Sometimes I feel I should be
doing more in the way of leadership; sometimes I feel I should be doing less.
There's a constant balance.

My contribution in terms of leadership tends to be on the team-building side,
of setting it up - mutual respect being very, very high. The respect for
whatever anybody does, whatever their role, is really high up on the list.

And also there's some leadership from the point of view of the technical:
Simple is good, trying to design your piece not so that it takes over as much
territory as possible, but so it claims as small an area as possible, does it
very, very well, very cleanly - those sort of principles of design. Those
things I feel I have learned from a bit of experience. I've done consultancy
with different kinds of software companies doing software in different
environments.

IW: The New York Times has described you as the father of the Web, and the
Web as your child. How accurate is that metaphor? What kind of sense of
personal responsibility do you feel to the Web now?

Berners-Lee: Well, to the technology a certain amount, but to the content,
none. At Internet World you don't have to point out that it's somebody else
that wrote it all, but sometimes the sort of people that you bump into in the
street - there was one guy who called me. He mailed me first to say that he
was kind of angry that I was put forth as the person who created the Web. He
gave a phone number. The message is just one of these strange messages, so I
thought, let's track this one down. I called him back, said, so what's the
problem? He said, "Well, I think it's just absurd to imagine that you
could've typed all that stuff in."

So I don't feel any responsibility for what's out there. That's really
important, that no one person feels any responsibility for the content. No,
that would not be the spirit of it. But I do feel, I suppose, a sort of
parental concern that it doesn't go to pot - that it doesn't, for example,
get fragmented into many Webs, that it isn't taken over by one controlling
concern, which would lead to the Internet as we know it being destroyed as a
sort of concept of free information space.

IW: You also talked in your book about some of the design principles: You
wanted the Web to be something that's decentralized, that doesn't need a big
central index or something. Is there anything else, looking back on it now,
any other political or moral principles that the Web either leads to, or that
it follows from, or that are closely tied to its structure?

Berners-Lee: Simplicity and modularization in the design - those are things
just from the world of software engineering. And then decentralization comes
from the world of the Internet, and tolerance is an Internet maxim. Be
conservative in what you do, and liberal in what you expect.

If you have to add anything from the Web era, then it's the test of
independent invention. I have to find a one-word-like phrase for it, but it's
this idea that if you design a system - whether it's a political system, or a
technical system, or something on the Internet - you should design it ... Just
imagine if somebody else on the other side of the planet is designing the
same system and they'd use the same philosophy as you have, but all the
arbitrary decisions are being made differently. So they call things a
different name. So they've made protocols that use different terms but do the
same thing. What happens when these two systems meet? The test of independent
invention says if the independently invented system can interwork with yours,
then you succeed. If you've built something in the system so your system, for
example, has to be dominant, has to contain the link registry, or it has a
central concept of what is high quality, or it just constrains some arbitrary
point - everybody must drive around in green cars - and the other one
constrains that everybody must drive around in red cars, then you'd have to
draw a boundary, have a big battle deciding whether to drive around in red
cars or green cars. So that principle I came across in small ways in software
systems before the Web, but it's very important on the Web. You've seen new
systems that people propose for electronic commerce, and it's a good test to
try in your head.

IW: You've talked about the danger that one or a few forces are going to
seize the Web and take control. You also talked in your book about how
vertical integration might pose a danger as well - how deals between hardware
and software makers, ISPs, and Web sites can bias the Web experience. How is
the W3C trying to help prevent that - if it is - and what can other
institutions do to help stave that off?

Berners-Lee: The consortium only works in a very specific area. That is the
technical interoperability, the evolution of the technology. So we are not
doing anything about monopoly division or the threats of vertical
integration. Obviously, the decentralized technology tends to create a
multicentralized society, but it doesn't guarantee one at all. You can use a
decentralized technology to build a very, very centralized totalitarian
regime.

It's not for the consortium to do something about that; it's for the existing
political process. It's for people to take to the streets, if necessary, to
do something about that. When it comes to, for example, the fear about
vertical integration - the fact that you'll end up with a biased IP supply -
the bias is for consumers to do something about. It's for consumers to elect:
Rather than just reading the free newspapers that come through the door,
people [should] go out and pay money for a newspaper they feel will give them
unbiased coverage. Or at least they select the particulars from a great
choice.

IW: What role does the government have to play in the Web and the Internet,
if any?

Berners-Lee: Well, it's very easy to say that first of all, it should get out
of the way, but that kind of ignores the fact that there are some places
where you need governments. You need governments where there is something
that must be managed on behalf of the whole population. So for example, the
DNS [Domain Name System] space. The whole DNS debacle was, for example, a
lack of governing statute for the root of the Domain Name System. The domain
system is a single, very valuable resource. When the American government decided
they'd move it over to industry, they were ignoring the fact that this was
like giving the control of the dollar, the currency, to industry. When you
have something as
fundamental as that, then you have to really make sure that the government's
for the people, by the people. And I hope that whatever comes out of this
ICANN [Internet Corporation for Assigned Names and Numbers] system eventually
will do that.

Then it will be very boring again. Because generally, the government's
something that has to be very slow, bureaucratic, and boring, because it's
not where the action is. That's just the infrastructure.

There's a certain amount of consumer protection. I believe in consumer
protection. I'm a European. In Europe, there are data protection laws. I feel
America does not have enough protection of privacy. There's always the battle
of the consumers vs. the corporation. The consortium has a P3P [Platform for
Privacy Preferences] project for negotiating privacy between a client and a
server. But if you think about a site that is going to abuse privacy by
sampling your information, finding out really what you like, selling it to
people who you wouldn't want to know that sort of information - we have all
the technology in the world. But if a rogue site doesn't use it, there should
be some legislation to make a default that you must respect privacy, I feel.

Then, if the default is that privacy is to be respected, then you can
negotiate and you can agree to give away certain details in return for
getting a lot better advertising, better service, but you'll be able to know
where you go.

IW: You mention P3P, and there's also PICS [Platform for Internet Content
Selection] and RDF. These are three technologies that you promote a lot in
public and in your book, but it seems that the industry hasn't really picked
up on those as readily as they've picked up on XML, for instance.

Berners-Lee: Well, XML is a basic notion. So is RDF. It's very easy to see
how you use XML. It's more complicated to see how you use RDF, because RDF
works at a higher level of information. In fact, RDF is used by so many
things that RDF, by one name or another, has got to happen. If it doesn't,
not having commonality at that level would be a shame.

PICS is an interesting case, because it was brought out largely in response
to the CDA [Communications Decency Act]. When the CDA was overturned, people
felt that PICS had done its job as a demonstration. You can buy a lot of
filtering software, and it's not online, it doesn't use the PICS protocols a
whole lot, and in fact I think the PICS labels could die out. But the concept
I wanted to explain in the book was the concept of being able to make a
statement about somebody else's information. PICS was the first attempt at
formalizing that.
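
The idea of making a statement about somebody else's information can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is neither PICS label syntax nor RDF, and the `Statement` type, the `suitable-for-children` property, the rater URLs, and the `statements_about` helper are all hypothetical names invented here.

```python
from collections import namedtuple

# A statement one party makes about a resource it does not own.
# The field names are hypothetical, chosen only for this sketch.
Statement = namedtuple("Statement", ["subject", "predicate", "value", "stated_by"])

RATER_A = "http://example.net/raters/a"
RATER_B = "http://example.net/raters/b"

# Two raters disagree about the same page; both statements can coexist.
statements = [
    Statement("http://example.org/page1", "suitable-for-children", True, RATER_A),
    Statement("http://example.org/page1", "suitable-for-children", False, RATER_B),
    Statement("http://example.org/page2", "suitable-for-children", True, RATER_A),
]


def statements_about(subject, trusted):
    """Return the statements about `subject` made by parties in `trusted`."""
    return [s for s in statements if s.subject == subject and s.stated_by in trusted]


# The reader, not the page's author, decides whose statements count.
for s in statements_about("http://example.org/page1", {RATER_A}):
    print(s.stated_by, "says", s.predicate, "=", s.value)
```

The point of the sketch is that the statements live apart from the page they describe, so anyone can publish them and each reader chooses which sources to trust.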

RDF is just a generic structure for data on the Web. But in the future, if
you look at the Dublin Core [a scheme for metadata], for example, that's a
really strong community of people using RDF. The need for information about
information, I think, is really interesting. If you talk to people on the
street, they're concerned about "What about all this junk? How do I determine
the junk from the rest?" It's like the initial Web - it was a chicken-and-egg
situation. It isn't until they've got it better figured out that people will
have the tools to make it manageable. So, yeah, the takeoff of anything like
that is difficult.

In a way, we hoped that PICS would be a foot in the door for metadata. Didn't
happen like that, because there's no business model for labeling services,
quite simply. When you deploy something, you've got to find a place from here
to there; you have to find a path where somebody has the incentive to take
each step, even though they may be going in very odd directions. For the Web,
there were these sort of initial stages - of getting the phone book [an early
Web application] up, to get people to get the client from CERN. So with
metadata, we've got to find the same thing. But once we get a core set of
metadata about the people rating sites someplace, then I know search engines
could be really interesting. There are some RDF search engines running, ready
for when this will be out there, ready to give a very highly enhanced
service.

IW: Another issue is the decentralized nature of XML and RDF schemata, and
how they're going to come about. Without any kind of central governing
authority to handle them from the top down, they're going to need to be built
from the bottom up, and mesh across industries. How do you think that's going
to come about? How are we going to get competing interests to cooperate on
common vocabularies?

Berners-Lee: Again, this is, if you like, the deployment question. I think
initially what you will find is that people will expose existing data to
different systems - of which there will be no deliberate commonality, but
there will be some
academic commonality. You'll have databases which, for example, share the
concept of ZIP code. When it's a ZIP code, well, ZIP code is pretty
straightforward. It's a five-digit number. So you could basically make the
retrospective assertion - this column in this database has the same
significance as a column in this database. And this row is some sort of
address or location. And so it makes sense to do a join through there.

I think what will happen is, in order to run a particular application,
somebody will annotate the fact that there is a relationship, maybe within
their company.
For example, we have a database of contacts, and you have a database of
employees. And in fact there is some relationship between the two databases,
even though they're stored on different systems. So that you can, if you're
looking for
people in different ZIP codes, sometimes you want to merge the two databases.
You want to do a search across them both. And you can enable that just by
making a single link, making a few of these semantic links between the
databases. So if we have tools that allow you to do that, and then as a
result operate from two databases, then initially it will have to be pushed
by people driving particular applications, particular queries. Then after a
while, for example, everybody will end up linking to the IRS's definition of
ZIP code. It'll be a very, very large number that have got this common
concept, all these semantic links that basically, when they join together,
you'll find some very, very large concepts dying to get out. And then it'll
change the whole business of being in this semantic Web. Instead, when you
create a form, you'll browse the Web for defining meanings. So you won't just
seed a new concept that you've developed - you'll pick it out of a menu of
your favorite form program, or you'll go browse the most interesting ones
from somewhere else. But to save yourself the trouble of defining things,
you'll always use somebody else's definition. Unless you're really creating
some unique concept in your own company.
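
The contacts-and-employees example above can be sketched as a small Python program. It is a minimal illustration, not RDF and not any real tool: the table contents, the column names, the concept names `zip_code` and `person_name`, and the helpers `column_for` and `people_in_zip` are all assumptions made up for the sketch.

```python
# Two independently designed tables; rows and column names are hypothetical.
contacts = [
    {"name": "Ada", "postcode": "02139"},
    {"name": "Ben", "postcode": "10001"},
]
employees = [
    {"full_name": "Cy", "zip": "02139"},
    {"full_name": "Di", "zip": "94105"},
]
TABLES = {"contacts": contacts, "employees": employees}

# The "semantic links": each (table, column) pair is asserted, after the
# fact, to carry the same shared concept. Concept names are invented here.
SEMANTIC_LINKS = {
    ("contacts", "postcode"): "zip_code",
    ("employees", "zip"): "zip_code",
    ("contacts", "name"): "person_name",
    ("employees", "full_name"): "person_name",
}


def column_for(table, concept):
    """Return the column in `table` linked to `concept`, or None."""
    for (linked_table, column), linked_concept in SEMANTIC_LINKS.items():
        if linked_table == table and linked_concept == concept:
            return column
    return None


def people_in_zip(zip_code):
    """Search every linked table and merge the matches into one list."""
    found = []
    for table_name, rows in TABLES.items():
        zip_col = column_for(table_name, "zip_code")
        name_col = column_for(table_name, "person_name")
        if zip_col is None or name_col is None:
            continue  # this table was never linked to the shared concepts
        for row in rows:
            if row[zip_col] == zip_code:
                found.append((table_name, row[name_col]))
    return found


print(people_in_zip("02139"))
```

Neither table had to be redesigned; the query works across both because a handful of links say which columns mean the same thing. Linking a third table would take two more entries in `SEMANTIC_LINKS`.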

So then you'll be really taking advantage of the Web. And then the moment
you're running an application, suddenly you'll be able to ask questions that
are very open-ended, that involve going out there into the big wide world.
Instead of getting back these crazy search-engine answers that are just
totally off the wall, every answer will be mathematically provable to be what
can answer the query that you asked. Then it will take off. And it will be
the next revolution. But exactly which is going to be the killer app to get
it there, which is going to be the first link that people make, I really
don't know.

IW: Do you think that users share that same view of what the Web could be,
intuitively, or do you think that it's something that needs to be handed to
them?

Berners-Lee: I think the semantic Web, if you talk about this whole idea of
how meaning can be formed from the grass roots, I think in a way that's a
really difficult philosophical thing for people. Even though it's how natural
language works; it's how people work. In reality, everybody uses terms in
their own way, with all their differences: You say "coffee" - to different
people, maybe different things have different significances, but everybody
imagines that behind the use of the word "coffee" there's some platonic idea,
that there's some correct encyclopedia definition of what coffee is. And in
fact, of course, somehow, everybody is trying to clone in their own mind a
completely consistent set of perfect definitions to work with. Logicians have
been trying to do that for the last century. They've been trying to form one
completely consistent logic. And then computer scientists are trying to
[create] ontologies - descriptions of knowledge in which there was a clean,
fit, perfect definition of what coffee is.

IW: What else presses on your mind that you haven't convinced everyone of
yet?

Berners-Lee: The thing I think I'd like people to think about is the
information bias. And I think it's good to focus people on the concept of the
quality of information. Yes, it's difficult to persuade people about the
philosophy of the semantic Web, but on the other hand I wouldn't want to
persuade everybody. We don't have to do that. The average person using the
Web doesn't have to do that. The average person using the Web isn't reading
Internet World, mind you.

But keeping the medium pure - the Internet a cloud, rather than something
that is reacting to me and trying to lead me in particular ways - is a pretty
good message for the moment.

Sound Off

Will the Internet industry ever pick up on Berners-Lee's plans to build
meaning into the architecture of the Web? Respond to letters@iw.com

... ... ... ... ... ... ... ... ... ...

Sally Khudairi, ZOT Group
<sk@zotgroup.com>
+1.617.818.0177
http://www.zotgroup.com/



This archive was generated by hypermail 2b29 : Wed Jan 19 2000 - 15:03:03 PST