Turning the Web Into a Data Source
By Leslie Marable
What if the Web could expose its information to any application in
any way? This is the concept behind _webMethods Inc._ of Fairfax, Va.,
which has defined an eXtensible Markup Language (XML) that can directly
address any HTML document or set of documents as a data source.
WebMethods' Web Interface Definition Language (WIDL) is at the heart
of the webMethods Web Automation Toolkit, a Java class creator that
"reads" a document's structure, transforms it into its constituent
objects, and stores these objects in a repository for use in any Java
application.
These classes can be called from existing business applications to
get data from a Web site, which lets the developer bypass the browser
and treat the Web as a data source. As Java Beans, these classes can
be incorporated into COM- or CORBA-based applications through the
Java-ActiveX bridge or other interobject communication schema. Beans
support is coming in a future release, according to Charles Allen,
vice president of business development at webMethods.
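As a rough illustration of treating a page as a data source without a browser, the following sketch reads values out of an HTML fragment using Python's standard html.parser module. It is not WIDL and not the webMethods toolkit; the CellCollector class, the extract_cells function, and the sample page are all hypothetical, and a real application would fetch the page over HTTP rather than use a string literal.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # Start a new, empty cell each time a <td> opens.
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text is kept only while inside a table cell.
        if self._in_cell:
            self.cells[-1] += data


def extract_cells(html):
    """Return the stripped text of each table cell in the document."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Hypothetical page content, for illustration only.
page = (
    "<html><body><table>"
    "<tr><td>Widget</td><td> 19.95 </td></tr>"
    "</table></body></html>"
)
cells = extract_cells(page)
price = float(cells[1])
```

The calling application sees only a list of values, which is the general idea the article describes: the page layout is handled once, and the business code works with the extracted data.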
The Web Interface Definition Language (WIDL) consists of six
XML-compliant HTML extenders that define a universal schema for HTML
documents based on the Document Object Model (DOM) as it is being
defined by the _World Wide Web Consortium_.
Currently, the Automation Toolkit parses the HTML level supported by
_Microsoft_ and _Netscape_ 3.x on top of the W3C's HTML 3.2
specification (what will be known as DOM level 0).
By dealing with HTML documents as a set of abstract objects, the
Automation Toolkit can extract data from any place in any document at
an extremely detailed level. For instance, defining a news story as a
headline, a byline, and body copy organized as paragraphs, an
application designed to monitor the Associated Press wire could
enumerate, manipulate, and extract the specific paragraphs containing
the word Zaire in any story by a specific reporter. It could then pull
that story, paragraph, or whatever into a local application.
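The news-wire example above can be sketched in a few lines of Python. This is only an illustration of the idea of querying a document as a set of objects, not the toolkit's actual interface; the Story class, the matching_paragraphs function, and the sample stories are invented for the purpose.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Story:
    """A news story modeled as a headline, a byline, and body paragraphs."""
    headline: str
    byline: str
    paragraphs: List[str] = field(default_factory=list)


def matching_paragraphs(
    stories: List[Story], reporter: str, keyword: str
) -> List[Tuple[str, str]]:
    """Return (headline, paragraph) pairs for paragraphs that mention
    `keyword` (case-insensitive) in stories whose byline is `reporter`."""
    needle = keyword.lower()
    hits = []
    for story in stories:
        if story.byline != reporter:
            continue
        for paragraph in story.paragraphs:
            if needle in paragraph.lower():
                hits.append((story.headline, paragraph))
    return hits


# Made-up sample data, for illustration only.
wire = [
    Story(
        "Talks resume",
        "A. Reporter",
        ["Negotiators met in Zaire on Monday.", "No agreement was reached."],
    ),
    Story("Markets steady", "B. Writer", ["Traders watched events in Zaire."]),
]

hits = matching_paragraphs(wire, "A. Reporter", "Zaire")
```

Once the story is held as separate fields, filtering by reporter and by word becomes an ordinary loop over objects rather than a search through raw page text.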
The company added native support for the Secure Sockets Layer 3.0 de
facto standard earlier this month, enabling secure transactions with
any Web-based source from nonbrowser applications.
Web developers working for webMethods clients, including _DHL
Worldwide Express_, Texas Instruments, and the U.S. Postal Service,
are building applications that automate the exchange of data with
suppliers, business partners, and customers.
Mark Lussier, the lead software engineer for DHL's customer
automation division, said he used the technology to build a
single-source package tracking facility. "In the case of [package]
tracking, the Web results that a user sees are different depending on
the current situation of the package. If it's been delivered, there's
one response, if the package is in transit, there's a different
response," he said. "The webMethods Toolkit allows us to handle all
those conditions with one piece of code, so we didn't have to write
code for each variation of the page." Ultimately, Lussier said, the
Toolkit saved him time and saved DHL money, because Netscape priced
its proposed solution for the same task at $27,000. "I did the same
thing in two hours for one-tenth the cost."
A 30-day evaluation
version of the complete Web Automation Toolkit can be downloaded from
the company's Web site. A beta version of Web Automation Toolkit 2.0
is also available. Individual developer licenses for the Toolkit begin
at $995. Server run-time licenses begin at $2,495. Server run-time
licenses with the SSL plug-in begin at $4,995.
The SSL plug-in requires the _Phaos Technology Corp._ SSL libraries
and an encryption license from _RSA Data Security Inc._
Reprinted from Web Week, Volume 3, Issue 11, April 21, 1997