Ericsson's WebOnAir distillation proxy

Rohit Khare (rohit@uci.edu)
Wed, 7 Apr 1999 16:30:19 -0700


This proxy server from Ericsson is aimed at PC-like browsers:
laptops, WinCEs, Palms, screenphones. It's not suitable for
"microbrowsers" at the lowest-end, to be sure. It just does all the
"obvious things" for wireless web use: tidies the markup (pace Dave
Raggett), strips javascript, downsamples graphics, and gzips the
whole stream.

Rohit

===================================================
[blame MS IIS ASP for the ugly URL: for some reason the feature
overview requires a "login session"]

http://mobileinternet.ericsson.se/emi/pagegen/software/emi_gen_software_WebOnAir.asp?n=3028&sid=89382759&ua=3&sv=6&mv=237

WebOnAir Client Version 1.x

...For typical on-line newspaper pages, using the WebOnAir Filter
Proxy over a wireless connection (GSM, D-AMPS, etc.) will give you up
to five times faster download.

When you are connected wirelessly through your mobile phone, this
roughly means that your 9.6 kbit/s connection is boosted to a speed
close to, or just as good as, that of a normal (28.8 kbit/s or
33.6 kbit/s) modem on a fixed network.
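
The comparison above is simple arithmetic, sketched here under the
assumption that a page loading N times faster behaves like a link N
times as fast. The speed-up factors are illustrative, not measured
values, and `effective_kbits` is a hypothetical helper name.

```python
# Back-of-the-envelope check of the "9.6 kbit/s feels like a modem" claim.
# The speed-up factors below are illustrative assumptions.

def effective_kbits(link_kbits, speedup):
    """Apparent throughput when pages load `speedup` times faster."""
    return link_kbits * speedup

GSM_KBITS = 9.6
for factor in (3, 3.5, 5):
    print(factor, "x ->", round(effective_kbits(GSM_KBITS, factor), 1), "kbit/s")
```

A threefold gain already matches a 28.8 kbit/s modem and a factor of
3.5 matches 33.6 kbit/s, so the quoted "up to five times" leaves some
margin.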

When using the WebOnAir Filter Proxy on a fixed network, the average
page will be downloaded at least twice as fast as without using the
filter proxy.

Note: The reduction in download time varies a lot between different
types of web-pages. For some web-pages the gain can be close to
nothing, but for others the download can be 10 times faster or even
more. Furthermore, the download time also depends on the type of
device and processor: the faster the processor on the client side,
the shorter the download time of a web-page. This means, e.g., that a
laptop with a Pentium processor is always faster than any Windows CE
device.

...

* HTML filtering
- White spaces
- Background images
- Comments
- META tags
- Java
- JavaScript
* Distillation of images
- File-size reduction (i.e. reduction of the byte-size of the image)
by reducing the quality ratio of GIF and JPEG images
- Colour conversion to grey-scale
- Format conversion from GIF to JPEG
- Conversion of animated GIF to static image
* Compression and decompression of HTML
* Compression and decompression of downloaded documents of various
types such as ASCII, MSWord, PowerPoint, PostScript, etc.
* User defined configuration of the features

Supported Clients

The WebOnAir Filter Proxy 1.0 officially supports a number of
different client operating systems (i.e. H/PC and PC operating
systems).

* Windows 95
* Windows 98
* Windows CE 2.0 - Hitachi SH-3 processor
* Windows CE 2.0 - MIPS processor
* Windows NT 4.0
* EPOC32 Version 4 (Client will be available in April 1999)
* EPOC32 Version 5 (Client will be available in April 1999)

The software for these operating systems has been developed and
tested by Ericsson.
Note: EPOC32 is the operating system used in Psion5 and coming
devices from the Symbian partners Ericsson, Nokia, Motorola, Philips,
and Psion as well as from other vendors.

Unsupported Clients

The HTTP 1.1 compliant interface will be made public, meaning that
client software can be developed independently of Ericsson. For an
interface specification, see the Open Interface Specification.
Some unsupported versions of the client proxy are available from the
WebOnAir Download web page, others might be found elsewhere on the
Internet.
Currently, the following unsupported clients are available:

* Linux
* Solaris

These clients are not tested by Ericsson and are therefore totally
unsupported by Ericsson. The interface specification between the
client and gateway proxy is supported, but not any implementations
made using this specification.
The specified interface will also allow web-browser developers to
include the client functionality in the browser itself.

Functional Description

HTML Filtering

HTML filtering means that a filter process strips unnecessary
information from the code. How much code can be removed or optimised
depends heavily on the author and/or the program the author used to
generate the HTML code.

When the code is processed and parts of the code are removed or
optimised, less code needs to be compressed and sent to the receiver.

The user can decide if and what to filter out from HTML code when
setting the configuration for the WebOnAir Filter Proxy, see User
Defined Configuration of the Features (Client).

In the first release of Ericsson's WebOnAir Filter Proxy, the
following reduction of the HTML code is made in the gateway proxy:

White Spaces

White spaces are those characters in a character set that produce
"white" on white paper. In other words, they produce nothing but a
horizontal or vertical movement of the output position on paper or in
a computer file. This includes the space and tab characters, the
carriage-return/line-feed and the form feed (new page) characters.

When it comes to web pages written in HTML, there are actually two
types of white spaces: The ones later displayed in the browser
(spaces between words, empty lines, etc.), and the ones not used for
display.

The latter type of white space is possible because HTML as a language
allows a web page author to use white space characters in certain
areas to improve the readability and formatting of the raw HTML data
itself. This type of formatting is done by the page author to keep
the HTML code readable to him/her and, as said, does not contribute
to the representation of the page. Thus it is safe to reduce this
specific type of white space in an HTML page without changing its
representation.

Removal of such white spaces makes sense, since transmitting a white
space character costs just as much as transmitting any other
character.
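
A minimal sketch of this kind of white space reduction, assuming that
any run of white space outside a PRE element can be collapsed to a
single space. This is not Ericsson's implementation; the function
name `collapse_whitespace` is hypothetical, and a real filter would
need a proper HTML parser.

```python
import re

# Assumption: only <pre>...</pre> blocks carry significant white space.
_PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.IGNORECASE | re.DOTALL)
_WS_RUN = re.compile(r"\s+")

def collapse_whitespace(html):
    """Collapse white space runs to one space, leaving <pre> blocks intact."""
    # Splitting on a pattern with one capturing group puts the captured
    # <pre> blocks at the odd indices of the resulting list.
    parts = _PRE_BLOCK.split(html)
    return "".join(
        part if i % 2 else _WS_RUN.sub(" ", part)
        for i, part in enumerate(parts)
    )

page = "<p>Hello,\n\n    world</p>\n<pre>a\n  b</pre>\n"
print(len(page), "->", len(collapse_whitespace(page)), "characters")
```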

Background Images

Background images are quite common on the WWW of today. Sometimes
they are merely heavy decoration and do not give any extra
information, but sometimes they are just as much part of the
information as the text presented on the page.

It is also quite common that text on top of the background images is
written with a colour that is not necessarily visible if the
background image is removed.

For laptops (Windows 95/98/NT4), the user of Ericsson's WebOnAir
Filter Proxy has an option to filter out background images or not.

If you choose to remove the background image, the WebOnAir Filter
Proxy will simply not transmit the image. Default background colours
are handled by your web-browser.

If you choose not to remove the background image, it will be
distilled just like any other image on an HTML page. Often this is
just as effective for the download time as removing the background
image.
For H/PCs with Pocket Internet Explorer, background images are always
removed since the browser does not support background images.

Comments Added by the Author

Authors of HTML are free to add their own hidden comments to the
code/text. Such comments can have great value to the author
him/herself, but only add to the overhead of information to be sent
over the WWW (i.e. the comments are of no value for the web-browser
when displaying a HTML-page).

Some comments contain hidden scripts in JavaScript. The WebOnAir
Filter Proxy can preserve such comments in order to ensure that the
page remains fully functional.

Comments Added by the Program Generating the HTML

Just as the author is allowed to add hidden comments, some HTML
editors can add internal comments when generating the HTML code/text.

These comments can also be safely removed since they are not needed
by the web-browser to display the HTML-page.
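
A minimal sketch of comment removal that preserves comments hiding
JavaScript, as described above. It assumes that only comments inside
SCRIPT elements need to survive; the source does not say how WebOnAir
recognises such comments, and `strip_comments` is a hypothetical
name.

```python
import re

# Assumption: comments worth keeping are those wrapped around script code.
_SCRIPT_BLOCK = re.compile(r"(<script\b.*?</script>)", re.IGNORECASE | re.DOTALL)
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_comments(html):
    """Remove HTML comments everywhere except inside <script> blocks."""
    # Odd-indexed parts are the captured <script> blocks; leave them alone.
    parts = _SCRIPT_BLOCK.split(html)
    return "".join(
        part if i % 2 else _COMMENT.sub("", part)
        for i, part in enumerate(parts)
    )

page = (
    "<!-- generated by SomeEditor --><p>Hi</p>"
    "<script><!--\nalert(1)\n//--></script>"
)
print(strip_comments(page))
```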

Superfluous META tags

So-called META tags are included in an HTML document and used by
search engines to categorise a web site. (The META tags are included
in the header of the HTML document).

Most META tags can be safely removed since they are not needed by the
web-browser to display the HTML-page. Removing META tags is not
visible to the user reading the text, but this improves the
transmission time since less data needs to be compressed and sent.
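
A minimal sketch of META tag removal. The text says "most" META tags
can go without saying which ones stay; this sketch assumes that tags
with an http-equiv attribute (refresh, content type) matter to the
browser and keeps them. `strip_meta` is a hypothetical name.

```python
import re

_META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)

def strip_meta(html):
    """Drop META tags, keeping those with http-equiv (an assumption)."""
    def keep_or_drop(match):
        tag = match.group(0)
        return tag if "http-equiv" in tag.lower() else ""
    return _META_TAG.sub(keep_or_drop, html)

head = (
    '<meta name="keywords" content="news, sport">'
    '<meta http-equiv="refresh" content="30">'
    "<title>T</title>"
)
print(strip_meta(head))
```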

Java

HTML pages can include Java applets by referring to them. In contrast
to JavaScript, an applet's programming code is not embedded in the
HTML page; the page contains only the applet's name and the
additional parameters needed to start the applet.

If a web browser is used which can not run Java applets, or if it is
in general not intended to run Java applets, then these references
and the parameters can be removed from an HTML page by the WebOnAir
Filter Proxy.

JavaScript

JavaScript is a small programming language, and JavaScript programs
can be embedded into HTML web pages to enhance the web page contents.
The WebOnAir Filter Proxy can remove JavaScript from HTML pages. This
is desirable if the web browser in use is not capable of running
JavaScript, or if the user doesn't want to run JavaScript programs.
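
A minimal sketch of both removals, assuming that applet references
appear as APPLET elements and scripts as SCRIPT elements. It ignores
inline event-handler attributes and other ways of embedding code, and
`strip_elements` is a hypothetical helper, not part of WebOnAir.

```python
import re

def strip_elements(html, tag):
    """Remove every <tag>...</tag> element, including its contents."""
    name = re.escape(tag)
    pattern = re.compile(
        rf"<{name}\b.*?</{name}\s*>",
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("", html)

page = (
    "<p>News</p>"
    '<script>alert("hi")</script>'
    '<applet code="Ticker.class" width="200" height="20">'
    '<param name="speed" value="3"></applet>'
)
filtered = strip_elements(strip_elements(page, "script"), "applet")
print(filtered)
```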

Distillation of Images

Image distillation means that any GIF or JPEG image on an HTML page
is processed to reduce the file-size of the image and, therefore,
reduce the download time. The following distillation methods to
reduce the file-size can be used:

* Reducing the quality factor of JPEG images
* Reducing the quality factor of GIF images
* Format conversion from GIF to JPEG with a defined quality
reduction of the JPEG image (if this is more effective than just
reducing the quality ratio of the GIF image)
* Colour to grey-scale conversion
* Conversion of animated GIF to static image

Distilling images results in a quality reduction. The Ericsson
WebOnAir Filter Proxy uses the lossy compression method JPEG,
which means that image information removed by the gateway proxy is
lost and cannot be restored on the client side.

Depending on the quality factor used, more or less information is
removed from the original image.
Another distillation method which can be used is to convert so-called
animated GIFs to static images. This is done by freezing the first
image in the animated GIF and presenting it as a static image. You
cannot convert animated GIF to animated JPEG (so-called MJPEG).

How the user wants to distil images can be configured using the
configuration feature, see User defined configuration of the features
(Client).
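
Of the distillation methods listed above, colour to grey-scale
conversion is the easiest to sketch without an image library. The
source does not say which conversion WebOnAir uses; the integer
ITU-R BT.601 luma weights below are an assumption, and the function
names are hypothetical.

```python
# Assumption: grey value = BT.601 luma, computed with integer weights.
def to_grey(pixel):
    """Map one (r, g, b) pixel with 0-255 channels to a 0-255 grey value."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000

def distil_to_grey(pixels):
    """Convert a sequence of RGB pixels to one byte per pixel."""
    return bytes(to_grey(p) for p in pixels)

row = [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 0, 0)]
grey = distil_to_grey(row)
print(list(grey))
```

Uncompressed, one byte per pixel instead of three cuts the raw data
to a third; the saving after GIF or JPEG encoding will differ.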

Compression and Decompression of HTML

Once the HTML code/text has been filtered, it is compressed on the
gateway side. The compressed information is sent to the client,
where it is decompressed and displayed in the web-browser.

The advantage of the WebOnAir Filter Proxy concept with a
gateway/client solution is, that compression and decompression are
totally transparent for the user. The user can continue to use
his/her preferred web browser, and no web server has to be changed.

Experience has shown, that the compression algorithm used by the
WebOnAir filter proxy is almost always more efficient than the
typical compression mechanisms used in communication equipment like
modems.
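
A minimal sketch of the gateway/client round trip. The description
does not name the compression algorithm; DEFLATE via Python's zlib
module is used here purely as an assumption, and both function names
are hypothetical.

```python
import zlib

def gateway_compress(html):
    """Gateway side: compress filtered HTML before sending it."""
    return zlib.compress(html.encode("utf-8"), 9)

def client_decompress(payload):
    """Client side: restore the HTML before handing it to the browser."""
    return zlib.decompress(payload).decode("utf-8")

# Repetitive markup, as found in table-heavy pages, compresses well.
page = "<tr><td>headline</td><td>summary</td></tr>\n" * 200
wire = gateway_compress(page)
assert client_decompress(wire) == page
print(len(page.encode("utf-8")), "->", len(wire), "bytes on the wire")
```

Because the browser only ever sees the decompressed text, neither the
browser nor the origin web server needs to change.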

Compression and Decompression of Documents

Ericsson's WebOnAir Filter Proxy does not only compress and
decompress HTML code but also a large number of different text and/or
document types like for example ASCII, MSWord, PowerPoint,
FrameMaker, and PostScript files.
The proxy does not handle already compressed document types such as
PDF, see Limitations.

User Defined Configuration of the Features

The user can configure to which level he/she wants to filter and/or
compress the information. An easily accessible configuration page on
the client side guides the user in configuring different user
profiles.

The configuration information is stored locally on the client, which
means that the user can set different profiles when he/she is
off-line. When the user wants to access the WWW, the configuration
profile is sent to the gateway proxy.

The user can easily switch between different profiles as well as
select a transparent mode which will leave all data unaltered.

The reason for having different profiles is that the user might
prefer to have different grades of compression and/or image
distillation depending on the web-sites to visit.
The following HTML Filtering features can be configured to what the
user prefers:

* HTML Filtering of code not representing visible information
i.e. white spaces, comments, and META tags.
* Background images
* Java
* JavaScript

The Distillation of Images can be configured with finer granularity.
For images, the user can set the image quality reduction and choose
whether colour images should be converted to grey-scale.

Compression of HTML and documents is always active, i.e. the user
cannot deselect this option except by switching to the so-called
transparent mode, in which no processing at all is performed on the
web content.
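
The configuration options above can be pictured as a small profile
record. This is a sketch only: the field names, the default values,
the 1-100 quality scale and the step names are all hypothetical, and
the actual profile format sent to the gateway is not described here.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Profile:
    name: str
    filter_invisible_html: bool = True    # white spaces, comments, META tags
    remove_background_images: bool = True
    remove_java: bool = True
    remove_javascript: bool = True
    image_quality: int = 50               # hypothetical 1-100 scale
    greyscale: bool = False
    transparent: bool = False             # leave all data unaltered

    def active_steps(self) -> List[str]:
        """List the processing steps the gateway would apply."""
        if self.transparent:
            return []                     # no processing at all
        flags = [
            ("filter-invisible-html", self.filter_invisible_html),
            ("remove-background-images", self.remove_background_images),
            ("remove-java", self.remove_java),
            ("remove-javascript", self.remove_javascript),
            ("greyscale-images", self.greyscale),
        ]
        steps = [step for step, enabled in flags if enabled]
        steps.append("compression")       # cannot be deselected
        return steps

print(Profile("news").active_steps())
print(Profile("raw", transparent=True).active_steps())
```

Keeping compression out of the user-settable flags mirrors the rule
that it is only switched off by transparent mode.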