Best Computer Security Method Overlooked By Industry

Chuck Murcko
Wed, 13 Mar 2002 15:55:27 -0500

"A team of Penn State and Iowa State researchers has tested and rated
three “smart” classification methods capable of detecting the patterns
of entry and misuse left by the typical computer network intruder.

They found that one, called “rough sets,” currently overlooked by
industry, is the best.

The researchers report that computer security breaches have risen
significantly in the last three years. In February 2000, Yahoo, Amazon,
E-Bay, Datek and E-Trade were shut down due to denial-of-service attacks
on their web servers.

The U.S. General Accounting Office (GAO) reports that about 250,000
break-ins into Federal computer systems were attempted in one year and
64 percent were successful. The number of attacks is doubling every year
and the GAO estimates that only one to four percent of these attacks
will be detected and only about one percent will be reported.

Dr. Chao-Hsien Chu, associate professor of information sciences and
technology and of management science and information systems at Penn
State, began the study when he was on the faculty at Iowa State.

The results were published in the current issue (Vol. 32, No. 4) of the
journal Decision Sciences. His Iowa State co-authors are Dr. Dan Zhu,
assistant professor of management information systems; Dr. G.
Premkumar, associate professor of management information systems; and
Xiaoning Zhang, Chu’s former master’s student.

“No network security system or firewall can ever be completely
foolproof,” Chu says. “So there is always a need for a ‘watchdog’ to
patrol the network and signal when an intrusion occurs. Commercially
available ‘watchdog’ systems depend on traditional statistical
techniques. However, the newer ‘smart’ methods promise to have a
significant impact on accuracy.”

Even the cleverest intruder leaves electronic footprints on breaking
into a secure computer data network, such as one holding bank, medical
or credit records. The new “smart” methods can collect information from
a variety of sources within the network, “learn” the patterns typical of
a perpetrator trying to gain a level of control similar to that of the
people who legitimately operate the network, and make a reasoned
prediction about whether the pattern represents intrusion or not.

The team focused on three “smart” approaches, known as data mining
techniques, namely: neural nets, inductive learning and rough sets. All
three data mining techniques can collect information, “learn” and make
reasoned predictions.

Neural nets and inductive learning have previously been used in
intrusion detection, and research by others has found these methods to
be successful and effective. Chu notes that rough sets, a relatively new
approach, has not been applied to intrusion detection.

The researchers say their study is the first to evaluate and compare
multiple data mining methods, including rough sets, in the intrusion
detection context.

The researchers report that the rough sets method does not require any
preliminary or additional information about the data and can work with
missing values and less expensive or alternative sets of measurements.

The method can work with imprecise values where a pair of lower and
upper approximations replaces imprecise or uncertain data.
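
As a rough illustration of what lower and upper approximations mean, the
sketch below follows the standard textbook definition from rough set
theory. It is not the researchers' implementation. The session names,
the attributes (failed_logins, root_shell) and the function name
approximations are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target`.

    Objects with identical values on `attributes` are indiscernible
    and are grouped into one block.
    """
    blocks = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)
        blocks[key].add(name)

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:
            lower |= block  # every member is in the target: certain
        if block & target:
            upper |= block  # at least one member is in the target: possible
    return lower, upper


# Hypothetical toy data: five sessions described by two attributes.
sessions = {
    "s1": {"failed_logins": "high", "root_shell": "yes"},
    "s2": {"failed_logins": "high", "root_shell": "yes"},
    "s3": {"failed_logins": "low", "root_shell": "no"},
    "s4": {"failed_logins": "high", "root_shell": "no"},
    "s5": {"failed_logins": "high", "root_shell": "no"},
}
intrusions = {"s1", "s2", "s4"}  # assumed ground truth for the example

lower, upper = approximations(
    sessions, ["failed_logins", "root_shell"], intrusions
)
print(sorted(lower))  # sessions that are certainly intrusions
print(sorted(upper))  # sessions that are possibly intrusions
```

Sessions s4 and s5 look identical on the chosen attributes but carry
different labels, so they fall in the upper approximation only. The gap
between the two sets is the region of uncertainty.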

It is also able to discover important facts hidden in the data and
express them in the natural language of decision rules. A powerful
method for characterizing complex multidimensional patterns, rough sets
has been successfully applied in knowledge acquisition, forecasting and
predictive modeling, and decision support.
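
To make the idea of decision rules concrete, here is a minimal sketch
that turns a small labeled table into IF-THEN rules. It keeps only the
attribute patterns that always lead to the same label, which is one
common way to read "certain" rules in rough set theory. The table, the
attribute names and the function certain_rules are hypothetical, and the
study's own rule-generation procedure may differ.

```python
from collections import defaultdict


def certain_rules(rows, condition_attrs, decision_attr):
    """Build IF-THEN rules for patterns that map to exactly one decision."""
    outcomes = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        outcomes[key].add(row[decision_attr])

    rules = []
    for key, decisions in sorted(outcomes.items()):
        if len(decisions) != 1:
            continue  # conflicting labels: no certain rule for this pattern
        conditions = " AND ".join(
            f"{attr} = {value}" for attr, value in zip(condition_attrs, key)
        )
        decision = next(iter(decisions))
        rules.append(f"IF {conditions} THEN {decision_attr} = {decision}")
    return rules


# Hypothetical toy table, not data from the study.
rows = [
    {"failed_logins": "high", "root_shell": "yes", "label": "intrusion"},
    {"failed_logins": "high", "root_shell": "yes", "label": "intrusion"},
    {"failed_logins": "low", "root_shell": "no", "label": "normal"},
    {"failed_logins": "high", "root_shell": "no", "label": "intrusion"},
    {"failed_logins": "high", "root_shell": "no", "label": "normal"},
]

rules = certain_rules(rows, ["failed_logins", "root_shell"], "label")
for rule in rules:
    print(rule)
```

The pattern (high, no) appears with both labels, so no rule is produced
for it.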

In their study, the team used data from the privileged program
sendmail, a program in use at virtually every Unix site that has email.
They write, “The data includes both normal and abnormal traces. The
normal trace is a trace of the sendmail daemon and several invocations
of the sendmail program. During the period of collecting these traces,
there are no intrusions or any suspicious activities happening. The
abnormal traces contain several traces including intrusions that exploit
well-known problems in Unix systems.”

The average classification accuracy rate for the three programs was as
follows: rough sets 75.68 percent accurate; neural nets 69.78 percent
accurate; and inductive learning 51.16 percent accurate.

In addition, the team found that training the programs on equal amounts
of normal and abnormal sequences leads to better learning and a more
accurate classification. Whether the data was represented as binaries or
as integers (neural nets cannot use both) did not significantly affect
the results.
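
The two ideas in this paragraph, balanced training data and alternative
data representations, can be sketched as follows. This is only an
illustration under stated assumptions: the article does not say how the
team balanced the classes or what binary encoding they used, so the
downsampling in balance and the one-hot encoding in to_binary are
assumptions, and the integer-coded sequences are made up.

```python
import random


def balance(normal, abnormal, seed=0):
    """Downsample the larger class so both classes have equal size."""
    rng = random.Random(seed)
    n = min(len(normal), len(abnormal))
    return rng.sample(normal, n), rng.sample(abnormal, n)


def to_binary(sequence, vocabulary_size):
    """One-hot encode a sequence of integer codes into a flat 0/1 list."""
    bits = []
    for code in sequence:
        one_hot = [0] * vocabulary_size
        one_hot[code] = 1
        bits.extend(one_hot)
    return bits


# Hypothetical integer-coded sequences (codes 0..3), not study data.
normal = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
abnormal = [[3, 3, 1], [0, 0, 2]]

bal_normal, bal_abnormal = balance(normal, abnormal)
print(len(bal_normal), len(bal_abnormal))

binary_example = to_binary(abnormal[0], vocabulary_size=4)
print(binary_example)
```

Downsampling discards data from the larger class. Oversampling the
smaller class is an alternative when abnormal traces are scarce.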


They conclude, “The tremendous growth in the Internet and electronic
commerce has created serious challenges to network security. Advances in
data mining and knowledge discovery provide new approaches to network
intrusion detection.”