Date: Wed, 18 Oct 1995 09:44:54 -0300
From: SAMSAM@VM1.YorkU.CA
To: Multiple recipients of list <inet-news@nstn.ca>
Subject: web SEARCH ENGINES - why Inktomi is so fast.
Parallel computing brings a faster and bigger search engine to
the Internet - UC Berkeley's Inktomi
Berkeley -- Two computer scientists at UC Berkeley have
introduced parallel computing to the Internet to create the
fastest and most comprehensive "engine" now available to search
the World Wide Web.
Called Inktomi, it searches a database of more than 1.3 million
documents on the World Wide Web, a network that reaches around
the world to provide ready access to words, pictures, sound and
video. Inktomi is the largest index of web documents.
Inktomi addresses one of the main problems of the web today: as
the number of documents on it skyrockets, it becomes a challenge
to index every one and time-consuming to search the index. While
Internet surfers happily skip from web site to web site in search
of "cool" links, the Internet's true potential will be felt only
when users can quickly search for and find desired sites.
"It's getting increasingly difficult to find things on the
Internet," says Eric Brewer, an assistant professor of computer
science at the University of California at Berkeley who developed
Inktomi with graduate student Paul Gauthier. "The problem is,
it's very hard to have a large database and get good performance.
With parallel computing you can have larger databases and high
performance. Because we use commodity workstations, we have a
much cheaper solution than anyone."
Parallel computing involves stringing many computers or
microprocessors together to work on a problem simultaneously, a
potentially faster and more powerful method than tackling the
problem with a single large computer.
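The idea can be illustrated with a minimal sketch. It is not Inktomi's actual design, which the release does not describe. It assumes the index is split into partitions that are all queried at the same time, with the partial results then merged. The toy corpus and the names build_partitions, search_partition and parallel_search are hypothetical, and threads stand in for separate workstations.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus: document id -> text.
DOCS = {
    1: "parallel computing on a network of workstations",
    2: "a trickster spider of the plains",
    3: "search the world wide web",
    4: "parallel search of a web index",
}

def build_partitions(docs, n_parts):
    """Split documents round-robin and build one inverted index per partition."""
    partitions = [dict() for _ in range(n_parts)]
    for i, (doc_id, text) in enumerate(sorted(docs.items())):
        index = partitions[i % n_parts]
        for word in set(text.split()):
            index.setdefault(word, set()).add(doc_id)
    return partitions

def search_partition(index, word):
    """Look the word up in a single partition's index."""
    return index.get(word, set())

def parallel_search(partitions, word):
    """Query every partition at once and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        words = [word] * len(partitions)
        partials = list(pool.map(search_partition, partitions, words))
    return sorted(set().union(*partials))

# Two partitions stand in for two machines (an assumption for illustration).
PARTITIONS = build_partitions(DOCS, n_parts=2)
```

Calling parallel_search(PARTITIONS, "parallel") consults both partitions and returns the ids of every matching document. On real hardware each partition would live on its own machine, so each machine scans only its share of the index.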
Inktomi (pronounced "ink to me") is the name of a mythological
trickster spider of the Plains Indians. The search engine can be
found at the web address http://inktomi.berkeley.edu. Brewer and
Gauthier announced the search engine this week, though it has
been up and running since August.
The UC Berkeley scientists are quick to distinguish their search
engine, which is a comprehensive index of documents on the web,
from directories such as the popular Yahoo, which is a select
list of web documents more akin to a table of contents.
Yahoo, started a year and a half ago by two Stanford University
graduate students, maintains addresses for perhaps 50,000 of the
most useful documents on the web.
"With Inktomi you can find a lot more things than with Yahoo, but
both are useful," Brewer says. "We're providing a more
comprehensive search engine for the web without sacrificing
speed."
Comparable search engines include Infoseek, which is as fast as
Inktomi but can accommodate only one-fifth as many documents, and
Lycos, which indexes slightly more than a million documents but
is significantly slower than Inktomi.
The new search engine is one of the first fruits of a
collaborative project at UC Berkeley to tie common desktop
computers or workstations - just your average PC - into a
powerful "network of workstations." Dubbed NOW, the project hopes
to combine inexpensive PCs into a parallel computer
with the capabilities of a supercomputer - at a fraction of the
cost.
Brewer emphasizes that parallel computing brings a unique power
to search engines of any kind, whether they are searching a
database of WWW addresses or a library catalogue. The major
advantage is "scalability": as the database grows, he merely
adds more inexpensive computers to maintain the system's quick
response.
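The scalability argument comes down to simple arithmetic, sketched below under the assumption that response time depends mainly on how many documents each machine has to search and that the index is split evenly. The function names are hypothetical. The figures of 1.3 million documents and four workstations come from this release; the doubled index size is only an illustration.

```python
def docs_per_machine(total_docs, machines):
    """Documents each machine must hold if the index is split evenly."""
    return -(-total_docs // machines)  # integer ceiling division

def machines_needed(total_docs, per_machine_capacity):
    """Smallest number of machines that keeps each one within its capacity."""
    return -(-total_docs // per_machine_capacity)

# Figures from the release: 1.3 million documents on four workstations.
current_load = docs_per_machine(1_300_000, 4)

# Hypothetical growth: if the index doubled, how many machines would keep
# the per-machine load (and so, by assumption, the response time) the same?
doubled = machines_needed(2_600_000, current_load)
```

Under these assumptions each of the four machines holds 325,000 documents, and doubling the index calls for doubling the machine count rather than buying a larger single computer.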
Gauthier and Brewer built Inktomi using four outdated Sun
workstations, and have designed it so that if three break down,
Inktomi continues to have access to the entire index although at
a reduced rate. This reliability is unmatched by search engines
that run on a single computer.
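The release does not say how this is achieved. One way to get the described behavior, sketched here purely as an assumption, is to keep every index partition readable by any machine and hand the partitions of failed machines to the survivors. The host names and the function assign_partitions are hypothetical.

```python
def assign_partitions(partitions, live_nodes):
    """Spread all index partitions round-robin over the nodes still running."""
    if not live_nodes:
        raise RuntimeError("no nodes left to serve the index")
    assignment = {node: [] for node in live_nodes}
    for i, part in enumerate(partitions):
        assignment[live_nodes[i % len(live_nodes)]].append(part)
    return assignment

PARTS = ["p0", "p1", "p2", "p3"]
NODES = ["sun1", "sun2", "sun3", "sun4"]  # hypothetical host names

# All four workstations up: one partition each.
healthy = assign_partitions(PARTS, NODES)

# Three workstations down: the survivor serves the whole index, more slowly.
degraded = assign_partitions(PARTS, ["sun2"])
```

In the degraded case the single remaining node is responsible for all four partitions, which matches the description of full access to the index at a reduced rate.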
To find and catalogue all the addresses, Brewer and Gauthier
developed a web crawler that periodically looks for new
addresses. Here too, parallel computing is important. Taking
advantage of 32 networked computers in Soda Hall, UC Berkeley's
computer science building, they assign the task of discovering
new web sites to several machines at a time, often while those
computers are being used by others.
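A crawler of this kind can be sketched as a breadth-first traversal in which each round of pages is fetched by several workers at once. This is an illustrative sketch, not the Berkeley crawler: the in-memory FAKE_WEB table stands in for the real web, fetch_links stands in for downloading a page, and threads stand in for the borrowed workstations. All of these names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": page address -> addresses it links to.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

def fetch_links(address):
    """Stand-in for downloading a page and extracting its links."""
    return FAKE_WEB.get(address, [])

def crawl(seeds, workers=3):
    """Breadth-first crawl; each round's pages are fetched by several workers."""
    seen = set(seeds)
    frontier = list(seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Fetch every page in the current round in parallel.
            link_lists = list(pool.map(fetch_links, frontier))
            frontier = []
            # Merge results in one place so no address is queued twice.
            for links in link_lists:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return sorted(seen)
```

Starting from crawl(["a"]), the sketch discovers every address reachable from the seed. Keeping the set of seen addresses in one place is a design choice that avoids fetching the same page twice when several workers find the same link.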
Access to Inktomi is supported by the NOW project in the Division
of Computer Science of the UC Berkeley College of Engineering.
###
Eric Brewer can be reached at (510) 642-8143, or
brewer@cs.berkeley.edu
Paul Gauthier can be reached at (510) 642-9435, or
gauthier@cs.berkeley.edu
>>UC NewsWire<<