End User's Corner - October 1996


Searching the Internet Part II

Subject Catalogs, Annotated Directories, and Subject Guides

Jack Solock, Special Librarian


October 1996


The Internet is about computerized information made readily available at fantastic speeds to people all over the world. It promises an incredible increase in the transmission of information through the passage of bytes from computer to computer. It's automated, and it's fast!

Ironically, one of the most difficult things about using the Internet for research is finding the information you need. Last month (http://scout.cs.wisc.edu/scout/toolkit/enduser/archive/1996/euc-9609.html) we discussed automated search indexes as one way of finding Internet information. However, as we will see, there are also Internet search guides that are manual. The subject directories and hierarchies people maintain are, for all their shortcomings, far more useful to users (especially new users) who are asking the question: "What can I find on the Internet about history, or economics, or women's studies, or medicine?" or any of hundreds of subjects.

Automated search indexes are poor at answering these questions because they provide little organization or structure to the results they spit back in response to a query. Search indexes give each result a relevance score, but they earn few points for organization and structure. The structure and organization of resources which have always helped traditional library users are available in other kinds of Internet search tools, called subject directories.

This month's column will be devoted to a discussion of subject directories. Subject directories are categorizations of Internet resources meant to be browsed, although most can also be searched. As we discussed last month, search indexes are collections of Internet links, built by "spider" programs that automatically deposit links in a searchable database. Subject directories, on the other hand, are produced and maintained by people, and resources are collected by either resource-owner submission or selection by librarians, editors, or subject specialists. Most of these directories contain search interfaces, but they are often more rudimentary than the ones discussed in last month's column, serving instead as a gateway to a subject hierarchy which the user can browse for information about a topic.

The main difference between subject directories and search indexes is the level of human intervention in the creation of the directory. Human intervention filters and classifies resources so that busy researchers can quickly find what is of use to them, rather than searching every page of hundreds of thousands of sites. These directories (except for the very largest ones) contain far fewer resources than search indexes. However, this can actually be advantageous to the user. There is much less "chaff" to cut through to obtain the "wheat."

As with all things human, each directory is unique, with its own set of advantages and disadvantages. Which one is best for you is a personal preference, but we will point out some of the better ones.

We will categorize subject directories by the amount of human intervention. The categories are subject catalogs, annotated directories, and subject guides.

A "subject catalog" is very much like a library subject card catalog. Users look in the catalog under the subject heading that they are interested in and find resources.

An "annotated directory" has resources listed in a subject hierarchy, but each resource is further analyzed by an editor, librarian, or subject specialist. It is then annotated to give the user a more detailed idea of what the resource is, and, in some cases, rated based on an established set of criteria.

A "subject guide" contains a still deeper level of human analysis, in that a person or persons (editors, librarians, or subject experts) have filtered resources in a single subject and created a guide (sometimes annotated) to that subject. Implicit in the notion of a guide is that its resources will be of high quality because of the amount of filtering and the level of expertise of its author. Having a set of these guides at one site would give users the highest level of filtering and analysis, and thus the highest quality resources.

Eight directories that fall into these categories are old Internet veterans, well established and respected. Looking at them categorized will help users decide which one to use. The directories, arranged by type, are:

Subject Catalogs

Yahoo:
http://www.yahoo.com/

Search help:
http://www.yahoo.com/docs/info/help.html

Features:
http://www.yahoo.com/docs/info/features.html

Bulletin Board for Libraries (BUBL) -- Universal Decimal Classification (UDC):
http://link.bubl.ac.uk/

BUBL -- alphabetical:
http://bubl.ac.uk/link/subjects/

BUBL search:
http://link.bubl.ac.uk/isc1

BUBL search help:
http://link.bubl.ac.uk/isc3

Galaxy:
http://www.einet.net/

Search help:
http://www.einet.net/howto.html#SEARCH

Galaxy information:
http://www.einet.net/about.html

Annotated Directories

McKinley's Magellan:
http://www.mckinley.com/

Search and ratings help:
http://www.mckinley.com/magellan/Info/advancedtips.html

Information about Magellan:
http://www.mckinley.com/feature.cgi?faq_bd

Lycos Top 5% (formerly Point):
http://point.lycos.com/categories/index.html

InterNIC Directory of Directories:
http://www.internic.net/ds/dsdirofdirs.html

Search help (Harvest):
http://ds2.internic.net/Harvest/brokers/queryhelp.html

Sample queries (Harvest):
http://ds2.internic.net/Harvest/brokers/dod/sample_queries.html

Search help (WAIS):
http://www.internic.net/ds/dsdirofdirs.html

Click on "Search help" for the WAIS search engine.

[Note: When last checked by the Internet Scout team, all of the above InterNIC Directory of Directories site URLs were no longer available.]

Subject Guides

Argus Clearinghouse (formerly University of Michigan Clearinghouse for Subject Oriented Guides):
http://www.clearinghouse.net/

Search help (very rudimentary):
http://www.clearinghouse.net/searchtips.html

Ratings Guide:
http://www.clearinghouse.net/ratings.html

Collection Development Policy:
http://www.clearinghouse.net/submit.html

World Wide Web Virtual Library Subject Catalog:
http://www.w3.org/pub/DataSources/bySubject/Overview.html

Category Subtree:
http://www.w3.org/pub/DataSources/bySubject/Overview2.html

The directories and their features are presented in Table 1, at the end of this column, for your convenience. We will not discuss the intricacies of using their search engines. Interested users should use last month's column as a guide. We will discuss certain features of these directories to help users analyze which ones are most applicable to them.

These features should help you to determine the amount of filtering and quality analysis that has taken place in each directory. Some of the features you should look for in a directory are:

Search capability:

Most, but not all, subject indexes have this, and it is crucial, especially with the larger, multi-hierarchy indexes. The reason for this is that each subject index uses different subject terms and arranges subject hierarchies differently. Hierarchies are, in all examples but one (BUBL), "home grown" and arbitrary. Which subject hierarchy a resource is listed under is also arbitrary (again, except for BUBL). You may or may not find the same resource under the same term or hierarchy in Yahoo and Galaxy, for example. In this case, you use a two-step process: first, search the index to find the subject terms your query is listed under, and then browse that category (or those categories) for more resources.

Site discrimination:

Does the directory choose what it thinks are quality resources, or does it take almost anything submitted and place it in the subject hierarchy?

Rating:

Are resources rated? Rating acts as a filter to alert users to whether the resource they are looking at is of high quality, and how good (in the opinion of the rater) it is. Ratings are of course subjective, and depend largely on the following.

Rating system:

What is it? Is it "good or bad," "four star," "35 points," etc.? You must know what the rating system is in order to determine what it means.

Rating criteria:

This answers the question "What qualities does 'four star' represent?" Most rated directories use similar criteria, but they should state those criteria clearly.

Who rates:

Since ratings are subjective, it is always helpful to know who is doing the rating. Most of the subject indexes that rate are fairly obscure when it comes to actually identifying who does the rating. The terms "editor" and "writer" are often used.

Site annotation:

The more site annotation available, the better, because it tells the user that someone has analyzed the site long enough to summarize its contents. The annotation should give the user a concise idea of site content before he or she connects to it. If an index discriminates in the sites it contains, and annotates those sites, the user has the benefit of a double filter, and thus has a better chance of finding quality resources.

Who annotates:

It is best to have subject specialists or trained information specialists analyze a site. However, the user must always make the final judgement of the site based on its content.
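The two-step process described under "Search capability" above can be sketched in a few lines of Python. This is only an illustration: the toy directory, its category names, and its resource titles are all made up, and the helper functions `find_categories` and `browse` are hypothetical, not part of any real directory's interface.

```python
# Toy subject directory: category path -> list of resource titles.
# Every category and title here is invented for illustration.
DIRECTORY = {
    "Social Science/History": ["Medieval Sourcebook", "Civil War Letters"],
    "Humanities/History/Medieval": ["Medieval Sourcebook", "Castle Archive"],
    "Science/Medicine": ["Anatomy Atlas"],
}


def find_categories(directory, query):
    """Step 1: search the directory and return the categories holding matches."""
    query = query.lower()
    return sorted(
        category
        for category, resources in directory.items()
        if any(query in title.lower() for title in resources)
    )


def browse(directory, categories):
    """Step 2: browse those categories to see everything else filed there."""
    return {category: directory[category] for category in categories}


categories = find_categories(DIRECTORY, "medieval")
for category, resources in browse(DIRECTORY, categories).items():
    print(category, "->", ", ".join(resources))
```

Note that the same resource turns up under two differently named categories, which is exactly why searching first saves time: browsing either category then surfaces related resources the search itself did not match.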

Analysis:

The best subject indexes are those with the most human intervention. They intervene in discriminating which sites they pick, rating those sites, and annotating them. However, in the case of the Argus Clearinghouse, the entire process of site selection for their subject guides has been given over to subject specialists, allowing for a level of site discrimination that (although guide quality varies from subject to subject) makes their guides the place to start when looking for subject-specific information.

McKinley's Magellan is the best annotated directory because of both the number of annotated sites, and the level of annotation of each site.

While Yahoo is the most comprehensive subject catalog, it takes almost anything submitted and puts it into a hierarchy that is difficult to navigate without prior searching. It straddles the line between subject directory and search index, and many people use it both ways. A better, although much less comprehensive, subject catalog is the Bulletin Board for Libraries (BUBL). Its producers provide the catalog in both Universal Decimal Classification and alphabetic subject format. Its selectors are librarians, and while this does not guarantee excellence, it does guarantee that people whose job it is to select and categorize information are doing that job.

You may not agree with these picks, or may feel there are better subject directories on the Internet than the ones discussed here. The point is to find the directory that is best for you, that consistently provides you with the best resources, and then use it. This quick comparison will show you that these directories, because they have different strengths, can be used in combination to provide better results. Yahoo, Galaxy, and the InterNIC Directory of Directories contain lots of resources but little filtering. Magellan and the Lycos Top 5% give high ratings to very different kinds of resources. Argus Clearinghouse and W3C Virtual Library produce entire guides on single subjects. The important thing is to know what you're looking at when you look at a subject directory.

As with search indexes, subject directories have inherent problems. The above-mentioned problem of arbitrary and uncontrolled hierarchies is the biggest. It is sometimes difficult to determine who puts resources where in the subject hierarchy--the resource submitters or the owners of the directory.

Selecting or not selecting a resource, rating it, and annotating it are very subjective processes. The fact that Magellan gives a site 28 points out of 30 ("four stars") does not guarantee that the site is a quality site for every user. That determination must be made by the user.
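Because rating systems use different scales (Table 1 lists 30 points for Magellan and 50 for Lycos), a raw score means little until you know its maximum. A minimal sketch of putting two scores on a common footing follows. The `normalized` helper is hypothetical, and the Lycos score of 40 is an invented example, not a real rating; only the 28-of-30 figure comes from the text.

```python
def normalized(points, scale_max):
    """Express a rating as a fraction of the directory's maximum score."""
    if not 0 <= points <= scale_max:
        raise ValueError("points must lie between 0 and scale_max")
    return points / scale_max


magellan = normalized(28, 30)  # the 28-of-30 example from the text
lycos = normalized(40, 50)     # hypothetical score on a 50-point scale

print(f"Magellan: {magellan:.0%} of maximum")
print(f"Lycos:    {lycos:.0%} of maximum")
```

Even on a common scale, the two numbers reflect different criteria and different raters, so they remain opinions to weigh rather than measurements to compare.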

However, the fact that resources have been categorized, and in some cases selected, rated, and annotated, means that users are likely to find more quality resources in these directories than by searching an automated index. Which directory contains the most quality resources? Which contains the highest quality resources? That is for the user to determine. Users must determine quality much more on the Internet than in other avenues of publication because the filters that have long existed in those avenues do not exist at this time on the Internet. This can be good and bad. It is good in the sense that the Internet can be a publishing avenue for information that normally wouldn't make it through publishing filters. It is bad in the sense that those publishing filters have long been perceived as quality filters as well. The Internet has been criticized for having a lower quality of information. How does the user determine the quality of an information resource in a networked environment? We turn next to that question.

For more information on subject directories, see the Scout Toolkit: http://scout.cs.wisc.edu/scout/toolkit/searching/

Table 1. A comparison of filtering features for eight Internet subject directories.

Directory | Search | Site Discrimination | Site Ratings | Ratings System | Rating Criteria | Who Rates | Site Annotation | Who Annotates

SUBJECT CATALOGS
Yahoo! | Y | N | Y | Glasses Icon | Presentation/Content | Editors | Y (Brief) | Submitters
BUBL | Y | Y | N | N/A | N/A | N/A | Y | Librarians
Galaxy | Y | N | N | N | N/A | N/A | N/A | N/A

ANNOTATED DIRECTORIES
Magellan | Y | Y | Y | 30 pts. | Content Depth; Organization; Net Appeal | Editors | Y | Editors
Lycos | N | Y | Y | 50 pts. | Content; Presentation; Experience | Editors | Y | Editors
InterNIC Directory of Directories | Y | N | N | N/A | N/A | N/A | Y | Submitters

SUBJECT GUIDES
Argus Clearinghouse | Y (Info pages, not Guides) | Y | Y | 1 - 5 check marks | Level of resource description; Level of resource evolution; Guide design; Guide organization; Guide meta-info | Editors | Varies by Guide | Guide maintainer
W3C Virtual Library | N | Y | N | N/A | N/A | N/A | Varies by Guide | Guide maintainer


Key:

Y=Yes

N=No

N/A=Not Applicable

Directories that list "Y" under annotation do not necessarily annotate every site in the directory.

Note that the Argus Clearinghouse rating system rates the guides, not the individual resources within the guides.
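For readers who want to use Table 1 as a checklist, the sketch below encodes its first three Yes/No columns (Search, Site Discrimination, Site Ratings) as data and picks out the directories that meet a chosen combination of features. The `Directory` class and `select` function are hypothetical names introduced for illustration. Qualifiers from the table, such as Argus Clearinghouse's "Info pages, not Guides," are dropped, and the annotation columns are left out because some of their entries are not plain Yes/No values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Directory:
    name: str
    kind: str
    searchable: bool      # "Search" column
    discriminates: bool   # "Site Discrimination" column
    rates: bool           # "Site Ratings" column


# Yes/No values transcribed from Table 1; qualifiers are omitted.
TABLE_1 = [
    Directory("Yahoo!", "subject catalog", True, False, True),
    Directory("BUBL", "subject catalog", True, True, False),
    Directory("Galaxy", "subject catalog", True, False, False),
    Directory("Magellan", "annotated directory", True, True, True),
    Directory("Lycos", "annotated directory", False, True, True),
    Directory("InterNIC Directory of Directories", "annotated directory",
              True, False, False),
    Directory("Argus Clearinghouse", "subject guide", True, True, True),
    Directory("W3C Virtual Library", "subject guide", False, True, False),
]


def select(directories, **required):
    """Return the names of directories whose features match every requirement."""
    return [
        d.name
        for d in directories
        if all(getattr(d, feature) == value for feature, value in required.items())
    ]


# Directories that both choose their sites and rate them.
print(select(TABLE_1, discriminates=True, rates=True))
```

Changing the keyword arguments (for example, `searchable=True, discriminates=True`) gives a different shortlist, which is one way to see how the directories can be combined according to their strengths.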


InterNIC News

This article originally appeared as part of the End User's Corner, a featured column of InterNIC News, which was published monthly by Network Solutions, Inc. and InterNIC from May 1996 through March 1998. As of April 1998, End User's Corner will be published by the Internet Scout Project.


Copyright Susan Calcari and the University of Wisconsin Board of Regents, 1994-1998. Permission is granted to make and distribute verbatim copies of the End User's Corner provided the copyright notice and this paragraph is preserved on all copies. The Internet Scout Project provides information about the Internet to the US research and education community under a grant from the National Science Foundation, number NCR-9712140. The Government has certain rights in this material.

Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the University of Wisconsin - Madison or the National Science Foundation.


Internet Scout

A Publication of the Internet Scout Project


© 1996 Internet Scout Project
