End User's Corner - January 1997

FTP or not FTP? That is the Question

How to Drive the Information Highway With an Eighteen Wheeler

Jack Solock, Special Librarian

January 1997

When we do the Scout Report (http://scout.cs.wisc.edu/scout/report/), we like to think of ourselves as guides who let users start up their information vehicles and ride down the highway, perhaps stopping here and there to load small items into their trunks if they feel the need. The web is like that. It is a very pretty road to travel, with many beautiful and useful sites to see. Traveling it is like taking a nice Sunday drive (when the traffic isn't too terrible), stopping the car here and there to enjoy the vistas, and maybe even picking up a souvenir now and then.

Before I came to InterNIC, I was a librarian in a special library at the University of Wisconsin. My job there was not to be an information guide but an information hauler. The professors I worked for appreciated the occasional tours I gave them, but they appreciated it much more when I backed my information eighteen wheeler up to their loading dock and unloaded a truckful of information that they could process into new knowledge.

The Internet is about two things: communication and the sharing of information. While one can get information from the web, its main function, it could be argued, is communication. Another Internet access method, FTP (File Transfer Protocol), is much more effective for industrial-strength information sharing. It lets you trade in your nice Sunday car for an eighteen wheeler, a fifty-car freight train, or even a supertanker. It is the best way to obtain large amounts of information from the Internet quickly, and one of the Net's great drawbacks is that information providers don't realize how much more useful their sites could be if they simply provided FTP access as well as web access.

This column will be a tour of FTP rather than a tutorial, although we will show the basic steps for obtaining information with this access method. By taking you to just a few sites, we will demonstrate how you can use FTP to take full advantage of Internet resources. And with just a little practice, you will be able to tell your friends that you not only "surf the web" but also know how to drive an eighteen wheeler.

Before the tour, it is important to point out the key difference between FTP and web access: FTP lets you download multiple files with a single command (mget, for "multiple get"), while a web browser retrieves them one at a time. There are other differences as well, but this is the most important one for users.

First, if you need to learn how to use anonymous FTP (a type of FTP that allows any user to access public FTP archives), the best place to start is the FTP FAQ (Frequently Asked Questions) in the Usenet FAQ archives at the Massachusetts Institute of Technology (MIT) (ftp://rtfm.mit.edu/pub/usenet/news.answers/ftp-list/faq). If you already have an FTP client, now is as good a time as any to use it:

ftp rtfm.mit.edu
login: anonymous
password: your email address
cd pub/usenet/news.answers/ftp-list/
get faq

(cd means "change directory.")

(Note that in this and all the following examples, directories are separated by /, and depending on your client, you may have to change into each directory one at a time.)
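If you would rather script the session above than type it, Python's standard ftplib module can perform the same steps. The following is only a minimal sketch, not something from the original column: the function name anonymous_get is hypothetical, and the function expects an ftplib.FTP object that is already connected, so it can be tried against a stand-in object without reaching any server. Following the note above, it changes directories one component at a time.

```python
import ftplib  # needed only to open a real connection: ftplib.FTP(host)
import os


def anonymous_get(ftp, email, directory, filename, dest_dir="."):
    """Log in anonymously, change into `directory`, and fetch `filename`.

    `ftp` is a connected ftplib.FTP object (or any object with the same
    login, cwd and retrbinary methods). Returns the path of the local copy.
    """
    # login: anonymous / password: your email address
    ftp.login(user="anonymous", passwd=email)
    # cd one directory at a time, for servers or clients that will not
    # accept a multi-level path in a single step.
    for part in directory.strip("/").split("/"):
        if part:
            ftp.cwd(part)
    local_path = os.path.join(dest_dir, filename)
    # get faq: RETR streams the file; each chunk is written to disk unchanged.
    with open(local_path, "wb") as out:
        ftp.retrbinary("RETR " + filename, out.write)
    return local_path
```

Against a live server, the call might look like anonymous_get(ftplib.FTP("rtfm.mit.edu"), "you@example.com", "pub/usenet/news.answers/ftp-list", "faq"), where you@example.com stands in for your own email address. Hosts and paths from 1997 may no longer exist.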

If you don't have an FTP client, you can obtain one from many places; one of the most useful is the FTP section of the PBS (Public Broadcasting Service) Beginner's Guide to the Internet (http://www.pbs.org/uti/guide/ftp.html). There you will find not only information about FTP but also links to FTP programs for Windows® and Macintosh®, and to the file decompression software you may need. If you are new to FTP, it is very important to get this information before you continue.

Now, let's take a look at a well-maintained FTP archive, both as a model of good FTP maintenance and to see the advantages of the FTP access method.

The 15 Minute Series (http://rs.internic.net/nic-support/15min)

This is the InterNIC Information and Education Services' set of materials for Internet trainers. If you access the 15 Minute Series through the web, you can do many interesting things, such as search or browse the materials, or even download each set in HTML or PowerPoint format. The site is also useful in that it provides exhaustive instructions about decompressing and using the materials. However, if you were interested in downloading all the materials in the Index and Search Services section, for example, FTP would be a much more effective way to do it.

ftp rs.internic.net
login: anonymous
password: your email address
cd NIC-support/15min

At this point, if you didn't know where the Index and Search materials were, you would download (or view, if your FTP client can display files) the files called "table-of-contents.txt" and "instructions.txt":

get table-of-contents.txt
get instructions.txt

If your client can't display files, you must download these index and help files and read them first to decide which files you want. Admittedly, this is cumbersome, but driving an eighteen wheeler is sometimes cumbersome. Remember that the advantage of FTP is the amount of data you can obtain.
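A client that can display files saves a trip to your hard disk. As a sketch of the same idea in Python (an illustration, not part of the original column), ftplib's retrlines method hands a text file to your program one line at a time, so an index file can be read, or searched, without saving it. The names view_text and grep_remote are hypothetical, and ftp is assumed to be a connected, logged-in ftplib.FTP object already in the NIC-support/15min directory.

```python
def view_text(ftp, filename):
    """Return the lines of a remote text file without saving it to disk.

    ftplib's retrlines() transfers in ASCII mode and calls the callback
    once per line, with the line ending already removed.
    """
    lines = []
    ftp.retrlines("RETR " + filename, lines.append)
    return lines


def grep_remote(ftp, filename, keyword):
    """Return only the lines of a remote text file that mention `keyword`,
    ignoring case."""
    keyword = keyword.lower()
    return [line for line in view_text(ftp, filename) if keyword in line.lower()]
```

For example, grep_remote(ftp, "table-of-contents.txt", "index-search") would return just the lines of the table of contents that mention the index-search materials, assuming the file refers to them by that name.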

In this case, InterNIC has provided the information you need about where to find the index-search materials, as well as instructions on how to tell the HTML files from the PowerPoint files. This is good FTP netiquette: any effective FTP archive will have some sort of table of contents or instruction file that identifies the files in the archive and explains how to use them. If you viewed the two files in your client, you can simply change to the index-search directory. If not, you might need to open another FTP session to get the files.

ftp rs.internic.net
login: anonymous
password: your email address
cd NIC-support/15min/index-search

Here we see some text files (instructional in nature), some .tar.gz files (HTML materials), and some .zip files (PowerPoint materials). Now, if you want to download, say, all five searching modules in PowerPoint format:

binary
prompt
mget *.zip (for PowerPoint materials)

Note that you first told the client to transfer files in binary mode (binary) and turned the prompt toggle off (prompt) so that you are not asked to confirm each file before it is downloaded. Then, using the mget (multiple get) command, you got all the files at once. The * wildcard matches any sequence of characters, so *.zip tells the client to get every file with a .zip extension. With the proper decompression software, you now have access to all the files. It is crucial to download binary files in binary mode; otherwise the files will be useless.
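The binary, prompt, and mget steps can be imitated in a few lines of Python. This is a sketch under stated assumptions rather than how any particular FTP client implements mget: it asks the server for a plain list of names (the NLST command), matches the wildcard on your own machine with the standard fnmatch module, and downloads each match. The function name mget is hypothetical, and ftp is assumed to be a connected, logged-in ftplib.FTP object already in the index-search directory. ftplib's retrbinary switches the connection to binary mode (TYPE I) before each transfer, and since nothing asks for confirmation, there is no prompt to turn off.

```python
import fnmatch
import os


def mget(ftp, pattern, dest_dir="."):
    """Download every file in the current remote directory matching `pattern`.

    The wildcard is matched locally against the server's NLST listing;
    fnmatchcase keeps the match case-sensitive, like Unix file names.
    Returns the local paths written, in sorted order.
    """
    names = [n for n in ftp.nlst() if fnmatch.fnmatchcase(n, pattern)]
    saved = []
    for name in sorted(names):
        # Some servers return names with a directory prefix; keep the last part.
        local_path = os.path.join(dest_dir, os.path.basename(name))
        with open(local_path, "wb") as out:
            # retrbinary sends TYPE I first, so every file moves in binary mode.
            ftp.retrbinary("RETR " + name, out.write)
        saved.append(local_path)
    return saved
```

With such a connection, mget(ftp, "*.zip") would fetch the PowerPoint modules into the current directory, and mget(ftp, "*.tar.gz") the HTML versions.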

You have left your car and are now driving a small truck. You could have done all this through a web browser (using the format ftp://...), and the browser would even recognize the binary format when downloading, but you could download only one file at a time.

The advantages of FTP become clear when you decide you want to download the entire 15 Minute Series (31 modules at present):

ftp rs.internic.net
login: anonymous
password: your email address
cd NIC-support/15min/modules
binary
prompt
mget *.zip (PowerPoint) OR mget *.tar.gz (HTML)

Sit back and have a cup of coffee while FTP loads your truck with the 15 Minute Series.

The above is a case where the web was a good place to find out what the 15 Minute Series is about, how it is organized, and what a module looks like. Once you have seen that, use FTP to get the goods.
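Once the truck is unloaded, the archives still need "the proper decompression software." As one possible sketch (not from the original column), Python's standard zipfile and tarfile modules can unpack a directory of downloaded .zip (PowerPoint) and .tar.gz (HTML) modules. The function name unpack_all and the one-folder-per-archive layout are assumptions for illustration. Only unpack archives from sources you trust; where the installed Python supports it, the sketch asks tarfile for its "data" extraction filter, which refuses unsafe paths.

```python
import os
import tarfile
import zipfile


def unpack_all(archive_dir, out_dir):
    """Unpack every .zip and .tar.gz file in `archive_dir` into `out_dir`.

    Each archive is extracted into its own subdirectory, named after the
    archive without its extension. Other files are skipped. Returns the
    list of subdirectories written to.
    """
    created = []
    for name in sorted(os.listdir(archive_dir)):
        path = os.path.join(archive_dir, name)
        if name.endswith(".zip"):
            target = os.path.join(out_dir, name[: -len(".zip")])
            with zipfile.ZipFile(path) as zf:
                zf.extractall(target)
        elif name.endswith(".tar.gz"):
            target = os.path.join(out_dir, name[: -len(".tar.gz")])
            with tarfile.open(path, "r:gz") as tf:
                if hasattr(tarfile, "data_filter"):
                    # Newer Pythons: reject absolute paths, "..", and similar.
                    tf.extractall(target, filter="data")
                else:
                    tf.extractall(target)
        else:
            continue  # text files and anything else are left alone
        created.append(target)
    return created
```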

Now for a few examples of how FTP can help you load information goods into your eighteen wheeler. We will look at both the web and FTP sites of these information repositories to see how you can use both in your information mining.

  1. USDA (United States Department of Agriculture) Economics and Statistics System. If you are interested in U.S. dairy statistics, you can go to the web site (http://www.mannlib.cornell.edu/usda/) and do a title lookup on "dairy." There you will find the Dairy Yearbook, a compendium of over 100 time series. The series are available in Lotus spreadsheet format, and with the proper helper applications installed, you could view them one at a time.

    Via FTP, you can download the entire Dairy Yearbook with one command.

    ftp usda.mannlib.cornell.edu
    login: anonymous
    password: your email address
    cd usda/data-sets/livestock/89032
    binary
    prompt
    mget *.wk1

    The problem in this case is that the web site does not point you directly to FTP access or explain the directory structure. If you didn't already know that FTP access was available, you might never find it. You would have to send a help message to the Albert R. Mann Library (Cornell University) to find out about the FTP site, or visit the gopher archive (gopher://usda.mannlib.cornell.edu), where FTP access information is available. This, of course, is an information maintenance problem, not an FTP problem. But it is a common one.

  2. Social Sciences Oriented Subject Bibliographies (http://coombs.anu.edu.au/CoombswebPages/BiblioClear.html). This is a repository of over 160 social science bibliographies maintained at the Australian National University. If you are interested in bibliographies related to Chinese studies, you will find 10 here. On the web, you could read, print, or download them one at a time.

    Via FTP you can download five of them at once (the FTP archive is not as complete as the web archive). This, again, is a problem with archive maintenance, not with FTP.

    ftp coombs.anu.edu.au
    login: anonymous
    password: your email address
    cd coombspapers/subj-bibl-clearinghouse/
    mget chin*.*

    Since these are text files, no binary command is necessary.

  3. U.S. Bureau of Labor Statistics Selective Data Access (http://stats.bls.gov:80/sahome.html)

    This is a marvelous site where you can retrieve selected data by filling out a query form. If you are interested in Local Area Unemployment Statistics, you can find them here.

    Via FTP:

    ftp stats.bls.gov
    login: anonymous
    password: your email address
    cd pub/time.series/la
    get la.area
    get la.area.type
    get la.contacts
    get la.doc
    get la.measure
    get la.period
    get la.series
    get la.data.58.Wisconsin

    In this case you get all the monthly information available about Wisconsin (a 1.6 megabyte file), along with all the documentation explaining the file. Through FTP, BLS has provided the entire Local Area Unemployment dataset, which you could then analyze with other statistical programs. The download above would fit in an eighteen wheeler, but if you wanted to, you could fill a supertanker with Local Area Unemployment information about all the states.
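The Dairy Yearbook and Coombs examples differ in one important way: Lotus .wk1 spreadsheets are binary files, while the bibliographies are plain text. In Python's ftplib the two transfer modes correspond to two methods, retrbinary (binary, TYPE I) and retrlines (ASCII, TYPE A). The sketch below uses a hypothetical fetch function that lets the caller choose; it is an illustration under those assumptions, not a prescription, and ftp is again assumed to be a connected, logged-in ftplib.FTP object.

```python
import os


def fetch(ftp, name, dest_dir=".", text=False):
    """Download one remote file in ASCII (text=True) or binary mode.

    ASCII mode suits plain-text files such as bibliographies; binary mode
    copies bytes unchanged, which spreadsheets and archives require.
    Returns the path of the local copy.
    """
    local_path = os.path.join(dest_dir, name)
    if text:
        # Write with the encoding ftplib used to decode the lines
        # (ftplib.FTP defaults to UTF-8 on Python 3.9 and later).
        encoding = getattr(ftp, "encoding", "utf-8")
        with open(local_path, "w", encoding=encoding) as out:
            # retrlines() strips each line ending; "\n" written in text
            # mode becomes this machine's own line ending.
            ftp.retrlines("RETR " + name, lambda line: out.write(line + "\n"))
    else:
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
    return local_path
```

For the Dairy Yearbook you would keep the default text=False; for the Chinese studies bibliographies, text=True plays the role of the column's advice that no binary command is needed. A list of named files, such as the BLS files above, could be fetched the same way in a loop.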
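Once a file like la.data.58.Wisconsin is on your disk, a few lines of Python can get it ready for a statistical program. This sketch rests on assumptions the column does not confirm: that the data file is a tab-delimited table whose first line names its columns, and that those columns include ones called series_id and value. The function names and those column names are hypothetical placeholders; the la.doc file downloaded alongside the data is the place to check the actual layout.

```python
import csv


def load_table(path, delimiter="\t"):
    """Read a delimited text file whose first line holds column names.

    Returns a list of dicts, one per data row. Spaces around every header
    and field are stripped, since padded columns are common in such files.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = [h.strip() for h in next(reader)]
        return [
            dict(zip(header, (field.strip() for field in row)))
            for row in reader
            if row  # skip blank lines
        ]


def series_values(rows, series_id, id_col="series_id", value_col="value"):
    """Collect one series' values as floats, in file order."""
    return [float(r[value_col]) for r in rows if r.get(id_col) == series_id]
```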

Note that in all the examples above except the 15 Minute Series, it was difficult to match the web site to the corresponding FTP archive. If you want to use FTP to download lots of information, you should expect this, and expect to contact the information maintainers or your librarian for help. Driving an eighteen wheeler is often more difficult than driving your car, and it will remain so until information maintainers realize that easy FTP access is as important as easy web access. Unfortunately, this is not a widely held principle in the Internet community.

That said, if you really want to exploit the resources of the Internet, a working knowledge of FTP is required. And by the way, webmasters use it extensively to set up the pretty web scenery we all enjoy. That is called uploading files via FTP, but it is a topic for another time.

InterNIC News

This article originally appeared as part of the End User's Corner, a featured column of InterNIC News, which was published monthly by Network Solutions, Inc. and InterNIC from May 1996 through March 1998. As of April 1998, End User's Corner will be published by the Internet Scout Project.

Copyright Susan Calcari and the University of Wisconsin Board of Regents, 1994-1998. Permission is granted to make and distribute verbatim copies of the End User's Corner provided the copyright notice and this paragraph is preserved on all copies. The Internet Scout Project provides information about the Internet to the US research and education community under a grant from the National Science Foundation, number NCR-9712140. The Government has certain rights in this material.

Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the University of Wisconsin - Madison or the National Science Foundation.

Internet Scout

A Publication of the Internet Scout Project


© 1997 Internet Scout Project
