Internet Power Searching:
Finding Pearls in a Zillion Grains of Sand
by Amelia Kassel
During the past two years, web content has expanded enormously. Global access to hundreds of government resources and agencies worldwide, more than 1,400 Internet-based online public access catalogs (OPACs) from libraries on every continent [1], professional and trade associations, and experts in millions of subjects are just a few examples of categories of information not readily found online in the past. As the Internet erupted, search engines, metasearch engines, and intelligent agents with value-added features came on the scene and gradually began to refine their offerings, turning information retrieval into a more organized process than ever before.
Traditional vendors used by professional searchers also became accessible on the web. For example, The Dialog Corporation, Dow Jones Interactive, LEXIS-NEXIS, OCLC FirstSearch, Ovid, SilverPlatter, and STN all now provide web-based database searching [2]. In addition, a 1997 survey of database producers on the web found remarkable progress [3]. Of fifty-four leading databases from thirty-eight database producers, thirty-five searchable databases were either on the web or had been announced. Added to these, new entrepreneurial publishers, also called niche market research boutiques, entered the market.
This incredible growth has made the Internet the major research tool of the late twentieth century, although not without some serious shortcomings. Unfortunately, much time can be spent, and wasted, when searching without knowing the tricks of the trade. Furthermore, the search engines are constantly changing, growing, and improving in their quality and capabilities for locating needed information. As a result, library and information professionals must learn new skills and incorporate them into their daily activities. The technology has come a long way but still has a long way to go, and improvements are on the horizon.
Nevertheless, a major challenge for information professionals is knowing how to find what's needed.
Search Engine Size
An April 1998 article in Science measured the size of the Internet and reported 320 million pages at that time [4]. This figure has grown in recent months to more than 380 million pages, plus hundreds of databases. Nevertheless, one of the search engines, HotBot, has estimated that only 200 million pages are searchable within its system. These numbers, along with other information about search engine coverage, indicate that a large proportion of the web is not reachable at all through search engines. According to Danny Sullivan (http://searchenginewatch.com), there are both technical and physical reasons that search engine coverage is incomplete. Some of the reasons are:
Since so many web sites cannot be reached, it is important for researchers to amass knowledge about a range of resources useful for uncovering information not found by search engines, as well as to learn how to use search engines for a range of requests.
Focus on Big
The new Internet economy has brought about the development of competing search engine companies, each with its own proprietary software. Sites are collected and updated differently. For the same search, one search engine may provide exactly what's required within the first ten hits whereas another is useless. Frequently, there is tremendous overlap, although no two search engines are exactly alike. Since the outcome varies from search engine to search engine, researchers often find it necessary to run the same question through several search engines to get either the best or the most comprehensive results. The larger the index compiled by a search engine, the greater the chance of finding obscure material. Spiders or crawlers constantly visit sites to create searchable catalogs or indexes of web pages. Results are sorted or ranked by relevancy based on each engine's proprietary algorithms.
Although dozens of search engines now exist, the focus here is on those that are big. One of the major search engines is AltaVista (http://www.altavista.com). It began operation in 1995 and is one of the largest. It remained unchallenged until September 1997, when HotBot (http://www.hotbot.com) began to compete and surpassed it in number of pages indexed at that time. Other search engines of note are Excite (http://www.excite.com) and Northern Light (http://www.northernlight.com). In fact, early this year, Greg R. Notess (http://www.notess.com/search) suggested that Northern Light now ranks first, followed by AltaVista and HotBot. Another very well known and useful site is Yahoo (http://www.yahoo.com), the oldest web directory, with some 750,000 sites. It is based on user submissions and staff selections. All of the search engines mentioned here, plus Yahoo, have expanded and improved, whereas others have tapered off in size or completely disappeared. Some key features of the largest search engines follow.
AltaVista (http://www.altavista.com)
Excite (http://www.excite.com)
HotBot (http://www.hotbot.com)
Northern Light (http://www.northernlight.com)
Yahoo (http://www.yahoo.com)
DejaNews (http://www.dejanews.com) and Reference.com (http://www.reference.com)
Where to Start
Where and how to search depends on research goals and needs. Indeed, whether to use the Internet or a traditional database is often the first decision, and whether to use a narrow or broad strategy is another consideration. Fundamentally, it's necessary to become familiar with several major search engines and select the right one for the job. Much Internet research involves trial and error, and serendipity too. Nonetheless, self-education is necessary, and preparing for Internet research involves visiting major search engine sites to review how each works. The more that is known about a particular search engine, the better prepared the searcher will be to decide which is appropriate for each request. Each search engine provides detailed instructions about basic or simple searches and how to use more advanced or power searching techniques. Before searching, it's important to plan the search by considering unique words, phrases, and synonyms that describe the topic. Once a search is conducted, a review of results can lead to reformulating the search when what you are looking for is not found. If you find yourself spending too much time at one site, move on to the next search engine. Results often improve when you take a search elsewhere.
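The planning step described above (listing unique words, phrases, and synonyms before searching) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the AND/OR operators, parentheses, and quoted phrases are an assumed generic Boolean syntax. Each search engine documents its own rules, so check its help pages before pasting a query in.

```python
def quote(term: str) -> str:
    """Wrap multi-word phrases in double quotes so they are searched as a unit."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def build_query(concepts: list[list[str]]) -> str:
    """Join synonyms with OR inside each concept, and join concepts with AND.

    The operators are an assumed generic Boolean syntax, not any one engine's.
    """
    groups = []
    for synonyms in concepts:
        terms = [quote(t) for t in synonyms if t.strip()]
        if not terms:
            continue  # skip concepts that have no usable terms
        group = " OR ".join(terms)
        groups.append(f"({group})" if len(terms) > 1 else group)
    return " AND ".join(groups)


# One inner list per concept; each list holds that concept's synonyms.
query = build_query([
    ["restaurant", "food service"],
    ["market share", "market size"],
    ["statistics"],
])
print(query)
```

Writing the concepts down this way also makes reformulation easier: when a search comes back empty, add a synonym to one group or drop a whole group, then rebuild the query.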
Search Engine Basic Hints & Tips
Search Engine Advanced Hints & Tips
One of the best ways to refine searches is with power features such as field searching. Ran Hock explains that "fortunately, some web search engines do provide at least a rudimentary field search capability, but because of the immature nature of the engines, the options are neither very numerous nor particularly sophisticated." AltaVista allows date, title, URL, and language searching, plus a half-dozen other fields, all related to the types of features included on the page, such as image and sound files. HotBot, similarly, provides date, title, and URL searching. In addition, it lets a user search for records that contain a sound or video file, search by page depth or by the words included in hypertext links, and search for the presence of a variety of scripting languages and plug-ins. For a detailed discussion on this subject, see Hock's article "How to Do Field Searching in Web Search Engines: A Field Trip" [5].
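To see why field searching narrows results, here is a minimal Python sketch that matches a term against a single field of a page record instead of the whole record. The Page record, its three fields, and the sample text are hypothetical stand-ins chosen for illustration; this is not how AltaVista or HotBot implement the feature.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A hypothetical indexed page with three searchable fields."""
    url: str
    title: str
    body: str


FIELDS = ("url", "title", "body")


def field_search(pages: list[Page], field: str, term: str) -> list[Page]:
    """Return pages whose chosen field contains the term (case-insensitive)."""
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field}")
    needle = term.lower()
    return [p for p in pages if needle in getattr(p, field).lower()]


pages = [
    Page("http://www.census.gov", "U.S. Census Bureau",
         "Population statistics for the United States."),
    Page("http://www.restaurant.org", "National Restaurant Association",
         "Industry research, including census figures for restaurants."),
]

# A title search finds only the page that is about the census,
# while a body search finds the page that merely mentions it.
print([p.url for p in field_search(pages, "title", "census")])
print([p.url for p in field_search(pages, "body", "census")])
```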
Metasearch Engines
Metasearch engines are web sites that send a search to several search engines at once. Often, only a limited number of hits is taken from each search engine, and these are then blended into a single page of results. Some well-known metasearch engines are described below.
Dogpile (http://www.dogpile.com)
Dogpile integrates many search engines as well as other types of sources and sorts the results by search engine. Included in the search are 1) Search engines: Yahoo!, Lycos' A2Z, Excite Guide, GoTo.com, PlanetSearch, Thunderstone, What U Seek, Magellan, Lycos, WebCrawler, InfoSeek, Excite, and AltaVista; 2) Usenet: Reference.com, DejaNews, AltaVista, and DejaNews' old database; 3) More than two dozen online news services or other types of sources.
Internet Sleuth (http://www.isleuth.com)
Internet Sleuth is a 3,000-strong collection of specialized online databases, which can also simultaneously search up to six other search sites for web pages, news, and other types of information. It's excellent for highly specialized searches on any subject in its detailed directory.
Links popular Net search engines and allows you to specify categories like business, computers, education, sports, etc.
MetaCrawler (http://www.metacrawler.com)
A powerful metasearch engine that searches several popular search engines and sorts the results. It is excellent for getting a quick hit of what's out there. But if you don't see what you want in the results, its limited search options make it tough to issue really precise queries.
ProFusion (http://www.profusion.com)
Lets you select which search engines to search, including AltaVista, InfoSeek, Lycos, Excite, WebCrawler, and others. Filters results to remove duplicates and broken links.
SavvySearch (http://www.savvysearch.com)
Searches multiple Internet search engines, web directories such as Yahoo or Magellan, Usenet, and other sources via just one query and then returns the linked results.
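The blending step that metasearch engines perform can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the method of any service listed above: the engine names and result lists are made up, the round-robin interleaving is one plausible ordering among many, and the URL normalization is deliberately simple.

```python
from itertools import zip_longest
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Build a comparison key: lowercase host, no trailing slash, query kept."""
    parts = urlsplit(url.strip())
    key = parts.netloc.lower() + parts.path.rstrip("/")
    return f"{key}?{parts.query}" if parts.query else key


def blend(results_by_engine: dict[str, list[str]],
          per_engine: int = 10) -> list[tuple[str, list[str]]]:
    """Interleave the top hits from each engine and merge duplicate URLs.

    Returns (url, engines_that_returned_it) pairs in blended order.
    """
    engines = list(results_by_engine)
    truncated = [results_by_engine[name][:per_engine] for name in engines]
    merged: dict[str, tuple[str, list[str]]] = {}
    # Round-robin: rank 1 from every engine, then rank 2, and so on.
    for row in zip_longest(*truncated):
        for engine, url in zip(engines, row):
            if url is None:
                continue  # this engine returned fewer hits than the others
            key = normalize(url)
            if key not in merged:
                merged[key] = (url, [engine])
            elif engine not in merged[key][1]:
                merged[key][1].append(engine)
    return list(merged.values())


# Hypothetical result lists from two unnamed engines.
sample = {
    "EngineA": ["http://www.census.gov/", "http://www.restaurant.org"],
    "EngineB": ["http://WWW.CENSUS.GOV", "http://www.infotoday.com"],
}
for url, found_by in blend(sample, per_engine=2):
    print(url, found_by)
```

The per_engine cutoff is what makes metasearch results fast but incomplete: anything an engine ranks below the cutoff never reaches the blended page.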
Intelligent Agents
Metasearch engines can be advantageous for getting a quick overview, but because every search engine differs in how it functions and because metasearch engines return only a limited number of results from each search engine, the outcome is incomplete. In addition, some metasearch engines are rather slow and create another problem, that of duplicates. A better solution is to consider using intelligent agents: software programs that, like metasearch engines, search many search engines at once, but which add other features such as automatically finding, analyzing, filtering, and presenting information rapidly. BullsEye, one of the most recent entrants to the marketplace, offers a trial version for download (http://www.intelliseek.com). One valuable feature, compared with metasearch engines, is that the user can specify the total number of hits and how many are desired from each search engine. As a result, a much larger list of hits is created than when using metasearch engines on the web. A unique and automated feature of BullsEye is that it can track and update searches on a schedule selected by the user (hourly, daily, or weekly) and then e-mail the updates.
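Two of the agent features described above, user-set hit limits and tracking a search over time, can be illustrated with a short Python sketch. It is a hypothetical illustration only and says nothing about how BullsEye works internally; the function names, engine names, and URLs are invented for the example.

```python
def collect(results_by_engine: dict[str, list[str]],
            per_engine: int, total: int) -> list[str]:
    """Take up to per_engine new hits from each engine, stopping at total overall."""
    collected: list[str] = []
    seen: set[str] = set()
    for hits in results_by_engine.values():
        taken = 0
        for url in hits:
            if len(collected) >= total:
                return collected  # overall limit reached
            if taken >= per_engine:
                break  # this engine's quota is used up
            if url in seen:
                continue  # duplicate of a hit from an earlier engine
            seen.add(url)
            collected.append(url)
            taken += 1
    return collected


def new_since(previous: list[str], current: list[str]) -> list[str]:
    """Return hits in the current run that were absent from the previous run."""
    old = set(previous)
    return [url for url in current if url not in old]


# Hypothetical results from two runs of the same saved search.
last_week = collect({"EngineA": ["u1", "u2"], "EngineB": ["u2"]},
                    per_engine=2, total=3)
this_week = collect({"EngineA": ["u1", "u2", "u3"], "EngineB": ["u2", "u4", "u5"]},
                    per_engine=2, total=3)
print(this_week)
print(new_since(last_week, this_week))  # what an update e-mail would report
```

Raising per_engine and total is what produces the larger hit list; the scheduled update then only needs to report the difference between two runs.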
Hard-to-Find Information
Two categories of hard-to-find information are industry statistics and market data. Often, this information is developed and provided by two distinct types of organizations: government agencies, and professional and trade associations. Consider what agency or association would typically generate the required information and search for that first. For example, when looking for U.S. population statistics, consult the U.S. Bureau of the Census at http://www.census.gov since it is the governmental agency responsible for compiling these statistics. If you need market data about restaurants, try the National Restaurant Association at http://www.restaurant.org. A reference book for additional help with hard-to-find information is Finding Statistics Online by Paula Berinstein, Information Today, Inc., 1998 (http://www.infotoday.com). Here are some additional web sites that are useful for finding information not readily available or indexed by search engines.
Price's List of Lists (http://gwis2.circ.gwu.edu/~gprice/listof.htm)
The Internet contains many lists of information in the form of rankings of different people, organizations, companies, etc. This site contains a collection that is designed to be a clearinghouse for these types of resources.
Direct Search (http://gwis.circ.gwu.edu/~gprice/direct.htm)
This site contains links to resources not easily searchable by search engines, such as archives & library catalogs, books, news sources, and ready reference.
Internet Publishers & Databases
Although there is an astounding amount of free information, professional researchers have also seen the commercialization of the web during the past year. As mentioned previously, many traditional commercial database vendors who were available only through dial-up telecommunications have launched web products and new publishers have entered the market with unique products. Here are examples of some of the new producers or products that have come onto the scene:
Web Tools & Specialty Search Engines
A very interesting web navigation service is Alexa (http://www.alexa.com). It works in conjunction with a web browser and resides as a tool bar at the bottom of the browser. Alexa provides useful information about the sites you are visiting and suggests related sites with links to click on. This can immediately add relevant sites to the search process as one way to save time on a search.
An example of a specialty search engine is Liszt (http://www.liszt.com). Liszt provides brief descriptions of some 90,000 electronic mailing lists and discussion groups. These are especially valuable for keeping up with current trends in your own profession or those related to your areas of subject expertise and interest. A search can be initiated by key word, or there are broad categories from which to choose, such as Business, Computer, Education, Politics, or Science. Another specialty search engine, for finding companies from all over the world, is Corporate Information (http://www.corporateinformation.com). Its new search engine and A-Z list of countries with links to sites make this a unique source for global company information.
Keeping Up
Keeping up with changes in search engines and the latest information necessary for professional information workers is quite a challenge. Here are some selected sources:
Cyberskeptic Guide to Internet Research (http://www.bibliodata.com) is a newsletter with articles about useful sites for searchers.
Free Pint (http://www.freepint.co.uk) is a British-based free e-mail newsletter about finding quality, reliable information on the web. It contains tips, tricks, and articles written by information professionals in the United Kingdom and is currently sent to more than 12,000 information professionals every two weeks.
On the Net (http://www.onlineinc.com), a column by Greg Notess, covers the information side of the Internet and is published in Online and Database.
The Search Engine Update (http://searchenginewatch.com) is a free site with a subscription-based e-mail newsletter, sent twice monthly, with access to "in progress" projects and detailed information only available to subscribers.
Web Wise Ways (http://www.infotoday.com), a column by Amelia Kassel, began in October 1998 and is published in Searcher magazine. This column provides in-depth reviews of new web-based research products and compares them to traditional commercial database products when applicable.
What's Next for Internet Power Searchers?
Just when searchers have conquered the methods and idiosyncrasies of a search engine, it changes. My very first personal favorite, Open Text, has disappeared. I then discovered that HotBot was easy to use and most satisfactory for the majority of my research requests. Of late, Northern Light, the most significant entry to the playing field during the past year and a half, continues to add new content and features while others have remained either fairly static or in some cases deteriorated. In recent months, there has been a hush in new search engine development. Nothing much new! Nevertheless, Reva Basch points out that, with regard to search engines, "the only constant is change" [6]. This insightful comment implies, to me, that information professionals will want to continue their experimentation with search engines and acclimate themselves to changes or new features. For the moment, we can hone our skills using existing products while waiting to see what the next generation will bring. For now, searchers will need to continue to identify, collect, evaluate, and organize useful web sites and learn new tools that come onto the scene, since so much on the web is not accessible via search engines. Many of the same skills that we learned in graduate schools of library and information science are applicable to this new searching environment that we have had to meet head on.
Amelia Kassel is president and owner of MarketingBASE, a successful
information brokerage specializing in market research, competitive intelligence,
and worldwide business information since 1984. Kassel holds a Master's degree
in library science (1971, UCLA) and combines an in-depth knowledge of information
sources with an emphasis on the use of databases, and a knowledge of business
and marketing strategies. Kassel has taught information brokering and electronic
research for the University of California, Berkeley and San Jose State University,
Division of Library and Information Science. A recognized author and national
and international speaker, she also conducts workshops for conferences and
associations.