Information searching and retrieval
State-of-the-art approaches to retrieving information employ two generic techniques: (1) matching words in the query against the database index (key-word searching) and (2) traversing the database with the aid of hypertext or hypermedia links.
Key-word searches can be made either broader or narrower in scope by means of logical operators (e.g., disjunction and conjunction). Because of the semantic ambiguities involved in free-text indexing, however, the precision of the key-word retrieval technique (that is, the percentage of retrieved documents that are actually relevant) is far from ideal, and various modifications have been introduced to improve it. In one such enhancement, the search output is sorted by degree of relevance, based on a statistical match between the key words in the query and in the document; in another, the program automatically generates a new query using one or more documents considered relevant by the user. Key-word searching has been the dominant approach to text retrieval since the early 1960s; hypertext has so far been largely confined to personal or corporate information-retrieval applications.
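The key-word technique can be illustrated with a minimal Python sketch. It assumes a tiny, made-up collection of three documents and uses a simple count of matching words as the relevance score; the function names and the sample data are illustrative only, and practical systems use more refined statistical weights.

```python
from collections import Counter, defaultdict

# Hypothetical three-document collection, used only for illustration.
DOCS = {
    1: "space shuttle launch manual",
    2: "shuttle bus timetable",
    3: "space telescope launch and shuttle launch",
}

def build_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_all(index, words):
    """Conjunction (AND): documents containing every query word."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

def search_any(index, words):
    """Disjunction (OR): documents containing at least one query word."""
    return set().union(*(index.get(w, set()) for w in words))

def rank(docs, doc_ids, words):
    """Sort hits by how often the query words occur in each document."""
    def score(doc_id):
        counts = Counter(docs[doc_id].lower().split())
        return sum(counts[w] for w in words)
    return sorted(doc_ids, key=lambda d: (-score(d), d))

def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0
```

Conjunction narrows the result set and disjunction widens it, while ranking leaves the set unchanged and alters only the order in which hits are presented.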
The exponential growth of the use of computer networks in the 1990s presages significant changes in systems and techniques of information retrieval. In a wide-area information service, a number of which began operating at the beginning of the 1990s on the Internet computer network, a user’s personal computer or terminal (called a client) can search simultaneously a number of databases maintained on heterogeneous computers (called servers). The latter are located at different geographic sites, and their databases contain different data types and often use incompatible data formats. The simultaneous, distributed search is possible because clients and servers agree on a standard document addressing scheme and adopt a common communications protocol that accommodates all the data types and formats used by the servers. Communication with other wide-area services using different protocols is accomplished by routing through so-called gateways capable of protocol translation. In a typical networked information system, the clients may include a “dumb” terminal (i.e., one with no internal processor), a personal computer (PC), a Macintosh (Mac), and a NeXT machine. They have access to data on the servers sharing a common protocol as well as to data provided by services that require protocol conversion via the gateways. Network news is one such wide-area service, containing hundreds of news groups on a variety of subjects, in which users can read and post messages.
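The arrangement of clients, servers, and gateways can be sketched in Python. The sketch is a simplified model that runs within a single program and involves no real network; the class names, the "name/id" addressing scheme, and the sample records are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

class Server:
    """Speaks the (assumed) common protocol: search(query) -> [(address, title)]."""
    def __init__(self, name, records):
        self.name = name
        self.records = records  # {local id: title}

    def search(self, query):
        q = query.lower()
        return [(f"{self.name}/{rid}", title)
                for rid, title in sorted(self.records.items())
                if q in title.lower()]

class ForeignService:
    """A hypothetical service with a different, incompatible interface."""
    def __init__(self, name, lines):
        self.name = name
        self.lines = lines

    def grep(self, pattern):
        return [(n, line) for n, line in enumerate(self.lines)
                if pattern in line.lower()]

class Gateway:
    """Translates common-protocol requests into the foreign service's calls."""
    def __init__(self, service):
        self.service = service

    def search(self, query):
        hits = self.service.grep(query.lower())
        return [(f"{self.service.name}/{n}", line) for n, line in hits]

def distributed_search(query, endpoints):
    """Send one query to every endpoint at the same time and merge the hits."""
    if not endpoints:
        return []
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        result_lists = list(pool.map(lambda e: e.search(query), endpoints))
    return [hit for hits in result_lists for hit in hits]
```

Because the gateway presents the same search interface as an ordinary server, the client needs no knowledge of the protocol used by the service behind it.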
Evolving information-retrieval techniques, exemplified by an experimental interface to the NASA space shuttle reference manual, combine natural language, hyperlinks, and key-word searching. Other techniques, seeking higher levels of retrieval precision and effectiveness, are studied by researchers involved with artificial intelligence and neural networks. The next major milestone may be a computer program that traverses the seamless information universe of wide-area electronic networks and continuously filters its contents through profiles of organizational and personal interest: the information robot of the 21st century.
Information display
For humans to perceive and understand information, it must be presented as print and image on paper; as print and image on film or on a video terminal; as sound via radio or telephony; as print, sound, and video in motion pictures, on television broadcasts, or at lectures and conferences; or in face-to-face encounters. Except for live encounters and audio information, such displays emanate increasingly from digitally stored data, with the output media being video, print, and sound.
Video
Possibly the most widely used video display device, at least in the industrialized world, is the television set. Designed primarily for video and sound, its image resolution is inadequate for alphanumeric data except in relatively small amounts. Use of the television set in text-oriented information systems has been limited to menu-oriented applications such as videotex, in which information is selected from hierarchically arranged menus (with the aid of a numeric keyboard attachment) and displayed in fixed frames. The television, computer, and communications technologies are, however, converging in a high-resolution digital television set capable of receiving alphanumeric, video, and audio signals.
The computer video terminal is today’s ubiquitous interface that transforms computer-stored data into analog form for human viewing. The two basic apparatuses used are the cathode-ray tube (CRT) and the more recent flat-panel display. In CRT displays an electron gun directs beams of electrons onto a phosphor-coated surface; the beams are deflected, forming visible patterns representative of data. Flat-panel displays use one of four different media for visual representation of data: liquid crystal, light-emitting diodes, plasma panels, and electroluminescence. Advanced video display systems enable the user to scroll, page, zoom (change the scale of the details of the display image for enhancement), use multiple colours, divide the screen into windows (viewing areas), and in some cases even activate commands by touching the screen instead of using the keyboard. The information capacity of the terminal screen depends on its resolution, which ranges from low (character-addressable) to high (bit-addressable). High resolution is indispensable for the display of graphic and video data in state-of-the-art workstations, such as those used in engineering or information systems design.
Print
Modern society continues to be dominated by printed information. The convenience and portability of print on paper make it difficult to imagine the paperless world that some have predicted. The generation of paper print has changed considerably, however. Although manual typesetting is still practiced for artwork, in special situations, and in some developing countries, electronic means of composing pages for subsequent reproduction by photoduplication and other methods has become commonplace.
Since the 1960s, volume publishing has become an automated process using large computers and high-speed printers to transfer digitally stored data onto paper. The appearance of microcomputer-based publishing systems has proved to be another significant advance. Economical enough to allow even small organizations to become in-house publishers, these so-called desktop publishing systems are able to format text and graphics interactively on a high-resolution video screen with the aid of page-description command languages. Once a page has been formatted, the entire image is transferred to an electronic printing or photocomposition device.
Printers
Computer printers are commonly divided into two general classes according to the way they produce images on paper: impact and nonimpact. In the first type, images are formed by the print mechanism making contact with the paper through an ink-coated ribbon. The mechanism consists either of print hammers shaped like characters or of a print head containing a row of pins that produce a pattern of dots in the form of characters or other images.
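The formation of characters from dots can be sketched in Python. The sketch assumes a made-up font of two letters on a grid five dots wide and seven dots high, and a print head of seven pins that advances across the paper one column at a time; the font, the grid size, and the function names are illustrative assumptions and do not describe any particular printer.

```python
# Hypothetical 5x7 font: each string is one row of dots, "1" = dot printed.
FONT = {
    "H": ["10001", "10001", "10001", "11111", "10001", "10001", "10001"],
    "I": ["11111", "00100", "00100", "00100", "00100", "00100", "11111"],
}

def head_strokes(text):
    """For each horizontal step of the head, list which of the 7 pins fire."""
    strokes = []
    for ch in text:
        glyph = FONT[ch]
        for col in range(5):
            strokes.append(tuple(glyph[row][col] == "1" for row in range(7)))
        strokes.append((False,) * 7)  # one blank column between characters
    return strokes[:-1]  # drop the gap after the last character

def render(text, dot="#", blank="."):
    """Assemble the strokes into the pattern of dots left on the paper."""
    strokes = head_strokes(text)
    return "\n".join(
        "".join(dot if stroke[row] else blank for stroke in strokes)
        for row in range(7)
    )

print(render("HI"))
```

Each entry returned by head_strokes corresponds to one position of the head, so the printed image is built up column by column instead of character by character.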
Most nonimpact printers form images from a matrix of dots, but they employ different techniques for transferring images to paper. The most popular type, the laser printer, uses a beam of laser light and a system of optical components to etch images on a photoconductor drum from which they are carried via electrostatic photocopying to paper. Light-emitting diode (LED) printers resemble laser printers in operation but direct light from energized diodes rather than a laser onto a photoconductive surface. Ion-deposition printers make use of technology similar to that of photocopiers for producing electrostatic images. Another type of nonimpact printer, the ink-jet printer, sprays electrically charged drops of ink onto the print surface.
Microfilm and microfiche
Alphanumeric and image information can be transferred from digital computer storage directly to film. Reel microfilm and microfiche (a flat sheet of film containing multiple microimages reduced from the original) were popular methods of document storage and reproduction for several decades. During the 1990s they were largely replaced by optical disc technology (see above Recording media).
Voice
In synthetic speech generation, digitally prestored sound elements are converted to analog sound signals and combined to form words and sentences. Digital-to-analog converters are available as inexpensive boards for microcomputers or as software for larger machines. Human speech is the most effective natural form of communication, and so applications of this technology are becoming increasingly popular in situations where there are numerous requests for specific information (e.g., time, travel, and entertainment), where there is a need for repetitive instruction, in electronic voice mail (the counterpart of electronic text mail), and in toys.
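The principle of assembling speech from prestored elements can be sketched in Python. In the sketch, short sine tones stand in for recorded sound elements, and the sampling rate, the unit names, and the 8-bit sample format are assumptions chosen for simplicity; a practical system would store digitized human speech.

```python
import io
import math
import wave

RATE = 8000  # samples per second (assumed)

def tone(freq_hz, seconds):
    """Stand-in for a prestored sound element: 8-bit unsigned samples."""
    n = int(RATE * seconds)
    return bytes(
        int(127.5 + 127.5 * math.sin(2 * math.pi * freq_hz * i / RATE))
        for i in range(n)
    )

# Hypothetical inventory of sound elements; "_" is a short silence.
UNITS = {"ti": tone(440, 0.1), "me": tone(330, 0.1), "_": bytes([128]) * 400}

def synthesize(unit_names):
    """Combine prestored elements into one stream of digital samples."""
    return b"".join(UNITS[name] for name in unit_names)

def to_analog(samples, v_ref=1.0):
    """Model a digital-to-analog converter: 8-bit code -> voltage level."""
    return [(s - 127.5) / 127.5 * v_ref for s in samples]

def to_wav_bytes(samples):
    """Package the samples as a WAV file held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(1)   # one byte per sample
        w.setframerate(RATE)
        w.writeframes(samples)
    return buf.getvalue()
```

Calling synthesize(["ti", "_", "me"]) yields a single sample stream that can be passed to to_analog or written out with to_wav_bytes for playback by a sound device.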
Dissemination of information
The process of recording information by handwriting was obviously laborious and required the dedication of the likes of Egyptian scribes or monks in monasteries around the world. It was only after mechanical means of reproducing writing were invented that information records could be duplicated more efficiently and economically.
The first practical method of reproducing writing mechanically was block printing; it was developed in China during the T’ang dynasty (618–907). Ideographic text and illustrations were engraved in wooden blocks, inked, and copied on paper. Used to produce books as well as cards, charms, and calendars, block printing spread to Korea and Japan but apparently not to the Islamic or European Christian civilizations. European woodcuts and metal engravings date only to the 14th century.
Printing from movable type was also invented in China (in the mid-11th century AD). There and in the bookmaking industry of Korea, where the method was applied more extensively during the 15th century, the ideographic type was made initially of baked clay and wood and later of metal. The large number of typefaces required for ideographic text composition has continued to handicap printing in the Orient up to the present time.
The invention of character-oriented printing from movable type (1440–50) is attributed to the German printer Johannes Gutenberg. Within 30 years of his invention, the movable-type printing press was in use throughout Europe. Character-type pieces were metallic and apparently cast from metallic molds; paper and vellum (calfskin parchment) were used to carry the impressions. Gutenberg’s technique of assembling individual letters by hand was employed until 1886, when the German-born American printer Ottmar Mergenthaler developed the Linotype, a keyboard-driven device that cast lines of type automatically. Typesetting speed was further enhanced by the Monotype technique, in which a perforated paper ribbon, punched from a keyboard, was used to operate a type-casting machine. Mechanical methods of typesetting prevailed until the 1960s. Since that time they have been largely supplanted by the electronic and optical printing techniques described in the previous section.
Whereas text was printed from movable type, early graphics were reproduced from wood relief engravings in which the nonprinting portions of the image were cut away. Musical scores, on the other hand, were reproduced from etched stone plates. At the end of the 18th century, the German printer Aloys Senefelder developed lithography, a planographic technique of transferring images from a specially prepared surface of stone. In offset lithography the image is transferred from zinc or aluminum plates instead of stone, and in photoengraving such plates are superimposed with film and then etched.
The first successful photographic process, the daguerreotype, was developed during the 1830s. The invention of photography, aside from providing a new medium for capturing still images and later video in analog form, was significant for two other reasons. First, recorded information (textual and graphic) could be easily reproduced from film, and, second, the image could be enlarged or reduced. Document reproduction from film to film has been relatively unimportant, because both printing and photocopying (see above) are cheaper. The ability to reduce images, however, has led to the development of the microform, the most economical method of disseminating analog-form information.
Another technique of considerable commercial importance for the duplication of paper-based information is photocopying, or dry photography. Printing is most economical when large numbers of copies are required, but photocopying provides a fast and efficient means of duplicating records in small quantities for personal or local use. Of the several technologies that are in use, the most popular process, xerography, is based on electrostatics.
While the volume of information issued in the form of printed matter continues unabated, the electronic publishing industry has begun to disseminate information in digital form. The digital optical disc (see above Recording media) is developing as an increasingly popular means of issuing large bodies of archival information—for example, legislation, court and hospital records, encyclopaedias and other reference works, referral databases, and libraries of computer software. Full-text databases, each containing digital page images of the complete text of some 400 periodicals stored on CD-ROM, entered the market in 1990. The optical disc provides the mass production technology for publication in machine-readable form. It offers the prospect of having large libraries of information available in virtually every school and at many professional workstations.
The coupling of computers and digital telecommunications is also changing the modes of information dissemination. High-speed digital satellite communications facilitate electronic printing at remote sites; for example, the world’s major newspapers and magazines transmit electronic page copies to different geographic locations for local printing and distribution. Updates of catalogs, computer software, and archival databases are distributed via e-mail, a method of rapidly forwarding and storing bodies of digital information between remote computers.
Indeed, a large-scale transformation is taking place in modes of formal as well as informal communication. For more than three centuries, formal communication in the scientific community has relied on the scholarly and professional periodical, widely distributed to tens of thousands of libraries and to tens of millions of individual subscribers. In 1992 a major international publisher announced that its journals would gradually be available for computer storage in digital form; and in that same year the State University of New York at Buffalo began building a completely electronic, paperless library. The scholarly article, rather than the journal, is likely to become the basic unit of formal communication in scientific disciplines; digital copies of such an article will be transmitted electronically to subscribers or, more likely, on demand to individuals and organizations who learn of its existence through referral databases and new types of alerting information services. The Internet already offers instantaneous public access to vast resources of noncommercial information stored in computers around the world.
Similarly, the traditional modes of informal communications—various types of face-to-face encounters such as meetings, conferences, seminars, workshops, and classroom lectures—are being supplemented and in some cases replaced by e-mail, electronic bulletin boards (a technique of broadcasting newsworthy textual and multimedia messages between computer users), and electronic teleconferencing and distributed problem-solving (a method of linking remote persons in real time by voice-and-image communication and special software called “groupware”). These technologies are forging virtual societal networks—communities of geographically dispersed individuals who have common professional or social interests.