Psychological testing - Validity, Reliability, Norms


Other characteristics


Also known as: psychological measurement, psychometrics

Written by Donald W. Fiske and Dorothy C. Adkins

Fact-checked by The Editors of Encyclopaedia Britannica


Key People: Sir Cyril Burt; L. L. Thurstone; James McKeen Cattell; Robert M. Yerkes; Joy Paul Guilford

Related Topics: personality assessment; ecological validity; intelligence test; ipsative measurement; aptitude test


A test that takes too long to administer is useless for most routine applications. What constitutes a reasonable testing time, however, depends partly on the decisions to be made from the test results. Each test should also be accompanied by a practicable and economically feasible scoring scheme, preferably one that can be applied by machine or by quickly trained personnel.

A large and controversial literature has developed around response sets, that is, tendencies of subjects to respond to items in a systematic way regardless of their content. For example, a given test taker may tend to answer the questions on a personality test only in socially desirable ways, to select the first alternative in every set of multiple-choice answers, or to malinger (i.e., to deliberately give wrong answers).

Response sets stem from the ways subjects perceive and cope with the testing situation. If they are tested unwillingly, they may respond carelessly and hastily to get through the test quickly. If they have trouble deciding how to answer an item, they may guess or, in a self-descriptive inventory, choose the “yes” alternative or the socially desirable one. They may even mentally reword the question to make it easier to answer. The quality of test scores is impaired when the purposes of the test administrator and the reactions of the subjects to being tested are not in harmony. Modern test construction seeks to reduce the undesired effects of subjects’ reactions.
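Response sets of the kinds just described can be screened for with very simple indices. The sketch below is illustrative only. It assumes a yes/no inventory balanced so that, without a response set, roughly half the answers would be "yes", and multiple-choice items whose options are coded 0, 1, 2, ..., with 0 as the first alternative. The function names and the 0.25 tolerance are hypothetical choices, not an established standard.

```python
def acquiescence_index(answers):
    """Proportion of 'yes' answers on a yes/no self-descriptive inventory."""
    if not answers:
        raise ValueError("no answers")
    return sum(1 for a in answers if a == "yes") / len(answers)


def first_option_rate(choices):
    """Proportion of multiple-choice items on which the first alternative
    (assumed here to be coded 0) was selected."""
    if not choices:
        raise ValueError("no choices")
    return sum(1 for c in choices if c == 0) / len(choices)


def flag_response_set(rate, chance, tolerance=0.25):
    """Flag a rate exceeding the chance level by more than a tolerance.
    The default tolerance is an arbitrary, hypothetical cutoff."""
    return rate - chance > tolerance


# Hypothetical test taker: 9 "yes" answers out of 10 on a balanced inventory,
# and the first option chosen on 7 of 8 four-option items (chance = 0.25).
print(flag_response_set(acquiescence_index(["yes"] * 9 + ["no"]), 0.5))
print(flag_response_set(first_option_rate([0, 0, 0, 1, 0, 0, 0, 0]), 0.25))
```

A flag from indices like these only suggests that a score deserves a closer look; it does not by itself show that the subject was careless or malingering.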

Types of instruments and methods

Psychophysical scales and psychometric, or psychological, scales

The concept of an absolute threshold (the lowest intensity at which a sensory stimulus, such as sound waves, is perceived) is traceable to the German philosopher Johann Friedrich Herbart. The German physiologist Ernst Heinrich Weber later observed that the smallest discernible difference in intensity is proportional to the initial intensity of the stimulus. Weber found, for example, that people could just notice a slight change in the weight of a 10-gram object but needed a proportionally larger change before they could just detect a difference from a 100-gram weight. This finding is known as Weber's law. Building on it, the German physicist and philosopher Gustav Theodor Fechner proposed that the perceived (subjective) intensity of a stimulus varies mathematically as the logarithm of its physical (objective) intensity, a relation now usually called Fechner's law.
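Weber's law can be written as ΔI = kI and Fechner's law as S = c log(I/I0), where I is the physical intensity, ΔI the just-noticeable difference, I0 the absolute threshold, S the perceived intensity, and k and c constants. The minimal sketch below evaluates both. The Weber fraction of 0.02 is a hypothetical value chosen only to make the 10-gram and 100-gram example concrete; real fractions vary with the sense and the conditions of testing.

```python
import math

# Hypothetical Weber fraction for lifted weights (illustration only).
WEBER_FRACTION = 0.02


def just_noticeable_difference(intensity, k=WEBER_FRACTION):
    """Weber's law: the JND is a constant fraction k of the starting intensity."""
    return k * intensity


def fechner_sensation(intensity, threshold, c=1.0):
    """Fechner's law: perceived intensity grows with the logarithm of the
    physical intensity, measured from the absolute threshold (0 at threshold)."""
    if intensity <= 0 or threshold <= 0:
        raise ValueError("intensities must be positive")
    return c * math.log(intensity / threshold)


for grams in (10, 100):
    print(grams, "g: JND =", just_noticeable_difference(grams), "g")
```

Under these assumptions a tenfold heavier weight needs a tenfold larger change to be noticed, and every tenfold increase in intensity adds the same amount, c log 10, to the computed sensation.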

In traditional psychophysical scaling methods, a set of standard stimuli (such as weights) that can be ordered according to some physical property is related to the sensory judgments made by experimental subjects. In the method of average error, for example, subjects are given a standard stimulus and are then asked to adjust a variable stimulus until they believe it equals the standard. The mean (average) of a number of such judgments is then obtained. This method and many variations of it have been used to study such experiences as visual illusions, tactual intensities, and auditory pitch.
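A subject's settings in the method of average error might be summarized as in the minimal sketch below. Besides the mean setting described in the text, it reports two conventional derived quantities: the constant error (mean setting minus the standard) and the variable error (the standard deviation of the settings). The trial values and the function name are invented for illustration.

```python
from statistics import mean, stdev


def average_error_summary(settings, standard):
    """Summarize adjustments made in the method of average error.

    settings: values of the variable stimulus the subject judged equal to
    the standard, in the same units as the standard.
    """
    if len(settings) < 2:
        raise ValueError("need at least two settings")
    pse = mean(settings)  # point of subjective equality (mean setting)
    return {
        "pse": pse,
        "constant_error": pse - standard,  # systematic over- or underadjustment
        "variable_error": stdev(settings),  # spread of the settings
    }


# Hypothetical settings, in grams, for a 100-gram standard weight.
trials = [98.0, 103.0, 101.0, 99.0, 104.0]
print(average_error_summary(trials, 100.0))
```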


Psychological (psychometric) scaling methods are an outgrowth of the psychophysical tradition just described. Although their purpose is to locate stimuli on a linear (straight-line) scale, no quantitative physical values (e.g., sound intensity or weight) for the stimuli are involved. The linear scale may represent an individual's attitude toward a social institution, his judgment of the quality of an artistic product, the degree to which he exhibits a personality characteristic, or his preference for different foods. Psychological scales are thus used to have a person rate his own characteristics, as well as those of other individuals, on such attributes as leadership potential or initiative. In addition to locating individuals on a scale, psychological scaling can be used to scale objects and various kinds of characteristics, for example, to find where different foods fall on a group's preference scale or to determine the relative positions of various job characteristics in the view of those holding the job. Reported degrees of similarity between pairs of objects are also used to identify the scales or dimensions on which people perceive the objects.
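The text does not name a procedure for deriving dimensions from similarity judgments. One standard technique is classical (Torgerson) multidimensional scaling. The sketch below is a simplified illustration of it, not a full implementation: it assumes the dissimilarities behave roughly like Euclidean distances and extracts only a single dimension, using power iteration on the double-centered matrix. The function name and the data are hypothetical.

```python
import math


def classical_mds_1d(dist, iterations=200):
    """Recover one-dimensional coordinates from a symmetric dissimilarity
    matrix by classical multidimensional scaling.

    Coordinates are determined only up to a shift and a reflection.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in sq]  # equal column means by symmetry
    grand = sum(row_means) / n
    # Double centering: B = -1/2 * J D^2 J, with J the centering matrix.
    b = [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand)
          for j in range(n)] for i in range(n)]

    # Power iteration for the leading eigenvector of B.
    v = [float(i + 1) for i in range(n)]  # arbitrary, non-constant start
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector gave no signal; data may be degenerate")
        v = [x / norm for x in w]

    # Rayleigh quotient of the unit vector v gives the leading eigenvalue.
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]


# Hypothetical dissimilarity ratings among three foods (0 = identical).
d = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
print([round(x, 3) for x in classical_mds_1d(d)])
```

With real ratings the recovered distances will only approximate the reported dissimilarities, and further dimensions would be extracted from the remaining eigenvectors.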

The American psychologist L.L. Thurstone made a number of theoretical and statistical contributions that are widely used as rationales for constructing psychometric scales. One scaling technique, the method of comparative judgment, is based empirically on the choices people make between the members of each pair in any series of paired stimuli. Statistical treatment of these choices yields numerical estimates of the subjective (perceived) distance between the members of every pair, and these estimates form a psychometric scale. Whether the computed scale values are consistent with the observed comparative judgments can then be tested empirically.
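Thurstone's comparative-judgment model has several cases. The sketch below assumes the simplest, Case V, in which all stimuli are taken to have equal discriminal dispersions. Choice proportions are converted to standard normal deviates and averaged to give scale values; the second function converts the scale values back into predicted proportions, which can be compared with the observed ones as the empirical check mentioned above. The matrix layout and function names are assumptions of this sketch.

```python
from statistics import NormalDist


def thurstone_case_v(p):
    """Scale values from a matrix of choice proportions (Thurstone Case V).

    p[i][j] is the proportion of judges who preferred stimulus i to stimulus j;
    p[j][i] should equal 1 - p[i][j], and the diagonal is 0.5. Proportions of
    exactly 0 or 1 are not allowed because their normal deviates are infinite.
    """
    z = NormalDist().inv_cdf  # proportion -> standard normal deviate
    n = len(p)
    return [sum(z(p[i][j]) for j in range(n)) / n for i in range(n)]


def predicted_proportions(scale):
    """Choice proportions implied by the scale values (unit of the scale =
    standard deviation of the discriminal differences)."""
    cdf = NormalDist().cdf
    return [[cdf(a - b) for b in scale] for a in scale]


# Hypothetical preference proportions for three stimuli.
observed = [[0.5, 0.7, 0.9],
            [0.3, 0.5, 0.8],
            [0.1, 0.2, 0.5]]
scale = thurstone_case_v(observed)
print([round(s, 3) for s in scale])
print([[round(q, 2) for q in row] for row in predicted_proportions(scale)])
```

Large discrepancies between the observed and predicted proportions would suggest that the Case V assumptions, or a single underlying dimension, do not fit the judgments.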

Another of Thurstone's psychometric scaling techniques, the method of equal-appearing intervals, has been widely used in attitude measurement. In this method, judges sort statements (reflecting, for example, varying degrees of emotional intensity) into what they perceive to be equally spaced categories; the median category assignment of each statement defines its numerical scale value. Subsequent users of such a scale are scored according to the average scale value of the statements they endorse. Another psychologist, Louis Guttman, developed a method that requires no prior group of judges, depends on intensive analysis of the scale items, and yields comparable results. Quite commonly used is the type of scale developed by Rensis Likert, in which typically five choices, ranging from strongly in favour to strongly opposed, are provided for each statement, the alternatives being scored from one to five. A more general technique, the method of successive intervals, does not depend on the assumption that judges perceive interval sizes accurately. The widely used graphic rating scale presents an arbitrary continuum with preassigned guides for the rater (e.g., adjectives such as superior, average, and inferior).
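The scoring rules for equal-appearing intervals and Likert scales can be sketched in a few lines. The data layout and function names are hypothetical, and so are the three intermediate Likert labels; only the two extreme labels and the one-to-five scoring come from the text. Guttman scaling, successive intervals, and graphic rating scales are not covered here.

```python
from statistics import median


def thurstone_scale_values(placements):
    """Equal-appearing intervals: each statement's scale value is the median
    of the category numbers the judges assigned to it.

    placements maps a statement id to the list of judges' category numbers.
    """
    return {item: median(cats) for item, cats in placements.items()}


def thurstone_score(scale_values, endorsed):
    """Score a respondent as the average scale value of endorsed statements."""
    if not endorsed:
        raise ValueError("respondent endorsed no statements")
    return sum(scale_values[item] for item in endorsed) / len(endorsed)


# Five-point Likert coding; the middle three labels are hypothetical.
LIKERT = {"strongly opposed": 1, "opposed": 2, "undecided": 3,
          "in favour": 4, "strongly in favour": 5}


def likert_score(responses):
    """Total score: sum of the one-to-five codes of the chosen alternatives."""
    return sum(LIKERT[answer] for answer in responses.values())


# Hypothetical judges' placements and one respondent's answers.
values = thurstone_scale_values({"s1": [1, 2, 2, 3, 2],
                                 "s2": [6, 7, 7, 8, 9],
                                 "s3": [10, 10, 11]})
print(values, thurstone_score(values, ["s1", "s2"]))
print(likert_score({"q1": "strongly in favour", "q2": "undecided"}))
```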