What is validity of a research instrument?

Reliability and validity are important aspects of selecting a survey instrument.  Reliability refers to the extent to which the instrument yields the same results over multiple trials.  Validity refers to the extent to which the instrument measures what it was designed to measure.  In research, there are three ways to approach validity: content validity, construct validity, and criterion-related validity.

Content validity measures the extent to which the items that comprise the scale accurately represent or measure the information that is being assessed.  Are the questions that are asked representative of the possible questions that could be asked?

Construct validity measures what the calculated scores mean and whether they can be generalized.  Construct validity uses statistical analyses, such as correlations, to verify the relevance of the questions.  Questions from an existing, similar instrument that has been found reliable can be correlated with questions from the instrument under examination to determine whether construct validity is present.  If the scores are highly correlated, this is called convergent validity.  If convergent validity exists, construct validity is supported.

Criterion-related validity has to do with how well the scores from the instrument predict a known outcome they are expected to predict.  Statistical analyses, such as correlations, are used to determine whether criterion-related validity exists.  Scores from the instrument in question should be correlated with an outcome they are known to predict.  If a correlation of > .60 exists, criterion-related validity exists as well.
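The correlation described above can be computed with a standard Pearson product-moment correlation. The sketch below uses hypothetical instrument and criterion scores (the data and the .60 cutoff are illustrative, following the rule of thumb stated in the text):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: instrument scores and the criterion they should predict
instrument = [10, 12, 14, 15, 18, 20]
criterion  = [55, 60, 63, 70, 74, 80]

r = pearson_r(instrument, criterion)
print(f"r = {r:.2f}")  # criterion-related validity supported if r > .60
```

In practice one would use an established routine such as `scipy.stats.pearsonr`, which also reports a p-value for the correlation.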

Reliability can be assessed with the test-retest method, alternative form method, internal consistency method, the split-halves method, and inter-rater reliability.

Test-retest is a method that administers the same instrument to the same sample at two different points in time, perhaps at one-year intervals.  If the scores at both time periods are highly correlated, > .60, they can be considered reliable.  The alternative form method requires two different instruments consisting of similar content.  The same sample must take both instruments, and the scores from the two instruments must be correlated.  If the correlations are high, the instrument is considered reliable.  Internal consistency uses one instrument administered only once.  The coefficient alpha (or Cronbach’s alpha) is used to assess the internal consistency of the items.  If the alpha value is .70 or higher, the instrument is considered reliable.  The split-halves method also requires one test administered once.  The items in the scale are divided into halves, and a correlation between the halves is taken to estimate the reliability of each half of the test.  To estimate the reliability of the entire survey, the Spearman-Brown correction must be applied.  Inter-rater reliability involves comparing the observations of two or more individuals and assessing the agreement of the observations.  Kappa values can be calculated in this instance.
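Three of the statistics named above are straightforward to compute by hand. The sketch below implements Cronbach’s alpha, the Spearman-Brown correction, and Cohen’s kappa, using small hypothetical data sets (the response matrix and rater codes are invented for illustration):

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one list of respondent scores per scale item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

def spearman_brown(r_half):
    """Step a split-half correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)

def cohen_kappa(a, b):
    """Agreement between two raters' categorical codes, corrected for chance."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n
    cats = set(a) | set(b)
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)
    return (po - pe) / (1 - pe)

# Hypothetical responses: 3 items, each rated by the same 5 people
items = [[3, 4, 3, 5, 4],
         [2, 4, 4, 5, 3],
         [3, 5, 4, 5, 4]]
print(f"alpha = {cronbach_alpha(items):.2f}")       # reliable if >= .70
print(f"full-test r = {spearman_brown(0.55):.2f}")  # split-half r of .55 stepped up
print(f"kappa = {cohen_kappa(['y','n','y','y'], ['y','n','n','y']):.2f}")
```

Libraries such as `pingouin` (alpha) and `sklearn.metrics.cohen_kappa_score` provide tested versions of these calculations for real analyses.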

An instrument that is a valid measure of third graders’ math skills probably is not a valid measure of high school calculus students’ math skills. An instrument that is a valid predictor of how well students might do in school may not be a valid measure of how well they will do once they complete school.  So we never say that an instrument is valid or invalid; we say it is valid for a specific purpose with a specific group of people. Validity is specific to the appropriateness of the interpretations we wish to make with the scores.

In the reliability section, we discussed a scale that consistently reported a weight of 15 pounds for someone. While it may be a reliable instrument, it is not a valid instrument to determine someone’s weight in pounds. Just as a measuring tape is a valid instrument to determine people’s height, it is not a valid instrument to determine their weight.

There are three general categories of instrument validity.
Content-Related Evidence (also known as Face Validity)
Specialists in the content measured by the instrument are asked to judge the appropriateness of the items on the instrument. Do they cover the breadth of the content area (does the instrument contain a representative sample of the content being assessed)? Are they in a format that is appropriate for those using the instrument? A test that is intended to measure the quality of science instruction in fifth grade should cover material covered in the fifth grade science course in a manner appropriate for fifth graders. A national science test might not be a valid measure of local science instruction, although it might be a valid measure of national science standards.

Criterion-Related Evidence
Criterion-related evidence is collected by comparing the instrument with some future or current criteria, thus the name criterion-related. The purpose of an instrument dictates whether predictive or concurrent validity is warranted.

– Predictive Validity
If an instrument is purported to measure some future performance, predictive validity should be investigated. A comparison must be made between the instrument and some later behavior that it predicts.  Suppose a screening test for 5-year-olds is purported to predict success in kindergarten. To investigate predictive validity, one would give the prescreening instrument to 5-year-olds prior to their entry into kindergarten. The children’s kindergarten performance would be assessed at the end of kindergarten and a correlation would be calculated between the screening instrument scores and the kindergarten performance scores.

– Concurrent Validity
Concurrent validity compares scores on an instrument with current performance on some other measure.  Unlike predictive validity, where the second measurement occurs later, concurrent validity requires a second measure at about the same time.   Concurrent validity for a science test could be investigated by correlating scores for the test with scores from another established science test taken about the same time. Another way is to administer the instrument to two groups who are known to differ on the trait being measured by the instrument. One would have support for concurrent validity if the scores for the two groups were very different. An instrument that measures altruism should be able to discriminate those who possess it (nuns) from those who don’t (homicidal maniacs).  One would expect the nuns to score significantly higher on the instrument.
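The known-groups comparison described above is often summarized with a standardized mean difference (Cohen’s d): if the instrument separates the two groups, d should be large. The sketch below uses hypothetical altruism scores for the two groups:

```python
from math import sqrt

def cohens_d(g1, g2):
    """Standardized mean difference between two groups' scores (pooled SD)."""
    m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
    v1 = sum((x - m1) ** 2 for x in g1) / (len(g1) - 1)
    v2 = sum((x - m2) ** 2 for x in g2) / (len(g2) - 1)
    pooled = sqrt(((len(g1) - 1) * v1 + (len(g2) - 1) * v2)
                  / (len(g1) + len(g2) - 2))
    return (m1 - m2) / pooled

# Hypothetical altruism scores for the two known groups
high_group = [42, 45, 47, 44, 46]   # e.g., the nuns
low_group  = [20, 23, 19, 22, 21]
print(f"d = {cohens_d(high_group, low_group):.1f}")
```

A large positive d (conventionally anything above 0.8 is "large") would support concurrent validity; in practice a t-test would also be run to confirm the difference is statistically significant.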

Construct-Related Evidence
Establishing construct validity is an ongoing process of accumulating evidence; two common forms of that evidence are discriminant and convergent validity.

– Discriminant Validity
An instrument does not correlate significantly with variables from which it should differ.

– Convergent Validity
An instrument correlates highly with other variables with which it should theoretically correlate.

Del Siegle, Ph.D.
Neag School of Education – University of Connecticut
[email protected]
www.delsiegle.info

What is validation of instrument in research?

When a test or measurement is "validated," it simply means that the researcher has come to the opinion that the instrument measures what it was designed to measure. In other words, validity is no more than an expert opinion.

What is validity and reliability of instrument in research?

Reliability refers to the extent that the instrument yields the same results over multiple trials. Validity refers to the extent that the instrument measures what it was designed to measure.

What is validity of a measuring instrument?

Validity refers to the degree to which an instrument accurately measures what it intends to measure.

Why is validity of the research instrument important?

An instrument that is externally valid helps obtain population generalizability, or the degree to which a sample represents the population. Content validity refers to the appropriateness of the content of an instrument. In other words, do the measures (questions, observation logs, etc.) accurately assess what the study is trying to measure?