Convergent Validity
Convergent validity is the easier half of the Campbell-Fiske validity pair. If a new measure of conscientiousness correlates 0.65 with the NEO-PI-R conscientiousness scale, the convergent evidence is in hand. The harder question — whether the new measure correlates more strongly with conscientiousness than with neighboring constructs like emotional stability and agreeableness — is what discriminant validity was invented to address. The two are designed to be evaluated together, and reporting convergent validity without discriminant validity is reporting half a study.
The reason convergent evidence is easier to obtain is that the bar is intuitive. Two measures of the same construct should agree more than two measures of different constructs. The technical work is in choosing the right comparison measures, designing a study that doesn’t inflate the correlations with shared method variance, and interpreting coefficients that are usually below 0.80 and almost never close to 1.00. A perfect convergent correlation would actually be suspicious: it would suggest the two measures are the same instrument under different labels, which is redundancy rather than convergence.
The Two Things It Tests
Convergent validity does two related but distinguishable jobs:
Same-construct convergence. Two instruments that claim to measure the same thing should produce highly correlated scores. The new measure of work engagement should correlate strongly with the Utrecht Work Engagement Scale, with the Gallup Q12 summary score, with the Job Engagement Scale. If it doesn’t, either the construct definitions disagree (the new measure has carved up engagement differently than the established measures) or the new measure isn’t measuring what it claims to measure.
Related-construct convergence. A measure of a construct should correlate with measures of theoretically related (but distinct) constructs in the direction and magnitude that theory predicts. A measure of conscientiousness should correlate moderately and positively with measures of self-discipline, organization, and task persistence. If it correlates near zero with all of those, the construct definition isn’t holding up against the nomological network the theory implies.
The same-construct version is the more familiar one because it’s the test typically reported in instrument-development papers. The related-construct version is often more diagnostic because it tests the location of the construct within the broader trait space. A new measure of “grit” that correlates 0.85 with conscientiousness has convergent evidence — but the convergence is with a neighboring construct, not the same construct, and the interpretation is that the new measure is repackaging conscientiousness under a different name.
How Coefficients Are Interpreted
The expected magnitudes depend on what’s being compared:
- Same-construct, same-method. Two self-report measures of the same construct should correlate in the 0.50 to 0.80 range. Higher than 0.85 usually means redundancy (the measures are paraphrases of each other); lower than 0.40 usually means the construct definitions disagree.
- Same-construct, different-method. A self-report measure and a behavioral or observer-rated measure of the same construct should correlate in the 0.30 to 0.60 range. That this range sits below the same-method range isn’t a failure: different-method correlations are systematically lower because shared method variance is removed, and the interpretation has to account for that.
- Related-construct. Measures of theoretically adjacent constructs should correlate in the 0.20 to 0.50 range depending on how close the constructs are. Conscientiousness and self-discipline should correlate higher than conscientiousness and emotional stability.
A coefficient of 0.45 between a new measure and an established one means different things depending on the method comparison. Same-method, it’s mediocre convergent evidence. Different-method, it’s strong. Vendor materials rarely make this distinction explicit.
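The bands above can be written down as a small lookup, which makes the point concrete: the same coefficient lands in a different place depending on the comparison. This is a minimal sketch. The band values are the rules of thumb from the list above, not fixed cutoffs, and the `EXPECTED_BANDS` table, its labels, and the `interpret` function are hypothetical names introduced for illustration.

```python
# Rules-of-thumb bands taken from the list above (not fixed cutoffs).
EXPECTED_BANDS = {
    "same_construct_same_method": (0.50, 0.80),
    "same_construct_different_method": (0.30, 0.60),
    "related_construct": (0.20, 0.50),
}


def interpret(r: float, comparison: str) -> str:
    """Place a convergent correlation relative to the band expected for its comparison type."""
    low, high = EXPECTED_BANDS[comparison]
    if comparison == "same_construct_same_method":
        # The two extra warnings the text gives for same-method designs.
        if r > 0.85:
            return "possible redundancy"
        if r < 0.40:
            return "possible construct-definition disagreement"
    if r < low:
        return "below expected band"
    if r > high:
        return "above expected band"
    return "within expected band"


# The same 0.45 coefficient reads differently depending on the design:
print(interpret(0.45, "same_construct_same_method"))       # below expected band
print(interpret(0.45, "same_construct_different_method"))  # within expected band
```

Coefficients between 0.80 and 0.85 in a same-method design fall in a grey zone the text does not classify; the sketch simply reports them as above the expected band.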
The Method Variance Problem
The clean convergent-validity finding — high correlations with same-construct measures, lower correlations with different-construct measures — gets contaminated when all the measures use the same method. Self-report measures share response style, social desirability, acquiescence, and item-format effects. Two self-report measures of unrelated constructs will correlate higher than they should because both measures are partly picking up “general tendency to respond to items in characteristic ways.”
This shows up in the data as a baseline correlation between any pair of self-report scales, often in the 0.15 to 0.30 range, that has nothing to do with construct content. Convergent validity reported on same-method designs is inflated by this baseline. Discriminant validity reported on the same designs is also affected, but more visibly — the inflation makes neighboring constructs look more correlated than they are, which is the failure mode that gets the most attention.
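A shared method can manufacture a baseline correlation on its own, and a minimal simulation shows how. The sketch assumes a single common method factor (a stand-in for response style), two unrelated traits with unit variance, a trait loading of 1, and no other error. Under those assumptions the expected correlation is m² / (1 + m²), where m is the method loading. The function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_method_baseline(method_loading: float, n: int = 20_000, seed: int = 1) -> float:
    """Correlate two scales for *unrelated* traits that share one method factor."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        method = rng.gauss(0, 1)  # shared response style, e.g. acquiescence
        x.append(rng.gauss(0, 1) + method_loading * method)  # trait A + method
        y.append(rng.gauss(0, 1) + method_loading * method)  # trait B + method
    return pearson(x, y)


def expected_baseline(method_loading: float) -> float:
    # cov(x, y) = m^2 and var(x) = var(y) = 1 + m^2 under the toy model
    m2 = method_loading ** 2
    return m2 / (1 + m2)


print(round(expected_baseline(0.5), 3))  # 0.2
print(round(simulate_method_baseline(0.5), 3))
```

Under this toy model, method loadings between roughly 0.42 and 0.65 reproduce the 0.15 to 0.30 baseline described above, with no trait overlap at all.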
The Campbell-Fiske multitrait-multimethod (MTMM) matrix was designed exactly to disentangle this. The matrix measures multiple traits using multiple methods (self-report, observer report, behavioral) and looks for the pattern: same-trait correlations should exceed different-trait correlations across methods. This is the cleanest test of convergent validity available, and it is rarely run because designing a multi-method study is expensive enough that most instrument-development work skips it.
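A simplified version of the Campbell-Fiske comparison can be sketched in a few lines. The sketch assumes a complete correlation matrix keyed by (trait, method) pairs and checks only part of their criteria: each validity value (same trait, different method) should exceed every different-trait correlation that shares one of its two variables, whether that rival is heteromethod or monomethod. It omits significance testing and the check that trait interrelationships repeat across method blocks. The traits, methods, coefficients, and the `mtmm_violations` function are illustrative, not data or an established API.

```python
from itertools import combinations


def mtmm_violations(corr, traits, methods):
    """Find validity values that fail to exceed a different-trait correlation
    sharing one of their two variables (a simplified Campbell-Fiske check)."""

    def r(a, b):
        # The matrix is symmetric, so each pair is stored once in either order.
        return corr[(a, b)] if (a, b) in corr else corr[(b, a)]

    variables = [(t, m) for t in traits for m in methods]
    violations = []
    for trait in traits:
        for m1, m2 in combinations(methods, 2):
            a, b = (trait, m1), (trait, m2)
            validity = r(a, b)  # monotrait-heteromethod value
            for anchor in (a, b):
                for other in variables:
                    if other[0] == trait:
                        continue  # only different-trait rivals count
                    rival = r(anchor, other)
                    if rival >= validity:
                        violations.append(((a, b), (anchor, other), validity, rival))
    return violations


# Illustrative matrix: C = conscientiousness, E = emotional stability.
corr = {
    (("C", "self"), ("E", "self")): 0.30,  # heterotrait-monomethod
    (("C", "obs"), ("E", "obs")): 0.25,    # heterotrait-monomethod
    (("C", "self"), ("C", "obs")): 0.50,   # validity value
    (("E", "self"), ("E", "obs")): 0.45,   # validity value
    (("C", "self"), ("E", "obs")): 0.10,   # heterotrait-heteromethod
    (("E", "self"), ("C", "obs")): 0.12,   # heterotrait-heteromethod
}
print(mtmm_violations(corr, ["C", "E"], ["self", "obs"]))  # []

# Inflate the self-report C-E correlation, as shared method variance would.
inflated = dict(corr)
inflated[(("C", "self"), ("E", "self"))] = 0.55
print(len(mtmm_violations(inflated, ["C", "E"], ["self", "obs"])))  # 2
```

In the inflated case both validity values fail, because the same-method different-trait correlation now outranks them: exactly the pattern the matrix is designed to expose.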
Where It Sits in the Validation Sequence
The standard validation sequence treats convergent evidence as an early-stage test:
- Construct definition is established — the construct’s scope, boundaries, and relationships to neighboring constructs are specified.
- Item development produces candidate items that reflect the construct.
- Factor analysis (exploratory or confirmatory) tests whether the items group as the construct definition predicted.
- Internal consistency (Cronbach’s alpha, McDonald’s omega) is reported for the resulting scales.
- Convergent validity is tested by correlating the new measure with established measures of the same construct.
- Discriminant validity is tested by correlating the new measure with measures of theoretically distinct constructs.
- Criterion validity is tested by correlating the new measure with the outcomes it’s supposed to predict.
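The internal-consistency step is most often reported as coefficient alpha: alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. A minimal standard-library sketch follows. Omega needs a fitted factor model and is not shown, and the four-respondent data set and the `cronbach_alpha` name are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a respondents-by-items score table."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four respondents, three items on a 1-5 scale.
data = [
    [1, 2, 2],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 4],
]
print(round(cronbach_alpha(data), 3))  # 0.931
```

The sample variance is used for both the items and the total; any consistent choice gives the same ratio.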
Convergent validity at step five is necessary but not sufficient. A new measure can converge strongly with existing measures of the same construct (passing step five) while failing to separate from neighboring constructs (failing step six) and predicting nothing useful (failing step seven). The publishable evidence base looks fine; the operational use case doesn’t survive the field test.
The reverse pattern is also common: a new measure passes the discriminant test but fails the convergent one. It’s clearly distinct from neighboring constructs but doesn’t correlate as expected with same-construct measures. This usually means the construct definition has drifted between the new and established measures. The new measure isn’t bad; it’s measuring something different from what it claims to measure.
Common Reporting Failures
Three reporting patterns make convergent-validity evidence harder to interpret than it should be:
Reporting only the highest correlation. A new measure is correlated with five existing measures of the same construct, and only the highest correlation (say, 0.62) is reported. The lower correlations (0.35, 0.41, 0.48, 0.55) are present in the data but absent from the manual. The honest version reports all five and discusses why the convergence is variable — usually a story about which existing measures share the new measure’s construct boundaries and which don’t.
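Reporting all five coefficients alongside a summary takes very little code. One common way to summarize several correlations is to average them on Fisher’s z scale (z = atanh r) and transform back; the sketch below does that with the illustrative values from the paragraph. It assumes equal weighting, which fits coefficients from a single sample. Because those coefficients share respondents they are not independent, so the average is descriptive rather than a basis for a significance test. The `summarize_convergence` name is hypothetical.

```python
import math


def summarize_convergence(correlations):
    """Report every convergent coefficient, its range, and a Fisher-z average."""
    z_scores = [math.atanh(r) for r in correlations]  # Fisher's r-to-z transform
    return {
        "all": sorted(correlations, reverse=True),
        "min": min(correlations),
        "max": max(correlations),
        "fisher_z_mean": math.tanh(sum(z_scores) / len(z_scores)),  # back to r
    }


# The five illustrative coefficients from the paragraph above.
report = summarize_convergence([0.62, 0.35, 0.41, 0.48, 0.55])
print(report["all"])                      # [0.62, 0.55, 0.48, 0.41, 0.35]
print(round(report["fisher_z_mean"], 2))  # 0.49
```

The averaged value sits well below the 0.62 that a highest-only report would show, and the 0.35 to 0.62 spread is itself the finding the discussion should explain.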
Reporting same-construct convergence without related-construct comparison. The new measure correlates 0.70 with the established same-construct measure. That’s the convergent finding. But the same study should report correlations with related-but-distinct constructs (0.30 to 0.50 territory) and with unrelated constructs (0.00 to 0.20 territory), and the relative magnitudes should match the construct definition’s predictions. A study that reports the same-construct correlation alone is reporting the convergent number out of context.
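The relative-magnitude check can be made explicit: list the comparison groups in the order the construct definition predicts, then confirm that the weakest coefficient in each group still exceeds the strongest in the next. This is a deliberately strict, minimal sketch under stated assumptions (magnitudes compared as absolute values, sampling error ignored), and the `ordering_problems` function, group labels, and values are hypothetical.

```python
def ordering_problems(groups):
    """groups: (label, correlations) pairs, listed from the strongest predicted
    relationship to the weakest. Returns adjacent pairs whose magnitudes overlap."""
    problems = []
    for (stronger, stronger_rs), (weaker, weaker_rs) in zip(groups, groups[1:]):
        weakest_of_stronger = min(abs(r) for r in stronger_rs)
        strongest_of_weaker = max(abs(r) for r in weaker_rs)
        if weakest_of_stronger <= strongest_of_weaker:
            problems.append((stronger, weaker, weakest_of_stronger, strongest_of_weaker))
    return problems


# Hypothetical study that matches the predicted pattern.
study = [
    ("same construct", [0.70]),
    ("related constructs", [0.42, 0.35, 0.31]),
    ("unrelated constructs", [0.08, -0.05, 0.12]),
]
print(ordering_problems(study))  # []
```

Only adjacent groups are compared, but the ordering is transitive: if each group clears the next, the same-construct group also clears the unrelated one.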
Using outdated benchmark measures. Convergent validity against a 1990s measure of organizational citizenship behavior may or may not be the right benchmark today, because the citizenship-behavior literature has evolved (and the measure has been critiqued). The convergent correlation is technically reported, but the benchmark is no longer the consensus operationalization of the construct, and the convergence finding doesn’t carry the weight it would have a generation ago.
Why It Matters Less Than It Looks
Convergent validity is often presented as the headline validity finding in instrument-development papers because it’s the easiest to obtain and the most intuitive to explain. A 0.65 correlation between the new measure and the established one looks like solid evidence, and stakeholders read it as the test passing.
What convergent validity actually shows is that the new measure isn’t unrelated to the established measure of the same construct. It doesn’t show that the new measure is better than the established measure for any use case, that it adds incremental predictive value, or that it would survive field deployment in a high-stakes setting. Those are different questions, answered by criterion validity, incremental-validity studies, and operational pilot programs respectively.
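The incremental-validity question can be illustrated with the standard two-predictor formula for R² computed from correlations: R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). The sketch assumes standardized scores and population-level correlations (no shrinkage adjustment); the criterion correlations and the function names are hypothetical. It shows, for this example, why a strong convergent correlation says little about added value: with the criterion correlations held fixed, more overlap with the established measure leaves less for the new one to add.

```python
def r2_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of a criterion on two standardized predictors."""
    if abs(r_12) >= 1:
        raise ValueError("predictors are perfectly collinear")
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def incremental_r2(r_y_old: float, r_y_new: float, r_old_new: float) -> float:
    """Criterion variance the new measure explains beyond the established one."""
    return r2_two_predictors(r_y_old, r_y_new, r_old_new) - r_y_old**2


# Same hypothetical criterion correlations, two levels of convergence.
print(round(incremental_r2(0.30, 0.32, 0.65), 3))  # 0.027
print(round(incremental_r2(0.30, 0.32, 0.20), 3))  # 0.07
```

A convergent correlation of 0.65 is exactly the kind of number that reads as a pass, yet in this example it leaves the new measure adding under three percent of criterion variance.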
The instruments I’ve worked on at Gyfted report convergent evidence as part of the construct-validation package, but the weight given to it is calibrated against the rest of the evidence. A measure with strong convergent validity, strong discriminant validity, and weak criterion validity is operationally weaker than a measure with mediocre convergent validity, mediocre discriminant validity, and strong criterion validity — even though the first looks more impressive on a technical-manual table. Convergent validity is a necessary input to the validity case, not the case itself.