Reading Comprehension Is Not a Skill
The Case for Vocabulary Instruction as an Engine of Comprehension and Equity
I taught English and reading comprehension for eighteen years. One thing I learned slowly, and against the grain of almost everything I was trained to do, is that when a student cannot grasp the main idea of a passage, the problem is almost never that they lack a “strategy.” The problem is that they do not understand enough of the words. Not a missing mental procedure, not a deficit in “inferencing” skills, not an inability to “think critically.” They simply do not know what enough of the words mean, and everything downstream from that is a system optimising for the wrong variable.
One of the more counterintuitive things about large language models is that the prompt contains almost none of the knowledge required to generate the response. A few words typed into ChatGPT can produce paragraphs of detailed, coherent text, not because the prompt contained that information but because the model has already absorbed vast amounts of language and knowledge during training. The prompt activates patterns that already exist inside the system. Remove that prior knowledge and the same prompt produces nothing at all.
Reading works in much the same way. A text does not carry its meaning fully formed inside it. It relies on knowledge that already exists in the reader’s mind: vocabulary, background information, conceptual frameworks built over years of experience. When those structures are present, meaning emerges quickly and almost effortlessly. When they are absent, the words remain visible but the understanding never arrives.
The parallel is not merely metaphorical. In cognitive psychology, Walter Kintsch’s construction–integration model describes comprehension as a process in which a text activates a network of possible meanings drawn from the reader's prior knowledge; the mind then settles on the interpretations that best fit. Large language models operate in a very similar way: a prompt activates learned associations and the system converges on the most coherent continuation. In both cases, the crucial ingredient is not the input but the knowledge already present in the system.
“Teach big words to little kids, and do it interestingly.”
In education, the idea that reading comprehension is a teachable, generic skill that can be applied to any given text is an error, but a natural one, which is partly what made it so durable. We observed what comprehension failure looked like: students who could not identify the main idea, who could not draw inferences, who could not synthesise information across paragraphs. And then we did something that felt entirely logical but was, in retrospect, a category error: we turned the description of the failure into the curriculum. We taught “main idea” identification, inferencing, synthesis, and so on. We built an entire instructional architecture from the symptoms of the problem rather than from its cause. The description of what students could not do was mistaken for an explanation of why they could not do it.
For me, the deeper reason why strategy instruction fails to transfer is that what we call “strategies” are really meta-labels for underlying linguistic principles. You cannot summarise a passage if you do not understand the semantic relationships between its sentences. You cannot infer an author’s purpose if you do not know what half the words mean. You cannot “find the main idea” if the main idea is expressed in vocabulary that might as well be a foreign language. The strategy is downstream of the knowledge, not upstream of it. It is like teaching someone the rules of chess on a board where half the pieces are invisible. The rules are not the problem. The missing pieces are.
One of the most powerful vehicles for helping students understand a text, and also one of the most misunderstood, is vocabulary instruction. The point is made brilliantly by Isabel Beck, whose advice is simply this: “Teach big words to little kids, and do it interestingly.” The phrase is deceptively simple. What it recognises is that comprehension depends on the language students bring to a text. The size of a child's vocabulary is not a measure of how many words they have memorised but of how much of the world they have been taught.
What a New Study Reveals
A new meta-analysis on reading comprehension by Hansford et al. reexamined and reanalysed studies from three previous meta-analyses, breaking the results down by a distinction that turns out to be decisive: standardised assessments versus researcher-created measures. The headline finding is stark. Nearly all effect sizes shrank substantially when measured on standardised tests. Cognitive strategies, the bread and butter of comprehension instruction for the past thirty years, produced a weighted effect size of just 0.09. Metacognition strategies showed no significant effects at all. For anyone who has sat in a professional development session being told that teaching students to “learn about learning” or “monitor their own thinking” is the key to comprehension, this should be a sobering corrective.
However, a surprising finding is that content instruction appeared to show no effect either. But this is where the picture requires a second, harder look. Olivia Mullins has published a detailed response arguing that the collection of papers on content knowledge and comprehension included in Hansford et al. is simply not an appropriate set from which to draw conclusions about knowledge-building. Critical papers were excluded. Some included papers had dubious methodology. The data, she argues, needs to be taken in context, because the broader research base tells a rather different story.
The meta-analysis told us something important about strategies. But its treatment of content and knowledge instruction is where we need Mullins’ corrective. And that corrective, as it turns out, reframes something much larger than a single study.
But What About Fluency?
There is an obvious objection here, and it deserves to be taken seriously. Reading comprehension does not rest on vocabulary alone. Fluency matters: the speed and accuracy with which a reader can process text. If decoding is slow and effortful, working memory is consumed by the mechanics of turning print into language, and there is nothing left over for meaning. Hirsch put the problem precisely in 2003: if decoding does not happen quickly, the decoded material will be forgotten before it is understood. Anyone who has tried to follow a film in a language they half know will recognise this immediately; by the time you have translated one sentence, the next has already passed, and the connections between them dissolve.
But here is what makes fluency an ally of the vocabulary argument rather than a rival to it. Fluency is not purely a decoding phenomenon. It is also a knowledge phenomenon. A reader who knows a domain reads that domain faster, not because their eyes move differently but because their mind can chunk familiar information into larger units. This is the lesson of de Groot’s famous chess experiment: grand masters can reconstruct a mid-game board from a five-second glance, not because they possess superior memory but because they recognise patterns that novices cannot see. Arrange the pieces randomly and the advantage vanishes. The “skill” was knowledge all along.
Hirsch draws the analogy to reading directly: word knowledge speeds up word recognition, and world knowledge speeds up comprehension of textual meaning by offering a foundation for making inferences. The three factors, fluency, vocabulary, and domain knowledge, are not independent columns holding up the same roof. They are the same column, seen from different angles.
Spencer, Quinn, and Wagner confirmed this at enormous scale. They tested 425,000 children from first through third grade on decoding, vocabulary, and comprehension. Among those who had adequate decoding and adequate vocabulary, fewer than one per cent scored poorly on comprehension. Fewer than one in a hundred. Once the word-level problems are solved, the decoding and the vocabulary, comprehension takes care of itself in virtually every case. The finding does not diminish the importance of fluency. It clarifies its place: fluency is necessary but not sufficient, and in the presence of adequate vocabulary, it is almost always enough.
However, the bottleneck, for the vast majority of struggling readers, is not the speed at which they process words. It is whether they know what those words mean. And even within Hansford’s own data, Willingham’s observation holds: students appear to gain all the benefits of strategy instruction after approximately ten hours, and increasing instructional time by as much as 400 per cent produces no further gains. Rosenshine found that spending six classes on teaching comprehension skills had the same effect on reading comprehension as spending twenty-five.
Strategies are not a deep competence that rewards sustained practice; they are a thin procedural layer that maxes out almost immediately. When you strip away the researcher-created measures, the flattering proximal tests, the assessments designed to detect exactly what was taught, strategy instruction produces almost nothing on the measures that matter.
Knowledge-Building Is Not an Intervention
But the meta-analysis did something else, something I think it perhaps did not intend. In its treatment of content instruction, it revealed a methodological bias so deep that it calls into question how we evaluate knowledge-building interventions altogether. This is where Mullins’ critique becomes essential. Her central argument deserves to be quoted at length:
“Knowledge building is not a short intervention. In fact, it’s not an intervention at all. It’s a gradual, cumulative, lifelong process. We already understand that short bursts of content instruction will not affect general reading comprehension.”
I think this is a hugely important point, and it exposes a real weakness in the science of learning as a field, which treats everything as an intervention even in a domain where the signal-to-noise ratio is often vanishingly low. The entire architecture of the randomised controlled trial assumes that “content instruction” is a sort of treatment: something that can be administered in a controlled dose, measured against a control group, and evaluated after a fixed period. But knowledge does not work like a treatment. It compounds. It accretes. It builds the very architecture through which future comprehension becomes possible. To measure it as though it were a pill is to fundamentally misunderstand the mechanism.
Barbara Oakley has recently argued for what she calls “cognitive realism”: the principle that there are facts about how brains encode, consolidate, and retrieve information, and that any instructional framework which cannot be falsified by those facts is not a framework but an ideology. The deeper problem is categorical. Knowledge-building is not a strategy; it is what I have elsewhere called an instructional invariant: a condition, derived from how human cognition actually works, that must hold true continuously for learning to function. Invariants are not treatments to be tested; they are constraints to be maintained. Asking whether knowledge-building “works” in a six-week trial is like testing whether foundations “work” by building a house without them and checking if it falls down in the first month. The house might stand for six weeks. But it won’t stand for six years.
Mullins identifies ten reasons to look beyond the meta-analysis, and several of them are damning. The content studies ranged from three days to one school year; averaging across these is like averaging the temperature of ice and boiling water and calling it warm. Important studies were excluded, including the Romance and Vitale work, perhaps the cleanest quasi-experimental studies we have, which produced an effect size of 0.56 after a full year of integrated science and literacy instruction. Many included studies used student-led approaches unlikely to build knowledge effectively. But here is the critical detail: even within Hansford’s own data, there was a significant positive correlation (r = 0.43) between instruction duration and effect size for content instruction. Content instruction was the only treatment where this relationship was found. For every other approach, including strategies, longer duration was associated with smaller effects. Knowledge-building was the only intervention that got stronger the longer it ran. This single finding should have been the headline of the paper.
And what happens when we let that compound interest run? As Natalie Wexler pointed out, a rigorous study of more than 2,000 students using Core Knowledge Language Arts, a curriculum that systematically builds knowledge from kindergarten, found that after four years of implementation, students from low-income families matched their higher-income peers on state reading comprehension tests. Four years. Not six weeks. The knowledge-building did not merely help; it closed the gap entirely. But you would never see that result in a study that measured outcomes after a single term.
Drops in an Ocean
But there is a complication, and it is one I am not entitled to avoid, because I believed it myself for years. If the problem is vocabulary, then surely the solution is to teach more vocabulary: pre-teach the key words before each text, drill Tier 2 lists, front-load definitions. This is the naive version of the argument, and it is also wrong.
Direct vocabulary instruction produces meaningful effects on comprehension of texts containing the taught words. This much is well established. But Cervetti and colleagues found in 2023 that direct word teaching does not reliably generalise to untaught vocabulary breadth. You cannot simply pre-teach twenty words per week and expect comprehension to blossom across all texts. The words you teach help students read the passages that contain those specific words. They do not, on their own, build the kind of broad vocabulary knowledge that underpins general reading comprehension.
The threshold research explains why this matters so acutely. Laufer found in 1989 that students need 95 to 98 per cent known-word coverage for comfortable comprehension. Hu and Nation raised that estimate in 2000 to 98 to 99 per cent for unassisted reading. When a student encounters a passage where even five per cent of the words are unknown, the signal-to-noise ratio is too low for any strategy to rescue meaning. The text becomes opaque. Not difficult, not challenging, but genuinely impenetrable.
But here is the arithmetic that should trouble us. If a student needs to know 98 per cent of the words in a passage, and they are encountering academic texts with thousands of distinct word forms, and direct instruction of individual words does not generalise beyond the words taught, then the gap cannot be closed one word at a time. Schools teaching 300 to 400 Tier 2 words per year through explicit instruction are adding drops to an ocean. The scale of the problem dwarfs the scale of the solution.
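The coverage thresholds make this arithmetic easy to see. As a minimal sketch (the 300-word passage length is an illustrative assumption, not a figure from the studies cited above), a few lines of Python show how quickly unknown words pile up as coverage falls:

```python
# Back-of-the-envelope arithmetic for lexical coverage thresholds.
# The passage length is illustrative, not taken from Laufer or Hu and Nation.

PASSAGE_WORDS = 300  # roughly one page of academic text

def unknown_words(coverage, passage_words=PASSAGE_WORDS):
    """Number of unknown word tokens at a given known-word coverage."""
    return round(passage_words * (1 - coverage))

for coverage in (0.90, 0.95, 0.98):
    n = unknown_words(coverage)
    print(f"{coverage:.0%} known-word coverage -> "
          f"{n} unknown words in a {PASSAGE_WORDS}-word passage")
```

At 98 per cent coverage that is a handful of gaps a reader can bridge from context; at 90 per cent it is an unknown word in nearly every sentence, which is why no strategy can rescue meaning below the threshold.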
So the question becomes: is there an approach to vocabulary instruction that gives us a better bang for our buck, one that is multiplicative rather than additive? One that does not just teach individual words but teaches the generative logic from which families of words become comprehensible?
Words Are the Visible Surface of Knowledge
This is where vocabulary instruction and knowledge-building converge, and where a much more cohesive theory of how reading actually develops begins to emerge.
Vocabulary does not live in word lists. It lives in schemas, in morphological families, in the accumulated residue of having learned about the world. To teach words well is to teach the content from which they derive their meaning. Beck and McKeown’s framework calls for six to twelve robust encounters before a word enters long-term usable vocabulary. Not flashcard reviews; encounters in context where the word does real semantic work. This only happens within content-rich curricula, across time, through accumulation.
But it can also happen through morphology. Nagy and Anderson estimated in 1984 that roughly 60 per cent of the new words students encounter in print have meanings that can be predicted from their morphological parts. Sixty per cent. That is not a marginal gain; it is the difference between a closed system and an open one. Lyn Stone, a linguist and literacy specialist, puts the principle sharply in Reading for Life: it would be impossible to teach the definition of every single word in English, but that does not mean we cannot teach children to be generative. Knowing how words work, having an understanding of a core of prefixes, roots, and suffixes, helps children generate the meaning of new words.
If the arithmetic of direct vocabulary instruction is so dispiriting, what would an alternative look like in practice? This is where the work of Sean Morrisey and the Word Mapping Project offers something that the research literature demands but rarely provides: a concrete, classroom-tested illustration of what generative vocabulary instruction actually looks like. Not 400 words a year, but the structural principles that unlock thousands.
Morrisey’s approach centres on morphology matrices. Take the base sect, from Latin, meaning “cut.” Arrange its combinatorial possibilities: prefixes on the left (bi-, dis-, in-, inter-, mid-, sub-), suffixes on the right (-ion, -or, -ed, -ing, -cide, -vore, -ous, -al). Now give students the matrix and ask them to build words.
What emerges is not a vocabulary list but a generative system. From a single base, properly understood, students can construct bisect, dissect, insect, intersection, midsection, subsection, insecticide, insectivorous, and dozens more. They are not memorising words. They are learning the code from which words are made.
The pedagogical architecture is as important as the morphological content. Students begin by investigating the matrix: working alone or with a partner, building as many words as possible, writing word sums on the left (bi + sect + ion) and the whole word on the right (bisection). This is retrieval-generative practice, not passive reception. The teacher lesson includes full morphological breakdowns with definitions rooted in the base meaning: bi + sect = to cut into two parts; dis + sect = to cut apart; in + sect = literally “cut into,” a segmented creature. The etymology is not decoration. It is the explanatory mechanism. Students learn why words mean what they mean, not just that they do.
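The matrix is, at bottom, a small combinatorial data structure. As a minimal sketch (the prefix and suffix sets and the attested-word list here are illustrative assumptions, not Morrisey’s actual materials), a few lines of Python can generate word sums and whole words from a single base:

```python
# A morphology matrix sketched as data: prefixes x base x suffixes.
# The attested set stands in for the teacher's judgement about which
# combinations are real English words; it is illustrative only.

from itertools import product

matrix = {
    "prefixes": ["", "bi", "dis", "in", "inter", "mid", "sub"],
    "base": "sect",  # from Latin, "cut"
    "suffixes": ["", "ion", "or", "ed", "ing"],
}

attested = {"sect", "sector", "bisect", "bisection", "dissect",
            "dissection", "insect", "intersect", "intersection",
            "midsection", "subsection"}

def build_words(matrix):
    """Yield (word_sum, word) pairs for attested combinations."""
    for pre, suf in product(matrix["prefixes"], matrix["suffixes"]):
        word = pre + matrix["base"] + suf
        if word in attested:
            parts = [p for p in (pre, matrix["base"], suf) if p]
            yield " + ".join(parts), word

for word_sum, word in build_words(matrix):
    print(f"{word_sum:<20} -> {word}")
```

The point of the sketch is the shape of the system, not the code: a single base plus a small set of affixes yields a family of words, each derivable by a word sum of the kind students write on the left of the matrix.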
Extended vocabulary passages then group semantically related words into clusters. Not isolated definitions but constellations of near-synonyms that force discrimination. One passage, for instance, presents arduous, strenuous, laborious, gruelling, and demanding alongside effortless, smooth, unchallenging, painless, and seamless. Students partner up, reread the passage two to three times, then work on questions that require distinguishing nuance between words that share a broad category but differ in shade and weight.
This for me is Beck and McKeown’s “robust encounters” principle made concrete: words appear in context, doing real semantic work, and students must think about the differences between them, not just their shared meaning.
On subsequent days, retrieval takes unexpected forms. “Would you rather” questions: Would you rather diminish the amount of broccoli on your plate or diminish the number of commercials in a show? These are low-stakes, high-engagement, and they force students to use the word in a new context, exactly the kind of productive use that moves vocabulary from receptive recognition to active command. Reader’s theatre scripts feature characters named after vocabulary words: Stricken, Protected, Bickering, Accord. The words become agents in a story. Students cannot perform the scene without internalising what the words mean.
But the finding that matters most, the one that answers the Cervetti problem, is this: Morrisey’s standardised spelling test data (Test of Written Spelling, 5th Edition) showed substantial growth over just four months and, crucially, on words that were not specifically taught.
This is the transfer that Cervetti’s research says does not happen with standard direct vocabulary instruction. Morphological instruction generates transfer because it teaches the underlying code, not individual items. A student who understands that sect means “cut” and that -ion makes a noun does not need to be explicitly taught dissection. They can construct its meaning from its parts. This is the difference between teaching a student a hundred fish and teaching them the structure of aquatic ecosystems.
But What About Critical Thinking?
There is one more thing that needs to be said, because the argument above invites a misreading I want to anticipate. Nothing here suggests that inference, critical analysis, or the capacity to weigh competing perspectives are unimportant. They are profoundly important. But my claim is that they are dependent: they require a substrate of organised knowledge before they can function. You cannot think critically about the causes of the First World War if you do not know what the causes of the First World War were. You cannot weigh both sides of an argument about gene editing if the word genome is a stranger to you. The higher-order thinking we rightly value is not an alternative to knowledge; it is what knowledge makes possible.
Nor does the argument reduce to “teach more facts.” Isolated facts are almost as instructionally inert as isolated strategies. What matters is connected knowledge: words understood in relation to other words, concepts organised into schemas, information structured so that new learning has something to attach to. This is why morphological instruction works where word lists do not, and why content-rich curricula outperform topic-of-the-week approaches. The goal is not to fill children’s heads with disconnected information. It is to build the architecture of understanding, so that when they are finally asked to infer, to analyse, to synthesise, they have something to infer from.
If higher-order thinking depends on knowledge, and knowledge depends on vocabulary, then the students with the smallest vocabularies are not merely behind in reading. They are being locked out of the very capacities we claim to value most: the ability to reason, to question, to see a problem from more than one angle. A child who cannot access the language of a debate cannot participate in it. A child who does not know what the words mean cannot think critically about what they say. The vocabulary gap is not just a literacy problem. It is, at bottom, an equity problem, and it is one that begins earlier and compounds faster than most people realise.
Running the Same Race From a Different Starting Line
Which brings me finally to the students for whom all of this matters most. Hart and Risley found in 1995 that vocabulary exposure gaps exceed thirty million words between socioeconomic groups by age three. The vocabulary gap is visible by the end of Grade 2, often exceeding 3,000 root words between the highest and lowest performing groups. After that point, low-vocabulary children may grow at the same rate as their peers, but they are running the same race from a different starting line.
Before age ten, more than 80 per cent of words are learned from language experiences in context (explanations from parents, being read to, conversation) rather than from inference while reading. The school curriculum is their primary source of the background knowledge and word meanings that more advantaged children absorb at home.
The sociologist James S. Coleman, after spending a career examining the characteristics of effective schools, concluded that the most important feature of a good school programme is that it makes good academic use of school time. The reason is as simple as it is consequential: for disadvantaged children, school time is the only academic learning time, whereas advantaged students learn a great deal outside of school. A good programme is inherently compensatory, because it has a bigger effect on those who depend on it most. Coleman’s principle reframes the vocabulary gap not as a problem of ability but as a problem of exposure, and it places the responsibility squarely on what schools choose to do with the hours they have.
The point that should keep curriculum designers awake at night is that even if content instruction truly had no effect on comprehension, which the evidence does not show, knowledge-building curricula would still be the right choice. Integrated content instruction does not hurt comprehension. It does improve content knowledge, vocabulary, and writing. There is simply no coherent competing argument for scattered, content-poor approaches to literacy instruction.
The students most harmed by the strategy-heavy, knowledge-light approach are the ones who arrive at school with the smallest vocabularies. They are the ones for whom the school curriculum is not a supplement but a lifeline: the primary means by which they will acquire the words and the knowledge that make comprehension possible. To deny them systematic, cumulative, content-rich instruction, instruction that builds morphological knowledge, that teaches the generative logic of language, that treats vocabulary as knowledge rather than a checklist, is to treat a knowledge deficit as a skills deficit.