The Monthly Dispatch - What's New in Learning Science? - March 2026
New studies on retrieval practice, seductive details, engagement myths, reading instruction, and the cognitive costs of modern technology
Memory, Learning, and Knowledge
This study examined whether the well-established laboratory finding that spacing retrieval improves long-term retention actually generalises across real university STEM courses. The researchers embedded spaced versus massed retrieval opportunities into biweekly quizzes across eight introductory STEM courses, and then examined performance on a final criterial test. Results from a ninth course, calculus, had been published previously but were included in a combined analysis. The headline finding is mixed: significant spacing benefits emerged clearly in calculus and in Chemistry for Health Professionals, small positive effects appeared in a few other courses, but most individual courses did not show statistically significant gains. When all nine courses were meta-analysed together, the average benefit of spacing was small, around two to three percentage points, and highly variable across subjects.
Spacing may help in some STEM contexts, especially where procedures and integration across topics matter, but it is not guaranteed to produce gains everywhere. The authors raise important practical moderators: the nature of the questions, the feedback provided, the extent of initial learning, and how students engage outside quizzes. In short, spaced retrieval works beautifully in the lab. In real STEM classrooms, the picture is messier.
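For readers who like to see the machinery, here is a minimal sketch of the random-effects pooling that produces a "small, highly variable" average like this. The per-course numbers are invented placeholders, not the paper's data, and the DerSimonian-Laird estimator is a standard textbook choice rather than necessarily the authors' method.

```python
import numpy as np

# Hypothetical per-course effects (percentage-point benefit of spacing)
# and their sampling variances; placeholders, not the paper's numbers.
effects = np.array([6.0, 4.5, 2.0, 1.0, 0.5, 3.0, -0.5, 1.5, 2.5])
variances = np.array([1.2, 1.5, 2.0, 1.8, 2.2, 1.6, 2.5, 1.9, 1.4])

# Fixed-effect pooling, then the DerSimonian-Laird estimate of
# between-course heterogeneity (tau^2) for a random-effects mean.
w = 1.0 / variances
fixed_mean = np.sum(w * effects) / np.sum(w)
Q = np.sum(w * (effects - fixed_mean) ** 2)          # Cochran's Q
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (len(effects) - 1)) / c)        # DL estimator

w_re = 1.0 / (variances + tau2)
re_mean = np.sum(w_re * effects) / np.sum(w_re)
re_se = np.sqrt(1.0 / np.sum(w_re))
print(f"pooled benefit: {re_mean:.1f} percentage points "
      f"(95% CI {re_mean - 1.96 * re_se:.1f} to {re_mean + 1.96 * re_se:.1f})")
```

The key intuition: the larger the heterogeneity term tau^2, the less a single pooled number tells you about any one course, which is exactly the paper's caution.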
Not all revision is equal: in the lab, testing beats restudy for retention, and students may dislike practice tests even though they work better than re-reading. But does that hold outside the lab?
In authentic high-stakes exam preparation, testing only improves later performance when it includes feedback, and even then it is no better than focused restudy in the short term. This paper reports three experiments with Chinese teacher-training students preparing for a real, high-stakes educational psychology exam. Students revised key course concepts using one of four strategies: testing with feedback, testing without feedback, restudy, or an unrelated control task. Two experiments took place shortly before a high-stakes course exam, while a third occurred after the exam in a low-stakes context. Crucially, the practice materials and the final assessments used different questions, meaning any gains depended on transfer rather than simple item repetition.
Main findings: in high-stakes contexts, both testing with feedback and restudy produced similar immediate gains. However, only testing with feedback reliably transferred to better performance on the actual course exam days later. Testing without feedback was largely ineffective, and none of the strategies worked at all in low-stakes conditions. For teachers, the message is uncomfortable but clear: without feedback and without genuine stakes, the benefits of testing largely disappear. If the goal is short-term exam performance on varied questions, structured restudy can work just as well as quizzing.
Instruction and Practice
Can we really measure student engagement?
This study set out to validate the Secondary School Student Engagement in Learning Activities Short Scale, a 12-item self-report tool designed specifically to measure engagement in learning activities rather than broader school belonging. Drawing on self-determination theory, the scale measures four dimensions: cognitive engagement, affective engagement, behavioural engagement, and agentic engagement. Using a sample of 566 Portuguese secondary students, the authors conducted confirmatory factor analysis and found good model fit for the four-factor structure. The scale showed acceptable reliability overall, strong concurrent validity with an existing engagement measure, and some predictive validity for competence, relatedness, self-reported achievement, and Portuguese grades.
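For the methodologically curious, a four-factor confirmatory factor analysis of this kind can be sketched in a few lines of Python using the semopy library. The item names and the three-items-per-factor split below are my placeholders, not the published scale.

```python
import pandas as pd
import semopy

# Four latent factors, each measured by three hypothetical items
# (c1...g3 are placeholders; the real scale has 12 items in total).
model_desc = """
cognitive   =~ c1 + c2 + c3
affective   =~ a1 + a2 + a3
behavioural =~ b1 + b2 + b3
agentic     =~ g1 + g2 + g3
"""

data = pd.read_csv("engagement_items.csv")   # one column per item
model = semopy.Model(model_desc)
model.fit(data)
print(semopy.calc_stats(model))              # CFI, TLI, RMSEA and friends
```

"Good model fit" in the paper means indices like CFI and RMSEA from exactly this kind of output cleared conventional thresholds.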
The main takeaway is that engagement is not one thing. Students can be thinking deeply, enjoying learning, putting in effort, and actively shaping instruction in different combinations. The inclusion of agentic engagement is particularly important. Students who ask questions, express preferences, and influence their learning environment appear to gain psychological benefits beyond effort and enjoyment alone. The scale is short enough to be used for monitoring trends or identifying at-risk students, but it is not a diagnostic instrument. It gives a snapshot, not a full psychological profile.
Does this paper address Rob Coe’s claim that engagement is a poor proxy indicator of learning? Not directly: this is a measurement validation study, not a causal test of whether engagement leads to learning. It shows that four forms of self-reported engagement correlate with self-reported achievement and Portuguese grades, but it does not test whether engagement tracks durable knowledge acquisition, so it neither proves nor disproves Coe’s claim.
For me, the critique still stands: a student can look engaged and not be learning, and a student can look disengaged and still encode effectively. Observable engagement is not a guaranteed indicator of knowledge change.
Do decorative extras in lessons really help, or do they quietly get in the way of learning?
This meta-analysis suggests that seductive details have a small but reliable negative effect on learning, mainly because they increase extraneous cognitive load rather than helping learners understand or transfer knowledge.
This study tackles a long-running question in instructional design: what happens when we add interesting but irrelevant material to learning resources? These seductive details might be decorative pictures, entertaining anecdotes, background music, or eye-catching but non-essential content. Drawing on 177 effect sizes from 50 studies, the authors found a small overall negative effect on learning outcomes. That pattern held across recall, comprehension, and transfer, with the strongest negative effect for comprehension. In simple terms, the extras may make materials seem more appealing, but they tend to slightly reduce what people actually learn.
Reading and Language Development
This meta-analysis set out to answer a deceptively simple question: which instructional approaches genuinely improve reading comprehension, particularly when measured using standardised assessments rather than researcher-designed tests? That distinction matters because many interventions look effective when measured with tests closely aligned to the taught material, but the effect often shrinks when students are assessed using broader comprehension measures.
To answer this, the authors screened 1,557 studies drawn from ERIC, Education Source, and earlier meta-analyses, eventually analysing 70 studies and 116 effect sizes that met strict inclusion criteria such as experimental or quasi-experimental designs and sufficient statistical reporting.
The results show generally small average effects on standardised reading comprehension outcomes. Weighted effect sizes were modest for most approaches: vocabulary instruction around d = .11, content instruction d = .17, cognitive strategy instruction d = .21, and metacognitive strategies d = .06. Reciprocal teaching stood out with a larger effect of d = .45, though the evidence base for this approach was relatively limited.
One of the most interesting findings is that combinations of instructional approaches produced larger effects than individual pedagogies. For example, when vocabulary instruction was combined with content knowledge and cognitive strategy instruction, the effect size rose to around d = .51, suggesting a synergistic effect between these components.
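To make those d values concrete, a quick conversion translates each into the percentile an average student would be expected to reach, assuming roughly normal score distributions. This is a standard heuristic, not a calculation the authors report.

```python
from scipy.stats import norm

# For an average (50th percentile) student, a shift of d standard
# deviations corresponds to the percentile norm.cdf(d) in the
# untreated distribution.
approaches = [("vocabulary", 0.11), ("content", 0.17),
              ("cognitive strategies", 0.21), ("metacognitive", 0.06),
              ("reciprocal teaching", 0.45), ("combined package", 0.51)]

for name, d in approaches:
    print(f"{name:20s} d = {d:.2f} -> 50th to {norm.cdf(d) * 100:.0f}th percentile")
```

On this reading, the combined package moves a median student to roughly the 69th percentile, while metacognitive strategies alone barely move the needle.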
Another key point is that instruction often improves comprehension of texts related to the taught material but does not reliably transfer to generalised reading comprehension across new texts. This transfer problem helps explain why effect sizes frequently shrink when standardised tests are used.
This study has proved to be somewhat controversial. Olivia Mullins and Natalie Wexler have written about this paper and I have also responded here.
Reading and writing appear to depend on the same underlying language system.
The study followed 261 children from kindergarten to Grade 2 in the United States to investigate the relationship between reading comprehension and written composition, and the skills that explain it.
Using confirmatory factor analysis and structural equation modelling, the author examined whether shared language and literacy skills account for the connection between reading and writing. The analysis showed that Grade 2 reading comprehension and writing quality were very strongly correlated (r = .82) but were still statistically distinct constructs.
Crucially, once oral discourse skills (listening comprehension and retell) and lexical literacy skills (word reading and spelling) were included in the model, the correlation between reading comprehension and writing almost disappeared. Handwriting fluency contributed to writing quality but not to reading comprehension.
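A toy simulation shows the logic: when two outcomes draw on a shared underlying skill, their raw correlation can be very high yet nearly vanish once that skill is partialled out. The data below are simulated for illustration, not the study's.

```python
import numpy as np

# Simulate 261 children whose reading and writing both load on one
# shared language factor, producing a raw correlation near .82.
rng = np.random.default_rng(0)
n = 261
language = rng.normal(size=n)                          # shared skill factor
reading = 0.9 * language + rng.normal(scale=0.42, size=n)
writing = 0.9 * language + rng.normal(scale=0.42, size=n)

def partial_corr(x, y, z):
    """Correlation of x and y after regressing z out of both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

print("raw r     :", round(np.corrcoef(reading, writing)[0, 1], 2))
print("partial r :", round(partial_corr(reading, writing, language), 2))
```

The raw correlation comes out around .82 and the partial correlation near zero, mirroring the pattern the author found once oral discourse and lexical skills entered the model.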
For teachers, these findings imply that the common practice of teaching reading and writing in isolated silos may be inefficient. Since oral discourse and word-level skills (spelling/decoding) are the "pillars" supporting both domains, strengthening these foundations in the early years provides a double benefit. Teachers might focus more on integrated lessons where oral storytelling, vocabulary development, and spelling are used to bridge the gap between understanding a text and producing one, ensuring that children have the cognitive tools necessary to succeed in both.
Classrooms, Behaviour, and Attention
This paper reviews literature from the scholarship of teaching and learning (SoTL) on charismatic teaching and identifies three recurring ways researchers describe charismatic teachers. First, charisma is often portrayed as an uncultivable natural gift possessed by only a few individuals. Second, charismatic teachers are frequently associated with humour, which is assumed to increase engagement and approachability. Third, charisma is linked to high-energy performance through expressive delivery, enthusiasm, and emotionally charged classroom interactions.
The author argues that each of these ideas is problematic. Treating charisma as a rare gift risks discouraging teachers who do not see themselves as naturally entertaining. Linking charisma to humour overlooks the risks of inappropriate or culturally misinterpreted jokes. And equating effective teaching with emotional intensity may place unrealistic expectations on teachers while privileging a performative style that is not necessary for learning. The paper ultimately suggests that education may benefit from shifting focus away from theatrical charisma.
This paper examines a persistent belief in education: that strict or controlling teaching may be necessary for students who lack motivation. The study approaches this question through the lens of Self-Determination Theory (SDT), which argues that motivation and engagement depend on the satisfaction of three psychological needs: autonomy, competence, and relatedness. Controlling teaching styles undermine these needs by pressuring students to behave, think, or learn in ways dictated by the teacher.
The researchers followed nearly 5,000 Chinese middle school students across a full academic year. Their goal was to understand how perceived controlling teaching influences learning outcomes and whether the impact differs depending on students’ motivation profiles.
The results suggest that students who perceived their teachers as more controlling showed lower behavioural engagement (participation and effort) and lower cognitive engagement (deep thinking, strategy use, and persistence). Those drops in engagement were in turn associated with lower academic achievement one year later.
A key contribution of the study is its focus on motivation profiles rather than average motivation levels. Using latent profile analysis, the authors identified four student profiles: controlled motivation, moderate motivation, strong motivation, and autonomous motivation. The researchers then tested whether the effects of controlling teaching differed between these groups.
The main finding is that the negative effect appeared across all motivation profiles. Even students with lower motivation did not benefit from controlling instruction. While the strength of some pathways varied slightly between groups, the direction of the effects was consistently negative. This directly challenges a common classroom assumption: that strict control might be necessary or effective for disengaged students.
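For anyone unfamiliar with latent profile analysis, it is a close cousin of Gaussian mixture modelling, and the workflow can be sketched as below. The indicator matrix here is random placeholder data, not the study's survey scores.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder data: one row per student, one column per motivation
# indicator (e.g. autonomous and controlled motivation scores).
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 2))

# Fit models with 1-6 profiles and keep the one that minimises BIC,
# the model-selection step LPA studies typically report.
fits = [GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
        for k in range(1, 7)]
best = min(fits, key=lambda m: m.bic(X))
print("profiles retained:", best.n_components)
profiles = best.predict(X)   # each student's most likely profile
```

The study's question then becomes whether the controlling-teaching effect differs across the `profiles` groups, which is what the pathway comparisons test.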
SEN, Individual Differences and Inclusion
A large brain-imaging study suggests that common stimulant medications for ADHD (like methylphenidate/Ritalin and amphetamines/Adderall) do not primarily boost attention networks in the brain as previously assumed. Instead, these drugs seem to act on brain circuits that control alertness (wakefulness) and reward/motivation, helping people feel more awake and increasing the perceived value of tasks that they would normally find uninteresting.
This matters because it reframes what ADHD medication is actually doing. If stimulants mainly increase alertness and motivation rather than fixing attention itself, then poor sleep is not a side issue but a central factor. A tired brain will struggle to focus no matter what, and medication may simply be masking sleep deprivation rather than solving the underlying problem. For some children, getting consistent, high-quality sleep may therefore deliver many of the same benefits as medication, without the trade-offs, and should be treated as a first-order intervention rather than an afterthought.
This study finds that first graders cluster into three clear profiles of early fraction understanding, and those profiles meaningfully predict end-of-year mathematics achievement. The researchers examined 204 first graders at the start of the school year, focusing on early, mostly informal fraction knowledge across nonsymbolic and symbolic tasks. Using latent profile analysis, they identified three distinct groups. Profile 1 showed strong nonsymbolic fraction knowledge. Profile 2 was similarly strong but weaker specifically on nonsymbolic equivalence tasks. Profile 3 showed broadly limited fraction understanding across areas.
Crucially, these early profiles predicted later outcomes. By the end of first grade, children in Profiles 1 and 2 outperformed those in Profile 3 on mathematics achievement. Interestingly, the initial weakness in nonsymbolic equivalence shown by Profile 2 largely disappeared by year end. In other words, children who were strong on part-whole relations and equal sharing but weaker on equivalence appeared to “catch up” in that specific area over time. For teachers, the message is nuanced: early strengths in equal sharing and part-whole reasoning matter, and a specific early weakness in equivalence may not signal long-term difficulty. However, broadly weak fraction knowledge at school entry is a more serious concern.
AI, Technology, and Cognitive Tools
AI amplifies the Matthew Effect: without strong foundational knowledge, inequality deepens.
A new Brookings report argues that artificial intelligence will not automatically transform education for the better or the worse. Its impact depends on how it is integrated into teaching and learning. AI has the potential to expand access, personalise instruction, support neurodivergent learners, and free teacher time. But under current conditions, the risks outweigh the benefits. The report warns that AI can weaken cognitive development through overreliance, erode trust between teachers and students, undermine assessment integrity, disrupt social and emotional growth, and deepen equity divides.
A central claim is that AI amplifies existing differences. Students with strong foundational knowledge can use AI to extend and refine their learning, while those without such knowledge are more likely to misinterpret outputs, outsource thinking, and fall further behind. The authors therefore call for a shift: embed AI within sound pedagogy, strengthen knowledge rich instruction, build holistic AI literacy across the curriculum, establish national guardrails, and prioritise child safety, privacy, and developmental needs.
How much damage does a single phone notification really do to thinking?
The researchers asked university students to complete a demanding Stroop task while realistic smartphone notifications appeared on screen. Crucially, the notifications mimicked real iPhone pop-ups and varied in whether participants believed they were their own, generic, or visually blurred. Across conditions, notifications caused a clear and measurable slowdown in response speed lasting roughly seven seconds, even though accuracy recovered more quickly. The disruption was strongest when notifications were believed to be personally relevant, weaker when they were generic, and weakest when they were visually degraded.
The study matters because it moves beyond artificial lab alerts and shows that ordinary notification cues hijack attention automatically. The effect does not require opening the phone or reading the message. Simply seeing the alert is enough to disrupt thinking, and this disruption stacks up across a day filled with interruptions.
This study, published in Neural Computation, examined whether adding self-directed inner speech to an AI system with working memory would improve multitask generalisation. The researchers designed models with multiple working memory slots and trained them to “mumble” to themselves a specified number of times during task processing. They tested these systems on tasks of increasing difficulty, such as reversing sequences, reconstructing patterns, and switching between tasks. Models with multiple memory slots outperformed simpler ones, and performance improved further when structured inner speech was added. The gains were most pronounced in multitasking and multi-step problems, and importantly the system reportedly required less training data to generalise.
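To give a flavour of the idea, here is a toy PyTorch sketch of a controller with several memory slots that takes a fixed number of self-directed "inner" steps before answering. This is my own illustrative construction, not the paper's architecture; every name and dimension is an assumption.

```python
import torch
import torch.nn as nn

class InnerSpeechNet(nn.Module):
    """Toy controller with several memory slots that 'mumbles' for a
    fixed number of inner steps before producing an answer."""

    def __init__(self, vocab_size: int, hidden: int = 64,
                 slots: int = 4, inner_steps: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.cell = nn.GRUCell(hidden, hidden)
        self.init_mem = nn.Parameter(torch.zeros(slots, hidden))
        self.read = nn.Linear(hidden, slots)     # attention over slots
        self.write = nn.Linear(hidden, hidden)   # content written to slots
        self.out = nn.Linear(hidden, vocab_size)
        self.inner_steps = inner_steps

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        b = tokens.size(0)
        h = tokens.new_zeros(b, self.cell.hidden_size, dtype=torch.float)
        mem = self.init_mem.unsqueeze(0).expand(b, -1, -1)
        # Encode the sequence, softly writing into attended memory slots.
        for t in range(tokens.size(1)):
            h = self.cell(self.embed(tokens[:, t]), h)
            attn = torch.softmax(self.read(h), dim=-1)        # (b, slots)
            mem = mem + attn.unsqueeze(-1) * self.write(h).unsqueeze(1)
        # "Inner speech": re-read memory and feed it back, no new input.
        for _ in range(self.inner_steps):
            attn = torch.softmax(self.read(h), dim=-1)
            readout = (attn.unsqueeze(-1) * mem).sum(dim=1)   # (b, hidden)
            h = self.cell(readout, h)
        return self.out(h)
```

The design choice to highlight is the second loop: the model gets extra computation steps driven only by its own memory, which is the rough analogue of rehearsing before answering.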
Why does this matter? The core claim is that learning is shaped not only by architecture but by internal self-interaction during training. In human terms, it suggests that structured self-talk plus working memory scaffolding supports flexible problem solving. For educators, this resonates strongly with decades of research on metacognition, self-explanation, and overt verbalisation. If AI systems benefit from explicit internal rehearsal and structured working memory supports, that reinforces the idea that prompting students to articulate reasoning and manage cognitive load is not superficial but central to flexible thinking.
Can humans still tell the difference between a child’s painting and an AI imitation of it?
The study tested a version of the Artistic Turing Test. Seventeen oil paintings by a single child served as the human baseline. For each one, the researchers crafted a detailed prompt and used DALL-E 2 to generate a corresponding AI image. Eighty-seven adults were shown the images in pairs and asked to decide which was human-made.
Human accuracy was 49.76 percent, effectively no better than chance. Precision, recall, and F1 scores were all in the low fifties. By contrast, a ResNet-18 classifier using leave-one-out cross-validation achieved 97.06 percent accuracy, with an AUC of 99.65 percent. Put simply, people could not reliably tell the difference, but a machine could.
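Leave-one-out cross-validation simply trains on every image but one and tests on the held-out image, repeating the process for each image in turn. A minimal sketch, with a logistic regression on hypothetical precomputed features standing in for the paper's ResNet-18, and placeholder file names:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical inputs: one feature row per painting, label 1 for
# human-made and 0 for AI-generated (file names are placeholders).
X = np.load("image_features.npy")
y = np.load("labels.npy")

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneOut())
print(f"leave-one-out accuracy: {scores.mean():.2%}")
```

With only 34 images in play, leave-one-out is the sensible protocol: every image gets used for testing exactly once while the rest train the model.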
The implication is not that AI possesses human creativity. It is that surface stylistic cues are no longer dependable signals of authorship. Framing the works as child art may also have led participants to interpret quirks and imperfections as authentic. Generative models have reached a level of perceptual plausibility that challenges visual judgement, even if deeper questions about meaning and intention remain open.
This systematic review of 43 studies concludes that AI is widely associated with innovation and pedagogical change in higher education, but the evidence base mainly describes potential benefits rather than demonstrating clear causal improvements in learning outcomes.
This study set out to synthesise research on the use of artificial intelligence in higher education between 2021 and 2025. The authors reviewed 43 articles retrieved from Scopus, ScienceDirect, and Google Scholar and grouped them into three main themes: AI-driven learning innovation, pedagogical transformation, and implementation challenges. The review argues that AI systems can support personalised learning, automated feedback, predictive analytics for identifying struggling students, and generative tools such as ChatGPT that assist with writing and content generation. Across the literature, AI is portrayed as a driver of digital transformation in universities that may reshape teaching practices and institutional decision making.
For educators and institutions, the key message is that AI appears to change the structure of teaching rather than simply adding new tools. Teachers increasingly act as designers, facilitators, and interpreters of learning analytics while AI systems automate routine tasks such as grading or content generation. However, the literature also emphasises concerns about privacy, algorithmic bias, digital inequality, and teacher readiness. The practical implication is that adopting AI in universities requires not only technological infrastructure but also ethical governance, staff training, and careful integration into existing pedagogical practices.
Housekeeping
Listen to my conversation with the brilliant Ian Leslie here
The Dutch version of Instructional Illusions is out now. Just got my copy and what a lovely cover it is too.
I have rescheduled my Australia dates from March to May. For information on all the upcoming May dates, check my website.
I’ll be at three upcoming researchED events: Tokyo on 18th April, Warrington on 25th April, and Crieff on 9th May.
Finally, I have written a review of Jared Cooney Horvath’s new book, The Digital Delusion.





