The Research Brief: What's New in Learning Science - September 2025
A round-up for educators of new evidence from cognitive science on learning, teaching and educational practice
This month's round-up features studies on mobile phone bans in classrooms, belonging in schools, the practice testing paradox among struggling students, digital versus paper reading comprehension, gaps in teacher knowledge of learning science, flaws in mastery learning models, bias against AI-generated content, video versus text for learning, the relationship between anxiety and reading performance, sleep's role in memory consolidation, and overconfidence following introductory courses.
A randomised controlled trial of nearly 17,000 students across 10 higher education institutions found that mandatory classroom phone collection increased academic performance by 0.086 standard deviations, with the strongest effects amongst lower-performing, first-year, and non-STEM students. These gains were not due to increased attendance but likely stemmed from reduced distraction and improved classroom focus.
Classroom spot checks showed fewer disruptions, more engaged teachers, and fewer off-task behaviours, suggesting that removing phones fosters a better learning environment overall. Interestingly, while students felt a slight increase in FOMO (fear of missing out), this did not translate into poorer well-being or reduced motivation. Teachers considering phone bans can therefore do so confidently, knowing that the short-term discomfort for students may be offset by improved focus, achievement, and even student support for the policy in the long term.
This robust study addresses a critical paradox in education: despite overwhelming evidence that practice testing enhances learning, students who need it most often use it least. The researchers employed sophisticated methodology, combining survey data, learning analytics, and prediction rule ensembles to examine both who benefits from and who engages with practice testing in an authentic gateway mathematics course.
The findings reveal troubling inequities alongside promising possibilities. Practice testing demonstrated clear benefits: each additional quiz attempt increased exam scores by 0.21 standard deviations and improved passing rates by 10 percentage points. However, the benefits were not equally distributed. Students with stronger prior achievement (higher secondary school grades) both engaged more frequently with practice testing and derived greater benefits from it. Lower-achieving students required substantially more practice attempts to achieve similar gains, needing five or more quiz attempts compared to three for their higher-achieving peers. Critically, students needed to achieve at least 35% accuracy on quizzes to realise meaningful benefits, suggesting a threshold effect rather than universal applicability.
Do struggling readers fare worse with digital texts? New research suggests not—but their overconfidence rises with screen use.
This research explored how 10–12-year-olds with different reading comprehension abilities (good, average, poor) performed on two reading tasks, a cloze (fill-in-the-gap) task and a proofreading task, each presented in both paper and digital formats. Crucially, the study found that reading comprehension ability strongly predicted performance across all formats, but the format itself (digital vs paper) did not affect scores. However, digital tasks took longer to complete and led to more metacognitive bias—students thought they performed better than they actually did, especially during proofreading. These effects were more pronounced in younger pupils and lessened with age.
A study of 107 university faculty found that while most could identify evidence-based teaching strategies, many also endorsed debunked learning myths, showing limited pedagogical knowledge and poor awareness of their own misconceptions—issues present even among education faculty.
This research assessed faculty’s ability to distinguish between well-established learning principles, such as retrieval practice, spacing, direct instruction, summarisation, and dual coding, and common misconceptions, including learning styles, pure discovery learning, multitasking, digital natives, and the overuse of extrinsic motivation. On average, faculty scored just above 6 out of 10, correctly recognising most effective practices but also endorsing myths at high rates. Particularly striking was the finding that education faculty, whose discipline focuses on pedagogy, performed no better than colleagues from other fields, with two-thirds believing in learning styles and discovery learning. More senior academics tended to be better at rejecting myths than newer faculty, suggesting that experience may counter some misconceptions—but across the board, myths persisted.
Perhaps most concerning was the lack of metacognitive awareness: faculty confidence in their pedagogical knowledge bore no relationship to their actual understanding. Many believed their teaching was guided by strong knowledge of learning science, yet their responses revealed gaps that could limit instructional effectiveness. For educators, the takeaway is clear: confidence should be matched with continual engagement with robust research, and professional development must actively address neuromyths. Schools, universities, and training programmes should integrate cognitive science explicitly into teacher and lecturer preparation, not as an optional extra but as a core component, ensuring that future educators can critically evaluate claims about learning and avoid embedding ineffective strategies in their practice.
I’ve written about this previously, but the fact that 50% of teachers believe in nonsense like Brain Gym is so dispiriting, and the SEN-related myths (dyslexia, deafness, ADHD) are especially worrying.
The authors created and validated the Educational Neuroscience Knowledge Test (ENKT), which blends endorsement of “neuro-facts” with rejection of common neuromyths across two domains: general cognitive functions (memory, attention, executive function, neuroplasticity) and special educational needs (e.g., dyslexia, ADHD, autism). In a UK sample of 366 qualified teachers, performance was significantly above chance, but the strongest predictor of higher scores was formal educational neuroscience training (undergraduate/postgraduate modules). Teachers with formal training outperformed those with only CPD, informal exposure (books/blogs/videos), or no exposure; experience (years teaching) showed no relationship with knowledge.
In terms of professional development, schools shouldn’t assume experience inoculates against neuromyths: they should prioritise structured training that covers core cognitive principles and SEN-relevant knowledge, ideally embedded in initial teacher training and then deepened through high-quality CPD. They should also treat informal sources with caution: these can be valuable but are uneven and risk perpetuating myths, so training should explicitly teach critical appraisal skills (how to spot weak claims) and connect principles (e.g., working memory limits, spacing, retrieval) to day-to-day planning, scaffolding, and assessment.
Across six sessions of students learning Lithuanian–English word pairs, the authors pitted three popular student-modelling approaches (Bayesian Knowledge Tracing (BKT), BKT with a forgetting parameter, and the Additive Factors Model (AFM)) against what actually happened to learners over weeks. If you fit the models to all the data after the fact, they broadly mimic the upward learning curve (AUC ≈ 0.74–0.79). But when used the way schools would really want, predicting how pupils will do next week based on this week, each model badly overestimates performance and fails to reflect the spacing advantage (wider lags leading to better delayed recall). In their one-week “walk-forward” test, the models over-predicted Session 2 scores by roughly 47–58% and got the ordering of spacing conditions wrong.
For teachers and leaders, the takeaway is practical: don’t equate today’s smooth performance with tomorrow’s durable learning, and don’t assume the “mastery” a platform reports will stick. Tools that don’t explicitly model time, spacing, and retrieval can be blind to what really matters: long‑term retention between sessions. If you’re using adaptive software, press for (a) time‑based cross‑validation in vendor evidence, (b) spacing‑aware scheduling, and (c) successive relearning cycles (initial retrieval to a criterion, then spaced top‑ups). Build schemes of work that mandate spaced retrieval across weeks, even if short‑term quiz scores dip, because that pattern better matches how memory actually works.
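To make the “time-based cross-validation” point concrete, here is a minimal sketch in Python. The data and the simple logistic model are entirely hypothetical (the paper’s actual models are BKT, BKT with forgetting, and AFM); the point is the contrast between fitting to everything after the fact and only ever predicting the next, unseen session.

```python
# Sketch: post-hoc fitting vs. "walk-forward" (time-based) evaluation.
# Data and model are hypothetical illustrations, not the study's own.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical practice log: one row per quiz attempt, tagged by weekly session.
n = 600
sessions = rng.integers(1, 7, size=n)           # session (week) 1-6 of the attempt
prior_attempts = rng.integers(0, 20, size=n)    # practice the student has had so far
correct = (rng.random(n) < 0.4 + 0.02 * prior_attempts).astype(int)
X = np.column_stack([sessions, prior_attempts])

# (1) Post-hoc fit: train and evaluate on ALL sessions at once.
post_hoc = LogisticRegression().fit(X, correct)
print("post-hoc AUC:", roc_auc_score(correct, post_hoc.predict_proba(X)[:, 1]))

# (2) Walk-forward: train only on sessions 1..k, then predict the unseen session k+1.
for k in range(1, 6):
    train, test = sessions <= k, sessions == k + 1
    model = LogisticRegression().fit(X[train], correct[train])
    pred = model.predict_proba(X[test])[:, 1]
    print(f"trained on sessions 1-{k}, predicted session {k + 1}: "
          f"predicted {pred.mean():.2f} vs actual {correct[test].mean():.2f}")
```

A model can look impressive on the first, post-hoc measure and still fail the walk-forward one, which is the pattern the authors report.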
This study ran two experiments using identical Netflix show descriptions, changing only the “authorship” label—AI, human, or human–AI collaboration. Across both studies, human-authored work was rated highest in creativity and likeability, followed by human–AI collaboration, and then AI alone. The difference was explained by a chain effect: when people think more effort went in, they rate creativity higher, which in turn boosts overall attitude. Interestingly, people with stronger “machine heuristic” beliefs (trusting in machine capabilities) were less prone to discount AI’s effort or creativity, and so showed less bias against AI authorship.
This study by Yang Lin investigates how using video recordings of lessons, alongside traditional live observations, can improve the reflective and collaborative aspects of lesson study (LS). A group of Year 4 maths teachers conducted two LS cycles: one traditional and one that included video analysis. The findings showed that in the video-based model, teachers engaged in richer, more exploratory dialogue, questioned each other’s assumptions more critically, and focused more sharply on student learning rather than teacher performance or vague impressions. Video allowed for repeated viewing, selective focus, and deeper individual analysis before group discussion, all of which led to higher-quality professional conversations.
Across 72 participants (36 adults, 36 adolescents) the researchers dictated words and pseudowords containing Italian digraphs (e.g., sc, ch, gl, gn) and measured the millisecond gaps between letters (inter-letter intervals, ILIs) in both handwriting (on a tablet) and typing (on a keyboard). In both age groups and both modalities, the pause before the digraph (i.e., at the syllable boundary) was the longest—evidence that writers “pre-plan” the upcoming complex chunk. In handwriting, pauses then sped up through and after the digraph, suggesting a more cohesive, syllable-level plan; in typing, that speeding didn’t appear—post-digraph timing stayed flat—implying a more discrete, key-by-key approach. Adolescents, who were less practised at typing, showed bigger slow-downs, especially when initiating the complex syllable. The impact of orthograp…
This study shows that while feedback guidance or reward expectation alone can encourage students to choose retrieval practice, their combination leads to stronger and longer-lasting adoption of the strategy. The research highlights a familiar challenge for teachers: students may know that retrieval practice is effective, yet still fail to use it consistently. Feedback guidance, telling students why certain strategies work, can correct misconceptions, but often doesn’t translate into action. This study suggests that pairing such cognitive interventions with motivational ones, like reward expectations, can bridge the gap between “knowing” and “doing.” When students both understand the value of retrieval and feel incentivised to use it, they are more likely to choose it not just immediately but also over time.
The researchers worked with 5- to 6-year-olds, asking them questions about science texts before reading the stories. If the children were given the correct answers straight away, their learning improved significantly compared to free-drawing or just being told the lesson objectives. However, when no feedback was provided, the prequestions didn’t help. The study also found that this effect was not linked to differences in children’s working memory, meaning the strategy could be effective across the whole class.
For teachers, the key point is that prequestioning by itself isn’t enough for very young learners; the feedback is what makes it work. Simply asking questions may only leave children guessing or confused, but if the teacher provides clear, immediate answers, it sharpens attention and primes children to listen for and remember key information. This suggests that short “pre-quiz and feedback” routines before reading or introducing new material could be a powerful way of preparing young children to learn.
Analysing 31 studies, the authors found a consistent moderate correlation (r = .37) between students’ digital literacy (DL; their ability to navigate, evaluate, and create digital content) and their capacity for self-regulated learning (SRL; planning, monitoring, and adapting their own learning). Importantly, this relationship was not uniform: it was stronger or weaker depending on the measurement tools used, the learners’ backgrounds, and geographical context. Only about a third of the studies were explicitly grounded in theory, and those that were often drew from very different conceptual models. This theoretical fragmentation means we still lack a clear, unified framework for how DL and SRL develop together and reinforce each other.
For educators, the implication is to develop the two together: schools could, for example, pair explicit instruction in evaluating online sources with reflective prompts about learning processes, or use collaborative digital projects to foster both technical fluency and planning skills. The lack of a unified framework suggests that educators should draw consciously from multiple perspectives, tailoring their approach to the specific digital and self-regulatory needs of their learners rather than relying on one-size-fits-all programmes.
Researchers tested 62 carefully matched students (half bilingual, half trilingual) on three core executive functions: inhibitory control (Stroop task), information updating (N-back), and mental-set shifting (task switching). By controlling for key variables such as age of language acquisition, language proficiency, and orthographic distance, the authors sought to isolate the impact of managing more languages. Across all tasks, bilinguals outperformed trilinguals, with the largest differences seen in speed and accuracy under conditions requiring cognitive flexibility and inhibition. This challenges the idea that learning and managing more languages necessarily boosts general cognitive skills, aligning instead with the skill-learning theory, which suggests that language use becomes automatic and demands less executive control over time.
For educators, these findings caution against overgeneralising the “multilingual advantage” narrative. While bilingualism and multilingualism have clear cultural, social, and communicative benefits, this study suggests that beyond two languages—particularly for late sequential learners in single-language contexts—the extra cognitive demands may not translate into broader thinking skills. For teaching, it’s a reminder to consider students’ actual language use contexts, proficiency levels, and learning histories rather than assuming that “more languages = better cognitive performance.” In bilingual/trilingual classrooms, executive function may be better strengthened through deliberate cognitive challenge—such as problem-solving and switching tasks—rather than relying solely on the cognitive side-effects of language learning.
A new review argues ‘belonging’ isn’t a feeling you bolt on but a culture you build across perceptions, competencies, motivations and real opportunities.
Belonging matters for learning and wellbeing, but the field is fragmented: definitions vary, measures rely heavily on self‑perceptions, and theories used in research rarely translate into classroom practice. Allen uses a qualitative, interview‑plus‑review design (INRAM) to test whether the Integrative Framework of Belonging fits school realities. Experts highlight that belonging is context‑dependent, historically situated, and strongly influenced by leadership, social networks and structural features of schools, not just individual feelings.
Crucially, the experts caution that “perceptions” act as the gatekeeper: students’ interpretations of opportunities and of their own competencies determine whether any well‑meant intervention lands. They also press for more explicit attention to school‑level composition (e.g., gender/ethnic mix), social identity and stereotype cues, and the role of “powerful people” in setting norms. The upshot: belonging is ongoing, negotiated and multi‑layered; frameworks should add networks, agency and structural context, and then be field‑tested.
Teaching students to “think like a mathematician” requires both strategy and substance.
A meta-analysis of 52 studies found that the most effective way to teach word problems to 5–11-year-olds with maths difficulties is to combine concrete–semiconcrete–abstract (CSA) instruction with schema-based instruction, supported by graphic organisers and metacognitive strategies.
This large-scale network meta-analysis synthesised findings from over 6,900 students and compared a range of instructional strategies, both singly and in combination. The researchers found no single “silver bullet” strategy. Instead, the most effective interventions were multi-component approaches anchored in two core elements: CSA instruction (moving from hands-on materials to visual representations to abstract symbols) and schema-based instruction (teaching students to recognise problem types and apply appropriate solution plans). When these were supplemented with graphic organisers and metacognitive strategies, outcomes were strongest. Interestingly, metacognitive strategies alone were among the least effective, suggesting that reflective problem-solving skills must be built on a solid base of conceptual and procedural understanding.
Application: for students struggling with word problems, start with a carefully scaffolded shift from the concrete to the abstract while explicitly teaching problem structures. Then layer on supports like graphic organisers to help students organise information, and metacognitive prompts to guide planning and checking. Avoid relying solely on “thinking about thinking” approaches without substantive content and representation work. This aligns with the idea that deep mathematical understanding comes from a synergy between conceptual grounding, strategic structure, and guided reflection, rather than from isolated strategy instruction.
Moderate levels of anxiety can actually boost reading comprehension. Too much or too little is the real problem.
This research, which followed 197 students aged 10–16 over two years, challenges the simple view that anxiety is always bad for learning. Instead, it found an “inverted U” effect: moderate general anxiety was linked with the highest reading comprehension, while both very high and very low anxiety predicted weaker performance. Test anxiety, however, did not significantly influence reading in this sample—possibly because the reading assessment was low-stakes and administered at home. The strongest positive predictors of reading comprehension were attention and effortful control (the ability to regulate one’s emotions and behaviour to stay on task). Interestingly, high positive affect—especially high-energy emotions like excitement—was linked to slightly lower reading comprehension once attention and effortful control were taken into account.
For teachers, this means that anxiety isn’t always something to be eliminated. Some concern about performance may focus students’ efforts. What matters is preventing anxiety from becoming overwhelming. Supporting students’ ability to sustain attention and to manage their behaviour is likely to yield bigger benefits for reading than trying to eliminate all anxiety. High-arousal positive emotions may sometimes be distracting for reading comprehension, so helping students regulate both positive and negative emotions is important. Interventions should therefore target strengthening attention and self-regulation alongside creating learning environments that keep anxiety at an optimal level, rather than aiming for its complete removal.
A new report from Jill Barshay on a May symposium at the American Enterprise Institute shows that chronic absenteeism in American schools has dropped from its 2021–22 pandemic peak but still affects nearly one in four students, roughly 11 million nationwide. The increase is not limited to disadvantaged students; rates are up across income levels, high-achieving districts, and racial groups (once income is accounted for). Even moderate absenteeism is rising, with more students missing occasional days who previously had near-perfect attendance. Contributing factors include mental health struggles, changes in parental norms about keeping children home, disengagement, and, notably, a cultural shift that treats in-person schooling as optional, mirroring adult work patterns in a hybrid world. Technology now allows students to catch up remotely, reducing the perceived consequences of absence, even as grades and graduation rates climb, possibly due to grade inflation.
Fifth and sixth graders remembered more from videos than from illustrated texts.
In a within-subject experiment with Finnish fifth and sixth graders, Haavisto, Lepola, and Jaakkola compared learning from illustrated science texts versus videos containing identical content. Across retrieval, transfer, and delayed retention measures, videos produced better long-term retention and lower reported mental effort, while performing just as well as texts for finding information during open-book tasks. Crucially, the advantage of videos was most pronounced for children with weaker decoding and reading comprehension, suggesting that the auditory delivery bypassed some of the cognitive load imposed by reading. The study controlled for real-world variables by using typical classroom materials and procedures, making the results highly applicable to everyday teaching.
Neveu, Libersky, and Kaushanskaya compared two ways of learning new words: paired-associate learning (PAL), where each word is taught with a clear meaning, and cross-situational word learning (CSWL), where learners infer meaning from repeated but ambiguous contexts. Across 378 participants, PAL consistently led to better immediate performance, but retention fell off more steeply over 24 hours than it did with CSWL.
This suggests that while PAL is more efficient for initial word learning, CSWL may create more durable, though fewer, word-referent links. Importantly, phonological working memory supported longer-term retention in CSWL, indicating that individual learner differences (working memory) influence which method works best.
Through two experiments with university students, the researchers tested how three design choices affect attention, emotions, cognitive load, and learning performance: anthropomorphism (low vs. high human-like features), content alignment (matched vs. mismatched to the lesson topic), and role integration (embedded vs. separate from the main content). They found that content alignment was the strongest factor: VPAs that matched the lesson theme increased focus, emotional engagement, and strategic interaction with the video, regardless of how human-like they looked. When VPAs were mismatched, higher anthropomorphism helped hold attention, possibly because expressive cues compensated for poor thematic relevance.
In the second experiment, the format of integration mattered: content-aligned VPAs shown separately from the main content (“decoupled”) produced the best overall results, but when a VPA was embedded within the content, high anthropomorphism reduced frustration and boosted performance. This suggests a “fit-for-purpose” approach—matching the VPA to the subject matter, then choosing its level of human-likeness and placement based on the teaching goal. For educators using instructional videos, this means not just adding an avatar for social presence, but ensuring that its look, role, and relevance are intentionally aligned to support learning, rather than distract from it.
This study is flawed but potentially very significant in the sense that it's basically a prototype of what AI-driven teaching might look like: prompt-engineered AI automation with guardrails and a knowledge base can mimic structured, rubric-driven feedback at scale.
A custom, prompt‑engineered GPT‑4o was used to give formative feedback on first‑year physics lab reports; students (n=15) generally judged the feedback clear, actionable and useful—especially the concrete rewrites—though occasional inaccuracies mean it should augment, not replace, human input.
The researchers built a “custom GPT” with tightly written instructions and a small knowledge base (marking criteria, model reports, lecture notes) so that the AI focused on writing quality rather than scientific content. They trialled it with first‑year physics students: AI feedback was structured (strengths, weaknesses, concrete next steps) and included exemplary rewrites of weak sections. Students liked the clarity, consistency and speed; they also valued being able to ask follow‑up questions. Concerns centred on odd inaccuracies and the need for moderation.
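For a sense of what “prompt-engineered with guardrails” can look like in practice, here is a minimal sketch using the OpenAI Python SDK. The rubric wording, model name, and settings are my own illustrative assumptions, not the authors’ actual custom GPT configuration, which also drew on a knowledge base of marking criteria, model reports, and lecture notes.

```python
# Illustrative sketch only: a rubric-constrained feedback prompt in the spirit of
# the study's "custom GPT". Prompt text and settings are assumptions, not the
# authors' actual configuration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = """Assess the lab report's WRITING ONLY (structure, clarity, figures,
referencing). Do not judge the physics itself. Respond with:
1. Strengths (2-3 bullet points)
2. Weaknesses (2-3 bullet points, each tied to a rubric criterion)
3. Concrete next steps, including one rewritten example sentence or paragraph."""

def feedback_on_report(report_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": report_text},
        ],
        temperature=0.3,  # keep feedback relatively consistent across students
    )
    return response.choices[0].message.content

# Example usage:
# print(feedback_on_report(open("sample_report.txt").read()))
```

Plain prompting like this doesn’t replicate the grounding in exemplar reports or the human moderation the study relied on, which is partly why the authors position the tool as augmenting rather than replacing human feedback.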
Across two preregistered online experiments using a 12–12 (day vs night) design, sleep reliably improved retention of previously learned material, but (on the authors’ preregistered tests) did not boost next‑day learning of new material; only an exploratory re‑analysis (controlling for pre‑sleep performance) hinted at a modest positive link.
The researchers taught adults word pairs (and once a visuospatial task), tested them either after a night’s sleep or a day awake, and then had them learn a new set of word pairs straight away. Sleep helped people keep hold of yesterday’s learning—both word pairs and, in the first study, picture locations—compared with being awake for the same interval. However, sleep didn’t make them better at learning fresh word pairs the next day on the primary analyses. In one combined, exploratory analysis that accounted for how well participants had recalled before sleep, better overnight retention did correlate with better next‑day learning in the sleep group.
Atir and Dunning’s studies reveal that students who completed an introductory finance or psychology-and-law class later claimed knowledge of made-up terms in those subjects more often than students in control classes. Even two years later, these inflated self-assessments persisted. Experimental studies using GPS training replicated the same effect: after brief instruction, participants expressed confidence not only in what they had learned, but also in false or unfamiliar information that “sounded right.” In other words, introductory education seems to expand students’ sense of their expertise faster than their actual knowledge grows.
Other stuff:
Subscribe for regular updates and follow me on X, LinkedIn, BlueSky and Facebook.
I’ve built an e-learning course on How Learning Happens here.
Our new book ‘Instructional Illusions’ is out now.
“Perhaps most concerning was the lack of metacognitive awareness: faculty confidence in their pedagogical knowledge bore no relationship to their actual understanding.”
As someone who communicates regularly with literacy professors, I can testify to this. I’m reminded of the exchange in the Mel Brooks film The Producers where Leo tells Max that actors are not animals, and Max replies, “They’re not? Have you ever eaten with one?” Communicate with a whole-language professor and you’ll soon find their confident claims related to reading instruction are rarely supported by cited research.
Check out An Experimental Study Of The Educational Influences Of The Typewriter In The Elementary School Classroom, by Wood and Freeman:
https://archive.org/details/AnExperimentalStudyOfTheEducationalInfluencesOfTheTypewriterInTheElementarySchoolClassroom
This research from 1932 shows what happens when students use typewriters in their classrooms. The researchers found a positive influence on gains in the type of educational achievement measured by the Stanford Achievement Test, including spelling, arithmetic computation, geography, word meaning, language use, and paragraph meaning.
It was sponsored by -- wait for it -- the typewriter industry. Today, people are much less transparent about their biases and where their research funding comes from. So maybe we should all be a little more skeptical about "learning science."
Plus, people: go to the primary source and read it critically. Don't just trust Carl Hendrick to tell you the truth about it.