The Monthly Dispatch - What's New in Learning Science - January 2026
Why do students use study strategies they know don't work? Plus new studies on retrieval practice, pre-questions, adaptive learning, habit-driven studying, and classroom attention
Welcome to the first monthly newsletter of 2026. You’ll notice things look a little different this time around. To help you navigate the ever-growing mountain of research on learning and instruction, I’ve redesigned the newsletter into a more structured, thematic format. Instead of a long stream of papers, the research is now grouped into core pillars:
Memory, Learning, and Knowledge
Instruction and Practice
Reading and Language Development
Motivation, Self-Regulation, and Wellbeing
Classrooms, Behaviour, and Attention
Individual Differences and Inclusion
AI, Technology, and Cognitive Tools
At the end of the newsletter, you’ll also find a short section on recommended reading and upcoming talks and events. Happy new year to you all and thank you again for your continued interest in my ramblings. Here is this month’s breakdown of new papers on the science of learning.
Memory, Learning, and Knowledge
This study examined whether retrieval practice boosts learning equally across different teaching conditions or whether its benefits depend on the quality of the instructional explanation. Participants received either high quality or low quality explanations of target concepts, then engaged in either retrieval practice or restudy. The results showed that retrieval practice made a marked difference only when the instructional explanation was poor. When explanations were strong, retrieval and restudy produced similar outcomes.
The implication for educators is that retrieval practice is not magically effective in all contexts. It seems particularly valuable when initial teaching is fragile or incomplete because it prompts learners to reconstruct meaning and fill gaps. When explanations are already precise and conceptually supportive, the added benefit of retrieval is smaller. Teachers should therefore see retrieval not as a universal enhancer but as a tool whose impact depends on the quality of the initial instruction.
This multilevel meta-analysis synthesised over sixty years of research on prequestions, drawing on 55 studies and data from 12,003 participants. The researchers examined whether asking learners questions before they engage with educational material (text, video, or lecture) improves subsequent learning. They found a robust, medium to large effect for “repeated questions” (test items covering the same content as the prequestions), but essentially zero effect for “new questions” (test items covering other material from the same resource). The specific benefit held across variations in content domain, learner age, retention interval, and mode of delivery. Crucially, receiving feedback after prequestions amplified the effect substantially (g = 1.25), suggesting that prequestions work best when learners can confirm or correct their initial attempts.
This study asked 213 undergraduates to study a five-page research article on false memory, then assigned them to one of four conditions: reread only, reread plus free recall twice, reread plus free recall then a quiz, or reread plus free recall then writing their own test questions. A week later they took a 19-item test with multiple-choice and short-answer items that were similar but not identical to anything they had seen before, and they also reported how hard the study phase had felt.
The headline result is awkward for simple retrieval practice slogans: none of the retrieval combinations outperformed plain rereading on the delayed test, and students who experienced higher cognitive load during study did worse, regardless of which strategy they had been given.
The authors argue that for such advanced materials it may be better to build a basic grasp first through easier exposure and support, then bring in retrieval practice once students’ knowledge is more stable, and to pay attention to how heavy the study load feels to students as that can quietly erode the benefits we expect from testing.
This paper reframes the long-standing problem of students overusing low-impact strategies such as rereading and highlighting by shifting the explanation away from poor metacognition and towards habit theory. Drawing on contemporary models of habit formation, the authors argue that ineffective strategies are more likely to become habitual because they are repeatedly used in stable contexts, feel fluent, demand little effort, and deliver immediate but misleading rewards. A small proof-of-concept study supports this claim: highlighting and rereading showed substantially stronger habit strength than self-testing or concept mapping, measured via a validated habit index.
The significance lies in what this explanation adds. The intention–behaviour gap in self-regulated learning has been well documented, but it is often treated as a motivational or volitional failure. This paper shows that even well-informed students may default to ineffective strategies because those strategies are automatically cued and resistant to change, especially under stress or time pressure. In short, students are not always choosing badly; they are often acting habitually.
Instruction and Practice
A very interesting new theoretical paper from Slava Kalyuga (a key figure in Cognitive Load Theory) revisits the theory through an evolutionary lens, treating human cognition as an intelligent natural information processing system. He rearticulates the familiar five principles underpinning CLT and adds a sixth: the explicit intention to learn principle. This distinguishes human learning of biologically secondary knowledge from automatic, biologically primary learning and places motivation, goal orientation, and deliberate effort at the centre of instruction. From this foundation, Kalyuga proposes a goal-driven, integrated reconceptualisation of CLT in which instructional activities are selected not just for schema acquisition, but also for pre-instructional goals such as activating prior knowledge, surfacing knowledge gaps, and motivating engagement.
Key takeaways: adaptive teaching is not just about matching task difficulty to prior knowledge. It is about deliberately sequencing different learner activities to serve different goals at different moments. Worked examples, problem solving, exploration, prompts, and scaffolds should not be judged as globally good or bad, but as tools whose value depends on learner expertise and instructional purpose. The paper strongly reinforces the expertise reversal effect and challenges simplistic interpretations of cognitive activation, productive failure, and discovery approaches. Adaptive instruction, in practice, means adjusting guidance, challenge, and support in real time as learners move from novice to more expert performance, while recognising that motivation and affect are not optional extras but core design constraints.
This study examined whether the effectiveness of two teaching approaches, Project-Based Learning (PBL) and Direct Instruction (DI), depends on students’ mathematical disposition. The authors used a quasi-experimental 2×2 factorial design with 75 grade-XI students at a public school in Southeast Sulawesi, Indonesia. All students received differentiated instruction (i.e. tasks adapted to their readiness), but one group followed PBL and the other DI. After a unit in statistics, mathematical communication skills (written explanation, visual representation and reasoning) were assessed, alongside a questionnaire measuring students’ mathematical disposition (confidence, persistence, curiosity, etc.).
The key finding was a significant interaction: for students high in mathematical disposition, PBL produced notably better communication skills than DI; for students low in disposition, DI produced better results.
This comparative review of 14 education systems finds that curriculum specificity supports equity and coherent progression, early differentiation by choice or selection exacerbates inequality, and successful reform depends less on the governance model than on adequate time, resources, and professional development for teachers.
This report synthesises evidence from 14 jurisdictions to examine how curriculum policy choices affect student outcomes, teacher workload, and educational equity. The central tension it identifies is between specificity and flexibility: systems with vaguer curricula intended to grant teacher autonomy consistently report confusion, workload inflation, inconsistent implementation, and widening gaps between advantaged and disadvantaged students. Meanwhile, systems that maintain specific content expectations while building in structured flexibility (through content choices, pacing options, or unallocated time) appear better placed to ensure all students access foundational knowledge. The report also finds that early tracking, whether through student choice or academic selection before age 15, tends to funnel disadvantaged learners into narrower pathways.
Reading and Language Development
This study examined how teachers’ reading-related knowledge (declarative, procedural, and pedagogical) relates to how skilled they believe they are at teaching reading. Using a sample of Estonian general education teachers, special education teachers, and special education student teachers, the authors found that overall knowledge levels were modest, especially declarative knowledge of core reading concepts. Special education teachers consistently outperformed general education teachers across all knowledge domains. Crucially, higher declarative knowledge was associated with higher perceived skill in differentiation, while higher procedural knowledge was associated with lower perceived skill in both differentiation and supporting reading motivation.
This matters for educators because it exposes a persistent calibration problem. Teachers who know more about the mechanics of reading often judge their own practice more harshly, while teachers with weaker procedural knowledge tend to feel more confident than their knowledge warrants. In practice, this means professional development that relies on self-reported confidence is likely to miss the teachers who most need support, while reassuring those who are least well equipped. Accurate self-assessment is not a soft skill here; it is a prerequisite for instructional improvement.
This national survey study examined how 375 US elementary teachers (general and special education) report using writing to support reading within the current science of reading climate. Although a large majority of teachers endorsed the idea that writing improves reading, actual classroom use was modest: most writing-to-read practices were used only occasionally or monthly, with just a small subset (such as spelling or short written answers) used weekly. Teachers were more hesitant to use writing with younger, lower-attaining, or special education pupils, despite strong meta-analytic evidence that these groups often benefit most.
Motivation, Self-Regulation, and Wellbeing
This study evaluated a 16-week university wellbeing course built around standard positive psychology interventions such as gratitude, strengths, mindfulness, and meaning, delivered to predominantly Emirati undergraduates. Using pre–post measures across a wide range of outcomes (life satisfaction, affect, stress, mental health, somatic symptoms, locus of control, and cultural orientation), the authors found essentially null effects. After correcting for multiple comparisons, the only reliable change was a small reduction in students’ fear of happiness. In other words, students became slightly less wary of happiness, but they were not meaningfully happier, less stressed, or mentally healthier by the end of the course.
Why this matters is simple but uncomfortable. Wellbeing courses are widely assumed to work, are popular with students and institutions, and are often justified by meta-analyses showing positive effects of positive psychology interventions (PPIs). This study shows that when such interventions are embedded in formal, assessed university courses, in a non-Western cultural context, their impact may be minimal. For educators, the implication is not that wellbeing does not matter, but that importing Western positive psychology curricula wholesale, bolting them onto academic structures, and expecting meaningful change is naïve. Context, culture, motivation, and delivery conditions matter more than the menu of activities.
Using PISA 2022 data from 507,588 pupils in 74 countries, the study finds that growth mindset is usually but not always linked to higher maths scores, and that socioeconomic status sometimes changes that link but in no consistent direction across countries.
This paper asks two straightforward questions: is a growth mindset associated with maths achievement, and does that association differ for pupils from higher versus lower socioeconomic backgrounds. It analyses nationally representative PISA 2022 samples country by country, controlling for socioeconomic status, and then adds an interaction term to test moderation. In 50 of 74 countries (about two thirds), growth mindset showed a statistically significant positive association with maths achievement after controlling for SES, but the size varied a lot and a handful of countries showed small negative associations.
In many systems, holding growth-minded beliefs seems to go with slightly higher attainment, but the expected equity story is shaky: in 41 countries SES did not moderate the link; in 26 countries the association was stronger for higher-SES pupils; and in only 7 countries it was stronger for lower-SES pupils. That means a mindset push, on its own, is at best unreliable as a gap-closing strategy and at worst could widen gaps if advantaged pupils are better able to convert beliefs into action through resources and support.
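For readers less familiar with moderation analysis, here is a minimal sketch of what the interaction term is doing. It assumes a simple linear model of the form score = b0 + b_mindset × mindset + b_ses × SES + b_interaction × mindset × SES. The function names and coefficient values are hypothetical and chosen purely for illustration; they are not taken from the paper. Only the final tally uses the country counts reported above.

```python
def mindset_slope(b_mindset: float, b_interaction: float, ses: float) -> float:
    """Expected change in maths score per unit of growth mindset at a given SES level."""
    return b_mindset + b_interaction * ses


def moderation_pattern(b_interaction: float, significant: bool) -> str:
    """Label a country by the sign of its interaction term, if it is significant."""
    if not significant:
        return "no moderation"
    return "stronger for higher SES" if b_interaction > 0 else "stronger for lower SES"


# Hypothetical coefficients for one country (illustration only, not from the paper).
B_MINDSET, B_INTERACTION = 10.0, 4.0
slope_low_ses = mindset_slope(B_MINDSET, B_INTERACTION, ses=-1.0)   # one SD below mean SES
slope_high_ses = mindset_slope(B_MINDSET, B_INTERACTION, ses=1.0)   # one SD above mean SES
pattern = moderation_pattern(B_INTERACTION, significant=True)

# Country counts as reported in the summary above.
reported_counts = {
    "no moderation": 41,
    "stronger for higher SES": 26,
    "stronger for lower SES": 7,
}
total_countries = sum(reported_counts.values())

print(slope_low_ses, slope_high_ses, pattern, total_countries)
```

With a positive interaction term, the same belief "buys" more attainment for higher-SES pupils, which is the pattern the study found in 26 of the 74 countries.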
Classrooms, Behaviour, and Attention
This preregistered randomised controlled trial, conducted across ten higher education institutions in Odisha, India during Spring 2024, assigned nearly 17,000 students at the department grade level to either a phone-collection condition or business as usual. Students in treatment classrooms deposited their phones in wooden boxes at the start of each lecture. The primary finding is a statistically significant 0.086 standard deviation increase in GPA, roughly equivalent to 40% of the difference between having an average teacher and a very good one. Critically, the benefits were not evenly distributed: students with below-median prior grades showed a 0.161 SD gain, first-year students gained 0.142 SD, and non-STEM students gained 0.097 SD. High-performing students and STEM majors showed negligible effects. Random classroom observations revealed less disruptive behaviour, reduced off-topic conversation, and greater teacher engagement with materials in phone-free rooms.
Teachers should note, however, that the study found no effect on students’ self-reported distraction, even though external observers perceived students as more distracted in phone-free rooms. This curious paradox may reflect a kind of attentional withdrawal effect: without phones to occupy idle moments, students become more sensitive to ambient distractions, yet this heightened awareness does not translate into subjective feelings of being distracted.
The study looked at how novice and expert teachers visually monitor a class when disruption occurs. Thirty-three teachers watched an eighty-second video of a mathematics lesson that included several minor and one major disruption while their eye movements were recorded. The researchers defined areas of interest for each student and used scanpath analysis, Markov transition probabilities and Shannon entropy to see how predictable and stable teachers’ gaze patterns were before, during, and after the disruption. Experts showed higher transition probabilities and lower entropy scores, which means their gaze moved in more regular, structured patterns, and they returned more quickly to their usual scanning of the whole class after focusing on the disruptive student.
Expert teachers do not simply “notice more”; they have routinised ways of scanning the class, briefly zoom in on the disruption, then rapidly re-engage with everyone else. Novice teachers, by contrast, show more scattered, exploratory gaze behaviour and are more easily pulled off their routine. This has very practical implications: learning to manage behaviour is partly learning to manage your eyes. Training that makes these expert gaze routines visible and gives novices guided practice in using them could help them keep lessons on track rather than getting visually and cognitively stuck on the disruptive pupil.
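To make "higher transition probabilities and lower entropy" a little more concrete, here is a rough sketch of how gaze predictability can be quantified from a sequence of fixations. It uses one common formulation (the entropy of first-order transitions between areas of interest, weighted by how often each area is the starting point), which is not necessarily the exact measure the authors computed. The two scanpaths, and the student labels A to D, are invented for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def transition_counts(scanpath):
    """Count how often gaze moves from one area of interest (AOI) to the next."""
    counts = defaultdict(Counter)
    for src, dst in zip(scanpath, scanpath[1:]):
        counts[src][dst] += 1
    return counts


def transition_probabilities(scanpath):
    """Row-normalised first-order Markov transition probabilities between AOIs."""
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in transition_counts(scanpath).items()
    }


def transition_entropy(scanpath):
    """Shannon entropy (bits) of the next AOI, averaged over starting AOIs."""
    counts = transition_counts(scanpath)
    total = sum(sum(row.values()) for row in counts.values())
    entropy = 0.0
    for row in counts.values():
        row_total = sum(row.values())
        p_src = row_total / total          # how often gaze starts from this AOI
        for n in row.values():
            p = n / row_total              # probability of this particular move
            entropy -= p_src * p * log2(p)
    return entropy


# Invented scanpaths over four students (AOIs A to D), for illustration only.
expert_like = list("ABCDABCDABCD")   # a fixed, routinised sweep of the room
novice_like = list("ACADBABDCBDA")   # scattered, exploratory looking

expert_entropy = transition_entropy(expert_like)
novice_entropy = transition_entropy(novice_like)
print(expert_entropy, novice_entropy)
```

The routinised sweep yields transition probabilities of 1.0 for every move and an entropy of zero, while the scattered scanpath spreads its probability across several possible next targets and so scores higher.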
Individual Differences and Inclusion
This large-scale replication study examined whether fourth-grade students without special educational needs (SEN) are disadvantaged when learning alongside classmates who have SEN, specifically those with learning disabilities, speech or language impairments, or emotional disorders. Drawing on consecutive nationally representative assessments of German primary schools in 2011 and 2016, the researcher compared outcomes for over 22,000 students without SEN, roughly 9,000 of whom had at least one classmate with SEN.
The study measured academic achievement across reading, listening, spelling, and mathematics, alongside self-reported motivation (academic self-concept, interest, enjoyment, boredom) and psychosocial outcomes (social integration and school satisfaction). The headline finding is that inclusive classrooms do not meaningfully disadvantage students without SEN. However, small negative associations emerged for spelling and mathematics achievement, with effect sizes around d = −0.07 to −0.09, roughly equivalent to less than one tenth of a typical school year’s learning gain.
AI, Technology, and Cognitive Tools
This study introduces the Scientific Discovery Evaluation (SDE) framework, a benchmark designed to assess whether LLMs can actually contribute to scientific research rather than merely answer decontextualised quiz questions. The authors argue that existing science benchmarks like GPQA and MMMU test knowledge retrieval but not the iterative reasoning, hypothesis generation, and evidence interpretation central to genuine discovery. To address this, they organised 1,125 questions across biology, chemistry, materials, and physics into modular research scenarios, each tied to authentic research projects. The benchmark operates at two levels: question-level accuracy on scenario-specific items, and project-level evaluation where models must propose testable hypotheses, run simulations, and interpret results. The findings reveal a consistent performance gap between SDE scores (around 0.60 to 0.75 depending on domain) and scores on general benchmarks (0.84 to 0.86 on GPQA Diamond for the same models). Reasoning-enhanced models outperform their base counterparts, but additional reasoning effort beyond medium levels yields negligible gains. Strikingly, top-tier models from different providers (GPT-5, Claude Sonnet 4.5, DeepSeek-R1, Grok 4) exhibit highly correlated error patterns, frequently converging on the same incorrect answers.
For educators and researchers interested in AI-assisted scientific work, this study delivers a sobering calibration. Models that appear impressive on standardised science questions may struggle when tasked with the kind of context-dependent reasoning that characterises actual research. The finding that question-level performance does not reliably predict project-level success, and vice versa, is particularly instructive: a model might score poorly on transition metal complex questions yet successfully optimise polarisability in an evolutionary search, suggesting that LLMs can navigate hypothesis spaces serendipitously even without precise domain knowledge. Conversely, strong performance on retrosynthesis questions did not translate to generating valid multistep synthesis routes. The practical implication is that educators should avoid assuming benchmark scores indicate research readiness, and should instead evaluate AI tools against the specific reasoning patterns their research contexts demand.
The study tackles a real bottleneck in reinforcement learning for instruction-following LLMs: sparse and ambiguous rewards when instructions contain multiple constraints. Binary rewards are too unforgiving, while aggregated rewards blur which constraints actually matter. The authors propose Hindsight Instruction Replay (HiR), which selectively reuses failed responses by rewriting the original instruction to include only the constraints the model actually satisfied. These rewritten “pseudo-instructions” are then treated as successes and replayed during RL training. Empirically, this leads to large gains in instruction-level accuracy across multiple benchmarks, especially for smaller models, while using less compute.
What matters here is not the benchmarks per se, but the conceptual move: failure is no longer noise or waste. It becomes structured training signal. The paper also provides a clean theoretical framing of HiR as dual-level preference learning — over both responses and instructions — which helps explain why it avoids the distortions introduced by aggregated rewards.
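The relabelling idea is easier to see in a toy example. What follows is a minimal sketch of the hindsight step as described above, not the authors' implementation: the `Constraint` class, the `build_instruction` helper, the three example constraints, and the choice to discard responses that satisfy no constraints at all are all assumptions of mine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Constraint:
    """One checkable requirement attached to an instruction (hypothetical structure)."""
    description: str
    check: Callable[[str], bool]


def build_instruction(task: str, constraints: List[Constraint]) -> str:
    """Render a task plus its constraints as a single instruction string."""
    return f"{task} Constraints: " + "; ".join(c.description for c in constraints)


def hindsight_relabel(
    task: str, constraints: List[Constraint], response: str
) -> Optional[Tuple[str, str, float]]:
    """Turn a partly failed response into a success for a rewritten pseudo-instruction.

    Returns (instruction, response, reward), or None when nothing was satisfied
    (dropping those cases is an assumption of this sketch).
    """
    satisfied = [c for c in constraints if c.check(response)]
    if not satisfied:
        return None
    # If everything was satisfied this is just the original instruction;
    # otherwise it is a pseudo-instruction listing only the met constraints.
    return build_instruction(task, satisfied), response, 1.0


# Toy example: the response meets two of three constraints.
constraints = [
    Constraint("use fewer than 10 words", lambda r: len(r.split()) < 10),
    Constraint("mention the word 'retrieval'", lambda r: "retrieval" in r.lower()),
    Constraint("end with a question mark", lambda r: r.strip().endswith("?")),
]
response = "Retrieval practice strengthens memory."
relabelled = hindsight_relabel("Explain retrieval practice.", constraints, response)
print(relabelled)
```

Under a binary reward this response would simply count as a failure; after relabelling it becomes a positive example for a narrower instruction that it genuinely satisfies.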
This study compared 100 short stories written by university students with 100 stories generated by ChatGPT using identical prompts, then examined both their linguistic features and how readers experienced them. Linguistic analysis showed clear stylistic differences: AI stories used fewer personal pronouns and fewer references to time and space, but more positive emotion words. When 380 readers were randomly assigned to read one story, AI and human stories were rated as equally enjoyable, novel, and meaningful. The only consistent difference was narrative transportation: readers were slightly less absorbed in AI-generated stories.
Professional Development/Training
This systematic review synthesises 12 studies published between 2000 and 2024 examining how teachers aged roughly 50 plus experience professional learning. Using the Job Demands Resources model as an organising lens, the authors show that there is no settled definition of either late-career teachers or professional learning in this literature, with studies variously defining teachers by age, experience, or perceived expertise, and learning ranging from formal courses to informal peer dialogue. Across contexts, late-career teachers consistently favour workplace-embedded, collaborative learning such as mentoring, reflection, and collegial discussion, while formal training is often experienced as poorly targeted or irrelevant.
The review matters because it challenges the quiet assumption that experienced teachers are either “finished products” or reluctant learners. For educators and school leaders, the implication is blunt: if professional learning is designed around early-career deficits or compliance requirements, late-career teachers disengage. When learning is relational, purposeful, and recognises their expertise, it becomes both a driver and a consequence of professional engagement. Schools that want to retain experienced teachers need to stop treating professional learning as remedial or optional and start treating it as core to professional identity and wellbeing.
Other Stuff
Recommended reading
Over Christmas, I read a few books indirectly related to instructional design to help me flesh out a framework I’m calling Instructional Invariants, which I started working on here. These books have been hugely influential in that.
Herbert A. Simon – The Sciences of the Artificial
Maybe the single most important book behind the framework. Instruction as a designed artifact shaped by goals, environments, and bounded cognition.
Christopher Alexander et al. – A Pattern Language
This book really blew my mind. It’s actually nothing to do with learning/teaching but it has so much to offer instructional design. A masterclass in distinguishing solutions from invariants. Key concept: Patterns work only if they both stand alone and integrate into a larger system. This book has such a brilliant formulation around the idea that coherence is a system property, not a local one.
W. Ross Ashby – An Introduction to Cybernetics
Like most people, I thought I knew what cybernetics meant but I really had no clue. This book is one of the foundational texts and the intellectual source of a brilliant idea, “requisite variety”, which has huge importance for instructional design: if learners can exploit shortcuts, they will. Instruction must match the variety of learner strategies, or lose control.
Upcoming events
I am doing some webinars for Galway Education Centre on Tuesday 3rd, 10th, and 24th February. Register here.
Session 1: How Learning Happens – Key Principles from Cognitive Science - 3rd Feb at 7pm
Session 2: High-Leverage Teaching Strategies – Checking for Understanding and Retrieval Practice - 10th Feb at 7pm
Session 3: Explicit Instruction and Responsive Teaching – Making Thinking Work - 24th Feb at 7pm
In March, I will be in Australia and New Zealand. If interested, register your interest here. The following events are in partnership with Think Forward Educators.
Melbourne: Saturday 7 March, 5:00pm - 8:30pm
Venue TBC
Event Title: TFE Meet and Greet featuring ‘How Learning Happens’
Canberra: Thursday 12 March, 9:00am - 3:00pm
Venue TBC
Event Title: Science of Learning: Responsive Teaching & High Impact Levers of Learning
Auckland: Monday 16 March, 9:00am - 3:00pm
Venue: Fairway Events Centre, Wairau Valley, Auckland.
Event Title: Science of Learning: Responsive Teaching & High Impact Levers of Learning
*Partnering with Liz Kane Literacy
On Sat 14th March, I’ll also be speaking at an event I’ve always wanted to attend: researchED Ballarat, organised by the incomparable Greg Ashman.
There are more events planned for Australia in March including a double-header with David Didau, so look out for those announcements.
Finally, check out this short clip of me talking about memory and retrieval for the Victorian Department of Education in Australia earlier this year. Honoured to contribute to the excellent VTLM 2.0 Evidence to action series.