What We Don’t Know

There is a particular kind of dishonesty that operates under the cover of enthusiasm. It appears in popular science books that present convergent hypotheses as established facts, in fitness protocols that cite a handful of studies as if they foreclosed all further inquiry, in health journalism that smooths over the distinction between what a trial demonstrated in a specific population and what that demonstration implies for the person reading the article on their phone. It is the dishonesty of the advocate who has decided the conclusion first and selected the evidence to support it. It is not lying, exactly. It is something more insidious than lying: it is the strategic presentation of partial truth as complete understanding.

This chapter is an attempt to do the opposite.

Every claim in this book that is stated with confidence is stated with confidence because the underlying evidence warrants it. But a significant portion of this book’s argument rests on evidence that is indirect, inferential, or drawn from adjacent populations and then extrapolated to the populations most likely to be reading these pages. That extrapolation is not dishonest—it is, in fact, the ordinary practice of applied science, which must always operate somewhat ahead of the specific evidence—but it imposes an obligation. The obligation is to tell you, clearly and without embarrassment, exactly where the direct evidence ends and the inference begins. Not in a footnote. Not in an appendix. In a chapter of its own, because the gap between what we know and what we are building on matters, and pretending it does not exist is a form of the dishonesty described above.

What follows is a structured account of the most significant evidence gaps in the scientific foundation of this protocol. The goal is not to undermine the preceding chapters. The goal is to give you a precise understanding of which arguments rest on firm experimental ground and which rest on the kind of convergent indirect evidence that is intellectually defensible but not yet experimentally confirmed. If you understand the difference, you are in a better position to evaluate the protocol for yourself—and to follow, in the coming years, the research that will eventually fill these gaps.

The Single Most Important Finding: Zero Tier 1 Evidence

Before examining individual gaps, there is a structural fact about the evidence base that must be stated at the outset, because everything else in this chapter flows from it.

A comprehensive analysis of the 21 research files underlying this book—covering 105 key claims across biomechanics, hormonal physiology, spinal health, metabolic effects, nasal breathing, and population-specific considerations—produced the following result: not one claim in the entire library is supported by a published randomized controlled trial using civilian recreational rucking as the primary intervention.

Zero. Not a small number. Zero.

This does not mean the evidence is fabricated or that the protocol is without scientific basis. It means that the evidentiary architecture of this book operates entirely at Tier 2 or below—proxy RCTs using weighted vests or other resistance modalities (15 claims), mechanistic and observational studies (36 claims), theoretical inference and extrapolation (25 claims), and acknowledged evidence gaps (16 claims). The strongest evidence comes from weighted vest RCTs and direct biomechanical measurements, both of which require a transferability assumption that itself has never been empirically validated. The framework is defensible. But its position at the frontier of inference rather than at the center of established science must be clearly understood before reading any further claim in this chapter as either confirmed or dismissed.
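The breakdown above can be tallied in a few lines. This is a minimal sketch: the counts are the ones quoted in this chapter, while the dictionary keys are shorthand labels chosen here for illustration and are not terms taken from the research files. The four named categories account for 92 of the 105 claims, so the sketch reports the remainder separately instead of assuming where those claims belong.

```python
# Claim counts as quoted in this chapter; the labels are illustrative shorthand.
TOTAL_CLAIMS = 105
claims_by_tier = {
    "tier1_direct_rucking_rct": 0,
    "proxy_rct": 15,
    "mechanistic_or_observational": 36,
    "theoretical_inference": 25,
    "acknowledged_gap": 16,
}

# Sum the categories the chapter names, then see what is left over.
itemized = sum(claims_by_tier.values())
not_itemized = TOTAL_CLAIMS - itemized

for tier, count in claims_by_tier.items():
    print(f"{tier:32s} {count:3d}  ({count / TOTAL_CLAIMS:.0%} of all claims)")
print(f"{'not itemized in this passage':32s} {not_itemized:3d}")
```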

The practical consequence is that virtually every benefit claimed for rucking—osteogenic stimulus, hormonal advantage, disc protection, nasal breathing synergy, neuroplasticity enhancement, fall prevention, metabolic syndrome management—is an inference. A reasonable, convergent, mechanistically coherent inference. But an inference nonetheless. The experiments that would confirm or correct these inferences have not yet been run.

The Hormonal Argument: Coherent but Not Yet Directly Tested

Chapter Six made a case about testosterone and cortisol. The case was this: high-volume running chronically suppresses testosterone via the Exercise-Hypogonadal Male Condition, while the resistance-type loading characteristics of rucking—specifically, the sustained axial load that recruits the posterior chain in a pattern biomechanically similar to loaded carries—are more likely to maintain a favorable anabolic-to-catabolic hormonal ratio. The argument was grounded in Hooper et al.’s 2016 EHMC documentation, in Gaviglio et al.’s demonstration that farmer’s walk variants produce acute testosterone elevation, in Taylor et al.’s finding that military men whose occupational demands include regular load carriage maintain anabolic hormone profiles comparable to recreational weightlifters. It is a strong argument. The convergent direction of the evidence is consistent.

But no published randomized controlled trial has directly compared serum testosterone and cortisol between rucking and high-volume running in a controlled crossover design.

This is not a minor caveat. It is the central evidentiary fact about the hormonal claims in this book. The ideal study would recruit a cohort of men matched for age, baseline testosterone, training history, and body composition, randomize them to a rucking program versus a running program matched for duration and energy expenditure, and measure not a single morning testosterone draw but a dense sampling protocol—the kind Hooper’s team used, drawing blood every fifteen minutes across a four-hour window—at baseline, mid-intervention, and conclusion. No such study exists. What exists is the inferential architecture: the EHMC data establishes that high-volume running suppresses testosterone; the resistance exercise literature establishes that loaded carries tend to elevate it; the military occupational load carriage data suggests that men who ruck regularly as part of their work maintain favorable hormonal profiles. From these three bodies of evidence, the inference that rucking is hormonally superior to running is reasonable. But it has not been tested directly, and the difference between reasonable inference and experimental confirmation is the difference between a hypothesis and a finding.

The honest position, then, is this: the hormonal argument is the most compelling indirect case in this book, and it remains an indirect case. It should inform your decision-making about exercise selection with the weight appropriate to convergent inferential evidence, not the weight appropriate to a randomized controlled trial. Those are different things.

The Spine: A Mechanistic Argument Without Imaging Evidence

Chapter Nine described rucking’s axial loading profile as a form of cyclical spinal stimulus—the alternating compression and decompression of walking gait, amplified by the addition of external load, driving the fluid exchange and nutrient diffusion that maintains intervertebral disc hydration across decades. This is mechanistically coherent. The intervertebral disc is avascular. It depends on the pumping action of cyclic loading and unloading for nutrient transport into the nucleus pulposus. Prolonged static compression expresses fluid from the disc. Dynamic cyclic loading of the kind that walking produces promotes the alternating pressure gradients that draw fluid and dissolved nutrients back in. The addition of moderate axial load from a rucksack amplifies this mechanical effect without—at sensible loads—producing the pathological overloading documented in Marines carrying sixty percent of their body weight across extended deployments.

The problem is that none of this has been confirmed by imaging.

No T2-weighted magnetic resonance imaging study and no T1 rho mapping study—the two modalities that can quantify disc hydration in vivo—has measured intervertebral disc water content before and after loaded walking at any intensity. Not at ten percent of body weight. Not at twenty. Not in a healthy population, and not in a clinical one. The spinal health argument in this protocol is mechanistic inference from first principles of disc physiology and general load carriage biomechanics. It is not evidence that rucking improves disc hydration. It is an argument that the mechanical characteristics of rucking are the kind that, based on what we understand about disc biology, should promote disc health rather than impair it—and that the catastrophic loading levels documented in military populations are substantially above the loads recommended here.

Onodera et al.’s 2019 MRI study of active-duty Marines showed disc height reductions under operationally relevant loads, but those loads—often exceeding forty-five kilograms—were not the loads this protocol recommends, and the study measured disc kinematics under acute loading rather than disc hydration over time. The existing imaging literature establishes the outer bounds of harmful loading; it does not characterize the beneficial zone within which this protocol operates.

Until someone runs a longitudinal disc hydration imaging study in a non-military rucking population—measuring T2 signal before and after sessions, and across months of regular practice—the spinal health argument remains mechanistic. Well-grounded mechanistically. Unconfirmed experimentally.

The Missing Population: Women Between Forty and Seventy

This is arguably the most consequential gap in the entire literature, for a reason that is not primarily about evidence quality but about who needs this intervention most.

The biomechanical load carriage literature that underpins this book is overwhelmingly derived from young, predominantly male, military populations. Walsh and Low’s 2021 systematic review of military load carriage effects on gait—the single most comprehensive synthesis of the relevant biomechanics literature—synthesized studies in which participants were characteristically between twenty and thirty-one years old, predominantly or entirely male, and selected for physical fitness levels well above civilian norms. Johnson et al.’s 2024 study, which is the most methodologically rigorous sex-comparative biomechanical dataset available, examined U.S. Army Basic Combat Training recruits—fit young adults at the beginning of military careers, not women in their fifties managing declining bone density, shifting hormonal environments, altered pelvic floor mechanics, and the early stages of dynapenic muscle loss that will determine whether they remain functionally independent in their eighties.

The argument for rucking in women between forty and seventy is built on the bones of this military biomechanical data and then extrapolated through adjacent literatures: the postmenopausal bone density research, the pelvic floor physiology, the female-specific RED-S considerations that Chapter Seven addressed. But the direct data—force plate studies, tibial acceleration measurements, spine loading measurements, pelvic floor pressure measurements during loaded walking—for this age group, in this sex, at the loads this protocol recommends, does not exist in anything approaching adequate form. Gait mechanics change after forty. Bone geometry changes. The pelvic floor’s ability to generate and sustain intra-abdominal pressure changes with parity history and hormonal status in ways that have direct implications for both injury risk and the core stabilization benefits that Chapter Eight attributed to nasal-diaphragmatic breathing under load.

The population that most needs the osteogenic and anabolic benefits of weighted walking is precisely the population for which the direct biomechanical data is most sparse. This should be stated plainly, and it should motivate the practical conservatism that runs through the load prescription in Part III: the recommendation to start at five to eight percent of body weight, to progress slowly, to monitor symptoms, is not excessive caution. It is the appropriate response to operating in an evidence gap.

Dynapenia and Older Men: Dose-Response Data We Don’t Have

The argument for rucking as a dynapenia countermeasure in men over sixty-five was made in Chapter Twelve, and it rests on a genuine and urgent biological reality. Muscle strength (not muscle mass, but the neuromuscular capacity to generate force rapidly and reliably) begins declining in the fifth decade and accelerates through the seventh and eighth. The functional independence threshold, meaning the ability to rise from a chair unassisted, to climb stairs, to arrest a fall, and to carry groceries, depends on strength preservation in ways that are increasingly well-characterized in the gerontological literature. Kraemer et al.’s foundational resistance training endocrinology work, even when conducted in sixty-two-year-old men, established that periodized mechanical loading stimulates meaningful hormonal and neuromuscular adaptation into the seventh decade. The general case for loaded mechanical stimulus in older men is sound.

What does not exist is a randomized controlled trial specifically examining the minimum effective dose of loaded walking for dynapenia prevention in men over sixty-five.

The clinical question is precise: how much load, at what frequency, for how long, is necessary to produce measurable preservation of grip strength, isokinetic knee extension torque, and functional movement capacity in older men—with the specific outcome of delaying the functional independence threshold? The existing weighted vest studies in older populations are predominantly focused on women, and their primary outcomes are typically bone density and fall prevention rather than strength preservation and sarcopenia resistance. Lowe et al.’s 2025 HRV study during simulated military operations involved physically active males in their late twenties and thirties, not men navigating the compound challenges of late-life hormonal decline, decreased motor unit recruitment capacity, and the recovery limitations that follow from reduced growth hormone pulsatility.

The dose-response question in this specific population is genuinely unanswered. The protocol’s recommendation of two to three sessions per week at fifteen to twenty percent of body weight for men over sixty-five is grounded in the resistance training periodization principles that Kraemer’s work established, in the military load carriage fatigue literature, in the general exercise gerontology framework. But the specific claim that this particular dose produces measurable dynapenia protection in this particular population has not been tested in an RCT. A dose-response study examining five, ten, fifteen, and twenty percent of body weight at two versus three sessions per week, with primary outcomes of grip dynamometry, five-times sit-to-stand performance, and six-minute walk test, would be among the most valuable clinical research investments in exercise gerontology. It has not been done.
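The proposed design can be made concrete by enumerating its arms. The sketch below assumes a fully crossed design (every load level at every session frequency), which the text does not specify. The function name `study_arms` and the 80 kg body mass are hypothetical, chosen only to turn the percentages into pack weights.

```python
from itertools import product

# Design factors taken from the paragraph above.
LOAD_FRACTIONS = (0.05, 0.10, 0.15, 0.20)  # pack load as a fraction of body weight
SESSIONS_PER_WEEK = (2, 3)


def study_arms(body_mass_kg):
    """Return one dict per arm, with the pack mass for a participant of this body mass.

    Assumes a fully crossed design; the chapter does not state how arms would be built.
    """
    return [
        {
            "load_fraction": fraction,
            "sessions_per_week": sessions,
            "pack_mass_kg": round(fraction * body_mass_kg, 1),
        }
        for fraction, sessions in product(LOAD_FRACTIONS, SESSIONS_PER_WEEK)
    ]


# Hypothetical 80 kg participant, used only to make the loads concrete.
arms = study_arms(80.0)
for arm in arms:
    print(arm)
```

Under that assumption the study needs eight arms before any control group is added, which helps explain why a trial of this kind is expensive and has not yet been run.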

Nasal Breathing During Load Carriage: The Protocol’s Untested Element

Chapter Eight made a detailed physiological case for nasal-only breathing during rucking. The case assembled four lines of evidence: paranasal sinus nitric oxide production and its pulmonary vasodilatory effects; autonomic regulation through the parasympathetic dominance that nasal airway resistance promotes; ventilatory efficiency improvements documented in cycling and clinical exercise testing; and the diaphragmatic recruitment cascade that generates intra-abdominal pressure and, in turn, lumbar stabilization under load. The evidence in each of these domains is real. Lundberg et al.’s foundational work on sinus nitric oxide has been replicated. Deus et al.’s 2024 demonstration of parasympathetic HRV shift with nasal breathing is methodologically clean. Eser et al.’s 2024 ventilatory efficiency data—showing VE/VCO2 reductions that, in heart failure patients, matched months of high-intensity interval training—is striking. Trevisan et al.’s electromyographic evidence that nasal breathers recruit the diaphragm more effectively than mouth breathers is consistent with what the respiratory physiology predicts.

But no published study has directly examined nasal-only breathing during loaded walking at any intensity.

The entire Akureyri Protocol nasal breathing component is, in technical terms, a mechanistic inference applied to a movement modality in which it has never been directly tested. Rappelt et al.’s 2023 nasal breathing study used low-intensity cycling. Eser et al.’s 2024 data came from seated exercise testing on a cycle ergometer with heart failure patients. Lee, Seo, and Lee’s 2025 study, the most controlled comparison of breathing modes during progressive exercise (G.-Y. Lee et al., 2025), characterized the nasal ventilatory ceiling during unloaded treadmill walking and running in ten women, establishing that nasal breathing becomes unsustainable above 11 km/h and that the nasal-to-oral transition coincides with the ventilatory threshold. But those women were not carrying packs. The Western Kentucky University (WKU) 2024 walking study that found significant RER differences between breathing modes used five-minute steady-state bouts, not sixty-minute rucking sessions, and its sample of thirty-three participants provides only modest statistical power. None of these studies involved a backpack. None measured the specific interaction between thoracic load distribution, spinal stabilization demands, respiratory mechanics under axial compression, and the diaphragmatic function that nasal breathing is supposed to enhance.

The same gap applies to the talk test. Chapter Eight introduced the talk test as a secondary pacing tool: a categorical indicator of ventilatory threshold that has been validated against gas analysis in multiple populations (DeHart, 1999; Kwon et al., 2023; Mahmod et al., 2022; Sørensen et al., 2020). But like nasal breathing, the talk test has never been validated during loaded walking. This matters because Shei et al. (2017) demonstrated that thoracic load carriage alters ventilatory mechanics independent of exercise intensity, increasing dead-space ventilation and changing the relationship between metabolic demand and breathing effort. A heavy pack strapped to the torso restricts the very mechanics that speech production depends on. The talk test may be more conservative under load, failing at a lower intensity than it would during unloaded exercise, or it may be confounded by the mechanical restriction in ways that degrade its accuracy as a ventilatory threshold proxy. Neither possibility has been tested.

The unified gap, then, is this: no study has compared nasal breathing, the talk test, and heart rate monitoring as pacing tools specifically during rucking. The four-layer pacing model presented in Chapter Eight (nasal breathing as baseline filter, talk test as threshold detector, heart rate as objective confirmation, RPE as integrative check) is assembled from evidence gathered in unloaded exercise modalities and extrapolated to loaded walking. Each layer’s individual validation is sound. Their behavior under a twenty-kilogram pack is unknown.

It is physiologically plausible, perhaps likely, that the nasal breathing benefits documented in cycling and clinical exercise settings transfer to loaded walking, and that the talk test remains a reliable threshold marker under load. The mechanistic chains are coherent: the nitric oxide pathway is mode-independent, the autonomic effects of nasal airway resistance are mode-independent, the IAP-generating function of forced diaphragmatic recruitment would be, if anything, more relevant under axial load than without it, and the competition between breathing and speech that drives the talk test intensifies rather than diminishes when a pack adds respiratory constraint. But plausible and demonstrated are different categories, and the claim that these pacing tools perform during rucking as they do during unloaded exercise requires direct experimental validation that does not yet exist.

The practical implication for this protocol is modest, because the pacing recommendations rest on a foundation that is mechanistically coherent and unlikely to be wrong. But the intellectual honesty implication is significant: you should know that the Akureyri Protocol’s pacing system—not just the closed-mouth rule, but the entire layered approach—is the protocol’s most experimentally uncharted territory.

Fat Oxidation: Where the Evidence Contradicts Itself

The metabolic substrate argument—that rucking, performed in Zone 2 with nasal breathing as an intensity governor, produces superior fat oxidation relative to higher-intensity aerobic exercise—is supported by evidence that is genuinely mixed, and the mixing deserves more than a footnote.

The Western Kentucky University 2024 study found a statistically significant respiratory exchange ratio difference between nasal and oral breathing during steady-state walking: nasal breathing produced a lower RER, indicating a greater proportion of fat oxidation (F(1,32) = 44.479, p < 0.001). That is a strong result with a large F-statistic, from a study that used walking rather than cycling, and the direction of the effect is exactly what the Akureyri Protocol predicts. The mechanistic interpretation offered in Chapter Eight—that nasal breathing enforces an intensity ceiling that keeps the exerciser in the fat-oxidation-dominant zone—is consistent with this finding.

But Rappelt et al.’s 2023 study, conducted over sixty minutes of self-selected low-intensity cycling with nasal-only versus oronasal breathing, found no significant RER difference between conditions (p = 0.67, partial eta squared = 0.01). That is an emphatically null result. Eser et al.’s 2024 study, using heart failure patients and healthy controls on cycle ergometers, similarly found no RER difference between breathing modes in any group. Bergqvist et al.’s 2025 study in twelve well-trained male cyclists found no significant RER difference across oral, oronasal, and decongested oronasal breathing conditions (p = 0.06 for the trial main effect, which narrowly fails conventional significance thresholds but does not support a strong positive claim).
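The walking result and the first cycling result can be put on the same effect-size scale. The sketch below uses the standard ANOVA identity, partial eta squared = (F × df_effect) / (F × df_effect + df_error). The value derived for the walking study is a calculation from the F statistic quoted above, not a figure reported by that study, and it assumes the quoted degrees of freedom are the effect and error terms of a single two-condition comparison.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Convert an ANOVA F statistic to partial eta squared."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# F(1, 32) = 44.479, as quoted for the walking study (derived here, not reported).
wku_effect = partial_eta_squared(44.479, 1, 32)
# Reported directly for the sixty-minute cycling study.
rappelt_effect = 0.01

print(f"walking study, derived partial eta squared:  {wku_effect:.2f}")
print(f"cycling study, reported partial eta squared: {rappelt_effect:.2f}")
```

On that scale the walking study implies a partial eta squared of roughly 0.58 against 0.01 for the cycling study, which is why the two findings cannot be treated as minor variations on the same result.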

There is a possible reconciliation. The WKU study used walking; the null-result studies used cycling. The metabolic substrate profile of walking may differ from cycling in ways that make nasal breathing’s intensity-governing function more consequential during ambulatory locomotion. Alternatively, the effect may be real but more modest and less consistent than a single study with thirty-three participants can reliably characterize. The walking-specific evidence is currently the most directly relevant to this protocol, and it supports the argument. But it would be misleading to present that support without acknowledging that the broader nasal breathing and substrate utilization literature does not consistently find the same effect, and that the preponderance of the exercise physiology RCT evidence on this question is currently null.

The conservative reading is that nasal breathing’s primary metabolic contribution to this protocol is the intensity governance mechanism—keeping the exerciser in a zone where fat is the dominant fuel by making higher intensities physiologically untenable through ventilatory constraint—rather than a direct pharmacological-style shift in substrate preference at any given intensity. That is a meaningful contribution, and it is well-supported. The stronger claim—that nasal breathing independently shifts the fat-carbohydrate oxidation ratio at matched intensities—is supported by one walking-specific study and contradicted by several cycling studies, and should be held proportionally.

Cardiovascular Adaptation in Older Ruckers: The Youngest Dataset

Chapter Twelve’s cardiovascular argument drew heavily on Lowe et al.’s 2025 study of heart rate variability during simulated military operations involving daily load carriage at approximately thirty percent of body weight. The study is methodologically rigorous by military exercise science standards: thirty-two participants, twenty days of structured operations cycles alternating low-stress and high-stress phases, dense HRV measurement. The findings are meaningful: short-term exercise HRV metrics—SDNN, RMSSD, LF, and HF power—all increased from the first to the last day of both stress cycles, suggesting progressive parasympathetic adaptation to the chronic load carriage stimulus. This is exactly the cardiovascular signature associated with reduced all-cause mortality risk in the epidemiological literature.

The participants were thirty-two physically active males in their late twenties and early thirties.

No randomized controlled trial has measured chronic cardiovascular adaptations—VO2max changes, arterial stiffness, left ventricular geometry, carotid intima-media thickness progression—specifically from a loaded walking program in men or women over fifty. The HRV data from Lowe et al. is the best available direct evidence that rucking produces parasympathetic cardiovascular adaptation. It is derived from a population that is approximately twenty to twenty-five years younger than the population to whom this protocol is primarily addressed, selected for above-average fitness, and operating under the specific physiological conditions of a military simulation rather than the everyday conditions of a recreational rucking practice. The extrapolation from young military males to older civilians is supported by the general exercise adaptation literature—cardiovascular training principles do not suddenly stop operating at fifty—but has not been directly confirmed for this movement modality in this age group.

The absence of this evidence is particularly consequential because the cardiovascular risk profile of older adults is qualitatively different from that of young military men. Arterial stiffness, impaired baroreceptor sensitivity, blunted heart rate recovery, and reduced cardiac autonomic modulation are characteristic features of aging that intersect with rucking’s specific hemodynamic demands—particularly the blood pressure response to sustained load carriage with a hip belt that transfers compressive force through the iliac vessels—in ways that have not been systematically characterized. The recommendation in Chapter Twelve that men and women over sixty begin with lighter loads and shorter duration, and that those with established cardiovascular disease or uncontrolled hypertension consult a physician before beginning, is not merely legal caution. It is the practical acknowledgment that the evidence base for cardiovascular safety and efficacy in this specific population has a gap large enough to warrant behavioral conservatism.

What the Running Evidence Actually Shows

The adversarial review that led to the revisions throughout this book identified a systematic pattern in the original manuscript: running was treated as a foil rather than a genuine alternative with its own substantial evidence base. That framing was dishonest by omission. Here is what the running evidence actually shows.

Lee and colleagues, in a 2014 prospective cohort study of 55,137 adults followed for fifteen years, found that runners had a thirty percent lower risk of all-cause mortality and a forty-five percent lower risk of cardiovascular mortality than non-runners—benefits that persisted even at modest volumes of five to ten minutes per day (D. Lee et al., 2014). This is the strongest mortality evidence for any single exercise modality. Rucking has zero mortality data.

Alentorn-Geli and colleagues, in a 2017 meta-analysis of 125,810 individuals, found that recreational runners had an osteoarthritis prevalence of 3.5 percent compared to 10.2 percent in sedentary controls and 13.3 percent in competitive runners (Alentorn-Geli et al., 2017). Recreational running is chondroprotective: it protects joints rather than destroying them. Timmins and colleagues found runners approximately fifty percent less likely to need knee replacement surgery (Timmins et al., 2017). Lo and colleagues, using longitudinal OAI data, found no association between running history and symptomatic knee osteoarthritis (Lo et al., 2017), and found that running did not increase symptoms or structural progression in people who already had knee OA (Lo et al., 2018).

The earlier version of this book implied that running at any substantial volume was destroying joints and suppressing hormones. The joint claim is contradicted by large-scale epidemiological evidence. The hormonal claim is real but applies to volumes (>81 km/week) that most recreational runners never reach. The case for rucking does not require running to be worse than the evidence says it is. It requires rucking to offer something running does not: lower impact forces, simultaneous resistance and cardiovascular stimulus, posterior chain loading, and accessibility to populations for whom running is impractical or contraindicated. Those are genuine advantages. They do not need the straw man of a “running lie” to support them.

The Gravitostat: An Intriguing Hypothesis With a Replication Problem

Chapter Five presented the gravitostat as a plausible mechanism by which rucking might produce fat loss beyond what simple caloric expenditure explains. The hypothesis, developed by Jansson, Ohlsson, and colleagues at the University of Gothenburg, proposes that osteocytes in weight-bearing bones act as a homeostatic body-weight sensor: increased gravitational loading on the lower extremities triggers an osteocyte-dependent signal that reduces food intake and fat mass independently of leptin. The human proof-of-concept came from a 2020 RCT by Ohlsson et al., which showed that wearing a weighted vest at 11 percent of body weight for eight hours per day over three weeks produced 1.37 percent greater weight loss and 4.04 percent greater fat mass reduction compared to a light vest, with lean mass preserved. That is a real finding from a genuine randomized trial.

The problem is that essentially all experimental data supporting the gravitostat come from one research group—the Gothenburg team—and the only systematic attempt at independent evaluation produced a negative result.

Turner et al. at Oregon State University (2020) analyzed multiple animal models: spaceflight (microgravity), hindlimb unloading (simulated microgravity), and centrifugation (increased gravity). The logic is straightforward: if a gravitostat exists, microgravity should cause weight gain, and increased gravity should cause weight loss. The results contradicted both predictions. Rats in spaceflight showed no weight gain or slight weight loss. Centrifugation at 1.05G and 1.56G did not reduce body weight. Hindlimb unloading—which removes gravitational load from the hindquarters—caused weight loss, the opposite of what the gravitostat predicts. Turner et al. proposed that the Gothenburg group’s results may reflect a stress and injury response from the surgical implantation of loading capsules rather than a homeostatic gravitational mechanism.

As of March 2026, the gravitostat paper has approximately 88 citations. Zero independent laboratories have replicated the core rodent finding. The Ohlsson 2020 human RCT—the most compelling human evidence—was conducted by the same group that proposed the hypothesis, used bioelectrical impedance (imprecise for body composition over three weeks), did not measure energy intake or expenditure, and ran for only three weeks. An independent commentary from Thivel and Boirie at Clermont-Ferrand raised all of these methodological concerns and called for longer trials with DXA-based body composition. The Gothenburg group acknowledged the need for more data.

A 2025 pilot from Wake Forest (DeLong et al.) found that older adults who wore a weighted vest during caloric restriction regained less weight at 24 months and preserved resting metabolic rate better than controls, which is at least consistent with a loading-dependent metabolic effect—but this study was not designed to test the gravitostat mechanism and had nine participants per arm.

The honest position: the gravitostat is a biologically plausible idea with an intriguing human finding from a single group, one negative independent evaluation, no independent replication, and no confirmed afferent molecule connecting osteocytes to the appetite-regulating regions of the hypothalamus. The Johns Hopkins group (Gao et al., 2024) has independently established that bones signal to the brain via PGE2-driven skeletal interoception during mechanical loading, which provides biological plausibility for the general concept, but their work focuses on bone remodeling, not body weight. The gravitostat should be presented as a hypothesis that is worth following, not as an established mechanism. Chapter Five’s argument for rucking’s fat-loss effects is better grounded in the established caloric expenditure data than in the gravitostat, which may or may not survive independent replication.

The Beavers Negative

In 2025, Beavers and colleagues published a twelve-month randomized controlled trial in JAMA Network Open that deserves its own section in this chapter because it is the most rigorous direct test of a core mechanism this book relies on (Beavers et al., 2025).

One hundred and fifty older adults with obesity were randomized to a weighted vest intervention during intentional weight loss or to a control group. The weighted vest was worn during daily activities. The primary outcome was bone mineral density at the hip and spine. The result: the weighted vest did not prevent bone loss. The intervention group lost bone at the same rate as the control group.

This finding does not refute rucking specifically—the Beavers intervention involved passive vest wearing during daily activities, not structured walking sessions with dynamic cyclical loading—but it establishes that simply adding external weight to the body is insufficient to protect bone. The activity performed while carrying the weight is what matters. Chapter Four discussed this distinction in detail: osteocytes respond to dynamic strain and fluid flow, not static compression.

The Beavers trial was also conducted during an intentional caloric deficit, which may have overridden the osteogenic signal through the RED-S mechanism discussed in Chapter Seven. But the honest reading is that weighted loading’s bone-protective effect, when it exists, depends on specific biomechanical conditions (dynamic gait loading, adequate energy availability, sufficient load magnitude) that have not been tested in combination in a rucking-specific RCT.

Metabolic Syndrome: The Absent Trial

One of the most clinically significant gaps in the entire rucking literature is the complete absence of a randomized controlled trial testing rucking—or loaded walking of any kind—as an intervention for metabolic syndrome.

Metabolic syndrome, defined by at least three of five criteria (elevated waist circumference, elevated fasting glucose, elevated triglycerides, low HDL cholesterol, elevated blood pressure), affects an estimated one in three adults in most industrialized populations. It is the upstream condition for type 2 diabetes and cardiovascular disease. Because the designation requires three criteria, reversing even one or two of them removes it for someone who meets three or four, and doing so meaningfully lowers long-term risk. This is precisely the population for whom rucking’s combined aerobic and resistance stimulus ought, in theory, to be most useful.
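The three-of-five rule is easy to misread, so a minimal sketch may help. It assumes each criterion has already been judged met or not met by a clinician; the class and function names are illustrative, and no cutoff values are encoded because those vary between guideline versions.

```python
from dataclasses import dataclass, astuple


@dataclass
class MetSynFlags:
    """One boolean per criterion; True means the criterion is currently met."""
    elevated_waist: bool
    elevated_fasting_glucose: bool
    elevated_triglycerides: bool
    low_hdl: bool
    elevated_blood_pressure: bool

    def count(self) -> int:
        # Booleans sum as 0/1, so this is the number of criteria met.
        return sum(astuple(self))


def has_metabolic_syndrome(flags: MetSynFlags, threshold: int = 3) -> bool:
    """Apply the 'at least three of five' definition used in the text."""
    return flags.count() >= threshold


# Hypothetical person meeting four criteria at baseline.
baseline = MetSynFlags(True, True, True, False, True)
# The same person after waist and glucose normalize (two criteria reversed).
follow_up = MetSynFlags(False, False, True, False, True)

print(has_metabolic_syndrome(baseline), has_metabolic_syndrome(follow_up))
```

Under this definition, a person who starts with all five criteria and reverses two still carries the designation, which is one reason trials that move only two criteria can report modest remission rates.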

The evidence base for this claim is built from three non-overlapping streams. Walking-based lifestyle trials show that unloaded brisk walking can produce sustained metabolic syndrome remission: in the 2026 ELM study (Powell et al., n=618, 24 months), habit-based walking and dietary changes achieved 27.8 percent remission versus 21.2 percent in controls. But walking alone consistently fails to improve more than two of five criteria. The Hui et al. 2015 cluster RCT (n=374) improved waist circumference and fasting glucose but produced no significant changes in blood pressure, triglycerides, or HDL. The 2024 CardioRACE trial (Lee et al., n=406, one year) showed that even supervised combined aerobic and resistance exercise produced improvements in composite cardiovascular risk score but did not significantly reduce systolic blood pressure, LDL, or fasting glucose individually.
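To put the ELM remission figures in proportion, the short sketch below converts the two percentages quoted above into an absolute difference and a number needed to treat. It uses only the published point estimates, ignores their confidence intervals, and the function names are illustrative.

```python
def absolute_difference(p_intervention: float, p_control: float) -> float:
    """Absolute difference in remission proportion (intervention minus control)."""
    return p_intervention - p_control


def number_needed_to_treat(p_intervention: float, p_control: float) -> float:
    """Reciprocal of the absolute difference; undefined when the arms are equal."""
    diff = absolute_difference(p_intervention, p_control)
    if diff == 0:
        raise ValueError("No difference between arms; NNT is undefined.")
    return 1.0 / diff


# Remission proportions quoted in the text for the ELM study.
elm_intervention, elm_control = 0.278, 0.212

diff = absolute_difference(elm_intervention, elm_control)
nnt = number_needed_to_treat(elm_intervention, elm_control)
print(f"absolute difference: {diff * 100:.1f} percentage points")
print(f"number needed to treat: about {nnt:.0f}")
```

On these figures the gap is under seven percentage points, so roughly fifteen people would need to follow the intervention for one additional remission. That is the scale of effect a loaded-walking trial would have to improve on.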

The weighted vest literature adds two pieces: the Ohlsson 2020 gravitostat RCT produced fat loss and lean mass preservation but did not measure any metabolic syndrome biomarkers—no blood pressure, no glucose, no lipids. The Kim et al. 2024 Korean RCT is the only published trial combining weighted vest exercise with insulin resistance measurement, finding improvements in HOMA-IR and resistin compared to circuit training alone and control—but the population was normal-weight obese women, not a metabolic syndrome cohort, and the exercise was circuit training rather than walking.

The convergent inference is that rucking—as a modality that combines the aerobic stimulus of brisk walking with the resistance-type muscular loading of vest exercise—should produce greater metabolic syndrome reversal than either walking or vesting alone. The Bellman et al. 2024 PET/CT study adds mechanistic support: wearing a weighted vest during standing increased glucose uptake in femoral and tibial cortical bone (+19%), quadriceps (+28%), and bone marrow (+17%) compared to sitting, suggesting a direct mechanism for improved insulin sensitivity. But mechanistic support and clinical demonstration are different categories of evidence.

What does not exist—and what would most directly validate rucking’s metabolic syndrome claims—is a trial recruiting adults meeting ATP-III metabolic syndrome criteria, randomizing them to loaded walking at ten to fifteen percent of body weight versus unloaded walking versus control, measuring the full metabolic panel (waist circumference, blood pressure, fasting glucose, triglycerides, HDL) as primary outcomes, running for at least twenty-four weeks, and powering the study to detect the remission rate differences visible in the walking literature.
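A rough sense of what "powering the study" would demand can be had from the standard normal-approximation formula for comparing two proportions. The sketch below is illustrative only: it borrows the ELM remission rates (27.8 versus 21.2 percent) as a stand-in effect size, assumes a two-sided alpha of 0.05 and 80 percent power, compares just two arms, and ignores dropout. None of these design choices comes from an actual rucking protocol.

```python
import math
from statistics import NormalDist


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm to detect p1 vs p2 (two-sided test, normal approximation)."""
    if p1 == p2:
        raise ValueError("Proportions must differ to compute a sample size.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of binomial variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Stand-in effect size: the remission rates reported in the walking literature.
required = n_per_arm(0.278, 0.212)
print(f"approximate participants per arm: {required}")
```

Under these assumptions the answer is several hundred participants per arm, before adding a third arm or allowing for attrition. That goes some way toward explaining why the trial has not yet been run.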

Vest Evidence and Backpack Rucking: A Transferability Gap

Throughout this book, evidence from weighted vest RCTs has been applied to support rucking recommendations. The Snow et al. 2000 five-year vest-plus-jumping study, the Shaw and Snow 1998 fall-risk trial, the Beavers et al. 2025 INVEST trial, the Ohlsson 2020 gravitostat RCT—all used vests, not backpacks. This transferability assumption is the load-bearing architectural element of the book’s evidentiary structure, and it has never been directly tested.

Backpacks and weighted vests are not biomechanically equivalent. The distinction is fundamental. A backpack places load posteriorly, shifting the whole-body center of mass rearward by an amount proportional to the load. The neuromuscular system compensates by increasing forward trunk lean—typically five to fifteen degrees at loads of twenty to forty percent of body weight. A weighted vest distributes mass circumferentially around the torso, keeping the center of mass within approximately one to two centimeters of its unloaded position, minimizing the postural compensation required.
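The rearward shift follows from the ordinary two-body center-of-mass formula, and a small sketch makes the scaling concrete. It treats the body and the load as two point masses separated horizontally; the 80 kg body mass and 0.20 m pack offset are hypothetical values chosen for illustration, not measurements from any study cited here.

```python
def com_shift_m(body_mass_kg: float, load_kg: float, load_offset_m: float) -> float:
    """Horizontal shift of the combined center of mass, in meters.

    load_offset_m is the horizontal distance from the body's own center of
    mass to the load's center of mass (positive = behind the body).
    """
    return load_kg * load_offset_m / (body_mass_kg + load_kg)


BODY_KG = 80.0         # hypothetical body mass
PACK_OFFSET_M = 0.20   # hypothetical: pack mass centered 20 cm behind the body
VEST_OFFSET_M = 0.0    # idealized vest: mass centered on the body

for fraction in (0.10, 0.20, 0.40):
    load = BODY_KG * fraction
    pack = com_shift_m(BODY_KG, load, PACK_OFFSET_M)
    vest = com_shift_m(BODY_KG, load, VEST_OFFSET_M)
    print(f"{fraction:.0%} BW: backpack {pack * 100:.1f} cm, vest {vest * 100:.1f} cm")
```

Because the load also appears in the denominator, the shift grows slightly less than proportionally with load, although at recreational loads it is close to proportional. A real vest is not perfectly centered, which is why the text allows it a centimeter or two of displacement.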

This difference propagates through the entire kinetic chain. Backpacks produce greater anteroposterior ground reaction force asymmetry and higher loading rates at heel strike due to the pendular motion of posterior mass. The US Army Load Carriage Decision Aid required a separate predictive metabolic equation for vest-borne loads because backpack equations systematically overestimate metabolic cost for vest carriage—meaning vest and backpack loads at the same absolute mass produce measurably different physiological demands. Backpacks at loads above twenty percent of body weight significantly decrease stride length and increase double support time; CrossFit-standard weighted vests at equivalent loads show no significant change in spatiotemporal gait parameters.

For spinal biomechanics, the distinction is most consequential. Vest studies show relatively symmetrical axial loading of the spine. Backpacks create an asymmetric posterior moment that increases lumbar flexion, anterior disc pressure, and paraspinal muscle demand. Vest-based conclusions about spinal safety, fall risk, and balance cannot be applied to backpack rucking without explicit qualification.

What transfers reliably: the fundamental principle of axial skeletal loading is present in both systems, so the osteogenic bone-density argument is mechanistically applicable with the caveat that loading rates—and potentially osteogenic stimulus—may differ between systems. The metabolic demand scaling is directionally consistent, though backpack carriage is slightly more metabolically expensive at the same load. Cardiovascular training stimulus evidence is broadly applicable.

What does not transfer: spinal safety margins, gait kinematics, fall risk, and the precise injury risk profile. Vest evidence suggesting low injury risk at moderate loads should not be directly extrapolated to backpack rucking. The ideal study would compare vest and backpack conditions in the same subjects at matched loads, with simultaneous biomechanical, metabolic, and longitudinal bone and joint health outcomes. No such study exists. Until it does, the two literatures are complementary but not interchangeable, and claims grounded in vest RCTs should be clearly labeled as such.

Neuroplasticity: The Dual-Pathway Model Has Never Been Tested During Rucking

Chapter Eleven made a case for rucking’s cognitive and neuroplasticity benefits based on two converging pathways: the aerobic exercise component stimulating brain-derived neurotrophic factor (BDNF) release, and the mechanical loading component stimulating insulin-like growth factor 1 (IGF-1), with both proteins independently promoting hippocampal neurogenesis, synaptic plasticity, and cognitive function. A third emerging pathway—bone-derived osteocalcin acting on the brain via GPR158 receptor signaling—was presented as additional mechanistic support.

Each component of this model is well-supported. Aerobic exercise and BDNF: a 2025 systematic review (Katsanos et al.) confirmed that walking interventions reliably increase circulating BDNF across fifteen or more studies. IGF-1 and neuroplasticity: multiple mechanistic reviews establish that circulating IGF-1 crosses the blood-brain barrier and promotes hippocampal neuronal proliferation in rodent models, and that resistance training elevates IGF-1 and is associated with improved cognitive function in humans. Osteocalcin and brain: the Moriishi and Komori (2025) review establishes that bone-derived osteocalcin promotes neurotransmitter biosynthesis and spatial learning in mouse models, and lower circulating osteocalcin correlates with cognitive decline in humans.

What has never been done: a study that measures BDNF, IGF-1, or osteocalcin during or after actual loaded walking.

The dual-pathway model, in its application to rucking specifically, is an untested theoretical synthesis. Load carriage studies have examined cognitive performance outcomes—reaction time, working memory, executive function—during military operations involving heavy packs, but these contexts confound the neuroplasticity signal with sleep deprivation, caloric restriction, and thermal stress. No study has measured whether adding a backpack to walking produces greater BDNF or IGF-1 elevation than walking alone, whether this difference is dose-responsive to load magnitude, or whether the osteocalcin pathway is activated at the ground reaction forces typical of recreational rucking (1.5 to 2.5 times body weight per stride).

The ideal test would be a crossover design comparing unloaded walking versus loaded walking at fifteen to twenty percent of body weight at matched speeds, with pre- and post-exercise blood draws measuring serum BDNF, serum IGF-1, plasma osteocalcin, lactate, and irisin, plus a cognitive testing battery. Neuroimaging at baseline and after a chronic intervention would be transformative. This study has not been done.

Cardiac Safety: A Single 1988 Study and One Documented Death

The recommendation in this book that individuals with established cardiovascular disease consult a physician before beginning a rucking program is not merely legal caution. It reflects a genuine and disturbing evidence void.

The structured literature search identified exactly one original study that directly measured cardiovascular responses to weight-loaded walking in cardiac rehabilitation patients: Schram and Hanson (1988), published in the Journal of Cardiopulmonary Rehabilitation. That study is 38 years old. No subsequent randomized controlled trial, no prospective cohort, no systematic safety evaluation has replicated or extended it. Two conference abstracts from Martel et al. (2002, 2003) reported that strength training can improve hemodynamic responses to weighted walking in cardiac rehab participants, but these were never published as peer-reviewed papers. The practical implications of loaded walking for people with coronary artery disease, heart failure, or recent myocardial infarction are classified, in formal evidence terms, as Level C—expert opinion only.

The mechanistic concern is specific and not trivial. Ribeiro et al. (2014) measured central hemodynamic responses during walking with a load equal to ten percent of body weight carried in the hands: augmentation index approximately doubled, central systolic pressure doubled, and central pulse pressure doubled compared to unloaded walking. These parameters reflect left ventricular afterload and myocardial oxygen demand—the very variables that matter for patients with compromised coronary reserve. The hand-carry confound is important (grip isometrics amplify the pressor response), but the direction of the finding—that load carriage substantially increases central aortic hemodynamic stress—is unlikely to reverse when the load moves to the back.

The documented safety event is sobering. A NIOSH fatality investigation (FACE Report 2014-12) documented the death of a 61-year-old firefighter lieutenant during the Pack Test—a three-mile walk with a 45-pound weighted vest—after completing 1.5 miles. Autopsy revealed an acute myocardial infarction on top of a recent asymptomatic MI approximately one week earlier, complicated by left ventricular rupture and cardiac tamponade. NIOSH concluded that the physical stress of the loaded walking probably precipitated the rupture. The 45-pound load is substantially heavier than what this protocol recommends, and the man had undiagnosed coronary disease. But the case establishes that vigorous loaded walking can precipitate fatal cardiac events in individuals with occult cardiovascular pathology—and the population most likely to benefit from this protocol is also the population most likely to carry occult cardiovascular risk.

The practical implication is conservative: all cardiac safety recommendations in this book derive from inference and expert opinion, not from any study that has directly established what percentage of body weight constitutes a safe load for cardiac patients, what hemodynamic thresholds to monitor, or whether rucking produces different cardiovascular risk than unloaded walking at matched heart rates. The honest position is that these questions remain unanswered, and anyone with a cardiac history should approach rucking with physician oversight rather than with confidence that the evidence has resolved the safety question.

The Porter Paradox: Centuries of Evidence, Almost No Science

One of the most striking gaps in the rucking literature is the complete absence of biomechanical and physiological research on the populations that have practiced sustained load carriage continuously for centuries or millennia.

The Andean porter traditions—Quechua and Aymara load carriers operating at altitudes between 2,000 and 4,200 meters with loads of 20 to 45 kilograms, sustained over generations—are represented in the peer-reviewed literature by exactly one study: a 2003/2006 descriptive health survey by Bauer of 101 Inca Trail porters in Peru, documenting back pain, respiratory complaints, and inadequate equipment. No metabolic cost data. No gait analysis. No spinal health assessment. No longitudinal outcomes.

Japanese traditional mountain portering—the bokkashi and nimotsu carrier traditions, documented in Ukiyo-e art and historical records of mountain routes like the Tateyama Kurobe Alpine Route—has zero peer-reviewed biomechanical or physiological studies. Korean jige A-frame carriers: zero studies. Vietnamese don ganh shoulder-pole carriers: zero dedicated physiological studies. Indonesian buruh gendong back carriers: one musculoskeletal symptom survey.

By contrast, Nepali Himalayan porters carrying loads of 80 to 200 percent of body mass via tumpline are relatively well-studied: Bastien et al. (2005, Science) demonstrated that Nepalese porters carry these extreme loads with 20 percent better metabolic economy than Western controls; Minetti et al. (2006) showed that VO2 kinetics in porters were 49.5 percent faster than in mountaineers. African head-loaders are documented by Maloiy et al.’s original 1986 Nature finding of the “free ride” phenomenon at loads up to 20 percent of body mass.

The contrasting populations offer something scientifically valuable that the rucking literature currently lacks: natural experiments in long-term load carriage outcomes. Do Quechua porters at altitude develop different spinal degeneration patterns than sedentary highland populations? Does lifelong load carriage at self-selected pace and the loads typical of subsistence transport produce net benefit or net harm to the skeleton, cardiovascular system, and functional capacity? Malville et al. (2001) found that Nepali commercial porters showed no persistent medical problems attributable to load carriage when they self-paced and rested strategically—a finding consistent with this protocol’s emphasis on conversational-pace rucking. But no one has followed these populations with the imaging and biomarker methods that would answer the question definitively.

The absence of data on the Andean populations is particularly consequential because Quechua and Aymara highland peoples have well-documented physiological adaptations to hypoxia (blunted hypoxic ventilatory response, elevated hemoglobin, distinct EPAS1 gene variants) that differ from Tibetan adaptations. Studying whether their load carriage patterns across a lifetime produce different musculoskeletal and cardiovascular outcomes than military training at equivalent loads would be genuinely illuminating. The research infrastructure needed (portable MRI, basic blood panels, standardized gait analysis protocols) exists. The will to deploy it to these populations does not.

The Fall Prevention Gap: Training Risk Factors But Not Falls Themselves

Chapter Eight made the case that rucking trains the neuromuscular and skeletal systems in ways that reduce fall risk in older adults: improved lower-extremity strength and power, enhanced dynamic balance, preserved bone density that reduces fracture severity when falls occur, and proprioceptive training from the balance perturbations that load carriage creates.

Each of these mechanisms is supported. Shaw and Snow (1998) demonstrated that nine months of weighted vest exercise in postmenopausal women significantly improved dynamic balance, hip abduction strength, knee extension strength, and leg power. Bean et al. (2004) showed that high-velocity weighted vest exercise improved leg power and chair stand time, which are stronger predictors of functional decline and fall risk than strength alone. Walsh et al. (2018) confirmed that loaded walking in older adults increases step width variability and reduces mediolateral dynamic stability—the direction most associated with injurious falls—in a pattern that could serve as progressive balance training if applied in controlled doses.

What has never been tested: whether rucking, as an exercise modality, reduces actual fall incidence in older adults. No randomized controlled trial with fall incidence as a primary outcome has used rucking or backpack-loaded walking as the intervention. The closest evidence comes from weighted vest studies, which are themselves hampered by the vest-to-backpack transferability gap described above. Even among weighted vest trials, no study has used falls as a primary outcome—the evidence extends to fall risk factors, not falls themselves.

The practical implication for the protocol is the same as for other gaps: the conservatism is appropriate. The recommendation to start at five to eight percent of body weight, to use flat and predictable surfaces initially, to never train to exhaustion while loaded—these are the appropriate clinical responses to operating in an evidence gap where the theoretical case is strong, the mechanism-level evidence is supportive, and the direct endpoint evidence is absent.

Six Contradictions the Evidence Contains

The structured gap analysis identified six specific contradictions within the evidence base: places where the evidence points in opposite directions or where two claims cannot be simultaneously optimal. These deserve direct disclosure rather than resolution by omission.

Contradiction 1: Intra-abdominal pressure (IAP) is simultaneously minimized and maximized. The pelvic floor literature used in Chapter Seven argues that walking-based load carriage generates IAP comparable to standing from a chair, low enough to be safe for pelvic floor health. The nasal breathing and spinal stabilization literature used in Chapter Eight argues that nasal-forced diaphragmatic recruitment generates greater IAP, high enough to provide lumbar stabilization under load. Both claims appear in adjacent chapters. Both cannot be simultaneously optimal. The tension is genuine: pelvic floor safety requires limiting IAP spikes, while spinal protection benefits from higher IAP. The resolution likely lies in rate, duration, and distribution of IAP elevation, but no study has examined this directly in the context of loaded walking.

Contradiction 2: The gravitostat is presented as both promising and potentially falsified within the research base itself, as the section above describes.

Contradiction 3: Impact exercise may not protect female bone. The women’s bone health chapter drew on the general principle that loaded exercise provides osteogenic stimulus via Wolff’s Law and ground reaction force scaling. However, a 2024 study using Taiwan Biobank data (Wu et al.) found that high-impact exercise was not significantly protective against osteoporosis in women (odds ratio 0.671, p=0.066). The point estimate leans toward protection but narrowly misses conventional significance, which complicates sex-neutral claims about mechanical loading and bone. The data are not definitive, but they introduce a qualification that the simple GRF-to-bone-density argument does not acknowledge.
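What a p-value of 0.066 implies about the range of plausible effects can be approximated from the two numbers quoted. The sketch below back-calculates a 95 percent confidence interval from the odds ratio and p-value, assuming the p-value came from a two-sided Wald-type test on the log odds ratio. That is an assumption about the original analysis, so the interval is an approximation and not the one the study reported.

```python
import math
from statistics import NormalDist


def ci_from_or_and_p(odds_ratio: float, p_value: float, level: float = 0.95):
    """Approximate confidence interval for an odds ratio from its p-value.

    Assumes a two-sided Wald test on log(OR): z = log(OR) / SE.
    """
    z_observed = NormalDist().inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    log_or = math.log(odds_ratio)
    standard_error = abs(log_or) / z_observed            # SE of log(OR)
    z_critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = math.exp(log_or - z_critical * standard_error)
    upper = math.exp(log_or + z_critical * standard_error)
    return lower, upper


# Figures quoted in the text for high-impact exercise and osteoporosis in women.
lower, upper = ci_from_or_and_p(0.671, 0.066)
print(f"approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

Under that assumption the interval runs from a substantial protective effect to just past an odds ratio of 1, meaning the data are compatible with real protection and with essentially none. This is why the result qualifies the bone argument without overturning it.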

Contradiction 4: The boundary between disc-protective and disc-destructive loading has no empirical threshold for rucking. The spinal chapter argues that recreational loads (five to twenty percent of body weight) fall within the beneficial loading zone, while military loads (forty-five-plus kilograms) produce the degeneration documented in Onodera et al.’s 2019 MRI study. This is mechanistically plausible. But the specific threshold for recreational rucking—where exactly the beneficial zone ends and the problematic zone begins—is derived from inference and in vitro disc cell studies at physiological pressures, not from imaging studies of people who ruck. The boundary is estimated, not measured.

Contradiction 5: Acute load carriage increases fall risk while chronic load carriage training reduces it. Walsh et al. (2018) found that loaded walking in older adults (65-plus years) increases step width variability and reduces mediolateral dynamic stability, the exact direction that increases falls. The same literature supports load carriage as a fall prevention training stimulus. The same activity is simultaneously risky and protective, depending on whether you measure acute perturbation or chronic adaptation. For fragile older adults who have not yet developed the adaptation, the acute risk is real. The protocol’s recommendation to start conservatively and on predictable surfaces reflects this contradiction.

Contradiction 6: Civilian and military data give opposite signals on whether load magnitude predicts injury. The NOLS risk-factor analysis (Hamonko et al., 2011, n=1,283) found that pack weight was not a statistically significant predictor of musculoskeletal injury during recreational hiking. Military data consistently find load magnitude among the strongest injury predictors. The reconciliation is plausible (autonomy, self-pacing, rest breaks, and smaller absolute loads in recreational contexts versus forced march conditions in military contexts), but the two literatures currently present contradictory practical implications that have not been resolved by any bridging study.

The GRADE Scorecard

This is the single most important table in the book. It summarizes the evidence quality for every major claim, using the GRADE framework that Chapter One introduced.

Whole-Book Evidence Scorecard

Core Claim	Best Evidence	GRADE	Status
Running (>81 km/week) chronically suppresses testosterone	Replicated observational	Moderate	Supported at high volumes; not supported at recreational volumes
Loaded walking produces distinct biomechanical adaptations	Systematic review (military)	Moderate	Supported in military populations; civilian replication limited
Rucking at 15-25% BW produces osteogenic GRF	Biomechanical measurement	High	GRF measured directly; bone density response to rucking specifically: untested
Weighted vest preserves bone in older adults	RCT (JAMA Network Open)	Moderate	NEGATIVE FINDING (Beavers 2025). Passive vest wearing did not prevent bone loss
Non-supervised exercise improves BMD in osteopenic women	Meta-analysis	High	Supported (Sánchez-Trigo 2022, SMD 0.73-0.85). Broader than rucking alone
Rucking elevates testosterone at recreational loads	Mechanistic inference	Very Low	Zero direct evidence. Extrapolated from strongman (5-10× heavier) and military (diverse training)
Nasal breathing improves ventilatory efficiency	RCTs in cycling/clinical populations	Moderate	Supported in non-rucking contexts. Zero data during loaded walking
Rucking protects intervertebral discs	Mechanistic inference	Very Low	No imaging studies. Plausible from disc physiology; unconfirmed
Rucking reduces all-cause mortality	No evidence	Not ratable	Zero mortality data for rucking. Running: 30% reduction (Lee 2014, N=55,137)
Recreational running destroys knee cartilage	Contradicted by evidence	CORRECTED	Recreational runners have 3.5% OA vs 10.2% sedentary (Alentorn-Geli 2017, N=125,810)
Gravitostat reduces fat mass via osteocyte signaling	Single-group RCT; one negative independent evaluation	Very Low	Ohlsson et al. 2020 (Gothenburg, n=69, 3 weeks). Turner et al. 2020 (Oregon State) found results inconsistent with gravitostat predictions
Vest evidence transfers to backpack rucking	Mechanistic inference	Very Low	No head-to-head vest-vs.-backpack RCT exists. Backpacks produce higher loading rates, greater trunk lean, and different spinal mechanics than vests
Rucking improves metabolic syndrome parameters	No evidence	Not ratable	Zero RCTs testing rucking for MetSyn. Walking alone improves 2 of 5 criteria (Hui 2015). Loaded vest exercise improves insulin resistance (Kim 2024) but not in MetSyn population
Rucking elevates BDNF, IGF-1, and osteocalcin	Mechanistic inference	Very Low	Each pathway individually well-supported. Zero studies have measured any of these biomarkers during loaded walking
Rucking prevents falls in older adults	Proxy evidence from vest RCTs	Low	No RCT with fall incidence as primary outcome. Multiple vest RCTs show improvement in fall risk factors (Shaw & Snow 1998; Bean et al. 2004)
Loaded walking is safe for cardiac patients	Single 1988 study; one fatality documented	Very Low	Schram & Hanson 1988 is the only direct study (Level C evidence). NIOSH FACE 2014-12 documents one death during 45 lb loaded walking in undiagnosed coronary disease

GRADE: High = direct RCTs or large meta-analyses; Moderate = RCTs with limitations or strong replicated observational data; Low = observational, single study, or indirect evidence; Very Low = mechanistic reasoning, extrapolation, or expert opinion only.
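For readers who want the distribution at a glance, the scorecard can be tallied by rating. The sketch below transcribes the rating column of the table above; the claim labels are shortened for illustration, and "CORRECTED" and "Not ratable" are kept as their own categories because they are not GRADE levels.

```python
from collections import Counter

# (shortened claim label, rating as printed in the scorecard)
SCORECARD = [
    ("Running >81 km/week suppresses testosterone", "Moderate"),
    ("Loaded walking produces distinct biomechanical adaptations", "Moderate"),
    ("Rucking at 15-25% BW produces osteogenic GRF", "High"),
    ("Weighted vest preserves bone in older adults", "Moderate"),
    ("Non-supervised exercise improves BMD in osteopenic women", "High"),
    ("Rucking elevates testosterone at recreational loads", "Very Low"),
    ("Nasal breathing improves ventilatory efficiency", "Moderate"),
    ("Rucking protects intervertebral discs", "Very Low"),
    ("Rucking reduces all-cause mortality", "Not ratable"),
    ("Recreational running destroys knee cartilage", "CORRECTED"),
    ("Gravitostat reduces fat mass via osteocyte signaling", "Very Low"),
    ("Vest evidence transfers to backpack rucking", "Very Low"),
    ("Rucking improves metabolic syndrome parameters", "Not ratable"),
    ("Rucking elevates BDNF, IGF-1, and osteocalcin", "Very Low"),
    ("Rucking prevents falls in older adults", "Low"),
    ("Loaded walking is safe for cardiac patients", "Very Low"),
]


def tally(rows):
    """Count how many claims fall under each rating."""
    return Counter(rating for _, rating in rows)


counts = tally(SCORECARD)
for rating in ("High", "Moderate", "Low", "Very Low", "Not ratable", "CORRECTED"):
    print(f"{rating:<12} {counts[rating]}")
```

Of the sixteen rows, two are rated High and four Moderate, while eight are rated Very Low or cannot be rated at all.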

The pattern is clear. The book’s strongest claims (biomechanical GRF measurements, Wolff’s Law mechanisms, the Sánchez-Trigo meta-analysis for bone density) rest on genuine evidence. The book’s most distinctive claims (rucking’s hormonal advantage, nasal breathing under load, disc health, neuroplasticity, metabolic syndrome management) rest on mechanistic inference that has not been directly tested in a rucking-specific study. The gravitostat mechanism, which Chapter Five presented as a body-weight homeostatic argument for loaded exercise, rests almost entirely on one research group and did not survive its only independent test. And one of the book’s original claims, that running destroys joints, was contradicted by large-scale epidemiological evidence and has been corrected throughout these pages.

The honest position is that rucking is a plausible, mechanistically grounded, practically useful exercise modality with zero Tier 1 evidence for any civilian outcome, whose specific claims range from well-supported (at Tier 2 via proxy RCTs) to entirely untested. The protocol recommended in this book is conservative precisely because the evidence is weaker than the confidence of the prose might suggest. The conservatism is not legal caution. It is the appropriate response to operating at the frontier of inference.

What Indirect Evidence Actually Means

The preceding sections document fourteen gaps and six internal contradictions—plus three corrections that the adversarial review process identified. A critic of this protocol could read them and conclude that the scientific foundation is insufficient—that you should not start a rucking practice, or trust the hormonal argument, or follow the nasal breathing instruction, until every gap has been filled. That conclusion would be both logically wrong and practically harmful.

Evidence-based medicine has never been evidence-complete medicine. Every clinical decision made in the absence of a perfect randomized controlled trial—and the overwhelming majority of clinical decisions are made in exactly that absence—relies on convergent indirect inference, mechanistic coherence, population extrapolation, and the accumulated judgment of practitioners who have observed more than the trials have yet quantified. The question is not whether uncertainty exists. Uncertainty always exists. The question is whether the inference is reasonable given the evidence that does exist, and whether it is honestly presented.

The hormonal argument in this book is not speculation. It is a specific, directional, convergent inference from three bodies of evidence that independently point toward the same conclusion. Each body of evidence has been accumulated by researchers who were not studying rucking, using populations that were not recreational middle-aged ruckers, and arriving at mechanistic findings that, taken together, constitute a coherent case. The absence of a direct rucking-versus-running testosterone RCT does not mean the inference is unfounded. It means the inference has not yet been directly tested—which is a different thing.

The spinal argument is mechanistic inference from disc physiology. The mechanistic inference is coherent. The disc hydration mechanism is established. The cycling of compression and decompression during walking is established. The question of whether adding a rucksack amplifies this beneficially or moves the loading into a range that impairs it has not been answered by imaging, but the load parameters recommended in this protocol—five to twenty percent of body weight—are substantially below the ranges associated with documented disc height reduction in military imaging studies. The gap is real; the inference is not reckless.

The nasal breathing gap is the most directly mechanistic of all: the physiological pathways invoked are well established in other exercise contexts, and the question of whether they transfer to loaded walking is a test of context specificity, not a test of whether the pathways exist. The pathways exist. Context specificity is a legitimate scientific question, but it is one that is generally answered in the affirmative when the fundamental physiology is mode-independent.

What convergent evidence means, as a matter of epistemology, is that multiple independent lines of inquiry are pointing in the same direction without having been designed to converge. No single study was trying to make the case for rucking’s hormonal superiority to running. The EHMC researchers were trying to understand testosterone suppression in athletes. The resistance training endocrinologists were trying to understand anabolic adaptation. The military occupational physiologists were trying to understand load carriage performance. When findings from these unrelated projects converge on a consistent directional inference, that convergence is itself evidence—weaker than a direct test, but not nothing.

The Research Agenda

These gaps are not embarrassments. They are a research agenda.

The hormonal gap requires a crossover RCT: matched men, rucking versus running at matched energy expenditure, dense blood sampling across a morning hormonal window, at baseline and after eight and sixteen weeks. This study is feasible. It requires a sports physiology laboratory with the willingness to conduct seventeen blood draws per subject per session. The cost is real but not prohibitive. The value to the field would be substantial.

The disc hydration gap requires a T2 or T1 rho MRI protocol designed for repeated measurement: participants scanned before and after a sixty-minute rucking session at fifteen percent of body weight, then again the following morning after overnight recovery, with comparison to both unloaded walking and rest conditions. This study would be methodologically straightforward for any sports medicine radiology unit with an interest in exercise physiology. That it has not been done reflects the relative neglect of civilian rucking as a research topic, not any technical barrier.

The gap for women aged forty to seventy requires biomechanical studies that nobody has prioritized because the military, which funds most load carriage research, has limited institutional interest in the exercise physiology of postmenopausal civilians. This is the most politically tractable gap to close: public health funding agencies interested in fall prevention, osteoporosis management, and functional independence in aging women should be actively funding this research, because the population that needs it most is precisely the population that has been most systematically excluded from the biomechanical data.

The nasal breathing during rucking gap requires a simple crossover protocol: the same session, nasal-only versus oronasal breathing, measuring RER, PETCO2, blood lactate, HRV, and if possible IAP through intravesical pressure or ultrasound-measured diaphragmatic excursion, during sixty minutes of loaded walking at a standardized speed and load. This is a graduate-level exercise physiology project. It would resolve, in a single well-designed study, the most directly testable uncertainty in this entire protocol.

The cardiovascular gap for older men requires what the cardiovascular rehabilitation community already knows how to do: twelve-to-twenty-four-week exercise interventions with pre-post VO2max, pulse wave velocity, and echocardiographic assessment, applied to loaded walking rather than cycle ergometry or treadmill running.

The gravitostat gap requires an independent replication by a laboratory outside Gothenburg: matched loads, DXA-based body composition (not bioelectrical impedance), measurement of energy intake and expenditure, at least twelve weeks in duration, with metabolic syndrome biomarkers as secondary endpoints. This study would either confirm a genuinely novel fat-regulation mechanism or redirect the field toward simpler caloric-expenditure explanations for loaded exercise’s body-composition effects.

The vest-to-backpack gap requires a head-to-head crossover study: the same subjects, the same absolute loads, carrying conditions in both a circumferential vest and a posterior backpack with hip transfer, measuring ground reaction forces (particularly loading rates), lumbar spine kinematics via motion capture, metabolic cost, and—in a subgroup willing to undergo imaging—a pre- and post-session MRI of disc hydration. This is a single experiment that would resolve what is currently the load-bearing assumption underlying an entire branch of evidence in this book.

The metabolic syndrome gap requires a trial recruiting ATP-III-defined metabolic syndrome adults, randomizing to rucking at ten to fifteen percent of body weight versus unloaded walking matched for duration versus no-exercise control, measuring the full metabolic panel at twelve and twenty-four weeks. The ELM study design (Powell et al. 2026) provides a template; substituting loaded for unloaded walking is the minimal intervention change needed to answer the core question.

The BDNF-IGF-1-osteocalcin gap requires an acute crossover design: unloaded walking versus loaded walking at fifteen to twenty percent of body weight at matched speeds, with blood draws measuring serum BDNF, IGF-1, plasma osteocalcin, lactate, and irisin pre-exercise and at fifteen and sixty minutes post-exercise. A graduate student with an exercise physiology laboratory, access to an ELISA platform, and a treadmill could run this study. It would be the first direct test of whether the dual-pathway neuroplasticity model applies to the modality this book is built on.

The cardiac safety gap requires what the cardiac rehabilitation community already knows how to design: a Phase II cardiac rehab study adding progressively loaded walking (starting at two to three percent of body weight) to standard aerobic programming, with telemetry monitoring, continuous blood pressure, and rate-pressure product targets. The Martel et al. conference abstracts from 2002 and 2003 suggest this line of work was started and then abandoned. Someone should finish it.

The fall prevention gap requires a twelve-to-eighteen-month RCT in adults over sixty-five with documented fall risk, randomizing to backpack-loaded walking at ten percent of body weight versus standard fall prevention exercise versus control, with incident fall rate as the primary outcome and functional mobility measures as secondary outcomes. This study is within reach of any geriatric exercise research center.

The porter populations gap requires international collaboration: a standardized protocol of metabolic cost measurement, gait analysis, spinal MRI, and longitudinal health outcomes applied to Quechua and Aymara load carriers in Peru and Bolivia, Korean jige carriers, and Vietnamese don ganh workers. The comparison between these populations—who ruck as an economic and survival necessity at self-selected pace—and both military populations (who ruck under compulsion at forced pace) and recreational populations (who ruck by choice) would be among the most illuminating natural experiments available to load carriage science.

The most important study that does not exist, and the one that would most fundamentally validate or invalidate this book’s central thesis, is a prospective cohort study of recreational ruckers with all-cause mortality as the primary outcome. Running has this data (Lee 2014, N=55,137). Resistance training has this data (Saeidifard 2019, meta-analysis). Rucking does not. Until it does, the claim that rucking is “the most complete single modality” remains a mechanistic argument, not an epidemiological one. A registry-based cohort study—recruiting regular ruckers, matching them with non-rucking controls, and following both groups for ten to fifteen years—would be transformative. It is also the study that is least likely to be funded, because the commercial interests that fund exercise research have no revenue model for an activity that requires no equipment subscription.

These studies need to be done. The people who will do them are reading exercise physiology journals and supervising graduate students at universities where the equipment exists and the expertise is available. If this book reaches any of them: here is your research agenda.

Why This Chapter Earns Your Trust

There is a paradox operating in this chapter that deserves to be named directly.

By telling you where the evidence is weakest, this book is asking you to trust the evidence more, not less. The mechanism is straightforward. An author who presents only confident claims gives you no basis for calibrating which claims to hold firmly and which to hold tentatively. An author who distinguishes between a direct RCT finding and a convergent inferential argument gives you a precision instrument: you can weight the evidence appropriately, remain skeptical of the inferential claims, and return to them when the direct evidence eventually arrives. The former author is more persuasive in the short term. The latter author is more useful over time.

Every fitness protocol ever written has evidence gaps. The ones that do not disclose them are not operating from stronger evidence; they are operating with less intellectual honesty. The protocol in this book rests on what it rests on, and what it rests on has been described with as much technical specificity as the evidence allows. Where it is strong, it is strong. Where it is indirect, it has been labeled as indirect. Where it extrapolates from one population to another, the extrapolation has been named.

The practical conservatism embedded in this protocol—the graduated load increases, the emphasis on symptom monitoring, the physician consultation recommendation for older adults and those with cardiovascular histories, the instruction to stop if spinal symptoms emerge and evaluate load before progressing—is not the product of legal caution. It is the product of the gaps described in this chapter. If the disc hydration evidence were direct and robust, the load prescription could be stated with more specificity. If the biomechanical data for women over fifty existed at the resolution it exists for young male soldiers, the female protocol could be more precisely calibrated. In the absence of that data, the responsible position is the conservative one: start with less than you think you need, progress more slowly than you think necessary, and let the absence of symptoms guide the upward adjustments.

The evidence gaps are not a reason to distrust this protocol. They are the reason the protocol is designed the way it is.

You are now in possession of the complete picture. Not just the evidence that supports the argument, but the shape of the argument’s limits—where the ground beneath it is firm and where it is provisional, where you are standing on direct experimental confirmation and where you are standing on the kind of careful inference that science deploys while it waits for the experiments that will eventually either confirm or correct it. That kind of transparency is what an honest book owes its reader.

The man on the trail above Akureyri with a pack on his back and his mouth closed is not waiting for the RCT. He has been at this for years. His spine does not hurt. His sleep is good. His legs carry him further than they did a decade ago. The functional measures—the chair stands, the stair climbs, the absence of the injuries that accumulate in the medical records of his running peers—speak in the language that precedes controlled trials, which is the language of persistent embodied experience across time.

That language is not science. But science, at its best, is a formalization of the questions that embodied experience generates. The questions are good ones. The experiments are coming.

In the meantime, put on the pack.

Alentorn-Geli, E., Samuelsson, K., Musahl, V., Green, C. L., Bhandari, M., & Karlsson, J. (2017). The association of recreational and competitive running with hip and knee osteoarthritis: A systematic review and meta-analysis. Journal of Orthopaedic and Sports Physical Therapy, 47(6), 373–390. https://doi.org/10.2519/jospt.2017.7137

Beavers, D. P., Walkup, M. P., Weaver, A. A., Houston, D. K., Shapses, S. A., Kritchevsky, S. B., & Nicklas, B. J. (2025). Weighted vest use during intentional weight loss in older adults with obesity. JAMA Network Open, 8(7), e2516772. https://doi.org/10.1001/jamanetworkopen.2025.16772

DeHart, N. E. (1999). Relationship between the talk test and ventilatory threshold [Master's thesis]. University of Wisconsin–Milwaukee.

Kwon, Y.-H., Kang, K. W., & Chang, J. S. (2023). The talk test as a useful tool to monitor aerobic exercise intensity in healthy population. Journal of Exercise Rehabilitation, 19(3), 163–169. https://doi.org/10.12965/jer.2346170.085

Lee, D., Pate, R. R., Lavie, C. J., Sui, X., Church, T. S., & Blair, S. N. (2014). Leisure-time running reduces all-cause and cardiovascular mortality risk. Journal of the American College of Cardiology, 64(5), 472–481. https://doi.org/10.1016/j.jacc.2014.04.058

Lee, G.-Y., Seo, H.-D., & Lee, J.-H. (2025). Ventilatory responses to progressive treadmill speeds in nasal, oral, and oronasal breathing conditions. Journal of Exercise Rehabilitation, 21(2), 112–119. https://doi.org/10.12965/jer.2550184.092

Lo, G. H., Driban, J. B., Kriska, A. M., McAlindon, T. E., Souza, R. B., Petersen, N. J., Luo, H., Storti, K. L., Caserotti, P., Eaton, C. B., Hochberg, M. C., Jackson, R. D., Kwoh, C. K., Nevitt, M. C., & Suarez-Almazor, M. E. (2017). Is there an association between a history of running and symptomatic knee osteoarthritis? A cross-sectional study from the Osteoarthritis Initiative. Arthritis Care & Research, 69(2), 183–191. https://doi.org/10.1002/acr.22939

Lo, G. H., Musa, S. M., Driban, J. B., Kriska, A. M., McAlindon, T. E., Souza, R. B., Petersen, N. J., Storti, K. L., Eaton, C. B., Hochberg, M. C., Jackson, R. D., Kwoh, C. K., Nevitt, M. C., & Suarez-Almazor, M. E. (2018). Running does not increase symptoms or structural progression in people with knee osteoarthritis: Data from the Osteoarthritis Initiative. Clinical Rheumatology, 37(9), 2497–2504. https://doi.org/10.1007/s10067-018-4121-3

Mahmod, S. R., Zourdos, M. C., Arent, S. M., et al. (2022). Regulated monosyllabic talk test vs. Counting talk test: Utterance rates and exercise intensity. Journal of Sports Sciences, 40(8), 857–864. https://doi.org/10.1080/02640414.2021.2021776

Shei, R.-J., Chapman, R. F., Gruber, A. H., & Mickleborough, T. D. (2017). Respiratory effects of thoracic load carriage exercise and inspiratory muscle training as a countermeasure. Applied Physiology, Nutrition, and Metabolism, 42(5), 530–537. https://doi.org/10.1139/apnm-2016-0596

Sørensen, M., Larsen, B. J., & Petersen, J. (2020). Validity of the talk test as a method to estimate ventilatory threshold in cardiac patients. European Journal of Preventive Cardiology, 28(15), e22–e24. https://doi.org/10.1177/2047487320939549

Timmins, K. A., Leech, R. D., Batt, M. E., & Edwards, K. L. (2017). Running and knee osteoarthritis: A systematic review and meta-analysis. American Journal of Sports Medicine, 45(6), 1447–1457. https://doi.org/10.1177/0363546516657531