Controlled load field
Research project
CogniLoad
Separating intrinsic load, extraneous load, and length demand in LLM reasoning
Core question
Can we disentangle how much task length, intrinsic difficulty, and distractor ratio each contribute to success or failure in multi-step reasoning?

01
Intrinsic load
Contour ridges mark local steps where entities, attributes, and conditions interact.
02
Extraneous load
Distractor fields are valid statements that do not belong to the queried state path.
03
Length demand
The path gets longer, so the relevant state has to survive more sequential updates.
How to read it
Independent load controls
CogniLoad keeps sequence length, local difficulty, and distractor ratio separable before asking for the final state.
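One way to picture how length and distractor ratio stay separable is a toy calculation that splits a sequence into on-path and distractor statements. This is only a sketch under an assumption this page does not state: that rho is the percentage of statements belonging to the queried state path. The function name `split_statements` is hypothetical and is not part of CogniLoad.

```python
def split_statements(n: int, rho: int) -> tuple[int, int]:
    """Split n statements into (on_path, distractors).

    ASSUMPTION: rho is the percentage of statements that lie on the
    queried state path, so a lower rho leaves more distractors.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if not 0 <= rho <= 100:
        raise ValueError("rho must be a percentage between 0 and 100")
    # Integer arithmetic with round-half-up avoids float rounding surprises.
    on_path = (n * rho + 50) // 100
    return on_path, n - on_path


# Same length, different distractor ratio: only the split changes.
print(split_statements(100, 25))
print(split_statements(100, 75))
```

Under this assumption, changing rho at fixed length only moves statements between the two groups, and changing length at fixed rho scales both groups together.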

Factorial grid
Difficulty is a coordinate, not one number.
CogniLoad samples puzzles across three independent cognitive-load dimensions, then asks whether the final queried state can still be recovered.
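The grid can be sketched as a Cartesian product of the three level sets listed in this section. The sketch assumes every combination is used (a full factorial design), which the heading suggests but the page does not confirm. The names `LoadConfig` and `factorial_grid` are illustrative, not CogniLoad's API.

```python
from itertools import product
from typing import NamedTuple

# Level sets as listed on this page.
D_LEVELS = (1, 3, 5, 7, 10)                # intrinsic difficulty d
RHO_LEVELS = (5, 10, 25, 50, 75, 90, 95)   # distractor ratio rho
N_LEVELS = (20, 50, 100, 250)              # task length N


class LoadConfig(NamedTuple):
    """One coordinate in the (d, rho, N) load space (hypothetical type)."""
    d: int
    rho: int
    n: int


def factorial_grid() -> list[LoadConfig]:
    """Enumerate every (d, rho, N) combination, assuming a full cross."""
    return [LoadConfig(d, rho, n)
            for d, rho, n in product(D_LEVELS, RHO_LEVELS, N_LEVELS)]


grid = factorial_grid()
print(len(grid))  # 5 * 7 * 4 = 140 cells
```

Treating each puzzle as a coordinate makes it possible to hold two dimensions fixed and vary the third.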
d
Intrinsic cognitive load
Intrinsic difficulty
How many entities, attributes, and conditions interact inside each reasoning step.
d in {1, 3, 5, 7, 10}
rho
Extraneous cognitive load
Distractor density
How much irrelevant but valid state information surrounds the queried path. Lower rho means denser distractors; intermediate rho creates the hardest filtering regime.
rho in {5, 10, 25, 50, 75, 90, 95}
N
Length / germane-demand proxy
Task length
How many sequential statements must be processed before the answer can be recovered.
N in {20, 50, 100, 250}
What it reveals
A benchmark score becomes a failure fingerprint.

Task length is the dominant stressor: moving from short to long sequences causes the largest degradation.
Intermediate distractor ratios create a characteristic U-shaped response because filtering and state updating pull in opposite directions.
Under high load, many failures are state-tracking mistakes rather than formatting errors or unsolvable instances.
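A fingerprint of this kind could be read off per-puzzle results by averaging accuracy along one dimension at a time. The sketch below assumes a hypothetical record layout (dicts with keys `d`, `rho`, `n`, and `correct`). The records in the usage example are placeholders that show the call shape, not benchmark results.

```python
from collections import defaultdict


def marginal_accuracy(records: list[dict], factor: str) -> dict[int, float]:
    """Mean accuracy at each level of one load dimension.

    `factor` is one of "d", "rho", "n" (hypothetical record keys).
    """
    totals: dict[int, list[int]] = defaultdict(lambda: [0, 0])
    for record in records:
        bucket = totals[record[factor]]
        bucket[0] += int(record["correct"])  # number solved
        bucket[1] += 1                       # number attempted
    return {level: solved / attempted
            for level, (solved, attempted) in sorted(totals.items())}


# Placeholder records only; these are NOT real CogniLoad results.
records = [
    {"d": 1, "rho": 50, "n": 20, "correct": True},
    {"d": 1, "rho": 50, "n": 250, "correct": False},
    {"d": 5, "rho": 50, "n": 20, "correct": True},
    {"d": 5, "rho": 50, "n": 250, "correct": True},
]
print(marginal_accuracy(records, "n"))  # {20: 1.0, 250: 0.5}
```

Comparing the three marginal curves side by side shows which dimension accuracy is most sensitive to for a given model.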

What it separates
Failure fingerprints
A wrong answer can be traced to pressure from long context, local constraint ridges, or distractor interference.
CogniLoad is a synthetic natural-language reasoning benchmark grounded in cognitive load theory. It turns benchmark difficulty into three independently controlled load dimensions instead of one blended score.