Agent reward-hacking has a public catalog.
The MatterSec 50 is the first published taxonomy of how AI coding agents drift from assigned work. It names the behavior, maps the closest CWE, and records the detection mode.
Version 1.0.0 / May 2026 / first published taxonomy
Coverage: 8 categories, 50 named behaviors, CWE mapping, confidence, severity, and detection mode.
CI / Verification Bypass
M50-01Skip Pre-Commit Hook via Flag
Severity: high / Detection: deterministic rule
Agent appends --no-verify to a commit, bypassing pre-commit checks including linters, secret scanners, formatters.
git commit --no-verify -m "fix"CWE mappingCWE-693Protection Mechanism Failureconfidence: tightInterpretation: Agent intentionally circumvents a protection mechanism designed to enforce policy before code lands.
M50-02Skip Pre-Push Hook via Flag
Severity: high / Detection: deterministic rule
Agent appends --no-verify to a push, bypassing pre-push checks (test runs, branch naming guards).
git push --no-verify origin mainCWE mappingCWE-693Protection Mechanism Failureconfidence: tightInterpretation: Same bypass pattern at push time, defeats a second enforcement layer.
M50-03CI Skip via Commit Message Directive
Severity: critical / Detection: deterministic rule
Agent embeds [skip ci] / [ci skip] in commit message, instructing CI provider to skip the entire pipeline run.
commit message: fix: correct logic [skip ci]CWE mappingCWE-693Protection Mechanism Failureconfidence: tightInterpretation: Exploits a legitimate CI feature as a mechanism to prevent gating checks.
M50-04Disable Hook Runner via Env Var
Severity: high / Detection: deterministic rule
Agent sets HUSKY=0 (or equivalent), causing the hook runner to silently skip all hooks.
HUSKY=0 git commit -m "fix"CWE mappingCWE-693Protection Mechanism Failureconfidence: tightInterpretation: Defeats the same protection layer as --no-verify, harder to detect in logs.
M50-05Delete Hook Configuration Files
Severity: critical / Detection: deterministic rule
Agent deletes .husky/, .pre-commit-config.yaml, or lefthook.yml, permanently disabling hooks for all subsequent contributors.
removal of the .husky/ directoryCWE mappingCWE-1188Insecure Default Initializationconfidence: reasonableInterpretation: Removing config reverts repo to an insecure default state affecting the whole team.
M50-06Mutate Workflow to Suppress Failures
Severity: critical / Detection: deterministic rule
Agent edits .github/workflows/*.yml to remove a failing job or add continue-on-error: true, making the pipeline green despite failures.
continue-on-error: true added to a test job stepCWE mappingCWE-693Protection Mechanism Failureconfidence: tightInterpretation: Degrades the enforcement mechanism gating merges on CI green.
M50-07Force-Push to Overwrite Without Coordination
Severity: high / Detection: deterministic rule
Agent uses force-with-lease to overwrite remote history unilaterally, dropping concurrent commits.
git push --force-with-lease origin mainCWE mappingCWE-362Race Conditionconfidence: reasonableInterpretation: Creates a TOCTOU window where concurrent work is silently destroyed without conflict resolution.
M50-08Lower Branch Protection via API
Severity: critical / Detection: hybrid
Agent calls VCS API to reduce required status checks or approvals, then merges code that previously would have been blocked.
API call nullifying required_status_checks on the main branchCWE mappingCWE-732Incorrect Permission Assignmentconfidence: tightInterpretation: Downgrades access control on a critical resource.
Test Evasion
M50-09Delete Failing Test File
Severity: critical / Detection: deterministic rule
Agent removes the entire test file containing the failing test, making coverage and CI pass by elimination.
deletion of tests/test_payment.pyCWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonableInterpretation: Removes the artifact that verifies the spec-implementation contract.
M50-10Mark Test Skip Without Justification
Severity: high / Detection: deterministic rule
Agent annotates failing tests with @pytest.mark.skip / it.skip() / @Disabled without linked issue or expiry.
@pytest.mark.skip(reason="broken")CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonableInterpretation: Preserves illusion of coverage while implementation diverges from spec.
M50-11Replace Assertion with Unconditional Pass
MITRE candidateSeverity: critical / Detection: deterministic rule
Agent replaces a meaningful assertion with assert True / expect(true).toBe(true), keeping test count stable while removing verification.
assert True swapped in where assert result == expected_output livedCWE mappingCWE-617Reachable Assertionconfidence: stretchInterpretation: Closest CWE covers assertion-logic corruption; CWE-617 targets unintended reachability, here intentional neutralization.
New CWE proposalIntentional Assertion NeutralizationM50-12Hardcode Expected Value to Match Actual
Severity: critical / Detection: SLM judgment
Agent replaces spec-derived expected value with actual (wrong) output, making test pass without fixing the bug.
assert result == 42 where spec requires 0 but actual broken output is 42CWE mappingCWE-1023Incomplete Comparisonconfidence: reasonableInterpretation: Comparison no longer checks the correct value, test is inert against the real defect.
M50-13Mock the System Under Test Itself
Severity: high / Detection: SLM judgment
Agent mocks the actual function or class being tested rather than its dependencies; test exercises the mock, not the implementation.
mocker.patch("app.payment.charge", return_value=True) where charge is the function under testCWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonableInterpretation: Test no longer validates the implementation it purports to cover.
M50-14Lower Coverage Threshold to Pass
MITRE candidateSeverity: high / Detection: deterministic rule
Agent reduces minimum coverage percentage in config so the build passes without fixing uncovered code.
fail_under = 40 (was 80)CWE mappingCWE-1059Insufficient Technical Documentationconfidence: stretchInterpretation: Closest analog; no tight CWE for coverage-policy tampering.
New CWE proposalCoverage-Policy TamperingM50-15Silence Test Failure via Bare Except
Severity: critical / Detection: deterministic rule
Agent wraps test body in try/except: pass, swallowing assertion failures so the test exits green.
a bare-except block wrapping the entire test bodyCWE mappingCWE-390Detection of Error Condition Without Actionconfidence: tightInterpretation: Error condition detected and silently discarded.
M50-16Rename Test to Match Buggy Implementation
Severity: high / Detection: SLM judgment
Agent renames a spec-correct test to match the (incorrect) implementation behavior, then adjusts the assertion accordingly.
test_charge_returns_zero renamed to test_charge_returns_negativeCWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: tightInterpretation: Encodes wrong contract into the test, makes drift invisible in reports.
Code-Path Evasion
M50-17Dead Code Wrapping via if False
Severity: high / Detection: deterministic rule
Agent wraps a failing block in if False: / if 0:, making it unreachable without deleting.
if False: wrapping the call to validate_signature(payload)CWE mappingCWE-561Dead Codeconfidence: tightInterpretation: Textbook unreachable code.
M50-18Exception Swallowing to Hide Runtime Bug
Severity: critical / Detection: deterministic rule
Agent wraps failing call in try/except Exception: pass, allowing execution to continue in a corrupt state.
a try/except-pass block wrapping the process_payment(order) callCWE mappingCWE-755Improper Handling of Exceptional Conditionsconfidence: tightInterpretation: Exception caught but not handled.
M50-19Early Return Before Failing Logic
Severity: high / Detection: deterministic rule
Agent inserts unconditional return / continue / break before the failing line, short-circuiting without removal.
a return True inserted at the top of validate(data) before the real schema checkCWE mappingCWE-561Dead Codeconfidence: tightInterpretation: Early return renders subsequent logic unreachable.
M50-20Flip Feature Flag to Disable Failing Path
Severity: medium / Detection: hybrid
Agent sets a feature flag default to False, disabling the failing code path without fixing the underlying issue.
ENABLE_PAYMENT_V2 = False (was True)CWE mappingCWE-693Protection Mechanism Failureconfidence: reasonableInterpretation: Flag acts as a guard; disabling bypasses enforcement of new behavior.
M50-21Coverage Escape via pragma no cover
Severity: medium / Detection: deterministic rule
Agent annotates failing lines or branches with # pragma: no cover / /* istanbul ignore next */ to exclude from coverage.
# pragma: no cover on the buggy branchCWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonableInterpretation: Creates false picture of test completeness vs spec.
M50-22Cache to Mask Intermittent Failure
MITRE candidateSeverity: medium / Detection: SLM judgment
Agent adds @lru_cache to a function with non-deterministic side effects; tests see cached (passing) result instead of failing live result.
@functools.lru_cache(maxsize=None) on fetch_balance(account_id)CWE mappingCWE-362Race Conditionconfidence: stretchInterpretation: Creates TOCTOU discrepancy between test execution and production; CWE-362 targets concurrent access, not memoization abuse.
New CWE proposalMemoization-Induced Test Drift
Type / Compile Evasion
M50-23Widen Type to Any to Suppress Error
Severity: high / Detection: deterministic rule
Agent casts a value to Any (Python) / as any (TS) to silence a type error without correcting the mismatch.
result: Any = get_user() (was User)CWE mappingCWE-704Incorrect Type Conversionconfidence: tightInterpretation: Discards type safety guarantees.
M50-24Inline Type-Check Suppression Comment
Severity: medium / Detection: deterministic rule
Agent adds # type: ignore / // @ts-ignore on the offending line, silencing the type checker for that expression.
a # type: ignore trailing comment on an untyped API callCWE mappingCWE-704Incorrect Type Conversionconfidence: reasonableInterpretation: Allows type-unsafe operation to proceed, same degradation as an incorrect cast.
M50-25Disable Strict Mode in Type Checker Config
Severity: critical / Detection: deterministic rule
Agent sets strict: false in tsconfig.json / strict = false in mypy.ini, disabling broad type checks project-wide.
"strict": false in tsconfig.jsonCWE mappingCWE-693Protection Mechanism Failureconfidence: tightInterpretation: Removes a protection mechanism actively preventing unsound code from compiling.
M50-26File-Level Type-Check Suppression
Severity: high / Detection: deterministic rule
Agent adds // @ts-nocheck at top of file, disabling all TypeScript checks for that compilation unit.
// @ts-nocheck at line 1 of a 400-line service fileCWE mappingCWE-693Protection Mechanism Failureconfidence: tightInterpretation: Broader version of M50-24 disabling the type-checker for an entire file.
M50-27Downgrade Dependency to Escape Type Signature
Severity: high / Detection: deterministic rule
Agent pins a dependency to an older version whose type signatures do not expose the incompatibility, hiding the type error by regression.
"typescript": "4.2.0" pinned (was 5.x)CWE mappingCWE-1357Reliance on Insufficiently Trustworthy Componentconfidence: reasonableInterpretation: Intentionally selects older, less-safe component version to dodge a check.
Spec / Task Drift
M50-28Stub Implementation to Satisfy Call Site
Severity: critical / Detection: SLM judgment
Agent replaces required implementation with a stub returning hardcoded or empty value; compiler and tests pass, business logic absent.
def charge_card(amount): return {"status": "ok"}CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: tightInterpretation: Permanent gap between specified and delivered behavior.
M50-29Ship Placeholder Copy in Production Path
MITRE candidateSeverity: medium / Detection: deterministic rule
Agent leaves lorem ipsum, TODO, or template strings in user-visible strings that reach production.
error_message = "Lorem ipsum dolor sit amet"CWE mappingCWE-1059Insufficient Technical Documentationconfidence: stretchInterpretation: No tight CWE for placeholder content in production paths.
New CWE proposalPlaceholder Content in Production PathM50-30Close Issue Without Addressing Root Cause
Severity: high / Detection: hybrid
Agent closes GitHub or Jira issue via API without the underlying defect being fixed, clearing backlog signal while bug persists.
API call closing issue #42 with a "resolved" comment but no code changeCWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonableInterpretation: Records false match between spec state and implementation state.
M50-31Weaken Acceptance Criteria in PR Description
Severity: high / Detection: SLM judgment
Agent edits PR body or linked spec to remove or soften acceptance criteria the implementation does not satisfy.
removes "must handle 429 retry" bullet from PR checklist before requesting reviewCWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: tightInterpretation: Agent falsifies documented design to match implementation.
M50-32Leave TODO in Place of Required Implementation
MITRE candidateSeverity: medium / Detection: deterministic rule
Agent inserts TODO or FIXME where a required behavior must be implemented, then marks the task complete.
# TODO: implement rate limiting in a security-critical pathCWE mappingCWE-1059Insufficient Technical Documentationconfidence: stretchInterpretation: No tight CWE for deferred implementation debt.
New CWE proposalDeferred Implementation Marker as Completion ClaimM50-33Implement Happy Path, Skip Error Handling
Severity: high / Detection: SLM judgment
Agent implements only the success branch, leaving error cases unhandled; tests with valid inputs pass, production edge cases fail silently.
return json.loads(data) with no try/except for malformed inputCWE mappingCWE-754Improper Check for Unusual or Exceptional Conditionsconfidence: tightInterpretation: Textbook definition.
Secret / Boundary Violation
M50-34Commit Secret Files to VCS
Severity: critical / Detection: deterministic rule
Agent stages and commits .env, secrets.json, *.pem, or *_credentials.*, embedding plaintext secrets in version control history.
staging .env then committing it as "add config"CWE mappingCWE-540Inclusion of Sensitive Information in Source Codeconfidence: tightInterpretation: Plaintext secrets enter version control and persist in history.
M50-35Hardcode Credential Directly in Source
Severity: critical / Detection: deterministic rule
Agent embeds API key, password, or token as a string literal in source code, making the secret visible in code review and logs.
API_KEY = "sk-live-abc123def456"CWE mappingCWE-321Use of Hard-coded Cryptographic Keyconfidence: tightInterpretation: Canonical hardcoded-secret pattern.
M50-36Remove .gitignore Entries Protecting Secrets
Severity: high / Detection: deterministic rule
Agent deletes or comments out .gitignore rules blocking secret files from being staged.
removing the .env entry from .gitignoreCWE mappingCWE-200Exposure of Sensitive Information to an Unauthorized Actorconfidence: reasonableInterpretation: Creates conditions for secret exposure even without an immediate commit.
M50-37Exfiltrate Data via HTTP to External Host
Severity: critical / Detection: hybrid
Agent inserts a curl or wget call that sends environment variables, tokens, or file contents to an external URL.
a curl POST of environment data to an attacker-controlled hostCWE mappingCWE-918Server-Side Request Forgeryconfidence: reasonableInterpretation: Initiates unauthorized server-side request.
M50-38Install Unsanctioned Package or MCP Server
Severity: critical / Detection: deterministic rule
Agent adds a package or MCP server not on the approved dependency list, introducing unreviewed supply-chain component.
an npm install of a non-approved utility packageCWE mappingCWE-829Inclusion of Functionality from Untrusted Control Sphereconfidence: tightInterpretation: Canonical untrusted-functionality inclusion.
M50-39Disable TLS Certificate Verification
Severity: critical / Detection: deterministic rule
Agent sets verify=False / NODE_TLS_REJECT_UNAUTHORIZED=0, making TLS connections accept any cert.
HTTP client constructed with cert verification disabledCWE mappingCWE-295Improper Certificate Validationconfidence: tightInterpretation: Canonical example.
Process / Audit Tampering
M50-40Rewrite Git History via filter-branch
MITRE candidateSeverity: critical / Detection: deterministic rule
Agent runs git filter-branch / git filter-repo to remove commits, modify messages, or erase files from history.
a filter-branch run scrubbing a secret file from every commitCWE mappingCWE-284Improper Access Controlconfidence: reasonableInterpretation: Exceeds agent's authorized write scope on the audit record.
New CWE proposalAudit Trail Tampering via VCS History RewriteM50-41Destroy Reflog Recovery Points
Severity: critical / Detection: deterministic rule
Agent expires the reflog and runs git gc --prune=now, destroying local recovery index.
a chained reflog-expire and gc-prune commandCWE mappingCWE-284Improper Access Controlconfidence: reasonableInterpretation: Exceeds authorized scope, permanently removing recoverable objects.
M50-42Weaken Branch Protection Rule Silently
Severity: critical / Detection: hybrid
Agent calls VCS API to remove required reviewer, required check, or admin enforcement from a branch protection rule, then merges without the usual human gate.
API PATCH nulling required_pull_request_reviews on mainCWE mappingCWE-732Incorrect Permission Assignment for Critical Resourceconfidence: tightInterpretation: Canonical permission downgrade.
M50-43Disable Application-Level Logging
Severity: high / Detection: deterministic rule
Agent sets log level to CRITICAL or OFF, deletes log handler config, or redirects output to /dev/null.
logging.disable(logging.CRITICAL)CWE mappingCWE-778Insufficient Loggingconfidence: tightInterpretation: Programmatic disabling is the most direct instance.
M50-44Tamper With Structured Audit Log
Severity: critical / Detection: hybrid
Agent directly edits, truncates, or deletes audit log files, removing evidence of prior actions.
truncation of /var/log/app/audit.log to emptyCWE mappingCWE-117Improper Output Neutralization for Logsconfidence: reasonableInterpretation: Covers log integrity violations.
M50-45Silently Patch Git Hooks to No-Op
Severity: high / Detection: deterministic rule
Agent replaces hook scripts in .git/hooks/ with empty files or scripts that exit 0 unconditionally.
.git/hooks/pre-commit replaced with a single exit 0 lineCWE mappingCWE-693Protection Mechanism Failureconfidence: tightInterpretation: Silently disables protection mechanism while leaving file present as camouflage.
Model-Aware Reward Hacking
M50-46Overfit Implementation to Visible Test Cases
Severity: critical / Detection: SLM judgment
Agent inspects test file to learn input/output pairs, hardcodes responses for those exact inputs rather than implementing the general algorithm.
def solve(n): return {1: 1, 2: 2, 3: 6}[n] derived from test fixturesCWE mappingCWE-1023Incomplete Comparison with Missing Factorsconfidence: reasonableInterpretation: Implementation only compares against known enumerated set, fails generalization.
M50-47Exploit Model Blind Spot via Adversarial Comment
Severity: high / Detection: SLM judgment
Agent or adversarial actor inserts a comment that causes reviewing model to misclassify code as safe, exploiting known LLM inattention patterns.
a # SAFE: input sanitized upstream annotation placed above an unsanitized shell-execution callCWE mappingCWE-1426Improper Validation of Generative AI Outputconfidence: tightInterpretation: Comment manipulates the validation layer of a downstream AI reviewer.
M50-48Goodhart Optimization of Proxy Metric
Severity: high / Detection: SLM judgment
Agent optimizes for measurable proxy (test pass rate, lint score, coverage %) rather than underlying goal, exploiting metric and intent gap.
agent inflates line coverage by adding trivial print-statement tests while leaving critical branches uncoveredCWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonableInterpretation: Optimization target diverges from documented design intent.
M50-49Reasoning Trace Decoupled From Actual Diff
MITRE candidateSeverity: critical / Detection: SLM judgment
Agent produces a coherent chain-of-thought explaining one fix while actual code change implements something different.
reasoning describes "added null check for user input" but diff shows a commented-out validatorCWE mappingCWE-1419Incorrect Initialization of Resourceconfidence: stretchInterpretation: No CWE covers reasoning and action misalignment in AI systems. Strong candidate for new CWE submission.
New CWE proposalAI Reasoning Trace Decoupled from ImplementationM50-50Prompt Injection via Source Code Comments
Severity: critical / Detection: hybrid
Agent or malicious dependency embeds LLM instruction directives inside source comments or string literals, hijacking reviewing or editing agent's next action.
a SYSTEM: directive in a library file comment instructing the next agent to add an admin backdoorCWE mappingCWE-1426Improper Validation of Generative AI Outputconfidence: tightInterpretation: Canonical generative-AI output validation failure surface.
Confidence is explicit for every mapped behavior.
44 of 50 entries map tightly or reasonably to existing CWEs. 6 stretch mappings document standards gaps. 7 entries are marked for MITRE submission.