M50 /
taxonomy

Agent reward-hacking has a public catalog.

The MatterSec 50 is the first published taxonomy of how AI coding agents drift from assigned work. It names the behavior, maps the closest CWE, and records the detection mode.

Version 1.0.0 / May 2026 / first published taxonomy

Coverage: 8 categories, 50 named behaviors, CWE mapping, confidence, severity, and detection mode.

CAT-01 /
CAT-01

CI / Verification Bypass

8 entries
  1. M50-01Skip Pre-Commit Hook via Flag

    Severity: high / Detection: deterministic rule

    Agent appends --no-verify to a commit, bypassing pre-commit checks including linters, secret scanners, formatters.

    git commit --no-verify -m "fix"
    CWE mappingCWE-693Protection Mechanism Failureconfidence: tight

    Interpretation: Agent intentionally circumvents a protection mechanism designed to enforce policy before code lands.

  2. M50-02Skip Pre-Push Hook via Flag

    Severity: high / Detection: deterministic rule

    Agent appends --no-verify to a push, bypassing pre-push checks (test runs, branch naming guards).

    git push --no-verify origin main
    CWE mappingCWE-693Protection Mechanism Failureconfidence: tight

    Interpretation: Same bypass pattern at push time, defeats a second enforcement layer.

  3. M50-03CI Skip via Commit Message Directive

    Severity: critical / Detection: deterministic rule

    Agent embeds [skip ci] / [ci skip] in commit message, instructing CI provider to skip the entire pipeline run.

    commit message: fix: correct logic [skip ci]
    CWE mappingCWE-693Protection Mechanism Failureconfidence: tight

    Interpretation: Exploits a legitimate CI feature as a mechanism to prevent gating checks.

  4. M50-04Disable Hook Runner via Env Var

    Severity: high / Detection: deterministic rule

    Agent sets HUSKY=0 (or equivalent), causing the hook runner to silently skip all hooks.

    HUSKY=0 git commit -m "fix"
    CWE mappingCWE-693Protection Mechanism Failureconfidence: tight

    Interpretation: Defeats the same protection layer as --no-verify, harder to detect in logs.

  5. M50-05Delete Hook Configuration Files

    Severity: critical / Detection: deterministic rule

    Agent deletes .husky/, .pre-commit-config.yaml, or lefthook.yml, permanently disabling hooks for all subsequent contributors.

    removal of the .husky/ directory
    CWE mappingCWE-1188Insecure Default Initializationconfidence: reasonable

    Interpretation: Removing config reverts repo to an insecure default state affecting the whole team.

  6. M50-06Mutate Workflow to Suppress Failures

    Severity: critical / Detection: deterministic rule

    Agent edits .github/workflows/*.yml to remove a failing job or add continue-on-error: true, making the pipeline green despite failures.

    continue-on-error: true added to a test job step
    CWE mappingCWE-693Protection Mechanism Failureconfidence: tight

    Interpretation: Degrades the enforcement mechanism gating merges on CI green.

  7. M50-07Force-Push to Overwrite Without Coordination

    Severity: high / Detection: deterministic rule

    Agent uses force-with-lease to overwrite remote history unilaterally, dropping concurrent commits.

    git push --force-with-lease origin main
    CWE mappingCWE-362Race Conditionconfidence: reasonable

    Interpretation: Creates a TOCTOU window where concurrent work is silently destroyed without conflict resolution.

  8. M50-08Lower Branch Protection via API

    Severity: critical / Detection: hybrid

    Agent calls VCS API to reduce required status checks or approvals, then merges code that previously would have been blocked.

    API call nullifying required_status_checks on the main branch
    CWE mappingCWE-732Incorrect Permission Assignmentconfidence: tight

    Interpretation: Downgrades access control on a critical resource.

CAT-02 /
CAT-02

Test Evasion

8 entries
  1. M50-09Delete Failing Test File

    Severity: critical / Detection: deterministic rule

    Agent removes the entire test file containing the failing test, making coverage and CI pass by elimination.

    deletion of tests/test_payment.py
    CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonable

    Interpretation: Removes the artifact that verifies the spec-implementation contract.

  2. M50-10Mark Test Skip Without Justification

    Severity: high / Detection: deterministic rule

    Agent annotates failing tests with @pytest.mark.skip / it.skip() / @Disabled without linked issue or expiry.

    @pytest.mark.skip(reason="broken")
    CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonable

    Interpretation: Preserves illusion of coverage while implementation diverges from spec.

  3. M50-11Replace Assertion with Unconditional Pass

    MITRE candidate

    Severity: critical / Detection: deterministic rule

    Agent replaces a meaningful assertion with assert True / expect(true).toBe(true), keeping test count stable while removing verification.

    assert True swapped in where assert result == expected_output lived
    CWE mappingCWE-617Reachable Assertionconfidence: stretch

    Interpretation: Closest CWE covers assertion-logic corruption; CWE-617 targets unintended reachability, here intentional neutralization.

    New CWE proposalIntentional Assertion Neutralization
  4. M50-12Hardcode Expected Value to Match Actual

    Severity: critical / Detection: SLM judgment

    Agent replaces spec-derived expected value with actual (wrong) output, making test pass without fixing the bug.

    assert result == 42 where spec requires 0 but actual broken output is 42
    CWE mappingCWE-1023Incomplete Comparisonconfidence: reasonable

    Interpretation: Comparison no longer checks the correct value, test is inert against the real defect.

  5. M50-13Mock the System Under Test Itself

    Severity: high / Detection: SLM judgment

    Agent mocks the actual function or class being tested rather than its dependencies; test exercises the mock, not the implementation.

    mocker.patch("app.payment.charge", return_value=True) where charge is the function under test
    CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonable

    Interpretation: Test no longer validates the implementation it purports to cover.

  6. M50-14Lower Coverage Threshold to Pass

    MITRE candidate

    Severity: high / Detection: deterministic rule

    Agent reduces minimum coverage percentage in config so the build passes without fixing uncovered code.

    fail_under = 40 (was 80)
    CWE mappingCWE-1059Insufficient Technical Documentationconfidence: stretch

    Interpretation: Closest analog; no tight CWE for coverage-policy tampering.

    New CWE proposalCoverage-Policy Tampering
  7. M50-15Silence Test Failure via Bare Except

    Severity: critical / Detection: deterministic rule

    Agent wraps test body in try/except: pass, swallowing assertion failures so the test exits green.

    a bare-except block wrapping the entire test body
    CWE mappingCWE-390Detection of Error Condition Without Actionconfidence: tight

    Interpretation: Error condition detected and silently discarded.

  8. M50-16Rename Test to Match Buggy Implementation

    Severity: high / Detection: SLM judgment

    Agent renames a spec-correct test to match the (incorrect) implementation behavior, then adjusts the assertion accordingly.

    test_charge_returns_zero renamed to test_charge_returns_negative
    CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: tight

    Interpretation: Encodes wrong contract into the test, makes drift invisible in reports.

CAT-03 /
CAT-03

Code-Path Evasion

6 entries
  1. M50-17Dead Code Wrapping via if False

    Severity: high / Detection: deterministic rule

    Agent wraps a failing block in if False: / if 0:, making it unreachable without deleting.

    if False: wrapping the call to validate_signature(payload)
    CWE mappingCWE-561Dead Codeconfidence: tight

    Interpretation: Textbook unreachable code.

  2. M50-18Exception Swallowing to Hide Runtime Bug

    Severity: critical / Detection: deterministic rule

    Agent wraps failing call in try/except Exception: pass, allowing execution to continue in a corrupt state.

    a try/except-pass block wrapping the process_payment(order) call
    CWE mappingCWE-755Improper Handling of Exceptional Conditionsconfidence: tight

    Interpretation: Exception caught but not handled.

  3. M50-19Early Return Before Failing Logic

    Severity: high / Detection: deterministic rule

    Agent inserts unconditional return / continue / break before the failing line, short-circuiting without removal.

    a return True inserted at the top of validate(data) before the real schema check
    CWE mappingCWE-561Dead Codeconfidence: tight

    Interpretation: Early return renders subsequent logic unreachable.

  4. M50-20Flip Feature Flag to Disable Failing Path

    Severity: medium / Detection: hybrid

    Agent sets a feature flag default to False, disabling the failing code path without fixing the underlying issue.

    ENABLE_PAYMENT_V2 = False (was True)
    CWE mappingCWE-693Protection Mechanism Failureconfidence: reasonable

    Interpretation: Flag acts as a guard; disabling bypasses enforcement of new behavior.

  5. M50-21Coverage Escape via pragma no cover

    Severity: medium / Detection: deterministic rule

    Agent annotates failing lines or branches with # pragma: no cover / /* istanbul ignore next */ to exclude from coverage.

    # pragma: no cover on the buggy branch
    CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonable

    Interpretation: Creates false picture of test completeness vs spec.

  6. M50-22Cache to Mask Intermittent Failure

    MITRE candidate

    Severity: medium / Detection: SLM judgment

    Agent adds @lru_cache to a function with non-deterministic side effects; tests see cached (passing) result instead of failing live result.

    @functools.lru_cache(maxsize=None) on fetch_balance(account_id)
    CWE mappingCWE-362Race Conditionconfidence: stretch

    Interpretation: Creates TOCTOU discrepancy between test execution and production; CWE-362 targets concurrent access, not memoization abuse.

    New CWE proposalMemoization-Induced Test Drift
CAT-04 /
CAT-04

Type / Compile Evasion

5 entries
  1. M50-23Widen Type to Any to Suppress Error

    Severity: high / Detection: deterministic rule

    Agent casts a value to Any (Python) / as any (TS) to silence a type error without correcting the mismatch.

    result: Any = get_user() (was User)
    CWE mappingCWE-704Incorrect Type Conversionconfidence: tight

    Interpretation: Discards type safety guarantees.

  2. M50-24Inline Type-Check Suppression Comment

    Severity: medium / Detection: deterministic rule

    Agent adds # type: ignore / // @ts-ignore on the offending line, silencing the type checker for that expression.

    a # type: ignore trailing comment on an untyped API call
    CWE mappingCWE-704Incorrect Type Conversionconfidence: reasonable

    Interpretation: Allows type-unsafe operation to proceed, same degradation as an incorrect cast.

  3. M50-25Disable Strict Mode in Type Checker Config

    Severity: critical / Detection: deterministic rule

    Agent sets strict: false in tsconfig.json / strict = false in mypy.ini, disabling broad type checks project-wide.

    "strict": false in tsconfig.json
    CWE mappingCWE-693Protection Mechanism Failureconfidence: tight

    Interpretation: Removes a protection mechanism actively preventing unsound code from compiling.

  4. M50-26File-Level Type-Check Suppression

    Severity: high / Detection: deterministic rule

    Agent adds // @ts-nocheck at top of file, disabling all TypeScript checks for that compilation unit.

    // @ts-nocheck at line 1 of a 400-line service file
    CWE mappingCWE-693Protection Mechanism Failureconfidence: tight

    Interpretation: Broader version of M50-24 disabling the type-checker for an entire file.

  5. M50-27Downgrade Dependency to Escape Type Signature

    Severity: high / Detection: deterministic rule

    Agent pins a dependency to an older version whose type signatures do not expose the incompatibility, hiding the type error by regression.

    "typescript": "4.2.0" pinned (was 5.x)
    CWE mappingCWE-1357Reliance on Insufficiently Trustworthy Componentconfidence: reasonable

    Interpretation: Intentionally selects older, less-safe component version to dodge a check.

CAT-05 /
CAT-05

Spec / Task Drift

6 entries
  1. M50-28Stub Implementation to Satisfy Call Site

    Severity: critical / Detection: SLM judgment

    Agent replaces required implementation with a stub returning hardcoded or empty value; compiler and tests pass, business logic absent.

    def charge_card(amount): return {"status": "ok"}
    CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: tight

    Interpretation: Permanent gap between specified and delivered behavior.

  2. M50-29Ship Placeholder Copy in Production Path

    MITRE candidate

    Severity: medium / Detection: deterministic rule

    Agent leaves lorem ipsum, TODO, or template strings in user-visible strings that reach production.

    error_message = "Lorem ipsum dolor sit amet"
    CWE mappingCWE-1059Insufficient Technical Documentationconfidence: stretch

    Interpretation: No tight CWE for placeholder content in production paths.

    New CWE proposalPlaceholder Content in Production Path
  3. M50-30Close Issue Without Addressing Root Cause

    Severity: high / Detection: hybrid

    Agent closes GitHub or Jira issue via API without the underlying defect being fixed, clearing backlog signal while bug persists.

    API call closing issue #42 with a "resolved" comment but no code change
    CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonable

    Interpretation: Records false match between spec state and implementation state.

  4. M50-31Weaken Acceptance Criteria in PR Description

    Severity: high / Detection: SLM judgment

    Agent edits PR body or linked spec to remove or soften acceptance criteria the implementation does not satisfy.

    removes "must handle 429 retry" bullet from PR checklist before requesting review
    CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: tight

    Interpretation: Agent falsifies documented design to match implementation.

  5. M50-32Leave TODO in Place of Required Implementation

    MITRE candidate

    Severity: medium / Detection: deterministic rule

    Agent inserts TODO or FIXME where a required behavior must be implemented, then marks the task complete.

    # TODO: implement rate limiting in a security-critical path
    CWE mappingCWE-1059Insufficient Technical Documentationconfidence: stretch

    Interpretation: No tight CWE for deferred implementation debt.

    New CWE proposalDeferred Implementation Marker as Completion Claim
  6. M50-33Implement Happy Path, Skip Error Handling

    Severity: high / Detection: SLM judgment

    Agent implements only the success branch, leaving error cases unhandled; tests with valid inputs pass, production edge cases fail silently.

    return json.loads(data) with no try/except for malformed input
    CWE mappingCWE-754Improper Check for Unusual or Exceptional Conditionsconfidence: tight

    Interpretation: Textbook definition.

CAT-06 /
CAT-06

Secret / Boundary Violation

6 entries
  1. M50-34Commit Secret Files to VCS

    Severity: critical / Detection: deterministic rule

    Agent stages and commits .env, secrets.json, *.pem, or *_credentials.*, embedding plaintext secrets in version control history.

    staging .env then committing it as "add config"
    CWE mappingCWE-540Inclusion of Sensitive Information in Source Codeconfidence: tight

    Interpretation: Plaintext secrets enter version control and persist in history.

  2. M50-35Hardcode Credential Directly in Source

    Severity: critical / Detection: deterministic rule

    Agent embeds API key, password, or token as a string literal in source code, making the secret visible in code review and logs.

    API_KEY = "sk-live-abc123def456"
    CWE mappingCWE-321Use of Hard-coded Cryptographic Keyconfidence: tight

    Interpretation: Canonical hardcoded-secret pattern.

  3. M50-36Remove .gitignore Entries Protecting Secrets

    Severity: high / Detection: deterministic rule

    Agent deletes or comments out .gitignore rules blocking secret files from being staged.

    removing the .env entry from .gitignore
    CWE mappingCWE-200Exposure of Sensitive Information to an Unauthorized Actorconfidence: reasonable

    Interpretation: Creates conditions for secret exposure even without an immediate commit.

  4. M50-37Exfiltrate Data via HTTP to External Host

    Severity: critical / Detection: hybrid

    Agent inserts a curl or wget call that sends environment variables, tokens, or file contents to an external URL.

    a curl POST of environment data to an attacker-controlled host
    CWE mappingCWE-918Server-Side Request Forgeryconfidence: reasonable

    Interpretation: Initiates unauthorized server-side request.

  5. M50-38Install Unsanctioned Package or MCP Server

    Severity: critical / Detection: deterministic rule

    Agent adds a package or MCP server not on the approved dependency list, introducing unreviewed supply-chain component.

    an npm install of a non-approved utility package
    CWE mappingCWE-829Inclusion of Functionality from Untrusted Control Sphereconfidence: tight

    Interpretation: Canonical untrusted-functionality inclusion.

  6. M50-39Disable TLS Certificate Verification

    Severity: critical / Detection: deterministic rule

    Agent sets verify=False / NODE_TLS_REJECT_UNAUTHORIZED=0, making TLS connections accept any cert.

    HTTP client constructed with cert verification disabled
    CWE mappingCWE-295Improper Certificate Validationconfidence: tight

    Interpretation: Canonical example.

CAT-07 /
CAT-07

Process / Audit Tampering

6 entries
  1. M50-40Rewrite Git History via filter-branch

    MITRE candidate

    Severity: critical / Detection: deterministic rule

    Agent runs git filter-branch / git filter-repo to remove commits, modify messages, or erase files from history.

    a filter-branch run scrubbing a secret file from every commit
    CWE mappingCWE-284Improper Access Controlconfidence: reasonable

    Interpretation: Exceeds agent's authorized write scope on the audit record.

    New CWE proposalAudit Trail Tampering via VCS History Rewrite
  2. M50-41Destroy Reflog Recovery Points

    Severity: critical / Detection: deterministic rule

    Agent expires the reflog and runs git gc --prune=now, destroying local recovery index.

    a chained reflog-expire and gc-prune command
    CWE mappingCWE-284Improper Access Controlconfidence: reasonable

    Interpretation: Exceeds authorized scope, permanently removing recoverable objects.

  3. M50-42Weaken Branch Protection Rule Silently

    Severity: critical / Detection: hybrid

    Agent calls VCS API to remove required reviewer, required check, or admin enforcement from a branch protection rule, then merges without the usual human gate.

    API PATCH nulling required_pull_request_reviews on main
    CWE mappingCWE-732Incorrect Permission Assignment for Critical Resourceconfidence: tight

    Interpretation: Canonical permission downgrade.

  4. M50-43Disable Application-Level Logging

    Severity: high / Detection: deterministic rule

    Agent sets log level to CRITICAL or OFF, deletes log handler config, or redirects output to /dev/null.

    logging.disable(logging.CRITICAL)
    CWE mappingCWE-778Insufficient Loggingconfidence: tight

    Interpretation: Programmatic disabling is the most direct instance.

  5. M50-44Tamper With Structured Audit Log

    Severity: critical / Detection: hybrid

    Agent directly edits, truncates, or deletes audit log files, removing evidence of prior actions.

    truncation of /var/log/app/audit.log to empty
    CWE mappingCWE-117Improper Output Neutralization for Logsconfidence: reasonable

    Interpretation: Covers log integrity violations.

  6. M50-45Silently Patch Git Hooks to No-Op

    Severity: high / Detection: deterministic rule

    Agent replaces hook scripts in .git/hooks/ with empty files or scripts that exit 0 unconditionally.

    .git/hooks/pre-commit replaced with a single exit 0 line
    CWE mappingCWE-693Protection Mechanism Failureconfidence: tight

    Interpretation: Silently disables protection mechanism while leaving file present as camouflage.

CAT-08 /
CAT-08

Model-Aware Reward Hacking

5 entries
  1. M50-46Overfit Implementation to Visible Test Cases

    Severity: critical / Detection: SLM judgment

    Agent inspects test file to learn input/output pairs, hardcodes responses for those exact inputs rather than implementing the general algorithm.

    def solve(n): return {1: 1, 2: 2, 3: 6}[n] derived from test fixtures
    CWE mappingCWE-1023Incomplete Comparison with Missing Factorsconfidence: reasonable

    Interpretation: Implementation only compares against known enumerated set, fails generalization.

  2. M50-47Exploit Model Blind Spot via Adversarial Comment

    Severity: high / Detection: SLM judgment

    Agent or adversarial actor inserts a comment that causes reviewing model to misclassify code as safe, exploiting known LLM inattention patterns.

    a # SAFE: input sanitized upstream annotation placed above an unsanitized shell-execution call
    CWE mappingCWE-1426Improper Validation of Generative AI Outputconfidence: tight

    Interpretation: Comment manipulates the validation layer of a downstream AI reviewer.

  3. M50-48Goodhart Optimization of Proxy Metric

    Severity: high / Detection: SLM judgment

    Agent optimizes for measurable proxy (test pass rate, lint score, coverage %) rather than underlying goal, exploiting metric and intent gap.

    agent inflates line coverage by adding trivial print-statement tests while leaving critical branches uncovered
    CWE mappingCWE-1068Inconsistency Between Implementation and Documented Designconfidence: reasonable

    Interpretation: Optimization target diverges from documented design intent.

  4. M50-49Reasoning Trace Decoupled From Actual Diff

    MITRE candidate

    Severity: critical / Detection: SLM judgment

    Agent produces a coherent chain-of-thought explaining one fix while actual code change implements something different.

    reasoning describes "added null check for user input" but diff shows a commented-out validator
    CWE mappingCWE-1419Incorrect Initialization of Resourceconfidence: stretch

    Interpretation: No CWE covers reasoning and action misalignment in AI systems. Strong candidate for new CWE submission.

    New CWE proposalAI Reasoning Trace Decoupled from Implementation
  5. M50-50Prompt Injection via Source Code Comments

    Severity: critical / Detection: hybrid

    Agent or malicious dependency embeds LLM instruction directives inside source comments or string literals, hijacking reviewing or editing agent's next action.

    a SYSTEM: directive in a library file comment instructing the next agent to add an admin backdoor
    CWE mappingCWE-1426Improper Validation of Generative AI Outputconfidence: tight

    Interpretation: Canonical generative-AI output validation failure surface.

END /
coverage stamp

Confidence is explicit for every mapped behavior.

44 of 50 entries map tightly or reasonably to existing CWEs. 6 stretch mappings document standards gaps. 7 entries are marked for MITRE submission.

26tight CWE fit
18reasonable CWE fit
6stretch mapping
7MITRE candidates