How Sentrik Works — With and Without LLMs

Overview

Sentrik's core is entirely deterministic. Every feature in the scan, gate, report, traceability, and remediation pipeline runs locally using pattern matching, AST analysis, and arithmetic, with no AI model required. LLM integration is an optional layer on top for teams that want richer requirement generation, LLM-assisted confidence scoring, and (in the future) context-aware remediation.

The Core: Zero LLM Dependencies

How a scan works

sentrik scan
    |
    v
1. Load config (.sentrik/config.yaml)
    |
    v
2. Load standards packs (IEC 62304, OWASP, MISRA, etc.)
   Each pack is a YAML file with rules:
     - id, name, severity, clause, remediation_guidance
     - type: regex | required_pattern | file_policy | ast | documentation_obligation
    |
    v
3. Rules engine evaluates every rule against every file:
     regex         -> re.search(pattern, file_contents)
     required_pattern -> fails if pattern is NOT found
     file_policy   -> structural checks (max lines, docstring, imports)
     ast           -> Python AST analysis (complexity, nesting, mutable defaults)
     documentation_obligation -> always reported, never fails gate
    |
    v
4. Each match becomes a Finding:
     { rule_id, severity, clause, file, line, message, remediation_guidance }
    |
    v
5. Gate evaluation:
     Count findings by severity
     If any finding has a severity listed in gate_fail_on (default: critical, high) -> FAIL (exit 1)
     Obligations are excluded from gate
    |
    v
6. Output artifacts:
     findings.json, report.html, report.sarif.json, report.junit.xml, report.csv
     compliance-report.html (per-framework)
     trust-center.html (public-safe compliance page)
     scan_metrics.json, run_metadata.json

Every step is deterministic. Same code + same rules = same findings every time.
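
The rule-evaluation and gate steps above can be illustrated with a minimal sketch. It is not Sentrik's implementation: the `Finding` fields are trimmed, the rule IDs (`SEC-001`, `DOC-001`) and patterns are hypothetical, and documentation obligations are left out for brevity.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str
    file: str
    line: int
    message: str


def evaluate_rule(rule: dict, path: str, contents: str) -> list[Finding]:
    """Evaluate one regex-style rule against one file's contents."""
    match = re.search(rule["pattern"], contents, re.MULTILINE)
    if rule["type"] == "regex" and match:
        # Convert the match offset into a 1-based line number.
        line = contents.count("\n", 0, match.start()) + 1
        return [Finding(rule["id"], rule["severity"], path, line, rule["name"])]
    if rule["type"] == "required_pattern" and match is None:
        # A required pattern fails when it is absent; report it at line 1.
        return [Finding(rule["id"], rule["severity"], path, 1, rule["name"])]
    return []


def gate(findings: list[Finding], gate_fail_on=("critical", "high")) -> int:
    """Return the exit code: 1 if any finding has a blocking severity."""
    return 1 if any(f.severity in gate_fail_on for f in findings) else 0


# Hypothetical rules, shaped like the pack fields listed in step 2.
rules = [
    {"id": "SEC-001", "name": "Use of eval()", "severity": "high",
     "type": "regex", "pattern": r"\beval\("},
    {"id": "DOC-001", "name": "Missing license header", "severity": "low",
     "type": "required_pattern", "pattern": r"^# License:"},
]
source = "import os\nresult = eval(user_input)\n"
findings = [f for r in rules for f in evaluate_rule(r, "app.py", source)]
exit_code = gate(findings)  # the high-severity finding blocks the gate
```

Because the only inputs are the file contents and the rule definitions, running this twice on the same inputs yields the same findings and the same exit code.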

What each feature uses under the hood

| Feature | Technique | External calls |
| --- | --- | --- |
| Code scanning | Regex pattern matching | None |
| AST analysis | Python `ast` module | None |
| File policy checks | Line counting, import parsing | None |
| Compliance scoring | `(rules_passed / rules_total) * 100` | None |
| Gate pass/fail | Severity count vs. threshold | None |
| Auto-patching | Regex replace, line comment-out | None |
| Traceability | Token matching (file paths vs. work item titles) | None |
| Drift detection | File existence checks vs. requirements.yaml | None |
| Reports (HTML/SARIF/JUnit/CSV) | String template rendering | None |
| Compliance reports | Findings grouped by clause, template rendering | None |
| Trust center page | Aggregate scores, no finding details exposed | None |
| Dashboard | FastAPI serving static HTML + JSON APIs | None |
| Audit log | Append-only JSONL file writes | None |
| Work item sync | REST calls to Azure DevOps / GitHub / Jira APIs | DevOps APIs only |
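
As an illustration of the AST row, here is a short sketch of how a mutable-default check could be written with Python's standard `ast` module. The function name `find_mutable_defaults` is hypothetical, and the real rule may cover more cases (for example, calls such as `list()`).

```python
import ast


def find_mutable_defaults(source: str) -> list[tuple[str, int]]:
    """Return (function_name, line) for each list/dict/set literal used as a default."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional defaults plus keyword-only defaults (None means "no default").
            defaults = node.args.defaults + [
                d for d in node.args.kw_defaults if d is not None
            ]
            for default in defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    hits.append((node.name, default.lineno))
    return hits


sample = (
    "def add(item, bucket=[]):\n"
    "    bucket.append(item)\n"
    "    return bucket\n"
    "\n"
    "def ok(x, y=None):\n"
    "    return x\n"
)
print(find_mutable_defaults(sample))  # [('add', 1)]
```

The check inspects the parsed syntax tree rather than raw text, so it is not fooled by the same characters appearing inside a comment or string.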

The Severity Rescorer (local heuristic, not ML)

Despite the name, this is a weighted heuristic scorer — not a machine learning model. It runs locally, calls no external APIs, and uses no ML libraries.

What it does

Re-scores the severity of findings using code context to improve accuracy. It primarily targets non-deterministic findings (e.g., from an LLM-based scanner that produces confidence-weighted results).

How it scores

Six features, each weighted, combined into a single 0.0–1.0 score:

| Feature | Weight | What it measures |
| --- | --- | --- |
| Base severity | 30% | Original severity (critical=1.0, high=0.8, medium=0.5, low=0.25, info=0.1) |
| Confidence | 20% | Original scanner's confidence (0.0–1.0) |
| Code context | 15% | Is the finding in an exception handler? Class def? Import? |
| Pattern risk | 15% | Does the message/snippet match security keywords? (password, SQL, eval, pickle) |
| File risk | 10% | Is the file in a high-risk path? (auth, payment, admin, API) |
| Density | 10% | How many findings per lines-of-code in this file? |

Score mapped to severity: 0.85+ = critical, 0.65+ = high, 0.40+ = medium, 0.20+ = low, below = info.
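
The weighting and threshold mapping can be sketched as follows. The weights, base-severity values, and thresholds come from the table and mapping above; how each of the other features is normalized to a 0.0–1.0 value is not specified here, so the example inputs are assumptions.

```python
# Weights from the table above; they sum to 1.0.
WEIGHTS = {
    "base_severity": 0.30,
    "confidence": 0.20,
    "code_context": 0.15,
    "pattern_risk": 0.15,
    "file_risk": 0.10,
    "density": 0.10,
}
BASE_SEVERITY = {"critical": 1.0, "high": 0.8, "medium": 0.5, "low": 0.25, "info": 0.1}
# (minimum score, severity), checked from highest to lowest.
THRESHOLDS = [(0.85, "critical"), (0.65, "high"), (0.40, "medium"), (0.20, "low")]


def rescore(features: dict) -> tuple[float, str]:
    """Combine the six 0.0-1.0 feature values into one score and map it to a severity."""
    score = sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    for floor, severity in THRESHOLDS:
        if score >= floor:
            return score, severity
    return score, "info"


# Assumed feature values for a "high" finding in an auth-related file.
score, severity = rescore({
    "base_severity": BASE_SEVERITY["high"],  # 0.8
    "confidence": 0.7,
    "code_context": 0.5,
    "pattern_risk": 1.0,
    "file_risk": 1.0,
    "density": 0.2,
})
# 0.24 + 0.14 + 0.075 + 0.15 + 0.10 + 0.02 = 0.725, which maps to "high"
```

A finding with every feature at its minimum still scores above zero only through its base severity, which is why that feature carries the largest weight.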

When it runs

Off by default. Enable with `severity_rescoring_enabled: true` in config (legacy `ml_severity_enabled` still works).
Runs after all rules have been evaluated, before suppression filtering.
Skips deterministic findings by default (configurable).
Never re-scores documentation obligations.
Requires a paid license tier (Trial/Team/Org/Enterprise).

What it is NOT

Not a neural network or trained model
Not an API call to any external service
Not Claude, GPT, or any LLM
Not required for any Sentrik feature to work

Confidence Scoring

Sentrik assigns a confidence value (0.0–1.0) to every finding. This runs in two layers:

Heuristic confidence (always on, no LLM)

The rules engine assigns confidence automatically based on where the match occurs in the source file:

| Confidence | Context |
| --- | --- |
| 1.0 | Match in executable code (finding is `deterministic=True`) |
| 0.7 | Match in a test file (`test_` prefix, `_test.py` suffix, or `/tests/` directory) |
| 0.5 | Match inside a comment (language-aware: `#`, `//`, etc.) |
| 0.4 | Match inside a string literal or docstring |

When confidence is less than 1.0, the finding is marked `deterministic=False`. Non-regex checks (AST, `file_policy`, `required_pattern`) always have confidence 1.0 because they are structurally verified.
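
The confidence table can be read as a small lookup. The sketch below assumes the caller already knows whether the match sits in a comment or a string, and it assumes that the lowest applicable value wins when several contexts apply; that precedence is not stated above. The helper names are hypothetical.

```python
from pathlib import PurePosixPath


def is_test_file(path: str) -> bool:
    """True for a test_ prefix, a _test.py suffix, or a tests directory component."""
    p = PurePosixPath(path)
    return (
        p.name.startswith("test_")
        or p.name.endswith("_test.py")
        or "tests" in p.parts[:-1]
    )


def heuristic_confidence(path: str, in_comment: bool, in_string: bool) -> tuple[float, bool]:
    """Return (confidence, deterministic) for a regex match.

    Assumption: when several contexts apply, the lowest confidence is used.
    """
    if in_string:
        confidence = 0.4
    elif in_comment:
        confidence = 0.5
    elif is_test_file(path):
        confidence = 0.7
    else:
        confidence = 1.0
    # Only full-confidence matches are treated as deterministic.
    return confidence, confidence == 1.0


print(heuristic_confidence("src/app.py", False, False))         # (1.0, True)
print(heuristic_confidence("tests/test_app.py", False, False))  # (0.7, False)
```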

LLM-powered confidence (opt-in)

For findings that are not deterministic, an LLM can re-score confidence with richer context analysis. This is provider-agnostic — configure any supported backend:

confidence_scoring_enabled: true
confidence_scoring_max_findings: 50   # cap per scan
llm_provider: anthropic               # anthropic, openai, or ollama
llm_model: claude-sonnet-4-20250514

Or via environment variables: `GUARD_CONFIDENCE_SCORING_ENABLED=true`, `GUARD_LLM_PROVIDER`, `GUARD_LLM_MODEL`, `GUARD_LLM_BASE_URL`.

Pipeline position

Confidence scoring sits within the scan pipeline as follows; the stages run in this order:

1. Rules engine evaluates all rules (assigns heuristic confidence)
2. Severity rescorer adjusts severity based on code context (opt-in)
3. LLM confidence scoring re-scores non-deterministic findings (opt-in)
4. Suppression filtering removes silenced findings
5. Gate evaluation counts remaining findings by severity
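
The ordering above can be sketched as one function that takes the rules engine's findings and applies the remaining stages in sequence. Everything here is illustrative: findings are plain dicts, the `rescorer` and `llm_scorer` callables are hypothetical stand-ins for the opt-in stages, and suppression by rule ID is an assumption.

```python
def run_post_rule_stages(findings, *, rescorer=None, llm_scorer=None,
                         suppressed=frozenset(), gate_fail_on=("critical", "high")):
    """Apply the stages that follow rule evaluation, in the documented order."""
    def adjustable(finding):
        # Both opt-in stages skip deterministic findings in this sketch.
        return not finding["deterministic"]

    if rescorer is not None:      # severity rescorer (opt-in)
        findings = [rescorer(f) if adjustable(f) else f for f in findings]
    if llm_scorer is not None:    # LLM confidence scoring (opt-in)
        findings = [llm_scorer(f) if adjustable(f) else f for f in findings]
    # Suppression filtering runs after both scoring stages.
    findings = [f for f in findings if f["rule_id"] not in suppressed]
    # The gate only sees what survived suppression.
    failed = any(f["severity"] in gate_fail_on for f in findings)
    return findings, failed


raw = [
    {"rule_id": "SEC-001", "severity": "high", "confidence": 1.0, "deterministic": True},
    {"rule_id": "SEC-002", "severity": "high", "confidence": 0.5, "deterministic": False},
]

def downgrade(finding):
    # Hypothetical rescorer that lowers severity.
    return {**finding, "severity": "low"}

kept, failed = run_post_rule_stages(raw, rescorer=downgrade, suppressed={"SEC-001"})
# kept holds only SEC-002, now "low"; failed is False
```

With both opt-in stages left as `None`, the function reduces to suppression plus the gate, which matches the default configuration.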

Key design points

Heuristic confidence is always computed — no configuration needed, no LLM calls.
LLM confidence only runs when explicitly enabled and a provider is configured.
The LLM never changes severity directly — it adjusts confidence, which downstream tools (like the severity rescorer) can use.
All LLM outputs are validated by the deterministic core before being accepted.

Optional LLM Integration Points

Apart from the opt-in LLM confidence scoring described above, these are the only places where an LLM can be used. All are opt-in, and each entry states what happens without an LLM.

1. Requirement generation (`sentrik generate-reqs`)

What it does: Analyzes untracked source files and generates requirement descriptions.

With LLM: Sends file contents to an LLM (configurable provider) to produce natural-language requirement titles, descriptions, and acceptance criteria.

Without LLM: Falls back to a stub generator that creates requirements from file names and function signatures. Still works — just less descriptive.

Config:

# No LLM config needed — stub generator is the default.
# LLM provider would be configured separately if desired.
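
To make the fallback concrete, here is a rough sketch of a stub generator that derives requirement entries from a file name and its function signatures. The output fields (`title`, `description`, `source_file`) and the choice to skip underscore-prefixed functions are assumptions for illustration, not Sentrik's actual format.

```python
import ast
from pathlib import PurePosixPath


def stub_requirements(path: str, source: str) -> list[dict]:
    """Build one requirement stub per public top-level function in a file."""
    module = PurePosixPath(path).stem
    requirements = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):
            continue  # assumption: private helpers do not get their own requirement
        params = ", ".join(arg.arg for arg in node.args.args)
        requirements.append({
            "title": f"{module}: {node.name.replace('_', ' ')}",
            "description": f"Implements `{node.name}({params})` in {path}.",
            "source_file": path,
        })
    return requirements


sample = (
    "def validate_token(token, secret):\n"
    "    return token == secret\n"
    "\n"
    "def _helper():\n"
    "    pass\n"
)
stubs = stub_requirements("auth/session.py", sample)
print(stubs[0]["title"])  # session: validate token
```

The result is mechanical and terse, which is the trade-off described above: it works without any model but carries no description of intent.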

2. Requirement verification (`requirement_verification` rule type)

What it does: Checks whether code behavior actually matches what a requirement says.

With LLM: Sends the requirement text + source code to an LLM and asks "does this code implement this requirement?"

Without LLM: Not available. This rule type is defined but not used in any current standards pack. All shipping packs use deterministic rule types only.

3. Future: LLM-powered remediation (not yet built)

Vision: When a finding is detected, an LLM reads the source file + the remediation guidance and generates a context-aware fix.

Safety loop:

Finding detected
  -> LLM generates fix using remediation guidance as prompt
    -> Sentrik scans the fix with the same rules engine
      -> If fix introduces new findings -> reject, retry (max 3)
      -> If fix passes -> propose as patch
      -> If file is safety-critical -> require human review regardless

Without LLM: Users read the remediation guidance in the compliance report and fix manually, or use the existing mechanical auto-patches (regex replace, comment-out).
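
Since this feature is not yet built, the following is only a sketch of how the safety loop could look. `generate_fix` and `scan` are hypothetical callables standing in for the LLM call and the rules engine; `scan` is assumed to return hashable finding identifiers.

```python
from typing import Callable, Iterable, Optional


def propose_fix(
    source: str,
    guidance: str,
    generate_fix: Callable[[str, str], str],
    scan: Callable[[str], Iterable[str]],
    *,
    safety_critical: bool = False,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Return a proposed patch, or None if every attempt introduced new findings."""
    baseline = set(scan(source))
    for attempt in range(1, max_attempts + 1):
        candidate = generate_fix(source, guidance)
        # Re-scan the candidate with the same rules engine.
        new_findings = set(scan(candidate)) - baseline
        if not new_findings:
            return {
                "patch": candidate,
                "attempts": attempt,
                # Safety-critical files always go to a human, even when the scan is clean.
                "needs_human_review": safety_critical,
            }
    return None


# Fake LLM: the first suggestion introduces eval(), the second is clean.
suggestions = iter(["x = eval(data)", "x = int(data)"])

def fake_llm(source, guidance):
    return next(suggestions)

def fake_scan(code):
    return {"SEC-001"} if "eval(" in code else set()

result = propose_fix("x = data", "Avoid eval()", fake_llm, fake_scan, safety_critical=True)
# result["patch"] is "x = int(data)", accepted on the second attempt
```

The loop only rejects candidates that add findings relative to the original file; whether a fix must also clear the finding that triggered it is a design decision the outline above leaves open.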

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                    DETERMINISTIC CORE                        │
│                  (works without any LLM)                     │
│                                                              │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌─────────┐ │
│  │  Rules    │   │   Gate   │   │ Reports  │   │  Audit  │ │
│  │  Engine   │──>│  Pass/   │──>│ HTML/CSV │──>│   Log   │ │
│  │ (regex,   │   │  Fail    │   │ SARIF/   │   │ (JSONL) │ │
│  │  AST,     │   │          │   │ JUnit    │   │         │ │
│  │  policy)  │   │          │   │ Comply   │   │         │ │
│  └──────────┘   └──────────┘   └──────────┘   └─────────┘ │
│       │                                                      │
│       v                                                      │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌─────────┐ │
│  │ Auto-    │   │ Trace-   │   │  Drift   │   │ Work    │ │
│  │ Patch    │   │ ability  │   │ Detect   │   │ Item    │ │
│  │ (regex   │   │ (token   │   │ (file    │   │ Sync    │ │
│  │  replace)│   │  match)  │   │  exists?)│   │ (REST)  │ │
│  └──────────┘   └──────────┘   └──────────┘   └─────────┘ │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Severity Rescorer (opt-in, local heuristics)        │   │
│  │  Weighted features: base sev, confidence, code       │   │
│  │  context, pattern risk, file risk, density           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│              OPTIONAL LLM LAYER (opt-in)                     │
│           (everything above works without this)              │
│                                                              │
│  ┌──────────────┐  ┌───────────────┐  ┌──────────────────┐ │
│  │  Requirement  │  │  Requirement  │  │  LLM-Powered     │ │
│  │  Generation   │  │  Verification │  │  Remediation     │ │
│  │  (fallback:   │  │  (not in any  │  │  (not yet built) │ │
│  │   stub gen)   │  │   pack yet)   │  │                  │ │
│  └──────────────┘  └───────────────┘  └──────────────────┘ │
│                                                              │
│  All LLM outputs validated by deterministic core before     │
│  being accepted — the rules engine scans LLM-generated      │
│  code the same way it scans human code.                     │
└─────────────────────────────────────────────────────────────┘

Key Principle

The deterministic core is the source of truth. If an LLM generates a fix, it gets scanned by the same rules engine. If an LLM writes a requirement, it gets verified by the same traceability system. The LLM is an assistant — the rules engine is the authority.

This matters for regulated industries: an FDA auditor can look at the compliance report and know that every finding was produced by a deterministic rule check (regex, AST, or file policy) with a specific clause reference, not by an opaque AI judgment call.