How Sentrik Works — With and Without LLMs¶
Overview¶
Sentrik's core is entirely deterministic. Every feature in the scan, gate, report, traceability, and remediation pipeline runs locally using pattern matching, AST analysis, and arithmetic, with no AI model required. LLM integration is an optional layer on top for teams that want richer requirement generation, LLM-assisted confidence scoring, and (in the future) context-aware remediation.
The Core: Zero LLM Dependencies¶
How a scan works¶
sentrik scan
  |
  v
1. Load config (.sentrik/config.yaml)
  |
  v
2. Load standards packs (IEC 62304, OWASP, MISRA, etc.)
     Each pack is a YAML file with rules:
       - id, name, severity, clause, remediation_guidance
       - type: regex | required_pattern | file_policy | ast | documentation_obligation
  |
  v
3. Rules engine evaluates every rule against every file:
     regex                    -> re.search(pattern, file_contents)
     required_pattern         -> fails if pattern is NOT found
     file_policy              -> structural checks (max lines, docstring, imports)
     ast                      -> Python AST analysis (complexity, nesting, mutable defaults)
     documentation_obligation -> always reported, never fails gate
  |
  v
4. Each match becomes a Finding:
     { rule_id, severity, clause, file, line, message, remediation_guidance }
  |
  v
5. Gate evaluation:
     Count findings by severity
     If any finding has a severity listed in gate_fail_on (default: critical, high) -> FAIL (exit 1)
     Obligations are excluded from gate
  |
  v
6. Output artifacts:
     findings.json, report.html, report.sarif.json, report.junit.xml, report.csv
     compliance-report.html (per-framework)
     trust-center.html (public-safe compliance page)
     scan_metrics.json, run_metadata.json
Every step is deterministic. Same code + same rules = same findings every time.
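Steps 3 to 5 above can be sketched in a few lines of Python. This is a minimal illustration, not Sentrik's actual implementation: the `Rule` and `Finding` classes, the `evaluate` and `gate` functions, and the `DEMO-*` rules are all hypothetical names, only the `regex` and `required_pattern` rule types are handled, and the exclusion of documentation obligations from the gate is not modelled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    id: str
    severity: str
    clause: str
    type: str  # only "regex" and "required_pattern" are handled in this sketch
    pattern: str


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str
    clause: str
    file: str
    line: int
    message: str


def evaluate(rule: Rule, path: str, text: str) -> list[Finding]:
    """Apply one rule to one file and return the findings it produces."""
    if rule.type == "regex":
        # One finding per match; the line number is derived from the match offset.
        return [
            Finding(rule.id, rule.severity, rule.clause, path,
                    text.count("\n", 0, m.start()) + 1,
                    f"matched {rule.pattern!r}")
            for m in re.finditer(rule.pattern, text)
        ]
    if rule.type == "required_pattern":
        # Inverted logic: the rule fails only when the pattern is absent.
        if re.search(rule.pattern, text) is None:
            return [Finding(rule.id, rule.severity, rule.clause, path, 1,
                            f"required pattern missing: {rule.pattern!r}")]
        return []
    raise ValueError(f"unsupported rule type: {rule.type}")


def gate(findings: list[Finding], fail_on=("critical", "high")) -> int:
    """Return the exit code: 1 if any finding has a blocking severity, else 0."""
    return 1 if any(f.severity in fail_on for f in findings) else 0


# Hypothetical rules and input, for illustration only.
rules = [
    Rule("DEMO-001", "high", "demo-clause-1", "regex", r"\beval\("),
    Rule("DEMO-002", "low", "demo-clause-2", "required_pattern", r"Copyright"),
]
source = "x = 1\ny = eval(user_input)\n"
findings = [f for r in rules for f in evaluate(r, "app.py", source)]
exit_code = gate(findings)
```

Because the functions depend only on the rule definitions and the file text, running them twice on the same input yields the same findings and the same exit code.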
What each feature uses under the hood¶
| Feature | Technique | External calls |
|---|---|---|
| Code scanning | Regex pattern matching | None |
| AST analysis | Python ast module | None |
| File policy checks | Line counting, import parsing | None |
| Compliance scoring | (rules_passed / rules_total) * 100 | None |
| Gate pass/fail | Severity count vs. threshold | None |
| Auto-patching | Regex replace, line comment-out | None |
| Traceability | Token matching (file paths vs. work item titles) | None |
| Drift detection | File existence checks vs. requirements.yaml | None |
| Reports (HTML/SARIF/JUnit/CSV) | String template rendering | None |
| Compliance reports | Findings grouped by clause, template rendering | None |
| Trust center page | Aggregate scores, no finding details exposed | None |
| Dashboard | FastAPI serving static HTML + JSON APIs | None |
| Audit log | Append-only JSONL file writes | None |
| Work item sync | REST calls to Azure DevOps / GitHub / Jira APIs | DevOps APIs only |
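To make the "AST analysis" row concrete, here is a small sketch of one such check, mutable default arguments, using only the standard-library `ast` module. The function name `mutable_default_lines` and the sample input are illustrative assumptions. Sentrik's own AST rules may be structured differently.

```python
import ast


def mutable_default_lines(source: str) -> list[int]:
    """Return line numbers of parameter defaults that are list, dict, or set literals."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional defaults plus keyword-only defaults (None means "no default").
            defaults = list(node.args.defaults)
            defaults += [d for d in node.args.kw_defaults if d is not None]
            for default in defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    lines.append(default.lineno)
    return sorted(lines)


sample = (
    "def add(item, bucket=[]):\n"
    "    bucket.append(item)\n"
    "    return bucket\n"
    "\n"
    "def ok(n=0):\n"
    "    return n\n"
)
hits = mutable_default_lines(sample)
```

The check parses the source and inspects the syntax tree, so it makes no external calls and does not execute the scanned code.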
The Severity Rescorer (local heuristic, not ML)¶
Despite the name, this is a weighted heuristic scorer — not a machine learning model. It runs locally, calls no external APIs, and uses no ML libraries.
What it does¶
The rescorer re-scores finding severity based on code context to improve accuracy. It primarily targets non-deterministic findings (e.g., from an LLM-based scanner that produces confidence-weighted results).
How it scores¶
Six features, each weighted, combined into a single 0.0–1.0 score:
| Feature | Weight | What it measures |
|---|---|---|
| Base severity | 30% | Original severity (critical=1.0, high=0.8, medium=0.5, low=0.25, info=0.1) |
| Confidence | 20% | Original scanner's confidence (0.0–1.0) |
| Code context | 15% | Is the finding in an exception handler? Class def? Import? |
| Pattern risk | 15% | Does the message/snippet match security keywords? (password, SQL, eval, pickle) |
| File risk | 10% | Is the file in a high-risk path? (auth, payment, admin, API) |
| Density | 10% | How many findings per lines-of-code in this file? |
The score is then mapped back to a severity: 0.85 and above is critical, 0.65 and above is high, 0.40 and above is medium, 0.20 and above is low, and anything lower is info.
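The weights and thresholds above amount to a weighted sum followed by a threshold lookup. The sketch below assumes every feature has already been normalized to 0.0–1.0. The source only spells out that mapping for base severity and confidence, so the values used for the other four features are made up for the example. The names `WEIGHTS`, `rescore`, and `to_severity` are illustrative, not Sentrik's API.

```python
# Weights and thresholds taken from the table and mapping above.
WEIGHTS = {
    "base_severity": 0.30,
    "confidence": 0.20,
    "code_context": 0.15,
    "pattern_risk": 0.15,
    "file_risk": 0.10,
    "density": 0.10,
}
BASE_SEVERITY = {"critical": 1.0, "high": 0.8, "medium": 0.5, "low": 0.25, "info": 0.1}
THRESHOLDS = [(0.85, "critical"), (0.65, "high"), (0.40, "medium"), (0.20, "low")]


def rescore(features: dict[str, float]) -> float:
    """Combine the six normalized features into one 0.0-1.0 score."""
    return sum(weight * features[name] for name, weight in WEIGHTS.items())


def to_severity(score: float) -> str:
    """Map a score to the first severity whose threshold it meets."""
    for threshold, label in THRESHOLDS:
        if score >= threshold:
            return label
    return "info"


# Hypothetical finding: originally "high", moderate confidence, risky file and pattern.
features = {
    "base_severity": BASE_SEVERITY["high"],
    "confidence": 0.6,
    "code_context": 0.5,   # assumed normalization
    "pattern_risk": 1.0,   # assumed normalization
    "file_risk": 1.0,      # assumed normalization
    "density": 0.2,        # assumed normalization
}
score = rescore(features)          # 0.24 + 0.12 + 0.075 + 0.15 + 0.10 + 0.02
severity = to_severity(score)
```

Since the weights sum to 1.0 and each feature is bounded by 1.0, the combined score stays within 0.0–1.0.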
When it runs¶
- Off by default. Enable with severity_rescoring_enabled: true in config (legacy ml_severity_enabled still works).
- Runs after all rules have been evaluated, before suppression filtering.
- Skips deterministic findings by default (configurable).
- Never re-scores documentation obligations.
- Requires a paid license tier (Trial/Team/Org/Enterprise).
What it is NOT¶
- Not a neural network or trained model
- Not an API call to any external service
- Not Claude, GPT, or any LLM
- Not required for any Sentrik feature to work
Confidence Scoring¶
Sentrik assigns a confidence value (0.0–1.0) to every finding. This runs in two layers:
Heuristic confidence (always on, no LLM)¶
The rules engine assigns confidence automatically based on where the match occurs in the source file:
| Confidence | Context |
|---|---|
| 1.0 | Match in executable code (finding is deterministic=True) |
| 0.7 | Match in a test file (test_ prefix, _test.py suffix, or /tests/ directory) |
| 0.5 | Match inside a comment (language-aware: #, //, etc.) |
| 0.4 | Match inside a string literal or docstring |
When confidence is less than 1.0, the finding is marked deterministic=False. Non-regex checks (AST, file_policy, required_pattern) always have confidence 1.0 because they are structurally verified.
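The table above can be sketched as a small classifier. Several things here are assumptions: the function names are hypothetical, the comment and string detection is a naive single-line scan for Python source (it ignores escapes and triple-quoted strings), and the source does not say what happens when two contexts apply at once, so this sketch simply takes the lowest applicable value.

```python
from pathlib import PurePosixPath


def is_test_file(path: str) -> bool:
    """test_ prefix, _test.py suffix, or a tests/ directory in the path."""
    p = PurePosixPath(path)
    return (p.name.startswith("test_")
            or p.name.endswith("_test.py")
            or "tests" in p.parts[:-1])


def context_of(line: str, col: int) -> str:
    """Classify column `col` of one Python line as 'code', 'comment', or 'string'."""
    quote = None
    for ch in line[:col]:
        if quote:
            if ch == quote:
                quote = None  # closing quote
        elif ch in "\"'":
            quote = ch        # opening quote
        elif ch == "#":
            return "comment"  # everything after '#' is a comment
    return "string" if quote else "code"


def heuristic_confidence(path: str, line: str, col: int) -> float:
    """Return the lowest confidence among the contexts that apply (assumed precedence)."""
    scores = [1.0]
    if is_test_file(path):
        scores.append(0.7)
    context = context_of(line, col)
    if context == "comment":
        scores.append(0.5)
    elif context == "string":
        scores.append(0.4)
    return min(scores)


in_code = heuristic_confidence("src/app.py", "eval(x)", 0)
in_test = heuristic_confidence("tests/test_app.py", "eval(x)", 0)
in_comment = heuristic_confidence("src/app.py", "x = 1  # eval(x)", 9)
in_string = heuristic_confidence("src/app.py", 'msg = "eval(x)"', 7)
deterministic = in_code == 1.0
```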
LLM-powered confidence (opt-in)¶
For findings that are not deterministic, an LLM can re-score confidence with richer context analysis. This is provider-agnostic — configure any supported backend:
confidence_scoring_enabled: true
confidence_scoring_max_findings: 50 # cap per scan
llm_provider: anthropic # anthropic, openai, or ollama
llm_model: claude-sonnet-4-20250514
Or via environment variables: GUARD_CONFIDENCE_SCORING_ENABLED=true, GUARD_LLM_PROVIDER, GUARD_LLM_MODEL, GUARD_LLM_BASE_URL.
Pipeline position¶
Confidence scoring runs in this order within the scan pipeline:
1. Rules engine evaluates all rules (assigns heuristic confidence)
2. Severity rescorer adjusts severity based on code context (opt-in)
3. LLM confidence scoring re-scores non-deterministic findings (opt-in)
4. Suppression filtering removes silenced findings
5. Gate evaluation counts remaining findings by severity
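One way to picture this ordering is as a list of stages where the opt-in ones are skipped unless enabled. The sketch below is illustrative only: the stage functions are stand-ins that return their input unchanged (no LLM is called), findings are plain dicts with an assumed `suppressed` flag, and `run` is a hypothetical name. The config keys are the ones mentioned in this document.

```python
def rescore_severity(findings):
    # Stand-in for the severity rescorer (opt-in).
    return findings


def llm_confidence(findings):
    # Stand-in for LLM confidence scoring (opt-in); a real stage would
    # only touch findings marked as non-deterministic.
    return findings


def apply_suppressions(findings):
    # Drop findings that were silenced (the "suppressed" key is an assumption).
    return [f for f in findings if not f.get("suppressed")]


def run(findings, config):
    """Apply post-rule stages in order, then evaluate the gate."""
    applied = []
    stages = [
        (config.get("severity_rescoring_enabled", False), rescore_severity),
        (config.get("confidence_scoring_enabled", False), llm_confidence),
        (True, apply_suppressions),  # always runs
    ]
    for enabled, stage in stages:
        if enabled:
            findings = stage(findings)
            applied.append(stage.__name__)
    fail_on = config.get("gate_fail_on", ["critical", "high"])
    passed = not any(f["severity"] in fail_on for f in findings)
    return applied, passed


raw = [{"severity": "high", "suppressed": True}, {"severity": "low"}]
default_run = run(raw, {})
full_run = run(raw, {"severity_rescoring_enabled": True,
                     "confidence_scoring_enabled": True})
```

With the default configuration only suppression filtering and the gate run, which matches the claim that both rescoring layers are opt-in.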
Key design points¶
- Heuristic confidence is always computed — no configuration needed, no LLM calls.
- LLM confidence only runs when explicitly enabled and a provider is configured.
- The LLM never changes severity directly — it adjusts confidence, which downstream tools (like the severity rescorer) can use.
- All LLM outputs are validated by the deterministic core before being accepted.
Optional LLM Integration Points¶
Besides the opt-in confidence scoring described above, these are the only places where an LLM can be used. All are opt-in, and requirement generation has a local fallback.
1. Requirement generation (sentrik generate-reqs)¶
What it does: Analyzes untracked source files and generates requirement descriptions.
With LLM: Sends file contents to an LLM (configurable provider) to produce natural-language requirement titles, descriptions, and acceptance criteria.
Without LLM: Falls back to a stub generator that creates requirements from file names and function signatures. Still works — just less descriptive.
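A stub generator of this kind could look roughly like the following. This is a guess at the general approach (file name plus function signatures, read with the standard-library `ast` module), not Sentrik's actual generator. The function name, the output fields, and the sample file are all assumptions.

```python
import ast
from pathlib import PurePosixPath


def stub_requirements(path: str, source: str) -> list[dict]:
    """Create one placeholder requirement per top-level function in a file."""
    module = PurePosixPath(path).stem
    requirements = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = ", ".join(arg.arg for arg in node.args.args)
            requirements.append({
                # Title is derived purely from names, hence "less descriptive".
                "title": f"{module}: {node.name.replace('_', ' ')}",
                "description": f"Function {node.name}({params}) in {path}",
            })
    return requirements


sample = "def charge_card(amount, token):\n    pass\n"
stubs = stub_requirements("src/billing.py", sample)
```

The output carries no information beyond what the identifiers already say, which is the trade-off the fallback accepts in exchange for needing no LLM.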
Config:
# No LLM config needed — stub generator is the default.
# LLM provider would be configured separately if desired.
2. Requirement verification (requirement_verification rule type)¶
What it does: Checks whether code behavior actually matches what a requirement says.
With LLM: Sends the requirement text + source code to an LLM and asks "does this code implement this requirement?"
Without LLM: Not available. This rule type is defined but not used in any current standards pack. All shipping packs use deterministic rule types only.
3. Future: LLM-powered remediation (not yet built)¶
Vision: When a finding is detected, an LLM reads the source file + the remediation guidance and generates a context-aware fix.
Safety loop:
Finding detected
-> LLM generates fix using remediation guidance as prompt
-> Sentrik scans the fix with the same rules engine
-> If fix introduces new findings -> reject, retry (max 3)
-> If fix passes -> propose as patch
-> If file is safety-critical -> require human review regardless
Without LLM: Users read the remediation guidance in the compliance report and fix manually, or use the existing mechanical auto-patches (regex replace, comment-out).
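Since this feature is not yet built, the following is only a sketch of how the safety loop described above might be wired, under stated assumptions: `generate_fix` and `scan` are hypothetical callables supplied by the caller, `scan` returns a list of rule IDs, "max 3" is read as three attempts in total, and a fix "passes" when it introduces no finding that was not already present.

```python
from typing import Callable, Optional


def propose_fix(
    source: str,
    generate_fix: Callable[[str, int], str],  # hypothetical LLM wrapper
    scan: Callable[[str], list],              # hypothetical rules-engine wrapper
    safety_critical: bool = False,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Return a proposed patch, or None if every attempt introduced new findings."""
    baseline = set(scan(source))
    for attempt in range(1, max_attempts + 1):
        candidate = generate_fix(source, attempt)
        introduced = set(scan(candidate)) - baseline
        if introduced:
            continue  # reject and retry
        return {
            "patch": candidate,
            # Safety-critical files always require human review.
            "needs_human_review": safety_critical,
            "attempts": attempt,
        }
    return None


# Toy stand-ins so the loop can be exercised without an LLM.
def toy_scan(text: str) -> list:
    return [rule for rule, needle in (("EVAL", "eval("), ("EXEC", "exec(")) if needle in text]


def toy_fix(text: str, attempt: int) -> str:
    # First attempt swaps one problem for another; second attempt is clean.
    return text.replace("eval(", "exec(") if attempt == 1 else text.replace("eval(", "int(")


result = propose_fix("y = eval(s)\n", toy_fix, toy_scan, safety_critical=True)
```

The key property is that acceptance is decided by the deterministic scan, not by the generator.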
Architecture Diagram¶
┌─────────────────────────────────────────────────────────────┐
│ DETERMINISTIC CORE │
│ (works without any LLM) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│ │ Rules │ │ Gate │ │ Reports │ │ Audit │ │
│ │ Engine │──>│ Pass/ │──>│ HTML/CSV │──>│ Log │ │
│ │ (regex, │ │ Fail │ │ SARIF/ │ │ (JSONL) │ │
│ │ AST, │ │ │ │ JUnit │ │ │ │
│ │ policy) │ │ │ │ Comply │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └─────────┘ │
│ │ │
│ v │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│ │ Auto- │ │ Trace- │ │ Drift │ │ Work │ │
│ │ Patch │ │ ability │ │ Detect │ │ Item │ │
│ │ (regex │ │ (token │ │ (file │ │ Sync │ │
│ │ replace)│ │ match) │ │ exists?)│ │ (REST) │ │
│ └──────────┘ └──────────┘ └──────────┘ └─────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Severity Rescorer (opt-in, local heuristics) │ │
│ │ Weighted features: base sev, confidence, code │ │
│ │ context, pattern risk, file risk, density │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ OPTIONAL LLM LAYER (opt-in) │
│ (everything above works without this) │
│ │
│ ┌──────────────┐ ┌───────────────┐ ┌──────────────────┐ │
│ │ Requirement │ │ Requirement │ │ LLM-Powered │ │
│ │ Generation │ │ Verification │ │ Remediation │ │
│ │ (fallback: │ │ (not in any │ │ (not yet built) │ │
│ │ stub gen) │ │ pack yet) │ │ │ │
│ └──────────────┘ └───────────────┘ └──────────────────┘ │
│ │
│ All LLM outputs validated by deterministic core before │
│ being accepted — the rules engine scans LLM-generated │
│ code the same way it scans human code. │
└─────────────────────────────────────────────────────────────┘
Key Principle¶
The deterministic core is the source of truth. If an LLM generates a fix, it gets scanned by the same rules engine. If an LLM writes a requirement, it gets verified by the same traceability system. The LLM is an assistant — the rules engine is the authority.
This matters for regulated industries: an FDA auditor can look at the compliance report and know that every finding was detected by a deterministic pattern match with a specific clause reference, not by an opaque AI judgment call.