How Sentrik Works — With and Without LLMs

Overview

Sentrik's core is entirely deterministic. Every feature in the scan, gate, report, traceability, and remediation pipeline runs locally using pattern matching, AST analysis, and arithmetic, with no AI model required. LLM integration is an optional layer on top for teams that want richer requirement generation, LLM-assisted confidence scoring, and (in future) context-aware remediation.


The Core: Zero LLM Dependencies

How a scan works

sentrik scan
    |
    v
1. Load config (.sentrik/config.yaml)
    |
    v
2. Load standards packs (IEC 62304, OWASP, MISRA, etc.)
   Each pack is a YAML file with rules:
     - id, name, severity, clause, remediation_guidance
     - type: regex | required_pattern | file_policy | ast | documentation_obligation
    |
    v
3. Rules engine evaluates every rule against every file:
     regex                    -> re.search(pattern, file_contents)
     required_pattern         -> fails if pattern is NOT found
     file_policy              -> structural checks (max lines, docstring, imports)
     ast                      -> Python AST analysis (complexity, nesting, mutable defaults)
     documentation_obligation -> always reported, never fails gate
    |
    v
4. Each match becomes a Finding:
     { rule_id, severity, clause, file, line, message, remediation_guidance }
    |
    v
5. Gate evaluation:
     Count findings by severity
     If any finding's severity is listed in gate_fail_on (default: critical, high) -> FAIL (exit 1)
     Obligations are excluded from gate
    |
    v
6. Output artifacts:
     findings.json, report.html, report.sarif.json, report.junit.xml, report.csv
     compliance-report.html (per-framework)
     trust-center.html (public-safe compliance page)
     scan_metrics.json, run_metadata.json

Every step is deterministic. Same code + same rules = same findings every time.
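
Steps 3 to 5 can be illustrated with a minimal sketch. It is not Sentrik's actual implementation: the `Finding` class, the rule dictionary keys, and the function names are hypothetical, and it matches line by line (rather than calling re.search on the whole file) only so that each finding can carry a line number.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str
    file: str
    line: int
    message: str
    obligation: bool = False  # documentation obligations never fail the gate


def evaluate_rule(rule: dict, path: str, contents: str) -> list[Finding]:
    """Evaluate one pattern-based rule against one file's contents."""
    pattern = re.compile(rule["pattern"])
    if rule["type"] == "regex":
        # One finding per line on which the pattern matches.
        return [
            Finding(rule["id"], rule["severity"], path, lineno, rule["name"])
            for lineno, text in enumerate(contents.splitlines(), start=1)
            if pattern.search(text)
        ]
    if rule["type"] == "required_pattern":
        # Inverted check: a finding is raised only when the pattern is absent.
        if pattern.search(contents) is None:
            return [Finding(rule["id"], rule["severity"], path, 1, rule["name"])]
        return []
    raise ValueError(f"unsupported rule type in this sketch: {rule['type']}")


def gate_passes(findings: list[Finding], gate_fail_on=("critical", "high")) -> bool:
    """Fail when any non-obligation finding has a severity listed in gate_fail_on."""
    return not any(
        f.severity in gate_fail_on for f in findings if not f.obligation
    )
```

Because the functions depend only on the rule definitions and the file contents, running them twice on the same inputs yields the same findings and the same gate result.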

What each feature uses under the hood

| Feature | Technique | External calls |
| --- | --- | --- |
| Code scanning | Regex pattern matching | None |
| AST analysis | Python ast module | None |
| File policy checks | Line counting, import parsing | None |
| Compliance scoring | (rules_passed / rules_total) * 100 | None |
| Gate pass/fail | Severity count vs. threshold | None |
| Auto-patching | Regex replace, line comment-out | None |
| Traceability | Token matching (file paths vs. work item titles) | None |
| Drift detection | File existence checks vs. requirements.yaml | None |
| Reports (HTML/SARIF/JUnit/CSV) | String template rendering | None |
| Compliance reports | Findings grouped by clause, template rendering | None |
| Trust center page | Aggregate scores, no finding details exposed | None |
| Dashboard | FastAPI serving static HTML + JSON APIs | None |
| Audit log | Append-only JSONL file writes | None |
| Work item sync | REST calls to Azure DevOps / GitHub / Jira APIs | DevOps APIs only |
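
As an illustration of the AST row, here is a minimal sketch of how a mutable-default check could be written with Python's standard ast module. The function name and return shape are hypothetical; Sentrik's own check may cover more cases (for example, calls such as list() or dict()).

```python
import ast


def find_mutable_defaults(source: str) -> list[tuple[str, int]]:
    """Return (function_name, line) for each list, dict, or set literal used as a default."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional defaults plus keyword-only defaults (None means "no default").
            defaults = list(node.args.defaults) + [
                d for d in node.args.kw_defaults if d is not None
            ]
            for default in defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    hits.append((node.name, default.lineno))
    return hits
```

The check parses the file and inspects the syntax tree without running the code, and it needs no external calls.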

The Severity Rescorer (local heuristic, not ML)

Despite the name, this is a weighted heuristic scorer — not a machine learning model. It runs locally, calls no external APIs, and uses no ML libraries.

What it does

The rescorer adjusts the severity of findings using code context. It primarily targets non-deterministic findings (e.g., from an LLM-based scanner that produces confidence-weighted results).

How it scores

Six features, each weighted, combined into a single 0.0–1.0 score:

| Feature | Weight | What it measures |
| --- | --- | --- |
| Base severity | 30% | Original severity (critical=1.0, high=0.8, medium=0.5, low=0.25, info=0.1) |
| Confidence | 20% | Original scanner's confidence (0.0–1.0) |
| Code context | 15% | Is the finding in an exception handler? Class def? Import? |
| Pattern risk | 15% | Does the message/snippet match security keywords? (password, SQL, eval, pickle) |
| File risk | 10% | Is the file in a high-risk path? (auth, payment, admin, API) |
| Density | 10% | How many findings per lines-of-code in this file? |

Score mapped to severity: 0.85+ = critical, 0.65+ = high, 0.40+ = medium, 0.20+ = low, below = info.
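
The weighted sum and threshold mapping can be sketched as follows. The weights, base severity values, and thresholds come from the description above; the feature keys and the assumption that every feature has already been normalized to 0.0–1.0 are illustrative, since the extraction of each feature is not specified here.

```python
BASE_SEVERITY = {"critical": 1.0, "high": 0.8, "medium": 0.5, "low": 0.25, "info": 0.1}

WEIGHTS = {
    "base_severity": 0.30,
    "confidence": 0.20,
    "code_context": 0.15,
    "pattern_risk": 0.15,
    "file_risk": 0.10,
    "density": 0.10,
}

# Checked from highest to lowest; anything under 0.20 maps to "info".
THRESHOLDS = [(0.85, "critical"), (0.65, "high"), (0.40, "medium"), (0.20, "low")]


def rescore(features: dict[str, float]) -> tuple[float, str]:
    """Combine six 0.0-1.0 features into one score and map it to a severity."""
    score = sum(weight * features[name] for name, weight in WEIGHTS.items())
    for floor, label in THRESHOLDS:
        if score >= floor:
            return score, label
    return score, "info"


# Hypothetical finding: originally "high", scanner confidence 0.7,
# a security keyword in the message, located in an auth-related file.
example = {
    "base_severity": BASE_SEVERITY["high"],
    "confidence": 0.7,
    "code_context": 0.5,
    "pattern_risk": 1.0,
    "file_risk": 1.0,
    "density": 0.2,
}
score, severity = rescore(example)
```

For this example the sum is 0.24 + 0.14 + 0.075 + 0.15 + 0.10 + 0.02 = 0.725, which falls in the 0.65+ band, so the finding stays at high.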

When it runs

  • Off by default. Enable with severity_rescoring_enabled: true in config (legacy ml_severity_enabled still works).
  • Runs after all rules have been evaluated, before suppression filtering.
  • Skips deterministic findings by default (configurable).
  • Never re-scores documentation obligations.
  • Requires a paid license tier (Trial/Team/Org/Enterprise).

What it is NOT

  • Not a neural network or trained model
  • Not an API call to any external service
  • Not Claude, GPT, or any LLM
  • Not required for any Sentrik feature to work

Confidence Scoring

Sentrik assigns a confidence value (0.0–1.0) to every finding. This runs in two layers:

Heuristic confidence (always on, no LLM)

The rules engine assigns confidence automatically based on where the match occurs in the source file:

| Confidence | Context |
| --- | --- |
| 1.0 | Match in executable code (finding is deterministic=True) |
| 0.7 | Match in a test file (test_ prefix, _test.py suffix, or /tests/ directory) |
| 0.5 | Match inside a comment (language-aware: #, //, etc.) |
| 0.4 | Match inside a string literal or docstring |

When confidence is less than 1.0, the finding is marked deterministic=False. Non-regex checks (AST, file_policy, required_pattern) always have confidence 1.0 because they are structurally verified.
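
A simplified sketch of this heuristic for Python sources is shown below. The function names are hypothetical, and two points are assumptions rather than documented behavior: comment and string context take precedence over the test-file check, and only single-line `#` comments and simple quoted strings are recognized (no escapes, no triple-quoted docstrings).

```python
from pathlib import PurePosixPath


def is_test_file(path: str) -> bool:
    """True for a test_ prefix, a _test.py suffix, or a file under a tests/ directory."""
    p = PurePosixPath(path)
    return (
        p.name.startswith("test_")
        or p.name.endswith("_test.py")
        or "tests" in p.parts[:-1]
    )


def heuristic_confidence(path: str, line: str, match_start: int) -> float:
    """Assign confidence from where on the line the match begins."""
    in_string = False
    quote = ""
    # Walk the text before the match to see whether a string or comment is open.
    for ch in line[:match_start]:
        if in_string:
            if ch == quote:
                in_string = False
        elif ch in "\"'":
            in_string, quote = True, ch
        elif ch == "#":
            return 0.5  # a comment started before the match
    if in_string:
        return 0.4  # the match sits inside a string literal
    if is_test_file(path):
        return 0.7
    return 1.0  # executable code


def is_deterministic(confidence: float) -> bool:
    """A finding is deterministic only at full confidence."""
    return confidence == 1.0
```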

LLM-powered confidence (opt-in)

For findings that are not deterministic, an LLM can re-score confidence with richer context analysis. This is provider-agnostic — configure any supported backend:

confidence_scoring_enabled: true
confidence_scoring_max_findings: 50   # cap per scan
llm_provider: anthropic               # anthropic, openai, or ollama
llm_model: claude-sonnet-4-20250514

Or via environment variables: GUARD_CONFIDENCE_SCORING_ENABLED=true, GUARD_LLM_PROVIDER, GUARD_LLM_MODEL, GUARD_LLM_BASE_URL.

Pipeline position

Confidence scoring runs in this order within the scan pipeline:

  1. Rules engine evaluates all rules (assigns heuristic confidence)
  2. Severity rescorer adjusts severity based on code context (opt-in)
  3. LLM confidence scoring re-scores non-deterministic findings (opt-in)
  4. Suppression filtering removes silenced findings
  5. Gate evaluation counts remaining findings by severity
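
The ordering of steps 2 to 5 can be sketched as a small function that takes the rules engine's findings as input. All names are hypothetical, findings are plain dictionaries, and suppression by rule ID is an assumption made only to keep the example short.

```python
from typing import Callable, Optional

Stage = Callable[[list[dict]], list[dict]]


def run_pipeline(
    findings: list[dict],
    *,
    rescorer: Optional[Stage] = None,        # step 2, opt-in
    llm_confidence: Optional[Stage] = None,  # step 3, opt-in
    suppressed_rule_ids: frozenset = frozenset(),
    gate_fail_on: tuple = ("critical", "high"),
) -> tuple[list[dict], bool]:
    """Apply the optional stages in order, then suppression, then the gate."""
    if rescorer is not None:
        findings = rescorer(findings)
    if llm_confidence is not None:
        findings = llm_confidence(findings)
    # Step 4: silenced findings are dropped before the gate sees them.
    findings = [f for f in findings if f["rule_id"] not in suppressed_rule_ids]
    # Step 5: the gate only counts what remains.
    passed = not any(f["severity"] in gate_fail_on for f in findings)
    return findings, passed
```

With both optional stages left as None, the function reduces to suppression plus gate evaluation, which mirrors the default configuration.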

Key design points

  • Heuristic confidence is always computed — no configuration needed, no LLM calls.
  • LLM confidence only runs when explicitly enabled and a provider is configured.
  • The LLM never changes severity directly. It adjusts only confidence, which later consumers of the findings can use.
  • All LLM outputs are validated by the deterministic core before being accepted.

Optional LLM Integration Points

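Apart from the opt-in LLM confidence scoring described above, these are the only places where an LLM can be used. All are opt-in. Requirement generation has a local fallback; requirement verification is simply unavailable without an LLM.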

1. Requirement generation (sentrik generate-reqs)

What it does: Analyzes untracked source files and generates requirement descriptions.

With LLM: Sends file contents to an LLM (configurable provider) to produce natural-language requirement titles, descriptions, and acceptance criteria.

Without LLM: Falls back to a stub generator that creates requirements from file names and function signatures. Still works — just less descriptive.
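
A stub generator of this kind could look roughly like the sketch below, which derives a title from the file name and a description from public function signatures. The function name, the output fields, and the wording of the generated text are all hypothetical; only the inputs (file names and function signatures) come from the description above.

```python
import ast
from pathlib import PurePosixPath


def stub_requirement(path: str, source: str) -> dict:
    """Build a placeholder requirement from a file name and its top-level functions."""
    module = PurePosixPath(path).stem
    signatures = []
    for node in ast.parse(source).body:
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and not node.name.startswith("_"):
            params = ", ".join(arg.arg for arg in node.args.args)
            signatures.append(f"{node.name}({params})")
    if signatures:
        description = "Provides: " + ", ".join(signatures)
    else:
        description = "No public functions found."
    return {
        "title": f"Requirement for {module.replace('_', ' ')}",
        "description": description,
        "source_file": path,
    }
```

The output is predictable but generic, which matches the trade-off described above: it works without an LLM, but the text is less descriptive.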

Config:

# No LLM config needed — stub generator is the default.
# LLM provider would be configured separately if desired.

2. Requirement verification (requirement_verification rule type)

What it does: Checks whether code behavior actually matches what a requirement says.

With LLM: Sends the requirement text + source code to an LLM and asks "does this code implement this requirement?"

Without LLM: Not available. This rule type is defined but not used in any current standards pack. All shipping packs use deterministic rule types only.

3. Future: LLM-powered remediation (not yet built)

Vision: When a finding is detected, an LLM reads the source file + the remediation guidance and generates a context-aware fix.

Safety loop:

Finding detected
  -> LLM generates fix using remediation guidance as prompt
    -> Sentrik scans the fix with the same rules engine
      -> If fix introduces new findings -> reject, retry (max 3)
      -> If fix passes -> propose as patch
      -> If file is safety-critical -> require human review regardless

Without LLM: Users read the remediation guidance in the compliance report and fix manually, or use the existing mechanical auto-patches (regex replace, comment-out).


Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                    DETERMINISTIC CORE                        │
│                  (works without any LLM)                     │
│                                                              │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌─────────┐ │
│  │  Rules    │   │   Gate   │   │ Reports  │   │  Audit  │ │
│  │  Engine   │──>│  Pass/   │──>│ HTML/CSV │──>│   Log   │ │
│  │ (regex,   │   │  Fail    │   │ SARIF/   │   │ (JSONL) │ │
│  │  AST,     │   │          │   │ JUnit    │   │         │ │
│  │  policy)  │   │          │   │ Comply   │   │         │ │
│  └──────────┘   └──────────┘   └──────────┘   └─────────┘ │
│       │                                                      │
│       v                                                      │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌─────────┐ │
│  │ Auto-    │   │ Trace-   │   │  Drift   │   │ Work    │ │
│  │ Patch    │   │ ability  │   │ Detect   │   │ Item    │ │
│  │ (regex   │   │ (token   │   │ (file    │   │ Sync    │ │
│  │  replace)│   │  match)  │   │  exists?)│   │ (REST)  │ │
│  └──────────┘   └──────────┘   └──────────┘   └─────────┘ │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Severity Rescorer (opt-in, local heuristics)        │   │
│  │  Weighted features: base sev, confidence, code       │   │
│  │  context, pattern risk, file risk, density           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│              OPTIONAL LLM LAYER (opt-in)                     │
│           (everything above works without this)              │
│                                                              │
│  ┌──────────────┐  ┌───────────────┐  ┌──────────────────┐ │
│  │  Requirement  │  │  Requirement  │  │  LLM-Powered     │ │
│  │  Generation   │  │  Verification │  │  Remediation     │ │
│  │  (fallback:   │  │  (not in any  │  │  (not yet built) │ │
│  │   stub gen)   │  │   pack yet)   │  │                  │ │
│  └──────────────┘  └───────────────┘  └──────────────────┘ │
│                                                              │
│  All LLM outputs validated by deterministic core before     │
│  being accepted — the rules engine scans LLM-generated      │
│  code the same way it scans human code.                     │
└─────────────────────────────────────────────────────────────┘

Key Principle

The deterministic core is the source of truth. If an LLM generates a fix, it gets scanned by the same rules engine. If an LLM writes a requirement, it gets verified by the same traceability system. The LLM is an assistant — the rules engine is the authority.

This matters for regulated industries: an FDA auditor can look at the compliance report and know that every finding was detected by a deterministic pattern match with a specific clause reference, not by an opaque AI judgment call.