DeFi Security · Agent: VL-Core · system

Sentinel — Offline Research, Online Strike: a DeFi Security & MEV System

Four modules work in relay: AI scans contracts for vulnerabilities, EVM simulation verifies exploitability, a tri-entry atomic contract executes in milliseconds, and Flashbots provides private routing. The executor contract is deployed to testnet.


DeFi security research and MEV execution are games on two different clocks: research runs in days, while opportunities last milliseconds. One system doing both either researches too shallowly or strikes too slowly. Sentinel's whole architecture therefore splits the clocks, researching offline and striking online. In peacetime, an AI pipeline slowly scans and simulates, distilling the question "is this vulnerability actually exploitable?" into a library of pre-compiled payloads; when trigger conditions fire, execution is atomic and must complete within a hard 4-second budget.
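A minimal Python sketch of the offline/online split, assuming a payload library keyed by a trigger identifier. `Payload`, `PayloadLibrary`, `strike` and the per-stage budgets are hypothetical; only the stage names (trigger intake, dry-run re-verification, risk checks, on-chain execution, private submission) and the 4-second ceiling come from this write-up. What it illustrates is that the online path does no research at all: it either finds a pre-compiled payload or gives up, and it refuses to start a stage whose allotted time no longer fits in the remaining budget.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Payload:
    """Hypothetical shape of a pre-compiled payload."""
    target: str      # contract address the payload was compiled against
    calldata: bytes  # ready-to-send call data, produced offline

class PayloadLibrary:
    """Filled slowly by the offline pipeline; read with a dict lookup online."""
    def __init__(self) -> None:
        self._by_trigger: Dict[str, Payload] = {}

    def register(self, trigger_id: str, payload: Payload) -> None:
        self._by_trigger[trigger_id] = payload

    def lookup(self, trigger_id: str) -> Optional[Payload]:
        return self._by_trigger.get(trigger_id)

HARD_BUDGET_S = 4.0  # hard ceiling for the whole online path

# Hypothetical per-stage budgets in seconds; only the stage names and the
# under-4-second total are taken from the text.
STAGE_BUDGETS: Dict[str, float] = {
    "trigger_intake": 0.2,
    "dry_run_reverify": 1.5,
    "risk_checks": 0.3,
    "onchain_execution": 1.0,
    "private_submission": 0.8,
}
assert sum(STAGE_BUDGETS.values()) < HARD_BUDGET_S

Stage = Tuple[str, Callable[[Payload], bool]]

def strike(trigger_id: str, library: PayloadLibrary, stages: List[Stage],
           clock: Callable[[], float] = time.monotonic) -> str:
    payload = library.lookup(trigger_id)
    if payload is None:
        return "no_payload"  # nothing pre-compiled: never fall back to online research
    start = clock()
    for name, run in stages:
        # Refuse to start a stage whose allotted time no longer fits in the budget.
        if clock() - start + STAGE_BUDGETS[name] > HARD_BUDGET_S:
            return f"timeout:{name}"
        if not run(payload):
            return f"aborted:{name}"
    return "submitted"
```

Checking the budget before a stage rather than after it is a choice made for this sketch: once on-chain execution or submission has begun, aborting it afterwards would no longer mean anything.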

Figure: five pre-filter gates → Slither static analysis → LLM deep review → confidence calibration → Foundry simulation → payload pre-compile

The offline research pipeline

First make the LLM speak, then make it shut up

Five pre-filter gates (bytecode dedup, standard-token detection, allowlists, a Slither gate, complexity scoring) cut roughly half of all contracts before the LLM ever sees them, which matters because deep review costs about three cents per contract. The hardest tuning story: the first version had 0% recall on logic-ordering vulnerabilities, the class behind nearly half of real-world exploits. The model was too "cautious" to report them. The fix was not a bigger model but domain instruction engineering: a ten-item logic-vulnerability checklist injected into the system prompt, five few-shot exemplars of exactly that class, and temperature lowered to 0.15. Recall went from 0% to 72%, and F1 rose from 0.25 to 0.68.
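The five gates can be read as a short-circuiting chain, run in the order listed above, where the first gate to reject a contract drops it before any LLM spend. A minimal sketch, assuming each contract arrives with its runtime bytecode, extracted function selectors, a Slither finding count and a complexity score already computed. The `Contract` type, `MIN_COMPLEXITY`, and the exact rule behind each gate (for example, treating a contract whose selectors are all standard ERC-20 ones as a plain token, or dropping contracts with zero Slither findings) are hypothetical stand-ins, not Sentinel's actual criteria.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set

ERC20_SELECTORS: FrozenSet[str] = frozenset({
    "a9059cbb",  # transfer(address,uint256)
    "095ea7b3",  # approve(address,uint256)
    "23b872dd",  # transferFrom(address,address,uint256)
    "70a08231",  # balanceOf(address)
    "18160ddd",  # totalSupply()
    "dd62ed3e",  # allowance(address,address)
})
MIN_COMPLEXITY = 5.0  # hypothetical threshold

@dataclass(frozen=True)
class Contract:
    address: str
    bytecode: bytes
    selectors: FrozenSet[str]  # 4-byte selectors extracted from the bytecode
    slither_findings: int      # Slither detector hits of any severity
    complexity: float          # hypothetical complexity score

def first_rejecting_gate(c: Contract, seen: Set[str],
                         allowlist: FrozenSet[str]) -> Optional[str]:
    """Apply the gates in the order listed in the text; return the rejecting gate or None."""
    digest = hashlib.sha256(c.bytecode).hexdigest()
    if digest in seen:
        return "bytecode_dedup"   # identical bytecode already queued or rejected
    seen.add(digest)
    if c.selectors and c.selectors <= ERC20_SELECTORS:
        return "standard_token"   # exposes nothing beyond plain ERC-20
    if c.address.lower() in allowlist:
        return "allowlist"
    if c.slither_findings == 0:
        return "slither_gate"
    if c.complexity < MIN_COMPLEXITY:
        return "complexity"
    return None

def prefilter(contracts: Iterable[Contract], allowlist: FrozenSet[str]) -> List[Contract]:
    """Return only the contracts that pass every gate and go on to LLM review."""
    seen: Set[str] = set()
    return [c for c in contracts if first_rejecting_gate(c, seen, allowlist) is None]
```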

The price was a false-positive rate climbing toward 35%, so a calibration layer pushes back: if a nonReentrant guard is detected, the confidence of any reentrancy finding is multiplied by 0.4; if Chainlink validation patterns are detected, oracle-manipulation findings are multiplied by 0.5; and high-confidence findings backed by the LLM alone, with no static-analysis corroboration, are deliberately down-weighted.
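A sketch of the calibration layer as multiplicative adjustments to a finding's confidence. The 0.4 and 0.5 factors come from the text; the high-confidence threshold (0.8) and the LLM-only penalty (0.7) are hypothetical, since the write-up only says such findings are "deliberately down-weighted". `Finding` and `ContractFacts` are illustrative types, and detecting the guard and the Chainlink pattern is assumed to happen upstream.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Finding:
    category: str               # e.g. "reentrancy", "oracle_manipulation"
    confidence: float           # confidence reported by the LLM, in [0, 1]
    slither_corroborated: bool  # did static analysis flag the same issue?

@dataclass(frozen=True)
class ContractFacts:
    has_nonreentrant_guard: bool
    has_chainlink_validation: bool

HIGH_CONFIDENCE = 0.8   # hypothetical threshold
LLM_ONLY_PENALTY = 0.7  # hypothetical factor; the text gives no number

def calibrate(f: Finding, facts: ContractFacts) -> Finding:
    """Return a copy of the finding with its confidence adjusted."""
    c = f.confidence
    if f.category == "reentrancy" and facts.has_nonreentrant_guard:
        c *= 0.4
    if f.category == "oracle_manipulation" and facts.has_chainlink_validation:
        c *= 0.5
    # Judged on the model's original claim: confident but uncorroborated.
    if f.confidence >= HIGH_CONFIDENCE and not f.slither_corroborated:
        c *= LLM_ONLY_PENALTY
    return replace(f, confidence=c)
```

The penalties stack, so a confident, uncorroborated reentrancy claim against a guarded contract falls from 0.9 to about 0.25 here, while low-confidence findings without matching facts pass through unchanged.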

Half the work of building an AI judgment system is making it dare to speak; the other half is making it shut up.

The execution layer: a flash loan with zero fees

The simulator verifies exploitability on Foundry forks, with an incremental cache keyed by (block number, gas price) and a TTL of one block — cache hits return in under 500ms. The execution contract is 1,107 lines of Solidity unifying three capital entrances, ranked by cost:
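The text gives only the cache key and its lifetime. A minimal sketch, assuming a candidate identifier is also part of the key (otherwise two different payloads simulated in the same block at the same gas price would collide) and that `simulate` is a stand-in for the Foundry fork run; `SimulationCache` is a hypothetical name, and the "incremental" reuse of fork state is not modelled.

```python
from typing import Any, Callable, Dict, Hashable, Tuple

# Hypothetical stand-in for the fork run: (candidate, block, gas_price) -> result
Simulate = Callable[[Hashable, int, int], Any]

class SimulationCache:
    """Memoizes fork simulations; every entry lives for exactly one block."""

    def __init__(self, simulate: Simulate) -> None:
        self._simulate = simulate
        self._block = -1
        self._entries: Dict[Tuple[Hashable, int, int], Any] = {}

    def get(self, candidate: Hashable, block: int, gas_price: int) -> Any:
        if block != self._block:
            # A different block makes every cached result stale: TTL of one block.
            self._entries.clear()
            self._block = block
        key = (candidate, block, gas_price)
        if key not in self._entries:
            # Miss: pay for a full fork simulation once per key.
            self._entries[key] = self._simulate(candidate, block, gas_price)
        return self._entries[key]

    def __len__(self) -> int:
        return len(self._entries)
```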

| Entrance | Fee | Mechanism |
|---|---|---|
| Uniswap V4 unlock | 0% | Flash Accounting via EIP-1153 transient storage |
| Uniswap V3 flash | 0.01% | Callback from a low-fee pool |
| Aave V3 flashLoanSimple | 0.05% | Single-asset callback |
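Ranking the three entrances by cost reduces to trying the cheapest source that can cover the amount and computing what it will charge. A sketch with fees expressed in parts per million (0.01% = 100 ppm, 0.05% = 500 ppm); the entrance names, the per-entrance liquidity map, and rounding the fee up are assumptions for illustration, not the deployed Solidity contract's logic.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Entrance:
    name: str
    fee_ppm: int  # fee in parts per million of the borrowed amount

ENTRANCES = (
    Entrance("uniswap_v4_unlock", 0),             # 0%
    Entrance("uniswap_v3_flash", 100),            # 0.01%
    Entrance("aave_v3_flash_loan_simple", 500),   # 0.05%
)

def fee_owed(amount: int, fee_ppm: int) -> int:
    """Fee in token base units, rounded up so the repayment is never short."""
    return (amount * fee_ppm + 999_999) // 1_000_000

def pick_entrance(amount: int, liquidity: Dict[str, int]) -> Optional[Tuple[str, int]]:
    """Cheapest entrance with enough available liquidity, plus its fee; None if none can cover it."""
    for e in sorted(ENTRANCES, key=lambda e: e.fee_ppm):
        if liquidity.get(e.name, 0) >= amount:
            return e.name, fee_owed(amount, e.fee_ppm)
    return None
```

For a 1,000-token loan with 18 decimals, falling back from V4 to V3 costs 0.1 token and falling back to Aave costs 0.5 token, which is why the cheaper entrances are tried first.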

V4's Flash Accounting is a genuine paradigm shift: tokens never physically move during the arbitrage. Every swap mutates only a transient ledger (about 100 gas per write, versus twenty thousand for a fresh write to persistent storage), and when unlock ends, every account's delta must be zero or the whole transaction reverts. The result is zero fees and more than half the gas saved. Around the contract, a 19-state orchestration machine governs the full chain (trigger intake, dry-run re-verification, risk checks, on-chain execution, private submission), with each stage given an explicit time budget and the budgets summing to under 4 seconds.
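The invariant that makes V4's zero-fee flash accounting safe can be modelled in a few lines. This is a toy Python model, not the PoolManager API: `FlashLedger`, `swap`, `take`, `settle` and `CurrencyNotSettled` are illustrative names (V4 has similarly named concepts, but the real signatures differ). Swaps only adjust per-currency deltas; at the end of `unlock`, any nonzero delta raises an exception standing in for the transaction revert.

```python
from collections import defaultdict
from typing import Callable, Dict

class CurrencyNotSettled(Exception):
    """Stands in for the revert when unlock ends with any nonzero delta."""

class FlashLedger:
    """Toy model of V4-style flash accounting (not the real PoolManager API)."""

    def __init__(self) -> None:
        # currency -> delta from the caller's view: >0 pool owes caller, <0 caller owes pool
        self._delta: Dict[str, int] = defaultdict(int)

    def swap(self, sell: str, amount_in: int, buy: str, amount_out: int) -> None:
        # No tokens move: the swap only records what each side now owes.
        self._delta[sell] -= amount_in
        self._delta[buy] += amount_out

    def take(self, currency: str, amount: int) -> None:
        self._delta[currency] -= amount  # caller physically withdraws tokens

    def settle(self, currency: str, amount: int) -> None:
        self._delta[currency] += amount  # caller physically pays tokens in

    def unlock(self, callback: Callable[["FlashLedger"], None]) -> None:
        try:
            callback(self)
            unsettled = {c: d for c, d in self._delta.items() if d != 0}
            if unsettled:
                raise CurrencyNotSettled(unsettled)
        finally:
            self._delta.clear()  # like transient storage, the ledger does not outlive the call
```

Because the check runs only once, at the end, a multi-hop arbitrage can swap through several pools in any order inside the callback; only the net position has to balance, and a profitable route simply ends with a positive delta that the caller takes.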

Status and residue

86 Foundry tests pass (including fuzzing at 10,000 runs); the unified execution contract is deployed on Sepolia testnet; the 24-week, 12-sprint MVP plan finished at 100%, with the private router (the Flashbots leg) still unbuilt. The project never went to production, but it left two things behind: the paradigm of letting AI turn expensive research into reusable assets while compressing fast execution into one atomic call, and the calibration methodology for making an LLM speak and shut up.