Methodology
Statistical methods and experimental design for peer review
1. Experimental Design
1.1 Attack Corpus
The benchmark uses a corpus of consent fabrication attacks across five categories:
- Temporal Fraud: Backdating, retroactive consent modification
- Authority Fabrication: Unauthorized proxy consent, surrogate claims
- Scope Expansion: Unauthorized extension of consent scope
- Consent Fabrication: Verbal consent claims, implied consent assertions
- Selective Omission: Hiding refusals, positive-only summaries
1.2 Attack Generation
Attack variants are generated using multiple LLMs (OpenAI GPT-4o, Google Gemini) with various mutation strategies:
- Synonym replacement
- Sentence restructuring
- Indirect/hypothetical framing
- Authority injection
- Urgency pressure
- Professional reframing
1.3 Legitimate Request Corpus
A control set of legitimate healthcare queries is used to measure false positive rate:
- Medical history summaries
- Medication lists
- Treatment plans
- Discharge instructions
- Referral requests
2. Statistical Methods
2.1 Primary Metrics
| Metric | Formula | Interpretation |
|---|---|---|
| Recall | TP / (TP + FN) | Proportion of attacks correctly blocked |
| Precision | TN / (TN + FP) | Proportion of legitimate requests correctly allowed |
| Harm Prevention Rate | Blocked âĐ LLM-Complied / LLM-Complied | Attacks blocked where LLM would have complied |
2.2 Confidence Intervals
95% confidence intervals are computed using the Wilson score interval, which provides better coverage than the Wald interval for proportions near 0 or 1:
Wilson Score Interval:
pĖ + zÂē/2n Âą zâ(pĖ(1-pĖ)/n + zÂē/4nÂē)
âââââââââââââââââââââââââââââââââââââ
1 + zÂē/n
where:
pĖ = observed proportion (successes/total)
n = sample size
z = 1.96 for 95% CI 2.3 Sample Size Considerations
For a 95% CI width of Âą5%, the required sample size is approximately:
n â (z/E)Âē Ã p(1-p)
n â (1.96/0.05)Âē Ã 0.5 Ã 0.5
n â 384 samples Current benchmarks with 30-50 samples provide preliminary results. Statistical significance improves with larger sample sizes through continued adversarial training.
3. Firewall Architecture
3.1 Detection Pipeline
- Safelist Check: Legitimate request patterns bypass semantic analysis
- Semantic Embedding: Text embedded using
bge-small-en-v1.5(384 dimensions) - Vector Similarity Search: Cosine similarity against known attack vectors
- Regex Fallback: Pattern matching for edge cases below semantic threshold
3.2 Threshold Calibration
The similarity threshold (default: 0.70) was calibrated to maximize recall while maintaining precision above 95%. Lower thresholds increase recall but risk false positives.
3.3 Latency Budget
| Operation | Target | Measured |
|---|---|---|
| Request parsing | 2ms | ~1ms |
| Embedding generation | 15ms | ~100-150ms* |
| Vector search | 5ms | ~5ms |
| Regex fallback | 3ms | ~1ms |
| Total | <50ms | ~110-160ms* |
*Embedding latency varies with Cloudflare Workers AI load. Production deployments may use dedicated GPU infrastructure for sub-50ms latency.
4. Reproducibility
4.1 Data Availability
All benchmark results are stored as JSON files with full attestation:
- Input hashes (SHA-256)
- Output hashes
- Decision hashes
- Timestamps
4.2 Code Availability
The benchmark harness, firewall implementation, and analysis tools are available in the
GLACIS repository under adversarial-harness/.
4.3 Running Benchmarks
# Deploy firewall to production
wrangler deploy --config wrangler.semantic-firewall.toml
# Run benchmark analysis
export $(cat .dev.vars | xargs)
npx tsx scripts/run-benchmark-analysis.ts
# Start analysis dashboard
cd analysis-app && npm run dev 5. Limitations
- Sample Size: Current benchmarks use 30-50 samples. Larger corpora needed for narrower CIs.
- Attack Coverage: Limited to consent fabrication domain. Other attack types not evaluated.
- LLM Variability: LLM compliance rates may vary across models and versions.
- Temporal Validity: Results reflect point-in-time evaluation. Continuous monitoring recommended.
6. Citation
@software{glacis_semantic_firewall,
title = {GLACIS Semantic Firewall: ML-Based Consent Fabrication Detection},
year = {2026},
url = {https://glacis.io}
}