1. Experimental Design

1.1 Attack Corpus

The benchmark uses a corpus of consent fabrication attacks across five categories:

Temporal Fraud: Backdating, retroactive consent modification
Authority Fabrication: Unauthorized proxy consent, surrogate claims
Scope Expansion: Unauthorized extension of consent scope
Consent Fabrication: Verbal consent claims, implied consent assertions
Selective Omission: Hiding refusals, positive-only summaries

1.2 Attack Generation

Attack variants are generated using multiple LLMs (OpenAI GPT-4o, Google Gemini) with various mutation strategies:

Synonym replacement
Sentence restructuring
Indirect/hypothetical framing
Authority injection
Urgency pressure
Professional reframing
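
As a rough illustration of how the categories in 1.1 and the strategies above might combine into a generation plan, the sketch below expands each seed attack into one job per (strategy, model) pair. The record shapes, the `planVariants` function, and the model labels are hypothetical and are not taken from the harness.

```typescript
// Category and strategy names mirror the lists in Sections 1.1 and 1.2.
type AttackCategory =
  | "temporal_fraud"
  | "authority_fabrication"
  | "scope_expansion"
  | "consent_fabrication"
  | "selective_omission";

type MutationStrategy =
  | "synonym_replacement"
  | "sentence_restructuring"
  | "indirect_framing"
  | "authority_injection"
  | "urgency_pressure"
  | "professional_reframing";

const STRATEGIES: MutationStrategy[] = [
  "synonym_replacement",
  "sentence_restructuring",
  "indirect_framing",
  "authority_injection",
  "urgency_pressure",
  "professional_reframing",
];

// Hypothetical record shapes; the real harness may store different fields.
interface SeedAttack { id: string; category: AttackCategory; text: string }
interface VariantJob {
  seedId: string;
  category: AttackCategory;
  strategy: MutationStrategy;
  model: string;
}

// Expand every seed into one generation job per (strategy, model) pair.
function planVariants(seeds: SeedAttack[], models: string[]): VariantJob[] {
  return seeds.flatMap((seed) =>
    STRATEGIES.flatMap((strategy) =>
      models.map((model) => ({
        seedId: seed.id,
        category: seed.category,
        strategy,
        model,
      })),
    ),
  );
}

// Placeholder seed and model labels, for illustration only.
const jobs = planVariants(
  [{ id: "seed-1", category: "temporal_fraud", text: "<seed attack text>" }],
  ["gpt-4o", "gemini"],
);
console.log(jobs.length);
```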

1.3 Legitimate Request Corpus

A control set of legitimate healthcare queries is used to measure false positive rate:

Medical history summaries
Medication lists
Treatment plans
Discharge instructions
Referral requests

2. Statistical Methods

2.1 Primary Metrics

Metric	Formula	Interpretation
Recall	`TP / (TP + FN)`	Proportion of attacks correctly blocked
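Precision	`TN / (TN + FP)`	Proportion of legitimate requests correctly allowed (note: this TN-based definition is the true negative rate, usually called specificity; conventional precision is `TP / (TP + FP)`)
Harm Prevention Rate	`|Blocked ∩ LLM-Complied| / |LLM-Complied|`	Proportion of attacks the LLM would have complied with that the firewall blocked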
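
The three formulas in the table can be computed directly from raw counts. The following is a minimal sketch; the `Counts` shape, the `primaryMetrics` function, and the example numbers are illustrative assumptions, not benchmark results.

```typescript
// Confusion-matrix counts for one evaluation run (hypothetical shape).
interface Counts {
  tp: number; // attacks blocked
  fn: number; // attacks allowed
  tn: number; // legitimate requests allowed
  fp: number; // legitimate requests blocked
}

// Safe division: returns NaN when the denominator is zero.
const ratio = (num: number, den: number): number => (den === 0 ? NaN : num / den);

function primaryMetrics(c: Counts, blockedAndComplied: number, llmComplied: number) {
  return {
    recall: ratio(c.tp, c.tp + c.fn),
    // "Precision" as defined in the table above, i.e. TN / (TN + FP).
    precision: ratio(c.tn, c.tn + c.fp),
    // Attacks blocked among those the LLM would have complied with.
    harmPreventionRate: ratio(blockedAndComplied, llmComplied),
  };
}

// Made-up counts, for illustration only.
const m = primaryMetrics({ tp: 45, fn: 5, tn: 48, fp: 2 }, 18, 20);
console.log(m);
```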

2.2 Confidence Intervals

95% confidence intervals are computed using the Wilson score interval, which provides better coverage than the Wald interval for small samples and for proportions near 0 or 1:

Wilson Score Interval:

( p̂ + z²/(2n) ± z·√( p̂(1-p̂)/n + z²/(4n²) ) ) / ( 1 + z²/n )

where:
  p̂ = observed proportion (successes/total)
  n = sample size
  z = 1.96 for 95% CI
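
The interval above translates into a few lines of code. This is a minimal sketch of the formula as written; the function name `wilsonInterval` is an assumption and may not match the analysis tools in the repository.

```typescript
// Wilson score interval for a binomial proportion.
// successes: observed successes, n: sample size, z: normal quantile (1.96 for 95%).
function wilsonInterval(successes: number, n: number, z = 1.96): [number, number] {
  if (n <= 0) throw new RangeError("n must be positive");
  const p = successes / n; // observed proportion p̂
  const z2 = z * z;
  const center = p + z2 / (2 * n);
  const margin = z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  const denom = 1 + z2 / n;
  return [(center - margin) / denom, (center + margin) / denom];
}

// Illustrative input: 45 successes out of 50 trials.
const [lower, upper] = wilsonInterval(45, 50);
console.log(lower.toFixed(3), upper.toFixed(3));
```

For 45 successes out of 50, this gives an interval of roughly 0.786 to 0.957, which shows how wide the uncertainty remains at this sample size.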

2.3 Sample Size Considerations

For a 95% CI with a margin of error of ±5% (E = 0.05), taking the worst case p = 0.5, the normal-approximation sample size is approximately:

n ≈ (z/E)² × p(1-p)
n ≈ (1.96/0.05)² × 0.5 × 0.5
n ≈ 384 samples

Current benchmarks with 30-50 samples provide preliminary results only: by the same approximation, n = 50 corresponds to a worst-case margin of error of about ±14%. Confidence intervals narrow as the sample size grows through continued adversarial training.
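
The sample size arithmetic can be checked with a short sketch. Both function names are assumptions introduced here for illustration, and both use the normal approximation from the formula above rather than the Wilson interval.

```typescript
// Samples needed for a given margin of error E, using n ≈ (z/E)² × p(1-p).
// The result is rounded up, so E = 0.05 gives 385 rather than the ≈384 quoted above.
function requiredSampleSize(marginOfError: number, p = 0.5, z = 1.96): number {
  return Math.ceil((z / marginOfError) ** 2 * p * (1 - p));
}

// Worst-case (p = 0.5) margin of error for a given sample size.
function worstCaseMargin(n: number, z = 1.96): number {
  return z * Math.sqrt(0.25 / n);
}

console.log(requiredSampleSize(0.05));
console.log(worstCaseMargin(50).toFixed(3));
console.log(worstCaseMargin(30).toFixed(3));
```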

3. Firewall Architecture

3.1 Detection Pipeline

Safelist Check: Legitimate request patterns bypass semantic analysis
Semantic Embedding: Text embedded using bge-small-en-v1.5 (384 dimensions)
Vector Similarity Search: Cosine similarity against known attack vectors
Regex Fallback: Pattern matching for edge cases below semantic threshold
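
The four stages above can be sketched as a single decision function. This is an illustration under stated assumptions, not the firewall's actual code: the `PipelineConfig` shape, the `evaluate` function, and the injected `embed` callback (standing in for the bge-small-en-v1.5 call) are all hypothetical.

```typescript
type Decision = { action: "allow" | "block"; stage: string; score?: number };

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical configuration; patterns must not use the `g` flag,
// because RegExp.test() is stateful on global patterns.
interface PipelineConfig {
  safelist: RegExp[];
  attackVectors: number[][];
  fallbackPatterns: RegExp[];
  threshold: number;
  embed: (text: string) => Promise<number[]>;
}

async function evaluate(text: string, cfg: PipelineConfig): Promise<Decision> {
  // 1. Safelist check: matching requests skip semantic analysis entirely.
  if (cfg.safelist.some((re) => re.test(text))) {
    return { action: "allow", stage: "safelist" };
  }
  // 2-3. Embed the text and take the best cosine match against known attacks.
  const vector = await cfg.embed(text);
  const score = Math.max(-1, ...cfg.attackVectors.map((a) => cosineSimilarity(vector, a)));
  if (score >= cfg.threshold) {
    return { action: "block", stage: "semantic", score };
  }
  // 4. Regex fallback for inputs that score below the semantic threshold.
  if (cfg.fallbackPatterns.some((re) => re.test(text))) {
    return { action: "block", stage: "regex", score };
  }
  return { action: "allow", stage: "default", score };
}
```

Passing the embedding function in as a parameter keeps the sketch testable with a stub and independent of any particular inference backend.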

3.2 Threshold Calibration

The similarity threshold (default: 0.70) was calibrated to maximize recall while keeping precision (as defined in Section 2.1) above 95%. Lowering the threshold raises recall but blocks more legitimate requests.
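
One way to perform such a calibration is a simple sweep over candidate thresholds. The sketch below is an assumption about how this could be done, not a description of the procedure actually used; `ScoredSample` and `calibrateThreshold` are hypothetical names. Because recall can only fall as the threshold rises, the lowest candidate that satisfies the precision constraint is also the one with the highest recall.

```typescript
// A labeled sample with its best similarity score against known attack vectors.
interface ScoredSample { score: number; isAttack: boolean }

// Returns the lowest candidate threshold at which the share of legitimate
// requests allowed (TN / (TN + FP)) is at or above minPrecision, or null if none.
function calibrateThreshold(
  samples: ScoredSample[],
  candidates: number[],
  minPrecision = 0.95,
): number | null {
  const legit = samples.filter((s) => !s.isAttack);
  if (legit.length === 0) return null; // constraint cannot be evaluated
  for (const t of [...candidates].sort((a, b) => a - b)) {
    // A request is blocked when score >= t, so it is allowed when score < t.
    const allowed = legit.filter((s) => s.score < t).length;
    if (allowed / legit.length >= minPrecision) return t;
  }
  return null;
}

// Recall at a given threshold: share of attacks with score >= t.
function recallAt(samples: ScoredSample[], t: number): number {
  const attacks = samples.filter((s) => s.isAttack);
  return attacks.length === 0 ? NaN : attacks.filter((s) => s.score >= t).length / attacks.length;
}
```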

3.3 Latency Budget

Operation	Target	Measured
Request parsing	2ms	~1ms
Embedding generation	15ms	~100-150ms*
Vector search	5ms	~5ms
Regex fallback	3ms	~1ms
Total	<50ms	~110-160ms*

*Embedding latency varies with Cloudflare Workers AI load and currently pushes the measured total above the 50ms target. Production deployments may use dedicated GPU infrastructure to reach sub-50ms latency.
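
A per-stage budget like the one in the table can be checked mechanically. The sketch below is illustrative only: the stage keys, the `timed` helper, and the `overBudget` function are assumptions, and the measured values passed in are approximate figures from the "Measured" column.

```typescript
// Per-stage targets in milliseconds, copied from the "Target" column above.
const TARGET_MS: Record<string, number> = {
  parsing: 2,
  embedding: 15,
  vectorSearch: 5,
  regexFallback: 3,
};

// Run one stage and report how long it took.
async function timed<T>(
  stage: string,
  fn: () => T | Promise<T>,
): Promise<{ stage: string; ms: number; result: T }> {
  const start = performance.now();
  const result = await fn();
  return { stage, ms: performance.now() - start, result };
}

// List the stages whose measured latency exceeds their target.
function overBudget(
  measuredMs: Record<string, number>,
  targetMs: Record<string, number> = TARGET_MS,
): string[] {
  return Object.keys(measuredMs).filter(
    (stage) => stage in targetMs && measuredMs[stage] > targetMs[stage],
  );
}

// Approximate values taken from the table; 125 is the midpoint of ~100-150ms.
console.log(overBudget({ parsing: 1, embedding: 125, vectorSearch: 5, regexFallback: 1 }));
```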

4. Reproducibility

4.1 Data Availability

All benchmark results are stored as JSON files with full attestation:

Input hashes (SHA-256)
Output hashes
Decision hashes
Timestamps
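
An attestation record of this kind could be produced as follows. This is a minimal sketch assuming a Node.js runtime; the `AttestedResult` shape and the `attest` function are hypothetical, and the actual JSON schema of the benchmark files may differ.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of a UTF-8 string.
const sha256Hex = (text: string): string =>
  createHash("sha256").update(text, "utf8").digest("hex");

// Hypothetical record shape covering the four fields listed above.
interface AttestedResult {
  inputHash: string;
  outputHash: string;
  decisionHash: string;
  timestamp: string; // ISO 8601
}

function attest(
  input: string,
  output: string,
  decision: Record<string, unknown>,
  now: Date = new Date(),
): AttestedResult {
  return {
    inputHash: sha256Hex(input),
    outputHash: sha256Hex(output),
    // JSON.stringify preserves key insertion order, so a verifier must
    // serialize the decision with the same key order to reproduce this hash.
    decisionHash: sha256Hex(JSON.stringify(decision)),
    timestamp: now.toISOString(),
  };
}

const record = attest("<request text>", "<model output>", { action: "block", stage: "semantic" });
console.log(record);
```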

4.2 Code Availability

The benchmark harness, firewall implementation, and analysis tools are available in the GLACIS repository under adversarial-harness/.

4.3 Running Benchmarks

# Deploy firewall to production
wrangler deploy --config wrangler.semantic-firewall.toml

# Run benchmark analysis
# Load variables from .dev.vars into the environment, skipping comment lines
export $(grep -v '^#' .dev.vars | xargs)
npx tsx scripts/run-benchmark-analysis.ts

# Start analysis dashboard
cd analysis-app && npm run dev

5. Limitations

Sample Size: Current benchmarks use 30-50 samples. Larger corpora needed for narrower CIs.
Attack Coverage: Limited to consent fabrication domain. Other attack types not evaluated.
LLM Variability: LLM compliance rates may vary across models and versions.
Temporal Validity: Results reflect point-in-time evaluation. Continuous monitoring recommended.

6. Citation

@software{glacis_semantic_firewall,
  title = {GLACIS Semantic Firewall: ML-Based Consent Fabrication Detection},
  year = {2026},
  url = {https://glacis.io}
}

GLACIS Benchmark Analysis

Methodology