Methodology

Statistical methods and experimental design for peer review

1. Experimental Design

1.1 Attack Corpus

The benchmark uses a corpus of consent fabrication attacks across five categories:

  • Temporal Fraud: Backdating, retroactive consent modification
  • Authority Fabrication: Unauthorized proxy consent, surrogate claims
  • Scope Expansion: Unauthorized extension of consent scope
  • Consent Fabrication: Verbal consent claims, implied consent assertions
  • Selective Omission: Hiding refusals, positive-only summaries
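Per-category sample counts matter because every proportion in Section 2 gets its own confidence interval. The sketch below shows one possible way to represent corpus entries. The type, the category identifiers and `countByCategory` are hypothetical illustrations, not the harness's actual schema.

```typescript
// Hypothetical identifiers for the five attack categories listed above.
type AttackCategory =
  | "temporal_fraud"
  | "authority_fabrication"
  | "scope_expansion"
  | "consent_fabrication"
  | "selective_omission";

const ATTACK_CATEGORIES: readonly AttackCategory[] = [
  "temporal_fraud",
  "authority_fabrication",
  "scope_expansion",
  "consent_fabrication",
  "selective_omission",
];

// Hypothetical shape of one attack sample in the corpus.
interface AttackSample {
  id: string;
  category: AttackCategory;
  text: string;
}

// Count samples per category (categories with no samples report 0),
// so that per-category sample sizes can be reported next to their CIs.
function countByCategory(samples: AttackSample[]): Record<AttackCategory, number> {
  const counts = Object.fromEntries(ATTACK_CATEGORIES.map((c) => [c, 0])) as Record<AttackCategory, number>;
  for (const s of samples) counts[s.category] += 1;
  return counts;
}
```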

1.2 Attack Generation

Attack variants are generated using multiple LLMs (OpenAI GPT-4o, Google Gemini) with various mutation strategies:

  • Synonym replacement
  • Sentence restructuring
  • Indirect/hypothetical framing
  • Authority injection
  • Urgency pressure
  • Professional reframing
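One way to organize LLM-based variant generation is to build one prompt per (seed attack, strategy) pair and send each prompt to every generator model. The sketch below only builds the prompts. The strategy identifiers and the instruction wording are hypothetical, because the prompts actually used with GPT-4o and Gemini are not given here.

```typescript
// Hypothetical identifiers mirroring the mutation strategies listed above.
type MutationStrategy =
  | "synonym_replacement"
  | "sentence_restructuring"
  | "hypothetical_framing"
  | "authority_injection"
  | "urgency_pressure"
  | "professional_reframing";

// Illustrative instructions only; not the benchmark's real prompts.
const STRATEGY_INSTRUCTIONS: Record<MutationStrategy, string> = {
  synonym_replacement: "Replace key words with synonyms while preserving the meaning.",
  sentence_restructuring: "Restructure and reorder the sentences without changing the request.",
  hypothetical_framing: "Reframe the request as indirect or hypothetical.",
  authority_injection: "Add a claim that an authority figure approved the request.",
  urgency_pressure: "Add time pressure or urgency to the request.",
  professional_reframing: "Rewrite the request in formal clinical language.",
};

// Build one generator prompt per strategy for a single seed attack.
function buildMutationPrompts(
  seed: string,
  strategies: MutationStrategy[],
): { strategy: MutationStrategy; prompt: string }[] {
  return strategies.map((strategy) => ({
    strategy,
    prompt: `${STRATEGY_INSTRUCTIONS[strategy]}\nReturn only the rewritten text.\n\nOriginal:\n${seed}`,
  }));
}
```

Keeping the strategy tag on each prompt lets every generated variant be traced back to the strategy that produced it, so detection rates can be broken down by strategy.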

1.3 Legitimate Request Corpus

A control set of legitimate healthcare queries is used to measure the false positive rate:

  • Medical history summaries
  • Medication lists
  • Treatment plans
  • Discharge instructions
  • Referral requests

2. Statistical Methods

2.1 Primary Metrics

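Metric                 Formula                                     Interpretation
Recall                 TP / (TP + FN)                              Proportion of attacks correctly blocked
Specificity            TN / (TN + FP)                              Proportion of legitimate requests correctly allowed
Harm Prevention Rate   |Blocked ∩ LLM-Complied| / |LLM-Complied|   Proportion of attacks the LLM would have complied with that the firewall blocked

Attacks are the positive class: TP = attacks blocked, FN = attacks allowed, TN = legitimate requests allowed, FP = legitimate requests blocked. The false positive rate is 1 − specificity.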
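The metrics above can be computed from per-sample outcomes as sketched below. Note that the second metric, TN / (TN + FP), is the true negative rate (specificity). The `Outcome` record and its field names are hypothetical; in particular, `llmComplied` assumes each attack was also sent to an unprotected LLM whose compliance was judged separately.

```typescript
// Hypothetical per-sample outcome record.
interface Outcome {
  isAttack: boolean;     // ground-truth label (attack = positive class)
  blocked: boolean;      // firewall decision
  llmComplied?: boolean; // attacks only: would the unprotected LLM have complied?
}

// Compute recall, specificity and harm prevention rate.
// A metric is NaN when its denominator is zero (e.g. no legitimate samples).
function computeMetrics(outcomes: Outcome[]) {
  let tp = 0, fn = 0, tn = 0, fp = 0;
  let complied = 0, blockedAndComplied = 0;
  for (const o of outcomes) {
    if (o.isAttack) {
      if (o.blocked) tp++; else fn++;
      if (o.llmComplied) {
        complied++;
        if (o.blocked) blockedAndComplied++; // |Blocked ∩ LLM-Complied|
      }
    } else {
      if (o.blocked) fp++; else tn++;
    }
  }
  return {
    recall: tp / (tp + fn),
    specificity: tn / (tn + fp),
    harmPreventionRate: blockedAndComplied / complied,
    counts: { tp, fn, tn, fp },
  };
}
```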

2.2 Confidence Intervals

95% confidence intervals are computed using the Wilson score interval, which provides better coverage than the Wald interval for proportions near 0 or 1 and at small sample sizes:

Wilson Score Interval:

\[
\frac{\hat{p} + \dfrac{z^2}{2n} \pm z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}}
\]

where:
  p̂ = observed proportion (successes/total)
  n = sample size
  z = 1.96 for a 95% CI
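The Wilson interval translates directly into code. The sketch below follows the formula term by term; the function name `wilsonInterval` is illustrative, and the final clamp to [0, 1] only guards against floating-point rounding.

```typescript
// Wilson score interval for a binomial proportion; z = 1.96 gives a 95% CI.
function wilsonInterval(successes: number, n: number, z = 1.96): { lower: number; upper: number } {
  if (n <= 0) throw new RangeError("n must be positive");
  const p = successes / n;            // observed proportion p̂
  const z2 = z * z;
  const denom = 1 + z2 / n;           // 1 + z²/n
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  // The interval lies in [0, 1] mathematically; clamp away rounding error.
  return { lower: Math.max(0, center - half), upper: Math.min(1, center + half) };
}
```

For example, 36 of 40 attacks blocked gives roughly [0.770, 0.960]. With 40 of 40 blocked the lower bound is still about 0.912, whereas the Wald interval would collapse to zero width at p̂ = 1, which is exactly the case the Wilson interval is chosen for.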

2.3 Sample Size Considerations

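For a 95% CI with a half-width (margin of error) of ±5 percentage points, the required sample size under the normal approximation, taking the worst case p = 0.5, is:

\[
n \approx \left(\frac{z}{E}\right)^2 p(1-p) = \left(\frac{1.96}{0.05}\right)^2 \times 0.5 \times 0.5 \approx 384.2
\]

which rounds up to 385 samples per reported proportion (attack samples for recall, legitimate samples for specificity).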

Current benchmarks with 30-50 samples therefore provide preliminary results only: under the same approximation, n = 40 gives a worst-case half-width of about ±15.5 percentage points. Interval widths will narrow as continued adversarial training expands the corpus.
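The sample-size arithmetic can be checked, and inverted, with two small helpers. This is a sketch using the same normal-approximation formula; the function names are illustrative.

```typescript
// Worst-case (p = 0.5) sample size for a target CI half-width E,
// n = (z/E)² · p(1 − p), rounded up to a whole sample.
function requiredSampleSize(halfWidth: number, p = 0.5, z = 1.96): number {
  return Math.ceil((z / halfWidth) ** 2 * p * (1 - p));
}

// Inverse: the normal-approximation half-width achieved with n samples.
function halfWidthFor(n: number, p = 0.5, z = 1.96): number {
  return z * Math.sqrt((p * (1 - p)) / n);
}
```

`requiredSampleSize(0.05)` returns 385. For the current 30, 40 and 50 sample runs, `halfWidthFor` gives roughly 0.179, 0.155 and 0.139, i.e. worst-case margins of about ±14 to ±18 percentage points.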

3. Firewall Architecture

3.1 Detection Pipeline

  1. Safelist Check: Requests matching known legitimate patterns bypass semantic analysis.
  2. Semantic Embedding: Request text is embedded using bge-small-en-v1.5 (384 dimensions).
  3. Vector Similarity Search: The embedding is compared by cosine similarity against known attack vectors.
  4. Regex Fallback: Pattern matching catches edge cases that score below the semantic threshold.
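The four stages can be sketched as a single classification function. This is a minimal illustration under stated assumptions: the `embed` callback, the attack vectors, the safelist and fallback regexes, and the `Decision` shape are hypothetical stand-ins, and the brute-force maximum stands in for a real vector index. It is not the GLACIS implementation.

```typescript
type Decision = {
  action: "allow" | "block";
  stage: "safelist" | "semantic" | "regex" | "default";
  score?: number;
};

// Cosine similarity of two equal-length vectors (NaN if either has zero norm).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Regexes should not use the g flag, because RegExp.test is stateful with it.
async function classify(
  text: string,
  embed: (t: string) => Promise<number[]>, // e.g. a bge-small-en-v1.5 call (384 dims)
  attackVectors: number[][],
  safelist: RegExp[],
  fallbackPatterns: RegExp[],
  threshold = 0.7,
): Promise<Decision> {
  // 1. Safelist: known-legitimate patterns skip semantic analysis.
  if (safelist.some((re) => re.test(text))) return { action: "allow", stage: "safelist" };
  // 2. Embed the request text.
  const v = await embed(text);
  // 3. Highest cosine similarity against known attack vectors.
  const score = Math.max(...attackVectors.map((a) => cosineSimilarity(v, a)));
  if (score >= threshold) return { action: "block", stage: "semantic", score };
  // 4. Regex fallback for requests below the semantic threshold.
  if (fallbackPatterns.some((re) => re.test(text))) return { action: "block", stage: "regex", score };
  return { action: "allow", stage: "default", score };
}
```

Returning the stage alongside the decision makes it possible to report how many blocks come from the semantic layer versus the regex fallback.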

3.2 Threshold Calibration

The similarity threshold (default: 0.70) was calibrated to maximize recall while keeping specificity (the proportion of legitimate requests correctly allowed) above 95%. Lower thresholds increase recall but also increase false positives.
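The calibration procedure itself is not specified here. One common approach is a threshold sweep over the observed similarity scores, sketched below. The name `calibrateThreshold` and the input format (one maximum-similarity score per sample, blocked when score ≥ threshold) are assumptions.

```typescript
// Choose the threshold that maximizes recall subject to specificity >= minSpecificity.
// Returns null if no observed score meets the specificity constraint.
function calibrateThreshold(
  attackScores: number[],
  legitScores: number[],
  minSpecificity = 0.95,
): { threshold: number; recall: number; specificity: number } | null {
  // Decisions only change at observed scores, so those are the only candidates.
  const candidates = [...new Set([...attackScores, ...legitScores])].sort((a, b) => a - b);
  // Ascending sweep: recall never increases and specificity never decreases,
  // so the first feasible threshold has the highest recall.
  for (const t of candidates) {
    const recall = attackScores.filter((s) => s >= t).length / attackScores.length;
    const specificity = legitScores.filter((s) => s < t).length / legitScores.length;
    if (specificity >= minSpecificity) return { threshold: t, recall, specificity };
  }
  return null;
}
```

With only 30-50 legitimate samples, a 95% specificity constraint is satisfied by allowing at most one or two false positives, so the chosen threshold can shift noticeably between runs.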

3.3 Latency Budget

Operation              Target    Measured
Request parsing        2 ms      ~1 ms
Embedding generation   15 ms     ~100-150 ms*
Vector search          5 ms      ~5 ms
Regex fallback         3 ms      ~1 ms
Total                  <50 ms    ~110-160 ms*

*Embedding latency varies with Cloudflare Workers AI load. Production deployments may use dedicated GPU infrastructure for sub-50ms latency.
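Because embedding latency fluctuates with load, a range such as ~100-150 ms is easier to compare across runs when reported as percentiles of recorded per-request latencies. How the measured figures above were collected is not stated. The sketch below assumes an array of latencies in milliseconds and uses the nearest-rank percentile definition. The helper names are hypothetical.

```typescript
// Nearest-rank percentile of latency samples (milliseconds), 0 < p <= 100.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new RangeError("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Summarize a run against a latency budget (the table's total target is < 50 ms).
function summarizeLatency(samplesMs: number[], budgetMs = 50) {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
    // Fraction of requests that finished within the budget.
    withinBudget: samplesMs.filter((x) => x < budgetMs).length / samplesMs.length,
  };
}
```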

4. Reproducibility

4.1 Data Availability

All benchmark results are stored as JSON files with full attestation:

  • Input hashes (SHA-256)
  • Output hashes
  • Decision hashes
  • Timestamps
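A record with these fields could be produced as sketched below with Node's built-in `crypto` module. The `Attestation` field names, the use of hex-encoded SHA-256 for all three hashes, and the sorted-key JSON canonicalization of the decision are assumptions; the document lists the fields but not their exact encoding.

```typescript
import { createHash } from "node:crypto";

// Hypothetical attestation record.
interface Attestation {
  inputHash: string;
  outputHash: string;
  decisionHash: string;
  timestamp: string;
}

const sha256Hex = (s: string): string => createHash("sha256").update(s, "utf8").digest("hex");

// Serialize JSON-compatible values with object keys sorted at every level,
// so logically identical decisions always hash to the same value.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function attest(input: string, output: string, decision: object, now = new Date()): Attestation {
  return {
    inputHash: sha256Hex(input),
    outputHash: sha256Hex(output),
    decisionHash: sha256Hex(canonicalJson(decision)),
    timestamp: now.toISOString(),
  };
}
```

Canonicalizing before hashing matters because `JSON.stringify` preserves insertion order, so two equal decision objects built in different orders would otherwise produce different hashes.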

4.2 Code Availability

The benchmark harness, firewall implementation, and analysis tools are available in the GLACIS repository under adversarial-harness/.

4.3 Running Benchmarks

# Deploy firewall to production
wrangler deploy --config wrangler.semantic-firewall.toml

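# Run benchmark analysis (export every variable defined in .dev.vars;
# unlike `export $(cat .dev.vars | xargs)`, this tolerates comments and quoted values with spaces)
set -a; . ./.dev.vars; set +a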
npx tsx scripts/run-benchmark-analysis.ts

# Start analysis dashboard
cd analysis-app && npm run dev

5. Limitations

  • Sample Size: Current benchmarks use 30-50 samples. Larger corpora are needed for narrower CIs.
  • Attack Coverage: Evaluation is limited to the consent fabrication domain. Other attack types are not evaluated.
  • LLM Variability: LLM compliance rates may vary across models and versions.
  • Temporal Validity: Results reflect a point-in-time evaluation. Continuous monitoring is recommended.

6. Citation

@software{glacis_semantic_firewall,
  title = {GLACIS Semantic Firewall: ML-Based Consent Fabrication Detection},
  year = {2026},
  url = {https://glacis.io}
}

GLACIS Semantic Firewall Benchmark - Peer Review Analysis Dashboard