Adaptive Web Filter: Local vs Cloud AI for Content Filtering
Status
This project is ongoing. There is no public deployment yet, but a live demo is available on request. If you are interested in seeing it in action, reach out at admin@alvinalmodal.dev.
The Problem
Most home network content filters rely on static blocklists. They block known domains and let everything else through. This breaks down fast. New sites appear daily. Content on a domain changes over time. A page that was safe last week might not be safe today.
AI-based classification can solve this by evaluating actual page content instead of just matching domain names. But the practical question is: do you run the AI locally on your own hardware, or call a cloud API? Local models are free per request but less capable. Cloud APIs are more accurate but cost money on every classification.
This project exists to answer that question empirically instead of guessing.
Research Questions
- Accuracy: How does detection of inappropriate content (explicit images, hate speech, gambling, violence) differ between local and cloud AI?
- Adaptability: How quickly can each architecture adapt to newly emerging threats, given local update frequencies vs continuous cloud feeds?
- Bypass Resistance: How resistant is the transparent proxy architecture to VPNs, DNS-over-HTTPS (DoH), DNS-over-TLS (DoT), and proxy-avoidance networks?
- Cost-Effectiveness: At what usage volume do on-device AI solutions become more cost-effective than cloud API subscriptions?
What I Built
The Adaptive Web Filter is a transparent network proxy that intercepts web traffic, extracts page content, and classifies it using multiple AI providers simultaneously. It is not just a filter. It is a research platform for comparing how different AI approaches perform at content classification in a real home network.
The system sits between your devices and the internet as a transparent proxy. When someone visits a website, the proxy captures the page, queues it for analysis, and every active AI provider evaluates the content independently. Results are stored, compared, and used to build accuracy benchmarks over time.
Architecture
Home Network
┌──────────┐
│ Browser │
└────┬─────┘
│ HTTP/HTTPS
┌────▼─────┐
│mitmproxy │
│ Filter │
└──┬───┬───┘
Cache hit │ │ Queue URL + HTML
┌────────────┘ └────────────────┐
│ │
┌─────▼─────┐ ┌───────▼──────┐
│ Valkey │ │Python Worker │
│ URL Cache │ │ │
│ Queue │ │ Static HTML? │
│ Pub/Sub │ │ Y: markdownify (~1ms)
└────────────┘ │ N: crawl4ai (~4-60s)
│ │
│ LLM Classify │
└──┬───┬───┬───┘
│ │ │
┌────────┘ │ └────────┐
│ │ │
┌─────▼──┐ ┌─────▼───┐ ┌─────▼─────┐
│ Ollama │ │ OpenAI │ │ Anthropic │
│(Local) │ │(Cloud) │ │ (Cloud) │
└────────┘ └─────────┘ └───────────┘
│
┌───────────────┼───────────────┐
│ │ │
┌─────▼──────┐ ┌─────▼────┐ ┌──────▼─────┐
│ PostgreSQL │ │ .NET API │ │ Angular │
│ URL Data │ │ REST+SSE │ │ Dashboard │
│ Rules │ └──────────┘ └────────────┘
│ Configs │
└────────────┘

Request Lifecycle
Browser ──► mitmproxy ──► Valkey cache check
│
┌────────────┴────────────┐
│ │
Cache Hit Cache Miss
(blocked) Pass response through
│ Queue URL + HTML
▼ │
403 Block Python Worker dequeues
Page │
┌─────────────┴──────────────┐
│ │
Static HTML JS-heavy site
markdownify (~1ms) crawl4ai + Chromium
│ │
└─────────┬──────────────────┘
│
Each active LLM config
classifies content
│
┌──────────────┼──────────────┐
│ │ │
Safe NeedsReview Unsafe
Cache it Cache it Auto-block rule
Cache "blocked"
│
Publish url_analysis_completed
│
.NET API SSE ──► Angular reload

How It Works
- A browser request passes through mitmproxy, which intercepts the HTML response
- The system checks Valkey cache for a previous classification. Cache hit means instant block or allow
- On cache miss, the URL and captured HTML are queued in Valkey for background processing
- A Python worker dequeues the URL and extracts readable content. Static HTML goes through markdownify (near instant). JavaScript-heavy sites go through crawl4ai with a headless browser
- The extracted markdown is sent to every active LLM provider for classification (Safe, Unsafe, or NeedsReview)
- Results are stored in PostgreSQL with token counts, latency, and cost data
- If the primary provider classifies a URL as Unsafe and auto-blocking is enabled, a filter rule is created automatically
- Valkey publishes an event, the .NET API forwards it via SSE, and the Angular dashboard updates in real time
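The extraction-method decision in the worker (markdownify for static HTML, crawl4ai for JS-heavy sites) can be sketched as a heuristic on visible text length in the raw HTML. This is an illustrative sketch, not the project's actual code: the function name and the 500-character threshold are assumptions.

```python
from html.parser import HTMLParser


class _VisibleTextParser(HTMLParser):
    """Collects visible text, skipping script/style/noscript content."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data.strip())


def choose_extractor(raw_html: str, min_visible_chars: int = 500) -> str:
    """Return 'markdownify' for pages with enough server-rendered text,
    otherwise 'crawl4ai' so a headless browser can render the page first."""
    parser = _VisibleTextParser()
    parser.feed(raw_html)
    visible = " ".join(chunk for chunk in parser.chunks if chunk)
    return "markdownify" if len(visible) >= min_visible_chars else "crawl4ai"
```

An empty SPA shell like `<div id="root"></div>` has almost no visible text and routes to the headless-browser path, while a server-rendered article routes to the fast path.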
Key Technical Decisions
- mitmproxy as a transparent relay: Handles TLS interception cleanly, skips static assets (images, CSS, JS, fonts) to reduce noise, and captures metadata useful for bypass detection (VPN, DNS-over-HTTPS, DNS-over-TLS)
- Smart content extraction: Static sites use markdownify at roughly 1ms per page. JS-heavy SPAs use crawl4ai with stealth mode and anti-bot evasion. The worker picks the right method automatically based on visible text length in the raw HTML
- Multi-provider evaluation: Every URL is classified by all active LLM configs simultaneously. This gives direct comparison data instead of testing one provider at a time
- Valkey for caching, queuing, and pub/sub: Filter rules and URL classifications are cached so repeat visits are instant. The same Valkey instance handles the work queue and SSE pub/sub notifications
- Human review for ground truth: Automated accuracy metrics are meaningless without a baseline. The human review page lets you mark each classification as correct or incorrect, and the system calculates precision, recall, and F1 scores from that data
- Clean Architecture in .NET: Domain, Application, Infrastructure, and API layers are separated. The Python worker and mitmproxy addon are independent services communicating through Valkey and PostgreSQL
- OpenTelemetry across all services: Traces propagate from the proxy through the Python worker to the .NET API with correlation IDs. Custom metrics track queue depth, extraction latency, classification cost, and token usage per provider
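The static-asset skip in the mitmproxy addon can be approximated with an extension check on the request path. A minimal sketch; the extension set and function name are assumptions, and the real addon may also inspect the Content-Type header.

```python
from urllib.parse import urlparse

# Extensions treated as static assets and passed through without analysis.
STATIC_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg", ".ico",
    ".css", ".js", ".mjs", ".woff", ".woff2", ".ttf", ".otf",
}


def is_static_asset(url: str) -> bool:
    """True when the URL path ends in a known static-asset extension.

    urlparse keeps the query string out of .path, so cache-busting
    parameters like ?v=2 do not defeat the check.
    """
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in STATIC_EXTENSIONS)
```

Inside a mitmproxy addon, a check like this would run in the `response` hook so asset responses return early instead of being queued for classification.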
Observability
All telemetry (traces, metrics, logs) is shipped via OpenTelemetry to SigNoz.
| Metric | Description |
|---|---|
| `awf.queue.processed` | URLs dequeued and processed |
| `awf.extraction.count` | Extraction attempts by method |
| `awf.extraction.latency` | Extraction latency in milliseconds |
| `awf.analysis.count` | LLM evaluations by provider |
| `awf.analysis.latency` | LLM inference latency in milliseconds |
| `awf.llm.cost_usd` | Cumulative LLM API cost |
| `awf.llm.token_usage` | Token usage per evaluation |
Current Results
Early benchmarks show that local models (Ollama with smaller quantized models) are surprisingly competitive on clear-cut cases like explicit content and gambling sites. Where they fall behind is on ambiguous content that requires nuance, like distinguishing educational health content from adult material, or satire from actual hate speech.
Cloud APIs handle these edge cases better but at a measurable cost per classification. The research dashboard tracks cumulative spend per provider, making it straightforward to calculate at what traffic volume the local approach becomes more cost-effective.
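The break-even point has a simple closed form: with a fixed monthly cost for local hardware and power, and a per-classification cloud price, local becomes cheaper once monthly volume exceeds the ratio of the two. A sketch with hypothetical figures; the real numbers come from the dashboard's tracked spend per provider.

```python
def break_even_volume(local_fixed_monthly_usd: float,
                      cloud_cost_per_classification_usd: float) -> float:
    """Monthly classification volume above which local is cheaper than cloud."""
    return local_fixed_monthly_usd / cloud_cost_per_classification_usd


# Hypothetical figures, for illustration only:
# $12/month amortized local cost, $0.002 per cloud classification.
volume = break_even_volume(12.0, 0.002)
print(volume)  # 6000.0 classifications per month
```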
The agreement rate between providers is a useful signal. When all three providers agree on a classification, they are almost always correct. Disagreement is a strong indicator that the content is ambiguous and might benefit from human review.
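The agreement signal described above reduces to a small routing rule: unanimous verdicts are trusted, and any disagreement is flagged for human review. A sketch; the labels match the document's Safe/Unsafe/NeedsReview categories, but the function name is an assumption.

```python
def route_by_agreement(verdicts: list[str]) -> str:
    """Collapse per-provider verdicts into a single decision.

    A unanimous 'Safe' or 'Unsafe' is accepted directly; any disagreement,
    or any explicit 'NeedsReview', is routed to the human review queue.
    """
    unique = set(verdicts)
    if unique == {"Safe"}:
        return "Safe"
    if unique == {"Unsafe"}:
        return "Unsafe"
    return "NeedsReview"
```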
What I Learned
Building this system taught me that content classification is less about the AI model and more about the pipeline around it. The extraction quality matters as much as the model quality. A perfect classifier fed garbage input will produce garbage output. Getting clean, representative text from modern websites (SPAs, lazy-loaded content, anti-bot protections) turned out to be the harder engineering problem. The AI classification part, once you have clean input, is relatively straightforward regardless of whether you use a local or cloud model.
Screenshots
Real-Time Blocking
URLs classified as unsafe are blocked instantly on subsequent visits. The admin dashboard shows all analyzed URLs with their classification, provider, and confidence score.
Filter Rules
Manage block, allow, and log-only rules. Rules can be created manually or auto-generated when the AI classifies a URL as unsafe.
LLM Configuration
Configure multiple AI providers and models with versioned system prompts. Each config can be toggled active independently for side-by-side comparison.
Human Review
Review AI classifications against human judgment to build ground truth data. Tracks accuracy, precision, recall, and F1 scores per provider.
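The per-provider scores on this page follow the standard definitions. Treating "Unsafe" as the positive class, a sketch of the computation from (predicted, ground-truth) pairs; the helper name is an assumption.

```python
def precision_recall_f1(pairs, positive="Unsafe"):
    """pairs: iterable of (predicted_label, true_label) from human review.

    Returns (precision, recall, f1) for the positive class, guarding
    against division by zero when a class never appears.
    """
    tp = fp = fn = 0
    for pred, truth in pairs:
        if pred == positive and truth == positive:
            tp += 1       # correctly flagged as Unsafe
        elif pred == positive:
            fp += 1       # over-blocking: flagged Safe content
        elif truth == positive:
            fn += 1       # under-blocking: missed Unsafe content
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For content filtering the two error types are not symmetric: false negatives (missed unsafe content) are usually costlier than false positives, which is why recall is tracked separately rather than folded into a single accuracy number.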
Research Dashboard
Compare provider accuracy, agreement rates, and classification distribution. Includes a confusion matrix and per-URL comparison lookup.