PII Detection
Semantic Router provides built-in Personally Identifiable Information (PII) detection to protect sensitive data in user queries. The system uses fine-tuned BERT models to identify and handle various types of PII according to configurable policies.
Overview
The PII detection system:
- Identifies common PII types in user queries
- Enforces model-specific PII policies
- Blocks or masks sensitive information based on configuration
- Filters model candidates based on PII compliance
- Logs policy violations for monitoring
 
Supported PII Types
The system can detect the following PII types:
| PII Type | Description | Examples |
|---|---|---|
| PERSON | Person names | "John Smith", "Mary Johnson" |
| EMAIL_ADDRESS | Email addresses | "user@example.com" |
| PHONE_NUMBER | Phone numbers | "+1-555-123-4567", "(555) 123-4567" |
| US_SSN | US Social Security Numbers | "123-45-6789" |
| STREET_ADDRESS | Physical addresses | "123 Main St, New York, NY" |
| GPE | Geopolitical entities | Countries, states, cities |
| ORGANIZATION | Organization names | "Microsoft", "OpenAI" |
| CREDIT_CARD | Credit card numbers | "4111-1111-1111-1111" |
| US_DRIVER_LICENSE | US driver's license numbers | "D123456789" |
| IBAN_CODE | International Bank Account Numbers | "GB82 WEST 1234 5698 7654 32" |
| IP_ADDRESS | IP addresses | "192.168.1.1", "2001:db8::1" |
| DOMAIN_NAME | Domain/website names | "example.com", "google.com" |
| DATE_TIME | Date/time information | "2024-01-15", "January 15th" |
| AGE | Age information | "25 years old", "born in 1990" |
| NRP | Nationality/religious/political groups | "American", "Christian", "Democrat" |
| ZIP_CODE | ZIP/postal codes | "10001", "SW1A 1AA" |
Configuration
Basic PII Detection
Enable PII detection in your configuration:
# config/config.yaml
classifier:
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_model"
    threshold: 0.7                 # Detection sensitivity (0.0-1.0)
    use_cpu: true                  # Run on CPU
    pii_mapping_path: "config/pii_type_mapping.json"  # Path to PII type mapping
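To make the role of `threshold` concrete, here is a minimal Python sketch that keeps only detections whose confidence meets the configured value. It assumes the threshold is a minimum confidence score and that the comparison is inclusive; the `Detection` type and `filter_detections` helper are hypothetical names used for illustration only, not part of Semantic Router, which applies the threshold internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One span flagged by the classifier (hypothetical shape for this sketch)."""
    pii_type: str
    text: str
    confidence: float


def filter_detections(detections, threshold=0.7):
    """Keep detections whose confidence is at or above the threshold (assumed inclusive)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [d for d in detections if d.confidence >= threshold]


# Made-up classifier output, purely illustrative.
candidates = [
    Detection("EMAIL_ADDRESS", "user@example.com", 0.98),
    Detection("PERSON", "Mary Johnson", 0.74),
    Detection("GPE", "Jordan", 0.55),
]

kept = filter_detections(candidates, threshold=0.7)
print([d.pii_type for d in kept])
```

With the sample data, the low-confidence `GPE` candidate is dropped at 0.7 and would be kept at 0.5.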
Model-Specific PII Policies
Configure different PII policies for different models:
# vLLM endpoints configuration
vllm_endpoints:
  - name: secure-model
    address: "127.0.0.1"
    port: 8080
    models: ["secure-llm"]
  - name: general-model
    address: "127.0.0.1"
    port: 8081
    models: ["general-llm"]
# Model-specific configurations
model_config:
  secure-llm:
    pii_policy:
      allow_by_default: false      # Block all PII by default
      pii_types:                   # Only allow these specific types
        - "EMAIL_ADDRESS"
        - "GPE"
        - "ORGANIZATION"
  general-llm:
    pii_policy:
      allow_by_default: true       # Allow all PII by default
      pii_types: []                # Not used when allow_by_default is true
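The policy semantics described in the comments above can be sketched in Python as follows. This is an illustration of how `allow_by_default` and `pii_types` interact as documented here, not the router's actual implementation; `pii_allowed` is a hypothetical helper, and the dictionary simply mirrors the YAML.

```python
def pii_allowed(policy, detected_types):
    """Return (allowed, denied_types) for a single model's pii_policy."""
    if policy["allow_by_default"]:
        # pii_types is ignored when everything is allowed by default.
        return True, []
    allowed = set(policy.get("pii_types", []))
    denied = sorted(t for t in set(detected_types) if t not in allowed)
    return not denied, denied


# Mirrors the model_config section of the YAML example.
model_config = {
    "secure-llm": {
        "pii_policy": {
            "allow_by_default": False,
            "pii_types": ["EMAIL_ADDRESS", "GPE", "ORGANIZATION"],
        }
    },
    "general-llm": {
        "pii_policy": {"allow_by_default": True, "pii_types": []}
    },
}

detected = ["EMAIL_ADDRESS", "PERSON"]
for name, cfg in model_config.items():
    print(name, pii_allowed(cfg["pii_policy"], detected))
```

Under these assumptions, a query containing a person name is rejected by `secure-llm` (because `PERSON` is not in its allow-list) but accepted by `general-llm`.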
How PII Detection Works
The PII detection system works as follows:
- Detection: The PII classifier model analyzes incoming text to identify PII types
- Policy Check: The system checks if the detected PII types are allowed for the target model
- Routing Decision: Models that don't allow the detected PII types are filtered out
- Logging: All PII detections and policy decisions are logged for monitoring
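The four stages above can be sketched end to end in Python. This is a simplified illustration under stated assumptions: the detection step is stubbed with a fixed list, and `filter_candidates`, the logger name, and the log message format are all hypothetical rather than taken from the router's code.

```python
import logging

logger = logging.getLogger("pii_routing_sketch")  # hypothetical logger name


def filter_candidates(candidates, policies, detected_types):
    """Drop candidate models whose policy denies any detected PII type."""
    remaining = []
    for model in candidates:
        policy = policies[model]
        if policy["allow_by_default"]:
            denied = []
        else:
            denied = sorted(set(detected_types) - set(policy["pii_types"]))
        if denied:
            # Logging stage: record which model was filtered out and why.
            logger.warning("PII policy violation: model=%s denied=%s", model, denied)
        else:
            remaining.append(model)
    return remaining


policies = {
    "secure-llm": {
        "allow_by_default": False,
        "pii_types": ["EMAIL_ADDRESS", "GPE", "ORGANIZATION"],
    },
    "general-llm": {"allow_by_default": True, "pii_types": []},
}

# Detection stage is stubbed: pretend the classifier returned these types.
detected = ["PERSON", "EMAIL_ADDRESS"]

# Policy check and routing decision: only compliant models remain.
print(filter_candidates(["secure-llm", "general-llm"], policies, detected))
```

If the returned list is empty, no configured model can serve the request, which is the situation the policy design guidance on this page is meant to prevent.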
 
API Integration
PII detection is automatically integrated into the routing process. When a request is made to the router, the system:
- Analyzes the input text for PII using the configured classifier
- Checks PII policies for candidate models
- Filters out models that don't allow the detected PII types
- Routes to an appropriate model that can handle the PII
 
Classification Endpoint
You can also run PII detection directly through the classification API:
curl -X POST http://localhost:8080/api/v1/classify \
  -H "Content-Type: application/json" \
  -d '{
    "text": "My email is john.doe@example.com and I live in New York"
  }'
The response includes PII information along with category classification results.
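For scripted checks, the same request can be issued from Python using only the standard library. The URL and payload are copied from the curl example; the helper names `build_classify_request` and `classify` are hypothetical. Because the response schema is not shown on this page, the sketch returns the decoded JSON as-is rather than assuming any field names.

```python
import json
import urllib.request

CLASSIFY_URL = "http://localhost:8080/api/v1/classify"  # same URL as the curl example


def build_classify_request(text, url=CLASSIFY_URL):
    """Build the POST request without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text, url=CLASSIFY_URL, timeout=10):
    """Send the request to a running router and return the decoded JSON body."""
    request = build_classify_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running router, so it is left commented out):
# result = classify("My email is john.doe@example.com and I live in New York")
# print(json.dumps(result, indent=2))
```

Splitting request construction from sending makes it possible to inspect or unit-test the payload without a live server.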
Monitoring and Metrics
The system exposes PII-related metrics:
# Prometheus metrics
pii_detections_total{type="EMAIL_ADDRESS"} 45
pii_detections_total{type="PERSON"} 23
pii_policy_violations_total{model="secure-model"} 12
pii_requests_blocked_total 8
pii_requests_masked_total 15
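As a rough illustration of consuming these metrics outside Prometheus, the following Python sketch parses sample lines like the ones above and totals detections per PII type. It handles only the simple `name{label="value"} number` form shown here (no escaped quotes, timestamps, or histograms); `parse_metrics` and `detections_by_type` are hypothetical helpers, and the sample values are the ones from this page.

```python
import re
from collections import defaultdict

# Metric name, optional {labels}, then a numeric value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

METRICS_TEXT = """\
# Prometheus metrics
pii_detections_total{type="EMAIL_ADDRESS"} 45
pii_detections_total{type="PERSON"} 23
pii_policy_violations_total{model="secure-model"} 12
pii_requests_blocked_total 8
pii_requests_masked_total 15
"""


def parse_metrics(text):
    """Parse simple text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def detections_by_type(samples):
    """Sum pii_detections_total per value of the type label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == "pii_detections_total":
            totals[labels.get("type", "")] += value
    return dict(totals)


print(detections_by_type(parse_metrics(METRICS_TEXT)))
```

In production, a Prometheus query or an official client library would normally replace this hand-rolled parser.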
Best Practices
1. Threshold Tuning
- Start with `threshold: 0.7` for balanced accuracy
- Increase to `0.8-0.9` for high-security environments
- Decrease to `0.5-0.6` for broader detection
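One way to choose a threshold is to score a small hand-labeled sample and compare precision and recall at a few candidate values. The sketch below assumes the threshold is a minimum confidence score; the scored data is made up for illustration, and `precision_recall` is a hypothetical helper, not a Semantic Router API.

```python
def precision_recall(scored, threshold):
    """Compute (precision, recall) for (confidence, is_really_pii) pairs."""
    tp = sum(1 for conf, is_pii in scored if conf >= threshold and is_pii)
    fp = sum(1 for conf, is_pii in scored if conf >= threshold and not is_pii)
    fn = sum(1 for conf, is_pii in scored if conf < threshold and is_pii)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up classifier confidences paired with hand-assigned ground truth.
scored = [
    (0.95, True), (0.88, True), (0.82, False), (0.76, True),
    (0.64, True), (0.58, False), (0.52, False), (0.41, False),
]

for threshold in (0.5, 0.7, 0.9):
    precision, recall = precision_recall(scored, threshold)
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, raising the threshold reduces false positives at the cost of missing more real PII, while lowering it does the opposite. Running the same comparison on your own traffic shows where the trade-off sits for your workload.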
2. Policy Design
- Use `allow_by_default: false` for sensitive models
- Explicitly list allowed PII types for clarity
- Consider different policies for different use cases
 
3. Action Selection
- Use `block` for high-security scenarios
- Use `mask` when processing is still needed
- Use `allow` with logging for audit requirements
4. Model Filtering
- Configure PII policies to automatically filter model candidates
- Ensure at least one model can handle each PII scenario
- Test policy combinations thoroughly
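A simple pre-deployment check can confirm that every PII type is accepted by at least one model. The Python sketch below assumes policies shaped like the `model_config` example on this page; `uncovered_pii_types` is a hypothetical helper. It checks each type on its own, which is necessary but not sufficient: a request containing several types still needs a single model that allows all of them.

```python
def uncovered_pii_types(model_config, pii_types):
    """Return the PII types that no configured model's policy allows."""
    uncovered = []
    for pii_type in pii_types:
        allowed_somewhere = any(
            cfg["pii_policy"]["allow_by_default"]
            or pii_type in cfg["pii_policy"]["pii_types"]
            for cfg in model_config.values()
        )
        if not allowed_somewhere:
            uncovered.append(pii_type)
    return uncovered


secure_only = {
    "secure-llm": {
        "pii_policy": {
            "allow_by_default": False,
            "pii_types": ["EMAIL_ADDRESS", "GPE", "ORGANIZATION"],
        }
    }
}
with_general = dict(secure_only)
with_general["general-llm"] = {
    "pii_policy": {"allow_by_default": True, "pii_types": []}
}

types_to_check = ["EMAIL_ADDRESS", "PERSON", "US_SSN"]
print(uncovered_pii_types(secure_only, types_to_check))
print(uncovered_pii_types(with_general, types_to_check))
```

A non-empty result means some requests would have no eligible model, so either relax a policy or add a model that accepts those types.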
 
Troubleshooting
Common Issues
High False Positives
- Raise the detection threshold so that only higher-confidence detections are reported
- Review training data for edge cases
- Consider custom model fine-tuning
 
Missed PII Detection
- Increase detection sensitivity by lowering the threshold
- Check whether the PII type is supported
- Verify that the model is properly loaded
 
Policy Conflicts
- Ensure at least one model allows detected PII types
- Check `allow_by_default` settings
- Review `pii_types_allowed` lists
Debug Mode
Enable detailed PII logging:
logging:
  level: debug
  pii_detection: true
This will log all PII detection decisions and policy evaluations.