This document describes the security and data masking features implemented for the Langfuse workflow visualization system.
The security features ensure that sensitive information is properly masked in traces while maintaining the functionality of the observability system. This includes PII filtering, data retention policies, access controls, and secure credential management.
The system supports several masking strategies:
- REPLACE: Replace sensitive data with a fixed string (e.g., `********` for passwords)
- PARTIAL: Show partial information while masking sensitive parts (e.g., `test@***.com` for emails)
- REDACT: Completely remove sensitive data
- HASH: Hash sensitive data for anonymization
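
The four strategies above can be illustrated with a minimal, self-contained sketch. It is not the project's implementation: the `Strategy` enum and `mask_value` function are hypothetical stand-ins, and the PARTIAL policy (keep the first and last two characters) is an assumption chosen for brevity.

```python
import hashlib
from enum import Enum


class Strategy(Enum):
    """Hypothetical stand-in for the project's MaskingStrategy enum."""
    REPLACE = "replace"
    PARTIAL = "partial"
    REDACT = "redact"
    HASH = "hash"


def mask_value(value: str, strategy: Strategy, replacement: str = "********"):
    """Return a masked form of `value`; None means the value was removed."""
    if strategy is Strategy.REPLACE:
        return replacement
    if strategy is Strategy.PARTIAL:
        # Assumed policy: keep the first and last two characters.
        if len(value) <= 4:
            return "*" * len(value)
        return value[:2] + "*" * (len(value) - 4) + value[-2:]
    if strategy is Strategy.REDACT:
        return None
    if strategy is Strategy.HASH:
        # Unsalted SHA-256 for brevity; a real deployment would likely add a salt.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    raise ValueError(f"Unknown strategy: {strategy}")
```

Hashing keeps equal inputs comparable after masking, which is why it suits anonymization, while REDACT gives up the value entirely.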
The system includes default rules for common sensitive data types:
| Field Path | Strategy | Description |
|---|---|---|
| `user.email` | PARTIAL | Email addresses |
| `user.phone` | PARTIAL | Phone numbers |
| `user.ssn` | REPLACE | Social Security Numbers |
| `payment.card_number` | PARTIAL | Credit card numbers |
| `payment.cvv` | REPLACE | CVV codes |
| `auth.password` | REPLACE | Passwords |
| `auth.api_key` | PARTIAL | API keys |
| `location.address` | PARTIAL | Street addresses |
| `business.profit_margin` | REPLACE | Profit margins |
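
To show how dotted field paths such as `user.ssn` might be resolved against a nested trace payload, here is a small sketch. The `RULES` table and `apply_rules` helper are hypothetical and only cover fixed-string replacement; the actual rule engine may work differently.

```python
import copy

# Hypothetical rule table: dotted field path -> fixed replacement string.
RULES = {
    "user.ssn": "***-**-****",
    "auth.password": "********",
}


def apply_rules(payload: dict, rules: dict) -> dict:
    """Return a deep copy of `payload` with every matching path replaced."""
    masked = copy.deepcopy(payload)
    for path, replacement in rules.items():
        *parents, leaf = path.split(".")
        node = masked
        # Walk down the parent keys; stop if any level is missing or not a dict.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and leaf in node:
            node[leaf] = replacement
    return masked
```

Working on a deep copy leaves the original event untouched, so the application can keep using unmasked data while only the traced copy is sanitized.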
You can add custom masking rules by creating `MaskingRule` instances:
```python
from config.langfuse_data_masking import MaskingRule, MaskingStrategy

custom_rule = MaskingRule(
    field_path="custom.sensitive_field",
    strategy=MaskingStrategy.REPLACE,
    replacement="[SENSITIVE]",
    description="Custom sensitive field",
)
```

Add the following environment variables to configure security features:
```bash
# Data Masking
LANGFUSE_ENABLE_DATA_MASKING=true
LANGFUSE_PII_FILTERING_ENABLED=true

# Data Retention
LANGFUSE_DATA_RETENTION_DAYS=90

# Access Control
LANGFUSE_ALLOWED_TRACE_TYPES=simulation,agent_decision,collaboration
LANGFUSE_AUDIT_TRACE_ACCESS=true

# Encryption
LANGFUSE_ENCRYPTION_ENABLED=false
```

The same options can also be set in code through `LangfuseConfig`:

```python
from config.langfuse_config import LangfuseConfig

config = LangfuseConfig(
    # ... other config
    enable_data_masking=True,
    pii_filtering_enabled=True,
    data_retention_days=90,
    audit_trace_access=True,
    allowed_trace_types=["simulation", "agent_decision"],
)
```

You can restrict which trace types are allowed based on user roles:
```python
from config.langfuse_data_masking import get_secure_trace_manager

manager = get_secure_trace_manager()
manager.set_access_control("simulation", ["admin", "analyst"])
manager.set_access_control("agent_decision", ["admin"])
```

When creating traces, provide user roles for access control:
```python
from config.langfuse_integration import get_langfuse_integration

integration = get_langfuse_integration()
trace_id = integration.create_simulation_trace(
    event_data={"type": "market_event", "data": {...}},
    user_roles=["admin"],
)
```

Configure how long different types of data should be retained:
```python
from config.langfuse_data_masking import get_secure_trace_manager

manager = get_secure_trace_manager()
manager.set_retention_policy("user.email", 30)  # 30 days
manager.set_retention_policy("payment.card_number", 365)  # 1 year
```

The system can automatically remove expired data based on retention policies. This is handled during trace creation and processing.
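
As a rough illustration of how expiry could be decided, the sketch below compares a trace's age against per-field retention windows. The `RETENTION_DAYS` table and `expired_fields` function are hypothetical; how the secure trace manager actually evaluates policies is not shown in this document.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy table: dotted field path -> retention in days.
RETENTION_DAYS = {"user.email": 30, "payment.card_number": 365}


def expired_fields(created_at: datetime, now: datetime, policies: dict) -> list:
    """Return the field paths whose retention window has passed."""
    age = now - created_at
    return sorted(
        path for path, days in policies.items() if age > timedelta(days=days)
    )


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
# 45 days later, only the 30-day email policy has lapsed.
print(expired_fields(created, created + timedelta(days=45), RETENTION_DAYS))
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check deterministic and easy to test.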
Credentials are stored securely using environment variables:
```bash
LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key
LANGFUSE_SECRET_KEY=sk-lf-your-secret-key
```

The system validates credentials before use:
```python
from config.langfuse_data_masking import get_secure_trace_manager

manager = get_secure_trace_manager()
is_valid = manager.validate_credentials({
    "public_key": "pk-lf-...",
    "secret_key": "sk-lf-...",
})
```

Enable audit logging to track trace access:
```bash
LANGFUSE_AUDIT_TRACE_ACCESS=true
```

The system logs the following audit events:
- Trace creation
- Trace access
- Data masking operations
- Access control violations
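
The sketch below shows one way a role check could emit audit events for both granted access and violations. It is an assumption-laden illustration: the logger name, the `ACCESS_RULES` table, and `check_access` are hypothetical, and denying unknown trace types by default is a design choice made here, not documented behavior.

```python
import logging

audit_log = logging.getLogger("audit.trace_access")  # hypothetical logger name

# Hypothetical mapping: trace type -> roles allowed to access it.
ACCESS_RULES = {
    "simulation": {"admin", "analyst"},
    "agent_decision": {"admin"},
}


def check_access(trace_type: str, user_roles: list) -> bool:
    """Grant access if any user role is allowed; log the outcome either way."""
    # Unknown trace types map to an empty set, so they are denied by default.
    allowed = ACCESS_RULES.get(trace_type, set())
    granted = bool(allowed.intersection(user_roles))
    if granted:
        audit_log.info("trace access granted: type=%s roles=%s", trace_type, user_roles)
    else:
        audit_log.warning("access control violation: type=%s roles=%s", trace_type, user_roles)
    return granted
```

Logging violations at `WARNING` rather than `INFO` makes them easy to filter when reviewing audit logs for suspicious activity.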
Run the security tests with:

```bash
pytest tests/test_langfuse_security.py -v
```

The test suite covers:
- Data masking functionality
- Access control mechanisms
- Credential validation
- Retention policy enforcement
- Integration with Langfuse traces
Recommended practices:

- Only grant access to trace types that users actually need
- Use specific roles rather than broad permissions
- Only collect and store necessary data
- Apply masking to all potentially sensitive fields
- Set appropriate retention policies
- Regularly review masking rules
- Monitor audit logs for suspicious activity
- Update retention policies as needed
- Store credentials in secure environment variables
- Use strong, unique keys for production
- Regularly rotate credentials
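
The credential practices above can be sketched as a loader that reads both keys from the environment and fails fast when one is missing. The `load_langfuse_credentials` helper is hypothetical, and the prefix check is an assumption based only on the `pk-lf-` / `sk-lf-` formats of the example keys in this document.

```python
import os


def load_langfuse_credentials(env=None) -> dict:
    """Read both keys from the environment and fail fast if one looks wrong."""
    env = os.environ if env is None else env
    public_key = env.get("LANGFUSE_PUBLIC_KEY", "")
    secret_key = env.get("LANGFUSE_SECRET_KEY", "")
    # Assumed format check: prefixes taken from the example keys.
    if not public_key.startswith("pk-lf-"):
        raise ValueError("LANGFUSE_PUBLIC_KEY is missing or malformed")
    if not secret_key.startswith("sk-lf-"):
        raise ValueError("LANGFUSE_SECRET_KEY is missing or malformed")
    return {"public_key": public_key, "secret_key": secret_key}
```

The error messages name the variable but never echo its value, so a malformed secret does not end up in logs.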
Common issues:

- Traces not appearing: Check access controls and user roles
- Data not masked: Verify masking rules are correctly configured
- Performance impact: Monitor masking overhead and adjust rules if needed
Enable debug logging to troubleshoot security features:
```python
import logging

logging.getLogger("config.langfuse_data_masking").setLevel(logging.DEBUG)
```

These security features help with compliance for:
- GDPR: Data minimization and user consent
- PCI DSS: Payment information protection
- HIPAA: Healthcare data protection (if applicable)
- SOX: Audit trail requirements
Possible future enhancements:

- Encryption at rest for stored traces
- Advanced pattern matching for custom data types
- Integration with external secret management systems
- Real-time alerting for security violations