Anomaly Detection Guide
Anomaly detection is a key feature of Sparvi that helps you identify unusual patterns, outliers, and unexpected changes in your data. This guide explains how Sparvi's anomaly detection works and how you can leverage it to maintain high data quality.
Understanding Anomaly Detection in Sparvi
Anomaly detection in Sparvi works by:
- Analyzing current data to find statistical outliers and pattern deviations
- Comparing with historical profiles to detect changes over time
- Applying domain-specific rules to identify business anomalies
- Ranking anomalies by severity to prioritize the most important issues
Sparvi can detect anomalies in several ways:
- Automatically, during regular profiling operations
- Comparatively, when historical data is provided
- Through custom validation rules
- Through time-series analysis of trends and seasonality patterns
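The severity ranking mentioned above can be pictured as a simple sort. The sketch below is illustrative only: the `Anomaly` class, the `SEVERITY_ORDER` mapping, and `rank_by_severity` are hypothetical names, not part of Sparvi, and Sparvi's actual ranking logic may weigh additional factors.

```python
from dataclasses import dataclass
from typing import List

# Assumed ordering: lower number means higher priority.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Anomaly:
    """Hypothetical record for a detected anomaly."""
    description: str
    severity: str  # "high", "medium", or "low"


def rank_by_severity(anomalies: List[Anomaly]) -> List[Anomaly]:
    """Return anomalies ordered from most to least severe.

    sorted() is stable, so anomalies of equal severity keep their
    original (detection) order.
    """
    return sorted(anomalies, key=lambda a: SEVERITY_ORDER[a.severity])


detected = [
    Anomaly("Null rate for 'email' increased from 2% to 15%", "medium"),
    Anomaly("Row count decreased by 25% (from 1000 to 750)", "high"),
    Anomaly("New column 'discount_code' was added", "medium"),
]

for anomaly in rank_by_severity(detected):
    print(f"[{anomaly.severity}] {anomaly.description}")
```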
Types of Anomalies
Sparvi detects different types of anomalies:
Statistical Outliers
Values that deviate significantly from the statistical norm:
- Numeric Outliers: Values far from the mean or median (typically more than 3 standard deviations from the mean)
- Temporal Outliers: Unusual spikes or drops in time-series data
- Frequency Outliers: Unusual distribution of categorical values
Data Quality Anomalies
Issues that indicate potential data quality problems:
- Null Rate Changes: Unexpected increases in null values
- Duplicate Rate Changes: Unexpected increases in duplicate records
- Format Inconsistencies: Deviations from established data formats
- Referential Integrity Issues: Invalid references between tables
Schema Anomalies
Changes in the structure of your data:
- Column Additions/Removals: New or missing columns
- Data Type Changes: Changes in column data types
- Constraint Changes: Changes in column nullability or uniqueness
Business Logic Anomalies
Violations of business rules or expectations:
- Range Violations: Values outside of acceptable business ranges
- Process Breaks: Indications of broken business processes
- Relationship Anomalies: Unusual relationships between entities
Detecting Anomalies with Sparvi Cloud
Sparvi Cloud provides automated anomaly detection through the web interface:
- Automatic Detection: Anomalies are detected automatically during data profiling
- Historical Comparison: Sparvi Cloud tracks changes over time and alerts you to unusual patterns
- Anomaly Dashboard: View all detected anomalies in the Anomaly Detection Dashboard
- Configurable Thresholds: Customize detection sensitivity for different data patterns
- Severity Ranking: Anomalies are automatically ranked by severity to help you prioritize
- Business Impact Analysis: See which teams and processes are affected by detected anomalies
- Notifications: Get alerted via Slack, email, or webhooks when anomalies are detected
Anomaly Types in Detail
Row Count Anomalies
Unusual changes in the number of rows in your tables. Sparvi detects when row counts increase or decrease beyond expected thresholds, which could indicate:
- Data pipeline failures
- Missing data loads
- Unexpected data deletions
- Data quality issues upstream
Example: "Row count decreased by 25% (from 1000 to 750)" - marked as high severity
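The arithmetic behind a row count check like the one in this example can be sketched as follows. The function names are hypothetical, and the 20% default is assumed to apply to the magnitude of the change in either direction; Sparvi's internal calculation may differ.

```python
def row_count_change(previous: int, current: int) -> float:
    """Return the signed percentage change from previous to current."""
    if previous <= 0:
        raise ValueError("previous row count must be positive")
    return (current - previous) / previous * 100.0


def is_row_count_anomaly(previous: int, current: int,
                         threshold_pct: float = 20.0) -> bool:
    """Flag a change whose magnitude meets or exceeds the threshold.

    threshold_pct defaults to 20.0 (an assumption for this sketch).
    """
    return abs(row_count_change(previous, current)) >= threshold_pct


change = row_count_change(1000, 750)
print(f"Row count changed by {change:.0f}%")   # Row count changed by -25%
print(is_row_count_anomaly(1000, 750))         # True
```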
Null Rate Anomalies
Unexpected changes in null values across your columns. Sparvi monitors null rates and alerts when they deviate from normal patterns, which could indicate:
- Missing data from upstream sources
- Changes in data collection processes
- ETL pipeline issues
- Schema changes affecting data population
Example: "Null rate for 'email' increased from 2% to 15%" - marked as medium severity
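A null rate comparison along these lines might look like the sketch below. It assumes the threshold is measured in percentage points (so a move from 2% to 15% is a 13-point increase); this guide does not state whether Sparvi uses percentage points or relative change, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def null_rate(values: Sequence[Optional[object]]) -> float:
    """Return the percentage of values that are None."""
    if not values:
        return 0.0
    nulls = sum(1 for v in values if v is None)
    return 100.0 * nulls / len(values)


def null_rate_increase(baseline_pct: float, current_pct: float) -> float:
    """Return the change in null rate, in percentage points (assumed unit)."""
    return current_pct - baseline_pct


# 100 'email' values, 15 of them missing.
emails = ["user@example.com"] * 85 + [None] * 15
current = null_rate(emails)
increase = null_rate_increase(2.0, current)

print(f"Null rate is {current:.0f}%, up {increase:.0f} points from baseline")
print(increase >= 10.0)  # exceeds an assumed 10-point threshold
```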
Distribution Anomalies
Changes in the distribution of values within your data. Sparvi detects when value distributions shift significantly, which could indicate:
- Skewed data processing
- Changes in business operations
- Data collection biases
- Incorrect default values
Example: "Unusual distribution of 'status' values (98% are now 'pending')" - marked as high severity
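One way to quantify a shift in a categorical distribution is the total variation distance between the baseline and current value frequencies. This guide does not say which measure Sparvi uses, so the sketch below is an illustration of the general idea with hypothetical function names, not a description of Sparvi's method.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def proportions(values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Return each distinct value's share of the total."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute proportions of an empty column")
    return {value: count / total for value, count in counts.items()}


def total_variation_distance(baseline: Iterable[Hashable],
                             current: Iterable[Hashable]) -> float:
    """Return half the summed absolute difference in proportions (0 to 1)."""
    p = proportions(baseline)
    q = proportions(current)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Illustrative data: 'pending' grows from 40% to 98% of 'status' values.
baseline_status = ["pending"] * 40 + ["complete"] * 60
current_status = ["pending"] * 98 + ["complete"] * 2

shift_pct = 100.0 * total_variation_distance(baseline_status, current_status)
print(f"Distribution shifted by {shift_pct:.0f}%")
print(shift_pct >= 20.0)  # compared against an assumed 20% threshold
```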
Statistical Outliers
Values that are statistical outliers compared to historical patterns. Sparvi identifies values that fall outside expected ranges, which could indicate:
- Data entry errors
- System bugs generating invalid values
- Legitimate but unusual business events
- Integration issues with external systems
Example: "Found 5 outlier values in 'amount' (beyond 3 std deviations)" - marked as medium severity
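The "beyond 3 std deviations" rule in this example corresponds to a z-score test against a historical baseline. The following is a minimal sketch using Python's standard `statistics` module; `find_outliers` and the sample data are hypothetical, and Sparvi may use a different or more robust estimator.

```python
import statistics
from typing import List, Sequence


def find_outliers(history: Sequence[float], current: Sequence[float],
                  z_threshold: float = 3.0) -> List[float]:
    """Return current values more than z_threshold standard deviations
    from the historical mean."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        # Constant history: any different value counts as an outlier.
        return [v for v in current if v != mean]
    return [v for v in current if abs(v - mean) / stdev > z_threshold]


# Historical 'amount' values: mean 100, sample standard deviation 2.
history = [100, 102, 98, 101, 99, 100, 103, 97]
new_values = [101, 95, 107, 250]

print(find_outliers(history, new_values))  # [107, 250]
```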
Format Anomalies
Inconsistencies in data formats across your columns. Sparvi detects when data doesn't match established patterns, which could indicate:
- Changes in data collection processes
- Integration issues with external systems
- Data validation failures
- Migration or import errors
Example: "15% of 'phone_number' values don't match the established pattern" - marked as medium severity
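A format check of this kind reduces to measuring how many values fail to match an expected pattern. The sketch below assumes a hypothetical `NNN-NNN-NNNN` phone format and ignores nulls (those would fall under null rate checks); the pattern and function name are illustrative, not Sparvi defaults.

```python
import re
from typing import Optional, Pattern, Sequence

# Hypothetical expected pattern for this example: 555-123-4567
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def mismatch_rate(values: Sequence[Optional[str]],
                  pattern: Pattern[str]) -> float:
    """Return the percentage of non-null values that do not fully match."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    bad = sum(1 for v in non_null if not pattern.fullmatch(v))
    return 100.0 * bad / len(non_null)


# 17 well-formed numbers and 3 in other formats.
phone_numbers = ["555-123-4567"] * 17 + ["5551234567", "(555) 123-4567", "n/a"]

rate = mismatch_rate(phone_numbers, PHONE_PATTERN)
print(f"{rate:.0f}% of 'phone_number' values don't match the pattern")
```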
Schema Shift Anomalies
Changes in your table schema structure. Sparvi monitors for schema changes including:
- New columns added
- Columns removed
- Data type changes
- Constraint modifications
Examples:
- "New column 'discount_code' was added" - marked as medium severity
- "Data type changed from 'DECIMAL(10,2)' to 'INTEGER'" - marked as high severity
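Conceptually, schema shift detection is a diff between two snapshots of a table's columns. The sketch below represents each snapshot as a mapping from column name to type string; this representation, the `diff_schema` function, and the `VARCHAR(20)` type are assumptions made for illustration, and the sketch does not cover constraint changes.

```python
from typing import Dict, List


def diff_schema(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Describe added columns, removed columns, and data type changes."""
    changes: List[str] = []
    for column in sorted(current.keys() - previous.keys()):
        changes.append(f"New column '{column}' was added")
    for column in sorted(previous.keys() - current.keys()):
        changes.append(f"Column '{column}' was removed")
    for column in sorted(previous.keys() & current.keys()):
        if previous[column] != current[column]:
            changes.append(
                f"Data type of '{column}' changed from "
                f"'{previous[column]}' to '{current[column]}'"
            )
    return changes


old_schema = {"id": "INTEGER", "amount": "DECIMAL(10,2)"}
new_schema = {"id": "INTEGER", "amount": "INTEGER", "discount_code": "VARCHAR(20)"}

for change in diff_schema(old_schema, new_schema):
    print(change)
```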
Working with Anomalies in Sparvi Cloud
Filtering and Viewing Anomalies
In the Anomaly Detection Dashboard, you can:
- Filter anomalies by severity (high, medium, low)
- Filter by anomaly type (row count, null rate, distribution, etc.)
- Filter by affected columns or tables
- View anomalies over specific time periods
- Search for specific anomalies by description
Taking Action on Anomalies
When Sparvi Cloud detects an anomaly, you can:
- Create an Issue: Convert the anomaly into a tracked issue with business context
- Assign to Team Members: Use @mentions to notify the right people
- Add Comments: Collaborate with your team to investigate and resolve
- Track Resolution: Monitor the status of anomalies from detection to resolution
- Document Findings: Keep a record of what caused the anomaly and how it was fixed
Anomaly Dashboard
The Anomaly Detection Dashboard provides:
- Visual Trends: See anomaly patterns over time
- Severity Breakdown: Understand the distribution of high, medium, and low severity anomalies
- Type Analysis: View which anomaly types are most common in your data
- Table Heatmap: Identify which tables have the most data quality issues
- Business Impact: See which teams and processes are affected
Configuring Anomaly Detection
Custom Thresholds
In Sparvi Cloud, you can configure detection sensitivity through the settings interface:
- Row Count Changes: Set the percentage threshold for row count changes (default: 20%)
  - Example: Alert only on 30% or greater changes
- Null Rate Changes: Configure sensitivity for null rate changes (default: 10%)
  - Example: Alert on 5% changes for critical columns like email
- Distribution Changes: Set thresholds for value distribution shifts (default: 20%)
  - Example: More sensitive monitoring for status or category columns
Expected Patterns
Define expected data formats for your columns:
- Email format: Standard email validation patterns
- Phone numbers: Custom phone number formats for your region
- Postal codes: ZIP code or postal code patterns
- Custom formats: Define your own regex patterns for specialized data
Column-Specific Sensitivity
Configure different sensitivity levels for different columns:
- Critical columns: Higher sensitivity (e.g., customer email, payment amounts)
- Reference columns: Standard sensitivity (e.g., descriptions, notes)
- Audit columns: Lower sensitivity (e.g., created_by, updated_at)
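Sensitivity levels like these are configured in the Sparvi Cloud settings interface, but it can help to think of them as scaling factors applied to the default thresholds. The sketch below is purely conceptual: the multipliers, column names, and `threshold_for` function are hypothetical and do not reflect Sparvi's configuration format. Only the default percentages (20%, 10%, 20%) come from this guide.

```python
# Default thresholds from this guide, in percent.
DEFAULT_THRESHOLDS = {"row_count": 20.0, "null_rate": 10.0, "distribution": 20.0}

# Hypothetical multipliers: a smaller factor means alerts fire on smaller changes.
SENSITIVITY_FACTORS = {"high": 0.5, "standard": 1.0, "low": 2.0}

# Hypothetical column-to-sensitivity assignments.
COLUMN_SENSITIVITY = {
    "email": "high",            # critical column
    "payment_amount": "high",   # critical column
    "description": "standard",  # reference column
    "updated_at": "low",        # audit column
}


def threshold_for(column: str, anomaly_type: str) -> float:
    """Return the effective threshold for a column and anomaly type.

    Columns without an explicit assignment use standard sensitivity.
    """
    level = COLUMN_SENSITIVITY.get(column, "standard")
    return DEFAULT_THRESHOLDS[anomaly_type] * SENSITIVITY_FACTORS[level]


print(threshold_for("email", "null_rate"))       # 5.0
print(threshold_for("updated_at", "null_rate"))  # 20.0
```

With these assumed factors, the critical `email` column alerts on a 5% null rate change, while the audit column `updated_at` tolerates changes up to 20%.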
Best Practices
- Start Simple: Begin with default settings and refine thresholds as you learn your data patterns
- Historical Comparison: Rely on the historical profiles that Sparvi Cloud maintains automatically; meaningful anomaly detection depends on them
- Adjust Thresholds: Customize thresholds based on your data's natural variability
- Column Sensitivity: Set different sensitivity levels for critical columns vs. less important ones
- Severity Triage: Focus on high-severity anomalies first, then address medium and low ones
- Regular Monitoring: Let Sparvi Cloud run profiles on automated schedules so it can establish normal patterns
- Document Findings: Use issue tracking to keep a record of anomalies and their resolutions
- Team Collaboration: Use @mentions and comments to involve the right people quickly
- Pattern Library: Build a library of expected patterns for format validation
- Notification Strategy: Configure Slack, email, or webhook alerts for critical anomalies
Next Steps
After mastering anomaly detection:
- Set up Validation Rules to complement anomaly detection with business-specific checks
- Explore Data Profiling to understand comprehensive data quality metrics
- Learn about Sparvi Cloud Database Connections to connect your data sources