Iliyan Angelov
2025-09-19 11:58:53 +03:00
parent 306b20e24a
commit 6b247e5b9f
11423 changed files with 1500615 additions and 778 deletions

@@ -0,0 +1,633 @@
# Analytics & Predictive Insights API Documentation
## Overview
The Analytics & Predictive Insights module adds analytics to incident management: KPI tracking, ML-based prediction and anomaly detection, and cost impact analysis.
## Features
- **Advanced KPIs**: MTTA, MTTR, incident recurrence rate, availability metrics
- **Predictive Analytics**: ML-based incident prediction, severity prediction, resolution time prediction
- **Anomaly Detection**: Statistical, temporal, and pattern-based anomaly detection
- **Cost Analysis**: Downtime cost, lost revenue, penalty cost analysis
- **Dashboards**: Configurable dashboards with heatmaps and visualizations
- **Heatmaps**: Time-based incident frequency, resolution time, and cost impact visualizations
## API Endpoints
### Base URL
```
/api/analytics/
```
## KPI Metrics
### List KPI Metrics
```http
GET /api/analytics/kpi-metrics/
```
**Query Parameters:**
- `metric_type`: Filter by metric type (MTTA, MTTR, INCIDENT_COUNT, etc.)
- `is_active`: Filter by active status (true/false)
- `is_system_metric`: Filter by system metric status (true/false)
- `created_after`: Return metrics created after this timestamp (ISO 8601)
- `created_before`: Return metrics created before this timestamp (ISO 8601)
**Response:**
```json
{
  "count": 10,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": "uuid",
      "name": "Mean Time to Acknowledge",
      "description": "Average time to acknowledge incidents",
      "metric_type": "MTTA",
      "aggregation_type": "AVERAGE",
      "incident_categories": ["Infrastructure", "Application"],
      "incident_severities": ["HIGH", "CRITICAL"],
      "incident_priorities": ["P1", "P2"],
      "calculation_formula": null,
      "time_window_hours": 24,
      "is_active": true,
      "is_system_metric": true,
      "created_by_username": "admin",
      "created_at": "2024-01-01T00:00:00Z",
      "updated_at": "2024-01-01T00:00:00Z",
      "measurement_count": 100,
      "latest_measurement": {
        "value": "15.5",
        "unit": "minutes",
        "calculated_at": "2024-01-01T12:00:00Z",
        "incident_count": 25
      }
    }
  ]
}
```
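The query parameters above combine into an ordinary query string. The following is a minimal sketch of building such a URL in Python; the helper name `kpi_metrics_url` is hypothetical, and the host is the same placeholder used in the client example later in this document.

```python
from urllib.parse import urlencode

# Placeholder host; replace with the real deployment URL.
BASE_URL = "https://api.example.com/api/analytics/"


def kpi_metrics_url(**filters):
    """Return the KPI metrics list URL with only the supplied filters."""
    params = {}
    for key, value in filters.items():
        if value is None:
            continue  # skip filters the caller did not set
        # The API documents boolean filters as true/false, not True/False.
        params[key] = str(value).lower() if isinstance(value, bool) else value
    query = urlencode(params)
    return f"{BASE_URL}kpi-metrics/" + (f"?{query}" if query else "")


url = kpi_metrics_url(
    metric_type="MTTA",
    is_active=True,
    created_after="2024-01-01T00:00:00Z",
)
print(url)
```

`urlencode` percent-encodes the colons in the timestamp, so the value survives proxies and logging layers unchanged.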
### Get KPI Metric Details
```http
GET /api/analytics/kpi-metrics/{id}/
```
### Create KPI Metric
```http
POST /api/analytics/kpi-metrics/
```
**Request Body:**
```json
{
"name": "Custom MTTR",
"description": "Custom Mean Time to Resolve metric",
"metric_type": "MTTR",
"aggregation_type": "AVERAGE",
"incident_categories": ["Infrastructure"],
"incident_severities": ["HIGH", "CRITICAL"],
"time_window_hours": 48,
"is_active": true
}
```
### Get KPI Measurements
```http
GET /api/analytics/kpi-metrics/{id}/measurements/
```
**Query Parameters:**
- `start_date`: Filter by measurement period start (ISO 8601)
- `end_date`: Filter by measurement period end (ISO 8601)
### Get KPI Summary
```http
GET /api/analytics/kpi-metrics/summary/
```
**Response:**
```json
[
  {
    "metric_type": "MTTA",
    "metric_name": "Mean Time to Acknowledge",
    "current_value": "15.5",
    "unit": "minutes",
    "trend": "down",
    "trend_percentage": "-5.2",
    "period_start": "2024-01-01T00:00:00Z",
    "period_end": "2024-01-02T00:00:00Z",
    "incident_count": 25,
    "target_value": null,
    "target_met": true
  }
]
```
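Numeric fields in the summary arrive as strings, so a client has to convert them before comparing or formatting. The sketch below shows one way to turn an entry into a readable line; the `describe` helper is hypothetical, and treating a `null` `target_value` as "no target set" is an assumption, not documented behavior.

```python
from decimal import Decimal

# One entry shaped like the summary response above (fields not used here omitted).
entry = {
    "metric_type": "MTTA",
    "current_value": "15.5",
    "unit": "minutes",
    "trend": "down",
    "trend_percentage": "-5.2",
    "target_value": None,
    "target_met": True,
}


def describe(entry):
    """Format a KPI summary entry as a one-line status string."""
    value = Decimal(entry["current_value"])      # string -> exact decimal
    change = Decimal(entry["trend_percentage"])
    if entry["target_value"] is None:
        status = "no target set"                 # assumption: null means no target
    elif entry["target_met"]:
        status = "target met"
    else:
        status = "target missed"
    return (
        f"{entry['metric_type']}: {value} {entry['unit']}, "
        f"trend {entry['trend']} ({change:+}%), {status}"
    )


print(describe(entry))
```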
## KPI Measurements
### List KPI Measurements
```http
GET /api/analytics/kpi-measurements/
```
**Query Parameters:**
- `metric_id`: Filter by metric ID
- `start_date`: Filter by measurement period start
- `end_date`: Filter by measurement period end
## Incident Recurrence Analysis
### List Recurrence Analyses
```http
GET /api/analytics/recurrence-analyses/
```
**Query Parameters:**
- `recurrence_type`: Filter by recurrence type
- `min_confidence`: Filter by minimum confidence score
- `is_resolved`: Filter by resolution status
### Get Unresolved Recurrence Analyses
```http
GET /api/analytics/recurrence-analyses/unresolved/
```
## Predictive Models
### List Predictive Models
```http
GET /api/analytics/predictive-models/
```
**Query Parameters:**
- `model_type`: Filter by model type
- `status`: Filter by model status
### Create Predictive Model
```http
POST /api/analytics/predictive-models/
```
**Request Body:**
```json
{
  "name": "Incident Severity Predictor",
  "description": "Predicts incident severity based on historical data",
  "model_type": "SEVERITY_PREDICTION",
  "algorithm_type": "RANDOM_FOREST",
  "model_config": {
    "n_estimators": 100,
    "max_depth": 10
  },
  "feature_columns": ["title_length", "description_length", "category"],
  "target_column": "severity",
  "training_data_period_days": 90,
  "min_training_samples": 100
}
```
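A client can catch obvious mistakes before sending the request by checking the payload against the enumerations listed under Data Models. This is a rough sketch: `payload_errors` is a hypothetical helper, and which fields the server actually requires is an assumption.

```python
# Allowed values copied from the Predictive Model entry under Data Models.
MODEL_TYPES = {
    "ANOMALY_DETECTION", "INCIDENT_PREDICTION", "SEVERITY_PREDICTION",
    "RESOLUTION_TIME_PREDICTION", "ESCALATION_PREDICTION", "COST_PREDICTION",
}
ALGORITHM_TYPES = {
    "ISOLATION_FOREST", "LSTM", "RANDOM_FOREST", "XGBOOST",
    "SVM", "NEURAL_NETWORK", "ARIMA", "PROPHET",
}


def payload_errors(payload):
    """Return a list of problems found in a create-model payload."""
    errors = []
    if not payload.get("name"):
        errors.append("name must be a non-empty string")  # assumed required
    if payload.get("model_type") not in MODEL_TYPES:
        errors.append(f"unknown model_type: {payload.get('model_type')!r}")
    if payload.get("algorithm_type") not in ALGORITHM_TYPES:
        errors.append(f"unknown algorithm_type: {payload.get('algorithm_type')!r}")
    for field in ("training_data_period_days", "min_training_samples"):
        value = payload.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, int) or value <= 0
        ):
            errors.append(f"{field} must be a positive integer")
    return errors


payload = {
    "name": "Incident Severity Predictor",
    "model_type": "SEVERITY_PREDICTION",
    "algorithm_type": "RANDOM_FOREST",
    "training_data_period_days": 90,
    "min_training_samples": 100,
}
print(payload_errors(payload))
```

The server remains the authority on validation; a check like this only shortens the feedback loop.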
### Train Model
```http
POST /api/analytics/predictive-models/{id}/train/
```
### Get Model Performance
```http
GET /api/analytics/predictive-models/{id}/performance/
```
## Anomaly Detection
### List Anomaly Detections
```http
GET /api/analytics/anomaly-detections/
```
**Query Parameters:**
- `anomaly_type`: Filter by anomaly type
- `severity`: Filter by severity level
- `status`: Filter by status
- `start_date`: Lower bound on detection date
- `end_date`: Upper bound on detection date
### Get Anomaly Summary
```http
GET /api/analytics/anomaly-detections/summary/
```
**Response:**
```json
{
"total_anomalies": 50,
"critical_anomalies": 5,
"high_anomalies": 15,
"medium_anomalies": 20,
"low_anomalies": 10,
"unresolved_anomalies": 12,
"false_positive_rate": "8.5",
"average_resolution_time": "2:30:00"
}
```
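The summary is easier to chart once the counts are expressed as percentages and the duration is parsed. The sketch below assumes `average_resolution_time` always uses the `H:MM:SS` form shown above (longer durations may be formatted differently); both helper names are hypothetical.

```python
from datetime import timedelta

summary = {
    "total_anomalies": 50,
    "critical_anomalies": 5,
    "high_anomalies": 15,
    "medium_anomalies": 20,
    "low_anomalies": 10,
    "average_resolution_time": "2:30:00",
}


def parse_duration(text):
    """Parse an 'H:MM:SS' string into a timedelta (assumed format)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def severity_shares(summary):
    """Return each severity level's share of all anomalies, in percent."""
    total = summary["total_anomalies"]
    levels = ("critical", "high", "medium", "low")
    return {
        level: summary[f"{level}_anomalies"] * 100 / total if total else 0.0
        for level in levels
    }


print(severity_shares(summary))
print(parse_duration(summary["average_resolution_time"]))
```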
### Acknowledge Anomaly
```http
POST /api/analytics/anomaly-detections/{id}/acknowledge/
```
### Resolve Anomaly
```http
POST /api/analytics/anomaly-detections/{id}/resolve/
```
## Cost Impact Analysis
### List Cost Analyses
```http
GET /api/analytics/cost-analyses/
```
**Query Parameters:**
- `cost_type`: Filter by cost type
- `is_validated`: Filter by validation status
- `start_date`: Lower bound on creation date
- `end_date`: Upper bound on creation date
### Get Cost Summary
```http
GET /api/analytics/cost-analyses/summary/
```
**Response:**
```json
{
"total_cost": "125000.00",
"currency": "USD",
"downtime_cost": "75000.00",
"lost_revenue": "40000.00",
"penalty_cost": "10000.00",
"resource_cost": "0.00",
"total_downtime_hours": "150.5",
"total_affected_users": 5000,
"cost_per_hour": "830.56",
"cost_per_user": "25.00"
}
```
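The derived fields in this sample are consistent with dividing `total_cost` by downtime hours and by affected users. The sketch below reproduces that arithmetic with `Decimal`; that the server computes the values this way, and with half-up rounding to cents, is an assumption inferred from the sample numbers.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

summary = {
    "total_cost": "125000.00",
    "downtime_cost": "75000.00",
    "lost_revenue": "40000.00",
    "penalty_cost": "10000.00",
    "resource_cost": "0.00",
    "total_downtime_hours": "150.5",
    "total_affected_users": 5000,
}


def derived_costs(summary):
    """Recompute per-hour and per-user cost from the summary totals."""
    total = Decimal(summary["total_cost"])
    hours = Decimal(summary["total_downtime_hours"])
    users = summary["total_affected_users"]
    return {
        # Guard against division by zero when there was no downtime or no users.
        "cost_per_hour": (total / hours).quantize(CENT, ROUND_HALF_UP) if hours else None,
        "cost_per_user": (total / users).quantize(CENT, ROUND_HALF_UP) if users else None,
    }


def components_total(summary):
    """Sum the individual cost components reported in the summary."""
    keys = ("downtime_cost", "lost_revenue", "penalty_cost", "resource_cost")
    return sum(Decimal(summary[key]) for key in keys)


print(derived_costs(summary))
print(components_total(summary) == Decimal(summary["total_cost"]))
```

Using `Decimal` rather than `float` keeps currency values exact, which matters when the figures feed financial reports.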
## Dashboard Configurations
### List Dashboard Configurations
```http
GET /api/analytics/dashboard-configurations/
```
**Query Parameters:**
- `dashboard_type`: Filter by dashboard type
- `is_active`: Filter by active status
### Create Dashboard Configuration
```http
POST /api/analytics/dashboard-configurations/
```
**Request Body:**
```json
{
  "name": "Executive Dashboard",
  "description": "High-level metrics for executives",
  "dashboard_type": "EXECUTIVE",
  "layout_config": {
    "rows": 2,
    "columns": 3
  },
  "widget_configs": [
    {
      "type": "kpi_summary",
      "position": {"row": 0, "column": 0},
      "size": {"width": 2, "height": 1}
    },
    {
      "type": "anomaly_summary",
      "position": {"row": 0, "column": 2},
      "size": {"width": 1, "height": 1}
    }
  ],
  "is_public": false,
  "allowed_roles": ["executive", "manager"],
  "auto_refresh_enabled": true,
  "refresh_interval_seconds": 300
}
```
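Widget positions and sizes have to fit inside the grid declared in `layout_config`. The sketch below checks a configuration for out-of-bounds and overlapping widgets. It assumes zero-based positions, sizes measured in grid cells, `width` spanning columns and `height` spanning rows; none of this is stated explicitly above, and `layout_problems` is a hypothetical helper.

```python
def layout_problems(config):
    """Return a list of layout problems; an empty list means the grid is valid."""
    rows = config["layout_config"]["rows"]
    columns = config["layout_config"]["columns"]
    occupied = {}  # (row, column) -> type of the widget already placed there
    problems = []
    for widget in config["widget_configs"]:
        top = widget["position"]["row"]
        left = widget["position"]["column"]
        height = widget["size"]["height"]
        width = widget["size"]["width"]
        if top < 0 or left < 0 or top + height > rows or left + width > columns:
            problems.append(f"{widget['type']} does not fit in the {rows}x{columns} grid")
            continue
        for row in range(top, top + height):
            for column in range(left, left + width):
                if (row, column) in occupied:
                    problems.append(
                        f"{widget['type']} overlaps {occupied[(row, column)]} "
                        f"at row {row}, column {column}"
                    )
                else:
                    occupied[(row, column)] = widget["type"]
    return problems


config = {
    "layout_config": {"rows": 2, "columns": 3},
    "widget_configs": [
        {"type": "kpi_summary", "position": {"row": 0, "column": 0},
         "size": {"width": 2, "height": 1}},
        {"type": "anomaly_summary", "position": {"row": 0, "column": 2},
         "size": {"width": 1, "height": 1}},
    ],
}
print(layout_problems(config))
```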
### Get Dashboard Data
```http
GET /api/analytics/dashboard/{id}/data/
```
**Response:**
```json
{
"kpi_summary": [...],
"anomaly_summary": {...},
"cost_summary": {...},
"insight_summary": {...},
"recent_anomalies": [...],
"recent_insights": [...],
"heatmap_data": [...],
"last_updated": "2024-01-01T12:00:00Z"
}
```
## Heatmap Data
### List Heatmap Data
```http
GET /api/analytics/heatmap-data/
```
**Query Parameters:**
- `heatmap_type`: Filter by heatmap type
- `time_granularity`: Filter by time granularity
## Predictive Insights
### List Predictive Insights
```http
GET /api/analytics/predictive-insights/
```
**Query Parameters:**
- `insight_type`: Filter by insight type
- `confidence_level`: Filter by confidence level
- `is_acknowledged`: Filter by acknowledgment status
- `is_validated`: Filter by validation status
- `include_expired`: Include expired insights (true/false)
### Acknowledge Insight
```http
POST /api/analytics/predictive-insights/{id}/acknowledge/
```
### Get Insight Summary
```http
GET /api/analytics/predictive-insights/summary/
```
**Response:**
```json
{
"total_insights": 25,
"high_confidence_insights": 8,
"medium_confidence_insights": 12,
"low_confidence_insights": 5,
"acknowledged_insights": 15,
"validated_insights": 10,
"expired_insights": 3,
"average_accuracy": "0.85",
"active_models": 4
}
```
## Data Models
### KPI Metric
```json
{
"id": "uuid",
"name": "string",
"description": "string",
"metric_type": "MTTA|MTTR|MTBF|MTBSI|AVAILABILITY|INCIDENT_COUNT|RESOLUTION_RATE|ESCALATION_RATE|CUSTOM",
"aggregation_type": "AVERAGE|MEDIAN|MIN|MAX|SUM|COUNT|PERCENTILE_95|PERCENTILE_99",
"incident_categories": ["string"],
"incident_severities": ["string"],
"incident_priorities": ["string"],
"calculation_formula": "string",
"time_window_hours": "integer",
"is_active": "boolean",
"is_system_metric": "boolean",
"created_by": "uuid",
"created_at": "datetime",
"updated_at": "datetime"
}
```
### Predictive Model
```json
{
"id": "uuid",
"name": "string",
"description": "string",
"model_type": "ANOMALY_DETECTION|INCIDENT_PREDICTION|SEVERITY_PREDICTION|RESOLUTION_TIME_PREDICTION|ESCALATION_PREDICTION|COST_PREDICTION",
"algorithm_type": "ISOLATION_FOREST|LSTM|RANDOM_FOREST|XGBOOST|SVM|NEURAL_NETWORK|ARIMA|PROPHET",
"model_config": "object",
"feature_columns": ["string"],
"target_column": "string",
"training_data_period_days": "integer",
"min_training_samples": "integer",
"accuracy_score": "float",
"precision_score": "float",
"recall_score": "float",
"f1_score": "float",
"status": "TRAINING|ACTIVE|INACTIVE|RETRAINING|ERROR",
"version": "string",
"model_file_path": "string",
"last_trained_at": "datetime",
"training_duration_seconds": "integer",
"training_samples_count": "integer",
"auto_retrain_enabled": "boolean",
"retrain_frequency_days": "integer",
"performance_threshold": "float",
"created_by": "uuid",
"created_at": "datetime",
"updated_at": "datetime"
}
```
### Anomaly Detection
```json
{
"id": "uuid",
"model": "uuid",
"anomaly_type": "STATISTICAL|TEMPORAL|PATTERN|THRESHOLD|BEHAVIORAL",
"severity": "LOW|MEDIUM|HIGH|CRITICAL",
"status": "DETECTED|INVESTIGATING|CONFIRMED|FALSE_POSITIVE|RESOLVED",
"confidence_score": "float",
"anomaly_score": "float",
"threshold_used": "float",
"detected_at": "datetime",
"time_window_start": "datetime",
"time_window_end": "datetime",
"related_incidents": ["uuid"],
"affected_services": ["string"],
"affected_metrics": ["string"],
"description": "string",
"root_cause_analysis": "string",
"impact_assessment": "string",
"actions_taken": ["string"],
"resolved_at": "datetime",
"resolved_by": "uuid",
"metadata": "object"
}
```
### Cost Impact Analysis
```json
{
"id": "uuid",
"incident": "uuid",
"cost_type": "DOWNTIME|LOST_REVENUE|PENALTY|RESOURCE_COST|REPUTATION_COST|COMPLIANCE_COST",
"cost_amount": "decimal",
"currency": "string",
"calculation_method": "string",
"calculation_details": "object",
"downtime_hours": "decimal",
"affected_users": "integer",
"revenue_impact": "decimal",
"business_unit": "string",
"service_tier": "string",
"is_validated": "boolean",
"validated_by": "uuid",
"validated_at": "datetime",
"validation_notes": "string",
"created_at": "datetime",
"updated_at": "datetime"
}
```
## Management Commands
### Calculate KPIs
```bash
python manage.py calculate_kpis [--metric-id METRIC_ID] [--time-window HOURS] [--force]
```
### Run Anomaly Detection
```bash
python manage.py run_anomaly_detection [--model-id MODEL_ID] [--time-window HOURS]
```
### Train Predictive Models
```bash
python manage.py train_predictive_models [--model-id MODEL_ID] [--force]
```
## Error Handling
Endpoints signal failures with standard HTTP status codes:
- `400 Bad Request`: Invalid request data
- `401 Unauthorized`: Authentication required
- `403 Forbidden`: Insufficient permissions
- `404 Not Found`: Resource not found
- `500 Internal Server Error`: Server error
**Error Response Format:**
```json
{
"error": "Error message",
"details": "Additional error details",
"code": "ERROR_CODE"
}
```
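On the client side, the error body can be mapped onto an exception so callers handle failures in one place. This is a minimal sketch: `AnalyticsAPIError` and `error_from_response` are hypothetical names, and falling back to defaults when a field is missing is an assumption about how tolerant a client should be.

```python
class AnalyticsAPIError(Exception):
    """Client-side wrapper around the documented error response format."""

    def __init__(self, status, message, details=None, code=None):
        super().__init__(f"{status} {code or 'UNKNOWN'}: {message}")
        self.status = status
        self.message = message
        self.details = details
        self.code = code


def error_from_response(status, body):
    """Return an AnalyticsAPIError for 4xx/5xx responses, else None."""
    if status < 400:
        return None
    if not isinstance(body, dict):
        body = {}  # e.g. an HTML error page from a proxy
    return AnalyticsAPIError(
        status,
        body.get("error", "Unknown error"),
        body.get("details"),
        body.get("code"),
    )


err = error_from_response(
    404,
    {"error": "Error message", "details": "Additional error details", "code": "ERROR_CODE"},
)
print(err)
```

With `requests`, the arguments would come from `response.status_code` and `response.json()`.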
## Authentication
All endpoints require authentication. Use one of the following methods:
1. **Token Authentication**: Include an `Authorization: Token <token>` header
2. **Session Authentication**: Use Django session authentication
3. **SSO Authentication**: Sign in through a configured SSO provider
## Rate Limiting
API endpoints are rate-limited to prevent abuse:
- 1000 requests per hour per user
- 100 requests per minute per user
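A client can stay under both limits by throttling itself before each request. The sketch below tracks request timestamps in sliding windows; whether the server counts with sliding or fixed windows is not documented, so this is a conservative assumption, and `ClientRateLimiter` is a hypothetical class.

```python
import time
from collections import deque


class ClientRateLimiter:
    """Client-side throttle for (max_requests, window_seconds) limits."""

    def __init__(self, limits=((100, 60), (1000, 3600)), clock=time.monotonic):
        self.limits = limits
        self.clock = clock   # injectable so the limiter can be tested without sleeping
        self.sent = deque()  # timestamps of recorded requests, oldest first

    def wait_time(self):
        """Return how many seconds to wait before the next request is allowed."""
        now = self.clock()
        longest = max(window for _, window in self.limits)
        # Drop timestamps that are older than every window.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        delay = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # Wait until enough old requests have left this window.
                oldest_blocking = recent[len(recent) - max_requests]
                delay = max(delay, oldest_blocking + window - now)
        return delay

    def record(self):
        """Register that a request was just sent."""
        self.sent.append(self.clock())


limiter = ClientRateLimiter()
limiter.record()
print(limiter.wait_time())
```

In a real client, call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` right after sending it.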
## Pagination
List endpoints support pagination:
- `page`: Page number (default: 1)
- `page_size`: Items per page (default: 20, max: 100)
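List responses carry `next` and `results` fields, so a client can walk all pages with a small generator. In the sketch below, `fetch_page` is a hypothetical callable that performs the HTTP request and returns the decoded JSON; an in-memory stand-in replaces it so the example runs without a server.

```python
def iter_results(fetch_page, page_size=20):
    """Yield every item from a paginated list endpoint, page by page."""
    page = 1
    while True:
        payload = fetch_page(page=page, page_size=page_size)
        yield from payload["results"]
        if not payload["next"]:  # null means this was the last page
            break
        page += 1


# In-memory stand-in for the API, shaped like the documented list response.
ITEMS = list(range(45))


def fake_fetch(page, page_size):
    start = (page - 1) * page_size
    has_more = start + page_size < len(ITEMS)
    return {
        "count": len(ITEMS),
        "next": f"?page={page + 1}&page_size={page_size}" if has_more else None,
        "previous": None,
        "results": ITEMS[start:start + page_size],
    }


print(len(list(iter_results(fake_fetch))))
```

Because the generator is lazy, a caller that stops early never requests the remaining pages.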
## Filtering and Sorting
Most list endpoints support:
- **Filtering**: Use query parameters to filter results
- **Sorting**: Use `ordering` parameter (e.g., `ordering=-created_at`)
- **Search**: Use `search` parameter for text search
## Webhooks
The analytics module supports webhooks for real-time notifications:
- **Anomaly Detected**: Triggered when new anomalies are detected
- **KPI Threshold Breached**: Triggered when KPI values exceed thresholds
- **Model Training Completed**: Triggered when model training finishes
- **Cost Threshold Exceeded**: Triggered when cost impact exceeds thresholds
## Integration Examples
### Python Client Example
```python
import requests

BASE_URL = 'https://api.example.com/api/analytics/'
HEADERS = {'Authorization': 'Token your-token-here'}

# Get KPI summary
response = requests.get(f'{BASE_URL}kpi-metrics/summary/', headers=HEADERS, timeout=10)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
kpi_summary = response.json()

# Create predictive model
model_data = {
    'name': 'Incident Predictor',
    'description': 'Predicts incident occurrence',
    'model_type': 'INCIDENT_PREDICTION',
    'algorithm_type': 'RANDOM_FOREST',
    'model_config': {'n_estimators': 100}
}
response = requests.post(
    f'{BASE_URL}predictive-models/',
    json=model_data,  # sent as a JSON request body
    headers=HEADERS,
    timeout=10
)
response.raise_for_status()
model = response.json()
```
### JavaScript Client Example
```javascript
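// Get dashboard data
fetch('/api/analytics/dashboard/123/data/', {
  headers: {
    'Authorization': 'Token your-token-here',
    'Content-Type': 'application/json'
  }
})
  .then(response => {
    // fetch() resolves even on 4xx/5xx, so check the status explicitly
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  })
  .then(data => {
    console.log('Dashboard data:', data);
    // Update dashboard UI
  })
  .catch(error => console.error('Could not load dashboard data:', error));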
```
## Best Practices
1. **Use appropriate time windows** for KPI calculations
2. **Monitor model performance** and retrain when accuracy drops
3. **Validate cost analyses** before using for business decisions
4. **Set up alerts** for critical anomalies and threshold breaches
5. **Clean up regularly**: remove expired insights and old measurements
6. **Use pagination** for large datasets
7. **Cache frequently accessed data** to improve performance
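For the caching advice, a small time-based cache is often enough for summary endpoints. This is a minimal sketch: `TTLCache` is a hypothetical client-side helper, and the 300-second lifetime simply mirrors the `refresh_interval_seconds` value used in the dashboard example.

```python
import time


class TTLCache:
    """Cache values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() if it is missing or stale."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value


cache = TTLCache(ttl_seconds=300)
# The lambda stands in for a real API call such as the KPI summary request.
summary = cache.get_or_fetch("kpi-summary", lambda: {"metric_type": "MTTA"})
print(summary)
```

The cache is not thread-safe and never evicts unused keys, which is acceptable for a handful of summary endpoints but not for unbounded key sets.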
## Support
For technical support or questions about the Analytics & Predictive Insights API:
- **Documentation**: This API documentation
- **Issues**: Report issues through the project repository
- **Contact**: Contact the development team for assistance