# Incident Intelligence API Documentation

## Overview

The Incident Intelligence module provides AI-driven capabilities for incident management, including:

- **AI-driven incident classification** using NLP to categorize incidents from free text
- **Automated severity suggestion** based on impact analysis
- **Correlation engine** for linking related incidents and problem detection
- **Duplication detection** for merging incidents that describe the same outage

## Features

### 1. AI-Driven Incident Classification

Automatically classifies incidents into categories and subcategories based on their content:

- **Categories**: Infrastructure, Application, Security, User Experience, Data, Integration
- **Subcategories**: Specific types within each category (e.g., API_ISSUE, DATABASE_ISSUE)
- **Confidence Scoring**: AI confidence level for each classification
- **Keyword Extraction**: Identifies relevant keywords from incident text
- **Sentiment Analysis**: Analyzes the sentiment of incident descriptions
- **Urgency Detection**: Identifies urgency indicators in the text

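
To make the classification idea concrete, here is a minimal sketch of keyword-based classification with a simple confidence score. It is not the module's actual classifier: the keyword lists, the `classify` function, and the `Unclassified` fallback are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the real category definitions live in the module.
CATEGORY_KEYWORDS = {
    "Infrastructure": {"server", "network", "disk", "cpu", "outage"},
    "Application": {"api", "bug", "crash", "exception", "slow"},
    "Security": {"breach", "unauthorized", "malware", "phishing"},
    "Data": {"database", "corruption", "backup", "query"},
}


def classify(text: str) -> tuple[str, float, list[str]]:
    """Return (category, confidence, matched_keywords) for free text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits: Counter = Counter()
    matched: dict[str, list[str]] = {}
    for category, keywords in CATEGORY_KEYWORDS.items():
        found = [token for token in tokens if token in keywords]
        if found:
            hits[category] = len(found)
            matched[category] = sorted(set(found))
    if not hits:
        # No keyword matched any category (fallback label is an assumption).
        return "Unclassified", 0.0, []
    category, count = hits.most_common(1)[0]
    # Confidence: share of all keyword hits that belong to the winning category.
    confidence = count / sum(hits.values())
    return category, round(confidence, 2), matched[category]
```

In this sketch the confidence is only the winning category's share of keyword hits, so it says nothing about how reliable the keyword lists themselves are.
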
### 2. Automated Severity Suggestion

Suggests incident severity based on multiple factors:

- **User Impact Analysis**: Number of affected users and impact level
- **Business Impact Assessment**: Revenue and operational impact
- **Technical Impact Evaluation**: System and infrastructure impact
- **Text Analysis**: Severity indicators in incident descriptions
- **Confidence Scoring**: AI confidence in severity suggestions

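
One plausible way to combine the impact factors is a weighted sum mapped onto severity bands. The sketch below assumes each impact score is already normalized to 0.0-1.0; the weights, thresholds, and severity labels are hypothetical and not taken from the module.

```python
# Hypothetical weights and severity bands, for illustration only.
WEIGHTS = {"user": 0.4, "business": 0.4, "technical": 0.2}
THRESHOLDS = [(0.8, "CRITICAL"), (0.6, "HIGH"), (0.3, "MEDIUM")]


def suggest_severity(user: float, business: float, technical: float) -> tuple[str, float]:
    """Combine three impact scores (each 0.0-1.0) into (severity, combined_score)."""
    scores = {"user": user, "business": business, "technical": technical}
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} impact score must be in [0.0, 1.0], got {value}")
    combined = sum(WEIGHTS[name] * value for name, value in scores.items())
    # Thresholds are ordered from highest to lowest; the first match wins.
    for threshold, label in THRESHOLDS:
        if combined >= threshold:
            return label, round(combined, 2)
    return "LOW", round(combined, 2)
```

Keeping the weights in one table makes it easy to tune how much each factor contributes without touching the scoring logic.
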
### 3. Correlation Engine

Links related incidents and detects patterns:

- **Correlation Types**: Same Service, Same Component, Temporal, Pattern Match, Dependency, Cascade
- **Problem Detection**: Identifies when correlations suggest larger problems
- **Time-based Analysis**: Considers temporal proximity of incidents
- **Service Similarity**: Analyzes shared services and components
- **Pattern Recognition**: Detects recurring issues and trends

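
As a rough illustration of how temporal proximity, shared keywords, and service identity might feed a single score, consider the sketch below. The incident dictionaries, the one-hour window, and the equal weighting are assumptions made for this example rather than the engine's real behavior.

```python
from datetime import datetime, timedelta


def temporal_score(a: datetime, b: datetime, window: timedelta = timedelta(hours=1)) -> float:
    """1.0 for simultaneous incidents, decaying linearly to 0.0 at the window edge."""
    return max(0.0, 1.0 - abs(a - b) / window)


def correlate(a: dict, b: dict) -> dict:
    """Score two incidents given 'service', 'created_at', and 'keywords' (a set)."""
    shared = sorted(a["keywords"] & b["keywords"])
    union = a["keywords"] | b["keywords"]
    keyword_score = len(shared) / len(union) if union else 0.0
    time_score = temporal_score(a["created_at"], b["created_at"])
    same_service = a["service"] == b["service"]
    # Equal weighting of the three signals is an arbitrary choice for the sketch.
    similarity = (time_score + keyword_score + (1.0 if same_service else 0.0)) / 3
    if same_service:
        correlation_type = "SAME_SERVICE"
    elif time_score > 0:
        correlation_type = "TEMPORAL"
    else:
        correlation_type = None
    return {
        "correlation_type": correlation_type,
        "shared_keywords": shared,
        "time_difference": abs(a["created_at"] - b["created_at"]),
        "similarity_score": round(similarity, 2),
    }
```

The returned keys mirror some field names of the `IncidentCorrelation` model, but the values are computed with this sketch's own simplified rules.
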
### 4. Duplication Detection

Identifies and manages duplicate incidents:

- **Duplication Types**: Exact, Near Duplicate, Similar, Potential Duplicate
- **Similarity Analysis**: Text, temporal, and service similarity scoring
- **Merge Recommendations**: Suggests actions (Merge, Link, Review, No Action)
- **Confidence Scoring**: AI confidence in duplication detection
- **Shared Elements**: Identifies common elements between incidents

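
The mapping from similarity scores to a duplication type and a recommended action could look roughly like the following. The weights, the thresholds, and the pairing of types with actions are assumptions for illustration; the module may use different rules.

```python
from typing import Optional

# Hypothetical bands: (minimum score, duplication type, recommended action).
BANDS = [
    (0.95, "EXACT", "MERGE"),
    (0.85, "NEAR_DUPLICATE", "MERGE"),
    (0.70, "SIMILAR", "LINK"),
    (0.50, "POTENTIAL_DUPLICATE", "REVIEW"),
]


def combined_score(text: float, temporal: float, service: float,
                   weights: tuple[float, float, float] = (0.6, 0.2, 0.2)) -> float:
    """Weighted sum of the three similarity scores (each expected in 0.0-1.0)."""
    return text * weights[0] + temporal * weights[1] + service * weights[2]


def classify_duplicate(score: float) -> tuple[Optional[str], str]:
    """Map an overall similarity score to (duplication_type, recommended_action)."""
    for minimum, duplication_type, action in BANDS:
        if score >= minimum:
            return duplication_type, action
    # Below every band: not treated as a duplicate at all.
    return None, "NO_ACTION"
```

Text similarity is weighted most heavily here on the assumption that two reports of the same outage usually share wording, even when filed against different services.
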
## API Endpoints

### Incidents

#### Create Incident
```http
POST /api/incidents/incidents/
Content-Type: application/json

{
    "title": "Database Connection Timeout",
    "description": "Users are experiencing timeouts when trying to access the database",
    "free_text": "Database is down, can't connect, getting timeout errors",
    "affected_users": 150,
    "business_impact": "Critical business operations are affected",
    "reporter": 1
}
```

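
For callers working from Python, the request above can be assembled with the standard library alone. This is a sketch under assumptions: `BASE_URL` is a placeholder host, `build_create_incident_request` is a hypothetical helper, and authentication headers are left to the caller because the scheme is not specified in this document.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "https://incidents.example.com"  # placeholder host, not a real deployment


def build_create_incident_request(payload: dict, extra_headers: Optional[dict] = None) -> Request:
    """Build (but do not send) the POST request that creates an incident."""
    headers = {"Content-Type": "application/json"}
    # Callers supply authentication headers here; the scheme is deployment-specific.
    headers.update(extra_headers or {})
    return Request(
        url=f"{BASE_URL}/api/incidents/incidents/",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; building and sending are kept separate so the request can be inspected or tested without network access.
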
#### Get Incident Analysis
```http
GET /api/incidents/incidents/{id}/analysis/
```

Returns comprehensive AI analysis including:
- Classification results
- Severity suggestions
- Correlations with other incidents
- Potential duplicates
- Associated patterns

#### Trigger AI Analysis
```http
POST /api/incidents/incidents/{id}/analyze/
```

Manually triggers AI analysis for a specific incident.

#### Get Incident Statistics
```http
GET /api/incidents/incidents/stats/
```

Returns statistics including:
- Total incidents by status and severity
- Average resolution time
- AI processing statistics
- Duplicate and correlation counts

### Correlations

#### Get Correlations
```http
GET /api/incidents/correlations/
```

#### Get Problem Indicators
```http
GET /api/incidents/correlations/problem_indicators/
```

Returns correlations that indicate larger problems.

### Duplications

#### Get Duplications
```http
GET /api/incidents/duplications/
```

#### Approve Merge
```http
POST /api/incidents/duplications/{id}/approve_merge/
```

#### Reject Merge
```http
POST /api/incidents/duplications/{id}/reject_merge/
```

### Patterns

#### Get Patterns
```http
GET /api/incidents/patterns/
```

#### Get Active Patterns
```http
GET /api/incidents/patterns/active_patterns/
```

#### Resolve Pattern
```http
POST /api/incidents/patterns/{id}/resolve_pattern/
```

## Data Models

### Incident
- **id**: UUID primary key
- **title**: Incident title
- **description**: Detailed description
- **free_text**: Original free text from user
- **category**: AI-classified category
- **subcategory**: AI-classified subcategory
- **severity**: Current severity level
- **suggested_severity**: AI-suggested severity
- **status**: Current status (Open, In Progress, Resolved, Closed)
- **assigned_to**: Assigned user
- **reporter**: User who reported the incident
- **affected_users**: Number of affected users
- **business_impact**: Business impact description
- **ai_processed**: Whether AI analysis has been completed
- **is_duplicate**: Whether this is a duplicate incident

### IncidentClassification
- **incident**: Related incident
- **predicted_category**: AI-predicted category
- **predicted_subcategory**: AI-predicted subcategory
- **confidence_score**: AI confidence (0.0-1.0)
- **alternative_categories**: Alternative predictions
- **extracted_keywords**: Keywords extracted from text
- **sentiment_score**: Sentiment analysis score (-1 to 1)
- **urgency_indicators**: Detected urgency indicators

### SeveritySuggestion
- **incident**: Related incident
- **suggested_severity**: AI-suggested severity
- **confidence_score**: AI confidence (0.0-1.0)
- **user_impact_score**: User impact score (0.0-1.0)
- **business_impact_score**: Business impact score (0.0-1.0)
- **technical_impact_score**: Technical impact score (0.0-1.0)
- **reasoning**: AI explanation for suggestion
- **impact_factors**: Factors that influenced the severity

### IncidentCorrelation
- **primary_incident**: Primary incident in correlation
- **related_incident**: Related incident
- **correlation_type**: Type of correlation
- **confidence_score**: Correlation confidence (0.0-1.0)
- **correlation_strength**: Strength of correlation
- **shared_keywords**: Keywords shared between incidents
- **time_difference**: Time difference between incidents
- **similarity_score**: Overall similarity score
- **is_problem_indicator**: Whether this suggests a larger problem

### DuplicationDetection
- **incident_a**: First incident in pair
- **incident_b**: Second incident in pair
- **duplication_type**: Type of duplication
- **similarity_score**: Overall similarity score
- **confidence_score**: Duplication confidence (0.0-1.0)
- **text_similarity**: Text similarity score
- **temporal_proximity**: Temporal proximity score
- **service_similarity**: Service similarity score
- **recommended_action**: Recommended action (Merge, Link, Review, No Action)
- **status**: Current status (Detected, Reviewed, Merged, Rejected)

### IncidentPattern
- **name**: Pattern name
- **pattern_type**: Type of pattern (Recurring, Seasonal, Trend, Anomaly)
- **description**: Pattern description
- **frequency**: How often the pattern occurs
- **affected_services**: Services affected by the pattern
- **common_keywords**: Common keywords in pattern incidents
- **incidents**: Related incidents
- **confidence_score**: Pattern confidence (0.0-1.0)
- **is_active**: Whether the pattern is active
- **is_resolved**: Whether the pattern is resolved

## AI Components

### IncidentClassifier
- **Categories**: Predefined categories with keywords
- **Keyword Extraction**: Extracts relevant keywords from text
- **Sentiment Analysis**: Analyzes sentiment of incident text
- **Urgency Detection**: Identifies urgency indicators
- **Confidence Scoring**: Provides confidence scores for classifications

### SeverityAnalyzer
- **Impact Analysis**: Analyzes user, business, and technical impact
- **Severity Indicators**: Identifies severity keywords in text
- **Weighted Scoring**: Combines multiple factors for severity suggestion
- **Reasoning Generation**: Provides explanations for severity suggestions

### IncidentCorrelationEngine
- **Similarity Analysis**: Calculates various similarity metrics
- **Temporal Analysis**: Considers time-based correlations
- **Service Analysis**: Analyzes service and component similarities
- **Problem Detection**: Identifies patterns that suggest larger problems
- **Cluster Detection**: Groups related incidents into clusters

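
Cluster detection can be pictured as finding connected components over correlated incident pairs. The sketch below uses a small union-find structure for that; it is one possible approach, not necessarily what `IncidentCorrelationEngine` does, and `cluster_incidents` is a hypothetical name.

```python
from collections import defaultdict


def cluster_incidents(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group incident IDs into clusters, given pairs of correlated incidents."""
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for first, second in pairs:
        root_first, root_second = find(first), find(second)
        if root_first != root_second:
            parent[root_second] = root_first  # merge the two clusters

    clusters: dict[str, set[str]] = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return list(clusters.values())
```

A cluster above some size, for example three or more incidents, could then be flagged as a possible larger problem; that threshold is an assumption and would need tuning against real data.
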
### DuplicationDetector
- **Text Similarity**: Multiple text similarity algorithms
- **Temporal Proximity**: Time-based duplication detection
- **Service Similarity**: Service and component similarity
- **Metadata Similarity**: Similarity based on incident metadata
- **Merge Recommendations**: Suggests appropriate actions

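
This document does not say which text similarity algorithms the detector uses. As an illustration only, the sketch below shows two common ones built from the standard library: a character-level ratio from `difflib.SequenceMatcher` and a cosine similarity over word counts. Both function names are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def sequence_similarity(a: str, b: str) -> float:
    """Character-level similarity in 0.0-1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors; insensitive to word order."""
    counts_a = Counter(re.findall(r"\w+", a.lower()))
    counts_b = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a.keys() & counts_b.keys())
    norm = (math.sqrt(sum(c * c for c in counts_a.values()))
            * math.sqrt(sum(c * c for c in counts_b.values())))
    return dot / norm if norm else 0.0
```

The two measures complement each other: the sequence ratio tolerates typos within similar sentences, while the cosine score catches reports that use the same words in a different order.
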
## Background Processing

The module uses Celery to run AI analysis in the background.

### Tasks
- **process_incident_ai**: Processes a single incident with AI analysis
- **batch_process_incidents_ai**: Processes multiple incidents
- **find_correlations**: Finds correlations for an incident
- **find_duplicates**: Finds duplicates for an incident
- **detect_all_duplicates**: Batch duplicate detection
- **correlate_all_incidents**: Batch correlation analysis
- **merge_duplicate_incidents**: Merges duplicate incidents

### Processing Logs
All AI processing activities are logged in the `AIProcessingLog` model for audit and debugging purposes.

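
When many incidents need processing, splitting their IDs into fixed-size chunks keeps each background task short. The helper below is a hypothetical sketch, and it assumes that a batch task such as `batch_process_incidents_ai` accepts a list of incident IDs, which this document does not confirm.

```python
from typing import Iterator, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Each chunk could then be queued separately, for example (signature assumed):
#     for chunk in chunked(incident_ids, 100):
#         batch_process_incidents_ai.delay(chunk)
```

Smaller chunks spread work across more workers and limit how much is lost if one task fails, at the cost of more queue overhead.
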
## Setup and Configuration

### 1. Install Dependencies
```bash
pip install -r requirements.txt
```

### 2. Run Migrations
```bash
python manage.py makemigrations incident_intelligence
python manage.py migrate
```

### 3. Create Sample Data
```bash
python manage.py setup_incident_intelligence --create-sample-data --create-patterns
```

### 4. Run AI Analysis
```bash
python manage.py setup_incident_intelligence --run-ai-analysis
```

### 5. Start Celery Worker
```bash
celery -A core worker -l info
```

## Usage Examples

### Creating an Incident with AI Analysis
```python
from incident_intelligence.models import Incident
from incident_intelligence.tasks import process_incident_ai

# Create incident
incident = Incident.objects.create(
    title="API Response Slow",
    description="The user service API is responding slowly",
    free_text="API is slow, taking forever to respond",
    affected_users=50,
    business_impact="User experience is degraded"
)

# Trigger AI analysis
process_incident_ai.delay(incident.id)
```

### Finding Correlations
```python
from incident_intelligence.ai.correlation import IncidentCorrelationEngine

# incident_data is the incident under analysis and all_incidents is the pool
# of candidates to compare against; both must be prepared by the caller.
engine = IncidentCorrelationEngine()
correlations = engine.find_related_incidents(incident_data, all_incidents)
```

### Detecting Duplicates
```python
from incident_intelligence.ai.duplication import DuplicationDetector

# incident_data is the incident under analysis and all_incidents is the pool
# of candidates to compare against; both must be prepared by the caller.
detector = DuplicationDetector()
duplicates = detector.find_duplicate_candidates(incident_data, all_incidents)
```

## Performance Considerations

- **Batch Processing**: Use batch operations for large datasets
- **Caching**: Consider caching frequently accessed data
- **Indexing**: Database indexes are configured for optimal query performance
- **Background Tasks**: AI processing runs asynchronously to avoid blocking requests
- **Rate Limiting**: Consider implementing rate limiting for API endpoints

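
If the API is served by Django REST Framework, which this document does not state and is assumed here, rate limiting can be enabled through its built-in throttle classes in the project settings. The limits shown are illustrative placeholders.

```python
# settings.py fragment; assumes the endpoints are Django REST Framework views.
REST_FRAMEWORK = {
    "DEFAULT_THROTTLE_CLASSES": [
        "rest_framework.throttling.AnonRateThrottle",
        "rest_framework.throttling.UserRateThrottle",
    ],
    "DEFAULT_THROTTLE_RATES": {
        "anon": "20/minute",   # unauthenticated clients (illustrative limit)
        "user": "200/minute",  # authenticated users (illustrative limit)
    },
}
```

Endpoints that queue AI analysis are the most expensive to call, so they are natural candidates for stricter limits than read-only listings.
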
## Security Considerations

- **Authentication**: All endpoints require authentication
- **Authorization**: Users can only access incidents they have permission to view
- **Data Privacy**: Sensitive information is handled according to data classification levels
- **Audit Logging**: All AI processing activities are logged for audit purposes

## Monitoring and Maintenance

- **Processing Logs**: Monitor AI processing logs for errors and performance
- **Model Performance**: Track AI model accuracy and update as needed
- **Database Maintenance**: Regular cleanup of old processing logs and resolved incidents
- **Health Checks**: Monitor Celery workers and Redis for background processing health

## Future Enhancements

- **Machine Learning Models**: Integration with more sophisticated ML models
- **Real-time Processing**: Real-time incident analysis and correlation
- **Advanced NLP**: More sophisticated natural language processing
- **Predictive Analytics**: Predictive incident analysis and prevention
- **Integration APIs**: APIs for integrating with external incident management systems