# Incident Intelligence API Documentation

## Overview

The Incident Intelligence module provides AI-driven capabilities for incident management, including:

- **AI-driven incident classification** using NLP to categorize incidents from free text
- **Automated severity suggestion** based on impact analysis
- **Correlation engine** for linking related incidents and problem detection
- **Duplication detection** for merging incidents that describe the same outage

## Features

### 1. AI-Driven Incident Classification

Automatically classifies incidents into categories and subcategories based on their content:

- **Categories**: Infrastructure, Application, Security, User Experience, Data, Integration
- **Subcategories**: Specific types within each category (e.g., API_ISSUE, DATABASE_ISSUE)
- **Confidence Scoring**: AI confidence level for each classification
- **Keyword Extraction**: Identifies relevant keywords from incident text
- **Sentiment Analysis**: Analyzes the sentiment of incident descriptions
- **Urgency Detection**: Identifies urgency indicators in the text

### 2. Automated Severity Suggestion

Suggests incident severity based on multiple factors:

- **User Impact Analysis**: Number of affected users and impact level
- **Business Impact Assessment**: Revenue and operational impact
- **Technical Impact Evaluation**: System and infrastructure impact
- **Text Analysis**: Severity indicators in incident descriptions
- **Confidence Scoring**: AI confidence in severity suggestions

### 3. Correlation Engine

Links related incidents and detects patterns:

- **Correlation Types**: Same Service, Same Component, Temporal, Pattern Match, Dependency, Cascade
- **Problem Detection**: Identifies when correlations suggest larger problems
- **Time-based Analysis**: Considers temporal proximity of incidents
- **Service Similarity**: Analyzes shared services and components
- **Pattern Recognition**: Detects recurring issues and trends

### 4. Duplication Detection

Identifies and manages duplicate incidents:

- **Duplication Types**: Exact, Near Duplicate, Similar, Potential Duplicate
- **Similarity Analysis**: Text, temporal, and service similarity scoring
- **Merge Recommendations**: Suggests actions (Merge, Link, Review, No Action)
- **Confidence Scoring**: AI confidence in duplication detection
- **Shared Elements**: Identifies common elements between incidents

## API Endpoints

### Incidents

#### Create Incident

```http
POST /api/incidents/incidents/
Content-Type: application/json

{
  "title": "Database Connection Timeout",
  "description": "Users are experiencing timeouts when trying to access the database",
  "free_text": "Database is down, can't connect, getting timeout errors",
  "affected_users": 150,
  "business_impact": "Critical business operations are affected",
  "reporter": 1
}
```

#### Get Incident Analysis

```http
GET /api/incidents/incidents/{id}/analysis/
```

Returns comprehensive AI analysis including:

- Classification results
- Severity suggestions
- Correlations with other incidents
- Potential duplicates
- Associated patterns

#### Trigger AI Analysis

```http
POST /api/incidents/incidents/{id}/analyze/
```

Manually triggers AI analysis for a specific incident.

#### Get Incident Statistics

```http
GET /api/incidents/incidents/stats/
```

Returns statistics including:

- Total incidents by status and severity
- Average resolution time
- AI processing statistics
- Duplicate and correlation counts

### Correlations

#### Get Correlations

```http
GET /api/incidents/correlations/
```

#### Get Problem Indicators

```http
GET /api/incidents/correlations/problem_indicators/
```

Returns correlations that indicate larger problems.
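
The incident and correlation endpoints listed above can be called from a small client. The following is a minimal sketch using only the Python standard library. The host in `BASE_URL`, the `Token` authorization header, and all helper function names are assumptions made for illustration; this document states that endpoints require authentication but does not specify the scheme, so adjust the header to match your deployment. The helpers only build requests and do not send them.

```python
import json
from urllib.request import Request

BASE_URL = "https://example.com"  # hypothetical host, replace with your deployment
API_TOKEN = "replace-me"          # hypothetical credential; the auth scheme is assumed


def build_request(method, path, payload=None):
    """Build (but do not send) a request for one of the documented endpoints."""
    headers = {
        "Authorization": f"Token {API_TOKEN}",  # assumed header format
        "Accept": "application/json",
    }
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(f"{BASE_URL}{path}", data=data, headers=headers, method=method)


def create_incident_request(title, description, free_text,
                            affected_users, business_impact, reporter):
    """POST /api/incidents/incidents/ with the fields from the Create Incident example."""
    payload = {
        "title": title,
        "description": description,
        "free_text": free_text,
        "affected_users": affected_users,
        "business_impact": business_impact,
        "reporter": reporter,
    }
    return build_request("POST", "/api/incidents/incidents/", payload)


def analyze_request(incident_id):
    """POST /api/incidents/incidents/{id}/analyze/ to trigger AI analysis."""
    return build_request("POST", f"/api/incidents/incidents/{incident_id}/analyze/")


def analysis_request(incident_id):
    """GET /api/incidents/incidents/{id}/analysis/ to fetch the analysis."""
    return build_request("GET", f"/api/incidents/incidents/{incident_id}/analysis/")


def problem_indicators_request():
    """GET /api/incidents/correlations/problem_indicators/."""
    return build_request("GET", "/api/incidents/correlations/problem_indicators/")
```

To send a request, pass it to `urllib.request.urlopen` and decode the JSON response body. Keeping request construction separate from sending makes the URL, method, and payload easy to inspect before anything reaches the server.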

### Duplications

#### Get Duplications

```http
GET /api/incidents/duplications/
```

#### Approve Merge

```http
POST /api/incidents/duplications/{id}/approve_merge/
```

#### Reject Merge

```http
POST /api/incidents/duplications/{id}/reject_merge/
```

### Patterns

#### Get Patterns

```http
GET /api/incidents/patterns/
```

#### Get Active Patterns

```http
GET /api/incidents/patterns/active_patterns/
```

#### Resolve Pattern

```http
POST /api/incidents/patterns/{id}/resolve_pattern/
```

## Data Models

### Incident

- **id**: UUID primary key
- **title**: Incident title
- **description**: Detailed description
- **free_text**: Original free text from user
- **category**: AI-classified category
- **subcategory**: AI-classified subcategory
- **severity**: Current severity level
- **suggested_severity**: AI-suggested severity
- **status**: Current status (Open, In Progress, Resolved, Closed)
- **assigned_to**: Assigned user
- **reporter**: User who reported the incident
- **affected_users**: Number of affected users
- **business_impact**: Business impact description
- **ai_processed**: Whether AI analysis has been completed
- **is_duplicate**: Whether this is a duplicate incident

### IncidentClassification

- **incident**: Related incident
- **predicted_category**: AI-predicted category
- **predicted_subcategory**: AI-predicted subcategory
- **confidence_score**: AI confidence (0.0-1.0)
- **alternative_categories**: Alternative predictions
- **extracted_keywords**: Keywords extracted from text
- **sentiment_score**: Sentiment analysis score (-1 to 1)
- **urgency_indicators**: Detected urgency indicators

### SeveritySuggestion

- **incident**: Related incident
- **suggested_severity**: AI-suggested severity
- **confidence_score**: AI confidence (0.0-1.0)
- **user_impact_score**: User impact score (0.0-1.0)
- **business_impact_score**: Business impact score (0.0-1.0)
- **technical_impact_score**: Technical impact score (0.0-1.0)
- **reasoning**: AI explanation for suggestion
- **impact_factors**: Factors that influenced the severity

### IncidentCorrelation

- **primary_incident**: Primary incident in correlation
- **related_incident**: Related incident
- **correlation_type**: Type of correlation
- **confidence_score**: Correlation confidence (0.0-1.0)
- **correlation_strength**: Strength of correlation
- **shared_keywords**: Keywords shared between incidents
- **time_difference**: Time difference between incidents
- **similarity_score**: Overall similarity score
- **is_problem_indicator**: Whether this suggests a larger problem

### DuplicationDetection

- **incident_a**: First incident in pair
- **incident_b**: Second incident in pair
- **duplication_type**: Type of duplication
- **similarity_score**: Overall similarity score
- **confidence_score**: Duplication confidence (0.0-1.0)
- **text_similarity**: Text similarity score
- **temporal_proximity**: Temporal proximity score
- **service_similarity**: Service similarity score
- **recommended_action**: Recommended action (Merge, Link, Review, No Action)
- **status**: Current status (Detected, Reviewed, Merged, Rejected)

### IncidentPattern

- **name**: Pattern name
- **pattern_type**: Type of pattern (Recurring, Seasonal, Trend, Anomaly)
- **description**: Pattern description
- **frequency**: How often the pattern occurs
- **affected_services**: Services affected by the pattern
- **common_keywords**: Common keywords in pattern incidents
- **incidents**: Related incidents
- **confidence_score**: Pattern confidence (0.0-1.0)
- **is_active**: Whether the pattern is active
- **is_resolved**: Whether the pattern is resolved

## AI Components

### IncidentClassifier

- **Categories**: Predefined categories with keywords
- **Keyword Extraction**: Extracts relevant keywords from text
- **Sentiment Analysis**: Analyzes sentiment of incident text
- **Urgency Detection**: Identifies urgency indicators
- **Confidence Scoring**: Provides confidence scores for classifications

### SeverityAnalyzer

- **Impact Analysis**: Analyzes user, business, and technical impact
- **Severity Indicators**: Identifies severity keywords in text
- **Weighted Scoring**: Combines multiple factors for severity suggestion
- **Reasoning Generation**: Provides explanations for severity suggestions

### IncidentCorrelationEngine

- **Similarity Analysis**: Calculates various similarity metrics
- **Temporal Analysis**: Considers time-based correlations
- **Service Analysis**: Analyzes service and component similarities
- **Problem Detection**: Identifies patterns that suggest larger problems
- **Cluster Detection**: Groups related incidents into clusters

### DuplicationDetector

- **Text Similarity**: Multiple text similarity algorithms
- **Temporal Proximity**: Time-based duplication detection
- **Service Similarity**: Service and component similarity
- **Metadata Similarity**: Similarity based on incident metadata
- **Merge Recommendations**: Suggests appropriate actions

## Background Processing

The module uses Celery for background processing of AI analysis:

### Tasks

- **process_incident_ai**: Processes a single incident with AI analysis
- **batch_process_incidents_ai**: Processes multiple incidents
- **find_correlations**: Finds correlations for an incident
- **find_duplicates**: Finds duplicates for an incident
- **detect_all_duplicates**: Batch duplicate detection
- **correlate_all_incidents**: Batch correlation analysis
- **merge_duplicate_incidents**: Merges duplicate incidents

### Processing Logs

All AI processing activities are logged in the `AIProcessingLog` model for audit and debugging purposes.

## Setup and Configuration

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Run Migrations

```bash
python manage.py makemigrations incident_intelligence
python manage.py migrate
```

### 3. Create Sample Data

```bash
python manage.py setup_incident_intelligence --create-sample-data --create-patterns
```

### 4. Run AI Analysis

```bash
python manage.py setup_incident_intelligence --run-ai-analysis
```

### 5. Start Celery Worker

```bash
celery -A core worker -l info
```

## Usage Examples

### Creating an Incident with AI Analysis

```python
from incident_intelligence.models import Incident
from incident_intelligence.tasks import process_incident_ai

# Create incident
incident = Incident.objects.create(
    title="API Response Slow",
    description="The user service API is responding slowly",
    free_text="API is slow, taking forever to respond",
    affected_users=50,
    business_impact="User experience is degraded"
)

# Trigger AI analysis as a background Celery task
process_incident_ai.delay(incident.id)
```

### Finding Correlations

```python
from incident_intelligence.ai.correlation import IncidentCorrelationEngine

# incident_data and all_incidents must be prepared by the caller;
# their expected shape is not documented here.
engine = IncidentCorrelationEngine()
correlations = engine.find_related_incidents(incident_data, all_incidents)
```

### Detecting Duplicates

```python
from incident_intelligence.ai.duplication import DuplicationDetector

# incident_data and all_incidents must be prepared by the caller;
# their expected shape is not documented here.
detector = DuplicationDetector()
duplicates = detector.find_duplicate_candidates(incident_data, all_incidents)
```

## Performance Considerations

- **Batch Processing**: Use batch operations for large datasets
- **Caching**: Consider caching frequently accessed data
- **Indexing**: Database indexes are configured for optimal query performance
- **Background Tasks**: AI processing runs asynchronously to avoid blocking requests
- **Rate Limiting**: Consider implementing rate limiting for API endpoints

## Security Considerations

- **Authentication**: All endpoints require authentication
- **Authorization**: Users can only access incidents they have permission to view
- **Data Privacy**: Sensitive information is handled according to data classification levels
- **Audit Logging**: All AI processing activities are logged for audit purposes

## Monitoring and Maintenance

- **Processing Logs**: Monitor AI processing logs for errors and performance
- **Model Performance**: Track AI model accuracy and update as needed
- **Database Maintenance**: Regular cleanup of old processing logs and resolved incidents
- **Health Checks**: Monitor Celery workers and Redis for background processing health

## Future Enhancements

- **Machine Learning Models**: Integration with more sophisticated ML models
- **Real-time Processing**: Real-time incident analysis and correlation
- **Advanced NLP**: More sophisticated natural language processing
- **Predictive Analytics**: Predictive incident analysis and prevention
- **Integration APIs**: APIs for integrating with external incident management systems