Iliyan Angelov 6b247e5b9f Updates
2025-09-19 11:58:53 +03:00


Incident Intelligence API Documentation

Overview

The Incident Intelligence module provides AI-driven capabilities for incident management, including:

  • AI-driven incident classification using NLP to categorize incidents from free text
  • Automated severity suggestion based on impact analysis
  • Correlation engine for linking related incidents and problem detection
  • Duplication detection for merging incidents that describe the same outage

Features

1. AI-Driven Incident Classification

Automatically classifies incidents into categories and subcategories based on their content:

  • Categories: Infrastructure, Application, Security, User Experience, Data, Integration
  • Subcategories: Specific types within each category (e.g., API_ISSUE, DATABASE_ISSUE)
  • Confidence Scoring: AI confidence level for each classification
  • Keyword Extraction: Identifies relevant keywords from incident text
  • Sentiment Analysis: Analyzes the sentiment of incident descriptions
  • Urgency Detection: Identifies urgency indicators in the text
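The classification idea can be illustrated with a minimal keyword-matching sketch. The keyword sets, the `classify` function, and the confidence formula (the share of all keyword hits that belong to the winning category) are assumptions for illustration only. The module's own classifier is not shown in this document and may work differently.

```python
import re
from typing import Optional, Set, Tuple

# Hypothetical keyword sets for a few of the documented categories.
CATEGORY_KEYWORDS = {
    "INFRASTRUCTURE": {"server", "network", "disk", "cpu", "outage"},
    "APPLICATION": {"api", "exception", "crash", "bug", "deploy"},
    "SECURITY": {"breach", "unauthorized", "malware", "phishing", "vulnerability"},
    "DATA": {"database", "query", "corruption", "backup", "timeout"},
}


def classify(text: str) -> Tuple[Optional[str], float, Set[str]]:
    """Return (category, confidence, matched keywords) from simple keyword hits."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = {category: tokens & keywords for category, keywords in CATEGORY_KEYWORDS.items()}
    total = sum(len(matched) for matched in hits.values())
    if total == 0:
        return None, 0.0, set()
    # On ties, max() keeps the first category in dictionary order.
    best = max(hits, key=lambda category: len(hits[category]))
    # Confidence: fraction of all keyword hits that point to the winning category.
    return best, len(hits[best]) / total, hits[best]
```

With these assumed keyword sets, the free text from the create-incident example ("Database is down, can't connect, getting timeout errors") is classified as DATA with confidence 1.0, because only "database" and "timeout" match any list.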

2. Automated Severity Suggestion

Suggests incident severity based on multiple factors:

  • User Impact Analysis: Number of affected users and impact level
  • Business Impact Assessment: Revenue and operational impact
  • Technical Impact Evaluation: System and infrastructure impact
  • Text Analysis: Severity indicators in incident descriptions
  • Confidence Scoring: AI confidence in severity suggestions

3. Correlation Engine

Links related incidents and detects patterns:

  • Correlation Types: Same Service, Same Component, Temporal, Pattern Match, Dependency, Cascade
  • Problem Detection: Identifies when correlations suggest larger problems
  • Time-based Analysis: Considers temporal proximity of incidents
  • Service Similarity: Analyzes shared services and components
  • Pattern Recognition: Detects recurring issues and trends
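Temporal and service-based correlation can be sketched as a weighted score over time proximity, shared keywords, and service equality. Everything here is hypothetical: the dictionary keys, the one-hour half-life, the 0.4/0.4/0.2 weights, and the rule for choosing a correlation type. The returned keys only mirror some IncidentCorrelation fields described under Data Models.

```python
from datetime import datetime, timedelta
from typing import Optional


def temporal_proximity(a: datetime, b: datetime, half_life: timedelta = timedelta(hours=1)) -> float:
    """1.0 for simultaneous incidents, halving for every `half_life` of separation."""
    gap = abs((a - b).total_seconds())
    return 0.5 ** (gap / half_life.total_seconds())


def correlate(inc_a: dict, inc_b: dict) -> dict:
    """Score two incidents; expects hypothetical 'created_at', 'service', 'keywords' keys."""
    time_score = temporal_proximity(inc_a["created_at"], inc_b["created_at"])
    shared = inc_a["keywords"] & inc_b["keywords"]
    union = inc_a["keywords"] | inc_b["keywords"]
    keyword_score = len(shared) / len(union) if union else 0.0  # Jaccard index
    same_service = inc_a["service"] == inc_b["service"]
    # Hypothetical weighting of the three signals.
    score = 0.4 * time_score + 0.4 * keyword_score + (0.2 if same_service else 0.0)
    correlation_type: Optional[str] = None
    if same_service:
        correlation_type = "SAME_SERVICE"
    elif time_score >= 0.5:  # no more than one half-life apart
        correlation_type = "TEMPORAL"
    return {
        "correlation_type": correlation_type,
        "similarity_score": score,
        "shared_keywords": shared,
        "time_difference": abs(inc_a["created_at"] - inc_b["created_at"]),
    }
```

Exponential decay keeps the temporal score in (0.0, 1.0] without a hard cut-off; a fixed time window would be an equally reasonable choice.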

4. Duplication Detection

Identifies and manages duplicate incidents:

  • Duplication Types: Exact, Near Duplicate, Similar, Potential Duplicate
  • Similarity Analysis: Text, temporal, and service similarity scoring
  • Merge Recommendations: Suggests actions (Merge, Link, Review, No Action)
  • Confidence Scoring: AI confidence in duplication detection
  • Shared Elements: Identifies common elements between incidents
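A merge recommendation can be derived by weighting the three similarity scores and mapping the result onto the duplication types and actions listed above. The weights, the cut-offs, and the pairing of each type with an action are assumptions made for this sketch, not the detector's documented behavior.

```python
from typing import Optional, Tuple

# Hypothetical weights for (text, temporal, service) similarity and score cut-offs.
WEIGHTS = (0.6, 0.2, 0.2)
RULES = [
    (0.9, "EXACT", "MERGE"),
    (0.75, "NEAR_DUPLICATE", "MERGE"),
    (0.6, "SIMILAR", "LINK"),
    (0.4, "POTENTIAL_DUPLICATE", "REVIEW"),
]


def recommend_action(text_sim: float, temporal: float, service_sim: float) -> Tuple[float, Optional[str], str]:
    """Combine 0.0-1.0 similarity scores into (score, duplication_type, recommended_action)."""
    parts = (text_sim, temporal, service_sim)
    if any(not 0.0 <= p <= 1.0 for p in parts):
        raise ValueError("similarity scores must be in [0.0, 1.0]")
    score = sum(w * p for w, p in zip(WEIGHTS, parts))
    # Rules are ordered from strictest to loosest; the first match wins.
    for cutoff, dup_type, action in RULES:
        if score >= cutoff:
            return score, dup_type, action
    return score, None, "NO_ACTION"
```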

API Endpoints

Incidents

Create Incident

POST /api/incidents/incidents/
Content-Type: application/json

{
    "title": "Database Connection Timeout",
    "description": "Users are experiencing timeouts when trying to access the database",
    "free_text": "Database is down, can't connect, getting timeout errors",
    "affected_users": 150,
    "business_impact": "Critical business operations are affected",
    "reporter": 1
}
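For scripted clients, the same request can be built with Python's standard library. The base URL and the `Token` authorization header are assumptions: this document states that all endpoints require authentication, but not which scheme is used.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # hypothetical host; point this at your deployment


def build_create_incident_request(payload: dict, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST /api/incidents/incidents/ request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumed token-style auth header; use whatever scheme your deployment configures.
        headers["Authorization"] = f"Token {token}"
    return urllib.request.Request(
        f"{BASE_URL}/api/incidents/incidents/",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Send it with urllib.request.urlopen(req) and decode the response body with json.load.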

Get Incident Analysis

GET /api/incidents/incidents/{id}/analysis/

Returns comprehensive AI analysis including:

  • Classification results
  • Severity suggestions
  • Correlations with other incidents
  • Potential duplicates
  • Associated patterns

Trigger AI Analysis

POST /api/incidents/incidents/{id}/analyze/

Manually triggers AI analysis for a specific incident.

Get Incident Statistics

GET /api/incidents/incidents/stats/

Returns statistics including:

  • Total incidents by status and severity
  • Average resolution time
  • AI processing statistics
  • Duplicate and correlation counts

Correlations

Get Correlations

GET /api/incidents/correlations/

Get Problem Indicators

GET /api/incidents/correlations/problem_indicators/

Returns correlations that indicate larger problems.

Duplications

Get Duplications

GET /api/incidents/duplications/

Approve Merge

POST /api/incidents/duplications/{id}/approve_merge/

Reject Merge

POST /api/incidents/duplications/{id}/reject_merge/

Patterns

Get Patterns

GET /api/incidents/patterns/

Get Active Patterns

GET /api/incidents/patterns/active_patterns/

Resolve Pattern

POST /api/incidents/patterns/{id}/resolve_pattern/
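The paths above share one layout: /api/incidents/<resource>/ for collections, /api/incidents/<resource>/<action>/ for collection-level actions, and /api/incidents/<resource>/{id}/<action>/ for per-object actions. A small helper can build them and reject actions this document does not list. The `endpoint` function and its action tables are hypothetical, not part of the module.

```python
from typing import Optional

API_ROOT = "/api/incidents"

# Actions documented above: collection-level (no id) and per-object (id required).
LIST_ACTIONS = {
    "incidents": {"stats"},
    "correlations": {"problem_indicators"},
    "duplications": set(),
    "patterns": {"active_patterns"},
}
DETAIL_ACTIONS = {
    "incidents": {"analysis", "analyze"},
    "correlations": set(),
    "duplications": {"approve_merge", "reject_merge"},
    "patterns": {"resolve_pattern"},
}


def endpoint(resource: str, obj_id: Optional[str] = None, action: Optional[str] = None) -> str:
    """Return the path for a collection, a collection-level action, or a per-object action."""
    if resource not in LIST_ACTIONS:
        raise ValueError(f"unknown resource: {resource}")
    allowed = DETAIL_ACTIONS[resource] if obj_id else LIST_ACTIONS[resource]
    if action is not None and action not in allowed:
        raise ValueError(f"unknown action {action!r} for {resource}")
    parts = [API_ROOT, resource] + ([obj_id] if obj_id else []) + ([action] if action else [])
    return "/".join(parts) + "/"
```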

Data Models

Incident

  • id: UUID primary key
  • title: Incident title
  • description: Detailed description
  • free_text: Original free text from user
  • category: AI-classified category
  • subcategory: AI-classified subcategory
  • severity: Current severity level
  • suggested_severity: AI-suggested severity
  • status: Current status (Open, In Progress, Resolved, Closed)
  • assigned_to: Assigned user
  • reporter: User who reported the incident
  • affected_users: Number of affected users
  • business_impact: Business impact description
  • ai_processed: Whether AI analysis has been completed
  • is_duplicate: Whether this is a duplicate incident

IncidentClassification

  • incident: Related incident
  • predicted_category: AI-predicted category
  • predicted_subcategory: AI-predicted subcategory
  • confidence_score: AI confidence (0.0-1.0)
  • alternative_categories: Alternative predictions
  • extracted_keywords: Keywords extracted from text
  • sentiment_score: Sentiment analysis score (-1.0 to 1.0)
  • urgency_indicators: Detected urgency indicators

SeveritySuggestion

  • incident: Related incident
  • suggested_severity: AI-suggested severity
  • confidence_score: AI confidence (0.0-1.0)
  • user_impact_score: User impact score (0.0-1.0)
  • business_impact_score: Business impact score (0.0-1.0)
  • technical_impact_score: Technical impact score (0.0-1.0)
  • reasoning: AI explanation for suggestion
  • impact_factors: Factors that influenced the severity

IncidentCorrelation

  • primary_incident: Primary incident in correlation
  • related_incident: Related incident
  • correlation_type: Type of correlation
  • confidence_score: Correlation confidence (0.0-1.0)
  • correlation_strength: Strength of correlation
  • shared_keywords: Keywords shared between incidents
  • time_difference: Time difference between incidents
  • similarity_score: Overall similarity score
  • is_problem_indicator: Whether this suggests a larger problem

DuplicationDetection

  • incident_a: First incident in pair
  • incident_b: Second incident in pair
  • duplication_type: Type of duplication
  • similarity_score: Overall similarity score
  • confidence_score: Duplication confidence (0.0-1.0)
  • text_similarity: Text similarity score
  • temporal_proximity: Temporal proximity score
  • service_similarity: Service similarity score
  • recommended_action: Recommended action (Merge, Link, Review, No Action)
  • status: Current status (Detected, Reviewed, Merged, Rejected)

IncidentPattern

  • name: Pattern name
  • pattern_type: Type of pattern (Recurring, Seasonal, Trend, Anomaly)
  • description: Pattern description
  • frequency: How often the pattern occurs
  • affected_services: Services affected by the pattern
  • common_keywords: Common keywords in pattern incidents
  • incidents: Related incidents
  • confidence_score: Pattern confidence (0.0-1.0)
  • is_active: Whether the pattern is active
  • is_resolved: Whether the pattern is resolved

AI Components

IncidentClassifier

  • Categories: Predefined categories with keywords
  • Keyword Extraction: Extracts relevant keywords from text
  • Sentiment Analysis: Analyzes sentiment of incident text
  • Urgency Detection: Identifies urgency indicators
  • Confidence Scoring: Provides confidence scores for classifications
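Lexicon-based urgency detection and sentiment scoring can be sketched in a few lines. The word lists and the `analyze_text` helper are hypothetical; the output keys mirror the sentiment_score and urgency_indicators fields of IncidentClassification.

```python
import re

# Hypothetical lexicons; the classifier's real word lists are not documented here.
URGENCY_TERMS = {"urgent", "asap", "immediately", "critical", "down", "outage", "emergency"}
NEGATIVE_TERMS = {"down", "slow", "failed", "error", "errors", "timeout", "broken", "forever"}
POSITIVE_TERMS = {"resolved", "fixed", "working", "stable", "restored"}


def analyze_text(text: str) -> dict:
    """Lexicon-based urgency detection and a polarity score in [-1.0, 1.0]."""
    tokens = re.findall(r"[a-z]+", text.lower())
    urgency = sorted(set(tokens) & URGENCY_TERMS)
    positive = sum(token in POSITIVE_TERMS for token in tokens)
    negative = sum(token in NEGATIVE_TERMS for token in tokens)
    # (pos - neg) / (pos + neg): -1.0 if all negative, 1.0 if all positive, 0.0 if neither.
    sentiment = (positive - negative) / (positive + negative) if positive + negative else 0.0
    return {"sentiment_score": sentiment, "urgency_indicators": urgency}
```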

SeverityAnalyzer

  • Impact Analysis: Analyzes user, business, and technical impact
  • Severity Indicators: Identifies severity keywords in text
  • Weighted Scoring: Combines multiple factors for severity suggestion
  • Reasoning Generation: Provides explanations for severity suggestions
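Weighted scoring can be sketched as a linear combination of the three impact scores followed by threshold bands. The weights, cut-offs, and severity labels below are placeholders (this document does not list the severity levels), and the one-line reasoning string only hints at what the analyzer's explanations might contain.

```python
from typing import Tuple

# Hypothetical weights and cut-offs; the severity labels are placeholders.
WEIGHTS = {"user": 0.4, "business": 0.35, "technical": 0.25}
CUTOFFS = [(0.8, "CRITICAL"), (0.6, "HIGH"), (0.4, "MEDIUM"), (0.0, "LOW")]


def suggest_severity(user: float, business: float, technical: float) -> Tuple[str, float, str]:
    """Combine 0.0-1.0 impact scores into (severity, weighted score, reasoning)."""
    scores = {"user": user, "business": business, "technical": technical}
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} impact score must be in [0.0, 1.0], got {value}")
    contributions = {name: WEIGHTS[name] * value for name, value in scores.items()}
    combined = sum(contributions.values())
    # First band whose cut-off the combined score reaches; LOW catches everything else.
    severity = next(label for cutoff, label in CUTOFFS if combined >= cutoff)
    top = max(contributions, key=contributions.get)
    reasoning = f"Weighted impact {combined:.2f}; largest contribution from {top} impact."
    return severity, combined, reasoning
```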

IncidentCorrelationEngine

  • Similarity Analysis: Calculates various similarity metrics
  • Temporal Analysis: Considers time-based correlations
  • Service Analysis: Analyzes service and component similarities
  • Problem Detection: Identifies patterns that suggest larger problems
  • Cluster Detection: Groups related incidents into clusters
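One common way to implement cluster detection is to treat correlated incident pairs as edges of a graph and take its connected components. The sketch below does this with a union-find structure; it is an illustration, not the engine's confirmed algorithm. Its input could be, for example, (primary_incident, related_incident) ID pairs taken from correlations above a confidence threshold of your choosing.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def cluster_incidents(pairs: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Group incident IDs into clusters (connected components) from correlated pairs."""
    parent: Dict[Hashable, Hashable] = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters: Dict[Hashable, Set[Hashable]] = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    # Largest clusters first; ties broken by the string form of their members.
    return sorted(clusters.values(), key=lambda c: (-len(c), sorted(map(str, c))))
```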

DuplicationDetector

  • Text Similarity: Multiple text similarity algorithms
  • Temporal Proximity: Time-based duplication detection
  • Service Similarity: Service and component similarity
  • Metadata Similarity: Similarity based on incident metadata
  • Merge Recommendations: Suggests appropriate actions
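Two standard text similarity measures that a detector might combine are token-set Jaccard overlap and the character-level ratio from Python's difflib.SequenceMatcher. Which algorithms the DuplicationDetector actually uses is not documented here; the `text_similarity` helper and its simple averaging are assumptions.

```python
import re
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> dict:
    """Return token-overlap and character-sequence similarity scores in [0.0, 1.0]."""
    tokens_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    union = tokens_a | tokens_b
    # Jaccard index: shared words / all distinct words (ignores word order).
    jaccard = len(tokens_a & tokens_b) / len(union) if union else 0.0
    # SequenceMatcher.ratio() is 2*M/T over characters (order-sensitive, tolerant of typos).
    sequence = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return {"jaccard": jaccard, "sequence": sequence, "combined": (jaccard + sequence) / 2}
```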

Background Processing

The module uses Celery for background processing of AI analysis:

Tasks

  • process_incident_ai: Processes a single incident with AI analysis
  • batch_process_incidents_ai: Processes multiple incidents
  • find_correlations: Finds correlations for an incident
  • find_duplicates: Finds duplicates for an incident
  • detect_all_duplicates: Batch duplicate detection
  • correlate_all_incidents: Batch correlation analysis
  • merge_duplicate_incidents: Merges duplicate incidents

Processing Logs

All AI processing activities are logged in the AIProcessingLog model for audit and debugging purposes.

Setup and Configuration

1. Install Dependencies

pip install -r requirements.txt

2. Run Migrations

python manage.py makemigrations incident_intelligence
python manage.py migrate

3. Create Sample Data

python manage.py setup_incident_intelligence --create-sample-data --create-patterns

4. Run AI Analysis

python manage.py setup_incident_intelligence --run-ai-analysis

5. Start Celery Worker

celery -A core worker -l info

Usage Examples

Creating an Incident with AI Analysis

from django.db import transaction

from incident_intelligence.models import Incident
from incident_intelligence.tasks import process_incident_ai

# Create incident
incident = Incident.objects.create(
    title="API Response Slow",
    description="The user service API is responding slowly",
    free_text="API is slow, taking forever to respond",
    affected_users=50,
    business_impact="User experience is degraded"
)

# Trigger AI analysis in the background. Queuing the task on commit ensures the
# Celery worker can see the new row when this code runs inside a transaction.
transaction.on_commit(lambda: process_incident_ai.delay(incident.id))

Finding Correlations

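from incident_intelligence.ai.correlation import IncidentCorrelationEngine

# incident_data: the incident to analyse; all_incidents: the candidate incidents to
# compare it against. Both must be supplied in the format the engine expects.
engine = IncidentCorrelationEngine()
correlations = engine.find_related_incidents(incident_data, all_incidents)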

Detecting Duplicates

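from incident_intelligence.ai.duplication import DuplicationDetector

# incident_data: the incident to check; all_incidents: the existing incidents to
# compare it against. Both must be supplied in the format the detector expects.
detector = DuplicationDetector()
duplicates = detector.find_duplicate_candidates(incident_data, all_incidents)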

Performance Considerations

  • Batch Processing: Use batch operations for large datasets
  • Caching: Consider caching frequently accessed data
  • Indexing: Database indexes are configured for optimal query performance
  • Background Tasks: AI processing runs asynchronously to avoid blocking requests
  • Rate Limiting: Consider implementing rate limiting for API endpoints
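Batch processing usually means splitting a large set of incident IDs into fixed-size chunks and enqueuing one background task per chunk instead of one per incident. A minimal chunking helper follows; the `chunked` function is illustrative, and passing a list of IDs to batch_process_incidents_ai is an assumption, since that task's arguments are not documented here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical usage; the argument format of batch_process_incidents_ai is assumed:
# for batch in chunked(incident_ids, 100):
#     batch_process_incidents_ai.delay(batch)
```

On Python 3.12 and later, itertools.batched provides the same chunking but yields tuples instead of lists.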

Security Considerations

  • Authentication: All endpoints require authentication
  • Authorization: Users can only access incidents they have permission to view
  • Data Privacy: Sensitive information is handled according to data classification levels
  • Audit Logging: All AI processing activities are logged for audit purposes

Monitoring and Maintenance

  • Processing Logs: Monitor AI processing logs for errors and performance
  • Model Performance: Track AI model accuracy and update as needed
  • Database Maintenance: Regular cleanup of old processing logs and resolved incidents
  • Health Checks: Monitor Celery workers and Redis for background processing health

Future Enhancements

  • Machine Learning Models: Integration with more sophisticated ML models
  • Real-time Processing: Real-time incident analysis and correlation
  • Advanced NLP: More sophisticated natural language processing
  • Predictive Analytics: Predictive incident analysis and prevention
  • Integration APIs: APIs for integrating with external incident management systems