# Incident Intelligence API Documentation

## Overview

The Incident Intelligence module provides AI-driven capabilities for incident management, including:

- **AI-driven incident classification** using NLP to categorize incidents from free text
- **Automated severity suggestion** based on impact analysis
- **Correlation engine** for linking related incidents and detecting underlying problems
- **Duplication detection** for merging incidents that describe the same outage

## Features

### 1. AI-Driven Incident Classification

Automatically classifies incidents into categories and subcategories based on their content:

- **Categories**: Infrastructure, Application, Security, User Experience, Data, Integration
- **Subcategories**: Specific types within each category (e.g., API_ISSUE, DATABASE_ISSUE)
- **Confidence Scoring**: Confidence score (0.0-1.0) for each classification
- **Keyword Extraction**: Identifies relevant keywords from incident text
- **Sentiment Analysis**: Analyzes the sentiment of incident descriptions
- **Urgency Detection**: Identifies urgency indicators in the text
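
The keyword-driven part of this flow can be sketched as follows. This is a minimal illustration, not the module's implementation: the `CATEGORY_KEYWORDS` lists and the `classify` function are hypothetical, and the confidence here is simply the winning category's share of all keyword hits.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the module's real keyword sets are not shown in this document.
CATEGORY_KEYWORDS = {
    "Infrastructure": {"server", "network", "disk", "cpu", "outage"},
    "Application": {"api", "database", "timeout", "exception", "crash"},
    "Security": {"breach", "unauthorized", "phishing", "malware"},
}


def classify(text):
    """Return (category, confidence) based on keyword hits in the text.

    Confidence is the share of all keyword hits that belong to the
    winning category; (None, 0.0) is returned when nothing matches.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits[category] = sum(1 for token in tokens if token in keywords)
    total = sum(hits.values())
    if total == 0:
        return None, 0.0
    category, count = hits.most_common(1)[0]
    return category, count / total
```

With these lists, free text such as "Database is down, can't connect, getting timeout errors" matches only the Application keywords.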

### 2. Automated Severity Suggestion

Suggests incident severity based on multiple factors:

- **User Impact Analysis**: Number of affected users and impact level
- **Business Impact Assessment**: Revenue and operational impact
- **Technical Impact Evaluation**: System and infrastructure impact
- **Text Analysis**: Severity indicators in incident descriptions
- **Confidence Scoring**: AI confidence in severity suggestions
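
One way to combine the three impact scores is a weighted sum mapped onto severity bands. The sketch below assumes each score is already normalized to 0.0-1.0; the weights, thresholds, and severity labels are hypothetical and not taken from the module.

```python
# Hypothetical weights and thresholds, chosen only for illustration.
WEIGHTS = {"user": 0.4, "business": 0.35, "technical": 0.25}
THRESHOLDS = [(0.8, "CRITICAL"), (0.6, "HIGH"), (0.4, "MEDIUM")]


def suggest_severity(user_impact, business_impact, technical_impact):
    """Return (severity_label, combined_score) from three 0.0-1.0 impact scores."""
    score = (
        WEIGHTS["user"] * user_impact
        + WEIGHTS["business"] * business_impact
        + WEIGHTS["technical"] * technical_impact
    )
    # Thresholds are ordered from highest to lowest; the first match wins.
    for threshold, label in THRESHOLDS:
        if score >= threshold:
            return label, score
    return "LOW", score
```

Keeping the weights in one dictionary makes it easy to tune how much each factor contributes without touching the banding logic.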

### 3. Correlation Engine

Links related incidents and detects patterns:

- **Correlation Types**: Same Service, Same Component, Temporal, Pattern Match, Dependency, Cascade
- **Problem Detection**: Identifies when correlations suggest larger problems
- **Time-based Analysis**: Considers temporal proximity of incidents
- **Service Similarity**: Analyzes shared services and components
- **Pattern Recognition**: Detects recurring issues and trends
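
Two of the signals listed above, shared keywords and temporal proximity, can be sketched with standard-library tools. The function names, the 24-hour window, and the linear decay are assumptions made for illustration; the engine's actual metrics are not documented here.

```python
from datetime import datetime, timedelta


def jaccard(keywords_a, keywords_b):
    """Jaccard similarity of two keyword collections (0.0 when both are empty)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def temporal_proximity(t1, t2, window=timedelta(hours=24)):
    """Score 1.0 for simultaneous incidents, decaying linearly to 0.0 at `window`."""
    gap = abs(t1 - t2)
    return max(0.0, 1.0 - gap / window)
```

A combined correlation score could then be a weighted average of these two values, with the weights treated as tunable parameters.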

### 4. Duplication Detection

Identifies and manages duplicate incidents:

- **Duplication Types**: Exact, Near Duplicate, Similar, Potential Duplicate
- **Similarity Analysis**: Text, temporal, and service similarity scoring
- **Merge Recommendations**: Suggests actions (Merge, Link, Review, No Action)
- **Confidence Scoring**: AI confidence in duplication detection
- **Shared Elements**: Identifies common elements between incidents
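
The step from a similarity score to a merge recommendation can be sketched as below. The text similarity uses `difflib.SequenceMatcher` from the standard library as a stand-in, and the thresholds are hypothetical; the detector's real algorithms and cut-offs are not specified in this document.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in 0.0-1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recommend_action(similarity):
    """Map a 0.0-1.0 similarity score to a recommended action.

    The thresholds are illustrative assumptions, not the module's values.
    """
    if similarity >= 0.95:
        return "Merge"
    if similarity >= 0.8:
        return "Link"
    if similarity >= 0.6:
        return "Review"
    return "No Action"
```

In practice the input to `recommend_action` would likely blend text, temporal, and service similarity instead of relying on text alone.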

## API Endpoints

### Incidents

#### Create Incident
```http
POST /api/incidents/incidents/
Content-Type: application/json

{
    "title": "Database Connection Timeout",
    "description": "Users are experiencing timeouts when trying to access the database",
    "free_text": "Database is down, can't connect, getting timeout errors",
    "affected_users": 150,
    "business_impact": "Critical business operations are affected",
    "reporter": 1
}
```
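
The same request can be built from Python with the standard library. This sketch only constructs the request object; `BASE_URL` is an assumed local development address, and no authentication header is added because the authentication scheme is not specified in this document.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local development server


def build_create_incident_request(payload, base_url=BASE_URL):
    """Build (but do not send) a POST request for the Create Incident endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/incidents/incidents/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, add whatever credentials your deployment requires and pass the request to `urllib.request.urlopen`.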

#### Get Incident Analysis
```http
GET /api/incidents/incidents/{id}/analysis/
```

Returns comprehensive AI analysis including:
- Classification results
- Severity suggestions
- Correlations with other incidents
- Potential duplicates
- Associated patterns

#### Trigger AI Analysis
```http
POST /api/incidents/incidents/{id}/analyze/
```

Manually triggers AI analysis for a specific incident.

#### Get Incident Statistics
```http
GET /api/incidents/incidents/stats/
```

Returns statistics including:
- Total incidents by status and severity
- Average resolution time
- AI processing statistics
- Duplicate and correlation counts

### Correlations

#### Get Correlations
```http
GET /api/incidents/correlations/
```

#### Get Problem Indicators
```http
GET /api/incidents/correlations/problem_indicators/
```

Returns only the correlations flagged as problem indicators (`is_problem_indicator`), that is, those suggesting a larger underlying problem.

### Duplications

#### Get Duplications
```http
GET /api/incidents/duplications/
```

#### Approve Merge
```http
POST /api/incidents/duplications/{id}/approve_merge/
```

#### Reject Merge
```http
POST /api/incidents/duplications/{id}/reject_merge/
```

### Patterns

#### Get Patterns
```http
GET /api/incidents/patterns/
```

#### Get Active Patterns
```http
GET /api/incidents/patterns/active_patterns/
```

#### Resolve Pattern
```http
POST /api/incidents/patterns/{id}/resolve_pattern/
```

## Data Models

### Incident
- **id**: UUID primary key
- **title**: Incident title
- **description**: Detailed description
- **free_text**: Original free text from user
- **category**: AI-classified category
- **subcategory**: AI-classified subcategory
- **severity**: Current severity level
- **suggested_severity**: AI-suggested severity
- **status**: Current status (Open, In Progress, Resolved, Closed)
- **assigned_to**: Assigned user
- **reporter**: User who reported the incident
- **affected_users**: Number of affected users
- **business_impact**: Business impact description
- **ai_processed**: Whether AI analysis has been completed
- **is_duplicate**: Whether this is a duplicate incident

### IncidentClassification
- **incident**: Related incident
- **predicted_category**: AI-predicted category
- **predicted_subcategory**: AI-predicted subcategory
- **confidence_score**: AI confidence (0.0-1.0)
- **alternative_categories**: Alternative predictions
- **extracted_keywords**: Keywords extracted from text
- **sentiment_score**: Sentiment analysis score (-1 to 1)
- **urgency_indicators**: Detected urgency indicators

### SeveritySuggestion
- **incident**: Related incident
- **suggested_severity**: AI-suggested severity
- **confidence_score**: AI confidence (0.0-1.0)
- **user_impact_score**: User impact score (0.0-1.0)
- **business_impact_score**: Business impact score (0.0-1.0)
- **technical_impact_score**: Technical impact score (0.0-1.0)
- **reasoning**: AI explanation for suggestion
- **impact_factors**: Factors that influenced the severity

### IncidentCorrelation
- **primary_incident**: Primary incident in correlation
- **related_incident**: Related incident
- **correlation_type**: Type of correlation
- **confidence_score**: Correlation confidence (0.0-1.0)
- **correlation_strength**: Strength of correlation
- **shared_keywords**: Keywords shared between incidents
- **time_difference**: Time difference between incidents
- **similarity_score**: Overall similarity score
- **is_problem_indicator**: Whether this suggests a larger problem

### DuplicationDetection
- **incident_a**: First incident in pair
- **incident_b**: Second incident in pair
- **duplication_type**: Type of duplication
- **similarity_score**: Overall similarity score
- **confidence_score**: Duplication confidence (0.0-1.0)
- **text_similarity**: Text similarity score
- **temporal_proximity**: Temporal proximity score
- **service_similarity**: Service similarity score
- **recommended_action**: Recommended action (Merge, Link, Review, No Action)
- **status**: Current status (Detected, Reviewed, Merged, Rejected)

### IncidentPattern
- **name**: Pattern name
- **pattern_type**: Type of pattern (Recurring, Seasonal, Trend, Anomaly)
- **description**: Pattern description
- **frequency**: How often the pattern occurs
- **affected_services**: Services affected by the pattern
- **common_keywords**: Common keywords in pattern incidents
- **incidents**: Related incidents
- **confidence_score**: Pattern confidence (0.0-1.0)
- **is_active**: Whether the pattern is active
- **is_resolved**: Whether the pattern is resolved

## AI Components

### IncidentClassifier
- **Categories**: Predefined categories with keywords
- **Keyword Extraction**: Extracts relevant keywords from text
- **Sentiment Analysis**: Analyzes sentiment of incident text
- **Urgency Detection**: Identifies urgency indicators
- **Confidence Scoring**: Provides confidence scores for classifications

### SeverityAnalyzer
- **Impact Analysis**: Analyzes user, business, and technical impact
- **Severity Indicators**: Identifies severity keywords in text
- **Weighted Scoring**: Combines multiple factors for severity suggestion
- **Reasoning Generation**: Provides explanations for severity suggestions

### IncidentCorrelationEngine
- **Similarity Analysis**: Calculates various similarity metrics
- **Temporal Analysis**: Considers time-based correlations
- **Service Analysis**: Analyzes service and component similarities
- **Problem Detection**: Identifies patterns that suggest larger problems
- **Cluster Detection**: Groups related incidents into clusters
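
Cluster detection can be illustrated as finding connected components over correlated incident pairs. The `cluster_incidents` function below is a hypothetical sketch of that idea, not the engine's actual method; it treats every correlation as an undirected link between two incident ids.

```python
from collections import defaultdict


def cluster_incidents(pairs):
    """Group incident ids into clusters (connected components of correlated pairs)."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collecting every incident reachable from `start`.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters
```

A large cluster is one plausible signal that several incidents share a root cause.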

### DuplicationDetector
- **Text Similarity**: Multiple text similarity algorithms
- **Temporal Proximity**: Time-based duplication detection
- **Service Similarity**: Service and component similarity
- **Metadata Similarity**: Similarity based on incident metadata
- **Merge Recommendations**: Suggests appropriate actions

## Background Processing

The module uses Celery for background processing of AI analysis:

### Tasks
- **process_incident_ai**: Processes a single incident with AI analysis
- **batch_process_incidents_ai**: Processes multiple incidents
- **find_correlations**: Finds correlations for an incident
- **find_duplicates**: Finds duplicates for an incident
- **detect_all_duplicates**: Batch duplicate detection
- **correlate_all_incidents**: Batch correlation analysis
- **merge_duplicate_incidents**: Merges duplicate incidents
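
When dispatching batch work, it can help to split incident ids into fixed-size chunks so that no single task runs too long. The helper below is a hypothetical sketch; the argument signature of `batch_process_incidents_ai` is not documented here, so the dispatch call is shown only as a comment.

```python
from itertools import islice


def chunked(ids, size):
    """Yield successive lists of at most `size` items from `ids`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Possible usage, assuming the batch task accepts a list of ids:
# for batch in chunked(incident_ids, 100):
#     batch_process_incidents_ai.delay(batch)
```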

### Processing Logs
All AI processing activities are logged in the `AIProcessingLog` model for audit and debugging purposes.

## Setup and Configuration

### 1. Install Dependencies
```bash
pip install -r requirements.txt
```

### 2. Run Migrations
```bash
python manage.py makemigrations incident_intelligence
python manage.py migrate
```

### 3. Create Sample Data
```bash
python manage.py setup_incident_intelligence --create-sample-data --create-patterns
```

### 4. Run AI Analysis
```bash
python manage.py setup_incident_intelligence --run-ai-analysis
```

### 5. Start Celery Worker
```bash
celery -A core worker -l info
```

## Usage Examples

### Creating an Incident with AI Analysis
```python
from incident_intelligence.models import Incident
from incident_intelligence.tasks import process_incident_ai

# Create incident
incident = Incident.objects.create(
    title="API Response Slow",
    description="The user service API is responding slowly",
    free_text="API is slow, taking forever to respond",
    affected_users=50,
    business_impact="User experience is degraded"
)

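# Queue AI analysis on a Celery worker; .delay() returns immediately,
# so results appear only after a running worker has processed the task
process_incident_ai.delay(incident.id)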
```

### Finding Correlations
```python
from incident_intelligence.ai.correlation import IncidentCorrelationEngine

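# incident_data and all_incidents are placeholders: build them from your
# Incident records in the form the engine expects before calling this
engine = IncidentCorrelationEngine()
correlations = engine.find_related_incidents(incident_data, all_incidents)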
```

### Detecting Duplicates
```python
from incident_intelligence.ai.duplication import DuplicationDetector

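# incident_data and all_incidents are placeholders: build them from your
# Incident records in the form the detector expects before calling this
detector = DuplicationDetector()
duplicates = detector.find_duplicate_candidates(incident_data, all_incidents)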
```

## Performance Considerations

- **Batch Processing**: Use batch operations for large datasets
- **Caching**: Consider caching frequently accessed data
- **Indexing**: Database indexes are defined to speed up common queries
- **Background Tasks**: AI processing runs asynchronously to avoid blocking requests
- **Rate Limiting**: Consider implementing rate limiting for API endpoints
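
As one illustration of the rate-limiting suggestion, a token bucket allows short bursts while capping the sustained request rate. The `TokenBucket` class below is a hypothetical, framework-independent sketch; a real deployment would more likely use the throttling support of its web framework or gateway.

```python
import time


class TokenBucket:
    """Allow up to `capacity` burst requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token and return True, or return False if none is available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

One bucket per authenticated user or API key keeps a single noisy client from starving the others.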

## Security Considerations

- **Authentication**: All endpoints require authentication
- **Authorization**: Users can only access incidents they have permission to view
- **Data Privacy**: Sensitive information is handled according to data classification levels
- **Audit Logging**: All AI processing activities are logged for audit purposes

## Monitoring and Maintenance

- **Processing Logs**: Monitor AI processing logs for errors and performance
- **Model Performance**: Track AI model accuracy and update as needed
- **Database Maintenance**: Regular cleanup of old processing logs and resolved incidents
- **Health Checks**: Monitor the Celery workers and Redis to confirm that background processing is healthy

## Future Enhancements

- **Machine Learning Models**: Integration with more sophisticated ML models
- **Real-time Processing**: Real-time incident analysis and correlation
- **Advanced NLP**: More sophisticated natural language processing
- **Predictive Analytics**: Predictive incident analysis and prevention
- **Integration APIs**: APIs for integrating with external incident management systems