Iliyan Angelov 6b247e5b9f Updates
2025-09-19 11:58:53 +03:00


Knowledge & Learning API Documentation

Overview

The Knowledge & Learning module supports automated postmortem generation, knowledge base management, and incident recommendations. It helps organizations learn from incidents and build institutional knowledge that prevents recurring issues.

Features

  • Automated Postmortems: Generate postmortems automatically from incident data
  • Knowledge Base: Manage and search knowledge articles, runbooks, and troubleshooting guides
  • Recommendation Engine: Suggest similar incidents, solutions, and experts
  • Learning Patterns: Identify and track recurring patterns in incident data
  • Usage Analytics: Track how knowledge is used and how effective it is

API Endpoints

Postmortems

List Postmortems

GET /api/knowledge/postmortems/

Query Parameters:

  • status: Filter by status (DRAFT, IN_REVIEW, APPROVED, PUBLISHED, ARCHIVED)
  • severity: Filter by severity (LOW, MEDIUM, HIGH, CRITICAL)
  • is_automated: Filter by automation status (true/false)
  • owner: Filter by owner username
  • search: Search in title, executive_summary, root_cause_analysis
  • ordering: Order by created_at, updated_at, due_date, severity
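The filters above map directly onto a query string. The following is a minimal sketch, assuming a placeholder host (BASE_URL, borrowed from the sample "next" link) and a hypothetical helper name, postmortem_list_url. It checks the enumerated status and severity values, sends the boolean filter as the literal true/false, and leaves ordering unvalidated.

```python
from urllib.parse import urlencode

# Placeholder host, borrowed from the sample "next" link; not a real deployment.
BASE_URL = "http://api.example.com"

STATUSES = {"DRAFT", "IN_REVIEW", "APPROVED", "PUBLISHED", "ARCHIVED"}
SEVERITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}


def postmortem_list_url(status=None, severity=None, is_automated=None,
                        owner=None, search=None, ordering=None):
    """Build a GET URL for /api/knowledge/postmortems/ (hypothetical helper)."""
    if status is not None and status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    if severity is not None and severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    params = {
        "status": status,
        "severity": severity,
        # Boolean filters are documented as the literal strings true/false.
        "is_automated": None if is_automated is None else str(is_automated).lower(),
        "owner": owner,
        "search": search,
        "ordering": ordering,
    }
    # Drop filters the caller did not set; dict order keeps the query stable.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/api/knowledge/postmortems/"
    return f"{url}?{query}" if query else url
```

For example, postmortem_list_url(status="PUBLISHED", severity="HIGH") yields a URL that filters to published high-severity postmortems.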

Response:

{
  "count": 25,
  "next": "http://api.example.com/api/knowledge/postmortems/?page=2",
  "previous": null,
  "results": [
    {
      "id": "uuid",
      "title": "Postmortem: Database Outage",
      "incident": "uuid",
      "incident_title": "Database Connection Timeout",
      "status": "PUBLISHED",
      "severity": "HIGH",
      "owner_username": "john.doe",
      "completion_percentage": 95.0,
      "is_overdue": false,
      "created_at": "2024-01-15T10:30:00Z",
      "due_date": "2024-01-22T10:30:00Z"
    }
  ]
}
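List responses are paginated through count/next/previous. A client that needs every result can follow next until it is null. The sketch below is one way to do this, assuming Token authentication as described under Authentication. The helper names iter_results and make_fetch_json are hypothetical. The fetch function is injected, so the pager can be tested without a network.

```python
import json
import urllib.request


def iter_results(first_url, fetch_json):
    """Yield every item of a paginated list by following "next" links.

    fetch_json(url) must return the decoded JSON body for that URL.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("results", [])
        url = page.get("next")  # null (None) on the last page


def make_fetch_json(token):
    """Return a fetch_json that sends the documented token header."""
    def fetch_json(url):
        req = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_json
```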

Get Postmortem Details

GET /api/knowledge/postmortems/{id}/

Response:

{
  "id": "uuid",
  "title": "Postmortem: Database Outage",
  "incident": "uuid",
  "incident_title": "Database Connection Timeout",
  "executive_summary": "On January 15, 2024, a high severity incident occurred...",
  "timeline": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "event": "Incident reported",
      "description": "Database connection timeout detected",
      "actor": "monitoring.system"
    }
  ],
  "root_cause_analysis": "The root cause was identified as...",
  "impact_assessment": "The incident affected 500 users...",
  "lessons_learned": "Key lessons learned include...",
  "action_items": [
    {
      "title": "Update database connection pool settings",
      "description": "Increase connection pool size to handle peak load",
      "priority": "HIGH",
      "assignee": "database.team",
      "due_date": "2024-01-29T00:00:00Z",
      "category": "Technical Improvement"
    }
  ],
  "is_automated": true,
  "generation_confidence": 0.85,
  "auto_generated_sections": ["executive_summary", "timeline", "root_cause_analysis"],
  "status": "PUBLISHED",
  "severity": "HIGH",
  "owner": "uuid",
  "owner_username": "john.doe",
  "reviewers": ["uuid1", "uuid2"],
  "reviewer_usernames": ["jane.smith", "bob.wilson"],
  "approver": "uuid",
  "approver_username": "alice.johnson",
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-16T14:20:00Z",
  "published_at": "2024-01-16T14:20:00Z",
  "due_date": "2024-01-22T10:30:00Z",
  "related_incidents": ["uuid1", "uuid2"],
  "affected_services": ["database", "api"],
  "affected_teams": ["database.team", "platform.team"],
  "completion_percentage": 95.0,
  "is_overdue": false
}

Create Postmortem

POST /api/knowledge/postmortems/

Request Body:

{
  "title": "Postmortem: New Incident",
  "incident": "uuid",
  "executive_summary": "Executive summary...",
  "severity": "HIGH",
  "owner": "uuid",
  "due_date": "2024-01-29T00:00:00Z"
}
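The following is a minimal sketch that builds this request with the standard library without sending it. BASE_URL is a placeholder and create_postmortem_request is a hypothetical helper name. The 200-character title limit and the severity values come from the Postmortem data model below.

```python
import json
import urllib.request

# Placeholder host; replace with your deployment's base URL.
BASE_URL = "http://api.example.com"


def create_postmortem_request(token, title, incident_id, severity, owner_id,
                              due_date, executive_summary=""):
    """Build (but do not send) the POST request for a new postmortem."""
    if len(title) > 200:  # title is a 200-character string in the data model
        raise ValueError("title must be at most 200 characters")
    if severity not in {"LOW", "MEDIUM", "HIGH", "CRITICAL"}:
        raise ValueError(f"unknown severity: {severity}")
    body = {
        "title": title,
        "incident": incident_id,
        "executive_summary": executive_summary,
        "severity": severity,
        "owner": owner_id,
        "due_date": due_date,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/knowledge/postmortems/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Token {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen. The shape of the create response is not documented here; it is plausibly the same as the detail response, but treat that as an assumption.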

Generate Automated Postmortem

POST /api/knowledge/postmortems/{id}/generate_automated/

Response:

{
  "message": "Postmortem generated successfully",
  "confidence_score": 0.85
}

Approve Postmortem

POST /api/knowledge/postmortems/{id}/approve/

Publish Postmortem

POST /api/knowledge/postmortems/{id}/publish/
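The generate_automated, approve, and publish actions are all POSTs to /postmortems/{id}/{action}/ with no documented request body. The sketch below builds them, assuming a placeholder BASE_URL and a hypothetical helper, postmortem_action_request. It also assumes that approval precedes publication, as the status order (APPROVED before PUBLISHED) suggests. Whether the server enforces that order is not documented.

```python
import urllib.request

# Placeholder host; replace with your deployment's base URL.
BASE_URL = "http://api.example.com"

# Postmortem actions listed above that are invoked with a bodiless POST.
POSTMORTEM_ACTIONS = {"generate_automated", "approve", "publish"}


def postmortem_action_request(token, postmortem_id, action):
    """Build (but do not send) POST /api/knowledge/postmortems/{id}/{action}/."""
    if action not in POSTMORTEM_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return urllib.request.Request(
        f"{BASE_URL}/api/knowledge/postmortems/{postmortem_id}/{action}/",
        data=b"",  # no request body is documented for these actions
        headers={"Authorization": f"Token {token}"},
        method="POST",
    )


# Approve first, then publish, mirroring the APPROVED -> PUBLISHED status order.
review_steps = [postmortem_action_request("my-token", "1234", step)
                for step in ("approve", "publish")]
```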

Get Overdue Postmortems

GET /api/knowledge/postmortems/overdue/

Get Postmortem Statistics

GET /api/knowledge/postmortems/statistics/

Response:

{
  "total_postmortems": 150,
  "by_status": {
    "DRAFT": 25,
    "IN_REVIEW": 15,
    "APPROVED": 20,
    "PUBLISHED": 85,
    "ARCHIVED": 5
  },
  "by_severity": {
    "LOW": 10,
    "MEDIUM": 45,
    "HIGH": 70,
    "CRITICAL": 25
  },
  "automated_percentage": 75.5,
  "overdue_count": 8,
  "avg_completion_time": "5 days, 12:30:00"
}
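The sample avg_completion_time value ("5 days, 12:30:00") matches the text form of Python's datetime.timedelta. That is an assumption based on the sample value, not a documented contract. The sketch below parses the value back into a timedelta and derives a few ratios from the statistics payload. The function names are hypothetical.

```python
import re
from datetime import timedelta

# Matches str(timedelta) output such as "5 days, 12:30:00" or "0:45:10.500000".
_DURATION = re.compile(
    r"^(?:(?P<days>-?\d+) days?, )?(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})"
    r"(?:\.(?P<us>\d{1,6}))?$"
)


def parse_duration(text):
    """Convert a timedelta-style string back into a timedelta."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    micro = int((match["us"] or "0").ljust(6, "0"))  # pad fractional part to microseconds
    return timedelta(days=int(match["days"] or 0), hours=int(match["h"]),
                     minutes=int(match["m"]), seconds=int(match["s"]),
                     microseconds=micro)


def summarize_postmortem_stats(stats):
    """Derive shares of published and overdue postmortems from /statistics/."""
    total = stats["total_postmortems"]
    published = stats.get("by_status", {}).get("PUBLISHED", 0)
    overdue = stats.get("overdue_count", 0)
    return {
        "published_share": published / total if total else 0.0,
        "overdue_share": overdue / total if total else 0.0,
        "avg_completion": parse_duration(stats["avg_completion_time"]),
    }
```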

Knowledge Base Articles

List Knowledge Base Articles

GET /api/knowledge/knowledge-articles/

Query Parameters:

  • article_type: Filter by type (RUNBOOK, TROUBLESHOOTING, BEST_PRACTICE, etc.)
  • category: Filter by category
  • subcategory: Filter by subcategory
  • status: Filter by status (DRAFT, REVIEW, APPROVED, PUBLISHED, DEPRECATED)
  • is_featured: Filter featured articles (true/false)
  • difficulty_level: Filter by difficulty (BEGINNER, INTERMEDIATE, ADVANCED, EXPERT)
  • search: Search in title, content, summary, tags, search_keywords
  • ordering: Order by created_at, updated_at, view_count, title

Get Knowledge Base Article

GET /api/knowledge/knowledge-articles/{slug}/

Note: This endpoint automatically increments the view count.

Create Knowledge Base Article

POST /api/knowledge/knowledge-articles/

Request Body:

{
  "title": "Database Troubleshooting Guide",
  "content": "This guide covers common database issues...",
  "summary": "Comprehensive guide for database troubleshooting",
  "article_type": "TROUBLESHOOTING",
  "category": "Database",
  "subcategory": "Performance",
  "tags": ["database", "troubleshooting", "performance"],
  "difficulty_level": "INTERMEDIATE",
  "related_services": ["database", "api"],
  "related_components": ["postgresql", "connection-pool"]
}
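A client can reject obviously invalid article bodies before sending them. The following is a minimal sketch using the limits stated in the KnowledgeBaseArticle data model below: a 200-character title, 100-character category and subcategory, and a fixed set of difficulty levels. article_payload is a hypothetical helper name. article_type is deliberately not checked, because its documented value list is incomplete.

```python
import json

DIFFICULTY_LEVELS = {"BEGINNER", "INTERMEDIATE", "ADVANCED", "EXPERT"}


def article_payload(title, content, summary, article_type, category,
                    difficulty_level, subcategory="", tags=(),
                    related_services=(), related_components=()):
    """Assemble a create-article body, checking the data-model limits."""
    if len(title) > 200:
        raise ValueError("title must be at most 200 characters")
    if len(category) > 100 or len(subcategory) > 100:
        raise ValueError("category and subcategory must be at most 100 characters")
    if difficulty_level not in DIFFICULTY_LEVELS:
        raise ValueError(f"unknown difficulty level: {difficulty_level}")
    # article_type is not validated: the documented list of types ends in "etc.".
    return {
        "title": title,
        "content": content,
        "summary": summary,
        "article_type": article_type,
        "category": category,
        "subcategory": subcategory,
        "tags": list(tags),
        "difficulty_level": difficulty_level,
        "related_services": list(related_services),
        "related_components": list(related_components),
    }


example = article_payload(
    "Database Troubleshooting Guide",
    "This guide covers common database issues...",
    "Comprehensive guide for database troubleshooting",
    "TROUBLESHOOTING", "Database", "INTERMEDIATE",
    subcategory="Performance", tags=("database", "troubleshooting", "performance"),
)
body_bytes = json.dumps(example).encode("utf-8")  # ready for a POST request body
```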

Rate Knowledge Base Article

POST /api/knowledge/knowledge-articles/{slug}/rate/

Request Body:

{
  "rating": 4,
  "feedback": "Very helpful guide, saved me hours of debugging"
}

Bookmark Knowledge Base Article

POST /api/knowledge/knowledge-articles/{slug}/bookmark/

Search Knowledge Base

POST /api/knowledge/knowledge-articles/search/

Request Body:

{
  "query": "database connection timeout",
  "article_types": ["RUNBOOK", "TROUBLESHOOTING"],
  "categories": ["Database"],
  "difficulty_levels": ["INTERMEDIATE", "ADVANCED"],
  "limit": 20,
  "offset": 0
}

Response:

{
  "results": [
    {
      "id": "uuid",
      "title": "Database Connection Troubleshooting",
      "slug": "database-connection-troubleshooting",
      "summary": "Guide for resolving database connection issues",
      "article_type": "TROUBLESHOOTING",
      "category": "Database",
      "similarity_score": 0.85,
      "relevance_score": 0.92,
      "popularity_score": 0.75,
      "matching_keywords": ["database", "connection", "timeout"]
    }
  ],
  "total_count": 15,
  "query": "database connection timeout",
  "filters": {
    "article_types": ["RUNBOOK", "TROUBLESHOOTING"],
    "categories": ["Database"],
    "difficulty_levels": ["INTERMEDIATE", "ADVANCED"]
  },
  "pagination": {
    "limit": 20,
    "offset": 0,
    "has_more": false
  }
}
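Search uses limit/offset paging, and pagination.has_more signals whether another page exists. The sketch below collects all matches, assuming an injected post_json(path, body) function that returns the decoded response. search_all is a hypothetical helper name. The offset advances by the number of results actually returned, so a short page does not skip items.

```python
def search_all(query, post_json, page_size=20, **filters):
    """Yield every search hit by stepping offset until has_more is false.

    Extra keyword arguments (article_types, categories, difficulty_levels)
    are passed through as documented filter fields.
    """
    offset = 0
    while True:
        body = {"query": query, "limit": page_size, "offset": offset, **filters}
        page = post_json("/api/knowledge/knowledge-articles/search/", body)
        results = page.get("results", [])
        yield from results
        # Stop when the server reports no more pages or returns nothing.
        if not results or not page.get("pagination", {}).get("has_more"):
            return
        offset += len(results)
```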

Get Articles Due for Review

GET /api/knowledge/knowledge-articles/due_for_review/

Get Popular Articles

GET /api/knowledge/knowledge-articles/popular/

Get Knowledge Base Statistics

GET /api/knowledge/knowledge-articles/statistics/

Incident Recommendations

List Recommendations

GET /api/knowledge/recommendations/

Query Parameters:

  • recommendation_type: Filter by type (SIMILAR_INCIDENT, SOLUTION, KNOWLEDGE_ARTICLE, etc.)
  • confidence_level: Filter by confidence (LOW, MEDIUM, HIGH, VERY_HIGH)
  • is_applied: Filter by application status (true/false)
  • incident: Filter by incident ID
  • search: Search in title, description, reasoning
  • ordering: Order by created_at, confidence_score, similarity_score

Get Recommendation Details

GET /api/knowledge/recommendations/{id}/

Apply Recommendation

POST /api/knowledge/recommendations/{id}/apply/

Rate Recommendation Effectiveness

POST /api/knowledge/recommendations/{id}/rate_effectiveness/

Request Body:

{
  "rating": 4
}
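The IncidentRecommendation model defines effectiveness_rating as a positive integer from 1 to 5, so a client can reject other values before sending them. The following is a minimal sketch with a placeholder BASE_URL and a hypothetical helper name.

```python
import json
import urllib.request

# Placeholder host; replace with your deployment's base URL.
BASE_URL = "http://api.example.com"


def rate_effectiveness_request(token, recommendation_id, rating):
    """Build (but do not send) POST .../recommendations/{id}/rate_effectiveness/."""
    # effectiveness_rating is documented as a positive integer from 1 to 5;
    # bool is excluded explicitly because it is a subclass of int.
    if isinstance(rating, bool) or not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError("rating must be an integer from 1 to 5")
    return urllib.request.Request(
        f"{BASE_URL}/api/knowledge/recommendations/{recommendation_id}/rate_effectiveness/",
        data=json.dumps({"rating": rating}).encode("utf-8"),
        headers={"Authorization": f"Token {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```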

Generate Recommendations for Incident

POST /api/knowledge/recommendations/generate_for_incident/

Request Body:

{
  "incident_id": "uuid",
  "recommendation_types": ["SIMILAR_INCIDENT", "KNOWLEDGE_ARTICLE", "SOLUTION"],
  "max_recommendations": 5,
  "min_confidence": 0.6
}

Response:

{
  "message": "Recommendations generated successfully",
  "recommendations": [
    {
      "id": "uuid",
      "title": "Similar Incident: Database Timeout Issue",
      "type": "SIMILAR_INCIDENT",
      "confidence_score": 0.85,
      "similarity_score": 0.78
    }
  ]
}
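A client typically builds the generation request and then re-filters the returned list for display. The sketch below does both. Its defaults copy the sample request above; the server's own defaults are not documented. The helper names are hypothetical. Note that the response items use the key "type", not "recommendation_type".

```python
def generate_recommendations_body(incident_id,
                                  recommendation_types=("SIMILAR_INCIDENT",
                                                        "KNOWLEDGE_ARTICLE",
                                                        "SOLUTION"),
                                  max_recommendations=5, min_confidence=0.6):
    """Build the body for POST /recommendations/generate_for_incident/."""
    if not 0.0 <= min_confidence <= 1.0:  # confidence scores are 0.0-1.0
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    if max_recommendations < 1:
        raise ValueError("max_recommendations must be at least 1")
    return {
        "incident_id": incident_id,
        "recommendation_types": list(recommendation_types),
        "max_recommendations": max_recommendations,
        "min_confidence": min_confidence,
    }


def top_recommendations(response, min_confidence=0.6, types=None):
    """Keep confident recommendations, best first (confidence, then similarity)."""
    kept = [
        r for r in response.get("recommendations", [])
        if r.get("confidence_score", 0.0) >= min_confidence
        and (types is None or r.get("type") in types)
    ]
    return sorted(kept,
                  key=lambda r: (r["confidence_score"], r.get("similarity_score", 0.0)),
                  reverse=True)
```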

Get Recommendation Statistics

GET /api/knowledge/recommendations/statistics/

Learning Patterns

List Learning Patterns

GET /api/knowledge/learning-patterns/

Query Parameters:

  • pattern_type: Filter by type (ROOT_CAUSE, RESOLUTION, PREVENTION, etc.)
  • is_validated: Filter by validation status (true/false)
  • search: Search in name, description, triggers, actions
  • ordering: Order by created_at, confidence_score, frequency, success_rate

Get Learning Pattern Details

GET /api/knowledge/learning-patterns/{id}/

Validate Learning Pattern

POST /api/knowledge/learning-patterns/{id}/validate/

Request Body:

{
  "validation_notes": "This pattern has been validated by the expert team"
}

Apply Learning Pattern

POST /api/knowledge/learning-patterns/{id}/apply/

Get Learning Pattern Statistics

GET /api/knowledge/learning-patterns/statistics/

Automated Postmortem Generation

List Generation Logs

GET /api/knowledge/postmortem-generations/

Query Parameters:

  • status: Filter by status (PENDING, PROCESSING, COMPLETED, FAILED, REVIEW_REQUIRED)
  • incident: Filter by incident ID
  • generation_trigger: Filter by trigger type
  • ordering: Order by started_at, completed_at, processing_time

Get Generation Log Details

GET /api/knowledge/postmortem-generations/{id}/

Generate Postmortem for Incident

POST /api/knowledge/postmortem-generations/generate_postmortem/

Request Body:

{
  "incident_id": "uuid",
  "include_timeline": true,
  "include_logs": true,
  "generation_trigger": "manual"
}

Response:

{
  "message": "Postmortem generation initiated",
  "generation_id": "uuid"
}
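Generation is asynchronous: the response returns a generation_id rather than the finished postmortem. One way to wait for it is to poll the generation log. The sketch below makes several assumptions. The log detail carries a status field with the values listed under List Generation Logs, and COMPLETED, FAILED, and REVIEW_REQUIRED are treated as final from the client's point of view. wait_for_generation is a hypothetical name, and fetch_json, sleep, and clock are injected so the function can be tested offline.

```python
import time

# Statuses after which polling stops (assumption: REVIEW_REQUIRED needs a human).
FINAL_STATUSES = {"COMPLETED", "FAILED", "REVIEW_REQUIRED"}


def wait_for_generation(generation_id, fetch_json, interval=5.0, timeout=300.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll the generation log until it reaches a final status or times out."""
    path = f"/api/knowledge/postmortem-generations/{generation_id}/"
    deadline = clock() + timeout
    while True:
        log = fetch_json(path)
        if log.get("status") in FINAL_STATUSES:
            return log
        if clock() >= deadline:
            raise TimeoutError(f"generation {generation_id} still {log.get('status')}")
        sleep(interval)
```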

Data Models

Postmortem

  • id: UUID (Primary Key)
  • title: String (200 chars)
  • incident: Foreign Key to Incident
  • executive_summary: Text
  • timeline: JSON Array
  • root_cause_analysis: Text
  • impact_assessment: Text
  • lessons_learned: Text
  • action_items: JSON Array
  • is_automated: Boolean
  • generation_confidence: Float (0.0-1.0)
  • auto_generated_sections: JSON Array
  • status: Choice (DRAFT, IN_REVIEW, APPROVED, PUBLISHED, ARCHIVED)
  • severity: Choice (LOW, MEDIUM, HIGH, CRITICAL)
  • owner: Foreign Key to User
  • reviewers: Many-to-Many to User
  • approver: Foreign Key to User
  • created_at: DateTime
  • updated_at: DateTime
  • published_at: DateTime
  • due_date: DateTime
  • related_incidents: Many-to-Many to Incident
  • affected_services: JSON Array
  • affected_teams: JSON Array
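On the client side, the choice fields above map naturally onto enums. The sketch below is a typed view of a postmortem list item (the fields shown in the List Postmortems response). PostmortemSummary and the enum names are hypothetical client types, not part of the API. Timestamps carry a trailing "Z", which datetime.fromisoformat only accepts from Python 3.11, so the sketch rewrites it to "+00:00".

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class PostmortemStatus(str, Enum):
    DRAFT = "DRAFT"
    IN_REVIEW = "IN_REVIEW"
    APPROVED = "APPROVED"
    PUBLISHED = "PUBLISHED"
    ARCHIVED = "ARCHIVED"


class Severity(str, Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


def parse_timestamp(value):
    """Parse an ISO 8601 UTC timestamp such as 2024-01-15T10:30:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class PostmortemSummary:
    id: str
    title: str
    status: PostmortemStatus
    severity: Severity
    created_at: datetime
    due_date: Optional[datetime]
    completion_percentage: float
    is_overdue: bool

    @classmethod
    def from_json(cls, data):
        # Enum construction raises ValueError on an unexpected choice value.
        return cls(
            id=data["id"],
            title=data["title"],
            status=PostmortemStatus(data["status"]),
            severity=Severity(data["severity"]),
            created_at=parse_timestamp(data["created_at"]),
            due_date=parse_timestamp(data["due_date"]) if data.get("due_date") else None,
            completion_percentage=float(data.get("completion_percentage", 0.0)),
            is_overdue=bool(data.get("is_overdue", False)),
        )
```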

KnowledgeBaseArticle

  • id: UUID (Primary Key)
  • title: String (200 chars)
  • slug: Slug (Unique)
  • content: Text
  • summary: Text
  • tags: JSON Array
  • article_type: Choice (RUNBOOK, TROUBLESHOOTING, BEST_PRACTICE, etc.)
  • category: String (100 chars)
  • subcategory: String (100 chars)
  • related_services: JSON Array
  • related_components: JSON Array
  • status: Choice (DRAFT, REVIEW, APPROVED, PUBLISHED, DEPRECATED)
  • is_featured: Boolean
  • view_count: Positive Integer
  • author: Foreign Key to User
  • last_updated_by: Foreign Key to User
  • maintainer: Foreign Key to User
  • created_at: DateTime
  • updated_at: DateTime
  • last_reviewed: DateTime
  • next_review_due: DateTime
  • related_incidents: Many-to-Many to Incident
  • source_postmortems: Many-to-Many to Postmortem
  • confluence_url: URL
  • wiki_url: URL
  • external_references: JSON Array
  • search_keywords: JSON Array
  • difficulty_level: Choice (BEGINNER, INTERMEDIATE, ADVANCED, EXPERT)

IncidentRecommendation

  • id: UUID (Primary Key)
  • incident: Foreign Key to Incident
  • recommendation_type: Choice (SIMILAR_INCIDENT, SOLUTION, KNOWLEDGE_ARTICLE, etc.)
  • title: String (200 chars)
  • description: Text
  • similarity_score: Float (0.0-1.0)
  • confidence_level: Choice (LOW, MEDIUM, HIGH, VERY_HIGH)
  • confidence_score: Float (0.0-1.0)
  • related_incident: Foreign Key to Incident
  • knowledge_article: Foreign Key to KnowledgeBaseArticle
  • suggested_expert: Foreign Key to User
  • suggested_actions: JSON Array
  • expected_outcome: Text
  • time_to_implement: Duration
  • is_applied: Boolean
  • applied_at: DateTime
  • applied_by: Foreign Key to User
  • effectiveness_rating: Positive Integer (1-5)
  • reasoning: Text
  • matching_factors: JSON Array
  • model_version: String (50 chars)
  • created_at: DateTime
  • updated_at: DateTime

LearningPattern

  • id: UUID (Primary Key)
  • name: String (200 chars)
  • pattern_type: Choice (ROOT_CAUSE, RESOLUTION, PREVENTION, etc.)
  • description: Text
  • frequency: Positive Integer
  • success_rate: Float (0.0-1.0)
  • confidence_score: Float (0.0-1.0)
  • triggers: JSON Array
  • actions: JSON Array
  • outcomes: JSON Array
  • source_incidents: Many-to-Many to Incident
  • source_postmortems: Many-to-Many to Postmortem
  • is_validated: Boolean
  • validated_by: Foreign Key to User
  • validation_notes: Text
  • times_applied: Positive Integer
  • last_applied: DateTime
  • created_at: DateTime
  • updated_at: DateTime

Management Commands

Generate Postmortems

python manage.py generate_postmortems --days 7 --severity HIGH --force

Options:

  • --days: Number of days back to look for resolved incidents (default: 7)
  • --severity: Only generate for specific severity levels
  • --force: Force generation even if postmortem exists
  • --dry-run: Show what would be generated without creating

Generate Recommendations

python manage.py generate_recommendations --days 1 --status OPEN --max-recommendations 5

Options:

  • --days: Number of days back to look for incidents (default: 1)
  • --status: Only generate for specific incident status (default: OPEN)
  • --severity: Only generate for specific severity levels
  • --force: Force generation even if recommendations exist
  • --max-recommendations: Maximum recommendations per incident (default: 5)
  • --dry-run: Show what would be generated without creating

Update Learning Patterns

python manage.py update_learning_patterns --days 30 --min-frequency 3

Options:

  • --days: Number of days back to analyze (default: 30)
  • --min-frequency: Minimum frequency to create pattern (default: 3)
  • --dry-run: Show what patterns would be created/updated

Integration Points

Incident Intelligence Module

  • Incident Model: Primary relationship for postmortems and recommendations
  • Incident Resolution: Triggers automatic postmortem generation
  • Incident Classification: Used for similarity matching in recommendations

Analytics & Predictive Insights Module

  • KPI Calculations: Postmortem completion rates, knowledge base usage
  • Pattern Detection: Integration with learning patterns for trend analysis
  • Predictive Models: Use learning patterns for incident prediction

Automation & Orchestration Module

  • Runbook Integration: Knowledge base articles can be linked to runbooks
  • Automated Actions: Postmortem action items can trigger automation workflows

Security Module

  • Access Control: Knowledge base articles respect data classification levels
  • Audit Logging: All knowledge base usage is tracked for compliance

Best Practices

Postmortem Management

  1. Automated Generation: Enable automatic postmortem generation for high-severity incidents
  2. Review Process: Implement a structured review and approval workflow
  3. Action Item Tracking: Ensure action items are assigned and tracked to completion
  4. Timeline Accuracy: Verify and enhance auto-generated timelines with human input

Knowledge Base Management

  1. Content Quality: Regularly review and update knowledge base articles
  2. Search Optimization: Use relevant tags and keywords for better discoverability
  3. User Feedback: Collect and act on user ratings and feedback
  4. Review Schedule: Set up regular review cycles for knowledge base articles

Recommendation Engine

  1. Confidence Thresholds: Set appropriate confidence thresholds for different use cases
  2. Feedback Loop: Collect effectiveness ratings to improve recommendation quality
  3. Pattern Validation: Regularly validate learning patterns with subject matter experts
  4. Continuous Learning: Update models based on new incident data and outcomes

Learning Patterns

  1. Pattern Validation: Have experts validate patterns before they're used for recommendations
  2. Success Tracking: Monitor the success rate of applied patterns
  3. Pattern Evolution: Update patterns as new data becomes available
  4. Knowledge Sharing: Share validated patterns across teams and organizations

Error Handling

The API returns standard HTTP status codes together with an error message:

  • 400 Bad Request: Invalid request data or parameters
  • 401 Unauthorized: Authentication required
  • 403 Forbidden: Insufficient permissions
  • 404 Not Found: Resource not found
  • 500 Internal Server Error: Server-side error

Error Response Format:

{
  "error": "Error message describing what went wrong",
  "details": "Additional error details if available",
  "code": "ERROR_CODE"
}
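A client can turn the documented error body into an exception that carries the error, details, and code fields. The following is a minimal sketch. KnowledgeAPIError and raise_for_api_error are hypothetical names. The function falls back gracefully when the body is empty or is not a JSON object, which can happen with some 500 responses.

```python
import json


class KnowledgeAPIError(Exception):
    """Raised for 4xx/5xx responses; carries the documented error fields."""

    def __init__(self, status, error, details=None, code=None):
        super().__init__(f"HTTP {status}: {error}")
        self.status = status
        self.error = error
        self.details = details
        self.code = code


def raise_for_api_error(status, body):
    """Do nothing for success codes; otherwise raise KnowledgeAPIError."""
    if status < 400:
        return None
    try:
        payload = json.loads(body or b"{}")
    except ValueError:  # covers json.JSONDecodeError and undecodable bytes
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    raise KnowledgeAPIError(status,
                            payload.get("error", "request failed"),
                            payload.get("details"),
                            payload.get("code"))
```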

Rate Limiting

API endpoints are rate-limited to prevent abuse:

  • Read Operations: 1000 requests per hour per user
  • Write Operations: 100 requests per hour per user
  • Search Operations: 500 requests per hour per user
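To stay under these quotas, a client can throttle itself. The sketch below implements a sliding one-hour window per operation kind. The document does not say whether the server counts in a fixed or a sliding window, or how a throttled response is signaled, so the sliding window is an assumption chosen because it never exceeds the quota within any hour. HourlyLimiter is a hypothetical class, and the clock is injected for testing.

```python
import time
from collections import deque

# Documented per-user limits, in requests per hour.
HOURLY_LIMITS = {"read": 1000, "write": 100, "search": 500}


class HourlyLimiter:
    """Client-side sliding-window limiter, one window per operation kind."""

    def __init__(self, limits=HOURLY_LIMITS, window=3600.0, clock=time.monotonic):
        self.limits = dict(limits)
        self.window = window
        self.clock = clock
        self.sent = {kind: deque() for kind in self.limits}

    def _prune(self, kind, now):
        # Forget requests that fell out of the last `window` seconds.
        q = self.sent[kind]
        while q and now - q[0] >= self.window:
            q.popleft()
        return q

    def try_acquire(self, kind):
        """Record a request and return True if it fits in the current window."""
        now = self.clock()
        q = self._prune(kind, now)
        if len(q) >= self.limits[kind]:
            return False
        q.append(now)
        return True

    def wait_time(self, kind):
        """Seconds until the next request of this kind would be allowed."""
        now = self.clock()
        q = self._prune(kind, now)
        if len(q) < self.limits[kind]:
            return 0.0
        return self.window - (now - q[0])
```

Note that POST /knowledge-articles/search/ is presumably counted as a search operation rather than a write, but the document does not state this.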

Authentication

All API endpoints require authentication using one of the following methods:

  • Token Authentication: Send the header Authorization: Token <token> with each request
  • Session Authentication: Use Django session authentication
  • SSO Authentication: Use configured SSO providers
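For scripts and service accounts, token authentication is the simplest of these options. The following is a minimal sketch that attaches the documented header to any request. authorized_request is a hypothetical helper. The Accept header is an added assumption that the API serves JSON; session and SSO flows are not shown.

```python
import urllib.request


def authorized_request(url, token, method="GET", body=None):
    """Build a request carrying the documented Token header.

    body, if given, must already be JSON-encoded bytes.
    """
    if not token:
        raise ValueError("an API token is required")
    headers = {"Authorization": f"Token {token}", "Accept": "application/json"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=body, headers=headers, method=method)
```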

Permissions

  • Read Access: All authenticated users can read published knowledge base articles
  • Write Access: Users need appropriate permissions to create/edit postmortems and articles
  • Admin Access: Only admin users can manage learning patterns and system settings
  • Data Classification: Access to sensitive content is controlled by data classification levels