Iliyan Angelov 6b247e5b9f Updates

2025-09-19 11:58:53 +03:00

Knowledge & Learning API Documentation

Overview

The Knowledge & Learning module covers automated postmortem generation, knowledge base management, and incident recommendations. It helps organizations learn from incidents and build institutional knowledge that prevents repeat issues.

Features

Automated Postmortems: Generate postmortems automatically from incident data
Knowledge Base: Manage and search knowledge articles, runbooks, and troubleshooting guides
Recommendation Engine: Suggest similar incidents, solutions, and experts
Learning Patterns: Identify and track patterns from incident data
Usage Analytics: Track how knowledge is being used and its effectiveness

API Endpoints

Postmortems

List Postmortems

GET /api/knowledge/postmortems/

Query Parameters:

status: Filter by status (DRAFT, IN_REVIEW, APPROVED, PUBLISHED, ARCHIVED)
severity: Filter by severity (LOW, MEDIUM, HIGH, CRITICAL)
is_automated: Filter by automation status (true/false)
owner: Filter by owner username
search: Search in title, executive_summary, root_cause_analysis
ordering: Order by created_at, updated_at, due_date, severity

Response:

{
  "count": 25,
  "next": "http://api.example.com/api/knowledge/postmortems/?page=2",
  "previous": null,
  "results": [
    {
      "id": "uuid",
      "title": "Postmortem: Database Outage",
      "incident": "uuid",
      "incident_title": "Database Connection Timeout",
      "status": "PUBLISHED",
      "severity": "HIGH",
      "owner_username": "john.doe",
      "completion_percentage": 95.0,
      "is_overdue": false,
      "created_at": "2024-01-15T10:30:00Z",
      "due_date": "2024-01-22T10:30:00Z"
    }
  ]
}
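
As a minimal sketch, the filters and the paginated response above could be consumed as follows. The host comes from the sample response; `postmortem_list_url`, `iter_postmortems`, and the `fetch_json` callable (URL in, decoded JSON out) are hypothetical names standing in for whatever HTTP client you use.

```python
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "http://api.example.com"  # placeholder host from the sample response


def postmortem_list_url(**filters: Any) -> str:
    """Build the list URL, skipping filters whose value is None."""
    params = {}
    for key, value in filters.items():
        if value is None:
            continue
        # Booleans are documented as true/false, so lower-case them.
        params[key] = str(value).lower() if isinstance(value, bool) else value
    url = f"{BASE_URL}/api/knowledge/postmortems/"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def iter_postmortems(
    fetch_json: Callable[[str], Dict[str, Any]], **filters: Any
) -> Iterator[Dict[str, Any]]:
    """Yield every result, following `next` until it is null."""
    url: Optional[str] = postmortem_list_url(**filters)
    while url:
        page = fetch_json(url)
        yield from page["results"]
        url = page.get("next")
```

Passing the transport in as a callable keeps the pagination logic independent of any particular HTTP library.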

Get Postmortem Details

GET /api/knowledge/postmortems/{id}/

Response:

{
  "id": "uuid",
  "title": "Postmortem: Database Outage",
  "incident": "uuid",
  "incident_title": "Database Connection Timeout",
  "executive_summary": "On January 15, 2024, a high severity incident occurred...",
  "timeline": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "event": "Incident reported",
      "description": "Database connection timeout detected",
      "actor": "monitoring.system"
    }
  ],
  "root_cause_analysis": "The root cause was identified as...",
  "impact_assessment": "The incident affected 500 users...",
  "lessons_learned": "Key lessons learned include...",
  "action_items": [
    {
      "title": "Update database connection pool settings",
      "description": "Increase connection pool size to handle peak load",
      "priority": "HIGH",
      "assignee": "database.team",
      "due_date": "2024-01-29T00:00:00Z",
      "category": "Technical Improvement"
    }
  ],
  "is_automated": true,
  "generation_confidence": 0.85,
  "auto_generated_sections": ["executive_summary", "timeline", "root_cause_analysis"],
  "status": "PUBLISHED",
  "severity": "HIGH",
  "owner": "uuid",
  "owner_username": "john.doe",
  "reviewers": ["uuid1", "uuid2"],
  "reviewer_usernames": ["jane.smith", "bob.wilson"],
  "approver": "uuid",
  "approver_username": "alice.johnson",
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-16T14:20:00Z",
  "published_at": "2024-01-16T14:20:00Z",
  "due_date": "2024-01-22T10:30:00Z",
  "related_incidents": ["uuid1", "uuid2"],
  "affected_services": ["database", "api"],
  "affected_teams": ["database.team", "platform.team"],
  "completion_percentage": 95.0,
  "is_overdue": false
}

Create Postmortem

POST /api/knowledge/postmortems/

Request Body:

{
  "title": "Postmortem: New Incident",
  "incident": "uuid",
  "executive_summary": "Executive summary...",
  "severity": "HIGH",
  "owner": "uuid",
  "due_date": "2024-01-29T00:00:00Z"
}
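
The request body above could be sent with the token scheme described under Authentication. The sketch below only builds the request object; the host is a placeholder and `build_create_postmortem_request` is a hypothetical helper, not part of the API.

```python
import json
from urllib.request import Request

BASE_URL = "http://api.example.com"  # placeholder host


def build_create_postmortem_request(token: str, payload: dict) -> Request:
    """Return an unsent POST request with a JSON body and a token header."""
    return Request(
        url=f"{BASE_URL}/api/knowledge/postmortems/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it.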

Generate Automated Postmortem

POST /api/knowledge/postmortems/{id}/generate_automated/

Response:

{
  "message": "Postmortem generated successfully",
  "confidence_score": 0.85
}

Approve Postmortem

POST /api/knowledge/postmortems/{id}/approve/

Publish Postmortem

POST /api/knowledge/postmortems/{id}/publish/

Get Overdue Postmortems

GET /api/knowledge/postmortems/overdue/

Get Postmortem Statistics

GET /api/knowledge/postmortems/statistics/

Response:

{
  "total_postmortems": 150,
  "by_status": {
    "DRAFT": 25,
    "IN_REVIEW": 15,
    "APPROVED": 20,
    "PUBLISHED": 85,
    "ARCHIVED": 5
  },
  "by_severity": {
    "LOW": 10,
    "MEDIUM": 45,
    "HIGH": 70,
    "CRITICAL": 25
  },
  "automated_percentage": 75.5,
  "overdue_count": 8,
  "avg_completion_time": "5 days, 12:30:00"
}
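
The sample `avg_completion_time` value has the same shape as Python's `str(timedelta)` output. Assuming the API always uses that format (the source does not confirm this), a client could convert it back to a `timedelta` roughly like this; `parse_duration` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Matches strings such as "5 days, 12:30:00", "1 day, 0:00:00" or "12:30:00".
_DURATION = re.compile(
    r"^(?:(?P<days>-?\d+) days?, )?"
    r"(?P<hours>\d+):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
    r"(?:\.(?P<micro>\d{1,6}))?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a str(timedelta)-style string; raise ValueError otherwise."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    parts = match.groupdict(default="0")
    return timedelta(
        days=int(parts["days"]),
        hours=int(parts["hours"]),
        minutes=int(parts["minutes"]),
        seconds=int(parts["seconds"]),
        # Pad fractional seconds to six digits before converting.
        microseconds=int(parts["micro"].ljust(6, "0")),
    )
```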

Knowledge Base Articles

List Knowledge Base Articles

GET /api/knowledge/knowledge-articles/

Query Parameters:

article_type: Filter by type (RUNBOOK, TROUBLESHOOTING, BEST_PRACTICE, etc.)
category: Filter by category
subcategory: Filter by subcategory
status: Filter by status (DRAFT, REVIEW, APPROVED, PUBLISHED, DEPRECATED)
is_featured: Filter featured articles (true/false)
difficulty_level: Filter by difficulty (BEGINNER, INTERMEDIATE, ADVANCED, EXPERT)
search: Search in title, content, summary, tags, search_keywords
ordering: Order by created_at, updated_at, view_count, title

Get Knowledge Base Article

GET /api/knowledge/knowledge-articles/{slug}/

Note: Each request to this endpoint increments the article's view_count.

Create Knowledge Base Article

POST /api/knowledge/knowledge-articles/

Request Body:

{
  "title": "Database Troubleshooting Guide",
  "content": "This guide covers common database issues...",
  "summary": "Comprehensive guide for database troubleshooting",
  "article_type": "TROUBLESHOOTING",
  "category": "Database",
  "subcategory": "Performance",
  "tags": ["database", "troubleshooting", "performance"],
  "difficulty_level": "INTERMEDIATE",
  "related_services": ["database", "api"],
  "related_components": ["postgresql", "connection-pool"]
}

Rate Knowledge Base Article

POST /api/knowledge/knowledge-articles/{slug}/rate/

Request Body:

{
  "rating": 4,
  "feedback": "Very helpful guide, saved me hours of debugging"
}

Bookmark Knowledge Base Article

POST /api/knowledge/knowledge-articles/{slug}/bookmark/

Search Knowledge Base

POST /api/knowledge/knowledge-articles/search/

Request Body:

{
  "query": "database connection timeout",
  "article_types": ["RUNBOOK", "TROUBLESHOOTING"],
  "categories": ["Database"],
  "difficulty_levels": ["INTERMEDIATE", "ADVANCED"],
  "limit": 20,
  "offset": 0
}

Response:

{
  "results": [
    {
      "id": "uuid",
      "title": "Database Connection Troubleshooting",
      "slug": "database-connection-troubleshooting",
      "summary": "Guide for resolving database connection issues",
      "article_type": "TROUBLESHOOTING",
      "category": "Database",
      "similarity_score": 0.85,
      "relevance_score": 0.92,
      "popularity_score": 0.75,
      "matching_keywords": ["database", "connection", "timeout"]
    }
  ],
  "total_count": 15,
  "query": "database connection timeout",
  "filters": {
    "article_types": ["RUNBOOK", "TROUBLESHOOTING"],
    "categories": ["Database"],
    "difficulty_levels": ["INTERMEDIATE", "ADVANCED"]
  },
  "pagination": {
    "limit": 20,
    "offset": 0,
    "has_more": false
  }
}
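
One way to walk through all search results is to advance `offset` by `limit` until `pagination.has_more` is false. This is a sketch under assumptions: `search_articles` and the `post_json` callable (path and body in, decoded JSON out) are hypothetical, and only fields shown in the request and response above are used.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

SEARCH_PATH = "/api/knowledge/knowledge-articles/search/"


def search_articles(
    post_json: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    query: str,
    *,
    article_types: Optional[List[str]] = None,
    categories: Optional[List[str]] = None,
    difficulty_levels: Optional[List[str]] = None,
    page_size: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield every matching article, one page of `page_size` at a time."""
    filters = {
        "article_types": article_types,
        "categories": categories,
        "difficulty_levels": difficulty_levels,
    }
    offset = 0
    while True:
        body: Dict[str, Any] = {"query": query, "limit": page_size, "offset": offset}
        # Send only the filters the caller actually set.
        body.update({k: v for k, v in filters.items() if v is not None})
        page = post_json(SEARCH_PATH, body)
        yield from page["results"]
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page["results"] or not page["pagination"]["has_more"]:
            break
        offset += page_size
```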

Get Articles Due for Review

GET /api/knowledge/knowledge-articles/due_for_review/

Get Popular Articles

GET /api/knowledge/knowledge-articles/popular/

Get Knowledge Base Statistics

GET /api/knowledge/knowledge-articles/statistics/

Incident Recommendations

List Recommendations

GET /api/knowledge/recommendations/

Query Parameters:

recommendation_type: Filter by type (SIMILAR_INCIDENT, SOLUTION, KNOWLEDGE_ARTICLE, etc.)
confidence_level: Filter by confidence (LOW, MEDIUM, HIGH, VERY_HIGH)
is_applied: Filter by application status (true/false)
incident: Filter by incident ID
search: Search in title, description, reasoning
ordering: Order by created_at, confidence_score, similarity_score

Get Recommendation Details

GET /api/knowledge/recommendations/{id}/

Apply Recommendation

POST /api/knowledge/recommendations/{id}/apply/

Rate Recommendation Effectiveness

POST /api/knowledge/recommendations/{id}/rate_effectiveness/

Request Body:

{
  "rating": 4
}

Generate Recommendations for Incident

POST /api/knowledge/recommendations/generate_for_incident/

Request Body:

{
  "incident_id": "uuid",
  "recommendation_types": ["SIMILAR_INCIDENT", "KNOWLEDGE_ARTICLE", "SOLUTION"],
  "max_recommendations": 5,
  "min_confidence": 0.6
}

Response:

{
  "message": "Recommendations generated successfully",
  "recommendations": [
    {
      "id": "uuid",
      "title": "Similar Incident: Database Timeout Issue",
      "type": "SIMILAR_INCIDENT",
      "confidence_score": 0.85,
      "similarity_score": 0.78
    }
  ]
}
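
A client might validate the request before sending it and then order the returned recommendations by score. The sketch below assumes the 0.0-1.0 range listed for `confidence_score` under Data Models also applies to `min_confidence`; both function names are hypothetical.

```python
from typing import Any, Dict, List


def build_generation_request(
    incident_id: str,
    recommendation_types: List[str],
    max_recommendations: int = 5,
    min_confidence: float = 0.6,
) -> Dict[str, Any]:
    """Return the request body, rejecting out-of-range values early."""
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    if max_recommendations < 1:
        raise ValueError("max_recommendations must be at least 1")
    return {
        "incident_id": incident_id,
        "recommendation_types": list(recommendation_types),
        "max_recommendations": max_recommendations,
        "min_confidence": min_confidence,
    }


def rank_recommendations(
    response: Dict[str, Any], min_confidence: float = 0.0
) -> List[Dict[str, Any]]:
    """Keep recommendations at or above the threshold, best first."""
    kept = [
        rec for rec in response["recommendations"]
        if rec["confidence_score"] >= min_confidence
    ]
    # Highest confidence first; similarity breaks ties.
    return sorted(
        kept,
        key=lambda rec: (rec["confidence_score"], rec["similarity_score"]),
        reverse=True,
    )
```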

Get Recommendation Statistics

GET /api/knowledge/recommendations/statistics/

Learning Patterns

List Learning Patterns

GET /api/knowledge/learning-patterns/

Query Parameters:

pattern_type: Filter by type (ROOT_CAUSE, RESOLUTION, PREVENTION, etc.)
is_validated: Filter by validation status (true/false)
search: Search in name, description, triggers, actions
ordering: Order by created_at, confidence_score, frequency, success_rate

Get Learning Pattern Details

GET /api/knowledge/learning-patterns/{id}/

Validate Learning Pattern

POST /api/knowledge/learning-patterns/{id}/validate/

Request Body:

{
  "validation_notes": "This pattern has been validated by the expert team"
}

Apply Learning Pattern

POST /api/knowledge/learning-patterns/{id}/apply/

Get Learning Pattern Statistics

GET /api/knowledge/learning-patterns/statistics/

Automated Postmortem Generation

List Generation Logs

GET /api/knowledge/postmortem-generations/

Query Parameters:

status: Filter by status (PENDING, PROCESSING, COMPLETED, FAILED, REVIEW_REQUIRED)
incident: Filter by incident ID
generation_trigger: Filter by trigger type
ordering: Order by started_at, completed_at, processing_time

Get Generation Log Details

GET /api/knowledge/postmortem-generations/{id}/

Generate Postmortem for Incident

POST /api/knowledge/postmortem-generations/generate_postmortem/

Request Body:

{
  "incident_id": "uuid",
  "include_timeline": true,
  "include_logs": true,
  "generation_trigger": "manual"
}

Response:

{
  "message": "Postmortem generation initiated",
  "generation_id": "uuid"
}
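
Because the response only confirms that generation was initiated, a caller would typically poll the generation log until it reaches a final state. The sketch below assumes the log detail response exposes a `status` field with the values listed for the `status` filter, and that COMPLETED, FAILED, and REVIEW_REQUIRED are final; neither is confirmed by the source. `wait_for_generation` and the `get_json` callable are hypothetical.

```python
import time
from typing import Any, Callable, Dict

# Assumed final states; PENDING and PROCESSING are treated as in progress.
FINAL_STATES = {"COMPLETED", "FAILED", "REVIEW_REQUIRED"}


def wait_for_generation(
    get_json: Callable[[str], Dict[str, Any]],
    generation_id: str,
    *,
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll the generation log until it reaches a final state or times out."""
    path = f"/api/knowledge/postmortem-generations/{generation_id}/"
    deadline = clock() + timeout
    while True:
        log = get_json(path)
        if log["status"] in FINAL_STATES:
            return log
        if clock() >= deadline:
            raise TimeoutError(f"generation {generation_id} did not finish in time")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop easy to exercise without real waiting.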

Data Models

Postmortem

id: UUID (Primary Key)
title: String (200 chars)
incident: Foreign Key to Incident
executive_summary: Text
timeline: JSON Array
root_cause_analysis: Text
impact_assessment: Text
lessons_learned: Text
action_items: JSON Array
is_automated: Boolean
generation_confidence: Float (0.0-1.0)
auto_generated_sections: JSON Array
status: Choice (DRAFT, IN_REVIEW, APPROVED, PUBLISHED, ARCHIVED)
severity: Choice (LOW, MEDIUM, HIGH, CRITICAL)
owner: Foreign Key to User
reviewers: Many-to-Many to User
approver: Foreign Key to User
created_at: DateTime
updated_at: DateTime
published_at: DateTime
due_date: DateTime
related_incidents: Many-to-Many to Incident
affected_services: JSON Array
affected_teams: JSON Array

KnowledgeBaseArticle

id: UUID (Primary Key)
title: String (200 chars)
slug: Slug (Unique)
content: Text
summary: Text
tags: JSON Array
article_type: Choice (RUNBOOK, TROUBLESHOOTING, BEST_PRACTICE, etc.)
category: String (100 chars)
subcategory: String (100 chars)
related_services: JSON Array
related_components: JSON Array
status: Choice (DRAFT, REVIEW, APPROVED, PUBLISHED, DEPRECATED)
is_featured: Boolean
view_count: Positive Integer
author: Foreign Key to User
last_updated_by: Foreign Key to User
maintainer: Foreign Key to User
created_at: DateTime
updated_at: DateTime
last_reviewed: DateTime
next_review_due: DateTime
related_incidents: Many-to-Many to Incident
source_postmortems: Many-to-Many to Postmortem
confluence_url: URL
wiki_url: URL
external_references: JSON Array
search_keywords: JSON Array
difficulty_level: Choice (BEGINNER, INTERMEDIATE, ADVANCED, EXPERT)

IncidentRecommendation

id: UUID (Primary Key)
incident: Foreign Key to Incident
recommendation_type: Choice (SIMILAR_INCIDENT, SOLUTION, KNOWLEDGE_ARTICLE, etc.)
title: String (200 chars)
description: Text
similarity_score: Float (0.0-1.0)
confidence_level: Choice (LOW, MEDIUM, HIGH, VERY_HIGH)
confidence_score: Float (0.0-1.0)
related_incident: Foreign Key to Incident
knowledge_article: Foreign Key to KnowledgeBaseArticle
suggested_expert: Foreign Key to User
suggested_actions: JSON Array
expected_outcome: Text
time_to_implement: Duration
is_applied: Boolean
applied_at: DateTime
applied_by: Foreign Key to User
effectiveness_rating: Positive Integer (1-5)
reasoning: Text
matching_factors: JSON Array
model_version: String (50 chars)
created_at: DateTime
updated_at: DateTime

LearningPattern

id: UUID (Primary Key)
name: String (200 chars)
pattern_type: Choice (ROOT_CAUSE, RESOLUTION, PREVENTION, etc.)
description: Text
frequency: Positive Integer
success_rate: Float (0.0-1.0)
confidence_score: Float (0.0-1.0)
triggers: JSON Array
actions: JSON Array
outcomes: JSON Array
source_incidents: Many-to-Many to Incident
source_postmortems: Many-to-Many to Postmortem
is_validated: Boolean
validated_by: Foreign Key to User
validation_notes: Text
times_applied: Positive Integer
last_applied: DateTime
created_at: DateTime
updated_at: DateTime

Management Commands

Generate Postmortems

python manage.py generate_postmortems --days 7 --severity HIGH --force

Options:

--days: Number of days back to look for resolved incidents (default: 7)
--severity: Only generate for specific severity levels
--force: Force generation even if postmortem exists
--dry-run: Show what would be generated without creating
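
When the command is launched from a scheduler, the argument list can be assembled programmatically. This is a sketch that uses only the options listed above and passes a single severity value, as in the example invocation; `generate_postmortems_argv` and the `manage_py` path are hypothetical.

```python
import sys
from typing import List, Optional


def generate_postmortems_argv(
    days: int = 7,
    severity: Optional[str] = None,
    force: bool = False,
    dry_run: bool = False,
    manage_py: str = "manage.py",
) -> List[str]:
    """Build the argument list for the generate_postmortems command."""
    argv = [sys.executable, manage_py, "generate_postmortems", "--days", str(days)]
    if severity is not None:
        argv += ["--severity", severity]
    if force:
        argv.append("--force")
    if dry_run:
        argv.append("--dry-run")
    return argv
```

The result can be handed to `subprocess.run(argv, check=True)`; running once with `dry_run=True` first shows what would be generated.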

Generate Recommendations

python manage.py generate_recommendations --days 1 --status OPEN --max-recommendations 5

Options:

--days: Number of days back to look for incidents (default: 1)
--status: Only generate for specific incident status (default: OPEN)
--severity: Only generate for specific severity levels
--force: Force generation even if recommendations exist
--max-recommendations: Maximum recommendations per incident (default: 5)
--dry-run: Show what would be generated without creating

Update Learning Patterns

python manage.py update_learning_patterns --days 30 --min-frequency 3

Options:

--days: Number of days back to analyze (default: 30)
--min-frequency: Minimum frequency to create pattern (default: 3)
--dry-run: Show what patterns would be created/updated

Integration Points

Incident Intelligence Module

Incident Model: Primary relationship for postmortems and recommendations
Incident Resolution: Triggers automatic postmortem generation
Incident Classification: Used for similarity matching in recommendations

Analytics & Predictive Insights Module

KPI Calculations: Postmortem completion rates, knowledge base usage
Pattern Detection: Integration with learning patterns for trend analysis
Predictive Models: Use learning patterns for incident prediction

Automation & Orchestration Module

Runbook Integration: Knowledge base articles can be linked to runbooks
Automated Actions: Postmortem action items can trigger automation workflows

Security Module

Access Control: Knowledge base articles respect data classification levels
Audit Logging: All knowledge base usage is tracked for compliance

Best Practices

Postmortem Management

Automated Generation: Enable automatic postmortem generation for high-severity incidents
Review Process: Implement a structured review and approval workflow
Action Item Tracking: Ensure action items are assigned and tracked to completion
Timeline Accuracy: Verify and enhance auto-generated timelines with human input

Knowledge Base Management

Content Quality: Regularly review and update knowledge base articles
Search Optimization: Use relevant tags and keywords for better discoverability
User Feedback: Collect and act on user ratings and feedback
Review Schedule: Set up regular review cycles for knowledge base articles

Recommendation Engine

Confidence Thresholds: Set appropriate confidence thresholds for different use cases
Feedback Loop: Collect effectiveness ratings to improve recommendation quality
Pattern Validation: Regularly validate learning patterns with subject matter experts
Continuous Learning: Update models based on new incident data and outcomes

Learning Patterns

Pattern Validation: Have experts validate patterns before they're used for recommendations
Success Tracking: Monitor the success rate of applied patterns
Pattern Evolution: Update patterns as new data becomes available
Knowledge Sharing: Share validated patterns across teams and organizations

Error Handling

The API signals errors with standard HTTP status codes and a JSON error body:

400 Bad Request: Invalid request data or parameters
401 Unauthorized: Authentication required
403 Forbidden: Insufficient permissions
404 Not Found: Resource not found
500 Internal Server Error: Server-side error

Error Response Format:

{
  "error": "Error message describing what went wrong",
  "details": "Additional error details if available",
  "code": "ERROR_CODE"
}
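
A client could turn this error format into an exception, as in the sketch below. `KnowledgeAPIError` and `raise_for_error` are hypothetical names; the code tolerates a missing or non-JSON body, since the source does not say whether every error carries one.

```python
import json
from typing import Optional


class KnowledgeAPIError(Exception):
    """Carries the HTTP status plus the documented error fields."""

    def __init__(self, status: int, error: str,
                 details: Optional[str] = None, code: Optional[str] = None):
        super().__init__(f"{status}: {error}")
        self.status = status
        self.error = error
        self.details = details
        self.code = code


def raise_for_error(status: int, body: bytes) -> None:
    """Raise KnowledgeAPIError for 4xx/5xx responses; do nothing otherwise."""
    if status < 400:
        return
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    raise KnowledgeAPIError(
        status,
        payload.get("error", "Unknown error"),
        payload.get("details"),
        payload.get("code"),
    )
```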

Rate Limiting

API endpoints are rate-limited to prevent abuse:

Read Operations: 1000 requests per hour per user
Write Operations: 100 requests per hour per user
Search Operations: 500 requests per hour per user
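
To stay within these limits, a client can track its own request budget. The sketch below uses a sliding one-hour window; how the server actually counts requests (fixed or sliding window) is not stated, so this is a conservative assumption, and `HourlyBudget` is a hypothetical helper.

```python
import time
from collections import deque
from typing import Callable, Deque

# Per-user hourly limits from the list above.
HOURLY_LIMITS = {"read": 1000, "write": 100, "search": 500}


class HourlyBudget:
    """Sliding-window counter that allows at most `limit` calls per window."""

    def __init__(self, limit: int, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True, or return False if over budget."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(now)
        return True
```

For example, `HourlyBudget(HOURLY_LIMITS["write"])` could guard POST requests, with the caller backing off whenever `try_acquire()` returns False.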

Authentication

All API endpoints require authentication using one of the following methods:

Token Authentication: Send the header Authorization: Token <token> with each request
Session Authentication: Use Django session authentication
SSO Authentication: Use configured SSO providers

Permissions

Read Access: All authenticated users can read published knowledge base articles
Write Access: Users need appropriate permissions to create/edit postmortems and articles
Admin Access: Only admin users can manage learning patterns and system settings
Data Classification: Access to sensitive content is controlled by data classification levels
