Knowledge & Learning API Documentation
Overview
The Knowledge & Learning module provides automated postmortem generation, knowledge base management, and incident recommendations. It helps organizations learn from incidents and build institutional knowledge to prevent recurrence.
Features
- Automated Postmortems: Generate postmortems automatically from incident data
- Knowledge Base: Manage and search knowledge articles, runbooks, and troubleshooting guides
- Recommendation Engine: Suggest similar incidents, solutions, and experts
- Learning Patterns: Identify and track patterns from incident data
- Usage Analytics: Track how knowledge is being used and its effectiveness
API Endpoints
Postmortems
List Postmortems
GET /api/knowledge/postmortems/
Query Parameters:
- status: Filter by status (DRAFT, IN_REVIEW, APPROVED, PUBLISHED, ARCHIVED)
- severity: Filter by severity (LOW, MEDIUM, HIGH, CRITICAL)
- is_automated: Filter by automation status (true/false)
- owner: Filter by owner username
- search: Search in title, executive_summary, root_cause_analysis
- ordering: Order by created_at, updated_at, due_date, severity
Response:
{
  "count": 25,
  "next": "http://api.example.com/api/knowledge/postmortems/?page=2",
  "previous": null,
  "results": [
    {
      "id": "uuid",
      "title": "Postmortem: Database Outage",
      "incident": "uuid",
      "incident_title": "Database Connection Timeout",
      "status": "PUBLISHED",
      "severity": "HIGH",
      "owner_username": "john.doe",
      "completion_percentage": 95.0,
      "is_overdue": false,
      "created_at": "2024-01-15T10:30:00Z",
      "due_date": "2024-01-22T10:30:00Z"
    }
  ]
}
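The filters and the paginated response above can be combined into a small client-side loop. The following is a minimal Python sketch, assuming the `next` field holds the absolute URL of the following page (as in the sample response) and becomes `null` on the last page. `BASE_URL`, `build_list_url`, `iter_postmortems`, and the `fetch_page` callable are illustrative names, not part of the API; `fetch_page` stands in for whatever HTTP client is in use and is expected to return the decoded JSON body.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode

BASE_URL = "http://api.example.com"  # assumption: host taken from the sample response


def build_list_url(base_url: str = BASE_URL, **filters: Any) -> str:
    """Build the list URL from keyword filters, skipping any that are None."""
    params = {}
    for name, value in filters.items():
        if value is None:
            continue
        # The docs show booleans as true/false, so lowercase Python's True/False.
        params[name] = str(value).lower() if isinstance(value, bool) else value
    url = f"{base_url}/api/knowledge/postmortems/"
    return f"{url}?{urlencode(params)}" if params else url


def iter_postmortems(
    first_url: str, fetch_page: Callable[[str], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield every postmortem by following the `next` link until it is null."""
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page["results"]
        url = page.get("next")
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned pages.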
Get Postmortem Details
GET /api/knowledge/postmortems/{id}/
Response:
{
  "id": "uuid",
  "title": "Postmortem: Database Outage",
  "incident": "uuid",
  "incident_title": "Database Connection Timeout",
  "executive_summary": "On January 15, 2024, a high severity incident occurred...",
  "timeline": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "event": "Incident reported",
      "description": "Database connection timeout detected",
      "actor": "monitoring.system"
    }
  ],
  "root_cause_analysis": "The root cause was identified as...",
  "impact_assessment": "The incident affected 500 users...",
  "lessons_learned": "Key lessons learned include...",
  "action_items": [
    {
      "title": "Update database connection pool settings",
      "description": "Increase connection pool size to handle peak load",
      "priority": "HIGH",
      "assignee": "database.team",
      "due_date": "2024-01-29T00:00:00Z",
      "category": "Technical Improvement"
    }
  ],
  "is_automated": true,
  "generation_confidence": 0.85,
  "auto_generated_sections": ["executive_summary", "timeline", "root_cause_analysis"],
  "status": "PUBLISHED",
  "severity": "HIGH",
  "owner": "uuid",
  "owner_username": "john.doe",
  "reviewers": ["uuid1", "uuid2"],
  "reviewer_usernames": ["jane.smith", "bob.wilson"],
  "approver": "uuid",
  "approver_username": "alice.johnson",
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-16T14:20:00Z",
  "published_at": "2024-01-16T14:20:00Z",
  "due_date": "2024-01-22T10:30:00Z",
  "related_incidents": ["uuid1", "uuid2"],
  "affected_services": ["database", "api"],
  "affected_teams": ["database.team", "platform.team"],
  "completion_percentage": 95.0,
  "is_overdue": false
}
Create Postmortem
POST /api/knowledge/postmortems/
Request Body:
{
  "title": "Postmortem: New Incident",
  "incident": "uuid",
  "executive_summary": "Executive summary...",
  "severity": "HIGH",
  "owner": "uuid",
  "due_date": "2024-01-29T00:00:00Z"
}
Generate Automated Postmortem
POST /api/knowledge/postmortems/{id}/generate_automated/
Response:
{
  "message": "Postmortem generated successfully",
  "confidence_score": 0.85
}
Approve Postmortem
POST /api/knowledge/postmortems/{id}/approve/
Publish Postmortem
POST /api/knowledge/postmortems/{id}/publish/
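The approve and publish endpoints take no request body, so a client mostly needs to know which of them is still outstanding. The sketch below is one way to work that out in Python. It assumes that statuses progress in the order they are listed for the `status` filter and that a postmortem must be approved before it is published; neither rule is stated by this documentation, and `STATUS_ORDER` and `remaining_calls` are hypothetical names.

```python
from typing import List, Tuple

# Assumption: statuses advance in the order the documentation lists them.
STATUS_ORDER = ["DRAFT", "IN_REVIEW", "APPROVED", "PUBLISHED", "ARCHIVED"]


def remaining_calls(postmortem_id: str, status: str) -> List[Tuple[str, str]]:
    """Return the (method, path) calls still needed to reach PUBLISHED."""
    base = f"/api/knowledge/postmortems/{postmortem_id}"
    rank = STATUS_ORDER.index(status)  # raises ValueError for unknown statuses
    calls = []
    if rank < STATUS_ORDER.index("APPROVED"):
        calls.append(("POST", f"{base}/approve/"))
    if rank < STATUS_ORDER.index("PUBLISHED"):
        calls.append(("POST", f"{base}/publish/"))
    return calls
```

For a postmortem that is already `APPROVED`, only the publish call is returned; for `PUBLISHED` or `ARCHIVED`, the list is empty.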
Get Overdue Postmortems
GET /api/knowledge/postmortems/overdue/
Get Postmortem Statistics
GET /api/knowledge/postmortems/statistics/
Response:
{
  "total_postmortems": 150,
  "by_status": {
    "DRAFT": 25,
    "IN_REVIEW": 15,
    "APPROVED": 20,
    "PUBLISHED": 85,
    "ARCHIVED": 5
  },
  "by_severity": {
    "LOW": 10,
    "MEDIUM": 45,
    "HIGH": 70,
    "CRITICAL": 25
  },
  "automated_percentage": 75.5,
  "overdue_count": 8,
  "avg_completion_time": "5 days, 12:30:00"
}
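The `avg_completion_time` value in the sample looks like the string form of a Python `timedelta` (`"5 days, 12:30:00"`). Assuming that format holds in general, which this documentation does not confirm, a client could convert it back into a `timedelta` for arithmetic. `parse_duration` is an illustrative helper, not part of the API.

```python
import re
from datetime import timedelta

# Matches "H:MM:SS", "N day(s), H:MM:SS", with optional fractional seconds.
_DURATION = re.compile(
    r"^(?:(?P<days>-?\d+) days?, )?"
    r"(?P<hours>\d+):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
    r"(?:\.(?P<micro>\d{1,6}))?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a str(timedelta)-style string such as '5 days, 12:30:00'."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized duration: {text!r}")
    return timedelta(
        days=int(match["days"] or 0),
        hours=int(match["hours"]),
        minutes=int(match["minutes"]),
        seconds=int(match["seconds"]),
        # Pad the fraction to six digits so ".5" means 500000 microseconds.
        microseconds=int((match["micro"] or "0").ljust(6, "0")),
    )
```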
Knowledge Base Articles
List Knowledge Base Articles
GET /api/knowledge/knowledge-articles/
Query Parameters:
- article_type: Filter by type (RUNBOOK, TROUBLESHOOTING, BEST_PRACTICE, etc.)
- category: Filter by category
- subcategory: Filter by subcategory
- status: Filter by status (DRAFT, REVIEW, APPROVED, PUBLISHED, DEPRECATED)
- is_featured: Filter featured articles (true/false)
- difficulty_level: Filter by difficulty (BEGINNER, INTERMEDIATE, ADVANCED, EXPERT)
- search: Search in title, content, summary, tags, search_keywords
- ordering: Order by created_at, updated_at, view_count, title
Get Knowledge Base Article
GET /api/knowledge/knowledge-articles/{slug}/
Note: This endpoint automatically increments the view count.
Create Knowledge Base Article
POST /api/knowledge/knowledge-articles/
Request Body:
{
  "title": "Database Troubleshooting Guide",
  "content": "This guide covers common database issues...",
  "summary": "Comprehensive guide for database troubleshooting",
  "article_type": "TROUBLESHOOTING",
  "category": "Database",
  "subcategory": "Performance",
  "tags": ["database", "troubleshooting", "performance"],
  "difficulty_level": "INTERMEDIATE",
  "related_services": ["database", "api"],
  "related_components": ["postgresql", "connection-pool"]
}
Rate Knowledge Base Article
POST /api/knowledge/knowledge-articles/{slug}/rate/
Request Body:
{
  "rating": 4,
  "feedback": "Very helpful guide, saved me hours of debugging"
}
Bookmark Knowledge Base Article
POST /api/knowledge/knowledge-articles/{slug}/bookmark/
Search Knowledge Base
POST /api/knowledge/knowledge-articles/search/
Request Body:
{
  "query": "database connection timeout",
  "article_types": ["RUNBOOK", "TROUBLESHOOTING"],
  "categories": ["Database"],
  "difficulty_levels": ["INTERMEDIATE", "ADVANCED"],
  "limit": 20,
  "offset": 0
}
Response:
{
  "results": [
    {
      "id": "uuid",
      "title": "Database Connection Troubleshooting",
      "slug": "database-connection-troubleshooting",
      "summary": "Guide for resolving database connection issues",
      "article_type": "TROUBLESHOOTING",
      "category": "Database",
      "similarity_score": 0.85,
      "relevance_score": 0.92,
      "popularity_score": 0.75,
      "matching_keywords": ["database", "connection", "timeout"]
    }
  ],
  "total_count": 15,
  "query": "database connection timeout",
  "filters": {
    "article_types": ["RUNBOOK", "TROUBLESHOOTING"],
    "categories": ["Database"],
    "difficulty_levels": ["INTERMEDIATE", "ADVANCED"]
  },
  "pagination": {
    "limit": 20,
    "offset": 0,
    "has_more": false
  }
}
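Unlike the list endpoints, search uses `limit` and `offset` in the request body and reports `has_more` in the response. A minimal Python sketch of paging through all matches follows, assuming `has_more` turns `false` on the final page. `search_all` is an illustrative name, and `post_search` is a hypothetical callable that POSTs the body to `/api/knowledge/knowledge-articles/search/` and returns the decoded JSON.

```python
from typing import Any, Callable, Dict, Iterator


def search_all(
    query: str,
    post_search: Callable[[Dict[str, Any]], Dict[str, Any]],
    limit: int = 20,
    **filters: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield every search hit by advancing `offset` while `has_more` is true."""
    offset = 0
    while True:
        # Filters such as article_types or categories pass through unchanged.
        body = {"query": query, "limit": limit, "offset": offset, **filters}
        page = post_search(body)
        yield from page["results"]
        if not page["pagination"]["has_more"]:
            break
        offset += limit
```

A call such as `search_all("database connection timeout", post_search, categories=["Database"])` would then iterate over every matching article, one request per page.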
Get Articles Due for Review
GET /api/knowledge/knowledge-articles/due_for_review/
Get Popular Articles
GET /api/knowledge/knowledge-articles/popular/
Get Knowledge Base Statistics
GET /api/knowledge/knowledge-articles/statistics/
Incident Recommendations
List Recommendations
GET /api/knowledge/recommendations/
Query Parameters:
- recommendation_type: Filter by type (SIMILAR_INCIDENT, SOLUTION, KNOWLEDGE_ARTICLE, etc.)
- confidence_level: Filter by confidence (LOW, MEDIUM, HIGH, VERY_HIGH)
- is_applied: Filter by application status (true/false)
- incident: Filter by incident ID
- search: Search in title, description, reasoning
- ordering: Order by created_at, confidence_score, similarity_score
Get Recommendation Details
GET /api/knowledge/recommendations/{id}/
Apply Recommendation
POST /api/knowledge/recommendations/{id}/apply/
Rate Recommendation Effectiveness
POST /api/knowledge/recommendations/{id}/rate_effectiveness/
Request Body:
{
  "rating": 4
}
Generate Recommendations for Incident
POST /api/knowledge/recommendations/generate_for_incident/
Request Body:
{
  "incident_id": "uuid",
  "recommendation_types": ["SIMILAR_INCIDENT", "KNOWLEDGE_ARTICLE", "SOLUTION"],
  "max_recommendations": 5,
  "min_confidence": 0.6
}
Response:
{
  "message": "Recommendations generated successfully",
  "recommendations": [
    {
      "id": "uuid",
      "title": "Similar Incident: Database Timeout Issue",
      "type": "SIMILAR_INCIDENT",
      "confidence_score": 0.85,
      "similarity_score": 0.78
    }
  ]
}
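As a rough illustration of the request and response above, the Python sketch below builds the request body with basic validation and orders the returned recommendations by `confidence_score`. The default values mirror the sample request and are assumptions, not documented server defaults; the 0.0 to 1.0 bound comes from the `confidence_score` range in the data model. Both function names are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def build_generate_request(
    incident_id: str,
    recommendation_types: Sequence[str],
    max_recommendations: int = 5,   # assumption: value from the sample request
    min_confidence: float = 0.6,    # assumption: value from the sample request
) -> Dict[str, Any]:
    """Build the JSON body for generate_for_incident, rejecting bad inputs early."""
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    if max_recommendations < 1:
        raise ValueError("max_recommendations must be at least 1")
    return {
        "incident_id": incident_id,
        "recommendation_types": list(recommendation_types),
        "max_recommendations": max_recommendations,
        "min_confidence": min_confidence,
    }


def rank_recommendations(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the recommendations sorted from highest to lowest confidence."""
    return sorted(
        response["recommendations"],
        key=lambda rec: rec["confidence_score"],
        reverse=True,
    )
```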
Get Recommendation Statistics
GET /api/knowledge/recommendations/statistics/
Learning Patterns
List Learning Patterns
GET /api/knowledge/learning-patterns/
Query Parameters:
- pattern_type: Filter by type (ROOT_CAUSE, RESOLUTION, PREVENTION, etc.)
- is_validated: Filter by validation status (true/false)
- search: Search in name, description, triggers, actions
- ordering: Order by created_at, confidence_score, frequency, success_rate
Get Learning Pattern Details
GET /api/knowledge/learning-patterns/{id}/
Validate Learning Pattern
POST /api/knowledge/learning-patterns/{id}/validate/
Request Body:
{
  "validation_notes": "This pattern has been validated by the expert team"
}
Apply Learning Pattern
POST /api/knowledge/learning-patterns/{id}/apply/
Get Learning Pattern Statistics
GET /api/knowledge/learning-patterns/statistics/
Automated Postmortem Generation
List Generation Logs
GET /api/knowledge/postmortem-generations/
Query Parameters:
- status: Filter by status (PENDING, PROCESSING, COMPLETED, FAILED, REVIEW_REQUIRED)
- incident: Filter by incident ID
- generation_trigger: Filter by trigger type
- ordering: Order by started_at, completed_at, processing_time
Get Generation Log Details
GET /api/knowledge/postmortem-generations/{id}/
Generate Postmortem for Incident
POST /api/knowledge/postmortem-generations/generate_postmortem/
Request Body:
{
  "incident_id": "uuid",
  "include_timeline": true,
  "include_logs": true,
  "generation_trigger": "manual"
}
Response:
{
  "message": "Postmortem generation initiated",
  "generation_id": "uuid"
}
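Because generation is only initiated by this call, a client will usually poll for the outcome. The Python sketch below assumes that `generation_id` can be used as the `{id}` of the generation log detail endpoint and that the detail response carries a `status` field with the values listed for the `status` filter; this documentation does not show that response, so both points are assumptions. `wait_for_generation` and `fetch_log` are hypothetical names, with `fetch_log` standing in for an authenticated GET that returns decoded JSON.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these statuses mean no further progress will occur.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "REVIEW_REQUIRED"}


def wait_for_generation(
    generation_id: str,
    fetch_log: Callable[[str], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll the generation log until it reaches a terminal status or times out."""
    path = f"/api/knowledge/postmortem-generations/{generation_id}/"
    deadline = clock() + timeout
    while True:
        log = fetch_log(path)
        if log["status"] in TERMINAL_STATUSES:
            return log
        if clock() >= deadline:
            raise TimeoutError(f"Generation {generation_id} still {log['status']}")
        sleep(interval)
```

Each poll counts as a read operation, so the interval should be chosen with the rate limits in mind.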
Data Models
Postmortem
- id: UUID (Primary Key)
- title: String (200 chars)
- incident: Foreign Key to Incident
- executive_summary: Text
- timeline: JSON Array
- root_cause_analysis: Text
- impact_assessment: Text
- lessons_learned: Text
- action_items: JSON Array
- is_automated: Boolean
- generation_confidence: Float (0.0-1.0)
- auto_generated_sections: JSON Array
- status: Choice (DRAFT, IN_REVIEW, APPROVED, PUBLISHED, ARCHIVED)
- severity: Choice (LOW, MEDIUM, HIGH, CRITICAL)
- owner: Foreign Key to User
- reviewers: Many-to-Many to User
- approver: Foreign Key to User
- created_at: DateTime
- updated_at: DateTime
- published_at: DateTime
- due_date: DateTime
- related_incidents: Many-to-Many to Incident
- affected_services: JSON Array
- affected_teams: JSON Array
KnowledgeBaseArticle
- id: UUID (Primary Key)
- title: String (200 chars)
- slug: Slug (Unique)
- content: Text
- summary: Text
- tags: JSON Array
- article_type: Choice (RUNBOOK, TROUBLESHOOTING, BEST_PRACTICE, etc.)
- category: String (100 chars)
- subcategory: String (100 chars)
- related_services: JSON Array
- related_components: JSON Array
- status: Choice (DRAFT, REVIEW, APPROVED, PUBLISHED, DEPRECATED)
- is_featured: Boolean
- view_count: Positive Integer
- author: Foreign Key to User
- last_updated_by: Foreign Key to User
- maintainer: Foreign Key to User
- created_at: DateTime
- updated_at: DateTime
- last_reviewed: DateTime
- next_review_due: DateTime
- related_incidents: Many-to-Many to Incident
- source_postmortems: Many-to-Many to Postmortem
- confluence_url: URL
- wiki_url: URL
- external_references: JSON Array
- search_keywords: JSON Array
- difficulty_level: Choice (BEGINNER, INTERMEDIATE, ADVANCED, EXPERT)
IncidentRecommendation
- id: UUID (Primary Key)
- incident: Foreign Key to Incident
- recommendation_type: Choice (SIMILAR_INCIDENT, SOLUTION, KNOWLEDGE_ARTICLE, etc.)
- title: String (200 chars)
- description: Text
- similarity_score: Float (0.0-1.0)
- confidence_level: Choice (LOW, MEDIUM, HIGH, VERY_HIGH)
- confidence_score: Float (0.0-1.0)
- related_incident: Foreign Key to Incident
- knowledge_article: Foreign Key to KnowledgeBaseArticle
- suggested_expert: Foreign Key to User
- suggested_actions: JSON Array
- expected_outcome: Text
- time_to_implement: Duration
- is_applied: Boolean
- applied_at: DateTime
- applied_by: Foreign Key to User
- effectiveness_rating: Positive Integer (1-5)
- reasoning: Text
- matching_factors: JSON Array
- model_version: String (50 chars)
- created_at: DateTime
- updated_at: DateTime
LearningPattern
- id: UUID (Primary Key)
- name: String (200 chars)
- pattern_type: Choice (ROOT_CAUSE, RESOLUTION, PREVENTION, etc.)
- description: Text
- frequency: Positive Integer
- success_rate: Float (0.0-1.0)
- confidence_score: Float (0.0-1.0)
- triggers: JSON Array
- actions: JSON Array
- outcomes: JSON Array
- source_incidents: Many-to-Many to Incident
- source_postmortems: Many-to-Many to Postmortem
- is_validated: Boolean
- validated_by: Foreign Key to User
- validation_notes: Text
- times_applied: Positive Integer
- last_applied: DateTime
- created_at: DateTime
- updated_at: DateTime
Management Commands
Generate Postmortems
python manage.py generate_postmortems --days 7 --severity HIGH --force
Options:
- --days: Number of days back to look for resolved incidents (default: 7)
- --severity: Only generate for specific severity levels
- --force: Force generation even if postmortem exists
- --dry-run: Show what would be generated without creating
Generate Recommendations
python manage.py generate_recommendations --days 1 --status OPEN --max-recommendations 5
Options:
- --days: Number of days back to look for incidents (default: 1)
- --status: Only generate for specific incident status (default: OPEN)
- --severity: Only generate for specific severity levels
- --force: Force generation even if recommendations exist
- --max-recommendations: Maximum recommendations per incident (default: 5)
- --dry-run: Show what would be generated without creating
Update Learning Patterns
python manage.py update_learning_patterns --days 30 --min-frequency 3
Options:
- --days: Number of days back to analyze (default: 30)
- --min-frequency: Minimum frequency to create pattern (default: 3)
- --dry-run: Show what patterns would be created/updated
Integration Points
Incident Intelligence Module
- Incident Model: Primary relationship for postmortems and recommendations
- Incident Resolution: Triggers automatic postmortem generation
- Incident Classification: Used for similarity matching in recommendations
Analytics & Predictive Insights Module
- KPI Calculations: Postmortem completion rates, knowledge base usage
- Pattern Detection: Integration with learning patterns for trend analysis
- Predictive Models: Use learning patterns for incident prediction
Automation & Orchestration Module
- Runbook Integration: Knowledge base articles can be linked to runbooks
- Automated Actions: Postmortem action items can trigger automation workflows
Security Module
- Access Control: Knowledge base articles respect data classification levels
- Audit Logging: All knowledge base usage is tracked for compliance
Best Practices
Postmortem Management
- Automated Generation: Enable automatic postmortem generation for high-severity incidents
- Review Process: Implement a structured review and approval workflow
- Action Item Tracking: Ensure action items are assigned and tracked to completion
- Timeline Accuracy: Verify and enhance auto-generated timelines with human input
Knowledge Base Management
- Content Quality: Regularly review and update knowledge base articles
- Search Optimization: Use relevant tags and keywords for better discoverability
- User Feedback: Collect and act on user ratings and feedback
- Review Schedule: Set up regular review cycles for knowledge base articles
Recommendation Engine
- Confidence Thresholds: Set appropriate confidence thresholds for different use cases
- Feedback Loop: Collect effectiveness ratings to improve recommendation quality
- Pattern Validation: Regularly validate learning patterns with subject matter experts
- Continuous Learning: Update models based on new incident data and outcomes
Learning Patterns
- Pattern Validation: Have experts validate patterns before they're used for recommendations
- Success Tracking: Monitor the success rate of applied patterns
- Pattern Evolution: Update patterns as new data becomes available
- Knowledge Sharing: Share validated patterns across teams and organizations
Error Handling
The API returns appropriate HTTP status codes and error messages:
- 400 Bad Request: Invalid request data or parameters
- 401 Unauthorized: Authentication required
- 403 Forbidden: Insufficient permissions
- 404 Not Found: Resource not found
- 500 Internal Server Error: Server-side error
Error Response Format:
{
  "error": "Error message describing what went wrong",
  "details": "Additional error details if available",
  "code": "ERROR_CODE"
}
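On the client side, the error format above maps naturally onto an exception type. The following Python sketch turns a status code and response body into an exception carrying the `error`, `details`, and `code` fields. `KnowledgeApiError` and `raise_for_error` are illustrative names, and the fallback for non-JSON bodies is an assumption, since a 500 response may not follow the documented format.

```python
import json
from typing import Optional


class KnowledgeApiError(Exception):
    """Client-side wrapper for the documented error response format."""

    def __init__(self, status: int, error: str,
                 details: Optional[str] = None, code: Optional[str] = None):
        super().__init__(f"HTTP {status}: {error}")
        self.status = status
        self.error = error
        self.details = details
        self.code = code


def raise_for_error(status: int, body_text: str) -> None:
    """Raise KnowledgeApiError for 4xx/5xx responses; do nothing otherwise."""
    if status < 400:
        return
    try:
        payload = json.loads(body_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    raise KnowledgeApiError(
        status,
        payload.get("error", "Unknown error"),
        payload.get("details"),
        payload.get("code"),
    )
```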
Rate Limiting
API endpoints are rate-limited to prevent abuse:
- Read Operations: 1000 requests per hour per user
- Write Operations: 100 requests per hour per user
- Search Operations: 500 requests per hour per user
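A client can stay under these limits by keeping its own count of recent requests. The Python sketch below tracks request timestamps per operation type over a sliding one-hour window. The documentation does not say whether the server uses a fixed or sliding window, so the sliding window here is an assumption chosen as the more conservative option; `HourlyBudget` and `try_acquire` are hypothetical names.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Mapping

# Limits per user per hour, as listed above.
LIMITS = {"read": 1000, "write": 100, "search": 500}
WINDOW_SECONDS = 3600.0


class HourlyBudget:
    """Client-side sliding-window counter for each operation type."""

    def __init__(self, limits: Mapping[str, int] = LIMITS,
                 window: float = WINDOW_SECONDS,
                 clock: Callable[[], float] = time.monotonic):
        self._limits = dict(limits)
        self._window = window
        self._clock = clock
        self._sent: Dict[str, Deque[float]] = {kind: deque() for kind in self._limits}

    def try_acquire(self, kind: str) -> bool:
        """Record a request of `kind` if budget remains; return False otherwise."""
        now = self._clock()
        sent = self._sent[kind]
        # Drop timestamps that have aged out of the window.
        while sent and now - sent[0] >= self._window:
            sent.popleft()
        if len(sent) >= self._limits[kind]:
            return False
        sent.append(now)
        return True
```

A caller would check `budget.try_acquire("search")` before each search request and back off when it returns `False`.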
Authentication
All API endpoints require authentication using one of the following methods:
- Token Authentication: Include an Authorization: Token <token> header
- Session Authentication: Use Django session authentication
- SSO Authentication: Use configured SSO providers
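For token authentication, the header can be attached with the Python standard library alone. This is a minimal sketch that only constructs the request object; `authed_request` is an illustrative name, the host is the example host used elsewhere in this document, and the token value is a placeholder. The `Accept` header is an assumption about how the API negotiates JSON.

```python
import urllib.request
from typing import Optional


def authed_request(url: str, token: str, method: str = "GET",
                   data: Optional[bytes] = None) -> urllib.request.Request:
    """Build a request carrying the Authorization: Token <token> header."""
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Token {token}")
    request.add_header("Accept", "application/json")  # assumption: JSON responses
    return request


# Placeholder token; nothing is sent until the request is passed to urlopen().
request = authed_request(
    "http://api.example.com/api/knowledge/postmortems/", "your-token-here"
)
```

Sending it is then a matter of `urllib.request.urlopen(request)`, or of passing the same header to another HTTP client.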
Permissions
- Read Access: All authenticated users can read published knowledge base articles
- Write Access: Users need appropriate permissions to create/edit postmortems and articles
- Admin Access: Only admin users can manage learning patterns and system settings
- Data Classification: Access to sensitive content is controlled by data classification levels