ETB/ETB-API/MONITORING_DEPLOYMENT_GUIDE.md
Iliyan Angelov 6b247e5b9f Updates
2025-09-19 11:58:53 +03:00

ETB-API Monitoring System Deployment Guide

Overview

This guide walks through deploying the monitoring system for the ETB-API platform. The system runs health checks, collects metrics, evaluates alert rules, and generates system status reports as scheduled Celery tasks.

Prerequisites

System Requirements

  • Python 3.10+ (required by Django 5.2)
  • Django 5.2+
  • PostgreSQL 14+ (recommended; the minimum Django 5.2 supports) or SQLite (development)
  • Redis 6+ (for Celery)
  • Celery 5.3+

Dependencies

  • psutil>=5.9.0
  • requests>=2.31.0
  • celery>=5.3.0
  • redis>=4.5.0

Installation Steps

1. Install Dependencies

# Install Python dependencies (quote each spec so the shell does not treat ">" as a redirect)
pip install "psutil>=5.9.0" "requests>=2.31.0" "celery>=5.3.0" "redis>=4.5.0"

# Install Redis (Ubuntu/Debian)
sudo apt-get install redis-server

# Install Redis (CentOS/RHEL)
sudo yum install redis

2. Database Setup

# Create and run migrations
python manage.py makemigrations monitoring
python manage.py migrate

# Create superuser (if not exists)
python manage.py createsuperuser

3. Initialize Monitoring Configuration

# Set up default monitoring targets, metrics, and alert rules
python manage.py setup_monitoring --admin-user admin

4. Configure Celery

Create or update celery.py in your project:

import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'core.settings')

app = Celery('core')
app.config_from_object('django.conf:settings', namespace='CELERY')

# Add monitoring tasks schedule
app.conf.beat_schedule = {
    'health-checks': {
        'task': 'monitoring.tasks.execute_health_checks',
        'schedule': 60.0,  # Every minute
    },
    'metrics-collection': {
        'task': 'monitoring.tasks.collect_metrics',
        'schedule': 300.0,  # Every 5 minutes
    },
    'alert-evaluation': {
        'task': 'monitoring.tasks.evaluate_alerts',
        'schedule': 60.0,  # Every minute
    },
    'data-cleanup': {
        'task': 'monitoring.tasks.cleanup_old_data',
        'schedule': 86400.0,  # Daily
    },
    'system-status-report': {
        'task': 'monitoring.tasks.generate_system_status_report',
        'schedule': 300.0,  # Every 5 minutes
    },
}

app.autodiscover_tasks()
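The intervals in the schedule above are hard-coded, while step 5 defines matching `MONITORING_*_INTERVAL` environment variables. The guide does not say whether the monitoring app wires those variables into Celery itself. Assuming it does not, one way to keep the two in sync is to build the schedule entries from the environment. `build_beat_schedule` is a hypothetical helper; the task paths and defaults are the ones shown above.

```python
import os

# name -> (task path, environment variable, default interval in seconds)
TASKS = {
    "health-checks": (
        "monitoring.tasks.execute_health_checks",
        "MONITORING_HEALTH_CHECK_INTERVAL",
        60.0,
    ),
    "metrics-collection": (
        "monitoring.tasks.collect_metrics",
        "MONITORING_METRICS_COLLECTION_INTERVAL",
        300.0,
    ),
    "alert-evaluation": (
        "monitoring.tasks.evaluate_alerts",
        "MONITORING_ALERT_EVALUATION_INTERVAL",
        60.0,
    ),
}


def build_beat_schedule(env=None):
    """Build beat_schedule entries, letting environment variables override defaults."""
    env = os.environ if env is None else env
    schedule = {}
    for name, (task, variable, default) in TASKS.items():
        seconds = float(env.get(variable, default))
        if seconds <= 0:
            raise ValueError(f"{variable} must be positive, got {seconds}")
        schedule[name] = {"task": task, "schedule": seconds}
    return schedule
```

In celery.py, a call such as `app.conf.beat_schedule.update(build_beat_schedule())` placed after the dictionary above would apply the overrides while leaving the cleanup and report entries untouched.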

5. Environment Configuration

Create a .env file or set the following environment variables:

# Monitoring Settings
MONITORING_ENABLED=true
MONITORING_HEALTH_CHECK_INTERVAL=60
MONITORING_METRICS_COLLECTION_INTERVAL=300
MONITORING_ALERT_EVALUATION_INTERVAL=60

# Alerting Settings
ALERTING_EMAIL_FROM=monitoring@yourcompany.com
ALERTING_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
ALERTING_WEBHOOK_URL=https://your-webhook-url.com/alerts

# Performance Thresholds
PERFORMANCE_API_RESPONSE_THRESHOLD=2000
PERFORMANCE_CPU_THRESHOLD=80
PERFORMANCE_MEMORY_THRESHOLD=80
PERFORMANCE_DISK_THRESHOLD=80

# Email Configuration (for alerts)
EMAIL_HOST=smtp.gmail.com
EMAIL_PORT=587
EMAIL_USE_TLS=True
EMAIL_HOST_USER=your-email@gmail.com
EMAIL_HOST_PASSWORD=your-app-password
DEFAULT_FROM_EMAIL=monitoring@yourcompany.com
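A typo in one of these values usually surfaces only when an alert fails to send, so it can help to validate the environment at startup. The sketch below rests on two assumptions the guide does not state: that the CPU, memory, and disk thresholds are percentages, and that the four `EMAIL_*` keys listed in `REQUIRED` are mandatory. `validate_env` is a hypothetical helper.

```python
import os

# Assumption: these keys must be present for email alerts to work.
REQUIRED = ("EMAIL_HOST", "EMAIL_PORT", "EMAIL_HOST_USER", "EMAIL_HOST_PASSWORD")

# Assumption: these thresholds are percentages in the range (0, 100].
PERCENT_KEYS = (
    "PERFORMANCE_CPU_THRESHOLD",
    "PERFORMANCE_MEMORY_THRESHOLD",
    "PERFORMANCE_DISK_THRESHOLD",
)


def validate_env(env=None):
    """Return a list of configuration problems; an empty list means none found."""
    env = os.environ if env is None else env
    errors = []
    for key in REQUIRED:
        if not env.get(key):
            errors.append(f"{key} is not set")
    for key in PERCENT_KEYS:
        raw = env.get(key)
        if raw is None:
            continue  # unset thresholds are left to the application's defaults
        try:
            value = float(raw)
        except ValueError:
            errors.append(f"{key} is not a number: {raw!r}")
            continue
        if not 0 < value <= 100:
            errors.append(f"{key} must be in (0, 100], got {value}")
    return errors
```

Calling `validate_env()` early in startup and refusing to continue when the list is non-empty turns a silent misconfiguration into an immediate, readable failure.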

6. Start Services

# Start Redis first (if not running as a service); the Celery worker and beat need the broker
redis-server

# Start Django development server (in a separate terminal)
python manage.py runserver

# Start Celery worker (in a separate terminal)
celery -A core worker -l info

# Start Celery beat scheduler (in a separate terminal)
celery -A core beat -l info

Production Deployment

1. Database Configuration

For production, use PostgreSQL:

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'etb_api_monitoring',
        'USER': 'monitoring_user',
        'PASSWORD': 'secure_password',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
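The example above hard-codes the database password in settings.py. A common alternative is to read the credentials from the environment, as sketched below. The variable names `DB_NAME`, `DB_USER`, `DB_PASSWORD`, `DB_HOST`, and `DB_PORT` are assumptions for illustration and are not defined elsewhere in this guide.

```python
import os


def database_settings(env=None):
    """Build the DATABASES dict from environment variables (names are assumed)."""
    env = os.environ if env is None else env
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("DB_NAME", "etb_api_monitoring"),
            "USER": env.get("DB_USER", "monitoring_user"),
            # No default on purpose: a missing password raises KeyError at startup.
            "PASSWORD": env["DB_PASSWORD"],
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    }
```

In settings.py this would be used as `DATABASES = database_settings()`, keeping the password out of version control.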

2. Redis Configuration

# settings.py
CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
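Before starting the Celery services it is worth confirming that the broker URL points at a reachable Redis instance. The sketch below parses the URL and sends an inline `PING` over a plain socket, so it needs no third-party client. The function names are illustrative. It assumes an unauthenticated, non-TLS Redis like the one configured above; a password-protected server answers with an error, so the check returns False.

```python
import socket
from urllib.parse import urlsplit


def parse_redis_url(url):
    """Split 'redis://host:port/db' into (host, port, db) with Redis defaults."""
    parts = urlsplit(url)
    if parts.scheme != "redis":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    db = int(parts.path.lstrip("/") or 0)
    return parts.hostname or "localhost", parts.port or 6379, db


def redis_ping(host, port, timeout=2.0):
    """Return True if the server answers an inline PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    host, port, _db = parse_redis_url("redis://localhost:6379/0")
    print("redis reachable:", redis_ping(host, port))
```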

3. Static Files and Media

# Collect static files
python manage.py collectstatic

# Configure web server (Nginx example)
server {
    listen 80;
    server_name your-domain.com;
    
    location /static/ {
        alias /path/to/your/static/files/;
    }
    
    location /media/ {
        alias /path/to/your/media/files/;
    }
    
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

4. Process Management

Use systemd services for production:

Django Service (/etc/systemd/system/etb-api.service):

[Unit]
Description=ETB-API Django Application
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/path/to/etb-api
Environment=PATH=/path/to/etb-api/venv/bin
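# runserver is a development server and should not be used in production.
# This line assumes gunicorn is installed in the venv and the project exposes core.wsgi.
ExecStart=/path/to/etb-api/venv/bin/gunicorn core.wsgi:application --bind 127.0.0.1:8000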
Restart=always

[Install]
WantedBy=multi-user.target

Celery Worker Service (/etc/systemd/system/etb-celery.service):

[Unit]
Description=ETB-API Celery Worker
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/path/to/etb-api
Environment=PATH=/path/to/etb-api/venv/bin
ExecStart=/path/to/etb-api/venv/bin/celery -A core worker -l info
Restart=always

[Install]
WantedBy=multi-user.target

Celery Beat Service (/etc/systemd/system/etb-celery-beat.service):

[Unit]
Description=ETB-API Celery Beat Scheduler
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/path/to/etb-api
Environment=PATH=/path/to/etb-api/venv/bin
ExecStart=/path/to/etb-api/venv/bin/celery -A core beat -l info
Restart=always

[Install]
WantedBy=multi-user.target

5. Enable Services

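# Reload systemd so it picks up the new unit files
sudo systemctl daemon-reload

# Enable services at boot (the Redis unit may be named redis-server on Ubuntu/Debian)
sudo systemctl enable redis
sudo systemctl enable etb-api
sudo systemctl enable etb-celery
sudo systemctl enable etb-celery-beat

# Start Redis first so the Celery services can reach the broker
sudo systemctl start redis
sudo systemctl start etb-api
sudo systemctl start etb-celery
sudo systemctl start etb-celery-beat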

Monitoring Configuration

1. Customize Monitoring Targets

Access the admin interface at http://your-domain.com/admin/monitoring/ to:

  • Add custom monitoring targets
  • Configure health check intervals
  • Set up external service monitoring
  • Customize alert thresholds

2. Configure Alert Rules

Create alert rules for:

  • Performance Alerts: High response times, error rates
  • Business Alerts: SLA breaches, incident volume spikes
  • Security Alerts: Failed logins, security events
  • Infrastructure Alerts: High CPU, memory, disk usage
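To make the infrastructure category concrete, the sketch below shows the core of a threshold rule: compare the latest sample for each metric against a limit and report the breaches. It is a standalone illustration, not the `AlertRule` model from the monitoring app, whose fields this guide does not show. The limits reuse the values from the environment configuration; the metric names, the millisecond unit for API response time, and the `severity` field are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    severity: str = "warning"  # hypothetical field for illustration


def breached(samples, thresholds):
    """Return the thresholds whose metric exceeds its limit.

    Metrics with no sample are skipped rather than treated as breaches.
    """
    return [t for t in thresholds if samples.get(t.metric, float("-inf")) > t.limit]


THRESHOLDS = [
    Threshold("cpu_percent", 80),
    Threshold("memory_percent", 80),
    Threshold("disk_percent", 80),
    Threshold("api_response_ms", 2000, severity="critical"),
]

if __name__ == "__main__":
    latest = {"cpu_percent": 91.5, "memory_percent": 40.0, "api_response_ms": 2500}
    for rule in breached(latest, THRESHOLDS):
        print(f"[{rule.severity}] {rule.metric} above {rule.limit}")
```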

3. Set Up Notification Channels

Configure notification channels:

  • Email: Set up SMTP configuration
  • Slack: Configure webhook URLs
  • Webhooks: Set up external alerting systems
  • PagerDuty: Integrate with incident management
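For the Slack channel, an incoming webhook accepts a JSON body with a `text` field, the same payload the curl test in the Troubleshooting section sends. A minimal standard-library sketch follows; the function names are illustrative, and the monitoring app's own notification code may differ.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build the POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert(webhook_url, text, timeout=5.0):
    """Send the message; urlopen raises an HTTPError on a 4xx/5xx response."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200
```

Keeping request construction separate from sending makes the payload easy to inspect in a test without touching the network.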

4. Create Custom Dashboards

Design dashboards for different user roles:

  • Executive Dashboard: High-level KPIs and trends
  • Operations Dashboard: Real-time system status
  • Security Dashboard: Security metrics and alerts
  • Development Dashboard: Application performance metrics

Verification

1. Check System Health

# Check health check summary
curl -H "Authorization: Token your-token" \
     http://localhost:8000/api/monitoring/health-checks/summary/

# Check system overview
curl -H "Authorization: Token your-token" \
     http://localhost:8000/api/monitoring/overview/
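Right after a deployment the API may take a few seconds to accept connections, so a single curl call can fail spuriously. The sketch below is a Python equivalent of the requests above with a simple retry loop. The function names are illustrative, and nothing is assumed about the response body beyond it being JSON.

```python
import json
import time
import urllib.request


def fetch_json(url, token, timeout=5.0):
    """GET a monitoring endpoint with token authentication and parse the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def wait_until_reachable(fetch, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds; re-raise the last error when attempts run out.

    urllib's URLError and HTTPError are OSError subclasses, so connection
    failures and HTTP error statuses are both retried.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:
            if attempt == attempts:
                raise
            sleep(delay)


if __name__ == "__main__":
    url = "http://localhost:8000/api/monitoring/health-checks/summary/"
    print(wait_until_reachable(lambda: fetch_json(url, "your-token")))
```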

2. Verify Celery Tasks

# List tasks currently being executed by workers
celery -A core inspect active

# List tasks with an ETA or countdown held by workers (this does not show the beat schedule)
celery -A core inspect scheduled

3. Test Alerting

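# Open a Django shell and confirm that at least one alert rule exists
python manage.py shell
>>> from monitoring.models import AlertRule
>>> rule = AlertRule.objects.first()
>>> # Queue an evaluation run now instead of waiting for the next beat tick
>>> from monitoring.tasks import evaluate_alerts
>>> evaluate_alerts.delay()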

Maintenance

1. Data Cleanup

The data-cleanup task runs daily through Celery beat. To trigger it manually:

python manage.py shell
>>> from monitoring.tasks import cleanup_old_data
>>> cleanup_old_data.delay()

2. Performance Tuning

Monitor and tune:

  • Health check intervals
  • Metrics collection frequency
  • Alert evaluation intervals
  • Data retention periods

3. Scaling

For high-volume environments:

  • Use multiple Celery workers
  • Implement Redis clustering
  • Use database read replicas
  • Consider time-series databases for metrics

Troubleshooting

Common Issues

  1. Health Checks Failing

    # Check logs
    tail -f /var/log/etb-api.log
    
    # Test individual targets
    python manage.py shell
    >>> from monitoring.services.health_checks import HealthCheckService
    >>> service = HealthCheckService()
    >>> service.execute_all_health_checks()
    
  2. Celery Tasks Not Running

    # Check Celery status
    celery -A core inspect active
    
    # Check Redis connection
    redis-cli ping
    
    # Restart services
    sudo systemctl restart etb-celery
    sudo systemctl restart etb-celery-beat
    
  3. Alerts Not Sending

    # Check email configuration
    python manage.py shell
    >>> from django.core.mail import send_mail
    >>> send_mail('Test', 'Test message', 'from@example.com', ['to@example.com'])
    
    # Check Slack webhook
    curl -X POST -H 'Content-type: application/json' \
         --data '{"text":"Test message"}' \
         YOUR_SLACK_WEBHOOK_URL
    

Log Locations

  • Django logs: /var/log/etb-api.log
  • Celery logs: /var/log/celery.log
  • Nginx logs: /var/log/nginx/
  • System logs: /var/log/syslog

Security Considerations

1. Authentication

  • Use strong authentication tokens
  • Implement token rotation
  • Use HTTPS in production
  • Restrict admin access

2. Data Protection

  • Encrypt sensitive configuration data
  • Use secure database connections
  • Implement data retention policies
  • Run regular security audits

3. Network Security

  • Use firewalls to restrict access
  • Implement rate limiting
  • Monitor for suspicious activity
  • Apply security updates regularly

Backup and Recovery

1. Database Backup

# PostgreSQL backup
pg_dump etb_api_monitoring > backup_$(date +%Y%m%d_%H%M%S).sql

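# Automated backup script (save as its own file; the shebang must be its first line)
#!/bin/bash
set -euo pipefail
BACKUP_DIR="/backups/monitoring"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
pg_dump etb_api_monitoring > "$BACKUP_DIR/backup_$DATE.sql"
# Prune backups older than 7 days; reached only if pg_dump succeeded
find "$BACKUP_DIR" -name "backup_*.sql" -mtime +7 -delete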
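A backup job that silently stops running is easy to miss. The sketch below checks that the newest `backup_*.sql` file in the backup directory is non-empty and recent. The function names are illustrative, and the 26-hour limit is an assumption that suits a daily schedule with some slack.

```python
import time
from pathlib import Path


def newest_backup(backup_dir):
    """Return the most recently modified backup_*.sql file, or None if there is none."""
    files = sorted(Path(backup_dir).glob("backup_*.sql"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def backup_is_fresh(backup_dir, max_age_hours=26, now=None):
    """True if the newest backup exists, is non-empty, and is younger than the limit."""
    latest = newest_backup(backup_dir)
    if latest is None or latest.stat().st_size == 0:
        return False
    now = time.time() if now is None else now
    return now - latest.stat().st_mtime <= max_age_hours * 3600


if __name__ == "__main__":
    print("backup fresh:", backup_is_fresh("/backups/monitoring"))
```

A size check catches the common failure where pg_dump exits early and leaves an empty file behind; it does not prove the dump is restorable, so periodic restore tests are still needed.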

2. Configuration Backup

# Backup configuration files
tar -czf monitoring_config_$(date +%Y%m%d).tar.gz \
    /path/to/etb-api/core/settings.py \
    /path/to/etb-api/.env \
    /etc/systemd/system/etb-*.service

3. Recovery Procedures

  1. Restore database from backup
  2. Restore configuration files
  3. Restart services
  4. Verify monitoring functionality
  5. Check alert rules and thresholds

Support and Maintenance

Regular Tasks

  • Daily: Check system health and alerts
  • Weekly: Review metrics trends and thresholds
  • Monthly: Update monitoring configuration
  • Quarterly: Review and optimize performance

Monitoring the Monitor

  • Set up external monitoring for the monitoring system
  • Monitor Celery worker health
  • Track database performance
  • Monitor disk space usage
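The disk space check can run independently of the monitoring stack, for example from cron on the same host, so that it still works when Celery is down. This minimal sketch uses only the standard library. The 80 percent default mirrors `PERFORMANCE_DISK_THRESHOLD` from the environment configuration, on the assumption that the value is a percentage; the function names are illustrative.

```python
import shutil


def disk_used_percent(path="/"):
    """Return used space as a percentage of the filesystem's total size."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_alert(path="/", threshold=80.0):
    """Return (is_over_threshold, used_percent) for the filesystem holding path."""
    percent = disk_used_percent(path)
    return percent >= threshold, percent


if __name__ == "__main__":
    over, percent = disk_alert("/")
    print(f"disk usage {percent:.1f}%", "(ALERT)" if over else "(ok)")
```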

This guide covers a baseline deployment of monitoring for ETB-API. Adjust intervals, thresholds, and service definitions to match your requirements and infrastructure.