# ETB-API Monitoring System Deployment Guide

## Overview

This guide gives step-by-step instructions for deploying the monitoring system for your ETB-API platform. The system runs health checks, metrics collection, alert evaluation, data cleanup, and status reporting as scheduled Celery tasks.

## Prerequisites

### System Requirements

- Python 3.10+ (required by Django 5.2)
- Django 5.2+
- PostgreSQL 14+ (recommended; the minimum version Django 5.2 supports) or SQLite (development)
- Redis 6+ (Celery broker and result backend)
- Celery 5.3+

### Dependencies

- psutil>=5.9.0
- requests>=2.31.0
- celery>=5.3.0
- redis>=4.5.0

## Installation Steps

### 1. Install Dependencies

```bash
# Install Python dependencies
# (quote each specifier so the shell does not treat ">" as a redirect)
pip install "psutil>=5.9.0" "requests>=2.31.0" "celery>=5.3.0" "redis>=4.5.0"

# Install Redis (Ubuntu/Debian)
sudo apt-get install redis-server

# Install Redis (CentOS/RHEL)
sudo yum install redis
```

### 2. Database Setup

```bash
# Create and run migrations
python manage.py makemigrations monitoring
python manage.py migrate

# Create a superuser (if one does not exist)
python manage.py createsuperuser
```

### 3. Initialize Monitoring Configuration

```bash
# Set up default monitoring targets, metrics, and alert rules
python manage.py setup_monitoring --admin-user admin
```

### 4. Configure Celery

Create or update `celery.py` in your project:

```python
import os

from celery import Celery

# Point Celery at the Django settings module before the app is created
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'core.settings')

app = Celery('core')

# Read every CELERY_* setting from Django's settings.py
app.config_from_object('django.conf:settings', namespace='CELERY')

# Add monitoring tasks schedule (intervals are in seconds)
app.conf.beat_schedule = {
    'health-checks': {
        'task': 'monitoring.tasks.execute_health_checks',
        'schedule': 60.0,  # Every minute
    },
    'metrics-collection': {
        'task': 'monitoring.tasks.collect_metrics',
        'schedule': 300.0,  # Every 5 minutes
    },
    'alert-evaluation': {
        'task': 'monitoring.tasks.evaluate_alerts',
        'schedule': 60.0,  # Every minute
    },
    'data-cleanup': {
        'task': 'monitoring.tasks.cleanup_old_data',
        'schedule': 86400.0,  # Daily
    },
    'system-status-report': {
        'task': 'monitoring.tasks.generate_system_status_report',
        'schedule': 300.0,  # Every 5 minutes
    },
}

# Find tasks.py modules in all installed Django apps
app.autodiscover_tasks()
```

### 5. Environment Configuration

Create a `.env` file or set environment variables:

```bash
# Monitoring Settings
MONITORING_ENABLED=true
MONITORING_HEALTH_CHECK_INTERVAL=60
MONITORING_METRICS_COLLECTION_INTERVAL=300
MONITORING_ALERT_EVALUATION_INTERVAL=60

# Alerting Settings
ALERTING_EMAIL_FROM=monitoring@yourcompany.com
ALERTING_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
ALERTING_WEBHOOK_URL=https://your-webhook-url.com/alerts

# Performance Thresholds
PERFORMANCE_API_RESPONSE_THRESHOLD=2000
PERFORMANCE_CPU_THRESHOLD=80
PERFORMANCE_MEMORY_THRESHOLD=80
PERFORMANCE_DISK_THRESHOLD=80

# Email Configuration (for alerts)
EMAIL_HOST=smtp.gmail.com
EMAIL_PORT=587
EMAIL_USE_TLS=True
EMAIL_HOST_USER=your-email@gmail.com
EMAIL_HOST_PASSWORD=your-app-password
DEFAULT_FROM_EMAIL=monitoring@yourcompany.com
```

### 6. Start Services

Start Redis first, because the Celery worker and beat scheduler need the broker to be reachable.

```bash
# Start Redis (if not running as a service)
redis-server

# Start Django development server (in a separate terminal)
python manage.py runserver

# Start Celery worker (in a separate terminal)
celery -A core worker -l info

# Start Celery beat scheduler (in a separate terminal)
celery -A core beat -l info
```

## Production Deployment

### 1. Database Configuration

For production, use PostgreSQL:

```python
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'etb_api_monitoring',
        'USER': 'monitoring_user',
        'PASSWORD': 'secure_password',  # Placeholder: load the real value from the environment
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
```

### 2. Redis Configuration

```python
# settings.py
CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
```

### 3. Static Files and Media

```bash
# Collect static files
python manage.py collectstatic
```

Configure the web server (Nginx example):

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location /static/ {
        alias /path/to/your/static/files/;
    }

    location /media/ {
        alias /path/to/your/media/files/;
    }

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

### 4. Process Management

Use systemd services for production:

**Django Service** (`/etc/systemd/system/etb-api.service`):

```ini
[Unit]
Description=ETB-API Django Application
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/path/to/etb-api
Environment=PATH=/path/to/etb-api/venv/bin
# runserver is Django's development server and is not meant for production
# traffic; replace this command with your production WSGI/ASGI server.
ExecStart=/path/to/etb-api/venv/bin/python manage.py runserver 0.0.0.0:8000
Restart=always

[Install]
WantedBy=multi-user.target
```

**Celery Worker Service** (`/etc/systemd/system/etb-celery.service`):

```ini
[Unit]
Description=ETB-API Celery Worker
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/path/to/etb-api
Environment=PATH=/path/to/etb-api/venv/bin
ExecStart=/path/to/etb-api/venv/bin/celery -A core worker -l info
Restart=always

[Install]
WantedBy=multi-user.target
```

**Celery Beat Service** (`/etc/systemd/system/etb-celery-beat.service`):

```ini
[Unit]
Description=ETB-API Celery Beat Scheduler
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/path/to/etb-api
Environment=PATH=/path/to/etb-api/venv/bin
ExecStart=/path/to/etb-api/venv/bin/celery -A core beat -l info
Restart=always

[Install]
WantedBy=multi-user.target
```

### 5. Enable Services

```bash
# Enable services so they start on boot
# (on Debian/Ubuntu the Redis unit is named redis-server)
sudo systemctl enable redis
sudo systemctl enable etb-api
sudo systemctl enable etb-celery
sudo systemctl enable etb-celery-beat

# Start services now, Redis first so the Celery broker is available
sudo systemctl start redis
sudo systemctl start etb-api
sudo systemctl start etb-celery
sudo systemctl start etb-celery-beat
```

## Monitoring Configuration

### 1. Customize Monitoring Targets

Access the admin interface at `http://your-domain.com/admin/monitoring/` to:

- Add custom monitoring targets
- Configure health check intervals
- Set up external service monitoring
- Customize alert thresholds

### 2. Configure Alert Rules

Create alert rules for:

- **Performance Alerts**: High response times, error rates
- **Business Alerts**: SLA breaches, incident volume spikes
- **Security Alerts**: Failed logins, security events
- **Infrastructure Alerts**: High CPU, memory, disk usage

### 3. Set Up Notification Channels

Configure notification channels:

- **Email**: Set up SMTP configuration
- **Slack**: Configure webhook URLs
- **Webhooks**: Set up external alerting systems
- **PagerDuty**: Integrate with incident management

### 4. Create Custom Dashboards

Design dashboards for different user roles:

- **Executive Dashboard**: High-level KPIs and trends
- **Operations Dashboard**: Real-time system status
- **Security Dashboard**: Security metrics and alerts
- **Development Dashboard**: Application performance metrics

## Verification

### 1. Check System Health

```bash
# Check health check summary
curl -H "Authorization: Token your-token" \
  http://localhost:8000/api/monitoring/health-checks/summary/

# Check system overview
curl -H "Authorization: Token your-token" \
  http://localhost:8000/api/monitoring/overview/
```

### 2. Verify Celery Tasks

```bash
# Check tasks currently being executed by workers
celery -A core inspect active

# Check tasks held by workers with an ETA or countdown
# (periodic beat entries are not listed here; check the beat log for those)
celery -A core inspect scheduled
```

### 3. Test Alerting

```bash
# Trigger a test alert
python manage.py shell
>>> from monitoring.models import AlertRule
>>> rule = AlertRule.objects.first()
>>> # Manually trigger alert for testing
```

## Maintenance

### 1. Data Cleanup

The `data-cleanup` beat task removes old data once a day. To run it manually:

```bash
python manage.py shell
>>> from monitoring.tasks import cleanup_old_data
>>> cleanup_old_data.delay()
```

### 2. Performance Tuning

Monitor and tune:

- Health check intervals
- Metrics collection frequency
- Alert evaluation intervals
- Data retention periods

### 3. Scaling

For high-volume environments:

- Use multiple Celery workers
- Implement Redis clustering
- Use database read replicas
- Consider time-series databases for metrics

## Troubleshooting

### Common Issues

1. **Health Checks Failing**

   ```bash
   # Check logs
   tail -f /var/log/etb-api.log

   # Test individual targets
   python manage.py shell
   >>> from monitoring.services.health_checks import HealthCheckService
   >>> service = HealthCheckService()
   >>> service.execute_all_health_checks()
   ```

2. **Celery Tasks Not Running**

   ```bash
   # Check Celery status
   celery -A core inspect active

   # Check Redis connection (a healthy server replies PONG)
   redis-cli ping

   # Restart services
   sudo systemctl restart etb-celery
   sudo systemctl restart etb-celery-beat
   ```

3. **Alerts Not Sending**

   ```bash
   # Check email configuration
   python manage.py shell
   >>> from django.core.mail import send_mail
   >>> send_mail('Test', 'Test message', 'from@example.com', ['to@example.com'])

   # Check Slack webhook
   curl -X POST -H 'Content-type: application/json' \
     --data '{"text":"Test message"}' \
     YOUR_SLACK_WEBHOOK_URL
   ```

### Log Locations

- Django logs: `/var/log/etb-api.log`
- Celery logs: `/var/log/celery.log`
- Nginx logs: `/var/log/nginx/`
- System logs: `/var/log/syslog`

## Security Considerations

### 1. Authentication

- Use strong authentication tokens
- Implement token rotation
- Use HTTPS in production
- Restrict admin access

### 2. Data Protection

- Encrypt sensitive configuration data
- Use secure database connections
- Implement data retention policies
- Run regular security audits

### 3. Network Security

- Use firewalls to restrict access
- Implement rate limiting
- Monitor for suspicious activity
- Apply regular security updates

## Backup and Recovery

### 1. Database Backup

```bash
# PostgreSQL backup
pg_dump etb_api_monitoring > backup_$(date +%Y%m%d_%H%M%S).sql
```

Automated backup script:

```bash
#!/bin/bash
# Dump the database, then delete dumps older than 7 days
BACKUP_DIR="/backups/monitoring"
DATE=$(date +%Y%m%d_%H%M%S)
pg_dump etb_api_monitoring > "$BACKUP_DIR/backup_$DATE.sql"
find "$BACKUP_DIR" -name "backup_*.sql" -mtime +7 -delete
```

### 2. Configuration Backup

```bash
# Backup configuration files
tar -czf monitoring_config_$(date +%Y%m%d).tar.gz \
  /path/to/etb-api/core/settings.py \
  /path/to/etb-api/.env \
  /etc/systemd/system/etb-*.service
```

### 3. Recovery Procedures

1. Restore database from backup
2. Restore configuration files
3. Restart services
4. Verify monitoring functionality
5. Check alert rules and thresholds

## Support and Maintenance

### Regular Tasks

- **Daily**: Check system health and alerts
- **Weekly**: Review metrics trends and thresholds
- **Monthly**: Update monitoring configuration
- **Quarterly**: Review and optimize performance

### Monitoring the Monitor

- Set up external monitoring for the monitoring system
- Monitor Celery worker health
- Track database performance
- Monitor disk space usage

This guide covers a baseline deployment of monitoring for your ETB-API system. Adjust the configurations to your specific requirements and infrastructure.
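
The "Monitor disk space usage" item under Monitoring the Monitor can be covered by a small standalone check that runs outside Django and Celery, so it keeps working when the monitoring stack itself is down. The following is a minimal sketch using only the Python standard library. The function names `disk_usage_percent` and `exceeds_threshold` are hypothetical, and reading `PERFORMANCE_DISK_THRESHOLD` as a percentage is an assumption based on the `.env` example in this guide.

```python
import os
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of space used on the filesystem that holds path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Some pseudo-filesystems report a size of zero; avoid dividing by it.
        return 0.0
    return usage.used / usage.total * 100.0


def exceeds_threshold(value: float, threshold: float) -> bool:
    """Return True when a measured percentage is at or above the limit."""
    return value >= threshold


if __name__ == "__main__":
    # Assumption: PERFORMANCE_DISK_THRESHOLD is a percentage, as the
    # value 80 in the .env example suggests. Falls back to 80 when unset.
    threshold = float(os.environ.get("PERFORMANCE_DISK_THRESHOLD", "80"))
    used = disk_usage_percent("/")
    status = "ALERT" if exceeds_threshold(used, threshold) else "OK"
    print(f"{status}: disk usage {used:.1f}% (threshold {threshold:.0f}%)")
```

A script like this could be run from cron or a systemd timer on the monitoring host, with its output sent to whichever notification channel is independent of the ETB-API deployment.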