ETB-API/MONITORING_DEPLOYMENT_GUIDE.md
# ETB-API Monitoring System Deployment Guide

## Overview

This guide gives step-by-step instructions for deploying the monitoring system for the ETB-API platform. The system runs scheduled health checks, metrics collection, and alert evaluation through Celery, and is configured through the Django admin.

## Prerequisites

### System Requirements

- Python 3.10+ (the minimum version Django 5.2 supports)
- Django 5.2+
- PostgreSQL 14+ (recommended; the minimum version Django 5.2 supports) or SQLite (development)
- Redis 6+ (for Celery)
- Celery 5.3+

### Dependencies

- psutil>=5.9.0
- requests>=2.31.0
- celery>=5.3.0
- redis>=4.5.0

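Before installing, it can help to check which of the dependencies above are already present in the active environment. The following is a minimal sketch using only the standard library; the helper names `version_tuple` and `check_dependencies` are illustrative and not part of ETB-API, and the version comparison is deliberately simple (numeric components only).

```python
from importlib import metadata

# Minimum versions copied from the Dependencies list above.
MINIMUM_VERSIONS = {
    "psutil": "5.9.0",
    "requests": "2.31.0",
    "celery": "5.3.0",
    "redis": "4.5.0",
}


def version_tuple(version):
    """Turn '5.9.0' into (5, 9, 0); non-numeric suffixes such as 'rc1' are cut off."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return {package: problem} for every package that is missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[name] = f"{installed} is older than {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check_dependencies().items():
        print(f"{name}: {problem}")
```

An empty result means every listed package meets its minimum version.
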
## Installation Steps

### 1. Install Dependencies

```bash
# Install Python dependencies
# (quote each specifier so the shell does not treat ">" as an output redirect)
pip install "psutil>=5.9.0" "requests>=2.31.0" "celery>=5.3.0" "redis>=4.5.0"

# Install Redis (Ubuntu/Debian)
sudo apt-get install redis-server

# Install Redis (CentOS/RHEL)
sudo yum install redis
```

### 2. Database Setup

```bash
# Create and run migrations
python manage.py makemigrations monitoring
python manage.py migrate

# Create superuser (if not exists)
python manage.py createsuperuser
```

### 3. Initialize Monitoring Configuration

```bash
# Set up default monitoring targets, metrics, and alert rules
python manage.py setup_monitoring --admin-user admin
```

### 4. Configure Celery

Create or update `celery.py` in your project package (`core/`, next to `settings.py`):

```python
import os

from celery import Celery

# Point Celery at the Django settings module before the app is created
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'core.settings')

app = Celery('core')
# Read every CELERY_-prefixed option from Django's settings
app.config_from_object('django.conf:settings', namespace='CELERY')

# Add monitoring tasks schedule (intervals are in seconds)
app.conf.beat_schedule = {
    'health-checks': {
        'task': 'monitoring.tasks.execute_health_checks',
        'schedule': 60.0,  # Every minute
    },
    'metrics-collection': {
        'task': 'monitoring.tasks.collect_metrics',
        'schedule': 300.0,  # Every 5 minutes
    },
    'alert-evaluation': {
        'task': 'monitoring.tasks.evaluate_alerts',
        'schedule': 60.0,  # Every minute
    },
    'data-cleanup': {
        'task': 'monitoring.tasks.cleanup_old_data',
        'schedule': 86400.0,  # Daily
    },
    'system-status-report': {
        'task': 'monitoring.tasks.generate_system_status_report',
        'schedule': 300.0,  # Every 5 minutes
    },
}

# Find tasks.py modules in all installed Django apps
app.autodiscover_tasks()
```

### 5. Environment Configuration

Create a `.env` file or set the equivalent environment variables:

```bash
# Monitoring Settings
MONITORING_ENABLED=true
MONITORING_HEALTH_CHECK_INTERVAL=60
MONITORING_METRICS_COLLECTION_INTERVAL=300
MONITORING_ALERT_EVALUATION_INTERVAL=60

# Alerting Settings
ALERTING_EMAIL_FROM=monitoring@yourcompany.com
ALERTING_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
ALERTING_WEBHOOK_URL=https://your-webhook-url.com/alerts

# Performance Thresholds
PERFORMANCE_API_RESPONSE_THRESHOLD=2000
PERFORMANCE_CPU_THRESHOLD=80
PERFORMANCE_MEMORY_THRESHOLD=80
PERFORMANCE_DISK_THRESHOLD=80

# Email Configuration (for alerts)
EMAIL_HOST=smtp.gmail.com
EMAIL_PORT=587
EMAIL_USE_TLS=True
EMAIL_HOST_USER=your-email@gmail.com
EMAIL_HOST_PASSWORD=your-app-password
DEFAULT_FROM_EMAIL=monitoring@yourcompany.com
```

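The `celery.py` example in step 4 hard-codes its intervals, so the `MONITORING_*_INTERVAL` variables only take effect if something reads them. This guide does not show how ETB-API consumes them; the following is a minimal sketch of one way to derive the three interval-driven schedule entries from the environment. The helper names `interval_from_env` and `build_beat_schedule` are hypothetical, and the fallback values are the ones hard-coded in step 4.

```python
import os

# Fallbacks match the values hard-coded in the celery.py example.
DEFAULT_INTERVALS = {
    "MONITORING_HEALTH_CHECK_INTERVAL": 60.0,
    "MONITORING_METRICS_COLLECTION_INTERVAL": 300.0,
    "MONITORING_ALERT_EVALUATION_INTERVAL": 60.0,
}


def interval_from_env(name, env=os.environ):
    """Read an interval in seconds; fall back to the default when unset or invalid."""
    default = DEFAULT_INTERVALS[name]
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    # Reject zero and negative intervals rather than scheduling a busy loop.
    return value if value > 0 else default


def build_beat_schedule(env=os.environ):
    """Build the interval-driven beat schedule entries (hypothetical helper)."""
    return {
        "health-checks": {
            "task": "monitoring.tasks.execute_health_checks",
            "schedule": interval_from_env("MONITORING_HEALTH_CHECK_INTERVAL", env),
        },
        "metrics-collection": {
            "task": "monitoring.tasks.collect_metrics",
            "schedule": interval_from_env("MONITORING_METRICS_COLLECTION_INTERVAL", env),
        },
        "alert-evaluation": {
            "task": "monitoring.tasks.evaluate_alerts",
            "schedule": interval_from_env("MONITORING_ALERT_EVALUATION_INTERVAL", env),
        },
    }
```

In `celery.py`, the result could be merged into the schedule with `app.conf.beat_schedule.update(build_beat_schedule())`, provided the `.env` file has been loaded into the process environment first.
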
### 6. Start Services

```bash
# Start Redis first (if not running as service); Celery needs the broker to be up
redis-server

# Start Django development server (in separate terminal)
python manage.py runserver

# Start Celery worker (in separate terminal)
celery -A core worker -l info

# Start Celery beat scheduler (in separate terminal)
celery -A core beat -l info
```

## Production Deployment

### 1. Database Configuration

For production, use PostgreSQL:

```python
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'etb_api_monitoring',
        'USER': 'monitoring_user',
        'PASSWORD': 'secure_password',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
```

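Hard-coding the database password in `settings.py` makes it easy to commit by accident. As a hedged alternative, the sketch below builds the same `DATABASES` dictionary from environment variables. The `DB_*` variable names and the `database_settings` helper are assumptions for illustration; they are not defined anywhere in ETB-API.

```python
import os


def database_settings(env=os.environ):
    """Build Django's DATABASES dict from environment variables.

    The DB_* names are hypothetical; defaults mirror the example above.
    """
    password = env.get("DB_PASSWORD")
    if not password:
        # Fail at startup instead of connecting with an empty password.
        raise RuntimeError("DB_PASSWORD must be set")
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("DB_NAME", "etb_api_monitoring"),
            "USER": env.get("DB_USER", "monitoring_user"),
            "PASSWORD": password,
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    }
```

In `settings.py` this would be used as `DATABASES = database_settings()`.
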
### 2. Redis Configuration

```python
# settings.py
CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
```

### 3. Static Files and Media

```bash
# Collect static files
python manage.py collectstatic
```

Configure the web server to serve static and media files directly and proxy all other requests to Django (Nginx example):

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location /static/ {
        alias /path/to/your/static/files/;
    }

    location /media/ {
        alias /path/to/your/media/files/;
    }

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

### 4. Process Management

Use systemd services for production. Django's `runserver` is a development server and is not suitable for production, so the Django unit below starts Gunicorn instead; this assumes Gunicorn is installed in the virtualenv and that the project has the standard `core/wsgi.py` module.

**Django Service** (`/etc/systemd/system/etb-api.service`):
```ini
[Unit]
Description=ETB-API Django Application
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/path/to/etb-api
Environment=PATH=/path/to/etb-api/venv/bin
ExecStart=/path/to/etb-api/venv/bin/gunicorn core.wsgi:application --bind 0.0.0.0:8000
Restart=always

[Install]
WantedBy=multi-user.target
```

**Celery Worker Service** (`/etc/systemd/system/etb-celery.service`):
```ini
[Unit]
Description=ETB-API Celery Worker
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/path/to/etb-api
Environment=PATH=/path/to/etb-api/venv/bin
ExecStart=/path/to/etb-api/venv/bin/celery -A core worker -l info
Restart=always

[Install]
WantedBy=multi-user.target
```

**Celery Beat Service** (`/etc/systemd/system/etb-celery-beat.service`):
```ini
[Unit]
Description=ETB-API Celery Beat Scheduler
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/path/to/etb-api
Environment=PATH=/path/to/etb-api/venv/bin
ExecStart=/path/to/etb-api/venv/bin/celery -A core beat -l info
Restart=always

[Install]
WantedBy=multi-user.target
```

### 5. Enable Services

```bash
# Reload systemd so it picks up the new unit files
sudo systemctl daemon-reload

# Enable services at boot
# (on Debian/Ubuntu the Redis unit may be named redis-server)
sudo systemctl enable redis
sudo systemctl enable etb-api
sudo systemctl enable etb-celery
sudo systemctl enable etb-celery-beat

# Start Redis first, since the Celery services need the broker
sudo systemctl start redis
sudo systemctl start etb-api
sudo systemctl start etb-celery
sudo systemctl start etb-celery-beat
```

## Monitoring Configuration

### 1. Customize Monitoring Targets

Access the admin interface at `http://your-domain.com/admin/monitoring/` to:

- Add custom monitoring targets
- Configure health check intervals
- Set up external service monitoring
- Customize alert thresholds

### 2. Configure Alert Rules

Create alert rules for:

- **Performance Alerts**: High response times, error rates
- **Business Alerts**: SLA breaches, incident volume spikes
- **Security Alerts**: Failed logins, security events
- **Infrastructure Alerts**: High CPU, memory, disk usage

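To make the infrastructure alerts concrete, the sketch below compares a set of usage readings against the 80% thresholds from the environment configuration. It is an illustration only: the `find_breaches` helper is hypothetical, the guide does not show how `AlertRule` evaluates conditions, and treating a reading as a breach only when it is strictly above the limit is an assumption.

```python
# Percent limits taken from the PERFORMANCE_*_THRESHOLD values in the .env example.
THRESHOLDS = {"cpu": 80.0, "memory": 80.0, "disk": 80.0}


def find_breaches(readings, thresholds=THRESHOLDS):
    """Return a sorted list of (metric, value, limit) for readings above their limit.

    Metrics without a configured threshold are ignored.
    """
    breaches = []
    for metric, value in readings.items():
        limit = thresholds.get(metric)
        if limit is not None and value > limit:
            breaches.append((metric, value, limit))
    return sorted(breaches)


if __name__ == "__main__":
    sample = {"cpu": 91.5, "memory": 40.0, "disk": 80.0}
    for metric, value, limit in find_breaches(sample):
        print(f"{metric} at {value}% exceeds {limit}%")
```

With the sample readings, only `cpu` is reported; `disk` sits exactly at the limit and is not counted under the strict comparison.
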
### 3. Set Up Notification Channels

Configure notification channels:

- **Email**: Set up SMTP configuration
- **Slack**: Configure webhook URLs
- **Webhooks**: Set up external alerting systems
- **PagerDuty**: Integrate with incident management

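For the Slack channel, the message format is the same JSON body used by the `curl` test in the Troubleshooting section: an object with a `text` field, posted to the webhook URL. The following standard-library sketch shows that request in Python. The function names are hypothetical and this is not how ETB-API itself sends notifications, which the guide does not show.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build (but do not send) a POST request carrying a Slack text message."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url, text, timeout=10):
    """Send the message and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending keeps the payload easy to inspect without making a network call.
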
### 4. Create Custom Dashboards

Design dashboards for different user roles:

- **Executive Dashboard**: High-level KPIs and trends
- **Operations Dashboard**: Real-time system status
- **Security Dashboard**: Security metrics and alerts
- **Development Dashboard**: Application performance metrics

## Verification

### 1. Check System Health

```bash
# Check health check summary
curl -H "Authorization: Token your-token" \
     http://localhost:8000/api/monitoring/health-checks/summary/

# Check system overview
curl -H "Authorization: Token your-token" \
     http://localhost:8000/api/monitoring/overview/
```

### 2. Verify Celery Tasks

```bash
# List tasks that workers are currently executing
celery -A core inspect active

# List tasks with an ETA or countdown that workers have reserved
# (this does not show the periodic beat schedule)
celery -A core inspect scheduled
```

### 3. Test Alerting

```bash
# Trigger a test alert
python manage.py shell
>>> from monitoring.models import AlertRule
>>> rule = AlertRule.objects.first()
>>> # Manually trigger alert for testing
```

## Maintenance

### 1. Data Cleanup

The `data-cleanup` beat task removes old data once a day. To run the cleanup manually:

```bash
python manage.py shell
>>> from monitoring.tasks import cleanup_old_data
>>> # Queue the task on a Celery worker
>>> cleanup_old_data.delay()
```

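The guide does not state how `cleanup_old_data` decides what counts as old. A typical approach is a retention cutoff: anything recorded before `now - retention` is deleted. The sketch below illustrates that idea on plain timestamps; the 30-day retention period and the function names are assumptions, not values taken from ETB-API.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now, retention_days):
    """Return the oldest timestamp that should still be kept."""
    return now - timedelta(days=retention_days)


def split_by_retention(timestamps, now, retention_days=30):
    """Return (kept, expired); records older than the cutoff are expired."""
    cutoff = retention_cutoff(now, retention_days)
    kept = [ts for ts in timestamps if ts >= cutoff]
    expired = [ts for ts in timestamps if ts < cutoff]
    return kept, expired


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    samples = [now - timedelta(days=d) for d in (1, 29, 31, 90)]
    kept, expired = split_by_retention(samples, now)
    print(f"kept {len(kept)}, expired {len(expired)}")
```

In a Django task the same cutoff would usually be applied as a queryset filter on the model's timestamp field rather than on a Python list.
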
### 2. Performance Tuning

Monitor and tune:

- Health check intervals
- Metrics collection frequency
- Alert evaluation intervals
- Data retention periods

### 3. Scaling

For high-volume environments:

- Use multiple Celery workers
- Implement Redis clustering
- Use database read replicas
- Consider time-series databases for metrics

## Troubleshooting

### Common Issues

1. **Health Checks Failing**
   ```bash
   # Check logs
   tail -f /var/log/etb-api.log

   # Test individual targets
   python manage.py shell
   >>> from monitoring.services.health_checks import HealthCheckService
   >>> service = HealthCheckService()
   >>> service.execute_all_health_checks()
   ```

2. **Celery Tasks Not Running**
   ```bash
   # Check Celery status
   celery -A core inspect active

   # Check Redis connection
   redis-cli ping

   # Restart services
   sudo systemctl restart etb-celery
   sudo systemctl restart etb-celery-beat
   ```

3. **Alerts Not Sending**
   ```bash
   # Check email configuration
   python manage.py shell
   >>> from django.core.mail import send_mail
   >>> send_mail('Test', 'Test message', 'from@example.com', ['to@example.com'])

   # Check Slack webhook
   curl -X POST -H 'Content-type: application/json' \
        --data '{"text":"Test message"}' \
        YOUR_SLACK_WEBHOOK_URL
   ```

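When `redis-cli` is not installed on the application host, the broker check from item 2 can be approximated from Python. The sketch below opens a TCP connection and sends an inline `PING`, to which an unauthenticated Redis server replies `+PONG`. The function names are illustrative, and the host and port defaults assume the `redis://localhost:6379/0` broker URL used earlier in this guide.

```python
import socket


def is_pong(reply):
    """True when the raw reply bytes are Redis's simple-string answer to PING."""
    return reply.strip() == b"+PONG"


def redis_ping(host="localhost", port=6379, timeout=2.0):
    """Send an inline PING and report whether Redis answered PONG.

    Returns False on connection errors, timeouts, or any other reply
    (for example an authentication error when a password is required).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return is_pong(conn.recv(64))
    except OSError:
        return False


if __name__ == "__main__":
    print("Redis reachable" if redis_ping() else "Redis not reachable")
```

A `False` result only says the check failed; `redis-cli ping` or the Redis logs give the specific reason.
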
### Log Locations

- Django logs: `/var/log/etb-api.log`
- Celery logs: `/var/log/celery.log`
- Nginx logs: `/var/log/nginx/`
- System logs: `/var/log/syslog`

## Security Considerations

### 1. Authentication

- Use strong authentication tokens
- Implement token rotation
- Use HTTPS in production
- Restrict admin access

### 2. Data Protection

- Encrypt sensitive configuration data
- Use secure database connections
- Implement data retention policies
- Conduct regular security audits

### 3. Network Security

- Use firewalls to restrict access
- Implement rate limiting
- Monitor for suspicious activity
- Apply security updates regularly

## Backup and Recovery

### 1. Database Backup

```bash
# PostgreSQL backup
pg_dump etb_api_monitoring > backup_$(date +%Y%m%d_%H%M%S).sql

# Automated backup script (save as its own file so the shebang is the first line)
#!/bin/bash
BACKUP_DIR="/backups/monitoring"
DATE=$(date +%Y%m%d_%H%M%S)
pg_dump etb_api_monitoring > "$BACKUP_DIR/backup_$DATE.sql"
# Delete backups older than 7 days
find "$BACKUP_DIR" -name "backup_*.sql" -mtime +7 -delete
```

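The pruning step of the backup script can also be expressed in Python, which is convenient if backups are driven from a Celery task instead of cron. This is a sketch with a hypothetical `prune_old_backups` helper, and it is not an exact equivalent of the `find` command: `find -mtime +7` rounds file age down to whole days (so it matches files at least 8 days old) and descends into subdirectories, while the sketch uses a plain 7-day cutoff in a single directory.

```python
import time
from pathlib import Path


def prune_old_backups(backup_dir, max_age_days=7, now=None):
    """Delete backup_*.sql files last modified more than max_age_days ago.

    Returns the sorted names of the deleted files.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    deleted = []
    for path in Path(backup_dir).glob("backup_*.sql"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

Calling `prune_old_backups("/backups/monitoring")` after each dump keeps roughly one week of backups, matching the intent of the shell script.
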
### 2. Configuration Backup

```bash
# Backup configuration files
# (the archive includes secrets from .env, so store it securely)
tar -czf monitoring_config_$(date +%Y%m%d).tar.gz \
    /path/to/etb-api/core/settings.py \
    /path/to/etb-api/.env \
    /etc/systemd/system/etb-*.service
```

### 3. Recovery Procedures

1. Restore database from backup
2. Restore configuration files
3. Restart services
4. Verify monitoring functionality
5. Check alert rules and thresholds

## Support and Maintenance

### Regular Tasks

- **Daily**: Check system health and alerts
- **Weekly**: Review metrics trends and thresholds
- **Monthly**: Update monitoring configuration
- **Quarterly**: Review and optimize performance

### Monitoring the Monitor

- Set up external monitoring for the monitoring system
- Monitor Celery worker health
- Track database performance
- Monitor disk space usage

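One simple way to watch the monitoring system from outside is a heartbeat check: the health-check task runs every 60 seconds, so if no run has been recorded for several intervals, the worker or the beat scheduler has probably stopped. The sketch below shows only the staleness decision. Where the last-run timestamp comes from is left open, and the `is_stale` name and the 3-minute tolerance are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Three missed 60-second health-check runs; a hypothetical tolerance.
DEFAULT_MAX_AGE = timedelta(minutes=3)


def is_stale(last_heartbeat, now, max_age=DEFAULT_MAX_AGE):
    """True when no heartbeat has been recorded within max_age.

    A missing heartbeat (None) is treated as stale.
    """
    if last_heartbeat is None:
        return True
    return now - last_heartbeat > max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_stale(now - timedelta(minutes=10), now))
```

An external scheduler such as cron on a separate host can run this check and alert through a channel that does not depend on the ETB-API Celery workers.
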
This guide covers a baseline deployment of the ETB-API monitoring system. Adjust intervals, thresholds, and service configuration to match your requirements and infrastructure.