Monitoring
Overview
ligo-scald supports monitoring through Nagios integration for alerting, and HTTPS for secure connections to database backends.
Nagios Integration
Set up Nagios checks based on thresholds or job heartbeats to track the health of your gravitational-wave data systems.
The monitoring system can track:
Threshold alerts - Trigger when data values exceed specified limits
Heartbeat monitoring - Detect when data streams stop or become stale
System health - Monitor database connectivity and data flow
Configuration
Nagios checks are configured in your scald configuration file under the nagios section:
nagios:
  my_check:
    lookback: 300  # seconds
    alert_type: threshold
    backend: default
    alert_settings:
      threshold: 10.0
      threshold_units: "Hz"
schemas:
  my_check:
    measurement: sensor_data
    column: frequency
    tag_key: detector
    tags: ['H1', 'L1', 'V1']
    aggregate: max
Alert Types
Threshold Alerts
Monitor when values exceed specified thresholds:
nagios:
  frequency_check:
    lookback: 600
    alert_type: threshold
    alert_settings:
      threshold: 50.0
      threshold_units: "Hz"
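As a rough illustration of what a threshold check computes, the sketch below reduces the values seen in the lookback window with an aggregate and maps the result to a standard Nagios plugin exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The helper name `threshold_status`, the sample values, and the choice to report UNKNOWN for an empty window are assumptions made for illustration; this is not ligo-scald's implementation.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


# Hypothetical helper: not part of ligo-scald's API.
def threshold_status(values, threshold, aggregate=max):
    """Return a Nagios status code for the values seen in the lookback window."""
    if not values:
        # Assumption: an empty window is reported as UNKNOWN.
        return UNKNOWN
    return CRITICAL if aggregate(values) > threshold else OK


# Made-up frequency samples (Hz) from a 600 s lookback window.
samples = [42.1, 47.9, 51.3]
status = threshold_status(samples, threshold=50.0)
print(status)
```

Here the comparison is strict, so a value exactly equal to the threshold still counts as OK; whether the real check treats the boundary the same way is not stated in this document.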
Heartbeat Monitoring
Detect when data becomes stale or stops flowing:
nagios:
  data_heartbeat:
    lookback: 300
    alert_type: heartbeat
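The idea behind a heartbeat check can be sketched as a staleness test: if the newest data point is older than the lookback window, the stream is considered stale. The helper name `heartbeat_status` and the timestamps below are hypothetical, and the sketch assumes timestamps and `lookback` are both expressed in seconds on the same clock; ligo-scald's own logic may differ.

```python
import time

# Standard Nagios plugin exit codes used in this sketch.
OK, CRITICAL = 0, 2


# Hypothetical helper: not part of ligo-scald's API.
def heartbeat_status(last_timestamp, lookback, now=None):
    """Return CRITICAL if the newest data point is older than `lookback` seconds."""
    now = time.time() if now is None else now
    return CRITICAL if (now - last_timestamp) > lookback else OK


# Fixed, made-up timestamps so the result is reproducible.
fresh = heartbeat_status(last_timestamp=1_000_000, lookback=300, now=1_000_200)
stale = heartbeat_status(last_timestamp=1_000_000, lookback=300, now=1_000_400)
print(fresh, stale)
```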
API Endpoint
Nagios checks are accessible via the HTTP API:
curl http://localhost:8080/api/nagios/my_check
The endpoint returns JSON-formatted status information compatible with Nagios monitoring systems.
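The same request can be made from Python using only the standard library. This is a minimal sketch: the host, port, and path are taken from the curl example above, the helper names `check_url` and `fetch_check` are hypothetical, and the payload is returned as-is because the JSON field names are not specified in this document.

```python
import json
from urllib.request import urlopen

# Host, port, and path copied from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8080/api/nagios"


def check_url(check_name, base_url=BASE_URL):
    """Build the URL for a named Nagios check."""
    return f"{base_url.rstrip('/')}/{check_name}"


def fetch_check(check_name, timeout=10.0):
    """Fetch a check and return the decoded JSON payload unchanged."""
    with urlopen(check_url(check_name), timeout=timeout) as response:
        return json.load(response)


# Usage (requires a running scald server):
# payload = fetch_check("my_check")
# print(payload)
```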
HTTPS Configuration
Enable secure HTTPS connections to database backends by configuring authentication and SSL in your configuration file.
Basic HTTPS Setup
backends:
  default:
    backend: influxdb
    db: your_database
    hostname: your_hostname
    port: 8086
    auth: true
    https: true
Authentication Credentials
Scald requires database authentication credentials, which it reads from environment variables. Alternatively, they can be provided in a .netrc file inside the Scald config directory.
Example .netrc format:
machine your_hostname
login your_username
password your_password
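To confirm that a .netrc file in this format parses as expected, Python's standard `netrc` module can read it. The sketch below writes the placeholder entry from the example to a temporary file and looks it up by hostname. The helper name `read_credentials` is hypothetical, and whether ligo-scald itself uses the `netrc` module is an assumption not confirmed here.

```python
import netrc
import os
import tempfile


# Hypothetical helper: not part of ligo-scald's API.
def read_credentials(netrc_path, hostname):
    """Return (login, password) for `hostname`, or None if there is no entry."""
    entry = netrc.netrc(netrc_path).authenticators(hostname)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# Demonstrate with a throwaway file holding the placeholder values.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, ".netrc")
    with open(path, "w") as f:
        f.write("machine your_hostname\nlogin your_username\npassword your_password\n")
    creds = read_credentials(path, "your_hostname")
    missing = read_credentials(path, "other_hostname")

print(creds)
```

Because a .netrc file stores the password in plain text, it is usually kept readable by its owner only (for example with mode 600).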
SSL Certificate Configuration
For HTTPS connections, specify the SSL certificate path as an environment variable:
export SCALD_SSL_CA_CERT="/etc/pki/tls/certs/your_cert_name"
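A quick way to check that the variable points at a usable certificate is to load it into an SSL context with Python's standard `ssl` module. This is a sketch under assumptions: the helper name `ssl_context_from_env` is hypothetical, and it only shows how such a path could be validated, not how ligo-scald consumes the variable internally.

```python
import os
import ssl


# Hypothetical helper: not part of ligo-scald's API.
def ssl_context_from_env(environ=os.environ):
    """Build an SSL context that trusts the CA file named by SCALD_SSL_CA_CERT."""
    ca_path = environ.get("SCALD_SSL_CA_CERT")
    if not ca_path:
        raise RuntimeError("SCALD_SSL_CA_CERT is not set")
    if not os.path.isfile(ca_path):
        raise FileNotFoundError(f"CA certificate not found: {ca_path}")
    # Raises if the file cannot be loaded as a CA certificate.
    return ssl.create_default_context(cafile=ca_path)


# Usage (after exporting SCALD_SSL_CA_CERT in the shell):
# context = ssl_context_from_env()
```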
Security Best Practices
When configuring monitoring with HTTPS:
Use strong authentication - Ensure database credentials are secure
Verify SSL certificates - Use trusted certificate authorities
Limit access - Restrict network access to monitoring endpoints
Regular updates - Keep SSL certificates current
Monitor logs - Review access logs for suspicious activity
Environment Variables
Key environment variables for monitoring:
SCALD_SSL_CA_CERT - Path to SSL certificate file
SCALDRC_PATH - Path to configuration file
Database credentials (via .netrc or environment variables)
Troubleshooting
Common monitoring issues:
SSL certificate errors - Verify certificate path and validity
Authentication failures - Check credentials and permissions
Network connectivity - Ensure database hosts are accessible
Nagios timeout - Adjust lookback periods for slow queries
False positives - Fine-tune threshold values and alert settings