==========
Monitoring
==========

.. contents::
   :local:

Overview
~~~~~~~~

ligo-scald provides comprehensive monitoring capabilities, including Nagios integration for alerting and HTTPS support for secure connections to database backends.

Nagios Integration
~~~~~~~~~~~~~~~~~~

Set up Nagios monitoring based on thresholds or job heartbeats to monitor the health of your gravitational-wave data systems.

The monitoring system can track:

- **Threshold alerts** - Trigger when data values exceed specified limits
- **Heartbeat monitoring** - Detect when data streams stop or become stale
- **System health** - Monitor database connectivity and data flow

Configuration
~~~~~~~~~~~~~

Nagios checks are configured in your scald configuration file under the ``nagios`` section:

.. code-block:: yaml

    nagios:
      my_check:
        lookback: 300  # seconds
        alert_type: threshold
        backend: default
        alert_settings:
          threshold: 10.0
          threshold_units: "Hz"

    schemas:
      my_check:
        measurement: sensor_data
        column: frequency
        tag_key: detector
        tags: ['H1', 'L1', 'V1']
        aggregate: max

Alert Types
~~~~~~~~~~~

Threshold Alerts
^^^^^^^^^^^^^^^^

Monitor when values exceed specified thresholds:

.. code-block:: yaml

    nagios:
      frequency_check:
        lookback: 600
        alert_type: threshold
        alert_settings:
          threshold: 50.0
          threshold_units: "Hz"

Heartbeat Monitoring
^^^^^^^^^^^^^^^^^^^^

Detect when data becomes stale or stops flowing:

.. code-block:: yaml

    nagios:
      data_heartbeat:
        lookback: 300
        alert_type: heartbeat

API Endpoint
~~~~~~~~~~~~

Nagios checks are accessible via the HTTP API:

.. code-block:: bash

    curl http://localhost:8080/api/nagios/my_check

The endpoint returns JSON-formatted status information compatible with Nagios monitoring systems.

HTTPS Configuration
~~~~~~~~~~~~~~~~~~~

Enable secure HTTPS connections to database backends by configuring authentication and SSL in your configuration file.

Basic HTTPS Setup
^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    backends:
      default:
        backend: influxdb
        db: your_database
        hostname: your_hostname
        port: 8086
        auth: true
        https: true

Authentication Credentials
^^^^^^^^^^^^^^^^^^^^^^^^^^

Scald requires database authentication credentials as environment variables. These can be provided in a ``.netrc`` file inside the Scald config directory.

Example ``.netrc`` format:

.. code-block:: text

    machine your_hostname login your_username password your_password

SSL Certificate Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For HTTPS connections, specify the SSL certificate path as an environment variable:

.. code-block:: bash

    export SCALD_SSL_CA_CERT="/etc/pki/tls/certs/your_cert_name"

Security Best Practices
~~~~~~~~~~~~~~~~~~~~~~~

When configuring monitoring with HTTPS:

1. **Use strong authentication** - Ensure database credentials are secure
2. **Verify SSL certificates** - Use trusted certificate authorities
3. **Limit access** - Restrict network access to monitoring endpoints
4. **Regular updates** - Keep SSL certificates current
5. **Monitor logs** - Review access logs for suspicious activity

Environment Variables
~~~~~~~~~~~~~~~~~~~~~

Key environment variables for monitoring:

- ``SCALD_SSL_CA_CERT`` - Path to SSL certificate file
- ``SCALDRC_PATH`` - Path to configuration file
- Database credentials (via ``.netrc`` or environment variables)

Troubleshooting
~~~~~~~~~~~~~~~

Common monitoring issues:

1. **SSL certificate errors** - Verify certificate path and validity
2. **Authentication failures** - Check credentials and permissions
3. **Network connectivity** - Ensure database hosts are accessible
4. **Nagios timeout** - Adjust lookback periods for slow queries
5. **False positives** - Fine-tune threshold values and alert settings
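The first two items in this list can be checked from a script before a monitor is started. The following is a minimal sketch using only the Python standard library; it is not part of ligo-scald. The function name ``preflight`` and its arguments are hypothetical, and the location of the ``.netrc`` file is assumed to be passed in explicitly, since it depends on where your Scald config directory lives. The sketch only confirms that the certificate path exists and that a ``.netrc`` entry is present for the host; it does not validate the certificate or test the credentials against the database.

.. code-block:: python

    import netrc
    import os
    from pathlib import Path


    def preflight(hostname, netrc_path, env=None):
        """Return a list of problems found; an empty list means none were found.

        Hypothetical helper for illustration, not a ligo-scald API.
        """
        env = os.environ if env is None else env
        problems = []

        # SCALD_SSL_CA_CERT should point at an existing certificate file.
        cert = env.get("SCALD_SSL_CA_CERT")
        if not cert:
            problems.append("SCALD_SSL_CA_CERT is not set")
        elif not Path(cert).is_file():
            problems.append(f"certificate not found: {cert}")

        # The .netrc file should be readable and contain the database host.
        try:
            entry = netrc.netrc(netrc_path).authenticators(hostname)
        except (OSError, netrc.NetrcParseError) as exc:
            problems.append(f"cannot read {netrc_path}: {exc}")
        else:
            if entry is None:
                problems.append(f"no .netrc entry for {hostname}")

        return problems

Calling ``preflight("your_hostname", "/path/to/.netrc")`` with the placeholder values replaced by your own returns a list of messages that can be logged or used to abort startup. Returning a list rather than raising on the first failure lets one run report both a missing certificate and a missing credential entry.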