Monitoring

Overview

ligo-scald supports monitoring through Nagios integration for alerting, and HTTPS for secure connections to database backends.

Nagios Integration

Nagios checks can be driven by thresholds or job heartbeats, so that alerts reflect the health of your gravitational-wave data systems.

The monitoring system can track:

  • Threshold alerts - Trigger when data values exceed specified limits

  • Heartbeat monitoring - Detect when data streams stop or become stale

  • System health - Monitor database connectivity and data flow

Configuration

Nagios checks are configured in your scald configuration file under the nagios section:

nagios:
  my_check:
    lookback: 300  # seconds
    alert_type: threshold
    backend: default
    alert_settings:
      threshold: 10.0
      threshold_units: "Hz"

schemas:
  my_check:
    measurement: sensor_data
    column: frequency
    tag_key: detector
    tags: ['H1', 'L1', 'V1']
    aggregate: max

Alert Types

Threshold Alerts

Raise an alert when values exceed a specified threshold:

nagios:
  frequency_check:
    lookback: 600
    alert_type: threshold
    alert_settings:
      threshold: 50.0
      threshold_units: "Hz"
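
To illustrate what a threshold check decides, here is a minimal sketch that maps a window of values to a Nagios status. It is not ligo-scald's implementation: threshold_status is a hypothetical function, the use of the maximum mirrors the aggregate: max setting from the Configuration example, and mapping a breach to CRITICAL and an empty window to UNKNOWN are assumptions. The numeric codes are the standard Nagios plugin return codes.

```python
# Standard Nagios plugin return codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def threshold_status(values, threshold, units=""):
    """Map the values seen in the lookback window to a (code, message) pair.

    Hypothetical sketch: a breach is reported as CRITICAL and an empty
    window as UNKNOWN; the real service may choose differently.
    """
    if not values:
        return UNKNOWN, "UNKNOWN: no data in lookback window"
    peak = max(values)  # mirrors `aggregate: max`
    if peak > threshold:  # "exceed" is read as strictly greater than
        return CRITICAL, f"CRITICAL: max {peak} {units} exceeds {threshold} {units}"
    return OK, f"OK: max {peak} {units} within {threshold} {units}"


code, message = threshold_status([42.0, 49.9, 47.3], threshold=50.0, units="Hz")
print(code, message)
```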

Heartbeat Monitoring

Detect when data becomes stale or stops flowing:

nagios:
  data_heartbeat:
    lookback: 300
    alert_type: heartbeat
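
The staleness test behind a heartbeat check can be sketched as a comparison between the age of the newest data point and the lookback window. This is an illustration under assumptions, not ligo-scald's code: heartbeat_status is a hypothetical function, timestamps are taken to be Unix seconds, and stale or missing data is reported as CRITICAL.

```python
import time

# Standard Nagios plugin return codes.
OK, CRITICAL = 0, 2


def heartbeat_status(last_timestamp, lookback, now=None):
    """Return (code, message) based on the age of the newest data point.

    Hypothetical sketch: `last_timestamp` is the Unix time of the most
    recent point (or None if there is none) and `lookback` is in seconds.
    """
    now = time.time() if now is None else now
    if last_timestamp is None:
        return CRITICAL, "CRITICAL: no data found"
    age = now - last_timestamp
    if age > lookback:
        return CRITICAL, f"CRITICAL: last data point is {age:.0f} s old"
    return OK, f"OK: last data point is {age:.0f} s old"


# A point that arrived 120 s ago is fresh for a 300 s lookback.
status = heartbeat_status(last_timestamp=1000.0, lookback=300, now=1120.0)
print(status)
```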

API Endpoint

Nagios checks are accessible via the HTTP API:

curl http://localhost:8080/api/nagios/my_check

The endpoint returns JSON-formatted status information compatible with Nagios monitoring systems.
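
The same request can be made from Python with the standard library. This is a minimal sketch: nagios_url and fetch_check are hypothetical helper names, the base URL is the one from the curl example, and no assumption is made about the fields in the returned JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def nagios_url(check_name, base_url="http://localhost:8080"):
    """Build the endpoint URL for a named check (path from the curl example)."""
    return f"{base_url.rstrip('/')}/api/nagios/{quote(check_name, safe='')}"


def fetch_check(check_name, base_url="http://localhost:8080", timeout=10):
    """Request the check and return the decoded JSON body."""
    with urlopen(nagios_url(check_name, base_url), timeout=timeout) as response:
        return json.load(response)


print(nagios_url("my_check"))
```

With a scald server running locally, fetch_check("my_check") returns the parsed response; it raises urllib.error.URLError if the server cannot be reached.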

HTTPS Configuration

Enable secure HTTPS connections to database backends by configuring authentication and SSL in your configuration file.

Basic HTTPS Setup

backends:
  default:
    backend: influxdb
    db: your_database
    hostname: your_hostname
    port: 8086
    auth: true
    https: true

Authentication Credentials

Scald requires database authentication credentials. These can be supplied either as environment variables or in a .netrc file inside the Scald config directory.

Example .netrc format:

machine your_hostname
login your_username
password your_password
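
To confirm that a .netrc file in this format parses and contains an entry for your database host, Python's standard netrc module can read it. This is a sketch for checking the file by hand; read_credentials is a hypothetical helper, and it does not show how Scald itself loads the credentials.

```python
import netrc


def read_credentials(netrc_path, hostname):
    """Return (login, password) for `hostname` from the given .netrc file."""
    entry = netrc.netrc(netrc_path).authenticators(hostname)
    if entry is None:
        raise KeyError(f"no entry for {hostname!r} in {netrc_path}")
    login, _account, password = entry
    return login, password
```

For example, read_credentials("/path/to/.netrc", "your_hostname") returns the login and password from the matching machine entry, and raises KeyError if there is none.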

SSL Certificate Configuration

For HTTPS connections, specify the SSL certificate path as an environment variable:

export SCALD_SSL_CA_CERT="/etc/pki/tls/certs/your_cert_name"
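
For a sense of how a client might use this variable, the sketch below builds a verifying TLS context from it with Python's ssl module. It assumes the variable points at a CA certificate file in PEM format, as its name suggests; build_ssl_context is a hypothetical helper, not Scald's own code.

```python
import os
import ssl


def build_ssl_context(env=os.environ):
    """Create a verifying TLS context, trusting SCALD_SSL_CA_CERT if it is set.

    Hypothetical sketch: when the variable is unset or empty, the system's
    default trusted certificates are used instead.
    """
    ca_cert = env.get("SCALD_SSL_CA_CERT")
    return ssl.create_default_context(cafile=ca_cert)


context = build_ssl_context({})  # empty mapping: fall back to system defaults
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

A path that does not exist makes ssl.create_default_context raise an OSError, which is a quick way to catch a mistyped certificate path.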

Security Best Practices

When configuring monitoring with HTTPS:

  1. Use strong authentication - Ensure database credentials are secure

  2. Verify SSL certificates - Use trusted certificate authorities

  3. Limit access - Restrict network access to monitoring endpoints

  4. Regular updates - Keep SSL certificates current

  5. Monitor logs - Review access logs for suspicious activity

Environment Variables

Key environment variables for monitoring:

  • SCALD_SSL_CA_CERT - Path to SSL certificate file

  • SCALDRC_PATH - Path to configuration file

  • Database credentials (via .netrc or environment variables)

Troubleshooting

Common monitoring issues:

  1. SSL certificate errors - Verify certificate path and validity

  2. Authentication failures - Check credentials and permissions

  3. Network connectivity - Ensure database hosts are accessible

  4. Nagios timeouts - Shorten the lookback period if queries are slow

  5. False positives - Fine-tune threshold values and alert settings
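
For the network connectivity item above, a quick way to rule out a blocked or unreachable database host is to attempt a TCP connection to it. This is a minimal sketch: can_connect is a hypothetical helper, and the default port 8086 is taken from the Basic HTTPS Setup example.

```python
import socket


def can_connect(hostname, port=8086, timeout=5.0):
    """Return True if a TCP connection to hostname:port succeeds in time."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

A True result only shows that the port is open; authentication and certificate problems still need to be checked separately.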