Applications

Likelihood Ratio Tests

If we believe reasonable pipelines can phrase their predictions in terms of a likelihood ratio test, then we expect them to currently compute

\[\Lambda(h) = \frac{p(h|signal;c)}{p(h|noise)}\]

which is based on information from \(h(t)\) alone. By signal vs. noise, we really mean “signal \(\oplus\) noise” vs. “only noise,” where the noise model includes both pure Gaussian noise and Gaussian noise contaminated by non-Gaussian noise artifacts. We would like to compute a more general likelihood ratio that includes information from auxiliary channels as well

\[\begin{split}\Lambda(h,a) & = \frac{p(h, a|signal)}{p(h, a|noise)} \\ & = \frac{p(h, a|signal;g)p(g) + p(h, a|signal;c)p(c)}{p(h, a|noise; g)p(g) + p(h, a|noise; c)p(c)} \\ & \geq \frac{p(h, a|signal; c)p(c)}{p(h, a|noise; g)p(g) + p(h, a|noise; c)p(c)}\end{split}\]

Formally, both the \(g\) and \(c\) models are contained within the noise model; they simply correspond to the presence or absence of a non-Gaussian noise artifact, respectively. Because the correlations between \(h\) and \(a\) are expected to be strong when both a signal and a glitch are present, making that term difficult to model, we neglect the \(p(h, a|signal; g)\) term and place a lower bound on the likelihood ratio. Pragmatically, this should have little effect on the overall estimate because the chances of a signal and a glitch occurring simultaneously should be small; GW170817 is a noteworthy counter-example. Now, we can further simplify this expression if we assume that \(h\) and \(a\) are independent for each of the models we consider. Specifically,

\[\begin{split}p(h, a|signal; c) & = p(h|signal; c)p(a|c) \\ p(h, a|noise) & = p(h|noise)p(a|noise) = p(h|noise)(p(a|g)p(g) + p(a|c)p(c))\end{split}\]

There may be correlations between \(h\) and \(a\) for the noise model as well, but we neglect those for now in this Naive Bayes-like approach. While this may produce quantitatively incorrect estimates of \(p(h, a|noise)\), Naive Bayes estimates often preserve the ordinal ranking of the full function, which is all we really need to rank candidate events correctly.

With this in hand, we can write

\[\begin{split}\Lambda(h,a) & = \frac{p(h|signal; c)}{p(h|noise)}\frac{p(a|c)p(c)}{p(a|g)p(g) + p(a|c)p(c)} \\ & = \Lambda(h) \frac{1}{1 + p(a|g)p(g)/p(a|c)p(c)}\end{split}\]

and obtain a multiplicative correction factor for the current likelihood ratio.
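To make this concrete, the correction can be evaluated directly once estimates of \(p(a|g)\), \(p(a|c)\), and the priors are in hand. A minimal sketch in Python; the function name `aux_correction` and the numerical values are purely illustrative, not part of any existing pipeline:

```python
def aux_correction(p_a_g, p_a_c, p_g, p_c):
    """Multiplicative correction to Lambda(h) from auxiliary data.

    Evaluates 1 / (1 + p(a|g)p(g) / (p(a|c)p(c))) from the
    factorized likelihood ratio above.
    """
    return 1.0 / (1.0 + (p_a_g * p_g) / (p_a_c * p_c))

# example: auxiliary data favoring a glitch suppresses Lambda(h)
lam_h = 100.0  # likelihood ratio from h(t) alone
lam_ha = lam_h * aux_correction(p_a_g=50.0, p_a_c=0.1, p_g=0.01, p_c=0.99)
print(lam_ha)  # ~16.5, suppressed relative to lam_h
```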

If this factorization holds, we simply have to measure \(p(a|g)\) and \(p(a|c)\), which we will approximate through statistical inference techniques via

\[\begin{split}p(a|g) & \sim p(a)p(g|h) \\ p(a|c) & \sim p(a)p(c|h) \approx p(a) (1 - p(g|h))\end{split}\]

Within canonical machine learning contexts, this could be thought of as a regression problem, in which we attempt to estimate \(p(a|g)\) via samples of \(a\) weighted by \(p(g|h)\). In this sense, our 2-class classification via supervised learning infers the optimal decision surface between these models, and calibrated measures of the machine learning output \(r(a)\) approximate \(p(r(a)|g)\) and \(p(r(a)|c)\).
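As an illustration of the weighted-sample idea, one could approximate \(p(a|g)\) and \(p(a|c)\) with weighted, normalized histograms of recorded auxiliary samples, using \(p(g|h)\) as the weights. A minimal sketch assuming a single scalar auxiliary feature; the variable names and synthetic data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical training set: one scalar auxiliary feature a_i per sample,
# with glitch probabilities p(g|h_i) inferred from the h(t) channel
a_samples = rng.normal(size=1000)
w_glitch = rng.uniform(size=1000)   # p(g|h) for each sample
w_clean = 1.0 - w_glitch            # p(c|h) ~ 1 - p(g|h)

bins = np.linspace(-4.0, 4.0, 41)

# weighted, normalized histograms approximate p(a|g) and p(a|c)
p_a_g, _ = np.histogram(a_samples, bins=bins, weights=w_glitch, density=True)
p_a_c, _ = np.histogram(a_samples, bins=bins, weights=w_clean, density=True)
```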

We note that this approach will likely work well for short-duration signals with timescales comparable to those of typical glitches. For searches targeting signals with timescales longer than typical glitches, in which only part of the data may be polluted by glitches, more complex approaches may be needed.

Filtering

If we assume the time domain can be broken into a set of consecutive independent trials, or noise realizations, we can apply the likelihood ratio test on each segment separately. The overall glitch likelihood for the entire duration of the signal would then just be a product of the per-segment likelihoods. This answers the question, “was there a glitch at any time during the signal’s duration?” and ignores when that glitch occurred, which is different from, “did a glitch cause the apparent signal?”
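One concrete reading of this: the probability of a glitch occurring anywhere within the signal's duration follows from the product of the per-segment clean probabilities. A minimal sketch, with a hypothetical helper `p_glitch_anywhere` and made-up per-segment probabilities:

```python
import numpy as np

def p_glitch_anywhere(p_g_segments):
    """Probability that a glitch occurred in at least one of the
    independent segments spanning the signal's duration.

    p_g_segments : per-segment glitch probabilities p(g|a)
    """
    p = np.asarray(p_g_segments, dtype=float)
    # P(no glitch in any segment) is the product of per-segment clean
    # probabilities; accumulate in log space to avoid underflow
    log_p_all_clean = np.sum(np.log1p(-p))
    return 1.0 - np.exp(log_p_all_clean)

print(p_glitch_anywhere([0.01, 0.02, 0.5]))  # ~0.515
```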

A possible way to include when the glitch occurred is to define modified matched filters and/or detection statistics. Some possibilities, assuming an optimal filter \(f(t)\) in Gaussian noise, are

\[\begin{split}\rho(t) & = \int d\tau\, f(t-\tau) h(\tau) \\ \rho_g(t) & = \int d\tau\, f(t-\tau) h(\tau) p(g|a(\tau)) \\ \rho_c(t) & = \int d\tau\, f(t-\tau) h(\tau) p(c|a(\tau)) = \rho(t) - \rho_g(t)\end{split}\]

Searches could then define a likelihood ratio incorporating all of these detection statistics instead of just \(\rho\). This in effect apportions the signal-to-noise between glitchy and clean times. While this heuristically feels like a reasonable thing to compute, we do not currently have any demonstration that it is actually optimal.
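A rough discrete sketch of these statistics, assuming \(f\), \(h\), and \(p(g|a(t))\) are all sampled on the same uniform time grid; the function name is illustrative, and this ignores the whitening and normalization a real search would apply:

```python
import numpy as np

def weighted_snr_series(f, h, p_g_aux):
    """Discrete analogs of rho(t), rho_g(t), and rho_c(t).

    f        : optimal filter in Gaussian noise, sampled like h
    h        : strain time series
    p_g_aux  : p(g|a(t)) evaluated at each sample of h
    """
    rho = np.convolve(h, f, mode="same")
    rho_g = np.convolve(h * p_g_aux, f, mode="same")  # glitchy-time contribution
    rho_c = rho - rho_g                               # since p(c|a) = 1 - p(g|a)
    return rho, rho_g, rho_c
```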

Similarly, one could define integrals of the amount of time within the signal’s duration associated with glitchy and clean times:

\[\begin{split}\Delta t & = \int\limits_{-\Delta t/2}^{+\Delta t/2} dt \\ \Delta t_g & = \int\limits_{-\Delta t/2}^{+\Delta t/2} dt\, p(g|a(t)) \\ \Delta t_c & = \int\limits_{-\Delta t/2}^{+\Delta t/2} dt\, p(c|a(t)) = \Delta t - \Delta t_g\end{split}\]
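A minimal discrete sketch of these integrals, assuming \(p(g|a(t))\) is sampled uniformly over the signal's duration; the function name and arguments are illustrative only:

```python
import numpy as np

def glitchy_and_clean_time(p_g_aux, dt_sample):
    """Riemann-sum estimates of Delta t, Delta t_g, and Delta t_c.

    p_g_aux   : p(g|a(t)) sampled uniformly over the signal's duration
    dt_sample : spacing between samples in seconds
    """
    dt_total = len(p_g_aux) * dt_sample
    dt_glitch = np.sum(p_g_aux) * dt_sample  # time weighted by glitch probability
    dt_clean = dt_total - dt_glitch          # since p(c|a) = 1 - p(g|a)
    return dt_total, dt_glitch, dt_clean
```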

Again, it is not known whether these heuristics are actually useful.