idq.classifiers.ovl

class idq.classifiers.ovl.DOVL(*args, **kwargs)[source]

Discrete OVL: a modified version of OVL that trains on discrete samples to estimate the deadtime. Note that there is an extension of this (DOVLfap) that estimates the False Alarm Probability instead of the deadtime, but still uses discrete samples. The actual implementation of the OVL algorithm itself lives in a subclass (idq.classifiers.OVL), because DOVL has a standard training signature while OVL does not.

Methods “overwritten” from OVL (although we don’t really have convenient access to these…): train, a trivial delegation that we overwrite simply so the signature has a clearer variable name, and _recalculate.

calibrate(dataset, bounded=True, **kwargs)[source]

Calibrates this algorithm based on the dataset of feature vectors. Requires all FeatureVectors in the dataset to have been evaluated. This should update self._calibration_map.

evaluate(dataset)[source]

Sets the ranks for these feature vectors. Modifies the objects in place!

feature_importance()[source]

Delegates to Vetolist.feature_importance.

feature_importance_figure(dataset, start, end, t0, **kwargs)[source]

Delegates to Vetolist.feature_importance_figure, with a few additions.

feature_importance_table(dataset=None, **kwargs)[source]

Delegates to Vetolist.feature_importance_table.

redundancies(dataset)[source]

Computes the intersection and overlap of veto segments for each possible configuration. This should contain all information necessary to determine which channels are redundant.

We only return information based on the current model, which may have trained itself down to a subset of the total possible configurations.

Returns (table, headers): table is a matrix of the livetimes of the intersections of veto segments from each configuration pair; headers is a list of the (channel, threshold, window) tuples, in the same order as the columns of table.

timeseries(info, dataset_factory, dt=0.00390625, segs=None, set_ok=None)[source]

Delegates to Vetolist.timeseries. Returns ranks.

train(dataset)[source]

Instantiates a Vetolist and trains using the data within a dataset.

Algorithmic parameters include:

  • channels

  • thresholds

  • windows

  • num_recalculate

  • incremental

  • minima (key, value pairs)

and are specified through self.kwargs, set during instantiation.

NOTE: We do not explicitly call self._check_columns(datachunk) and instead assume the user has already done this when constructing a dataset.
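For illustration, a hedged sketch of how these parameters might be supplied. The channel names and values below are invented, dataset is assumed to be built elsewhere, and any other required constructor arguments are omitted:

    from idq.classifiers.ovl import DOVL

    # Illustrative values only; these kwargs mirror the algorithmic
    # parameters listed above and are stored in self.kwargs at instantiation.
    clf = DOVL(
        channels=["H1:AUX-CHANNEL_A", "H1:AUX-CHANNEL_B"],  # hypothetical channel names
        thresholds=[15, 30, 100],      # significance thresholds to scan
        windows=[0.1, 0.5, 1.0],       # veto windows (seconds)
        num_recalculate=10,            # hierarchical re-ranking iterations
        incremental=100,
        minima={"eff_fap": 3.0},       # (key, value) pruning minima
    )
    clf.train(dataset)  # instantiates a Vetolist internally and trains it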

class idq.classifiers.ovl.DOVLfap(*args, **kwargs)[source]

An extension of DOVL that estimates the FAP in a slightly different way. DOVL uses an approximation of the deadtime (comparable to what OVL does):

fap = (num_vetod_gch + num_vetod_cln) / (num_gch + num_cln)

DOVLfap instead uses an approximation closer to the true FAP:

fap = num_vetod_cln / num_cln

While the FAP might appear more correct, the deadtime is likely to be more numerically stable: it produces no infinities, and thresholds on poisson_signif may prune over-trained configurations.

This choice is controlled by a conditional within _compute_dovl_configuration_performance.
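A minimal sketch of the two estimators side by side (the function names are ours; the real logic lives inside _compute_dovl_configuration_performance):

    def deadtime_style_fap(num_vetod_gch, num_vetod_cln, num_gch, num_cln):
        # DOVL: fraction of all discrete samples (glitch + clean) that are vetoed
        return (num_vetod_gch + num_vetod_cln) / (num_gch + num_cln)

    def fap_style_fap(num_vetod_cln, num_cln):
        # DOVLfap: fraction of clean samples that are vetoed; can diverge
        # when num_cln is small, hence the stability note above
        return num_vetod_cln / num_cln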

class idq.classifiers.ovl.OVL(*args, **kwargs)[source]

A wrapper for the Ordered Veto List (OVL) algorithm published in Essick et al., CQG 30, 15 (2013) (DOI: 10.1088/0264-9381/30/15/155010). This algorithm estimates the False Alarm Probability based on measures of the deadtime associated with segments generated around auxiliary events.

TODO: set this up so that it takes a path to an output directory and then writes the Vetolist and data objects into that directory with appropriate names (extracting start and end from the data objects).

Inheritance note: the extra attributes not declared within SupervisedClassifier are _allowed_metrics, _default_incremental, _default_minima, _gammln_cof, and _gammln_stp, along with the associated methods _recalculate, redundancies, _check_columns, _gcf, _gammln, _gserln, and _gammpln.
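The _gcf, _gammln, _gserln, and _gammpln helpers implement log-gamma and incomplete-gamma machinery by hand for the Poisson significance. As a point of reference only (this is not the module’s code, and the -ln(p) convention is our assumption), the same quantity can be sketched with scipy:

    import numpy as np
    from scipy.special import gammaincc  # regularized upper incomplete gamma Q(a, x)

    def poisson_signif_sketch(n_observed, n_expected):
        # For integer n_observed >= 1, P(N >= n_observed | Poisson(n_expected))
        # equals Q(n_observed, n_expected).
        p = gammaincc(n_observed, n_expected)
        return -np.log(p)  # significance as -ln(p); convention assumed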

train(dataset)[source]

Instantiates a Vetolist and trains using the data within a dataset.

Algorithmic parameters include:

  • channels

  • thresholds

  • windows

  • num_recalculate

  • incremental

  • minima (key, value pairs)

and are specified through self.kwargs, set during instantiation.

class idq.classifiers.ovl.Vetolist(start, end, segs=None, model_id=None, generate_id=False, channels=None, thresholds=None, windows=None, significance=None, **kwargs)[source]

A representation of the OVL and DOVL vetolist. This is stored as the internal model within OVL and DOVL.

TODO: describe the protections of these attributes and their meanings: _default_thresholds, _default_windows, _table_dtypes, _repr_head, _repr_body.

dump(path, nickname=None, **kwargs)[source]

Writes this object into path. The file should contain all info, not just the configuration order. Uses hdf5 format, and asserts that the path must end in “.hdf5”.

We can add attributes to this file via the kwargs specified in this function’s signature.

WARNING: we may want to add default attrs to the hdf5 file, like the code’s version, the git repo location, etc. We may also want to add information about how this file was produced, but that may be optional and can be specified via kwargs.

NOTE: we’ll likely use an instance of HDF5Reporter rather than this method
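Hypothetical usage (the file name and the produced_by attribute below are invented; extra kwargs become attributes of the HDF5 file, per the note above):

    # path must end in ".hdf5"; extra kwargs are stored as file attributes
    vetolist.dump("Vetolist-1234567890-4096.hdf5", nickname="ovl", produced_by="example")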

feature_importance()[source]

Returns what is essentially an ordered list of veto configurations, although it is really a structured numpy array.

feature_importance_figure(dataset, start, end, t0, time, significance, colormap='copper_r', nonerow=True, nonecolor=None, verbose=False, **kwargs)[source]

Generates and returns a figure object demonstrating the feature importance based on the data within dataset.

feature_importance_table(time, significance, dataset=None, map2str=True, **kwargs)[source]

Should return (columns, data) compatible with the DQR’s json.format_table (see its use in idq/reports.py).

init_table()[source]

Instantiates the table used to store data internally.

load(path, nickname=None, verbose=True)[source]

Loads the data from file into memory. This will load all info, not just the configuration order. Uses hdf5 format, and asserts that the path must end in “.hdf5”.

If verbose, prints the attrs from the dataset.

NOTE: we’ll likely use an instance of HDF5Reporter rather than this method.

metric2rank(value, metric='eff_fap', **kwargs)[source]

Maps our metric value into a rank. We define a somewhat ad hoc monotonic function with a separate scale for each metric.
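A sketch of what such an ad hoc monotonic map could look like (the functional form and scale here are our assumptions, not the actual implementation):

    import numpy as np

    def metric2rank_sketch(value, scale=50.0):
        # monotonically maps metric values in [0, inf) into ranks in [0, 1);
        # each metric would carry its own scale
        return 1.0 - np.exp(-np.asarray(value) / scale)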

nonredundant_segments(dataset, time, significance)[source]

Checks which configurations are redundant and then returns the indices of the non-redundant configurations, and their segments

prune(minimum, metric='eff_fap', **kwargs)[source]

Removes all configurations with metric < minimum.

WARNING:

  • this will cause data to be forgotten in an unrecoverable way!

  • only use this if you are certain you know what you’re doing!
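Hypothetical usage (the minimum value is purely illustrative):

    # irreversibly drop every configuration whose eff_fap falls below 3.0
    vetolist.prune(3.0, metric="eff_fap")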

redundancies(dataset, time, significance, verbose=False, Verbose=False)[source]

Computes the intersection and overlap of veto segments for each possible configuration. This should contain all information necessary to determine which channels are redundant.

We only return information based on the current model, which may have trained itself down to a subset of the total possible configurations.

Returns (table, headers): table is a matrix of the livetimes of the intersections of veto segments from each configuration pair; headers is a list of the (channel, threshold, window) tuples, in the same order as the columns of table.
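A sketch of how this output might be consumed, assuming the diagonal of table holds each configuration’s own veto livetime (the 0.99 overlap cut is illustrative, not part of the method):

    table, headers = vetolist.redundancies(dataset, time, significance)
    for i, (channel, threshold, window) in enumerate(headers):
        own = table[i][i]  # assumed: self-intersection = own livetime
        for j in range(i + 1, len(headers)):
            if own and table[i][j] / own > 0.99:  # illustrative redundancy cut
                print(channel, (threshold, window), "largely overlaps", headers[j])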

reorder(metric='eff_fap', metric_weights=None, **kwargs)[source]

Re-orders the vetolist according to this metric. We always sort so that bigger numbers show up first.

save_as_data_quality_flags(out_path, dataset, time, significance, remove_redundant=True)[source]

Saves the veto list as a set of data quality flags in an hdf5 file

segments(dataset, time, significance, **kwargs)[source]

Generates veto segments for all configurations in the vetolist. Returns a list of segment lists, in the order corresponding to the configurations within self.table.

timeseries(dataset, time, significance, metric='eff_fap', dt=0.00390625, segs=None, **kwargs)[source]

Generates a timeseries for each segment in dataset.segments separately. Should be useful within OVL.evaluate, vetolist2segments, etc.

Iterates from the end of the table to the front, setting the values of the timeseries to the metric of the corresponding configuration at each step. If more than one configuration is active at a given time, the final timeseries will have the value corresponding to the configuration closer to the front of the list.

This makes sense if you assume the configurations are ordered by descending metric, which should be the case.

We extract the ‘rank’ column, which is set via calls to self.reorder.
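A sketch of the back-to-front fill rule described above (not the actual implementation; the segment-to-index conversion details are ours):

    import numpy as np

    def fill_timeseries(n_samples, dt, t_start, rows_back_to_front):
        # rows_back_to_front: (segments, rank) pairs from the last table row
        # to the first, so front-of-list configurations overwrite later ones
        series = np.zeros(n_samples)
        for segments, rank in rows_back_to_front:
            for seg_start, seg_end in segments:
                i = max(0, int((seg_start - t_start) / dt))
                j = min(n_samples, int(np.ceil((seg_end - t_start) / dt)))
                series[i:j] = rank
        return series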

NOTE:

  • monotonic decrease in metric as you move down the table is not strictly guaranteed and will depend on whether the list has converged.

  • it is also possible that the list will never converge because of cycles (2 configurations flip back and forth forever with additional iterations). This may be a weakness of the algorithm!