idq.classifiers.ovl
- class idq.classifiers.ovl.DOVL(*args, **kwargs)
Discrete OVL: a modified version of OVL that trains on discrete samples to estimate the deadtime. An extension of this class (DOVLfap) estimates the False Alarm Probability instead of the deadtime, but still uses discrete samples. The implementation of the OVL algorithm itself lives in a separate class (idq.classifiers.ovl.OVL) because DOVL has standard training signatures while OVL does not.
WRITE ME: describe what we “overwrite” from OVL (although these are not conveniently accessible): train (a trivial delegation, overridden only so that the signature has a clearer variable name) and _recalculate.
- calibrate(dataset, bounded=True, **kwargs)
Calibrates this algorithm based on the dataset of feature vectors. Requires all FeatureVectors in the dataset to have been evaluated. This should update self._calibration_map.
- feature_importance_figure(dataset, start, end, t0, **kwargs)
Delegates to Vetolist.feature_importance_figure with a few extra arguments.
- feature_importance_table(dataset=None, **kwargs)
Delegates to Vetolist.feature_importance_table.
- redundancies(dataset)
Computes the intersection and overlap of veto segments for each possible configuration. This should contain all the information needed to determine which channels are redundant.
Only information based on the current model is returned; the model may have trained itself down to a subset of the total possible configurations.
Returns (table, headers): table is a matrix of the livetime of the intersections of veto segments for each pair of configurations, and headers is a list of (channel, threshold, window) tuples in the same order as the columns of table.
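As a rough illustration of the (table, headers) structure, here is a plain-Python sketch that assumes each configuration's veto segments are already available as sorted, non-overlapping (start, end) tuples. The helpers livetime, intersect and redundancy_table are hypothetical names for this example, not part of idq.

```python
def livetime(segs):
    """Total duration of a list of sorted, disjoint (start, end) segments."""
    return sum(end - start for start, end in segs)


def intersect(a, b):
    """Intersection of two sorted, disjoint segment lists (two-pointer sweep)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # advance whichever segment ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def redundancy_table(configs):
    """configs: list of ((channel, threshold, window), segments) pairs (hypothetical format).

    Returns (table, headers): table[i][j] is the livetime of the intersection of the
    veto segments of configurations i and j; headers lists the configurations in
    the same order as the columns of table.
    """
    headers = [cfg for cfg, _ in configs]
    table = [[livetime(intersect(si, sj)) for _, sj in configs] for _, si in configs]
    return table, headers
```

The diagonal holds each configuration's own vetoed livetime, so an off-diagonal entry close to its row's diagonal value means that configuration is largely covered by the other one.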
- timeseries(info, dataset_factory, dt=0.00390625, segs=None, set_ok=None)
Delegates to Vetolist.timeseries and returns ranks.
- train(dataset)
Instantiates a Vetolist and trains using the data within a dataset.
Algorithmic parameters include:
channels
thresholds
windows
num_recalculate
incremental
minima (key, value pairs)
and are specified through self.kwargs, set during instantiation.
NOTE: We do not explicitly call self._check_columns(datachunk) and instead assume the user has already done this when constructing a dataset.
- class idq.classifiers.ovl.DOVLfap(*args, **kwargs)
An extension of DOVL that estimates the FAP in a slightly different way. DOVL uses an approximation of the deadtime (comparable to what OVL does):
fap=(num_vetod_gch+num_vetod_cln)/(num_gch+num_cln)
DOVLfap uses an approximation closer to the FAP:
fap=num_vetod_cln/num_cln
While the FAP may appear more correct, the deadtime is likely to be more numerically stable (no infinities), and thresholds on poisson_signif may prune over-trained configurations.
Which estimate is used is controlled by a conditional within _compute_dovl_configuration_performance.
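The practical difference between the two estimates shows up when a configuration vetoes glitches but no clean samples. A minimal sketch, assuming the eff_fap metric is the ratio of the efficiency (num_vetod_gch/num_gch) to the estimated FAP; the function eff_over_fap and its use_fap flag are hypothetical:

```python
import math


def eff_over_fap(num_vetod_gch, num_vetod_cln, num_gch, num_cln, use_fap=False):
    """Efficiency-to-FAP ratio for one configuration (hypothetical helper).

    use_fap=False: DOVL's deadtime-like estimate,
        fap = (num_vetod_gch + num_vetod_cln) / (num_gch + num_cln)
    use_fap=True: DOVLfap's estimate,
        fap = num_vetod_cln / num_cln
    """
    eff = num_vetod_gch / num_gch
    if use_fap:
        fap = num_vetod_cln / num_cln
    else:
        fap = (num_vetod_gch + num_vetod_cln) / (num_gch + num_cln)
    if fap == 0.0:
        # Under DOVL, fap == 0 implies eff == 0, so only DOVLfap can reach +inf.
        # Returning 0.0 for 0/0 is an arbitrary convention for this sketch.
        return math.inf if eff > 0 else 0.0
    return eff / fap
```

With 3 of 10 glitches and 0 of 100 clean samples vetoed, the deadtime form gives 0.3 / (3/110) = 11, while the FAP form divides by zero. Under the deadtime form the denominator can only vanish when nothing at all is vetoed.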
- class idq.classifiers.ovl.OVL(*args, **kwargs)
A wrapper for the Ordered Veto List (OVL) algorithm published in Essick et al., CQG 30, 155010 (2013) (DOI: 10.1088/0264-9381/30/15/155010). This algorithm estimates the False Alarm Probability based on measures of the deadtime associated with segments generated around auxiliary events.
WRITE ME: set this up so that it takes a path to an output directory and writes the Vetolist and data objects into that directory with appropriate names (extracting start and end from the data objects).
Also describe inheritance for the extra attributes not declared within SupervisedClassifier (_allowed_metrics, _default_incremental, _default_minima, _gammln_cof, _gammln_stp) and the associated methods (_recalculate, redundancies, _check_columns, _gcf, _gammln, _gserln, _gammpln).
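The helper names listed here (_gammln, _gserln, _gammpln) echo the Numerical Recipes routines for the log-gamma function and the series form of the regularized lower incomplete gamma function P(a, x). Assuming they serve to compute a Poisson significance (the poisson_signif threshold mentioned under DOVLfap suggests so), the calculation can be sketched as follows. This is a hypothetical re-implementation built on math.lgamma, not idq's code; it relies on the identity P(N ≥ n) = P(n, μ) for N ~ Poisson(μ) and n ≥ 1.

```python
import math


def log_gamma_p(a, x, eps=1e-12, max_iter=10_000):
    """ln P(a, x), the regularized lower incomplete gamma function, via its power series.

    The series converges for all x > 0 but slowly for large x; this sketch is meant
    for modest expected counts only.
    """
    if x <= 0.0:
        return float("-inf")  # P(a, 0) = 0
    ap = a
    term = total = 1.0 / a
    for _ in range(max_iter):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * eps:
            break
    return math.log(total) - x + a * math.log(x) - math.lgamma(a)


def poisson_significance(n_obs, mu):
    """-log10 P(N >= n_obs) for N ~ Poisson(mu) (hypothetical helper)."""
    if n_obs <= 0:
        return 0.0  # P(N >= 0) = 1
    return -log_gamma_p(n_obs, mu) / math.log(10.0)
```

Working in log space keeps very improbable coincidence counts representable, and an expected count of zero with at least one observed coincidence yields an infinite significance.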
- class idq.classifiers.ovl.Vetolist(start, end, segs=None, model_id=None, generate_id=False, channels=None, thresholds=None, windows=None, significance=None, **kwargs)
A representation of the OVL and DOVL vetolist. This is stored as the internal model within OVL and DOVL.
- WRITE ME
- describe protections of attributes and their meanings: _default_thresholds, _default_windows, _table_dtypes, _repr_head, _repr_body
- dump(path, nickname=None, **kwargs)
Writes this object into path. The file should contain all information, not just the configuration order. Uses HDF5 format and asserts that path ends in “.hdf5”.
Additional attributes can be added to the file via the kwargs in this function’s signature.
WARNING: we may want to add default attrs to the HDF5 file, such as the code’s version and the git repo location. We may also want to add information about how this file was produced, but that may be optional and can be specified via kwargs.
NOTE: we’ll likely use an instance of HDF5Reporter rather than this method
- feature_importance()
Should return what is essentially an ordered list of veto configurations, although it is really a structured numpy array.
- feature_importance_figure(dataset, start, end, t0, time, significance, colormap='copper_r', nonerow=True, nonecolor=None, verbose=False, **kwargs)
Generates a figure demonstrating the feature importance based on the data within dataset and returns the figure object.
- feature_importance_table(time, significance, dataset=None, map2str=True, **kwargs)
Should return (columns, data) compatible with the DQR’s json.format_table (see its use in idq/reports.py).
- load(path, nickname=None, verbose=True)
Loads the data from file into memory. Loads all information, not just the configuration order. Uses HDF5 format and asserts that path ends in “.hdf5”.
If verbose, prints the attrs from the dataset.
NOTE: we’ll likely use an instance of HDF5Reporter rather than this method
- metric2rank(value, metric='eff_fap', **kwargs)
Maps a metric value into a rank. We define a somewhat ad hoc monotonic function with a separate scale for each metric.
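One simple way to build such a map is exponential saturation, rank = 1 - exp(-value/scale), which is monotonic and bounded in [0, 1]. The sketch below is hypothetical: neither the functional form nor the scale values in RANK_SCALES are taken from idq.

```python
import math

# Hypothetical per-metric scales, chosen only for illustration.
RANK_SCALES = {"eff_fap": 10.0, "poisson_signif": 5.0}


def metric2rank(value, metric="eff_fap"):
    """Map a metric value monotonically onto [0, 1]; non-positive values map to 0."""
    if value <= 0:
        return 0.0
    return 1.0 - math.exp(-value / RANK_SCALES[metric])
```

A per-metric scale sets where the rank reaches roughly 0.63 (value equal to the scale), which is what lets metrics with very different typical magnitudes share one rank axis.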
- nonredundant_segments(dataset, time, significance)
Checks which configurations are redundant and then returns the indices of the non-redundant configurations, along with their segments.
- prune(minimum, metric='eff_fap', **kwargs)
Removes all configurations with metric < minimum.
WARNING: this will cause data to be forgotten in an unrecoverable way! Only use this if you are certain you know what you’re doing!
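A sketch of the pruning step on a structured numpy array, assuming (hypothetically) that the vetolist table has channel, threshold, window and eff_fap fields. Unlike the method, which discards rows from the model itself, this version returns a filtered copy and leaves its input untouched:

```python
import numpy as np

# Hypothetical vetolist table: one row per (channel, threshold, window) configuration.
table = np.array(
    [("H1:A", 5.0, 0.1, 12.0), ("H1:B", 8.0, 0.2, 0.5), ("H1:C", 6.0, 0.05, 3.0)],
    dtype=[("channel", "U16"), ("threshold", "f8"), ("window", "f8"), ("eff_fap", "f8")],
)


def prune(table, minimum, metric="eff_fap"):
    """Return a copy of table without the rows whose metric is below minimum."""
    # Boolean-mask indexing always produces a new array.
    return table[table[metric] >= minimum]
```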
- redundancies(dataset, time, significance, verbose=False, Verbose=False)
Computes the intersection and overlap of veto segments for each possible configuration. This should contain all the information needed to determine which channels are redundant.
Only information based on the current model is returned; the model may have trained itself down to a subset of the total possible configurations.
Returns (table, headers): table is a matrix of the livetime of the intersections of veto segments for each pair of configurations, and headers is a list of (channel, threshold, window) tuples in the same order as the columns of table.
- reorder(metric='eff_fap', metric_weights=None, **kwargs)
Reorders the vetolist according to metric. We always sort so that larger values appear first.
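A descending sort on one field of a structured numpy array can be done with a stable argsort on the negated values, which also keeps tied configurations in their existing order. The table layout below is hypothetical, and metric_weights is not modeled:

```python
import numpy as np

# Hypothetical vetolist table with a numeric eff_fap column.
table = np.array(
    [("H1:A", 5.0, 0.1, 12.0), ("H1:B", 8.0, 0.2, 0.5), ("H1:C", 6.0, 0.05, 3.0)],
    dtype=[("channel", "U16"), ("threshold", "f8"), ("window", "f8"), ("eff_fap", "f8")],
)


def reorder(table, metric="eff_fap"):
    """Return table sorted so that the largest metric values come first."""
    # Negating turns numpy's ascending sort into a descending one;
    # kind="stable" keeps ties in their current relative order.
    order = np.argsort(-table[metric], kind="stable")
    return table[order]
```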
- save_as_data_quality_flags(out_path, dataset, time, significance, remove_redundant=True)
Saves the vetolist as a set of data quality flags in an HDF5 file.
- segments(dataset, time, significance, **kwargs)
Generates veto segments for all configurations in the vetolist. Returns a list of segment lists in the order of the corresponding configurations within self.table.
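In the OVL scheme, each configuration vetoes a window around every auxiliary trigger in its channel whose significance meets the threshold. The sketch below assumes a hypothetical trigger format of (channel, time, significance) tuples and a symmetric window of half-width window; how idq actually reads triggers from dataset using the time and significance arguments is not shown.

```python
def veto_segments(triggers, channel, threshold, window):
    """Veto segments for one (channel, threshold, window) configuration.

    triggers: list of (channel, time, significance) tuples (hypothetical format).
    Each trigger with significance >= threshold vetoes [time - window, time + window];
    overlapping windows are merged into a single segment.
    """
    times = sorted(t for ch, t, s in triggers if ch == channel and s >= threshold)
    segs = []
    for t in times:
        start, end = t - window, t + window
        if segs and start <= segs[-1][1]:
            # extend the previous segment instead of adding an overlapping one
            segs[-1] = (segs[-1][0], max(segs[-1][1], end))
        else:
            segs.append((start, end))
    return segs


def vetolist_segments(triggers, configurations):
    """One segment list per configuration, in the same order as configurations."""
    return [veto_segments(triggers, *cfg) for cfg in configurations]
```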
- timeseries(dataset, time, significance, metric='eff_fap', dt=0.00390625, segs=None, **kwargs)
Generates a timeseries for each segment in dataset.segments separately. This should be useful within OVL.evaluate, vetolist2segments, etc.
Iterates from the end of the table to the front, at each step setting the timeseries values to the metric of the corresponding configuration. If more than one configuration is active at a given time, the final timeseries takes the value of the configuration closest to the front of the list.
This makes sense if you assume the configurations are ordered by descending metric, which should be the case.
We extract the ‘rank’ column, which is set via calls to self.reorder.
NOTE: a monotonic decrease in metric as you move down the table is not strictly guaranteed and depends on whether the list has converged. It is also possible that the list will never converge because of cycles (two configurations flip back and forth forever with additional iterations). This may be a weakness of the algorithm!
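The back-to-front fill can be sketched with numpy as follows. It assumes each configuration's segments and rank are already known, samples [start, end) on a uniform grid of spacing dt, and uses 0 wherever no configuration is active; vetolist_timeseries and these conventions are illustrative assumptions, not idq's behavior.

```python
import numpy as np


def vetolist_timeseries(start, end, seglists, ranks, dt=0.00390625):
    """Sample [start, end) every dt and fill in configuration ranks.

    seglists[i] holds the (start, end) veto segments of configuration i and ranks[i]
    its rank. Configurations are written from the back of the list to the front, so
    wherever segments overlap the configuration nearest the front wins.
    """
    times = start + dt * np.arange(int(round((end - start) / dt)))
    series = np.zeros_like(times)  # 0 where no configuration is active
    for segs, rank in zip(reversed(seglists), reversed(ranks)):
        for seg_start, seg_end in segs:
            series[(times >= seg_start) & (times < seg_end)] = rank
    return times, series
```

Writing the front configuration last is what gives it precedence, which matches the ordering argument above only when ranks really do decrease down the list.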