idq.classifiers

class idq.classifiers.ClassifierModel(start, end, segs=None, model_id=None, generate_id=False)[source]

A parent class that defines some basic attributes that all trained models must have in order to track data provenance. Each classifier will likely extend this class for its own purposes.
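As a hedged illustration of the provenance-tracking idea (the stand-in class and attribute handling below are assumptions based on the constructor signature, not the actual idq implementation), a trained-model class in this spirit might record the span of data it was trained on:

```python
# Hypothetical stand-in for a ClassifierModel child; the attribute names
# (start, end, segs, model_id) mirror the constructor signature above, but
# the storage details are assumptions, not the real idq internals.
class ModelProvenanceStub:
    def __init__(self, start, end, segs=None, model_id=None):
        self.start = start  # GPS start of the training data
        self.end = end      # GPS end of the training data
        # assume a single segment covering [start, end) when none is given
        self.segs = segs if segs is not None else [(start, end)]
        self.model_id = model_id  # identifier used to locate this model

model = ModelProvenanceStub(1187000000, 1187100000)
```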

feature_importance_figure(dataset, start, end, t0, **kwargs)[source]

Generate and return a figure demonstrating feature importance based on the data within dataset. Should return a figure object.

feature_importance_table(dataset, **kwargs)[source]

Should return (columns, data) compatible with the DQR's json.format_table (see its use in idq/reports.py).

property hash

The identifier used to locate this model.

class idq.classifiers.IncrementalSupervisedClassifier(nickname, rootdir='.', model_id=None, **kwargs)[source]

An extension of SupervisedClassifier that is meant to re-train itself incrementally rather than through a series of batch jobs that each start from scratch. It should be able to inherit much of its functionality from SupervisedClassifier.

train(dataset)[source]

This should incrementally update the internal model. Otherwise, the classifier's behavior should be the same as SupervisedClassifier's.

class idq.classifiers.SupervisedClassifier(nickname, rootdir='.', model_id=None, **kwargs)[source]

A parent class for classifiers. Children should override methods as necessary. This classifier supports everything required syntactically for the pipeline to function, but assigns random ranks to all events.

calibrate(dataset, **kwargs)[source]

Calibrate this algorithm based on the dataset of feature vectors. Requires all FeatureVectors in the dataset to have been evaluated. This should update self._calibration_map.

evaluate(dataset)[source]

This classifier assigns random ranks to all events, independent of the training data set. data should have the shape (Nsamples, Nfeatures). Returns a 1D array of length Nsamples representing the ranks assigned to each sample in data.

WARNING: this needs to be highly efficient if we’re to use it to build time-series!
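The shape contract above can be sketched in pure Python (a hedged illustration of the interface only; the `evaluate` function here is a stand-in, and the real implementation presumably operates on arrays for efficiency):

```python
import random

def evaluate(data):
    # data: a sequence of Nsamples feature vectors, each of length Nfeatures.
    # The base classifier ignores the features entirely and assigns a random
    # rank in [0, 1) to each sample, returning a flat sequence of Nsamples ranks.
    return [random.random() for _row in data]

ranks = evaluate([[0.0, 1.0, 2.0]] * 5)  # 5 samples, 3 features each
```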

feature_importance()[source]

Return a ranked list of important features within the trained model. Raises an UntrainedException if we do not have a trained model stored internally.

feature_importance_figure(*args, **kwargs)[source]

Generate and return a figure demonstrating feature importance based on the data within the dataset factory. Should return a figure object.

feature_importance_table(*args, **kwargs)[source]

Should return (columns, data) compatible with the DQR's json.format_table (see its use in idq/reports.py).

property flavor

This is a "private" variable because a user should never muck with it, and each child class must declare it for itself. It should be considered like a "type", but a string may be easier to work with than a Type object.
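The declare-per-child pattern can be sketched as follows (the stub classes and the flavor string are hypothetical; this assumes only that the real parent exposes `_flavor` through the read-only `flavor` property documented here):

```python
class SupervisedClassifierStub:
    # stand-in for the parent: exposes the "private" _flavor through a
    # read-only property so users cannot casually reassign it
    _flavor = None

    @property
    def flavor(self):
        return self._flavor

class MyClassifierStub(SupervisedClassifierStub):
    # each child declares its flavor as a plain string rather than a Type object
    _flavor = "my_flavor"

flavor = MyClassifierStub().flavor
```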

property nickname

This is a "private" variable because a user should never muck with it once it is set upon instantiation.

timeseries(info, dataset_factory, dt=0.00390625, segs=None, set_ok=None)[source]

Returns ranks evaluated as a time series with sample spacing dt over the requested segments.

train(dataset)[source]

This classifier does NOT use data to make predictions. Instead, it supports this method for syntactic completeness. Note: this does NOT update self._model, and therefore self.feature_importance will continue to raise exceptions.

idq.classifiers.get_classifiers()[source]
This hook is used to return SupervisedClassifiers in the form:

{"type[:specifier]": Classifier}

where the optional specifier refers to a particular flavor of that classifier, for more specificity.
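A hedged sketch of the hook's return shape (the classifier class and the key names below are hypothetical placeholders; only the "type[:specifier]" key convention comes from the documentation above):

```python
class MyClassifier:
    # hypothetical stand-in for a SupervisedClassifier child
    _flavor = "random"

def get_classifiers():
    # keys follow the "type[:specifier]" convention; the optional specifier
    # selects a particular flavor of the classifier
    return {
        "myclassifier": MyClassifier,
        "myclassifier:random": MyClassifier,
    }

mapping = get_classifiers()
```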

idq.classifiers.get_incremental_classifiers()[source]
This hook is used to return IncrementalSupervisedClassifiers in the form:

{"type[:specifier]": Classifier}

where the optional specifier refers to a particular flavor of that classifier, for more specificity.