Classifiers¶
We support two types of supervised classification schemes, the SupervisedClassifier
and the IncrementalSupervisedClassifier.
These objects are conceptually similar and generally follow the same API, with the single exception of how they (re)train their internal models.
SupervisedClassifier
and its children train through a batch prescription; that is, they re-train from scratch by analyzing a large batch of data.
This means that if any historical information is to be retained through the re-training process, that data must be included in the set passed to the SupervisedClassifier.train
call.
In contrast, IncrementalSupervisedClassifier
and its children train incrementally: the data passed to IncrementalSupervisedClassifier.train
augments, rather than replaces, the data already incorporated into the model.
The incremental scheme should be computationally lighter, particularly when we retrain continuously, and better matches the streaming nature of the overall architecture.
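As a hedged sketch (the dataset objects below are placeholders rather than the actual iDQ types), the difference in retraining semantics looks roughly like this:
# Batch retraining (SupervisedClassifier and children): the model is rebuilt
# from scratch, so historical data we wish to retain must be passed in again.
batch_clf.train(historical_dataset + new_dataset)

# Incremental retraining (IncrementalSupervisedClassifier and children): only
# the new data is passed; it is folded into the previously trained model.
incremental_clf.train(new_dataset)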
We note that IncrementalSupervisedClassifier
is a subclass of SupervisedClassifier
and therefore re-uses much of the code defined therein.
This also means the API is specified within SupervisedClassifier,
with a few exceptions (see idq.classifiers.OVL
for an example).
OVL¶
Available Classifiers¶
- class idq.classifiers.ovl.OVL(*args, **kwargs)[source]¶
a wrapper for the Ordered Veto List (OVL) algorithm published in Essick et al., CQG 30, 15 (2013) (DOI: 10.1088/0264-9381/30/15/155010). This algorithm estimates the False Alarm Probability based on measures of the deadtime associated with segments generated around auxiliary events.
WRITE ME: set this up so it takes in a path to an output directory and then writes the Vetolist and data objects into that directory with appropriate names (extracting start and end times from the data objects).
WRITE ME: describe the inheritance of the extra attributes not declared within SupervisedClassifier (_allowed_metrics, _default_incremental, _default_minima, _gammln_cof, _gammln_stp) and the associated methods (_recalculate, redundancies, _check_columns, _gcf, _gammln, _gserln, _gammpln).
- calibrate(dataset, bounded=True, **kwargs)¶
calibrate this algorithm based on the dataset of feature vectors. Requires all FeatureVectors in the dataset to have been evaluated. This should update self._calibration_map.
- evaluate(dataset)¶
sets the ranks for these feature vectors. Modifies the objects in place!
- feature_importance()¶
delegates to Vetolist.feature_importance
- feature_importance_figure(dataset, start, end, t0, **kwargs)¶
delegate to Vetolist.feature_importance_figure with a few extra things
- feature_importance_table(dataset=None, **kwargs)¶
delegate to Vetolist.feature_importance_table
- property flavor¶
this is a “private” variable because I don’t ever want a user to muck with this. I also want each child to have to declare this for themselves. This should be considered like a “type”, but it may be easier to deal with a string instead of a Type object.
- property nickname¶
this is a “private” variable because I don’t ever want a user to muck with this once it is set upon instantiation
- redundancies(dataset)¶
computes the intersection and overlap of veto segments for each possible configuration. This should contain all information necessary to determine which channels are redundant.
We only return information based on the current model, which may have trained itself down to a subset of the total possible configurations.
Returns (table, headers): table is a matrix of the livetime of the intersections of veto segments from each configuration pair; headers is a list of the (channel, threshold, window) tuples, with the same order as the columns in table.
- timeseries(info, dataset_factory, dt=0.00390625, segs=None, set_ok=None)¶
delegates to Vetolist.timeseries; returns ranks
- class idq.classifiers.ovl.DOVL(*args, **kwargs)[source]¶
Discrete OVL: a modified version of OVL that trains based on discrete samples to estimate the deadtime. We note that there is an extension of this that estimates the False Alarm Probability instead of the deadtime (DOVLfap) but still uses discrete samples. The actual implementation of the OVL algorithm itself is stored in a subclass (idq.classifiers.OVL) because DOVL has standard training signatures while OVL does not.
WRITE ME: describe what we “overwrite” from OVL (although we don’t really have convenient access to these…): train (a trivial delegation, overwritten simply because we want the signature to have a clearer variable name) and _recalculate.
- calibrate(dataset, bounded=True, **kwargs)[source]¶
calibrate this algorithm based on the dataset of feature vectors. Requires all FeatureVectors in the dataset to have been evaluated. This should update self._calibration_map.
- feature_importance_figure(dataset, start, end, t0, **kwargs)[source]¶
delegate to Vetolist.feature_importance_figure with a few extra things
- feature_importance_table(dataset=None, **kwargs)[source]¶
delegate to Vetolist.feature_importance_table
- property flavor¶
this is a “private” variable because I don’t ever want a user to muck with this. I also want each child to have to declare this for themselves. This should be considered like a “type”, but it may be easier to deal with a string instead of a Type object.
- property nickname¶
this is a “private” variable because I don’t ever want a user to muck with this once it is set upon instantiation
- redundancies(dataset)[source]¶
computes the intersection and overlap of veto segments for each possible configuration. This should contain all information necessary to determine which channels are redundant.
We only return information based on the current model, which may have trained itself down to a subset of the total possible configurations.
Returns (table, headers): table is a matrix of the livetime of the intersections of veto segments from each configuration pair; headers is a list of the (channel, threshold, window) tuples, with the same order as the columns in table.
- timeseries(info, dataset_factory, dt=0.00390625, segs=None, set_ok=None)[source]¶
delegates to Vetolist.timeseries; returns ranks
- train(dataset)[source]¶
Instantiates a Vetolist and trains using the data within a dataset.
Algorithmic parameters include:
channels
thresholds
windows
num_recalculate
incremental
minima (key, value pairs)
and are specified through self.kwargs, set during instantiation.
NOTE: We do not explicitly call self._check_columns(datachunk) and instead assume the user has already done this when constructing a dataset.
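As a hedged sketch (the constructor signature, channel names, and minima keys below are assumptions for illustration, not the documented API), these parameters might be supplied as keyword arguments at instantiation and then used when train() is called:
# hypothetical usage; consult the DOVL signature for the required positional arguments
dovl = DOVL(
    "my-dovl",                                          # nickname (assumed positional argument)
    channels=["H1:AUX_CHANNEL_A", "H1:AUX_CHANNEL_B"],  # auxiliary channels to consider
    thresholds=[15, 30, 100],                           # significance thresholds
    windows=[0.1, 0.5, 1.0],                            # veto windows in seconds
    num_recalculate=10,
    incremental=100,
    minima={"eff": 0.01},                               # hypothetical (key, value) minima
)
dovl.train(dataset)  # dataset assumed to already satisfy _check_columns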
Scikit-learn Classifiers¶
iDQ supports many supervised machine learning classifiers by leveraging the API of scikit-learn, a popular Python-based machine learning library.
All of these are derived from the base class
idq.classifiers.SupervisedSklearnClassifier.
In every implementation, parameters defined for a scikit-learn classifier can be passed directly within the configuration file or used via grid-search-based hyperparameter tuning. For cross-validation, the full set of training samples is split into 3 folds to determine the best set of hyperparameters chosen from the grid.
The procedure for training sklearn-based classifiers is shown below:
[Diagram: labeled quiver → classifier → trained model]
The classifier is composed of a scikit-learn Pipeline object and comprises a preprocessing step that whitens incoming features, one or more steps that make up the actual classifier, and an optional rank scaler at the end of the pipeline, depending on whether the classifier provides probability estimates.
[Diagram: whitener → classifier → rank scaler (optional)]
Whitening¶
There are two modes of whitening available:
StandardScaler: performs the usual whitening of features, standardizing them to zero mean and unit variance. This is the default, whitener = standard.
RobustScaler: same as StandardScaler, but robust to outliers. Can be set with whitener = robust.
A comparison of the different scalers in scikit-learn is provided here.
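For intuition, here is a minimal standalone sketch (plain scikit-learn, independent of iDQ) showing how the two scalers treat a feature column containing an outlier:
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

# toy feature matrix: a single column with one large outlier
X = np.array([[1.0], [2.0], [3.0], [100.0]])

# StandardScaler: zero mean, unit variance; the outlier inflates the mean and std
print(StandardScaler().fit_transform(X).ravel())

# RobustScaler: centers on the median and scales by the interquartile range,
# so the bulk of the samples is less distorted by the outlier
print(RobustScaler().fit_transform(X).ravel())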
Keyword Arguments¶
For all scikit-learn classifiers, the following keyword arguments are required:
flavor: type of classifier to use
window: window of features to consider surrounding a target channel feature; any channels that don’t fall within this window are dropped, and the specified default values are used instead.
safe_channels_path: the path that contains the channel list of safe auxiliary channels to consider in feature data. Only the channels here will be used in classifiers.
In addition, the following optional keyword arguments can be passed in:
whitener: type of whitening to use. Options are standard/robust.
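As a hedged sketch (the flavor string and paths are illustrative, not prescriptive), a classifier section combining these keyword arguments might look like:
[[classifier]]
name = "forest"
flavor = "sklearn:random_forest"   # hypothetical flavor name; use the name registered by iDQ
window = 0.1                       # seconds around the target channel feature
safe_channels_path = "/path/to/safe_channels.txt"   # hypothetical path
whitener = "robust"                # optional; "standard" (default) or "robust"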
Hyperparameter Tuning¶
If using brute-force hyperparameter cross-validation:
We need to specify the type to be grid and include a [classifier.search.params.hyperparam] section for each hyperparameter for use in cross-validation.
Example:
[classifier.search]
type = "grid"
[classifier.search.params.hyperparam1]
range = [low, high]
type = dist_type
discrete = is_discrete
num_samples = num_points
Available continuous distribution types are 'uniform' and 'log_uniform'; available discrete distribution types are 'uniform'.
In addition, the following optional keyword arguments can be passed in:
num_cv_proc: number of processes to use for cross-validation
num_cv_folds: number of folds to use for cross-validation
cv_scoring: scoring function to use for cross-validation
If using randomized hyperparameter cross-validation:
We need to specify the type to be random, specify the number of samples to use (num_samples), and include a [classifier.search.params.hyperparam] section for each hyperparameter for use in cross-validation.
[classifier.search]
type = "random"
[classifier.search.params.hyperparam1]
range = [low, high]
type = dist_type
discrete = is_discrete
Available continuous distribution types are 'uniform' and 'log_uniform'; available discrete distribution types are 'uniform'.
In addition, the following optional keyword arguments can be passed in:
num_cv_proc: number of processes to use for cross-validation
num_cv_folds: number of folds to use for cross-validation
cv_scoring: scoring function to use for cross-validation
If using specific hyperparameter values:
Add hyperparameters in the [classifier.params] section, one key per hyperparameter:
hyperparam: a value to be used for the specific hyperparameter. Repeat for multiple hyperparameters. Please consult the User’s Guide below for a given classifier to determine which hyperparameters to use.
In all these cases, hyperparameters must be named in the form:
classifier__hyperparam
If using a composite classifier, e.g. ApproximateKernelSGD, which has several components, you can pass parameters to the other components by name; the available component names are specified in the relevant docstrings for each composite classifier. For example, setting the kernel hyperparameters in ApproximateKernelSGD is done by setting kernel__hyperparam = value.
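As a hedged sketch (the hyperparameter names below are illustrative), fixed hyperparameter values following this naming convention might look like:
[classifier.params]
classifier__n_estimators = 100   # forwarded to the "classifier" step of the pipeline
kernel__gamma = 0.1              # forwarded to the "kernel" step of a composite classifier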
Available Classifiers¶
- class idq.classifiers.sklearn.RandomForest(*args, **kwargs)[source]¶
A Random Forest of Decision Trees based on scikit-learn.
This is a supervised learning algorithm which uses a group of randomized decision trees (a forest) to perform classification.
Random Forest User’s Guide: http://scikit-learn.org/stable/modules/ensemble.html#forest
Random Forest API
- class idq.classifiers.sklearn.SupportVectorMachine(*args, **kwargs)[source]¶
A support vector machine based on scikit-learn.
This is a supervised learning algorithm which uses a hyperplane to separate data points into two distinct classes. It also allows for kernel-based learning: if samples cannot be separated by a hyperplane directly, they are transformed via a kernel to a higher-dimensional space where they can be separated linearly.
Various kernels are supported and can be selected by passing the kernel kwarg in the classifier configuration section (see the sketch below).
NOTE: The scikit-learn classifier, SVC, is used to perform classification. Probability is set to true so that the mapping from rank to a calibrated probability can be performed more easily.
SVM API: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC
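As a hedged sketch (the flavor string is illustrative, not the registered name), a kernel choice in the classifier configuration section might look like:
[[classifier]]
name = "svm"
flavor = "sklearn:svm"   # hypothetical flavor name; use the name registered by iDQ
kernel = "rbf"           # any kernel accepted by sklearn.svm.SVC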
- class idq.classifiers.sklearn.GradientBoostedTree(*args, **kwargs)[source]¶
A Gradient Tree Boosting algorithm based on scikit-learn.
This is a supervised learning algorithm which produces an ensemble of decision trees, builds them up in a stage-wise fashion, and allows use of arbitrary differentiable loss functions.
GBT User’s Guide: http://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting
GBT API
- class idq.classifiers.sklearn.NeuralNetwork(*args, **kwargs)[source]¶
A neural network (multi-layer perceptron) algorithm based on scikit-learn.
This is a supervised learning algorithm which produces a shallow neural network of multiple layers with a choice of activation function for the hidden layers. It trains itself using backpropagation.
MultiLayer Perceptron User’s Guide: http://scikit-learn.org/stable/modules/neural_networks_supervised.html#multi-layer-perceptron
MultiLayer Perceptron API
- class idq.classifiers.sklearn.NaiveBayes(*args, **kwargs)[source]¶
A Naive Bayes classifier based on scikit-learn.
This is a supervised learning algorithm which assumes independence between all features and uses Bayes’ theorem to determine the posterior probability that a set of features is in a given class. In this particular implementation, the feature likelihoods are Gaussian in form.
Gaussian Naive Bayes User’s Guide: http://scikit-learn.org/stable/modules/naive_bayes.html#gaussian-naive-bayes
Gaussian Naive Bayes API
- class idq.classifiers.sklearn.ApproximateKernelSGD(*args, **kwargs)[source]¶
A Stochastic Gradient Descent classifier based on scikit-learn, with a choice of an approximate kernel to transform nonlinear features into linear features suitable for the SGD classifier.
Guide for using the Stochastic Gradient Descent classifier:
SGD User’s Guide: http://scikit-learn.org/stable/modules/sgd.html#stochastic-gradient-descent
SGD API
Guide for the approximate kernel algorithm (using the Nystroem method), types of kernels and appropriate parameters:
Kernel Approximation User’s Guide: http://scikit-learn.org/stable/modules/kernel_approximation.html#kernel-approximation
Kernel Approximation API
- class idq.classifiers.sklearn.ApproximateKernelSVM(*args, **kwargs)[source]¶
A linear SVM based on scikit-learn, with a choice of an approximate kernel to transform nonlinear features into linear features suitable for the SVM classifier.
Guide for using the linear SVM classifier:
SVM User’s Guide: http://scikit-learn.org/stable/modules/svm.html#support-vector-machines
Linear SVM API: http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC
Guide for the approximate kernel algorithm (using the Nystroem method), types of kernels and appropriate parameters:
Kernel Approximation User’s Guide: http://scikit-learn.org/stable/modules/kernel_approximation.html#kernel-approximation
Kernel Approximation API
Available Incremental Classifiers¶
- class idq.classifiers.sklearn.PassiveAggressive(*args, **kwargs)[source]¶
A Passive-Aggressive classifier based on scikit-learn. Trains in an incremental fashion.
Based on the algorithm described in http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf.
Passive-Aggressive User’s Guide: http://scikit-learn.org/stable/modules/linear_model.html#passive-aggressive
Passive-Aggressive API
- class idq.classifiers.sklearn.IncrementalNeuralNetwork(*args, **kwargs)[source]¶
A Multilayer Perceptron (neural network) algorithm based on scikit-learn. Trains in an incremental fashion.
This is a supervised learning algorithm which produces a shallow neural network of multiple layers with a choice of activation function for the hidden layers. It trains itself using backpropagation.
MultiLayer Perceptron User’s Guide: http://scikit-learn.org/stable/modules/neural_networks_supervised.html#multi-layer-perceptron
MultiLayer Perceptron API
- class idq.classifiers.sklearn.IncrementalApproximateKernelSGD(*args, **kwargs)[source]¶
A Stochastic Gradient Descent classifier based on scikit-learn, with a choice of an approximate kernel to transform nonlinear features into linear features suitable for the SGD classifier. Trains in an incremental fashion.
Guide for using the Stochastic Gradient Descent classifier:
SGD User’s Guide: http://scikit-learn.org/stable/modules/sgd.html#stochastic-gradient-descent
SGD API
Guide for the approximate kernel algorithm (using the Nystroem method), types of kernels and appropriate parameters:
Kernel Approximation User’s Guide: http://scikit-learn.org/stable/modules/kernel_approximation.html#kernel-approximation
Kernel Approximation API
- class idq.classifiers.sklearn.IncrementalNaiveBayes(*args, **kwargs)[source]¶
A Naive Bayes classifier based on scikit-learn. Trains in an incremental fashion.
This is a supervised learning algorithm which assumes independence between all features and uses Bayes’ theorem to determine the posterior probability that a set of features is in a given class. In this particular implementation, the feature likelihoods are Gaussian in form.
Gaussian Naive Bayes User’s Guide: http://scikit-learn.org/stable/modules/naive_bayes.html#gaussian-naive-bayes
Gaussian Naive Bayes API
Custom Classifiers¶
In addition to using one of the available classifiers, you can register any custom classifiers
that adhere to the scikit-learn Estimator API and use them within iDQ. To do this,
you need to create a file, $HOME/.config/idq/classifiers.py, where iDQ knows
to look for custom classifiers. You then
need to extend a class from idq.classifiers.SupervisedSklearnClassifier and implement
the classifier() method that returns your classifier. Finally, you'll need to register your
classifier so that iDQ knows how to use it when you specify it in your configuration.
As an example, we can write a custom classifier in $HOME/.config/idq/classifiers.py
as follows:
import sklearn.naive_bayes

from idq import hookimpl
from idq.classifiers.sklearn import SupervisedSklearnClassifier


class MyClassifier(SupervisedSklearnClassifier):
    _flavor = "custom"

    def classifier(self):
        return [('classifier', sklearn.naive_bayes.GaussianNB())]


@hookimpl
def get_classifiers():
    return {
        "sklearn:custom": MyClassifier,
    }
Then to use it within iDQ, you can specify flavor = "sklearn:custom"
in the classifier section
in your configuration file.
The classifier() method needs to return a list of tuples, where the first element is
the name of the Estimator or Transformer and the second is the Estimator/Transformer
itself. These are used to build up the scikit-learn Pipeline. You can have multiple estimators
or transformers as part of the pipeline, but the last estimator needs to be named "classifier" and
implement either the predict_proba() (recommended) or decision_function() (workable, but not ideal) method.
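For instance, a multi-step pipeline might look like the following hedged sketch (the feature-selection step and flavor name are illustrative), with the final step still named "classifier":
import sklearn.feature_selection
import sklearn.naive_bayes

from idq import hookimpl
from idq.classifiers.sklearn import SupervisedSklearnClassifier


class MySelectiveClassifier(SupervisedSklearnClassifier):
    _flavor = "custom_selective"   # hypothetical flavor

    def classifier(self):
        # any number of Transformers may precede the final step, but the last
        # step must be named "classifier" and expose predict_proba() or
        # decision_function()
        return [
            ("selector", sklearn.feature_selection.SelectKBest(k=20)),
            ("classifier", sklearn.naive_bayes.GaussianNB()),
        ]


@hookimpl
def get_classifiers():
    return {
        "sklearn:custom_selective": MySelectiveClassifier,
    }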
XGBoost Classifiers¶
XGBoost is an optimized gradient boosting library which implements classifiers that benefit from gradient boosting, such as decision trees or linear models. An introduction to tree boosting can be found here.
There is a single implementation provided here, XGBTree,
which uses the scikit-learn classifier API
and so is fully compatible with everything described for the scikit-learn classifiers above,
including whitening and hyperparameter tuning.
A guide to all the hyperparameters available for this classifier is located here.
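As a hedged sketch (the flavor string, path, and hyperparameter values are illustrative), an XGBTree configuration might look like:
[[classifier]]
name = "xgb"
flavor = "xgb:tree"                # hypothetical flavor name; use the name registered by iDQ
window = 0.1
safe_channels_path = "/path/to/safe_channels.txt"   # hypothetical path
[classifier.params]
classifier__max_depth = 4          # standard XGBoost hyperparameters, named classifier__<param>
classifier__n_estimators = 100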
- class idq.classifiers.xgb.XGBTree(*args, **kwargs)[source]¶
A gradient-boosted tree classifier based on xgboost.
XGBoost Intro: https://xgboost.readthedocs.io/en/latest/tutorials/model.html
XGBoost Hyperparameter Guide: https://xgboost.readthedocs.io/en/latest/parameter.html
XGBoost API: https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn
Keras Classifiers¶
In addition to the traditional machine learning classifiers, we also provide a NeuralNetwork classifier
which uses a subset of the Keras framework to provide deep learning architectures within iDQ.
The NeuralNetwork
classifier allows you to build dense, locally-connected, or dropout layers. It also allows
the use of regularizers on an individual-layer basis to apply penalties on layer parameters. There are
also options to set aside part of the training set for validation, which gives validation metrics
during each training epoch. Finally, one can balance the training set by applying weights to each training sample
by setting the balanced
parameter to true in the classifier configuration.
For convenience, we expose a few variables to generate layers that scale based on the number of columns, features, or channels.
Ncol: number of columns
Nchan: number of channels
Ntotal: number of features = Ncol * Nchan
Here’s an example of a NeuralNetwork
configuration:
[[classifier]]
name = "deep"
flavor = "keras:dnn"
verbose = true
window = 0.1
random_state = 20
# neural-network specific parameters
balanced = true
loss = "binary_crossentropy"
optimizer = "adam"
metrics = "accuracy"
validation_split = 0.2
epochs = 30
batch_size = 32
[[classifier.layer]]
type = "Local1D"
activation = "relu"
filters = 1
kernel_size = "Ncol"
strides = "Ncol"
[[classifier.layer]]
type = "Dense"
units = "0.1Nchan"
activation = "relu"
regularizer_type = "l1"
regularizer_value = 0.01
[[classifier.layer]]
type = "Dropout"
rate = 0.1
[[classifier.layer]]
type = "Dense"
activation = "sigmoid"
units = 1
This creates a neural network with four layers:
A locally-connected 1D layer with kernel size and strides set to Ncol, with a relu activation function.
A dense layer with Nchan / 10 nodes, a relu activation function, and l1 regularization with a penalty of 0.01.
A dropout layer with a dropout rate of 0.1
A single-node layer with sigmoid activation (needed for 2-class classification)
It also includes a validation_split
of 0.2, which sets aside 20% of the training set to be used for validation and reports validation metrics during each epoch. Since balanced
is set to true, class weights are applied to each sample to balance out the training set.
You can add l1, l2, or l1_l2 regularization with a penalty to either Dense or Local1D layers by adding the two extra keys (regularizer_type and regularizer_value) at the end, as seen in the second layer.
Keyword Arguments¶
For the Keras classifiers, the following keyword arguments are required:
one or more [classifier.layer] sections with layer configurations
flavor: type of classifier to use
window: window of features to consider surrounding a target channel feature; any channels that don’t fall within this window are dropped, and the specified default values are used instead.
safe_channels_path: the path that contains the channel list of safe auxiliary channels to consider in feature data. Only the channels here will be used in classifiers.
loss: objective function to use, see losses guide
optimizer: optimizer to use, see optimizer guide
metrics: metric to be evaluated by the model during training and validation, can pass in a single string or a list of values.
In addition, the following optional keyword arguments can be passed in:
random_state: set a random seed for reproducibility
validation_split: set aside a fraction of the training set to be used for validation, gives validation metrics during each training epoch
balanced: boolean, default is false. Sets whether class weights are applied to each sample to balance out the training set.
batch_size: number of rows to train on at once
epochs: number of epochs to train on
Available Classifiers¶
PyTorch Classifiers¶
In addition to Keras, we also provide a mechanism that allows custom classifiers via PyTorch to provide
deep learning architectures within iDQ. While there are no built-in PyTorch classifiers currently
available, a custom classifier registration scheme similar to that of the scikit-learn classifiers is available
here. To do this, one needs to provide a torch.nn.Module, and optionally provide PyTorch-compatible optimizers and
criteria. The default optimizer is torch.optim.SGD
and the default criterion is torch.nn.NLLLoss.
As an example, we can write a custom classifier in $HOME/.config/idq/classifiers.py
as follows:
import torch
from torch import nn
import torch.nn.functional as F

from idq import hookimpl
from idq.classifiers.torch import SupervisedTorchClassifier


class ClassifierModule(nn.Module):
    def __init__(self, num_features=100, dropout=0.5, **kwargs):
        super(ClassifierModule, self).__init__()
        input_dim = num_features
        hidden_dim = num_features // 10
        output_dim = 2  # 2-class classification

        self.dropout = nn.Dropout(dropout)
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, **kwargs):
        X = F.relu(self.hidden(X))
        X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X


class SimpleNet(SupervisedTorchClassifier):
    def module(self):
        return ClassifierModule


@hookimpl
def get_classifiers():
    return {
        "pytorch:custom": SimpleNet,
    }
Here, we defined a ClassifierModule
which extends torch.nn.Module. This defines the PyTorch neural network which will be used for training and evaluation. There are a few helper keyword arguments which are passed into the module by default from iDQ. These include num_features,
num_columns,
and num_channels,
which help define the input dimensionality as well as any specialized layers that may group features by column and/or channel.
In order to register your custom classifier, you need to create a class that extends from SupervisedTorchClassifier
and define a module()
method which returns your module. Optionally, you can pass in your optimizer via optimizer()
and a criterion via criterion()
. Finally, you’ll need to register your class via get_classifiers()
above.
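As a hedged sketch (the return convention for optimizer() and criterion() is an assumption based on the skorch-style wrapping mentioned below), overriding the defaults might look like:
import torch


class SimpleNetAdam(SimpleNet):
    def optimizer(self):
        # hypothetical override: swap the default torch.optim.SGD for Adam
        return torch.optim.Adam

    def criterion(self):
        # hypothetical override: swap the default torch.nn.NLLLoss
        return torch.nn.CrossEntropyLoss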
Finally, to configure the classifier, you can provide configuration similar to other scikit-learn classifiers within the classifier section. To configure general parameters, you can pass them via classifier__param
, e.g. classifier__max_epochs
for max epochs. To configure the module, optimizer, or criterion, prefix the parameter with the corresponding component name, e.g. classifier__module__dropout
to configure the module's dropout parameter.
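As a hedged sketch (values and the safe-channels path are illustrative), a configuration for the custom classifier above might look like:
[[classifier]]
name = "simple_net"
flavor = "pytorch:custom"
window = 0.1
safe_channels_path = "/path/to/safe_channels.txt"   # hypothetical path
[classifier.params]
classifier__max_epochs = 20              # general classifier parameter
classifier__module__dropout = 0.3        # forwarded to ClassifierModule's dropout argument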
For more details on classifier configuration, see this. All PyTorch support is provided through the skorch library which allows us to wrap PyTorch classifiers using the scikit-learn estimator API, allowing one to do pre-processing and cross-validation in the same way as the scikit-learn classifiers above.