Classifiers

We support two types of supervised classification schemes: the SupervisedClassifier and the IncrementalSupervisedClassifier. These objects are conceptually similar and follow the same API, with the single exception of how they (re)train their internal models.

SupervisedClassifier and its children train through a batch prescription; that is, they re-train from scratch by analyzing a large batch of data. This means that if any historical information is to be retained through the re-training process, that data must be included in the set passed to the SupervisedClassifier.train call.

In contrast, IncrementalSupervisedClassifier and its children train incrementally. This means that the data passed to IncrementalSupervisedClassifier.train is, in effect, added to the previously used data. The incremental scheme should be computationally lighter, particularly when we retrain continuously, and better matches the streaming nature of the overall architecture.

We note that IncrementalSupervisedClassifier is a subclass of SupervisedClassifier and therefore re-uses a lot of the code defined therein. This also means the API is specified within SupervisedClassifier, with a few exceptions (see idq.classifiers.OVL for an example).
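As a rough illustration of the difference (the dataset variables and classifier instances below are hypothetical placeholders; only the train calls reflect the API described above):

# Illustrative sketch only: 'old_data', 'new_data', and the classifier
# instances are hypothetical placeholders.

# Batch scheme: the model is rebuilt from scratch, so any historical data
# that should be retained must be re-supplied alongside the new data.
batch_classifier.train(old_data + new_data)

# Incremental scheme: only the new data is passed in; it is effectively
# folded into the previously trained model.
incremental_classifier.train(new_data)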

OVL

Available Classifiers

class idq.classifiers.ovl.OVL(*args, **kwargs)[source]

A wrapper for the Ordered Veto List (OVL) algorithm published in Essick et al., CQG 30, 15 (2013) (DOI: 10.1088/0264-9381/30/15/155010). This algorithm estimates the False Alarm Probability based on measures of the deadtime associated with segments generated around auxiliary events.

TODO: set this up so that it takes a path to an output directory and then writes the Vetolist and data objects into that directory with appropriate names (extracting start and end times from the data objects).

TODO: describe inheritance for the extra attributes not declared within SupervisedClassifier (_allowed_metrics, _default_incremental, _default_minima, _gammln_cof, _gammln_stp) and the associated methods (_recalculate, redundancies, _check_columns, _gcf, _gammln, _gserln, _gammpln).

calibrate(dataset, bounded=True, **kwargs)

Calibrates this algorithm based on the dataset of feature vectors. Requires all FeatureVectors in the dataset to have been evaluated. This should update self._calibration_map.

evaluate(dataset)

Sets the ranks for these feature vectors. Modifies the objects in place!

feature_importance()

Delegates to Vetolist.feature_importance.

feature_importance_figure(dataset, start, end, t0, **kwargs)

Delegates to Vetolist.feature_importance_figure, with a few extra additions.

feature_importance_table(dataset=None, **kwargs)

Delegates to Vetolist.feature_importance_table.

property flavor

This is a “private” variable because I never want a user to muck with it, and I want each child class to declare it for itself. It should be considered like a “type”, but may be easier to deal with as a string instead of a Type object.

property nickname

This is a “private” variable because I never want a user to muck with it once it is set upon instantiation.

redundancies(dataset)

Computes the intersection and overlap of veto segments for each possible configuration. This should contain all the information necessary to determine which channels are redundant.

We only return information based on the current model, which may have trained itself down to a subset of the total possible configurations.

Returns (table, headers): table is a matrix of the livetime of the intersections of veto segments from each configuration pair, and headers is a list of the (channel, threshold, window) tuples, in the same order as the columns in table.
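A hypothetical usage sketch of the return values, assuming a trained OVL instance named ovl and an evaluated dataset:

# Hypothetical usage: 'ovl' is a trained OVL instance and 'dataset' an
# evaluated dataset, as described above.
table, headers = ovl.redundancies(dataset)

# headers[i] is the (channel, threshold, window) tuple labeling column i of
# table; table[i][j] is the livetime of the intersection of the veto
# segments from configurations i and j.
for (channel, threshold, window), row in zip(headers, table):
    print(channel, threshold, window, row)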

timeseries(info, dataset_factory, dt=0.00390625, segs=None, set_ok=None)

Delegates to Vetolist.timeseries and returns ranks.

train(dataset)[source]

Instantiates a Vetolist and trains using the data within a dataset.

Algorithmic parameters include:

  • channels

  • thresholds

  • windows

  • num_recalculate

  • incremental

  • minima (key, value pairs)

and are specified through self.kwargs, set during instantiation.
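As a minimal sketch, an OVL instance might be built and trained as follows (all channel names and parameter values here are hypothetical, and the constructor may require additional arguments not shown):

from idq.classifiers.ovl import OVL

# Hypothetical parameter values; these keyword arguments mirror the
# algorithmic parameters listed above and are stored in self.kwargs.
ovl = OVL(
    channels=["H1:AUX_CHANNEL_A", "H1:AUX_CHANNEL_B"],  # hypothetical names
    thresholds=[15, 30],
    windows=[0.1, 0.5],
    num_recalculate=10,
    incremental=100,
    minima={"eff_fap": 3.0},  # hypothetical (key, value) pair
)
ovl.train(dataset)  # 'dataset' is an idq dataset of feature vectors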

class idq.classifiers.ovl.DOVL(*args, **kwargs)[source]

Discrete OVL: a modified version of OVL that trains based on discrete samples to estimate the deadtime. We note that there is an extension of this that estimates the False Alarm Probability instead of the deadtime (DOVLfap), but still uses discrete samples. The actual implementation of the OVL algorithm itself is stored in a subclass (idq.classifiers.OVL) because DOVL has standard training signatures while OVL does not.

TODO: describe what we “overwrite” from OVL (although we don’t really have convenient access to these…): train (a trivial delegation, overwritten simply so that the signature has a clearer variable name) and _recalculate.

calibrate(dataset, bounded=True, **kwargs)[source]

Calibrates this algorithm based on the dataset of feature vectors. Requires all FeatureVectors in the dataset to have been evaluated. This should update self._calibration_map.

evaluate(dataset)[source]

Sets the ranks for these feature vectors. Modifies the objects in place!

feature_importance()[source]

Delegates to Vetolist.feature_importance.

feature_importance_figure(dataset, start, end, t0, **kwargs)[source]

Delegates to Vetolist.feature_importance_figure, with a few extra additions.

feature_importance_table(dataset=None, **kwargs)[source]

Delegates to Vetolist.feature_importance_table.

property flavor

This is a “private” variable because I never want a user to muck with it, and I want each child class to declare it for itself. It should be considered like a “type”, but may be easier to deal with as a string instead of a Type object.

property nickname

This is a “private” variable because I never want a user to muck with it once it is set upon instantiation.

redundancies(dataset)[source]

Computes the intersection and overlap of veto segments for each possible configuration. This should contain all the information necessary to determine which channels are redundant.

We only return information based on the current model, which may have trained itself down to a subset of the total possible configurations.

Returns (table, headers): table is a matrix of the livetime of the intersections of veto segments from each configuration pair, and headers is a list of the (channel, threshold, window) tuples, in the same order as the columns in table.

timeseries(info, dataset_factory, dt=0.00390625, segs=None, set_ok=None)[source]

Delegates to Vetolist.timeseries and returns ranks.

train(dataset)[source]

Instantiates a Vetolist and trains using the data within a dataset.

Algorithmic parameters include:

  • channels

  • thresholds

  • windows

  • num_recalculate

  • incremental

  • minima (key, value pairs)

and are specified through self.kwargs, set during instantiation.

NOTE: We do not explicitly call self._check_columns(datachunk) and instead assume the user has already done this when constructing a dataset.

Scikit-learn Classifiers

iDQ supports many supervised machine-learning classifiers by leveraging the API of scikit-learn, a popular Python-based machine-learning library.

All of these are derived from the base class

  • idq.classifiers.SupervisedSklearnClassifier

In every implementation, parameters defined for a scikit-learn classifier can be passed directly within the configuration file or selected via grid-search-based hyperparameter tuning. For cross-validation, the full set of training samples is split into 3 folds (by default) to determine the best set of hyperparameters chosen from the grid.

The procedure used to train sklearn-based classifiers is shown below:

labeled quiver → classifier → trained model

The classifier is built as a scikit-learn Pipeline object comprising a preprocessing step to whiten incoming features, one or more steps that make up the actual classifier, and an optional rank scaler at the end of the pipeline, included depending on whether or not the classifier provides probability estimates.

whitener → classifier → rank scaler (optional)
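Conceptually, this corresponds to a scikit-learn Pipeline like the following sketch (not the exact iDQ construction; the GaussianNB classifier here is just a stand-in):

from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Conceptual sketch of the pipeline structure described above: a whitener,
# the classifier itself, and an optional rank scaler appended only for
# classifiers that lack probability estimates.
pipeline = Pipeline([
    ("whitener", StandardScaler()),
    ("classifier", GaussianNB()),
    # ("rank_scaler", ...),  # optional, depending on the classifier
])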

Whitening

There are two modes of whitening available:

  • StandardScaler: performs the usual whitening of features, standardizing them to zero mean and unit variance. This is the default (whitener = standard).

  • RobustScaler: same as StandardScaler, but robust to outliers. Selected with whitener = robust.

A comparison of the different scalers is provided in the scikit-learn documentation.

Keyword Arguments

For all scikit-learn classifiers, the following keyword arguments are required:

  • flavor: type of classifier to use

  • window: the window of features to consider surrounding a target channel feature; any channels that don’t fall within this window are dropped and the specified default values are used.

  • safe_channels_path: the path to the channel list of safe auxiliary channels to consider in feature data. Only the channels listed here will be used by classifiers.

In addition, the following optional keyword arguments can be passed in:

  • whitener: type of whitening to use. Options are standard/robust.
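Putting these keyword arguments together, a classifier section might look like the following sketch (the name, flavor string, and path are illustrative assumptions; check the flavors registered in your installation):

[[classifier]]
name = "forest"                              # hypothetical name
flavor = "sklearn:random_forest"             # assumed flavor string
window = 0.1
safe_channels_path = "/path/to/safe_channels.txt"
whitener = "robust"                          # optional; defaults to "standard"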

Hyperparameter Tuning

  1. If using brute-force hyperparameter cross-validation:

We need to specify the type to be grid and include a [classifier.search.params.hyperparam] section for each hyperparameter for use in cross-validation.

Example:

[classifier.search]
type = "grid"

[classifier.search.params.hyperparam1]
range = [low, high]
type = dist_type
discrete = is_discrete
num_samples = num_points

Available continuous distribution types are ‘uniform’ and ‘log_uniform’; available discrete distribution types are ‘uniform’.
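For concreteness, a filled-in version of the section above might look like the following (the hyperparameter name is hypothetical and uses the classifier__ prefix described under the naming convention below):

[classifier.search]
type = "grid"

# Hypothetical hyperparameter for a tree-based classifier; 5 discrete values
# are drawn uniformly from the range [10, 200].
[classifier.search.params.classifier__n_estimators]
range = [10, 200]
type = "uniform"
discrete = true
num_samples = 5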

In addition, the following optional keyword arguments can be passed in:

  • num_cv_proc: number of processes to use for cross-validation

  • num_cv_folds: number of folds to use for cross-validation

  • cv_scoring: scoring function to use for cross-validation

  2. If using randomized hyperparameter cross-validation:

We need to specify the type to be random, specify the number of samples to use (num_samples), and include a [classifier.search.params.hyperparam] section for each hyperparameter for use in cross-validation.

[classifier.search]
type = "grid"

[classifier.search.params.hyperparam1]
range = [low, high]
type = dist_type
discrete = is_discrete

Available continuous distribution types are ‘uniform’ and ‘log_uniform’; available discrete distribution types are ‘uniform’.

In addition, the following optional keyword arguments can be passed in:

  • num_cv_proc: number of processes to use for cross-validation

  • num_cv_folds: number of folds to use for cross-validation

  • cv_scoring: scoring function to use for cross-validation

  3. If using specific hyperparameter values:

Add hyperparameters in the [classifier.params] section, one key per hyperparameter:

  • hyperparam: a value to be used for the specific hyperparameter. Repeat for multiple hyperparameters. Please consult the User’s Guide below for a given classifier to determine which hyperparameters to use.

In all these cases, hyperparameters must be named in the form:

classifier__hyperparam

If using a composite classifier with several components, e.g. ApproximateKernelSGD, you can pass hyperparameters to the other components by name; the available names are specified in the relevant docstrings for each composite classifier. For example, setting the kernel hyperparameters in ApproximateKernelSGD is done by setting kernel__hyperparam = value.
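For example, a [classifier.params] section for an ApproximateKernelSGD classifier might look like the following sketch (the values are illustrative; consult the linked guides for the hyperparameters each component actually accepts):

[classifier.params]
# hyperparameters forwarded to the SGD classifier step
classifier__alpha = 0.0001
# hyperparameters forwarded to the approximate-kernel step by name
kernel__gamma = 0.1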

Available Classifiers

class idq.classifiers.sklearn.RandomForest(*args, **kwargs)[source]

A Random Forest of Decision Trees based on scikit-learn.

This is a supervised learning algorithm which uses a group of randomized decision trees (a forest) to perform classification.

  • Random Forest User’s Guide: http://scikit-learn.org/stable/modules/ensemble.html#forest

  • Random Forest API: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier

class idq.classifiers.sklearn.SupportVectorMachine(*args, **kwargs)[source]

A support vector machine based on scikit-learn.

This is a supervised learning algorithm which uses a hyperplane to separate data points into two distinct classes. It also allows for kernel-based learning, so that if samples cannot be separated by a hyperplane, they are transformed via a kernel to a higher-dimensional space where they can be separated in a linear fashion.

Various kernels are supported and can be selected by passing the kernel kwarg in the classifier configuration section.

NOTE: The scikit-learn classifier, SVC, is used to perform classification. Probability is set to true so that the mapping from rank to a calibrated probability can be performed more easily.

  • SVC API: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC

class idq.classifiers.sklearn.GradientBoostedTree(*args, **kwargs)[source]

A Gradient Tree Boosting algorithm based on scikit-learn.

This is a supervised learning algorithm which produces an ensemble of decision trees, builds them up in a stage-wise fashion, and allows use of arbitrary differentiable loss functions.

  • GBT User’s Guide: http://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting

  • GBT API: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#sklearn.ensemble.GradientBoostingClassifier

class idq.classifiers.sklearn.NeuralNetwork(*args, **kwargs)[source]

A neural network (multi-layer perceptron) algorithm based on scikit-learn.

This is a supervised learning algorithm which produces a shallow neural network of multiple layers with a choice of activation function for the hidden layers. It trains itself using backpropagation.

  • MultiLayer Perceptron User’s Guide: http://scikit-learn.org/stable/modules/neural_networks_supervised.html#multi-layer-perceptron

  • MultiLayer Perceptron API: http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

class idq.classifiers.sklearn.NaiveBayes(*args, **kwargs)[source]

A Naive Bayes classifier based on scikit-learn.

This is a supervised learning algorithm which assumes independence between all features, and uses Bayes’ theorem to determine the posterior probability that a set of features is in a given class. In this particular implementation, the likelihood of features are Gaussian in form.

  • Gaussian Naive Bayes User’s Guide: http://scikit-learn.org/stable/modules/naive_bayes.html#gaussian-naive-bayes

  • Gaussian Naive Bayes API: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB

class idq.classifiers.sklearn.ApproximateKernelSGD(*args, **kwargs)[source]

A Stochastic Gradient Descent classifier based on scikit-learn, with a choice of an approximate kernel to transform nonlinear features into linear features suitable for the SGD classifier.

Guide for using the Stochastic Gradient Descent classifier:

  • SGD User’s Guide: http://scikit-learn.org/stable/modules/sgd.html#stochastic-gradient-descent

  • SGD API: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn-linear-model-sgdclassifier

Guide for the approximate kernel algorithm (using the Nystroem method), types of kernels and appropriate parameters:

  • Kernel Approximation User’s Guide: http://scikit-learn.org/stable/modules/kernel_approximation.html#kernel-approximation

  • Kernel Approximation API: http://scikit-learn.org/stable/modules/generated/sklearn.kernel_approximation.Nystroem.html#sklearn.kernel_approximation.Nystroem

class idq.classifiers.sklearn.ApproximateKernelSVM(*args, **kwargs)[source]

A linear SVM based on scikit-learn, with a choice of an approximate kernel to transform nonlinear features into linear features suitable for the SVM classifier.

Guide for using the linear SVM classifier:

  • SVM User’s Guide: http://scikit-learn.org/stable/modules/svm.html#support-vector-machines

  • Linear SVM API: http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC

Guide for the approximate kernel algorithm (using the Nystroem method), types of kernels and appropriate parameters:

  • Kernel Approximation User’s Guide: http://scikit-learn.org/stable/modules/kernel_approximation.html#kernel-approximation

  • Kernel Approximation API: http://scikit-learn.org/stable/modules/generated/sklearn.kernel_approximation.Nystroem.html#sklearn.kernel_approximation.Nystroem

Available Incremental Classifiers

class idq.classifiers.sklearn.PassiveAggressive(*args, **kwargs)[source]

A Passive-Aggressive classifier based on scikit-learn. Trains in an incremental fashion.

Based on the algorithm described in http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf.

  • Passive-Aggressive User’s Guide: http://scikit-learn.org/stable/modules/linear_model.html#passive-aggressive

  • Passive-Aggressive API: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveClassifier.html#sklearn-linear-model-passiveaggressiveclassifier

class idq.classifiers.sklearn.IncrementalNeuralNetwork(*args, **kwargs)[source]

A Multilayer Perceptron (neural network) algorithm based on scikit-learn. Trains in an incremental fashion.

This is a supervised learning algorithm which produces a shallow neural network of multiple layers with a choice of activation function for the hidden layers. It trains itself using backpropagation.

  • MultiLayer Perceptron User’s Guide: http://scikit-learn.org/stable/modules/neural_networks_supervised.html#multi-layer-perceptron

  • MultiLayer Perceptron API: http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

class idq.classifiers.sklearn.IncrementalApproximateKernelSGD(*args, **kwargs)[source]

A Stochastic Gradient Descent classifier based on scikit-learn, with a choice of an approximate kernel to transform nonlinear features into linear features suitable for the SGD classifier. Trains in an incremental fashion.

Guide for using the Stochastic Gradient Descent classifier:

  • SGD User’s Guide: http://scikit-learn.org/stable/modules/sgd.html#stochastic-gradient-descent

  • SGD API: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn-linear-model-sgdclassifier

Guide for the approximate kernel algorithm (using the Nystroem method), types of kernels and appropriate parameters:

  • Kernel Approximation User’s Guide: http://scikit-learn.org/stable/modules/kernel_approximation.html#kernel-approximation

  • Kernel Approximation API: http://scikit-learn.org/stable/modules/generated/sklearn.kernel_approximation.Nystroem.html#sklearn.kernel_approximation.Nystroem

class idq.classifiers.sklearn.IncrementalNaiveBayes(*args, **kwargs)[source]

A Naive Bayes classifier based on scikit-learn. Trains in an incremental fashion.

This is a supervised learning algorithm which assumes independence between all features, and uses Bayes’ theorem to determine the posterior probability that a set of features is in a given class. In this particular implementation, the likelihood of features are Gaussian in form.

  • Gaussian Naive Bayes User’s Guide: http://scikit-learn.org/stable/modules/naive_bayes.html#gaussian-naive-bayes

  • Gaussian Naive Bayes API: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB

Custom Classifiers

In addition to using one of the available classifiers, you can register any custom classifier that adheres to the scikit-learn Estimator API and use it within iDQ. In order to do this, you need to create a file, $HOME/.config/idq/classifiers.py, where iDQ will look for custom classifiers. Afterwards, you need to extend a class from idq.classifiers.SupervisedSklearnClassifier and implement the classifier() method, which returns your classifier. Finally, you’ll need to register your classifier so that iDQ knows how to use it when it is specified in your configuration.

As an example, we can write a custom classifier in $HOME/.config/idq/classifiers.py as follows:

import sklearn.naive_bayes

from idq import hookimpl
from idq.classifiers.sklearn import SupervisedSklearnClassifier


class MyClassifier(SupervisedSklearnClassifier):
    _flavor = "custom"

    def classifier(self):
        return [('classifier', sklearn.naive_bayes.GaussianNB())]


@hookimpl
def get_classifiers():
    return {
        "sklearn:custom": MyClassifier,
    }

Then to use it within iDQ, you can specify flavor = "sklearn:custom" in the classifier section in your configuration file.

The classifier() method needs to return a list of tuples, where the first element corresponds to the name of the Estimator or Transformer and the second corresponds to the Estimator/Transformer itself. These will be used to build up the scikit-learn Pipeline. You can have multiple estimators or transformers as part of the pipeline, but the last estimator needs to be called “classifier” and implement either the predict_proba() method (recommended) or the decision_function() method (acceptable but not ideal).
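For instance, a hypothetical multi-step pipeline with a dimensionality-reduction step in front of the final estimator could be registered like this (all names here are illustrative):

import sklearn.decomposition
import sklearn.naive_bayes

from idq import hookimpl
from idq.classifiers.sklearn import SupervisedSklearnClassifier


class MyPipelineClassifier(SupervisedSklearnClassifier):
    _flavor = "custom_pipeline"  # hypothetical flavor string

    def classifier(self):
        # Multiple steps are allowed, but the final estimator must be named
        # "classifier" and expose predict_proba() or decision_function().
        return [
            ("reduce", sklearn.decomposition.PCA(n_components=10)),
            ("classifier", sklearn.naive_bayes.GaussianNB()),
        ]


@hookimpl
def get_classifiers():
    return {
        "sklearn:custom_pipeline": MyPipelineClassifier,
    }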

XGBoost Classifiers

XGBoost is an optimized gradient boosting library which implements classifiers that benefit from gradient boosting, such as decision trees or linear models. An introduction to tree boosting can be found in the XGBoost Intro linked below.

There is a single implementation provided here, XGBTree, which uses the scikit-learn classifier API and so is fully compatible with everything already provided for the scikit-learn classifiers above, including whitening and hyperparameter tuning.

A guide to all the hyperparameters available for this classifier is linked below.
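Since XGBTree follows the scikit-learn conventions above, its hyperparameters can be set the same way; a brief sketch with assumed values (the keys are standard xgboost parameters, see the hyperparameter guide linked below):

[classifier.params]
classifier__max_depth = 6
classifier__learning_rate = 0.1
classifier__n_estimators = 200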

class idq.classifiers.xgb.XGBTree(*args, **kwargs)[source]

A gradient-boosted tree classifier based on xgboost.

  • XGBoost Intro: https://xgboost.readthedocs.io/en/latest/tutorials/model.html

  • XGBoost Hyperparameter Guide: https://xgboost.readthedocs.io/en/latest/parameter.html

  • XGBoost API: https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn

Keras Classifiers

In addition to the traditional machine-learning classifiers, we also provide a NeuralNetwork classifier which uses a subset of the Keras framework to provide deep-learning architectures within iDQ.

The NeuralNetwork classifier allows you to build dense, locally-connected, or dropout layers. It also allows the use of regularizers on an individual-layer basis to apply penalties to layer parameters. There are also options to set aside part of the training set for validation, which gives validation metrics during each training epoch. Finally, one can balance the training set by applying weights to each training sample, by setting the balanced parameter to true in the classifier configuration.

For convenience, we expose a few variables to generate layers that scale based on the number of columns, features, or channels.

  • Ncol: number of columns

  • Nchan: number of channels

  • Ntotal: number of features = Ncol * Nchan

Here’s an example of a NeuralNetwork configuration:

[[classifier]]
name = "deep"
flavor = "keras:dnn"
verbose = true

window = 0.1
random_state = 20

# neural-network specific parameters
balanced = true
loss = "binary_crossentropy"
optimizer = "adam"
metrics = "accuracy"
validation_split = 0.2

epochs = 30
batch_size = 32

[[classifier.layer]]
type = "Local1D"
activation = "relu"
filters = 1
kernel_size = "Ncol"
strides = "Ncol"

[[classifier.layer]]
type = "Dense"
units = "0.1Nchan"
activation = "relu"
regularizer_type = "l1"
regularizer_value = 0.01

[[classifier.layer]]
type = "Dropout"
rate = 0.1

[[classifier.layer]]
type = "Dense"
activation = "sigmoid"
units = 1

This creates a neural network with four layers:

  1. A locally-connected 1D layer with kernel size and strides set to Ncol, with a relu activation function.

  2. A dense layer with Nchan / 10 nodes, a relu activation function, and l1 regularization with a penalty of 0.01.

  3. A dropout layer with a dropout rate of 0.1.

  4. A single-node dense layer with sigmoid activation (needed for 2-class classification).

It also includes a validation_split of 0.2, which sets aside 20% of the training set to be used for validation and reports the validation accuracy during each epoch. Since balanced is set to true, class weights are applied to each sample to balance out the training set.

You can add l1, l2, or l1_l2 regularization with a penalty to either Dense or Local1D layers by adding the two extra keys (regularizer_type and regularizer_value) at the end, as seen in the second layer.
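For orientation, the configuration above corresponds roughly to the following Keras model, written out directly as an illustrative sketch (this is not the code iDQ generates, and it assumes a TensorFlow/Keras version that still provides LocallyConnected1D, with Ncol and Nchan already resolved to concrete integers):

from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Hypothetical feature-vector shape: Ncol columns per channel, Nchan channels.
Ncol, Nchan = 4, 250
Ntotal = Ncol * Nchan

model = keras.Sequential([
    # reshape the flat feature vector so the locally-connected layer can
    # stride over it one channel (Ncol columns) at a time
    layers.Reshape((Ntotal, 1), input_shape=(Ntotal,)),
    layers.LocallyConnected1D(filters=1, kernel_size=Ncol, strides=Ncol,
                              activation="relu"),
    layers.Flatten(),
    # dense layer with 0.1 * Nchan nodes and l1 regularization (penalty 0.01)
    layers.Dense(int(0.1 * Nchan), activation="relu",
                 kernel_regularizer=regularizers.l1(0.01)),
    layers.Dropout(0.1),
    # single-node sigmoid output for 2-class classification
    layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])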

Keyword Arguments

For the Keras classifiers, the following keyword arguments are required:

  • one or more [[classifier.layer]] sections with layer configurations.

  • flavor: type of classifier to use

  • window: the window of features to consider surrounding a target channel feature; any channels that don’t fall within this window are dropped and the specified default values are used.

  • safe_channels_path: the path to the channel list of safe auxiliary channels to consider in feature data. Only the channels listed here will be used by classifiers.

  • loss: objective function to use, see losses guide

  • optimizer: optimizer to use, see optimizer guide

  • metrics: metric(s) to be evaluated by the model during training and validation; pass a single string or a list of values.

In addition, the following optional keyword arguments can be passed in:

  • random_state: set a random seed for reproducibility

  • validation_split: set aside a fraction of the training set to be used for validation, gives validation metrics during each training epoch

  • balanced: boolean, default false. Sets whether class weights are applied to each sample to balance out the training set.

  • batch_size: number of rows to train on at once

  • epochs: number of epochs to train on

Available Classifiers

PyTorch Classifiers

In addition to Keras, we also provide a mechanism to allow custom classifiers via PyTorch to provide deep-learning architectures within iDQ. While there are no built-in PyTorch classifiers currently available, a custom classifier registration scheme similar to that of the scikit-learn classifiers is available here. In order to do this, one needs to provide a torch.nn.Module, and optionally provide a PyTorch-compatible optimizer and criterion. The default optimizer is torch.optim.SGD and the default criterion is torch.nn.NLLLoss.

As an example, we can write a custom classifier in $HOME/.config/idq/classifiers.py as follows:

import torch
from torch import nn
import torch.nn.functional as F

from idq import hookimpl
from idq.classifiers.torch import SupervisedTorchClassifier


class ClassifierModule(nn.Module):
    def __init__(self, num_features=100, dropout=0.5, **kwargs):
        super(ClassifierModule, self).__init__()

        input_dim = num_features
        hidden_dim = num_features // 10
        output_dim = 2  # 2-class classification

        self.dropout = nn.Dropout(dropout)

        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, **kwargs):
        X = F.relu(self.hidden(X))
        X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X


class SimpleNet(SupervisedTorchClassifier):
    def module(self):
        return ClassifierModule


@hookimpl
def get_classifiers():
    return {
        "pytorch:custom": SimpleNet,
    }

Here, we defined a ClassifierModule which extends torch.nn.Module. This defines the PyTorch neural network which will be used for training and evaluation. A few helper keyword arguments are passed into the module by default from iDQ: num_features, num_columns, and num_channels, which help define the input dimensionality as well as any specialized layers which may group features by column and/or channel.

In order to register your custom classifier, you need to create a class that extends SupervisedTorchClassifier and define a module() method which returns your module. Optionally, you can provide your optimizer via optimizer() and a criterion via criterion(). Finally, you’ll need to register your class via get_classifiers() above.

Finally, to configure the classifier, you can provide configuration similar to the other scikit-learn classifiers within the classifier section. To configure general parameters, pass them via classifier__param, e.g. classifier__max_epochs for the maximum number of epochs. To configure the module, optimizer, or criterion, add that keyword to your parameter, e.g. classifier__module__dropout to configure the module’s dropout parameter.
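For instance, a configuration for the SimpleNet classifier registered above might look like the following sketch (the name, window, path, and the placement of parameters under [classifier.params] are assumptions following the scikit-learn conventions above):

[[classifier]]
name = "simplenet"               # hypothetical name
flavor = "pytorch:custom"        # matches the hook registration above
window = 0.1
safe_channels_path = "/path/to/safe_channels.txt"

[classifier.params]
classifier__max_epochs = 20          # general parameter
classifier__module__dropout = 0.3    # forwarded to ClassifierModule.__init__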

For more details on classifier configuration, see the configuration documentation. All PyTorch support is provided through the skorch library, which wraps PyTorch classifiers in the scikit-learn estimator API, allowing pre-processing and cross-validation to be done in the same way as for the scikit-learn classifiers above.