.. _classifiers:

Classifiers
####################################################################################################

We support two types of supervised classification schemes: the ``SupervisedClassifier`` and the ``IncrementalSupervisedClassifier``.
These objects are conceptually similar and generally follow the same API, with the single exception of how they (re)train their internal models.

``SupervisedClassifier`` and its children train through a *batch* prescription; that is, they re-train by starting from scratch and analyzing a large batch of data.
This means that if any historical information is to be retained through the re-training process, that data must be included in the set passed to the ``SupervisedClassifier.train`` call.

In contrast, ``IncrementalSupervisedClassifier`` and its children train incrementally. 
This means that the data passed through the call to ``IncrementalSupervisedClassifier.train`` is *added* to the previously used data in some sense. 
The incremental scheme should be computationally lighter, particularly when we retrain continuously, and better matches the streaming nature of the overall architecture.

We note that ``IncrementalSupervisedClassifier`` is a subclass of ``SupervisedClassifier`` and therefore re-uses a lot of the code defined therein.
This also means the API is specified within ``SupervisedClassifier``, with a few exceptions (see :class:`idq.classifiers.OVL` for an example).
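
The distinction is analogous to scikit-learn's ``fit`` versus ``partial_fit``. The sketch below is *not* the iDQ API; it only illustrates the batch versus incremental behavior using a plain scikit-learn estimator:

.. code:: python

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    X_old, y_old = rng.normal(size=(100, 5)), rng.integers(0, 2, size=100)
    X_new, y_new = rng.normal(size=(20, 5)), rng.integers(0, 2, size=20)

    # batch-style: re-training starts from scratch, so historical data
    # must be supplied again alongside the new data
    batch = SGDClassifier()
    batch.fit(np.vstack([X_old, X_new]), np.concatenate([y_old, y_new]))

    # incremental-style: new data is folded into the previously trained model
    incremental = SGDClassifier()
    incremental.partial_fit(X_old, y_old, classes=[0, 1])
    incremental.partial_fit(X_new, y_new)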

.. _classifiers-ovl:

OVL
----------------------------------------------------------------------------------------------------

.. _classifiers-ovl-classifiers:

Available Classifiers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: idq.classifiers.ovl.OVL
    :inherited-members:

.. autoclass:: idq.classifiers.ovl.DOVL
    :inherited-members:

.. _classifiers-sklearn:

Scikit-learn Classifiers
----------------------------------------------------------------------------------------------------

iDQ supports many supervised machine learning classifiers by leveraging the API of scikit-learn, a popular Python-based machine learning library.

All of these are derived from the base class

* :class:`idq.classifiers.SupervisedSklearnClassifier`

In every implementation, parameters defined for a scikit-learn classifier can be passed directly within the configuration file or tuned via grid search-based hyperparameter optimization. For cross-validation, the full set of training samples is split into 3 folds to determine the best set of hyperparameters from the grid.

The procedure used to train sklearn-based classifiers is shown below:

.. graphviz::

   digraph sklearn_train {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="helvetica", fontsize=24];
      edge [ fontname="helvetica", fontsize=10 ];
      node [fontname="helvetica", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;


      Quiver [label="labeled quiver"];
      Classifier [label="classifier"];
      Model [label="trained model"];

      Quiver -> Classifier;
      Classifier -> Model;

   }

|

The classifier is composed of a scikit-learn `Pipeline` object comprising a preprocessing step to whiten incoming features, one or more steps that make up the actual classifier, and an optional rank scaler at the end of the pipeline, whose presence depends on whether the classifier provides probability estimates or not.

.. graphviz::

   digraph classifier_pipeline {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="helvetica", fontsize=24];
      edge [ fontname="helvetica", fontsize=10 ];
      node [fontname="helvetica", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;


      whitener;
      classifier;
      rankScaler [label="rank scaler (optional)"];

      whitener -> classifier;
      classifier -> rankScaler;

   }

|
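
In scikit-learn terms, the assembled pipeline is roughly equivalent to the sketch below; the step names and the particular estimator are illustrative only (the actual steps are determined by the chosen flavor):

.. code:: python

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # illustrative only: a whitening step followed by the estimator step named "classifier"
    pipeline = Pipeline([
        ("whitener", StandardScaler()),
        ("classifier", RandomForestClassifier()),
    ])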

.. _classifiers-sklearn-whiten:

Whitening
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two modes of whitening available:

  * `StandardScaler`: the usual whitening of features; standardizes features to have zero mean and unit variance. This is the default (``whitener = standard``).
  * `RobustScaler`: same as `StandardScaler`, but robust to outliers. Can be set with ``whitener = robust``.

A comparison of the different scalers in scikit-learn is provided `here <http://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py>`_.
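
As a quick standalone illustration (plain scikit-learn, not iDQ-specific) of how the two whiteners respond to an outlier:

.. code:: python

    import numpy as np
    from sklearn.preprocessing import RobustScaler, StandardScaler

    # a single feature column with one large outlier
    x = np.array([[1.0], [2.0], [3.0], [100.0]])

    # StandardScaler uses the mean and variance, so the outlier dominates the scaling
    print(StandardScaler().fit_transform(x).ravel())

    # RobustScaler uses the median and interquartile range, so it is far less sensitive
    print(RobustScaler().fit_transform(x).ravel())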

.. _classifiers-sklearn-kwargs:

Keyword Arguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For all scikit-learn classifiers, the following keyword arguments are required:

* **flavor:** type of classifier to use
* **window:** window of features to consider surrounding a target channel feature; any channels that don't fall within this window are dropped and the specified default values are used.
* **safe_channels_path:** path to the channel list of safe auxiliary channels to consider in feature data. Only the channels listed here will be used in classifiers.

In addition, the following optional keyword arguments can be passed in:

* **whitener:** type of whitening to use. Options are ``standard`` or ``robust``.
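
Putting these together, a classifier section might look roughly like the sketch below; the flavor string, path, and values are illustrative only (consult the docstring of the classifier you want for its actual flavor):

.. code:: bash

    [[classifier]]
    name = "my_sklearn_classifier"
    # illustrative placeholder; use the flavor string documented for the desired classifier
    flavor = "..."

    # required
    window = 0.1
    safe_channels_path = "/path/to/safe_channels.txt"

    # optional
    whitener = "robust"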

.. _classifiers-sklearn-hyperparams:

Hyperparameter Tuning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. If using brute-force hyperparameter cross-validation:

  We need to specify the type to be ``grid`` and include a ``[classifier.search.params.hyperparam]`` section for each hyperparameter for use in cross-validation.

  Example:

  .. code:: bash

      [classifier.search]
      type = "grid"

      [classifier.search.params.hyperparam1]
      range = [low, high]
      type = dist_type
      discrete = is_discrete
      num_samples = num_points

  Available continuous distribution types are ``uniform`` and ``log_uniform``; the available discrete distribution type is ``uniform``.

  In addition, the following optional keyword arguments can be passed in:

  * **num_cv_proc:** number of processes to use for cross-validation
  * **num_cv_folds:** number of folds to use for cross-validation
  * **cv_scoring:** scoring function to use for cross-validation

2. If using randomized hyperparameter cross-validation:

  We need to specify the type to be ``random``, specify the number of samples to use (``num_samples``), and include a ``[classifier.search.params.hyperparam]`` section for each hyperparameter for use in cross-validation.

  .. code:: bash

      [classifier.search]
      type = "random"
      num_samples = num_points

      [classifier.search.params.hyperparam1]
      range = [low, high]
      type = dist_type
      discrete = is_discrete

  Available continuous distribution types are ``uniform`` and ``log_uniform``; the available discrete distribution type is ``uniform``.

  In addition, the following optional keyword arguments can be passed in:

  * **num_cv_proc:** number of processes to use for cross-validation
  * **num_cv_folds:** number of folds to use for cross-validation
  * **cv_scoring:** scoring function to use for cross-validation

3. If using specific hyperparameter values:

  Add hyperparameters in the `[classifier.params]` section, one key per hyperparameter:

  * **hyperparam:** a value to be used for the specific hyperparameter. Repeat for multiple hyperparameters. Please consult the User's Guide below for a given classifier to determine which hyperparameters to use.

In all these cases, hyperparameters must be named in the form:

  **classifier__hyperparam**

If using a composite classifier, e.g. ``ApproximateKernelSGD``, which has several components, you can pass hyperparameters to the various components by name; the component names are specified in the relevant docstrings for each composite classifier. For example, setting the kernel hyperparameters in ``ApproximateKernelSGD`` is done by setting ``kernel__hyperparam = value``.
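
For instance, case 3 with fixed hyperparameter values might look like the sketch below; the particular hyperparameters are illustrative only:

.. code:: bash

    [classifier.params]
    # hypothetical values for a tree-based flavor; consult the classifier's
    # docstring for the hyperparameters it actually accepts
    classifier__n_estimators = 100
    classifier__max_depth = 5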

.. _classifiers-sklearn-classifiers:

Available Classifiers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: idq.classifiers.sklearn.RandomForest

.. autoclass:: idq.classifiers.sklearn.SupportVectorMachine

.. autoclass:: idq.classifiers.sklearn.GradientBoostedTree

.. autoclass:: idq.classifiers.sklearn.NeuralNetwork

.. autoclass:: idq.classifiers.sklearn.NaiveBayes

.. autoclass:: idq.classifiers.sklearn.ApproximateKernelSGD

.. autoclass:: idq.classifiers.sklearn.ApproximateKernelSVM

.. _classifiers-sklearn-inc-classifiers:

Available Incremental Classifiers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: idq.classifiers.sklearn.PassiveAggressive

.. autoclass:: idq.classifiers.sklearn.IncrementalNeuralNetwork

.. autoclass:: idq.classifiers.sklearn.IncrementalApproximateKernelSGD

.. autoclass:: idq.classifiers.sklearn.IncrementalNaiveBayes

.. _classifiers-sklearn-classifiers-custom:

Custom Classifiers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to using one of the available classifiers, you can register any custom classifiers
that adhere to the scikit-learn Estimator API and use them within iDQ. To do this,
you need to create a file, ``$HOME/.config/idq/classifiers.py``, where iDQ knows to look
for custom classifiers. Then you need to subclass
:class:`idq.classifiers.SupervisedSklearnClassifier` and implement
the ``classifier()`` method so that it returns your classifier. Finally, you'll need to register your
classifier so that iDQ knows how to use it when it is specified in your configuration.

As an example, we can write a custom classifier in ``$HOME/.config/idq/classifiers.py`` as follows:

.. code:: python

    import sklearn.naive_bayes

    from idq import hookimpl
    from idq.classifiers.sklearn import SupervisedSklearnClassifier


    class MyClassifier(SupervisedSklearnClassifier):
        _flavor = "custom"

        def classifier(self):
            return [('classifier', sklearn.naive_bayes.GaussianNB())]


    @hookimpl
    def get_classifiers():
        return {
            "sklearn:custom": MyClassifier,
        }

Then to use it within iDQ, you can specify ``flavor = "sklearn:custom"`` in the classifier section
in your configuration file.

The ``classifier()`` method needs to return a list of tuples, where the first element is
the name of the Estimator or Transformer and the second is the Estimator/Transformer
itself. These are used to build up the scikit-learn ``Pipeline``. You can have multiple estimators
or transformers as part of the pipeline, but the last estimator needs to be named "classifier" and
implement either ``predict_proba()`` (recommended) or ``decision_function()`` (acceptable but not ideal).
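
As a further sketch, a pipeline with more than one step could be written as follows; the ``reduce`` step name, the PCA dimensionality, and the flavor string are purely illustrative:

.. code:: python

    import sklearn.decomposition
    import sklearn.naive_bayes

    from idq import hookimpl
    from idq.classifiers.sklearn import SupervisedSklearnClassifier


    class MyPipelineClassifier(SupervisedSklearnClassifier):
        _flavor = "custom_pipeline"

        def classifier(self):
            # multiple steps are allowed, but the final estimator must be named "classifier"
            return [
                ("reduce", sklearn.decomposition.PCA(n_components=10)),
                ("classifier", sklearn.naive_bayes.GaussianNB()),
            ]


    @hookimpl
    def get_classifiers():
        return {
            "sklearn:custom_pipeline": MyPipelineClassifier,
        }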

.. _classifiers-xgboost:

XGBoost Classifiers
----------------------------------------------------------------------------------------------------

XGBoost is an optimized gradient boosting library that implements gradient-boosted models built on
base learners such as decision trees or linear models. An introduction to tree boosting can
be found `here <https://xgboost.readthedocs.io/en/latest/tutorials/model.html>`_.

There is a single implementation provided here, ``XGBTree``, which uses the scikit-learn classifier API
and so is fully compatible with everything already provided by the scikit-learn classifiers above,
including whitening and hyperparameter tuning.

A guide to all the hyperparameters available for this classifier is located `here <https://xgboost.readthedocs.io/en/latest/parameter.html>`_.
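
As a hedged sketch, fixed hyperparameters can be supplied in the same way as for the scikit-learn classifiers; the values below are illustrative only:

.. code:: bash

    [classifier.params]
    # illustrative values; see the XGBoost parameter guide linked above for the full list
    classifier__n_estimators = 200
    classifier__max_depth = 6
    classifier__learning_rate = 0.1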

.. autoclass:: idq.classifiers.xgb.XGBTree

.. _classifiers-keras:

Keras Classifiers
----------------------------------------------------------------------------------------------------

In addition to the traditional machine learning classifiers, we also provide a ``NeuralNetwork`` classifier
which uses a subset of the Keras framework to provide deep learning architectures within iDQ.

The ``NeuralNetwork`` classifier allows you to build dense, locally-connected, or dropout layers. It also allows
the use of regularizers on an individual-layer basis to apply penalties on layer parameters. There are
also options to set aside part of the training set for validation, which gives validation metrics
during each training epoch. Finally, one can balance the training set by applying weights to each training sample
by setting the ``balanced`` parameter to true in the classifier configuration.

For convenience, we expose a few variables to generate layers that scale based on the number of columns, features, or channels.

* **Ncol:** number of columns
* **Nchan:** number of channels
* **Ntotal:** number of features = ``Ncol`` * ``Nchan``

Here's an example of a ``NeuralNetwork`` configuration:

.. code:: bash

    [[classifier]]
    name = "deep"
    flavor = "keras:dnn"
    verbose = true

    window = 0.1
    random_state = 20

    # neural-network specific parameters
    balanced = true
    loss = "binary_crossentropy"
    optimizer = "adam"
    metrics = "accuracy"
    validation_split = 0.2

    epochs = 30
    batch_size = 32

    [[classifier.layer]]
    type = "Local1D"
    activation = "relu"
    filters = 1
    kernel_size = "Ncol"
    strides = "Ncol"

    [[classifier.layer]]
    type = "Dense"
    units = "0.1Nchan"
    activation = "relu"
    regularizer_type = "l1"
    regularizer_value = 0.01

    [[classifier.layer]]
    type = "Dropout"
    rate = 0.1

    [[classifier.layer]]
    type = "Dense"
    activation = "sigmoid"
    units = 1

This creates a neural network with four layers:

1. A locally-connected 1D layer with kernel size and strides set to ``Ncol``, with a relu activation function.
2. A dense layer with ``0.1 * Nchan`` (i.e. Nchan / 10) nodes, a relu activation function, and l1 regularization with a penalty of 0.01.
3. A dropout layer with a dropout rate of 0.1.
4. A single-node layer with sigmoid activation (needed for 2-class classification).

It also includes a ``validation_split`` of 0.2, which sets aside 20% of the training set for validation and reports validation metrics during each training epoch. Since ``balanced`` is set to true, class weights are applied to each sample to balance out the training set.

You can add ``l1``, ``l2``, or ``l1_l2`` regularization with a penalty to either ``Dense`` or ``Local1D`` layers by adding two extra keys (``regularizer_type`` and ``regularizer_value``) to the layer, as seen in the second layer above.

.. _classifiers-keras-kwargs:

Keyword Arguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the Keras classifiers, the following keyword arguments are required:

* one or more ``[classifier.layer]`` sections with layer configurations.

* **flavor:** type of classifier to use
* **window:** window of features to consider surrounding a target channel feature; any channels that don't fall within this window are dropped and the specified default values are used.
* **safe_channels_path:** path to the channel list of safe auxiliary channels to consider in feature data. Only the channels listed here will be used in classifiers.

* **loss:** objective function to use, see `losses guide <https://keras.io/losses/>`_
* **optimizer:** optimizer to use, see `optimizer guide <https://keras.io/optimizers/>`_
* **metrics:** metric(s) to be evaluated by the model during training and validation; can be a single string or a list of values.

In addition, the following optional keyword arguments can be passed in:

* **random_state:** set a random seed for reproducibility
* **validation_split:** set aside a fraction of the training set to be used for validation, gives validation metrics during each training epoch
* **balanced:** boolean, default is ``false``; sets whether class weights are applied to each sample to balance out training sets
* **batch_size:** number of rows to train on at once
* **epochs:** number of epochs to train on

.. _classifiers-keras-classifiers:

Available Classifiers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: idq.classifiers.keras.NeuralNetwork

.. _classifiers-torch:

PyTorch Classifiers
----------------------------------------------------------------------------------------------------

In addition to Keras, we also provide a mechanism for custom PyTorch classifiers, allowing
deep learning architectures within iDQ. While there are no built-in PyTorch classifiers currently
available, a custom classifier registration scheme similar to that of the scikit-learn classifiers is
available here. To do this, one needs to provide a ``torch.nn.Module``, and can optionally provide a PyTorch-compatible optimizer and
criterion. The default optimizer is ``torch.optim.SGD`` and the default criterion is ``torch.nn.NLLLoss``.

As an example, we can write a custom classifier in ``$HOME/.config/idq/classifiers.py`` as follows:

.. code:: python

    import torch
    from torch import nn
    import torch.nn.functional as F

    from idq import hookimpl
    from idq.classifiers.torch import SupervisedTorchClassifier


    class ClassifierModule(nn.Module):
        def __init__(self, num_features=100, dropout=0.5, **kwargs):
            super(ClassifierModule, self).__init__()

            input_dim = num_features
            hidden_dim = num_features // 10
            output_dim = 2  # 2-class classification

            self.dropout = nn.Dropout(dropout)

            self.hidden = nn.Linear(input_dim, hidden_dim)
            self.output = nn.Linear(hidden_dim, output_dim)

        def forward(self, X, **kwargs):
            X = F.relu(self.hidden(X))
            X = self.dropout(X)
            X = F.softmax(self.output(X), dim=-1)
            return X


    class SimpleNet(SupervisedTorchClassifier):
        def module(self):
            return ClassifierModule


    @hookimpl
    def get_classifiers():
        return {
            "pytorch:custom": SimpleNet,
        }


Here, we defined a ``ClassifierModule`` which extends ``torch.nn.Module``. This defines the PyTorch neural network which will be used for training and evaluation. There are a few helper keyword arguments which are passed into the module by default from iDQ: ``num_features``, ``num_columns``, and ``num_channels``, which help define the input dimensionality as well as any specialized layers that may group features by column and/or channel.

To register your custom classifier, you need to create a class that extends ``SupervisedTorchClassifier`` and define a ``module()`` method which returns your module. Optionally, you can provide your own optimizer via ``optimizer()`` and criterion via ``criterion()``. Finally, you'll need to register your class via ``get_classifiers()`` as above.

Finally, to configure the classifier, you can provide configuration similar to the other scikit-learn classifiers within the classifier section. To configure general parameters, pass them via ``classifier__param``, e.g. ``classifier__max_epochs`` for the maximum number of epochs. To configure the module, optimizer, or criterion, add that keyword to your parameter, e.g. ``classifier__module__dropout`` to configure the module's dropout parameter.
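
A hedged configuration sketch, following the same ``[classifier.params]`` convention as the scikit-learn classifiers; parameter values, and any names not mentioned above, are illustrative only:

.. code:: bash

    [classifier.params]
    # general skorch NeuralNet parameters
    classifier__max_epochs = 20
    classifier__lr = 0.01

    # forwarded to the module's __init__ via the module__ prefix
    classifier__module__dropout = 0.3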

For more details on classifier configuration, see `the skorch documentation <https://skorch.readthedocs.io/en/stable/user/neuralnet.html>`_. All PyTorch support is provided through the skorch library, which allows us to wrap PyTorch classifiers using the scikit-learn estimator API, so that pre-processing and cross-validation work the same way as for the scikit-learn classifiers above.