Introduction

API

The Geometric SMOTE over-sampler follows the imbalanced-learn API and builds on its base over-sampler functionality. More specifically:

It implements a fit method to learn from data:

oversampler = object.fit(data, targets)

It implements a fit_resample method to resample data sets:

data_resampled, targets_resampled = object.fit_resample(data, targets)

The Geometric SMOTE over-sampler accepts the following inputs:

  • data: array-like (2-D list, pandas.DataFrame, numpy.ndarray) or sparse matrix;
  • targets: array-like (1-D list, pandas.Series, numpy.ndarray).
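The fit / fit_resample contract described above can be illustrated with a small stand-in. The sketch below is only an assumption-laden toy: ToyOverSampler and its n_to_generate_ attribute are hypothetical names, it works on plain Python lists, and it balances classes by duplicating minority rows rather than by the Geometric SMOTE mechanism. It shows the calling pattern only, not the library's implementation.

```python
import random
from collections import Counter


class ToyOverSampler:
    """Hypothetical stand-in that mimics the fit / fit_resample contract.

    It duplicates random minority rows; it is NOT Geometric SMOTE.
    """

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, data, targets):
        # Learn how many extra samples each class needs to match the majority.
        counts = Counter(targets)
        n_max = max(counts.values())
        self.n_to_generate_ = {
            label: n_max - n for label, n in counts.items() if n < n_max
        }
        return self  # returning the fitted object allows chaining

    def fit_resample(self, data, targets):
        self.fit(data, targets)
        rng = random.Random(self.random_state)
        # Start from copies of the original samples, then append new ones.
        data_resampled = [list(row) for row in data]
        targets_resampled = list(targets)
        for label, n_extra in self.n_to_generate_.items():
            rows = [row for row, t in zip(data, targets) if t == label]
            for _ in range(n_extra):
                data_resampled.append(list(rng.choice(rows)))
                targets_resampled.append(label)
        return data_resampled, targets_resampled


# data: 2-D list of features; targets: 1-D list of class labels.
data = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7], [5.0, 5.1]]
targets = [0, 0, 0, 0, 1]

oversampler = ToyOverSampler(random_state=0)
data_resampled, targets_resampled = oversampler.fit_resample(data, targets)
print(Counter(targets_resampled))
```

With the real over-sampler the calls have the same shape: construct the object, then call fit_resample(data, targets) to obtain the resampled pair.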

Imbalanced learning problem

Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem, generating artificial data for the minority class is a more general approach than algorithmic modifications, since it can be combined with any classifier. For a visual representation, the reader is referred to the imbalanced-learn documentation.

Data generation mechanism

The SMOTE algorithm, like any other over-sampling method based on the SMOTE mechanism, generates synthetic samples along line segments that join minority class instances. Geometric SMOTE (G-SMOTE) is an enhancement of the SMOTE data generation mechanism. G-SMOTE generates synthetic samples in a geometric region of the input space around each selected minority instance. In the basic configuration this region is a hyper-sphere, but G-SMOTE allows it to be deformed into a hyper-spheroid.
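The difference between the two mechanisms can be sketched in a few lines. The following is a minimal illustration, not the library's implementation: the function names are hypothetical, only the basic hyper-sphere configuration is shown (no deformation to a hyper-spheroid), and the radius is assumed to be the distance from the selected minority instance to one of its minority neighbors.

```python
import math
import random


def smote_sample(center, neighbor, rng):
    """Return a point on the line segment joining two minority instances."""
    u = rng.random()  # position along the segment, in [0, 1)
    return [c + u * (n - c) for c, n in zip(center, neighbor)]


def hypersphere_sample(center, neighbor, rng):
    """Return a point drawn uniformly inside a hyper-sphere around center.

    Assumption: the radius equals the distance from center to neighbor.
    """
    dim = len(center)
    radius = math.dist(center, neighbor)
    # Random direction: a vector of independent Gaussians, normalized below.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    # Scaling the radius by u ** (1 / dim) spreads points uniformly in volume.
    scale = radius * rng.random() ** (1.0 / dim) / norm
    return [c + scale * d for c, d in zip(center, direction)]


rng = random.Random(0)
center = [1.0, 2.0, 3.0]    # selected minority instance
neighbor = [2.0, 2.5, 1.0]  # one of its minority class neighbors

on_segment = smote_sample(center, neighbor, rng)
in_sphere = hypersphere_sample(center, neighbor, rng)
print(on_segment)
print(in_sphere)
```

The SMOTE point is confined to a one-dimensional segment, while the hyper-sphere point can fall anywhere within the given radius of the selected instance, which is what lets G-SMOTE cover a larger part of the input space.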