Discretization

This module provides the functionality to convert a time series into a symbolic representation. It can be imported as follows:

>>> from patsemb import discretization

The symbolic representation of a time series is a set of symbolic words, constructed using a fixed-size alphabet. Because symbolic words consist of discrete symbols rather than the continuous values of the original time series, they are suitable for mining sequential patterns.
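To make the idea of a symbolic word concrete, the following is a minimal sketch of a SAX-style conversion for a single window, written with plain NumPy. It is not the implementation used by this module: the helper name `sax_word` is hypothetical, the window is z-normalized on its own (the `discretize_within` parameter of `SAXDiscretizer` may select a different scope), and the window length is assumed to be divisible by the word size.

```python
from statistics import NormalDist

import numpy as np


def sax_word(window, word_size=4, alphabet_size=3):
    """Convert one window into a SAX-style word (hypothetical helper, illustrative only)."""
    window = np.asarray(window, dtype=float)

    # 1. Z-normalize so the Gaussian breakpoints below are meaningful.
    std = window.std()
    normalized = (window - window.mean()) / std if std > 0 else np.zeros_like(window)

    # 2. Piecewise aggregate approximation: average equal-length segments.
    #    Assumes len(window) is divisible by word_size.
    segments = normalized.reshape(word_size, -1).mean(axis=1)

    # 3. Breakpoints that split a standard normal into equiprobable bins.
    breakpoints = [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]

    # 4. Map every segment mean to the index of the bin it falls into.
    return np.digitize(segments, breakpoints)


word = sax_word([0, 0, 1, 1, 5, 5, 9, 9], word_size=4, alphabet_size=3)
print(word)  # [0 0 1 2]
```

Each entry of the result is an integer symbol in the range 0 to `alphabet_size - 1`, so the word contains no continuous values and can be fed to a sequential pattern miner.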

class patsemb.discretization.SAXDiscretizer(alphabet_size: int = 5, word_size: int = 8, window_size: int = 16, stride: int = 1, discretize_within: str = 'time_series')

fit(dataset: array | List[array], y=None) → SAXDiscretizer

Fit this discretizer for the given (collection of) time series.

Parameters:
  • dataset (np.array of shape (n_samples,) or list of np.array of shape (n_samples,)) – The (collection of) time series to use for fitting this discretizer.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns:

self – Returns the instance itself

Return type:

SAXDiscretizer

transform(time_series: array) → ndarray

Discretize the given time series.

Parameters:

time_series (np.array of shape (n_samples,)) – The time series to discretize.

Returns:

symbolic_subsequences – The symbolic subsequences as a numpy array, with each row representing a different symbolic subsequence.

Return type:

np.ndarray of shape (n_symbolic_sequences, length_symbolic_sequences)
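The row layout of the returned array can be illustrated with a small NumPy sketch. It assumes that one symbolic subsequence is produced per sliding window of length `window_size`, taken every `stride` observations, and that each subsequence has `word_size` entries. These are assumptions made for illustration; the exact number of rows produced by `transform` may differ, for example in how the end of the time series is handled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

time_series = np.arange(20, dtype=float)
window_size, stride, word_size = 16, 2, 8

# One row per window; keeping every `stride`-th row applies the stride.
windows = sliding_window_view(time_series, window_size)[::stride]

# Reduce each window to `word_size` segment means, a stand-in for the symbols.
reduced = windows.reshape(len(windows), word_size, -1).mean(axis=2)

print(windows.shape)  # (3, 16)
print(reduced.shape)  # (3, 8)
```

Under these assumptions the number of rows equals `(len(time_series) - window_size) // stride + 1`, which is 3 for the values above.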