Discretization

This module provides functionality to convert a time series into a symbolic representation. It can be imported as follows:

>>> from patsemb import discretization

The symbolic representation of a time series is a set of symbolic words, constructed using a fixed-size alphabet. Because symbolic words consist of discrete symbols rather than continuous values (as a time series does), they are suitable for mining sequential patterns.
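To make the idea of a symbolic word concrete, the following is a minimal pure-Python sketch of the general SAX technique applied to a single window. It is not the patsemb implementation: the function name sax_word, the use of integer symbols, and the requirement that the window length be divisible by word_size are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(window, word_size=4, alphabet_size=5):
    """Hypothetical helper: turn one window into a list of integer symbols."""
    # 1. Z-normalize the window (a constant window maps to all zeros).
    mu, sigma = mean(window), pstdev(window)
    normalized = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]

    # 2. Piecewise Aggregate Approximation: average equal-length segments.
    #    Assumption: len(window) is divisible by word_size.
    segment = len(window) // word_size
    paa = [mean(normalized[i * segment:(i + 1) * segment])
           for i in range(word_size)]

    # 3. Breakpoints that split the standard normal distribution into
    #    alphabet_size equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]

    # 4. Each segment mean becomes the index of the region it falls in.
    return [bisect_right(breakpoints, value) for value in paa]


word = sax_word([0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3],
                word_size=4, alphabet_size=3)
print(word)  # [0, 0, 2, 2]
```

The low half of the window maps to the lowest symbol and the high half to the highest, so the continuous shape is reduced to a short discrete word that a pattern miner can handle.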

class patsemb.discretization.SAXDiscretizer(alphabet_size: int = 5, word_size: int = 8, window_size: int = 16, stride: int = 1, discretize_within: str = 'time_series')

fit(dataset: array | List[array], y=None) → SAXDiscretizer

Fit this discretizer for the given (collection of) time series.

Parameters:

dataset (np.array of shape (n_samples,) or list of np.array of shape (n_samples,)) – The (collection of) time series to use for fitting this discretizer.
y (Ignored) – Not used, present here for API consistency by convention.

Returns:

self – Returns the instance itself.

Return type:

SAXDiscretizer

transform(time_series: array) → ndarray

Discretize the given time series.

Parameters:

time_series (np.array of shape (n_samples,)) – The time series to discretize.

Returns:

symbolic_subsequences – The symbolic subsequences as a NumPy array, with each row representing a different symbolic subsequence.

Return type:

np.ndarray of shape (n_symbolic_sequences, length_symbolic_sequences)
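As a rough guide to the output shape, the sketch below computes how many rows to expect for a given series length. It rests on two assumptions that the documentation above does not confirm: that only complete sliding windows of window_size samples are kept (one every stride samples), and that length_symbolic_sequences equals word_size. The helper name expected_shape is hypothetical and is not part of patsemb.

```python
def expected_shape(n_samples, window_size=16, stride=1, word_size=8):
    """Hypothetical helper: shape of the symbolic output.

    Assumes incomplete trailing windows are dropped and that each
    window yields one symbolic word of word_size symbols.
    """
    if n_samples < window_size:
        return (0, word_size)
    n_windows = (n_samples - window_size) // stride + 1
    return (n_windows, word_size)


print(expected_shape(100))            # (85, 8)
print(expected_shape(100, stride=4))  # (22, 8)
```

If the library handles the end of the series differently, for example by padding, the row count may differ slightly from this estimate.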