Discretization
This module provides functionality to convert a time series into a symbolic representation. It can be imported as follows:
>>> from patsemb import discretization
The symbolic representation of a time series is a set of symbolic words, each constructed from a fixed-size alphabet. Because symbolic words consist of discrete symbols rather than the continuous values of a time series, they are suitable for mining sequential patterns.
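To make the idea of a symbolic word concrete, the following is a minimal sketch of textbook SAX (Symbolic Aggregate approXimation) applied to a single window, using only the Python standard library. It is not taken from patsemb: the function name `sax_word`, the letter alphabet, and the requirement that the window length be divisible by `word_size` are assumptions made for illustration, and the library's own implementation may differ in normalization and edge-case handling.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev
from string import ascii_lowercase


def sax_word(window, word_size=8, alphabet_size=5):
    """Convert one window of numeric values into a symbolic word (textbook SAX).

    Hypothetical helper for illustration; not part of patsemb.
    """
    # 1. Z-normalize the window so that Gaussian breakpoints are meaningful.
    mean, std = fmean(window), pstdev(window)
    normalized = [(v - mean) / std if std > 0 else 0.0 for v in window]

    # 2. Piecewise Aggregate Approximation: average equal-length segments.
    #    Assumption: len(window) is divisible by word_size.
    segment = len(window) // word_size
    paa = [fmean(normalized[i * segment:(i + 1) * segment])
           for i in range(word_size)]

    # 3. Breakpoints split the standard normal distribution into
    #    alphabet_size equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]

    # 4. Each segment mean maps to the symbol of the region it falls into.
    return "".join(ascii_lowercase[bisect_right(breakpoints, m)] for m in paa)


print(sax_word(list(range(16))))  # aabccdee
```

A steadily increasing window of 16 values becomes the 8-symbol word `aabccdee` over a 5-letter alphabet: the continuous values are gone, and only the coarse shape of the window remains.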
- class patsemb.discretization.SAXDiscretizer(alphabet_size: int = 5, word_size: int = 8, window_size: int = 16, stride: int = 1, discretize_within: str = 'time_series')[source]
- fit(dataset: array | List[array], y=None) SAXDiscretizer[source]
Fit this discretizer for the given (collection of) time series.
- Parameters:
dataset (np.array of shape (n_samples,) or list of np.array of shape (n_samples,)) – The (collection of) time series to use for fitting this discretizer.
y (Ignored) – Not used, present here for API consistency by convention.
- Returns:
self – Returns the instance itself
- Return type:
Discretizer
- transform(time_series: array) ndarray[source]
Discretize the given time series.
- Parameters:
time_series (np.array of shape (n_samples,)) – The time series to discretize.
- Returns:
symbolic_subsequences – The symbolic subsequences as a numpy array, with each row representing a different symbolic subsequence.
- Return type:
np.ndarray of shape (n_symbolic_sequences, length_symbolic_sequences)
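The number of rows in the returned array depends on how many windows are extracted from the time series. As a rough sketch, assuming a plain sliding window that starts at index 0, advances by `stride`, and drops any trailing samples that do not fill a complete window, the windows could be enumerated as follows. The helper `sliding_windows` is hypothetical and not part of patsemb, which may treat the end of the series differently.

```python
def sliding_windows(time_series, window_size=16, stride=1):
    """Return every full window of length window_size, moving by stride.

    Hypothetical helper for illustration; not part of patsemb.
    """
    last_start = len(time_series) - window_size
    return [time_series[start:start + window_size]
            for start in range(0, last_start + 1, stride)]


series = list(range(100))

# With stride 1, a series of 100 samples yields 100 - 16 + 1 = 85 windows.
print(len(sliding_windows(series, window_size=16, stride=1)))  # 85

# A larger stride skips start positions and yields fewer windows.
print(len(sliding_windows(series, window_size=16, stride=4)))  # 22
```

Under these assumptions each window would produce one symbolic subsequence, so a 100-sample series with the default `window_size` and `stride` would give 85 rows.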