Pattern-based embedding
The pattern-based embedding module offers the functionality to construct a
pattern-based embedding. It only takes a few lines of code to embed a
time series via the PatternBasedEmbedder.
First, import the module as follows:
>>> from patsemb import pattern_based_embedding
Next, let us initialize a random time series using numpy.
>>> import numpy as np
>>> time_series = np.random.rand(1000)
It now only takes two lines to embed the time series: one line to initialize the pattern-based embedder, and one line to call the fit_transform method!
>>> pattern_based_embedder = pattern_based_embedding.PatternBasedEmbedder()
>>> embedding = pattern_based_embedder.fit_transform(time_series)
- class patsemb.pattern_based_embedding.PatternBasedEmbedder(discretizer: Discretizer = None, pattern_miner: PatternMiner = None, *, window_sizes: List[int] | int = None, relative_support_embedding: bool = True, n_jobs: int | None = None)[source]
Construct pattern-based embeddings for a (collection of) time series. This process consists of two steps:
Mine sequential patterns in symbolic representations of the time series. A symbolic representation is generated for each provided window size, and patterns are mined in each symbolic representation independently. This results in multi-resolution patterns.
Embed the time series values using the mined sequential patterns, indicating at which positions in the time series each pattern occurs. The embedding consists of one row for each mined pattern and one column for each observation in the time series. Therefore, each row corresponds to a feature and each column corresponds to a feature vector for a time series value.
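The layout of the embedding described above (one row per pattern, one column per observation) can be illustrated with a toy sketch in plain numpy. This is not the library's implementation: the helper `toy_pattern_embedding` is hypothetical, it assumes one symbol per observation, and it only matches contiguous occurrences, whereas the actual discretizer and pattern miner may behave differently.

```python
import numpy as np

def toy_pattern_embedding(symbols, patterns):
    """Mark, for each pattern (row), the positions (columns) it covers."""
    symbols = np.asarray(symbols)
    embedding = np.zeros((len(patterns), len(symbols)))
    for row, pattern in enumerate(patterns):
        pattern = np.asarray(pattern)
        # Slide over the symbolic sequence and flag every contiguous match.
        for start in range(len(symbols) - len(pattern) + 1):
            if np.array_equal(symbols[start:start + len(pattern)], pattern):
                embedding[row, start:start + len(pattern)] = 1.0
    return embedding

# Hand-made symbolic sequence and patterns (illustrative values only).
symbols = [0, 1, 2, 0, 1, 0]
patterns = [[0, 1], [1, 2, 0]]
toy_embedding = toy_pattern_embedding(symbols, patterns)
print(toy_embedding.shape)  # (2, 6)
print(toy_embedding)
```

Each column of `toy_embedding` is the feature vector of one position: it lists which patterns cover that position.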
- Parameters:
discretizer (Discretizer, default=SAXDiscretizer()) – The discretizer to convert time series into a symbolic representation of discrete symbols.
pattern_miner (PatternMiner, default=QCSP()) – The pattern miner used to mine sequential patterns in the discrete representation
window_sizes (List[int], default=None) – The window sizes to use for discretizing the time series. If None is provided, then the window size of discretizer will be used.
relative_support_embedding (bool, default=True) – Whether to construct an embedding using the relative support or a binary value indicating if the pattern occurs in a subsequence.
n_jobs (int, default=None) – The number of parallel jobs to use for mining the patterns within the time series and constructing the pattern-based embedding.
- Variables:
fitted_discretizers (Dict[int, Discretizer]) – The fitted discretizers, which can be used for computing a symbolic representation of a time series. The key of each item in the dictionary equals the window size used for discretization, while the value equals the fitted discretizer.
patterns (Dict[int, List[np.array]]) – The mined sequential patterns. The key of each item in the dictionary equals the window size in which the patterns were mined, while the value equals the mined patterns.
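As a rough sketch, a dictionary with the structure documented for patterns can be summarized per window size as shown below. The helper `summarize_patterns` is hypothetical and not part of patsemb, and `example_patterns` is a hand-made stand-in rather than real mined output.

```python
import numpy as np

def summarize_patterns(patterns):
    """Return {window_size: (number of patterns, longest pattern length)}."""
    summary = {}
    for window_size, mined in patterns.items():
        longest = max((len(pattern) for pattern in mined), default=0)
        summary[window_size] = (len(mined), longest)
    return summary

# Hand-made stand-in with the documented structure (not real mined patterns).
example_patterns = {
    8: [np.array([0, 1]), np.array([2, 2, 1])],
    16: [np.array([1, 0, 1, 2])],
}
pattern_summary = summarize_patterns(example_patterns)
print(pattern_summary)  # {8: (2, 3), 16: (1, 4)}
```

After fitting, the same helper could presumably be applied to the patterns attribute of a fitted embedder, assuming it follows the structure described above.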
- fit(dataset: ndarray | List[ndarray], y=None) PatternBasedEmbedder[source]
Fit this pattern-based embedder using a (collection of) time series. This is achieved by mining patterns in the discrete representation of the given time series. If multivariate time series are given, then each time series must have the same dimension!
- Parameters:
dataset (np.ndarray of shape (n_samples, n_attributes) or list of np.ndarray of shape (n_samples, n_attributes)) – The (collection of) time series to use for fitting this pattern-based embedder. If a collection of time series is given, then each time series may have a different length. For univariate time series, the given numpy arrays may be one-dimensional.
y (Ignored) – Is passed for fitting the discretizer, but will typically not be used and is only present here for API consistency by convention.
- Returns:
self – Returns the instance itself
- Return type:
PatternBasedEmbedder
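The input conventions for fit (a single array or a list of arrays, one-dimensional arrays allowed for univariate series, equal number of attributes across series) can be checked up front with a small helper. The function `as_collection` below is a hypothetical sketch and not part of patsemb; the library may perform its own validation.

```python
import numpy as np

def as_collection(dataset):
    """Return a list of 2-D arrays of shape (n_samples, n_attributes)."""
    series_list = dataset if isinstance(dataset, list) else [dataset]
    collection = []
    for series in series_list:
        series = np.asarray(series, dtype=float)
        if series.ndim == 1:
            # A univariate series may be given as a one-dimensional array.
            series = series.reshape(-1, 1)
        if series.ndim != 2:
            raise ValueError("each time series must be 1-D or 2-D")
        collection.append(series)
    # Lengths may differ, but the number of attributes must match.
    if len({series.shape[1] for series in collection}) > 1:
        raise ValueError("all time series must have the same number of attributes")
    return collection

collection = as_collection([np.random.rand(100), np.random.rand(50, 1)])
print([series.shape for series in collection])  # [(100, 1), (50, 1)]
```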
- fit_transform(time_series: ndarray, y=None, *, return_embedding_per_attribute: bool = False) -> (ndarray, Optional[List[ndarray]])[source]
Fit this PatternBasedEmbedder using the given time series (i.e., mine the patterns in the discrete representation of the time series) and immediately transform the time series into a pattern-based embedding.
- Parameters:
time_series (np.ndarray of shape (n_samples, n_attributes)) – The multivariate time series to transform into a pattern-based embedding.
y (Ignored) – Is passed for fitting the discretizer, but will typically not be used and is only present here for API consistency by convention.
return_embedding_per_attribute (bool, default=False) – Whether to return the embedding matrix for each attribute independently.
- Returns:
pattern_based_embedding (np.ndarray of shape (n_patterns, n_samples)) – The pattern-based embedding, which has a column for each observation in the time series and a row for each mined pattern. Each column serves as a feature vector for the corresponding time stamp.
embedding_per_attribute (optional, list of length n_attributes with np.ndarray of shape (n_patterns, n_samples)) – The embedding matrix for each individual attribute. The matrix at position i corresponds to the embedding for attribute i. This value is only returned if return_embedding_per_attribute=True.
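Because the embedding has shape (n_patterns, n_samples), tools that expect one row per sample need the transpose. The sketch below uses a random stand-in matrix instead of a real embedding, so it only illustrates the indexing.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for an embedding of shape (n_patterns, n_samples).
embedding = rng.random((5, 200))

# Feature vector of the observation at time stamp 42: one value per pattern.
feature_vector = embedding[:, 42]

# One row per observation, one column per pattern.
feature_matrix = embedding.T
print(feature_matrix.shape)  # (200, 5)
```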
- transform(time_series: ndarray, *, return_embedding_per_attribute: bool = False) -> (ndarray, Optional[List[ndarray]])[source]
Transform the given time series into a pattern-based embedding.
- Parameters:
time_series (np.ndarray of shape (n_samples, n_attributes)) – The time series to transform into a pattern-based embedding. A univariate time series may be one-dimensional.
return_embedding_per_attribute (bool, default=False) – Whether to return the embedding matrix for each attribute independently.
- Returns:
pattern_based_embedding (np.ndarray of shape (n_patterns, n_samples)) – The pattern-based embedding, which has a column for each observation in the time series and a row for each mined pattern. Each column serves as a feature vector for the corresponding time stamp.
embedding_per_attribute (optional, list of length n_attributes with np.ndarray of shape (n_patterns, n_samples)) – The embedding matrix for each individual attribute. The matrix at position i corresponds to the embedding for attribute i. This value is only returned if return_embedding_per_attribute=True.
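Keeping fit and transform separate makes it possible to mine patterns on one time series and embed another with the same patterns. A minimal sketch follows, assuming only the fit and transform methods documented here; the helper `embed_train_and_test` is hypothetical and works with any object that exposes these two methods.

```python
def embed_train_and_test(embedder, train_series, test_series):
    """Fit on train_series, then embed both series with the same patterns."""
    embedder.fit(train_series)
    train_embedding = embedder.transform(train_series)
    test_embedding = embedder.transform(test_series)
    # Rows are the mined patterns, so both embeddings should have the same
    # number of rows; only the number of columns (observations) differs.
    return train_embedding, test_embedding
```

With a PatternBasedEmbedder instance and a one-dimensional numpy array time_series, a call could look like embed_train_and_test(embedder, time_series[:800], time_series[800:]).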