Semantic segmentation

This module offers functionality to compute a semantic segmentation from a pattern-based embedding. It can be imported as follows:

>>> from patsemb import semantic_segmentation

Currently, only a probabilistic semantic segmentor is implemented. This segmentor uses the fit-predict_proba interface, because it predicts segment probabilities instead of segment labels.

class patsemb.semantic_segmentation.LogisticRegressionSegmentor(n_segments: List[int] | int = None, n_jobs: int = 1, **kwargs)[source]

Segments the pattern-based embedding using Logistic Regression [carpentier2024pattern].

First, a KMeans clustering model is fitted on the embedding, which provides a discrete clustering (i.e., every observation in the time series is assigned a discrete cluster label). The number of clusters K is selected with the silhouette method. The discrete clustering gives an initial indication of when the semantic segments occur.

Second, the discrete clustering is fed to a logistic regression model. This model learns to which segment each time point of the pattern-based embedding belongs. Because logistic regression is a probabilistic model, we can retrieve the probability of a given observation belonging to each semantic segment, thereby obtaining a probabilistic segmentation.
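The two steps above can be sketched directly with scikit-learn. This is a minimal illustration under stated assumptions, not the patsemb implementation: the function name sketch_probabilistic_segmentation, the candidate values of K, the random seed, and the toy embedding are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import silhouette_score


def sketch_probabilistic_segmentation(embedding, candidate_k=(2, 3, 4), random_state=0):
    """Illustrative two-step segmentation (hypothetical helper, not part of patsemb)."""
    # Assumption: the embedding has shape (n_patterns, n_samples), so it is
    # transposed to get one row per time point, as scikit-learn expects.
    observations = embedding.T

    # Step 1: fit KMeans for every candidate K and keep the labelling
    # with the highest silhouette score.
    best_score, best_labels = -np.inf, None
    for k in candidate_k:
        k_means = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = k_means.fit_predict(observations)
        score = silhouette_score(observations, labels)
        if score > best_score:
            best_score, best_labels = score, labels

    # Step 2: train logistic regression on the discrete cluster labels and
    # read out per-segment probabilities for every time point.
    model = LogisticRegression(max_iter=1000).fit(observations, best_labels)
    return model.predict_proba(observations)


# Toy embedding (made up for illustration): 3 patterns, 60 time points,
# with two clearly separated regimes of 30 time points each.
rng = np.random.default_rng(0)
embedding = np.hstack([
    rng.normal(0.0, 0.1, size=(3, 30)),
    rng.normal(1.0, 0.1, size=(3, 30)),
])
probabilities = sketch_probabilistic_segmentation(embedding)
```

The result has one row per time point and one column per selected cluster, and each row sums to one.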

Parameters:

n_segments (int or list of int, default=[2, 3, 4, 5, 6, 7, 8, 9]) – The number of segments. If a list of integers is passed, a clustering will be made for each value, and the best clustering is selected using the silhouette score.
n_jobs (int, default=1) – The number of jobs to use for computing the multiple clusterings. Has no effect if n_segments is an integer.
**kwargs –
Additional arguments to be passed to either KMeans clustering or LogisticRegression (both from scikit-learn). This class uses the inspect module to infer automatically which parameters are valid for each model. If a parameter is valid for both models (e.g., max_iter), it is passed to both. If an argument is valid for neither KMeans nor LogisticRegression, a TypeError is raised.

A TypeError will also be raised if n_clusters is passed to this object - even though it is valid for KMeans - because this parameter will be set based on n_segments.
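The routing of keyword arguments described above could look roughly like the sketch below. The helper split_kwargs is hypothetical and only illustrates the idea of inspecting the constructor signatures; the actual class may be implemented differently.

```python
import inspect

from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression


def split_kwargs(**kwargs):
    """Hypothetical helper: route keyword arguments to KMeans and/or LogisticRegression."""
    # n_clusters is reserved, because it is derived from n_segments.
    if "n_clusters" in kwargs:
        raise TypeError("'n_clusters' is set through 'n_segments'")

    # Collect the constructor parameter names of both scikit-learn models.
    k_means_params = set(inspect.signature(KMeans.__init__).parameters) - {"self"}
    log_reg_params = set(inspect.signature(LogisticRegression.__init__).parameters) - {"self"}

    # Reject anything that neither model accepts.
    unknown = set(kwargs) - k_means_params - log_reg_params
    if unknown:
        raise TypeError(f"Unexpected keyword arguments: {sorted(unknown)}")

    # A parameter valid for both models (e.g., max_iter) ends up in both dicts.
    k_means_kwargs = {k: v for k, v in kwargs.items() if k in k_means_params}
    log_reg_kwargs = {k: v for k, v in kwargs.items() if k in log_reg_params}
    return k_means_kwargs, log_reg_kwargs


k_means_kwargs, logistic_regression_kwargs = split_kwargs(max_iter=200, n_init=5, C=0.5)
```

Here max_iter is accepted by both models, n_init only by KMeans, and C only by LogisticRegression.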

Variables:

k_means_kwargs (dict) – The arguments to pass to scikit-learn KMeans.
logistic_regression_kwargs (dict) – The arguments to pass to scikit-learn LogisticRegression.
logistic_regression (LogisticRegression) – The fitted scikit-learn LogisticRegression model.

References

[carpentier2024pattern]

Carpentier, Louis, Feremans, Len, Meert, Wannes, Verbeke, Mathias. “Pattern-based Time Series Semantic Segmentation with Gradual State Transitions.” Proceedings of the 2024 SIAM International Conference on Data Mining (SDM). Society for Industrial and Applied Mathematics, 2024, doi: 10.1137/1.9781611978032.36.

fit(X: ndarray, y=None) → ProbabilisticSemanticSegmentor[source]

Fit this probabilistic semantic segmentor.

Parameters:

X (np.ndarray of shape (n_patterns, n_samples)) – The embedding matrix to use for fitting this probabilistic semantic segmentor.
y (array-like, default=None) – Ground-truth information.

Returns:

self – Returns the instance itself

Return type:

ProbabilisticSemanticSegmentor

predict_proba(X: ndarray) → ndarray[source]

Predict the semantic segment probabilities, based on the given pattern-based embedding.

Parameters:

X (np.ndarray of shape (n_patterns, n_samples)) – The embedding matrix which should be transformed.

Returns:

segment_probabilities – The predicted semantic segment probabilities.

Return type:

np.ndarray of shape (n_samples, n_segments)

class patsemb.semantic_segmentation.ProbabilisticSemanticSegmentor[source]

Learn a probabilistic semantic segmentation over the pattern-based embedding. This makes it possible to capture gradual transitions between semantic segments as intervals in which the probability of one semantic segment increases while the probability of another semantic segment decreases.
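As a small illustration of how a gradual transition shows up in segment probabilities, consider the sketch below. The probability values are made up, and the 0.8 threshold for calling a time point "in transition" is an arbitrary assumption, not something the library prescribes.

```python
import numpy as np

# Hypothetical segment probabilities of shape (n_samples, n_segments):
# the first segment fades out while the second one fades in.
probabilities = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.7, 0.3],
    [0.5, 0.5],
    [0.3, 0.7],
    [0.1, 0.9],
    [0.0, 1.0],
])

# A hard segmentation keeps only the most likely segment per time point.
hard_labels = probabilities.argmax(axis=1)

# Time points where no segment is dominant (assumed threshold: 0.8)
# mark the interval of the gradual transition.
in_transition = probabilities.max(axis=1) < 0.8
```

The hard labels switch abruptly from one segment to the other, whereas the probabilities expose the three middle time points as a transition interval.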

Because segment probabilities are predicted, this class uses the fit-predict_proba interface (including a fit_predict_proba method) to make predictions.
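The fit-predict_proba interface can be pictured with the following sketch. Both classes are hypothetical stand-ins written for illustration (they are not the patsemb classes), and the trivial UniformSegmentor exists only to make the example runnable.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchProbabilisticSegmentor(ABC):
    """Hypothetical stand-in for a fit-predict_proba style base class."""

    @abstractmethod
    def fit(self, X, y=None):
        """Fit on an embedding of shape (n_patterns, n_samples) and return self."""

    @abstractmethod
    def predict_proba(self, X):
        """Return probabilities of shape (n_samples, n_segments)."""

    def fit_predict_proba(self, X, y=None):
        # Convenience method: fit on X, then predict probabilities for the same X.
        return self.fit(X, y).predict_proba(X)


class UniformSegmentor(SketchProbabilisticSegmentor):
    """Toy segmentor that assigns equal probability to every segment."""

    def __init__(self, n_segments=2):
        self.n_segments = n_segments

    def fit(self, X, y=None):
        return self

    def predict_proba(self, X):
        n_samples = X.shape[1]  # columns of the embedding are time points
        return np.full((n_samples, self.n_segments), 1.0 / self.n_segments)


probabilities = UniformSegmentor(n_segments=4).fit_predict_proba(np.zeros((3, 10)))
```

Returning self from fit is what allows the chained call inside fit_predict_proba.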