The distance profile is a crucial measure in time series data mining, used extensively for similarity search and nearest-neighbor search tasks. It involves calculating the distance between a query subsequence and every subsequence in a time series. This operation, while conceptually simple, forms the foundation for many advanced analytical tasks, including anomaly detection, motif discovery, and time series segmentation.
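Stated formally, and assuming the z-normalized Euclidean distance used later in this article, the definition can be written as below. The symbols are introduced here only for illustration: $T$ is a time series of length $n$, $Q$ a query of length $m$, and $\mu$, $\sigma$ are the mean and standard deviation of the query or of the $i$-th window of $T$.

```latex
D_i = \sqrt{\sum_{j=1}^{m} \left( \hat{q}_j - \hat{t}_{i,j} \right)^2},
\qquad i = 1, \dots, n - m + 1,
\qquad
\hat{q}_j = \frac{q_j - \mu_Q}{\sigma_Q},
\quad
\hat{t}_{i,j} = \frac{t_{i+j-1} - \mu_{T,i}}{\sigma_{T,i}}
```

The distance profile is the vector $(D_1, \dots, D_{n-m+1})$; its smallest entry marks the nearest neighbor of the query.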
In our work, we use MASS (Mueen's Algorithm for Similarity Search) to compute the distance profile. MASS obtains the profile in O(n log n) time by computing the sliding dot products with the fast Fourier transform, so it scales to large datasets. This efficiency is critical when dealing with real-world time series data, where the volume of data can be substantial.
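The core idea behind MASS can be sketched in a few lines of NumPy. This is a simplified illustration, not the reference implementation: the name `mass_sketch` is hypothetical, and the code assumes that neither the query nor any window of the series is constant (a zero standard deviation would cause a division by zero).

```python
import numpy as np


def mass_sketch(query, ts):
    """Z-normalized Euclidean distance profile via an FFT-based sliding dot product."""
    query = np.asarray(query, dtype=float)
    ts = np.asarray(ts, dtype=float)
    m, n = len(query), len(ts)

    # Sliding dot products between the query and every length-m window,
    # obtained as a linear convolution of the series with the reversed query.
    size = n + m - 1
    conv = np.fft.irfft(np.fft.rfft(ts, size) * np.fft.rfft(query[::-1], size), size)
    dot = conv[m - 1 : n]

    # Mean and standard deviation of every window, from cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(ts)))
    csum2 = np.concatenate(([0.0], np.cumsum(ts**2)))
    mean_t = (csum[m:] - csum[:-m]) / m
    var_t = (csum2[m:] - csum2[:-m]) / m - mean_t**2
    std_t = np.sqrt(np.maximum(var_t, 0.0))

    mean_q, std_q = query.mean(), query.std()

    # Squared distance: 2m * (1 - Pearson correlation between query and window).
    corr = (dot - m * mean_q * mean_t) / (m * std_q * std_t)
    return np.sqrt(np.maximum(2 * m * (1 - corr), 0.0))


ts = np.array([1, 2, 3, 4, 2, 1, 2, 3, 4, 3, 2, 1, 2, 3, 4], dtype=float)
query = np.array([2, 3, 4], dtype=float)
print("Distance profile:", mass_sketch(query, ts))
```

The dot products cost O(n log n) regardless of the query length, which is where the speed-up over comparing every window directly comes from.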
The distance profile's versatility allows it to handle various types of queries, including weighted queries and those involving multidimensional data. By aggregating distance profiles across multiple dimensions, we can find the nearest neighbors in a multidimensional space, further enhancing the algorithm's applicability.
The following code is a simple implementation of the distance profile algorithm on a one-dimensional series.
```python
import numpy as np


def z_normalize(ts):
    """Z-normalize a time series."""
    return (ts - np.mean(ts)) / np.std(ts)


def sliding_window_view(arr, window_size):
    """Generate a sliding window view of the array."""
    return np.lib.stride_tricks.sliding_window_view(arr, window_size)


def distance_profile(query, ts):
    """Compute the distance profile of a query within a time series."""
    query_len = len(query)
    ts_len = len(ts)

    # Z-normalize the query
    query = z_normalize(query)

    # Generate all subsequences of the time series
    subsequences = sliding_window_view(ts, query_len)

    # Z-normalize the subsequences
    subsequences = np.apply_along_axis(z_normalize, 1, subsequences)

    # Compute the distance profile
    distances = np.linalg.norm(subsequences - query, axis=1)
    return distances


# Example time series and query
time_series = np.array([1, 2, 3, 4, 2, 1, 2, 3, 4, 3, 2, 1, 2, 3, 4])
query = np.array([2, 3, 4])

# Compute the distance profile
dist_profile = distance_profile(query, time_series)
print("Distance Profile:", dist_profile)
```
This code is provided for illustration purposes only. For a more efficient implementation, use the `matrixprofile` or `stumpy` Python packages.
This figure illustrates the process of analyzing a time series to identify the occurrence of a specific query subsequence using the distance profile. It comprises three subplots:
In our work, we developed a method to calculate the multi-dimensional matrix profile using Mueen's Algorithm for Similarity Search (MASS). Strictly speaking, the result is a distance profile summed across dimensions: MASS is run on each dimension separately and the per-dimension profiles are added. This method is designed to handle multi-dimensional time series data, which is common in many real-world applications where data is collected across multiple channels or features simultaneously.
```python
import numpy as np
import stumpy


# Method excerpt: the enclosing class is not shown.
def multi_dimensional_mass(self, query_subsequence, time_series) -> np.ndarray:
    """
    Calculate the multi-dimensional matrix profile.

    Args:
        query_subsequence (np.ndarray): The query subsequence, shape (m, n_dims).
        time_series (np.ndarray): The time series, shape (n, n_dims).

    Returns:
        np.ndarray: The multi-dimensional matrix profile.
    """
    # Run MASS on each dimension and sum the per-dimension distance profiles.
    for dim in range(time_series.shape[1]):
        if dim == 0:
            profile = stumpy.core.mass(
                query_subsequence[:, dim], time_series[:, dim]
            )
        else:
            profile += stumpy.core.mass(
                query_subsequence[:, dim], time_series[:, dim]
            )
    return profile
```
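On small inputs, the summed profile can be cross-checked without `stumpy`. The sketch below is a brute-force, NumPy-only version of the same aggregation. The names `naive_profile` and `summed_profile` are hypothetical and not part of our code, and the sketch assumes no window is constant in any dimension.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def naive_profile(query, ts):
    """Brute-force z-normalized Euclidean distance profile for one dimension."""
    windows = sliding_window_view(ts, len(query))
    z_windows = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(
        axis=1, keepdims=True
    )
    z_query = (query - query.mean()) / query.std()
    return np.linalg.norm(z_windows - z_query, axis=1)


def summed_profile(query, ts):
    """Sum the per-dimension profiles; both inputs have shape (length, n_dims)."""
    return sum(naive_profile(query[:, d], ts[:, d]) for d in range(ts.shape[1]))


# Synthetic two-channel series; the query is cut from the series itself.
rng = np.random.default_rng(0)
series = rng.normal(size=(60, 2))
query = series[20:30]
profile = summed_profile(query, series)
print("Best match starts at index", int(np.argmin(profile)))
```

Because the query is taken from the series, the minimum of the summed profile falls at the position it was cut from, which makes this a convenient sanity check.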
We generate predictions from the computed multi-dimensional matrix profiles. Each subsequence of the time series is compared with the query using the z-normalized Euclidean distance, the per-dimension distances are aggregated, and the subsequences are ranked by their aggregated distance. The subsequences with the lowest distances are taken as the best matches, which yields a prediction of whether and where a specific pattern occurs in the time series. Because the distances are aggregated across all dimensions, every channel of the data contributes to the match.
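The ranking step can be sketched as follows. This is an illustrative version under stated assumptions, not our exact code: the function name `top_k_matches` is hypothetical, and the exclusion zone around each selected match is an addition made here so that heavily overlapping windows are not reported as separate matches.

```python
import numpy as np


def top_k_matches(profile, k, exclusion):
    """Return start indices of the k lowest-distance, non-overlapping matches."""
    remaining = np.array(profile, dtype=float)  # copy; the input stays untouched
    matches = []
    for _ in range(k):
        idx = int(np.argmin(remaining))
        if not np.isfinite(remaining[idx]):
            break  # every candidate has been selected or masked
        matches.append(idx)
        # Mask the neighborhood of the chosen match.
        lo = max(0, idx - exclusion)
        hi = min(len(remaining), idx + exclusion + 1)
        remaining[lo:hi] = np.inf
    return matches


profile = np.array([5.0, 0.1, 0.2, 4.0, 3.0, 0.05, 0.3, 6.0])
print(top_k_matches(profile, k=2, exclusion=1))  # prints [5, 1]
```

A common choice is to tie `exclusion` to the query length, for example half of it, but the appropriate value depends on the application.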
The distance profile is essential in time series data mining, facilitating tasks like similarity search, anomaly detection, and motif discovery. It calculates the distance between a query subsequence and all other subsequences within a time series, forming the basis for advanced analytical tasks.
We utilized the MASS (Mueen's Algorithm for Similarity Search) for its efficiency and scalability, crucial for handling large real-world datasets. The process involves:
Z-normalization of the time series and query to manage scale variations.
Euclidean distance computation for each subsequence against the query.
Visualization through plots to graphically depict the comparison across the time series.
Our implementation supports both one-dimensional and multidimensional data. By quantifying the similarity between a query and every subsequence of a time series, it locates where a pattern of interest occurs, which provides the basis for the prediction and analysis tasks described above.