Installation¶
Install the package using pip:
# Uncomment to install
# !pip install traj-dist-rs
import traj_dist_rs
import numpy as np
print(f"traj_dist_rs version: {traj_dist_rs.__version__}")
traj_dist_rs version: 1.0.0rc2
Basic Concepts¶
The library provides implementations of several trajectory distance algorithms:
- SSPD: Symmetric Segment-Path Distance
- DTW: Dynamic Time Warping
- Hausdorff: Hausdorff Distance
- LCSS: Longest Common Subsequence
- EDR: Edit Distance on Real sequence
- ERP: Edit distance with Real Penalty
- Discret Frechet: Discrete Fréchet Distance
Each algorithm supports both Euclidean (Cartesian) and Spherical (Haversine) distance calculations.
Basic Usage¶
Let's start with basic usage of the traj-dist-rs package:
import traj_dist_rs
import numpy as np
# Define two trajectories as lists of [x, y] coordinates
traj1 = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
traj2 = np.array([[0.1, 0.1], [1.1, 1.1], [2.1, 2.1]])
print("Trajectory 1:", traj1)
print("Trajectory 2:", traj2)
Trajectory 1: [[0. 0.] [1. 1.] [2. 2.]] Trajectory 2: [[0.1 0.1] [1.1 1.1] [2.1 2.1]]
Now we can calculate different trajectory distances:
# Calculate different trajectory distances
sspd_dist = traj_dist_rs.sspd(traj1, traj2, "euclidean")
dtw_dist = traj_dist_rs.dtw(traj1, traj2, "euclidean")
hausdorff_dist = traj_dist_rs.hausdorff(traj1, traj2, "euclidean")
print(f"SSPD distance: {sspd_dist:.6f}")
print(f"DTW distance: {dtw_dist.distance:.6f}")
print(f"Hausdorff distance: {hausdorff_dist:.6f}")
SSPD distance: 0.047140 DTW distance: 0.424264 Hausdorff distance: 0.141421
Using Spherical Distance¶
For geographic coordinates (latitude/longitude), use spherical distance:
import traj_dist_rs
import numpy as np
# Geographic coordinates as [latitude, longitude]
geo_traj1 = np.array([[40.7128, -74.0060], [40.7589, -73.9851], [40.7831, -73.9712]]) # NYC trajectory
geo_traj2 = np.array([[40.7228, -74.0160], [40.7689, -73.9951], [40.7931, -73.9812]]) # Similar NYC trajectory
print("Geographic Trajectory 1 (NYC):", geo_traj1)
print("Geographic Trajectory 2 (Similar to NYC):", geo_traj2)
Geographic Trajectory 1 (NYC): [[ 40.7128 -74.006 ] [ 40.7589 -73.9851] [ 40.7831 -73.9712]] Geographic Trajectory 2 (Similar to NYC): [[ 40.7228 -74.016 ] [ 40.7689 -73.9951] [ 40.7931 -73.9812]]
Calculate distances using spherical distance (Haversine formula):
# Calculate distances using spherical distance (Haversine formula)
sspd_dist = traj_dist_rs.sspd(geo_traj1, geo_traj2, "spherical")
dtw_dist = traj_dist_rs.dtw(geo_traj1, geo_traj2, "spherical")
print(f"SSPD spherical distance: {sspd_dist:.2f} meters")
print(f"DTW spherical distance: {dtw_dist.distance:.2f} meters")
SSPD spherical distance: 1836.07 meters DTW spherical distance: 3464.24 meters
Algorithms with Parameters¶
Some algorithms require additional parameters:
import traj_dist_rs
import numpy as np
traj1 = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
traj2 = np.array([[0.5, 0.5], [1.5, 1.5], [2.5, 2.5], [3.5, 3.5]])
print("Using trajectories with parameters:")
print("Trajectory 1:", traj1)
print("Trajectory 2:", traj2)
Using trajectories with parameters: Trajectory 1: [[0. 0.] [1. 1.] [2. 2.] [3. 3.]] Trajectory 2: [[0.5 0.5] [1.5 1.5] [2.5 2.5] [3.5 3.5]]
# LCSS with epsilon parameter - threshold for point matching
lcss_dist = traj_dist_rs.lcss(traj1, traj2, "euclidean", 1.0)
# EDR with epsilon parameter - threshold for point matching
edr_dist = traj_dist_rs.edr(traj1, traj2, "euclidean", 1.0)
# ERP with gap point - reference point for penalties
erp_dist = traj_dist_rs.erp_compat_traj_dist(traj1, traj2, "euclidean", [0.0, 0.0])
print(f"LCSS distance (eps=1.0): {lcss_dist.distance:.6f}")
print(f"EDR distance (eps=1.0): {edr_dist.distance:.6f}")
print(f"ERP distance (g=[0,0]): {erp_dist.distance:.6f}")
LCSS distance (eps=1.0): 0.000000 EDR distance (eps=1.0): 0.000000 ERP distance (g=[0,0]): 2.828427
Comparing All Available Algorithms¶
Let's compare all available algorithms on the same trajectories:
# Create sample trajectories
traj1 = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
traj2 = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]]
print("Comparing All Algorithms:")
print("Trajectory 1:", traj1)
print("Trajectory 2:", traj2)
Comparing All Algorithms: Trajectory 1: [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]] Trajectory 2: [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]]
# Calculate all available distances
distances = {}
# Standard algorithms without parameters
distances['SSPD'] = traj_dist_rs.sspd(traj1, traj2, "euclidean")
distances['DTW'] = traj_dist_rs.dtw(traj1, traj2, "euclidean")
distances['Hausdorff'] = traj_dist_rs.hausdorff(traj1, traj2, "euclidean")
distances['Discret Frechet'] = traj_dist_rs.discret_frechet(traj1, traj2, "euclidean")
# Algorithms with parameters
distances['LCSS'] = traj_dist_rs.lcss(traj1, traj2, "euclidean", 2.0)
distances['EDR'] = traj_dist_rs.edr(traj1, traj2, "euclidean", 2.0)
distances['ERP (compat)'] = traj_dist_rs.erp_compat_traj_dist(traj1, traj2, "euclidean", [0.0, 0.0])
distances['ERP (standard)'] = traj_dist_rs.erp_standard(traj1, traj2, "euclidean", [0.0, 0.0])
# Print results
for algorithm, distance in distances.items():
print(f"{algorithm}: {distance if isinstance(distance, float) else distance.distance:.6f}")
SSPD: 0.755922 DTW: 3.000000 Hausdorff: 1.000000 Discret Frechet: 1.000000 LCSS: 0.000000 EDR: 0.000000 ERP (compat): 3.000000 ERP (standard): 3.000000
Using DpResult (Distance with Optional Matrix)¶
Some algorithms return a DpResult object that contains both the distance value and optionally the alignment matrix:
import traj_dist_rs
import numpy as np
traj1 = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
traj2 = np.array([[0.1, 0.1], [1.1, 1.1], [2.1, 2.1]])
# DTW with full matrix
result = traj_dist_rs.dtw(traj1, traj2, "euclidean", use_full_matrix=True)
print(f"DTW distance: {result.distance}")
print(f"Matrix shape: {result.matrix.shape if result.matrix is not None else 'None'}")
# DTW without matrix (faster)
result_no_matrix = traj_dist_rs.dtw(traj1, traj2, "euclidean", use_full_matrix=False)
print(f"DTW distance (no matrix): {result_no_matrix.distance}")
DTW distance: 0.4242640687119288 Matrix shape: (16,) DTW distance (no matrix): 0.4242640687119288
Visualizing Trajectories and Distances¶
Let's visualize the trajectories and compare algorithm results:
import matplotlib.pyplot as plt
# Create trajectories for visualization
traj1 = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
traj2 = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 2.0]]
# Calculate distances
sspd_dist = traj_dist_rs.sspd(traj1, traj2, "euclidean")
dtw_dist = traj_dist_rs.dtw(traj1, traj2, "euclidean")
hausdorff_dist = traj_dist_rs.hausdorff(traj1, traj2, "euclidean")
# Plot trajectories
plt.figure(figsize=(10, 6))
traj1_x, traj1_y = zip(*traj1)
traj2_x, traj2_y = zip(*traj2)
plt.plot(traj1_x, traj1_y, 'b-o', label='Trajectory 1', linewidth=2, markersize=8)
plt.plot(traj2_x, traj2_y, 'r-s', label='Trajectory 2', linewidth=2, markersize=8)
plt.title(f'Trajectory Comparison\nSSPD: {sspd_dist:.3f}, DTW: {dtw_dist.distance:.3f}, Hausdorff: {hausdorff_dist:.3f}')
plt.xlabel('X coordinate')
plt.ylabel('Y coordinate')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
Batch Computation with pdist¶
Compute pairwise distances between multiple trajectories:
import traj_dist_rs
import numpy as np
# List of trajectories
trajectories = [
np.array([[0.0, 0.0], [1.0, 1.0]]),
np.array([[0.1, 0.1], [1.1, 1.1]]),
np.array([[0.2, 0.2], [1.2, 1.2]]),
np.array([[0.3, 0.3], [1.3, 1.3]]),
]
print(f"Number of trajectories: {len(trajectories)}")
# Create metric configuration using Metric class
metric = traj_dist_rs.Metric.sspd(type_d="euclidean")
# Compute pairwise distances (compressed format)
distances = traj_dist_rs.pdist(trajectories, metric=metric)
print(f"Number of pairwise distances: {len(distances)}")
print(f"Distances: {distances}")
Number of trajectories: 4 Number of pairwise distances: 6 Distances: [0.07071068 0.14142136 0.21213203 0.07071068 0.14142136 0.07071068]
Batch Computation with cdist¶
Compute cross-distance matrix between two trajectory collections:
import traj_dist_rs
import numpy as np
# Two collections of trajectories
trajectories_a = [
np.array([[0.0, 0.0], [1.0, 1.0]]),
np.array([[0.1, 0.1], [1.1, 1.1]]),
]
trajectories_b = [
np.array([[0.2, 0.2], [1.2, 1.2]]),
np.array([[0.3, 0.3], [1.3, 1.3]]),
np.array([[0.4, 0.4], [1.4, 1.4]]),
]
print(f"Collection A: {len(trajectories_a)} trajectories")
print(f"Collection B: {len(trajectories_b)} trajectories")
# Create metric configuration
metric = traj_dist_rs.Metric.dtw(type_d="euclidean")
# Compute cross-distance matrix
distance_matrix = traj_dist_rs.cdist(trajectories_a, trajectories_b, metric=metric)
print(f"Distance matrix shape: {distance_matrix.shape}")
print(f"Distance matrix:\n{distance_matrix}")
Collection A: 2 trajectories Collection B: 3 trajectories Distance matrix shape: (2, 3) Distance matrix: [[0.56568542 0.84852814 1.13137085] [0.28284271 0.56568542 0.84852814]]
Parallel Processing with pdist¶
Use parallel processing for faster batch computation:
import traj_dist_rs
import numpy as np
import time
# Create a larger dataset
trajectories = [np.random.rand(100, 2) for _ in range(50)]
print(f"Number of trajectories: {len(trajectories)}")
# Create metric configuration
metric = traj_dist_rs.Metric.sspd(type_d="euclidean")
# Sequential computation
start = time.time()
distances_seq = traj_dist_rs.pdist(trajectories, metric=metric, parallel=False)
seq_time = time.time() - start
# Parallel computation
start = time.time()
distances_par = traj_dist_rs.pdist(trajectories, metric=metric, parallel=True)
par_time = time.time() - start
print(f"Sequential time: {seq_time:.3f}s")
print(f"Parallel time: {par_time:.3f}s")
print(f"Speedup: {seq_time/par_time:.2f}x")
print(f"Results match: {distances_seq.shape == distances_par.shape}")
Number of trajectories: 50
Sequential time: 0.465s Parallel time: 0.161s Speedup: 2.88x Results match: True
Parallel Processing with cdist¶
Use parallel processing with cdist for cross-distance matrix computation:
import traj_dist_rs
import numpy as np
import time
# Create two larger trajectory collections
trajectories_a = [np.random.rand(100, 2) for _ in range(30)]
trajectories_b = [np.random.rand(100, 2) for _ in range(30)]
print(f"Collection A: {len(trajectories_a)} trajectories")
print(f"Collection B: {len(trajectories_b)} trajectories")
# Create metric configuration
metric = traj_dist_rs.Metric.dtw(type_d="euclidean")
# Sequential computation
start = time.time()
matrix_seq = traj_dist_rs.cdist(trajectories_a, trajectories_b, metric=metric, parallel=False)
seq_time = time.time() - start
# Parallel computation
start = time.time()
matrix_par = traj_dist_rs.cdist(trajectories_a, trajectories_b, metric=metric, parallel=True)
par_time = time.time() - start
print(f"Sequential time: {seq_time:.3f}s")
print(f"Parallel time: {par_time:.3f}s")
print(f"Speedup: {seq_time/par_time:.2f}x")
print(f"Results match: {np.allclose(matrix_seq, matrix_par)}")
print(f"Matrix shape: {matrix_par.shape}")
Collection A: 30 trajectories Collection B: 30 trajectories
Sequential time: 0.046s Parallel time: 0.025s Speedup: 1.87x Results match: True Matrix shape: (30, 30)
Using Metric API with Different Algorithms¶
The Metric class provides type-safe configuration for all algorithms:
import traj_dist_rs
import numpy as np
traj1 = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
traj2 = np.array([[0.1, 0.1], [1.1, 1.1], [2.1, 2.1]])
# Create metrics for different algorithms
metric_sspd = traj_dist_rs.Metric.sspd(type_d="euclidean")
metric_dtw = traj_dist_rs.Metric.dtw(type_d="euclidean")
metric_hausdorff = traj_dist_rs.Metric.hausdorff(type_d="euclidean")
metric_discret_frechet = traj_dist_rs.Metric.discret_frechet(type_d="euclidean")
# Algorithms with parameters
metric_lcss = traj_dist_rs.Metric.lcss(eps=0.5, type_d="euclidean")
metric_edr = traj_dist_rs.Metric.edr(eps=0.5, type_d="euclidean")
metric_erp = traj_dist_rs.Metric.erp(g=[0.0, 0.0], type_d="euclidean")
# Use metrics with batch computation
trajectories = [traj1, traj2]
distances_sspd = traj_dist_rs.pdist(trajectories, metric=metric_sspd)
distances_dtw = traj_dist_rs.pdist(trajectories, metric=metric_dtw)
distances_lcss = traj_dist_rs.pdist(trajectories, metric=metric_lcss)
print(f"SSPD distance: {distances_sspd[0]:.4f}")
print(f"DTW distance: {distances_dtw[0]:.4f}")
print(f"LCSS distance: {distances_lcss[0]:.4f}")
SSPD distance: 0.0471 DTW distance: 0.4243 LCSS distance: 0.0000
Parallel Processing with joblib¶
Use joblib for parallel computation with custom functions:
import traj_dist_rs
import numpy as np
from joblib import Parallel, delayed
def compute_dtw_distance(traj1, traj2):
"""Compute DTW distance between two trajectories."""
result = traj_dist_rs.dtw(traj1, traj2, "euclidean", use_full_matrix=False)
return result.distance
# Create a set of trajectories
trajectories = [np.random.rand(50, 2) for _ in range(20)]
print(f"Number of trajectories: {len(trajectories)}")
# Compute pairwise distances in parallel
results = Parallel(n_jobs=-1)(
delayed(compute_dtw_distance)(t1, t2)
for i, t1 in enumerate(trajectories)
for j, t2 in enumerate(trajectories)
if i < j # Only compute unique pairs
)
print(f"Computed {len(results)} pairwise distances")
print(f"Sample distances: {results[:5]}")
Number of trajectories: 20
Computed 190 pairwise distances Sample distances: [18.097256649785862, 17.731029504036275, 18.023948804507484, 18.864287331837183, 17.88370430258936]
Performance Comparison¶
Let's compare the performance of different algorithms:
import time
# Create longer trajectories for performance testing
def create_random_trajectory(length, max_coord=10.0):
return [[np.random.uniform(0, max_coord), np.random.uniform(0, max_coord)]
for _ in range(length)]
# Create test trajectories
traj_short = create_random_trajectory(10)
traj_medium = create_random_trajectory(50)
print("Performance Comparison (short trajectory - 10 points):")
algorithms = [
('SSPD', lambda t1, t2: traj_dist_rs.sspd(t1, t2, "euclidean")),
('DTW', lambda t1, t2: traj_dist_rs.dtw(t1, t2, "euclidean")),
('Hausdorff', lambda t1, t2: traj_dist_rs.hausdorff(t1, t2, "euclidean")),
('Discret Frechet', lambda t1, t2: traj_dist_rs.discret_frechet(t1, t2, "euclidean")),
]
for name, func in algorithms:
start_time = time.time()
result = func(traj_short, traj_short)
end_time = time.time()
print(f"{name}: {end_time - start_time:.6f}s, distance: {result if isinstance(result, float) else result.distance:.6f}")
Performance Comparison (short trajectory - 10 points): SSPD: 0.000016s, distance: 0.000000 DTW: 0.000005s, distance: 0.000000 Hausdorff: 0.000012s, distance: 0.000000 Discret Frechet: 0.000004s, distance: 0.000000
Real-World Data Usage¶
Here's an example of how to use the library with real-world trajectory data:
# Example with GPS-like data
def create_gps_trajectory(start_lat, start_lon, num_points, lat_variation=0.01, lon_variation=0.01):
"""Create a GPS-like trajectory with some variation"""
trajectory = []
for i in range(num_points):
lat = start_lat + i * 0.001 + np.random.uniform(-lat_variation, lat_variation)
lon = start_lon + i * 0.001 + np.random.uniform(-lon_variation, lon_variation)
trajectory.append([lat, lon])
return trajectory
# Create two similar GPS trajectories
gps_traj1 = create_gps_trajectory(40.7128, -74.0060, 20) # NYC-like
gps_traj2 = create_gps_trajectory(40.7130, -74.0062, 20) # Similar, slightly different start
print(f"GPS Trajectory 1 (first 3 points): {gps_traj1[:3]}")
print(f"GPS Trajectory 2 (first 3 points): {gps_traj2[:3]}")
GPS Trajectory 1 (first 3 points): [[40.70380676115488, -73.99982349801198], [40.720821478494834, -74.00855980240388], [40.71709783650612, -74.01090520083754]] GPS Trajectory 2 (first 3 points): [[40.71425788255446, -74.01504438152837], [40.71500303822094, -74.00332512222462], [40.711561002063874, -73.99646818689277]]
# Calculate distances using spherical distance for GPS coordinates
sspd_dist = traj_dist_rs.sspd(gps_traj1, gps_traj2, "spherical")
dtw_dist = traj_dist_rs.dtw(gps_traj1, gps_traj2, "spherical")
hausdorff_dist = traj_dist_rs.hausdorff(gps_traj1, gps_traj2, "spherical")
print(f"SSPD spherical distance: {sspd_dist:.2f} meters")
print(f"DTW spherical distance: {dtw_dist.distance:.2f} meters")
print(f"Hausdorff spherical distance: {hausdorff_dist:.2f} meters")
SSPD spherical distance: 253.36 meters DTW spherical distance: 12871.86 meters Hausdorff spherical distance: 468.93 meters
Error Handling¶
The library provides appropriate error handling:
# Example of error handling
try:
# This will raise an error - invalid distance type
invalid_dist = traj_dist_rs.sspd([[0, 0], [1, 1]], [[1, 1], [2, 2]], "invalid_type")
except ValueError as e:
print(f"Caught expected error: {e}")
try:
# This will raise an error - invalid trajectory format
invalid_dist = traj_dist_rs.sspd([[0, 0, 0], [1, 1]], [[0, 0], [1, 1]], "euclidean")
except Exception as e:
print(f"Caught expected error: {e}")
Caught expected error: Invalid distance type 'invalid_type'. Expected 'euclidean' or 'spherical' Caught expected error: DataConvertionError: Failed to extract List[List[float]] into Vec<[f64; 2]>. Check list structure and element types.
Summary¶
This guide covered:
- Basic installation and import
- Different trajectory distance algorithms
- Euclidean vs. Spherical distance calculations
- Algorithms with parameters (LCSS, EDR, ERP)
- Comparing all available algorithms
- Visualizing trajectories and distances
- Batch computation with pdist and cdist
- Parallel processing with Rayon and joblib
- Performance considerations
- Real-world GPS data usage
- Error handling
The traj-dist-rs library provides high-performance trajectory distance calculations with both Python and Rust APIs, making it suitable for a wide range of trajectory analysis applications.
For more information on each algorithm, refer to the API documentation.