Data Manipulation

A collection of useful functions for manipulating trajectory data and dynamical basis set objects.

@author: Erik

pyedgar.data_manipulation.delay_embed(traj_data, n_embed, lag=1, verbosity=0)

Performs delay embedding on the trajectory data. Accepts trajectory data in any of several formats and returns the delay-embedded data in the same format.

Parameters:

traj_data (list of arrays OR tuple of two arrays OR single numpy array) – Dynamical data on which to perform the delay embedding. The type of the input determines how it is interpreted: a list of trajectories, the internal flattened format (a tuple of two arrays), or a single trajectory in the form of an array.
n_embed (int) – The number of delay embeddings to perform.
lag (int, optional) – The number of timesteps to look back in time for each delay. Default is 1.
verbosity (int, optional) – The level of status messages that are output. Default is 0 (no messages).

Returns:

embedded_data (list of arrays OR tuple of two arrays OR single numpy array) – Dynamical data with delay embedding performed, of the same type as the trajectory data.
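
To make the idea of delay embedding concrete, here is a minimal NumPy sketch for the single-array case. It is not pyedgar's implementation: the helper name delay_embed_single is hypothetical, and the column ordering (most recent point first) and the choice to drop the first n_embed * lag points are assumptions made for illustration.

```python
import numpy as np


def delay_embed_single(traj, n_embed, lag=1):
    """Delay-embed one trajectory of shape (N, d).

    Hypothetical helper: the column ordering and the handling of the first
    n_embed * lag points are assumptions, not necessarily pyedgar's behavior.
    """
    traj = np.asarray(traj)
    if traj.ndim == 1:
        traj = traj[:, None]  # treat a 1D series as N points with d = 1
    n_points = traj.shape[0]
    n_out = n_points - n_embed * lag
    if n_out <= 0:
        raise ValueError("trajectory too short for this n_embed and lag")
    # Column block k holds the trajectory shifted back by k * lag timesteps.
    blocks = [
        traj[(n_embed - k) * lag:(n_embed - k) * lag + n_out]
        for k in range(n_embed + 1)
    ]
    return np.hstack(blocks)


traj = np.arange(6, dtype=float).reshape(-1, 1)  # x_t = t for t = 0..5
embedded = delay_embed_single(traj, n_embed=2, lag=1)
print(embedded.shape)  # (4, 3)
print(embedded[0])     # [2. 1. 0.]
```

Under these assumptions, each output row stacks the current point with its n_embed predecessors, so a trajectory of shape N x d becomes (N - n_embed * lag) x (d * (n_embed + 1)).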

pyedgar.data_manipulation.flat_to_tlist(traj_2d, traj_edges)

Takes a flattened trajectory with stop and start points and reformats it into a list of separate trajectories.

Parameters:

traj_2d (2D numpy array) – Numpy array containing the flattened trajectory information.
traj_edges (1D numpy array) – Numpy array where each element is the start of a trajectory: the n’th trajectory runs from traj_edges[n] to traj_edges[n+1].

Returns:

trajs (list of array-likes) – List where each element n is an array-like object of shape N_n x d, where N_n is the number of data points in that trajectory and d is the number of coordinates for each datapoint.
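
The edge convention can be illustrated with a short sketch. The helper split_by_edges below is a hypothetical stand-in written for this example, not the library function, and it assumes the end index traj_edges[n+1] is exclusive, as in Python slicing.

```python
import numpy as np


def split_by_edges(traj_2d, traj_edges):
    """Hypothetical stand-in: cut a flat array into per-trajectory pieces.

    Assumes traj_edges[n + 1] is an exclusive end index (Python slicing).
    """
    return [
        traj_2d[start:stop]
        for start, stop in zip(traj_edges[:-1], traj_edges[1:])
    ]


flat = np.arange(10).reshape(5, 2)  # 5 points, 2 coordinates each
edges = np.array([0, 2, 5])         # two trajectories: rows 0-1 and rows 2-4
trajs = split_by_edges(flat, edges)
print([t.shape for t in trajs])     # [(2, 2), (3, 2)]
```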

pyedgar.data_manipulation.get_initial_final_split(traj_edges, lag=1)

Returns the indices, in the flat trajectory, of the initial and final sample points. In this context, initial means the first N-lag points of each trajectory and final means the last N-lag points, where N is the number of points in that trajectory.

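Parameters:

traj_edges (1D numpy array) – Numpy array where each element is the start of a trajectory: the n’th trajectory runs from traj_edges[n] to traj_edges[n+1].
lag (int, optional) – Number of timepoints to look into the future for the transfer operator. Default is 1.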
Returns:

t_0_indices (1D numpy array) – Indices in the flattened trajectory data of all the points at the initial times.
t_lag_indices (1D numpy array) – Indices in the flattened trajectory data of all the points at the final times.
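
The initial/final split described above can be sketched as follows. This is an illustration built from the description rather than the library's code: the function name initial_final_indices is hypothetical, and skipping trajectories with no more than lag points is an assumption.

```python
import numpy as np


def initial_final_indices(traj_edges, lag=1):
    """Hypothetical sketch of the initial/final index split.

    For each trajectory of N points, the initial indices cover its first
    N - lag points and the final indices cover its last N - lag points.
    Trajectories with N <= lag contribute nothing (an assumption).
    """
    t_0, t_lag = [], []
    for start, stop in zip(traj_edges[:-1], traj_edges[1:]):
        start, stop = int(start), int(stop)
        if stop - start > lag:
            t_0.extend(range(start, stop - lag))
            t_lag.extend(range(start + lag, stop))
    return np.array(t_0, dtype=int), np.array(t_lag, dtype=int)


edges = np.array([0, 3, 7])  # trajectories of length 3 and 4
initial_idx, final_idx = initial_final_indices(edges, lag=1)
print(initial_idx)  # [0 1 3 4 5]
print(final_idx)    # [1 2 4 5 6]
```

Pairing the two arrays elementwise gives (x_t, x_{t+lag}) samples that never cross a trajectory boundary.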

pyedgar.data_manipulation.lift_function(function, n_embed, lag=1)

Lift a function into the delay-embedded space.

pyedgar.data_manipulation.tlist_to_flat(trajs)

Flattens a list of two-dimensional trajectories into a single two-dimensional data structure, and returns it along with an array of edges giving the location of each trajectory within it.

Parameters:

trajs (list of array-likes) – List where each element n is an array-like object of shape N_n x d, where N_n is the number of data points in that trajectory and d is the number of coordinates for each datapoint.

Returns:

traj2D (2D numpy array) – Numpy array containing the flattened trajectory information.
traj_edges (1D numpy array) – Numpy array where each element is the start of a trajectory: the n’th trajectory runs from traj_edges[n] to traj_edges[n+1].
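
As a rough illustration of the flattened format, the sketch below builds the two return values with plain NumPy. The helper flatten_trajs is hypothetical and only mirrors the description above; it assumes every trajectory has the same number of coordinates d.

```python
import numpy as np


def flatten_trajs(trajs):
    """Hypothetical sketch: stack trajectories and record their boundaries."""
    lengths = [len(t) for t in trajs]
    # Stack all points into one (sum of N_n) x d array.
    traj_2d = np.concatenate([np.asarray(t) for t in trajs], axis=0)
    # Edges are the cumulative lengths, with a leading 0 for the first start.
    traj_edges = np.concatenate(([0], np.cumsum(lengths)))
    return traj_2d, traj_edges


trajs = [np.zeros((2, 3)), np.ones((4, 3))]
flat, edges = flatten_trajs(trajs)
print(flat.shape)  # (6, 3)
print(edges)       # [0 2 6]
```

With this layout the edge array has one more entry than there are trajectories, so the last entry equals the total number of points.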