Data Manipulation

A collection of useful functions for manipulating trajectory data and dynamical basis set objects.

@author: Erik

pyedgar.data_manipulation.delay_embed(traj_data, n_embed, lag=1, verbosity=0)

Performs delay embedding on the trajectory data. Accepts trajectory data in any of the formats listed below, and returns the delay-embedded data in the same format.

Parameters:
  • traj_data (list of arrays OR tuple of two arrays OR single numpy array) – Dynamical data on which to perform the delay embedding. The type of the input dictates how the data is interpreted: a list of arrays is a list of trajectories, a tuple of two arrays is the internal flattened format, and a single array is a single trajectory.
  • n_embed (int) – The number of delay embeddings to perform.
  • lag (int, optional) – The number of timesteps to look back in time for each delay. Default is 1.
  • verbosity (int, optional) – The level of status messages that are output. Default is 0 (no messages).
Returns:

embedded_data (list of arrays OR tuple of two arrays OR single numpy array) – Dynamical data with delay embedding performed, of the same type as the trajectory data.
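
As an illustration of what delay embedding does to a single trajectory, here is a minimal NumPy sketch. It is not pyedgar's implementation: the helper name delay_embed_single is hypothetical, and both the column ordering (current point first, then progressively older points) and the dropping of the first n_embed * lag points are assumptions made for the example.

```python
import numpy as np

def delay_embed_single(traj, n_embed, lag=1):
    """Hypothetical helper: delay-embed one trajectory of shape (N, d).

    Each output row holds the point at time t followed by the points at
    t - lag, t - 2*lag, ..., t - n_embed*lag (this ordering is an assumption).
    """
    traj = np.asarray(traj)
    if traj.ndim == 1:
        traj = traj[:, None]  # treat a 1D series as N points with d = 1
    offset = n_embed * lag    # earliest time index with a full history
    n_out = len(traj) - offset
    if n_out <= 0:
        raise ValueError("trajectory is too short for this embedding")
    # Block i contains the points delayed by i * lag timesteps.
    blocks = [
        traj[offset - i * lag : offset - i * lag + n_out]
        for i in range(n_embed + 1)
    ]
    return np.hstack(blocks)

traj = np.arange(10.0).reshape(-1, 1)  # one coordinate, 10 timesteps
embedded = delay_embed_single(traj, n_embed=2, lag=3)
```

In this sketch a trajectory of N points with d coordinates becomes N - n_embed * lag points with d * (n_embed + 1) coordinates.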

pyedgar.data_manipulation.flat_to_tlist(traj_2d, traj_edges)

Takes a flattened trajectory with stop and start points and reformats it into a list of separate trajectories.

Parameters:
  • traj_2d (2D numpy array) – Numpy array containing the flattened trajectory information.
  • traj_edges (1D numpy array) – Numpy array where each element is the start of each trajectory: the n’th trajectory runs from traj_edges[n] to traj_edges[n+1]
Returns:

trajs (list of array-likes) – List where each element n is an array-like object of shape N_n x d, where N_n is the number of data points in that trajectory and d is the number of coordinates for each datapoint.
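
The edge convention described above can be sketched in plain NumPy as follows. The function name split_flat is a hypothetical stand-in written for this example, not the library's code; it only assumes that consecutive entries of traj_edges bound each trajectory, as stated in the parameter description.

```python
import numpy as np

def split_flat(traj_2d, traj_edges):
    """Hypothetical stand-in: slice a flat array into per-trajectory arrays."""
    traj_2d = np.asarray(traj_2d)
    # Trajectory n occupies rows traj_edges[n] up to (not including) traj_edges[n+1].
    return [
        traj_2d[start:stop]
        for start, stop in zip(traj_edges[:-1], traj_edges[1:])
    ]

flat = np.arange(12.0).reshape(6, 2)  # 6 data points, 2 coordinates each
edges = np.array([0, 2, 6])           # two trajectories: rows 0-1 and rows 2-5
trajs = split_flat(flat, edges)
```

Note that under this convention traj_edges has one more element than there are trajectories.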

pyedgar.data_manipulation.get_initial_final_split(traj_edges, lag=1)

Returns the indices, in the flat trajectory, of the initial and final sample points. In this context, for a trajectory of N points, initial means the first N-lag points, and final means the last N-lag points.

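Parameters:
  • traj_edges (1D numpy array) – Numpy array where each element is the start of each trajectory: the n’th trajectory runs from traj_edges[n] to traj_edges[n+1]
  • lag (int, optional) – Number of timepoints to look into the future for the transfer operator. Default is 1.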
Returns:
  • t_0_indices (1D numpy array) – Indices in the flattened trajectory data of all the points at the initial times.
  • t_0_indices (1D numpy array) – Indices in the flattened trajectory data of all the points at the final times.
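
The initial/final split can be illustrated with the following NumPy sketch. The name initial_final_indices is hypothetical, and skipping trajectories that are not longer than the lag is an assumption of this example rather than documented behavior.

```python
import numpy as np

def initial_final_indices(traj_edges, lag=1):
    """Hypothetical sketch: flat indices of the initial and final points."""
    initial, final = [], []
    for start, stop in zip(traj_edges[:-1], traj_edges[1:]):
        if stop - start > lag:  # assumption: too-short trajectories are skipped
            initial.append(np.arange(start, stop - lag))  # first N - lag points
            final.append(np.arange(start + lag, stop))    # last N - lag points
    if not initial:
        empty = np.array([], dtype=int)
        return empty, empty.copy()
    return np.concatenate(initial), np.concatenate(final)

edges = np.array([0, 3, 7])  # trajectories of length 3 and 4
initial_idx, final_idx = initial_final_indices(edges, lag=2)
```

Each entry of final_idx is exactly lag steps after the matching entry of initial_idx, and no pair straddles a trajectory boundary.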

pyedgar.data_manipulation.lift_function(function, n_embed, lag=1)

Lift a function into the delay-embedded space.

pyedgar.data_manipulation.tlist_to_flat(trajs)

Flattens a list of two-dimensional trajectories into a single two-dimensional data structure, and returns it along with an array of edges giving the start and stop locations of each trajectory.

Parameters:
  • trajs (list of array-likes) – List where each element n is an array-like object of shape N_n x d, where N_n is the number of data points in that trajectory and d is the number of coordinates for each datapoint.
Returns:
  • traj2D (2D numpy array) – Numpy array containing the flattened trajectory information.
  • traj_edges (1D numpy array) – Numpy array where each element is the start of each trajectory: the n’th trajectory runs from traj_edges[n] to traj_edges[n+1]
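
For illustration, the flattening step might look like the NumPy sketch below. The name flatten_trajs is hypothetical and this is not the library's implementation; it only follows the edge convention given in the return description.

```python
import numpy as np

def flatten_trajs(trajs):
    """Hypothetical sketch: stack trajectories and record their edges."""
    lengths = [len(t) for t in trajs]
    # Edges start at 0 and accumulate trajectory lengths, so trajectory n
    # runs from edges[n] to edges[n+1].
    edges = np.concatenate(([0], np.cumsum(lengths)))
    flat = np.concatenate([np.asarray(t) for t in trajs], axis=0)
    return flat, edges

trajs = [np.zeros((3, 2)), np.ones((5, 2))]  # two trajectories, d = 2
flat, edges = flatten_trajs(trajs)
```

Slicing flat between consecutive edges recovers the original list of trajectories.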