HeavyEdge-Distance documentation

A heavyedge plugin for computing shape distances between edge profiles.

Usage

HeavyEdge-Distance can be used either as a command-line program or as a Python module.

To compute a shape distance matrix, first convert your profile data to pre-shapes, then compute the distance matrix from them. For example, the following commands convert profile data to area-scaled pre-shapes and compute the Wasserstein distance matrix.

heavyedge scale --type=area <profile> -o <scaled_profile>  # command from HeavyEdge package
heavyedge dist-wasserstein --grid-num=100 <scaled_profile> -o <distance-matrix>

Command line

Commands are provided as plugins and can be invoked with:

heavyedge <command>

Refer to the help message of heavyedge for the list of commands and their arguments.

Python module

The Python module heavyedge_distance provides functions for computing distance matrices at Python runtime. Refer to the Runtime API section for the high-level interface.

Module reference

This section provides the reference for the heavyedge_distance Python module.

Runtime API

High-level Python runtime interface.

heavyedge_distance.api.distmat_euclidean(f1, f2=None, batch_size=None, logger=<function <lambda>>)

L2 distance matrix between profiles.

Parameters:
f1 : heavyedge.ProfileData

Open h5 file.

f2 : heavyedge.ProfileData, optional

Open h5 file. If not passed, it is set to f1.

batch_size : int, optional

Batch size to load data. If not passed, all data are loaded at once.

logger : callable, optional

Logger function which accepts a progress message string.

Returns:
(N1, N2) array

Euclidean distance matrix.

Notes

distmat_euclidean(f1) is faster than distmat_euclidean(f1, f1).

Examples

>>> from heavyedge import ProfileData
>>> from heavyedge_distance import get_sample_path
>>> from heavyedge_distance.api import distmat_euclidean
>>> with ProfileData(get_sample_path("MeanProfiles-AreaScaled.h5")) as data:
...     D = distmat_euclidean(data)
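
The note above says that the single-argument call is faster, and one plausible reason is symmetry. The following is a minimal pure-Python sketch of that idea, not the library's implementation. `l2_distmat` is a hypothetical helper; it assumes every profile is a list of heights sampled on the same grid and uses the plain Euclidean norm of the height vectors. Whether heavyedge_distance also weights by grid spacing is not stated in this reference.

```python
import math


def l2_distmat(Y1, Y2=None):
    """Pairwise Euclidean distances between rows of Y1 and rows of Y2.

    Hypothetical helper for illustration only. Rows are equally long
    sequences of heights sampled on a shared grid (an assumption).
    """
    symmetric = Y2 is None
    if symmetric:
        Y2 = Y1
    D = [[0.0] * len(Y2) for _ in Y1]
    for i, a in enumerate(Y1):
        # In the symmetric case only the upper triangle is computed,
        # and each value is mirrored into the lower triangle.
        start = i + 1 if symmetric else 0
        for j in range(start, len(Y2)):
            d = math.dist(a, Y2[j])
            D[i][j] = d
            if symmetric:
                D[j][i] = d
    return D


profiles = [[0.0, 1.0, 2.0], [0.0, 1.0, 0.0], [3.0, 1.0, 2.0]]
D_self = l2_distmat(profiles)            # roughly half the pairwise work
D_pair = l2_distmat(profiles, profiles)  # every pair computed explicitly
```

Both calls return the same matrix; only the amount of work differs.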

heavyedge_distance.api.distmat_wasserstein(t, f1, f2=None, batch_size=None, logger=<function <lambda>>)

Wasserstein distance matrix between area-scaled profiles.

Warning

This function assumes that the profiles in f1 and f2 are area-scaled and heights outside the support are zero.

Parameters:
t : (M,) ndarray

Grid coordinates at which the quantile functions are evaluated. Must be strictly increasing from 0 to 1.

f1 : heavyedge.ProfileData

Open h5 file of area-scaled profiles.

f2 : heavyedge.ProfileData, optional

Open h5 file of area-scaled profiles. If not passed, it is set to f1.

batch_size : int, optional

Batch size to load data. If not passed, all data are loaded at once.

logger : callable, optional

Logger function which accepts a progress message string.

Returns:
(N1, N2) array

Wasserstein distance matrix.

Notes

distmat_wasserstein(f1) is faster than distmat_wasserstein(f1, f1).

Examples

>>> import numpy as np
>>> from heavyedge import ProfileData
>>> from heavyedge_distance import get_sample_path
>>> from heavyedge_distance.api import distmat_wasserstein
>>> with ProfileData(get_sample_path("MeanProfiles-AreaScaled.h5")) as data:
...     D = distmat_wasserstein(np.linspace(0, 1, 100), data)
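
To make the role of the grid t concrete, here is a hedged sketch of how a quantile function could be sampled from an area-scaled profile treated as a probability density. `quantile_from_density` is a hypothetical helper written for this illustration; the actual routine in heavyedge may use a different integration or interpolation scheme. It assumes the grid x is strictly increasing and the profile has positive total area.

```python
def quantile_from_density(x, f, t):
    """Approximate the quantile function Q(t) of a density f sampled on x.

    Illustration only (hypothetical helper, not the library routine).
    """
    # The cumulative trapezoidal integral gives the CDF at each grid point.
    cdf = [0.0]
    for k in range(1, len(x)):
        cdf.append(cdf[-1] + 0.5 * (f[k] + f[k - 1]) * (x[k] - x[k - 1]))
    total = cdf[-1]
    cdf = [c / total for c in cdf]  # absorb small deviations from unit area

    # Invert the CDF by linear interpolation on each grid interval.
    Q = []
    k = 0
    for p in t:  # t is increasing, so the interval index only moves forward
        while k < len(x) - 2 and cdf[k + 1] < p:
            k += 1
        rise = cdf[k + 1] - cdf[k]
        w = 0.0 if rise == 0.0 else (p - cdf[k]) / rise
        Q.append(x[k] + w * (x[k + 1] - x[k]))
    return Q


# Uniform density on [0, 2]: the exact quantile function is Q(t) = 2 t.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
f = [0.5] * 5
t = [0.0, 0.25, 0.5, 1.0]
Q = quantile_from_density(x, f, t)
```

The warning above matters for this kind of computation: if heights outside the support are not zero, the cumulative integral picks up spurious mass and the quantiles shift.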

heavyedge_distance.api.distmat_frechet(f1, f2=None, batch_size=None, n_jobs=None, logger=<function <lambda>>)

1-D discrete Fréchet distance matrix between profiles.

Parameters:
f1 : heavyedge.ProfileData

Open h5 file.

f2 : heavyedge.ProfileData, optional

Open h5 file. If not passed, it is set to f1.

batch_size : int, optional

Batch size to load data. If not passed, all data are loaded at once.

n_jobs : int, optional

Number of parallel workers. If not passed, the HEAVYEDGE_MAX_WORKERS environment variable is used. If the environment variable is invalid, n_jobs is set to 1.

logger : callable, optional

Logger function which accepts a progress message string.

Returns:
(N1, N2) array

Discrete Fréchet distance matrix.

Notes

distmat_frechet(f1) is faster than distmat_frechet(f1, f1).
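
The fallback rule for n_jobs described in the parameters above can be sketched as follows. `resolve_n_jobs` is a hypothetical helper, not part of the package, and this reference does not define what counts as an invalid value. The sketch assumes that an unset, non-integer, or non-positive value is invalid.

```python
import os


def resolve_n_jobs(n_jobs=None, environ=None):
    """Pick the worker count: explicit argument, then environment, then 1.

    Hypothetical helper; the validity rules below are assumptions.
    """
    environ = os.environ if environ is None else environ
    if n_jobs is not None:
        return n_jobs
    try:
        value = int(environ["HEAVYEDGE_MAX_WORKERS"])
    except (KeyError, ValueError):
        return 1  # unset or not an integer
    return value if value >= 1 else 1  # non-positive counts fall back to 1


from_argument = resolve_n_jobs(4, {})
from_environment = resolve_n_jobs(None, {"HEAVYEDGE_MAX_WORKERS": "8"})
fallback = resolve_n_jobs(None, {"HEAVYEDGE_MAX_WORKERS": "many"})
```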

Examples

>>> from heavyedge import ProfileData
>>> from heavyedge_distance import get_sample_path
>>> from heavyedge_distance.api import distmat_frechet
>>> with ProfileData(get_sample_path("MeanProfiles-PlateauScaled.h5")) as data:
...     D = distmat_frechet(data)
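
For readers unfamiliar with the metric, the following is a minimal sketch of the standard dynamic-programming recurrence for the discrete Fréchet distance between two sequences. It assumes each profile is a sequence of scalar heights and that the distance between two points is their absolute difference. `discrete_frechet_1d` is a hypothetical name, and the library's own implementation may be organized differently.

```python
def discrete_frechet_1d(p, q):
    """Discrete Fréchet distance between two scalar sequences.

    Illustration only (hypothetical helper). ca[i][j] is the smallest
    achievable maximum pointwise gap over all monotone couplings of
    p[:i + 1] and q[:j + 1].
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance along p, along q, or along both.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]


same = discrete_frechet_1d([0, 1, 2], [0, 1, 2])
bump = discrete_frechet_1d([0, 0, 0], [0, 3, 0])
coarse = discrete_frechet_1d([0, 1, 2, 3], [0, 3])
```

Each pair costs O(len(p) * len(q)) operations, which may be why this function alone exposes n_jobs for parallel workers.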

Low-level API

Wasserstein distance

Wasserstein-related functions.

heavyedge_distance.wasserstein.wdist(t, Qs1, Qs2)

Wasserstein distance matrix of 1D probability distributions.

\[d_W(f_1, f_2)^2 = \int^1_0 (Q_1(t) - Q_2(t))^2 dt\]

where \(Q_i\) is the quantile function of \(f_i\).

Parameters:
t : (M,) ndarray

Points at which Qs1 and Qs2 are sampled. Must be strictly increasing from 0 to 1.

Qs1 : (N1, M) ndarray

Quantile functions of the first set of probability distributions.

Qs2 : (N2, M) ndarray or None

Quantile functions of the second set of probability distributions. If None is passed, it is set to Qs1.

Returns:
(N1, N2) array

Wasserstein distance matrix.

Examples

>>> import numpy as np
>>> from heavyedge import ProfileData
>>> from heavyedge.wasserstein import quantile
>>> from heavyedge_distance import get_sample_path
>>> from heavyedge_distance.wasserstein import wdist
>>> with ProfileData(get_sample_path("MeanProfiles-AreaScaled.h5")) as data:
...     x = data.x()
...     fs, Ls, _ = data[:]
>>> t = np.linspace(0, 1, 100)
>>> Qs = quantile(x, fs, Ls, t)
>>> D1 = wdist(t, Qs, None)
>>> D2 = wdist(t, Qs, Qs)
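
The integral in the definition of wdist can be approximated numerically from sampled quantile functions. The sketch below uses the trapezoidal rule in pure Python. `wdist_sketch` is a hypothetical stand-in written for this illustration, and the quadrature rule is an assumption, since this reference does not say which rule wdist uses.

```python
import math


def wdist_sketch(t, Qs1, Qs2=None):
    """Approximate Wasserstein distances between sampled quantile functions.

    Illustration only: d_W^2 is the integral of (Q1 - Q2)^2 over [0, 1],
    evaluated here with the trapezoidal rule on the grid t.
    """
    if Qs2 is None:
        Qs2 = Qs1
    D = []
    for Q1 in Qs1:
        row = []
        for Q2 in Qs2:
            sq = [(a - b) ** 2 for a, b in zip(Q1, Q2)]
            integral = sum(
                0.5 * (sq[k] + sq[k - 1]) * (t[k] - t[k - 1])
                for k in range(1, len(t))
            )
            row.append(math.sqrt(integral))
        D.append(row)
    return D


# Quantile functions of the uniform distributions on [0, 1], [1, 2], [0, 2].
t = [k / 100 for k in range(101)]
Qs = [[u for u in t], [1.0 + u for u in t], [2.0 * u for u in t]]
D = wdist_sketch(t, Qs)
```

For the first two distributions the quantile functions differ by the constant 1, so their distance is 1; for the first and third it is close to the exact value sqrt(1/3).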