yaw.correlation.paircounts.PatchedCount

class yaw.correlation.paircounts.PatchedCount(binning: IntervalIndex, counts: NDArray, *, auto: bool)

Bases: PatchedArray

Container class for pair counts between two samples.

The data in this container are the pair counts between two samples of points. The counts are stored for each pair of spatial patches and per redshift bin, forming a data array of shape (N, N, K), where N is the number of spatial patches, and K is the number of redshift bins.
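To make the (N, N, K) layout concrete, the following is a minimal NumPy sketch that does not use yaw itself. The array counts is only a stand-in for the internal data array, and the sizes and values are made up for illustration.

```python
import numpy as np

n_patches, n_bins = 3, 2  # illustrative values for N and K

# stand-in for the internal data array of shape (N, N, K)
counts = np.zeros((n_patches, n_patches, n_bins))

# pair counts between patch 0 and patch 1, one value per redshift bin
counts[0, 1, :] = [4.0, 7.0]

# all patch pairs of the first redshift bin form an (N, N) matrix
first_bin = counts[:, :, 0]
print(counts.shape, first_bin.shape)
```

The first two axes index the pair of patches, the last axis indexes the redshift bin.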

The container supports comparison of the data elements and the redshift binning with == and !=. Additionally, PatchedCount instances can be added together if they have the same redshift binning and number of patches, e.g. to add pair counts measured on different scales. The count values can be rescaled by multiplying with a floating point number, e.g. to apply a weighting before summing different scales (see also yaw.correlation.add_corrfuncs()). Any sequence of PatchedCount instances can be summed with the built-in Python function sum().

Finally, the container supports indexing of and iteration over redshift bins and spatial patches using the special accessor attributes bins (see also SampledData) and patches. Some examples are listed below.

Examples

Create a redshift binning:

>>> import pandas as pd
>>> bins = pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3])
>>> bins
IntervalIndex([(0.1, 0.2], (0.2, 0.3]], dtype='interval[float64, right]')

Create two data containers with some dummy values:

>>> count1 = PatchedCount.zeros(bins, n_patches=5, auto=False)
>>> count1.counts += 1  # set all counts with dummy value 1
>>> count2 = PatchedCount.zeros(bins, n_patches=5, auto=False)
>>> count2.counts += 2  # set all counts with dummy value 2
>>> count2
PatchedCount(n_bins=2, z='0.100...0.300', shape=(5, 5, 2))

Sum the pair counts and compare different methods:

>>> summed = count1 + count2
>>> summed
PatchedCount(n_bins=2, z='0.100...0.300', shape=(5, 5, 2))
>>> sum([count1, count2]) == summed
True
>>> (summed.counts == 3).all()
True

Rescale the pair counts:

>>> count1 * 2.0
PatchedCount(n_bins=2, z='0.100...0.300', shape=(5, 5, 2))
>>> count1 * 2.0 == count2
True

Select a subset of all redshift bins or all spatial patches:

>>> from yaw.examples import patched_count
>>> patched_count
PatchedCount(n_bins=30, z='0.070...1.420', shape=(64, 64, 30))

Note how the indicated shape changes when a patch subset is selected:

>>> patched_count.patches[:10]
PatchedCount(n_bins=30, z='0.070...1.420', shape=(10, 10, 30))

Note how the indicated redshift range and shape change when a bin subset is selected:

>>> patched_count.bins[:3]
PatchedCount(n_bins=3, z='0.070...0.205', shape=(64, 64, 3))

An example of iteration over bins, which yields instances with a single redshift bin:

>>> for zbin in patched_count.bins:
...     print(zbin)
...     break  # just show the first item
PatchedCount(n_bins=1, z='0.070...0.115', shape=(64, 64, 1))

Construct a new instance from an existing pair count array.

Parameters:
  • binning (pandas.IntervalIndex) – The redshift binning applied to the data.

  • counts (NDArray) – Internal data array containing the pair counts between spatial patches in bins of redshift. The array must be 3-dimensional with shape (N, N, K), where N is the number of spatial patches, and K is the number of redshift bins. Same as as_array().

Keyword Arguments:

auto (bool) – Whether the data originates from an autocorrelation measurement.

Methods

__init__(binning, counts, *, auto)

Construct a new instance from an existing pair count array.

as_array()

Get the underlying data as contiguous array.

concatenate_bins(*data)

Concatenate pair count data containers with equal patches.

concatenate_patches(*data)

Concatenate pair count data containers with equal redshift binning.

from_file(path)

Create a class instance by deserialising data from a HDF5 file.

from_hdf(source)

Create a class instance by deserialising data from a HDF5 group.

get_binning()

Get the underlying, exact redshift bin intervals.

get_sum(*args, **kwargs)

is_compatible(other[, require])

Check whether this instance is compatible with another instance.

keys()

Array of patch index pairs with non-zero pair counts.

sample_sum([config])

Compute the sum of the data over all patches and samples thereof.

set_measurement(key, item)

Set the counts value in all redshift bins for a pair of patch indices.

sum([axis])

Shorthand for PatchedCount.counts.sum()

to_file(path)

Serialise the class instance to a new HDF5 file.

to_hdf(dest)

Serialise the class instance into an existing HDF5 group.

values()

Array of non-zero pair count values.

zeros(binning, n_patches, *, auto[, dtype])

Create a new instance where all elements of the counts array are initialised to zero.

Attributes

auto

Whether the stored data are from an autocorrelation measurement.

bins

An Indexer attribute that supports iteration over the bins or selecting a subset of the bins.

closed

Specifies on which side the redshift bin intervals are closed, can be: left, right, both, neither.

dtype

The numpy data type of the underlying data.

dz

Get the width of the redshift bins as array.

edges

Get the edges of the redshift bins as flat array.

mids

Get the centers of the redshift bins as array.

n_bins

Get the number of redshift bins.

n_patches

Get the number of spatial patches.

ndim

The number of dimensions of underlying data if viewed as array.

patches

An Indexer attribute that supports iteration over the spatial patches or selecting a subset of the patches.

shape

The shape of underlying data if viewed as array.

size

The number of items in the underlying data if viewed as array.

counts

Internal data array containing the pair counts between spatial patches in bins of redshift.

as_array() → NDArray

Get the underlying data as contiguous array.

The array is 3-dimensional with shape (N, N, K), where N is the number of spatial patches, and K is the number of redshift bins.

auto = False

Whether the stored data are from an autocorrelation measurement.

property bins: Indexer[int | slice | Sequence, PatchedCount]

An Indexer attribute that supports iteration over the bins or selecting a subset of the bins.

The indexer always returns new container instances with the indexed data subset or the current item when iterating.

Warning

Indexing rules for a one-dimensional numpy array apply. However, if the resulting binning is not contiguous or contains repeated bins, some operations on the returned container may fail.

Returns:

yaw.core.containers.Indexer

property closed: str

Specifies on which side the redshift bin intervals are closed, can be: left, right, both, neither.

concatenate_bins(*data: PatchedCount) → PatchedCount

Concatenate pair count data containers with equal patches.

The data is merged by appending the data along the redshift binning axis.

Note

A necessary condition for merging is that the patch numbers are identical and that the merged binning is contiguous and non-overlapping. Cross- and autocorrelation containers cannot be merged.

Parameters:

*data – Containers of the same type that are appended to the redshift bin dimension of this container.

Returns:

New instance of this container with combined data.
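The merge along the redshift axis can be pictured with plain NumPy arrays. This is a sketch under stated assumptions, not the library's implementation: counts_a, counts_b and the edge arrays are hypothetical stand-ins for the data and binning of two containers.

```python
import numpy as np

# two stand-in count arrays with 4 patches each and contiguous bin edges
edges_a = np.array([0.1, 0.2, 0.3])   # 2 redshift bins
edges_b = np.array([0.3, 0.4])        # 1 redshift bin, starts where edges_a ends
counts_a = np.ones((4, 4, 2))
counts_b = np.full((4, 4, 1), 2.0)

# conditions from the note: same number of patches, contiguous binning
assert counts_a.shape[:2] == counts_b.shape[:2]
assert edges_a[-1] == edges_b[0]

# append along the redshift axis (the last axis)
merged = np.concatenate([counts_a, counts_b], axis=2)
# drop the duplicated shared edge when joining the bin edges
merged_edges = np.concatenate([edges_a, edges_b[1:]])
print(merged.shape, merged_edges)
```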

concatenate_patches(*data: PatchedCount) → PatchedCount

Concatenate pair count data containers with equal redshift binning.

The data is merged by extending the dimension of the patch axes. The resulting data array will be a block matrix of the input data arrays, i.e. all elements with correlations between different inputs set to zero.

Note

A necessary condition for merging is that the redshift binning of all inputs is identical. Cross- and autocorrelation containers cannot be merged.

Parameters:

*data – Containers of the same type that are appended to the patch dimension of this container.

Returns:

New instance of this container with combined data.
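The block-matrix structure described for the patch merge can be sketched with NumPy alone. The arrays below are hypothetical stand-ins for the data of two containers with identical redshift binning; the code illustrates the layout only and is not taken from yaw.

```python
import numpy as np

counts_a = np.ones((2, 2, 3))        # 2 patches, 3 redshift bins
counts_b = np.full((3, 3, 3), 2.0)   # 3 patches, same 3 redshift bins

n_a, n_b = counts_a.shape[0], counts_b.shape[0]
n_bins = counts_a.shape[2]
assert counts_b.shape[2] == n_bins   # binning must match

# block matrix: inputs on the diagonal blocks, zeros between different inputs
merged = np.zeros((n_a + n_b, n_a + n_b, n_bins))
merged[:n_a, :n_a] = counts_a
merged[n_a:, n_a:] = counts_b
print(merged[:, :, 0])
```

The off-diagonal blocks stay zero because no pairs were counted between patches that come from different inputs.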

counts: NDArray

Internal data array containing the pair counts between spatial patches in bins of redshift.

The array is 3-dimensional with shape (N, N, K), where N is the number of spatial patches, and K is the number of redshift bins. Same as as_array().

property dtype: DTypeLike

The numpy data type of the underlying data.

property dz: ndarray[Any, dtype[float64]]

Get the width of the redshift bins as array.

property edges: ndarray[Any, dtype[float64]]

Get the edges of the redshift bins as flat array.

classmethod from_file(path: TypePathStr) → _Thdf

Create a class instance by deserialising data from a HDF5 file.

Parameters:

path (pathlib.Path, str) – Path to the HDF5 file that contains the serialised data.

Returns:

HDFSerializable

classmethod from_hdf(source: Group) → PatchedCount

Create a class instance by deserialising data from a HDF5 group.

Parameters:

source (h5py.Group) – Group in an opened HDF5 file that contains the serialised data.

Returns:

HDFSerializable

get_binning() → IntervalIndex

Get the underlying, exact redshift bin intervals.

Returns:

pandas.IntervalIndex

get_sum(*args, **kwargs)

Deprecated since version 2.3.1: Renamed to sample_sum().

is_compatible(other: _Tbinned, require: bool = False) → bool

Check whether this instance is compatible with another instance.

Ensures that both objects are instances of the same class and that the redshift binning is identical.

Parameters:
  • other (BinnedQuantity) – Object instance to compare to.

  • require (bool, optional) – Raise a ValueError if any of the checks fail.

Returns:

bool

keys() → NDArray

Array of patch index pairs with non-zero pair counts.

The index pairs are ordered by first, then second index. The returned array is of shape (M, 2), where M is the number of patch index pairs that contain non-zero entries in any of the redshift bins.
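One way to obtain such an ordered list of index pairs from a plain array is sketched below with NumPy. This is an illustration under the assumption that a pair counts as non-zero when any of its redshift bins is non-zero; it is not necessarily how keys() is implemented.

```python
import numpy as np

counts = np.zeros((3, 3, 2))
counts[2, 0, 1] = 5.0   # non-zero only in the second redshift bin
counts[0, 1, 0] = 3.0   # non-zero only in the first redshift bin

# a patch pair is kept if any of its redshift bins is non-zero
nonzero_pairs = np.argwhere(counts.any(axis=2))
print(nonzero_pairs)
```

np.argwhere returns the indices in row-major order, which matches the ordering by first, then second index.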

property mids: ndarray[Any, dtype[float64]]

Get the centers of the redshift bins as array.

property n_bins: int

Get the number of redshift bins.

property n_patches: int

Get the number of spatial patches.

property ndim: int

The number of dimensions of underlying data if viewed as array.

property patches: Indexer[int | slice | Sequence, PatchedCount]

An Indexer attribute that supports iteration over the spatial patches or selecting a subset of the patches.

The indexer always returns new container instances with the indexed data subset or the current item when iterating.

Note

Indexing rules for a one-dimensional numpy array apply.

Returns:

yaw.core.containers.Indexer

sample_sum(config: ResamplingConfig | None = None) → SampledData

Compute the sum of the data over all patches and samples thereof.

Returns a data container with the sum in each redshift bin and samples generated from the patches using the resampling method specified in the configuration parameter.

Parameters:

config (ResamplingConfig) – Specifies the resampling method and its customisation parameters.

Returns:

SampledData
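As an illustration of generating samples from patches, the following NumPy sketch computes leave-one-out jackknife sums per redshift bin. It is only a sketch: the choice of jackknife and the rule that a sample drops every pair involving the left-out patch are assumptions made here, and the actual behaviour depends on the ResamplingConfig.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in count array with 4 patches and 2 redshift bins
counts = rng.integers(0, 10, size=(4, 4, 2)).astype(float)

# sum over all patch pairs, one value per redshift bin, shape (K,)
total = counts.sum(axis=(0, 1))

# assumption: leaving out patch i removes all pairs that involve patch i
rows = counts.sum(axis=1)             # (N, K): pairs with patch i as first index
cols = counts.sum(axis=0)             # (N, K): pairs with patch i as second index
diag = np.einsum("iik->ik", counts)   # (N, K): pairs within patch i, in both sums above
samples = total - rows - cols + diag  # (N, K): one jackknife sample per patch
print(total, samples.shape)
```

The diagonal term is added back because the pairs within patch i are subtracted twice, once with the row and once with the column.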

set_measurement(key: PatchIDs | tuple[int, int], item: NDArray)

Set the counts value in all redshift bins for a pair of patch indices.

Parameters:
  • key (yaw.core.containers.PatchIDs, tuple) – Pair of patch indices for which the new values are set.

  • item (NDArray) – Values to set, must be an array with length matching the number of redshift bins.

property shape: tuple[int]

The shape of underlying data if viewed as array.

property size: int

The number of items in the underlying data if viewed as array.

sum(axis: int | tuple[int] | None = None, **kwargs) → NDArray

Shorthand for PatchedCount.counts.sum()

Parameters:
  • axis (tuple, int, optional) – Axis over which the internal 3-dimensional data array is summed.

  • **kwargs – Keyword arguments passed to numpy.ndarray.sum().
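The effect of the axis parameter can be seen on a small stand-in array. The sketch below calls numpy.ndarray.sum() directly on a plain array of shape (N, N, K) rather than on a PatchedCount instance.

```python
import numpy as np

# stand-in array with 2 patches and 2 redshift bins, values 0..7
counts = np.arange(8, dtype=float).reshape(2, 2, 2)

total = counts.sum()                # scalar: sum over everything
per_bin = counts.sum(axis=(0, 1))   # shape (K,): sum over all patch pairs
per_pair = counts.sum(axis=2)       # shape (N, N): sum over redshift bins
print(total, per_bin, per_pair)
```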

to_file(path: TypePathStr) → None

Serialise the class instance to a new HDF5 file.

Parameters:

path (pathlib.Path, str) – Path at which the HDF5 file is created.

to_hdf(dest: Group) → None

Serialise the class instance into an existing HDF5 group.

Parameters:

dest (h5py.Group) – Group in which the serialised data structures are created.

values() → NDArray

Array of non-zero pair count values.

The values are ordered in the same way as the indices returned by keys().
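A NumPy sketch of values that stay aligned with an ordered list of index pairs is given below. The shape of the result (one row per patch pair, one column per redshift bin) is an assumption made for this illustration and is not confirmed by the documentation above.

```python
import numpy as np

counts = np.zeros((3, 3, 2))
counts[2, 0] = [0.0, 5.0]
counts[0, 1] = [3.0, 1.0]

# boolean mask of patch pairs with a non-zero entry in any redshift bin
mask = counts.any(axis=2)

pairs = np.argwhere(mask)   # index pairs, ordered by first, then second index
pair_values = counts[mask]  # rows in the same order as pairs
print(pairs, pair_values)
```

Boolean indexing and np.argwhere both traverse the mask in row-major order, so row i of pair_values belongs to row i of pairs.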

classmethod zeros(binning: IntervalIndex, n_patches: int, *, auto: bool, dtype: DTypeLike = <class 'numpy.float64'>) → PatchedCount

Create a new instance where all elements of the counts array are initialised to zero.

Parameters:
  • binning (pandas.IntervalIndex) – Redshift binning for the container, determines size of last data array dimension.

  • n_patches (int) – Number of spatial patches, determines the size of the first two data array dimensions.

Keyword Arguments:
  • auto (bool) – Whether the data originates from an autocorrelation measurement.

  • dtype (DTypeLike, optional) – Data type to use for the internal data array.