yaw.correlation.paircounts.PatchedTotal#

class yaw.correlation.paircounts.PatchedTotal(binning: IntervalIndex, totals1: NDArray, totals2: NDArray, *, auto: bool)[source]#

Bases: PatchedArray

Container class for the product of the total number of objects of two samples.

The data in this container is constructed by multiplying the total number of objects of both samples (split into spatial patches and redshift bins). This data is required to normalise pair counts when computing correlation functions, e.g. to account for different sizes of a data and a random sample.

Internally, the number of objects is stored per sample as totals1 and totals2, respectively. The (outer) product for all combinations of patches is only computed when calling the as_array() or sample_sum() methods.
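The lazily evaluated product can be pictured with plain NumPy. The following is a minimal sketch and not the library's implementation: it assumes two small, hypothetical arrays of shape (N, K) and forms the outer product over patches separately in each redshift bin. Any special handling of autocorrelations (the auto flag) is not described here and is left out.

```python
import numpy as np

# Hypothetical totals for N=3 patches and K=2 redshift bins.
totals1 = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
totals2 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Outer product over the patch axis, computed separately in each bin:
# product[i, j, k] = totals1[i, k] * totals2[j, k]
product = np.einsum("ik,jk->ijk", totals1, totals2)

print(product.shape)  # (3, 3, 2)
```

Storing only the two (N, K) arrays keeps memory linear in the number of patches; the (N, N, K) array is built only when it is needed.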

The container supports comparison of the data elements and the redshift binning with == and !=. The indexing rules are the same as for PatchedCount.

Examples

Select a subset of all redshift bins or all spatial patches:

>>> from yaw.examples import patched_total
>>> patched_total
PatchedTotal(n_bins=30, z='0.070...1.420', shape=(64, 64, 30))

Note how the indicated shape changes when a patch subset is selected:

>>> patched_total.patches[:10]
PatchedTotal(n_bins=30, z='0.070...1.420', shape=(10, 10, 30))

Note how the indicated redshift range and shape change when a bin subset is selected:

>>> patched_total.bins[:3]
PatchedTotal(n_bins=3, z='0.070...0.205', shape=(64, 64, 3))

An example of iteration over bins, which yields instances with a single redshift bin:

>>> for zbin in patched_total.bins:
...     print(zbin)
...     break  # just show the first item
PatchedTotal(n_bins=1, z='0.070...0.115', shape=(64, 64, 1))

Construct a new instance from the total number of objects in the first and second catalogue.

Parameters:

binning (pandas.IntervalIndex) – The redshift binning applied to the data.
totals1 (NDArray) – The total number of objects from the first data catalogue per patch and redshift bin. The array must be of shape (N, K), where N is the number of spatial patches, and K is the number of redshift bins.
totals2 (NDArray) – The total number of objects from the second data catalogue per patch and redshift bin. The array must be of shape (N, K), where N is the number of spatial patches, and K is the number of redshift bins.

Keyword Arguments:

auto (bool) – Whether the data originates from an autocorrelation measurement.

Methods

`__init__`(binning, totals1, totals2, *, auto)	Construct a new instance from the total number of objects in the first and second catalogue.
`as_array`()	Get the underlying data as contiguous array.
`concatenate_bins`(*data)	Concatenate pair count data containers with equal patches.
`concatenate_patches`(*data)	Concatenate pair count data containers with equal redshift binning.
`from_file`(path)	Create a class instance by deserialising data from a HDF5 file.
`from_hdf`(source)	Create a class instance by deserialising data from a HDF5 group.
`get_binning`()	Get the underlying, exact redshift bin intervals.
`get_sum`(*args, **kwargs)	Deprecated since version 2.3.1: renamed to `sample_sum`().
`is_compatible`(other[, require])	Check whether this instance is compatible with another instance.
`sample_sum`([config])	Compute the sum of the data over all patches and samples thereof.
`to_file`(path)	Serialise the class instance to a new HDF5 file.
`to_hdf`(dest)	Serialise the class instance into an existing HDF5 group.

Attributes

`auto`	Whether the stored data are from an autocorrelation measurement.
`bins`	An `Indexer` attribute that supports iteration over the bins or selecting a subset of the bins.
`closed`	Specifies on which side the redshift bin intervals are closed, can be: `left`, `right`, `both`, `neither`.
`dtype`	The numpy data type of the underlying data.
`dz`	Get the width of the redshift bins as array.
`edges`	Get the edges of the redshift bins as flat array.
`mids`	Get the centers of the redshift bins as array.
`n_bins`	Get the number of redshift bins.
`n_patches`	Get the number of spatial patches.
`ndim`	The number of dimensions of underlying data if viewed as array.
`patches`	An `Indexer` attribute that supports iteration over the spatial patches or selecting a subset of the patches.
`shape`	The shape of underlying data if viewed as array.
`size`	The number of items in the underlying data if viewed as array.
`totals1`	The total number of objects from the first data catalogue per patch and redshift bin.
`totals2`	The total number of objects from the second data catalogue per patch and redshift bin.

as_array() → NDArray[source]#

Get the underlying data as contiguous array.

The array is 3-dimensional with shape (N, N, K), where N is the number of spatial patches, and K is the number of redshift bins.

auto = False#: Whether the stored data are from an autocorrelation measurement.

property bins: Indexer[int | slice | Sequence, PatchedTotal]#

An Indexer attribute that supports iteration over the bins or selecting a subset of the bins.

The indexer always returns new container instances with the indexed data subset or the current item when iterating.

Warning

Indexing rules for a one-dimensional numpy array apply, however if the resulting binning is not contiguous or contains repeated bins, some operations on the returned container may fail.
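To illustrate the warning, the sketch below applies ordinary one-dimensional NumPy indexing to a range of bin indices and checks whether the selection is still contiguous. The helper is_contiguous is hypothetical and not part of the package; it only shows which selections produce gaps or repeated bins.

```python
import numpy as np

# Indices of the 30 redshift bins from the example container.
bin_ids = np.arange(30)


def is_contiguous(selected) -> bool:
    """Hypothetical helper: True if the bin indices are consecutive and ascending."""
    selected = np.atleast_1d(selected)
    return bool(np.all(np.diff(selected) == 1))


print(is_contiguous(bin_ids[:3]))         # slice of adjacent bins
print(is_contiguous(bin_ids[[0, 2, 5]]))  # selection with gaps
print(is_contiguous(bin_ids[[1, 1, 2]]))  # selection with a repeated bin
```

Only the first selection is contiguous; the other two are the kind of subset for which later operations on the returned container may fail.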

Returns:: yaw.core.containers.Indexer

property closed: str#: Specifies on which side the redshift bin intervals are closed, can be: left, right, both, neither.

concatenate_bins(*data: PatchedTotal) → PatchedTotal[source]#

Concatenate pair count data containers with equal patches.

The data is merged by appending the data along the redshift binning axis.

Note

Necessary condition for merging is that the patch numbers are identical and that the merged binning is contiguous and non-overlapping. Cannot merge cross- with autocorrelation containers.

Parameters:: *data – Containers of the same type that are appended along the redshift binning axis of this container.
Returns:: New instance of this container with combined data.
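The merge along the binning axis can be sketched with NumPy. This is an illustration under stated assumptions rather than the package's code: the arrays and bin edges are hypothetical, and the two checks mirror the conditions listed in the note (identical patches, contiguous merged binning).

```python
import numpy as np

# Two hypothetical inputs with the same N=4 patches and adjacent binnings.
edges_a = np.array([0.1, 0.2, 0.3])  # 2 bins
edges_b = np.array([0.3, 0.4])       # 1 bin
totals_a = np.ones((4, 2))
totals_b = np.full((4, 1), 2.0)

# The number of patches must agree between the inputs.
if totals_a.shape[0] != totals_b.shape[0]:
    raise ValueError("number of patches differs")
# The merged binning must be contiguous: the last edge of one input
# has to coincide with the first edge of the next.
if not np.isclose(edges_a[-1], edges_b[0]):
    raise ValueError("binning is not contiguous")

merged_edges = np.concatenate([edges_a, edges_b[1:]])
merged_totals = np.concatenate([totals_a, totals_b], axis=1)  # shape (4, 3)
```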

concatenate_patches(*data: PatchedTotal) → PatchedTotal[source]#

Concatenate pair count data containers with equal redshift binning.

The data is merged by extending the dimension of the patch axes. The resulting data array will be a block matrix of the input data arrays, i.e. all elements that pair patches from different inputs are set to zero.

Note

Necessary condition for merging is that the redshift binning of all inputs is identical. Cannot merge cross- with autocorrelation containers.

Parameters:: *data – Containers of the same type that are appended to the patch dimension of this container.
Returns:: New instance of this container with combined data.
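The block structure described above can be pictured as follows. This is a sketch with hypothetical (N, N, K) arrays; it only illustrates the layout of the merged array and makes no claim about how the merged totals1 and totals2 are stored internally.

```python
import numpy as np

# Hypothetical inputs with K=2 redshift bins and 2 and 3 patches, respectively.
a = np.ones((2, 2, 2))
b = np.full((3, 3, 2), 2.0)

n_a, n_b, n_bins = a.shape[0], b.shape[0], a.shape[2]
merged = np.zeros((n_a + n_b, n_a + n_b, n_bins))
merged[:n_a, :n_a] = a  # block of the first input
merged[n_a:, n_a:] = b  # block of the second input
# Off-diagonal blocks (patches from different inputs) stay zero.

print(merged.shape)  # (5, 5, 2)
```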

property dtype: DTypeLike#: The numpy data type of the underlying data.

property dz: ndarray[Any, dtype[float64]]#: Get the width of the redshift bins as array.

property edges: ndarray[Any, dtype[float64]]#: Get the edges of the redshift bins as flat array.
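The binning attributes edges, dz and mids are related in a simple way, sketched here with NumPy. The edges are an assumption: 30 linearly spaced bins between 0.07 and 1.42, which is consistent with the ranges printed in the examples above. The bin centres are taken as the arithmetic mean of the two edges of each bin.

```python
import numpy as np

# Assumed edges: 30 linear bins covering 0.07 <= z <= 1.42 (31 edges).
edges = np.linspace(0.07, 1.42, 31)

dz = np.diff(edges)                    # bin widths, one value per bin
mids = (edges[:-1] + edges[1:]) / 2.0  # bin centres, one value per bin

print(len(edges), len(dz), len(mids))  # 31 30 30
```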

classmethod from_file(path: TypePathStr) → _Thdf#

Create a class instance by deserialising data from a HDF5 file.

Parameters:: path (pathlib.Path, str) – Path to the HDF5 file that contains the serialised data.
Returns:: HDFSerializable

classmethod from_hdf(source: Group) → PatchedTotal[source]#

Create a class instance by deserialising data from a HDF5 group.

Parameters:: source (h5py.Group) – Group in an opened HDF5 file that contains the serialised data.
Returns:: HDFSerializable

get_binning() → IntervalIndex[source]#

Get the underlying, exact redshift bin intervals.

Returns:: pandas.IntervalIndex

get_sum(*args, **kwargs)#: Deprecated since version 2.3.1: Renamed to sample_sum().

is_compatible(other: _Tbinned, require: bool = False) → bool#

Check whether this instance is compatible with another instance.

Ensures that both objects are instances of the same class and that the redshift binning is identical.

Parameters:

other (BinnedQuantity) – Object instance to compare to.
require (bool, optional) – Raise a ValueError if any of the checks fail.

Returns:

bool

property mids: ndarray[Any, dtype[float64]]#: Get the centers of the redshift bins as array.

property n_bins: int#: Get the number of redshift bins.

property n_patches: int#: Get the number of spatial patches.

property ndim: int#: The number of dimensions of underlying data if viewed as array.

property patches: Indexer[int | slice | Sequence, PatchedTotal]#

An Indexer attribute that supports iteration over the spatial patches or selecting a subset of the patches.

The indexer always returns new container instances with the indexed data subset or the current item when iterating.

Note

Indexing rules for a one-dimensional numpy array apply.

Returns:: yaw.core.containers.Indexer

sample_sum(config: ResamplingConfig | None = None) → SampledData#

Compute the sum of the data over all patches and samples thereof.

Returns a data container with the sum in each redshift bin and samples generated from the patches using the resampling method specified in the configuration parameter.

Parameters:: config (ResamplingConfig, optional) – Specifies the resampling method and its customisation parameters.
Returns:: SampledData
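As an illustration of what the sum and its samples could look like, the sketch below assumes a leave-one-patch-out jackknife and a cross-correlation without any special treatment of the auto flag. Both are assumptions; the actual method is selected through the configuration and may differ. The arrays are hypothetical.

```python
import numpy as np

# Hypothetical totals for N=3 patches and K=2 redshift bins.
totals1 = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
totals2 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Summing the (N, N, K) product over both patch axes equals the product
# of the per-sample sums, so the full array never has to be built.
sum1 = totals1.sum(axis=0)
sum2 = totals2.sum(axis=0)
full_sum = sum1 * sum2  # shape (K,)

# Jackknife sample i: drop row i and column i of the product array.
samples = (sum1 - totals1) * (sum2 - totals2)  # shape (N, K)

print(full_sum)  # [ 810. 1440.]
```

Each row of samples holds the per-bin sum with one patch removed from both catalogues.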

property shape: tuple[int]#: The shape of underlying data if viewed as array.

property size: int#: The number of items in the underlying data if viewed as array.

to_file(path: TypePathStr) → None#

Serialise the class instance to a new HDF5 file.

Parameters:: path (pathlib.Path, str) – Path at which the HDF5 file is created.

to_hdf(dest: Group) → None[source]#

Serialise the class instance into an existing HDF5 group.

Parameters:: dest (h5py.Group) – Group in which the serialised data structures are created.

totals1: NDArray#

The total number of objects from the first data catalogue per patch and redshift bin.

The array is of shape (N, K), where N is the number of spatial patches, and K is the number of redshift bins.

totals2: NDArray#

The total number of objects from the second data catalogue per patch and redshift bin.

The array is of shape (N, K), where N is the number of spatial patches, and K is the number of redshift bins.