yaw.correlation.paircounts.PatchedCount
- class yaw.correlation.paircounts.PatchedCount(binning: IntervalIndex, counts: NDArray, *, auto: bool)[source]#
Bases: PatchedArray
Container class for pair counts between two samples.
The data in this container are the pair counts between two samples of points. The counts are stored for each spatial patch and per redshift bin, forming a data array of shape (N, N, K), where N is the number of spatial patches, and K is the number of redshift bins.
The container supports comparison of the data elements and the redshift binning with == and !=. Additionally, PatchedCount instances can be added together if they have the same redshift binning and number of patches, e.g. to add pair counts measured on different scales. The count values can be rescaled/multiplied by a floating point number, e.g. to apply a weighting before summing different scales (see also yaw.correlation.add_corrfuncs()). Any sequence of PatchedCount instances can be summed with the built-in Python function sum().
Finally, the container supports indexing of and iteration over redshift bins and spatial patches using the special accessor attributes bins (see also SampledData) and patches. Some examples are listed below.
Examples
Create a redshift binning:
>>> import pandas as pd
>>> bins = pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3])
>>> bins
IntervalIndex([(0.1, 0.2], (0.2, 0.3]], dtype='interval[float64, right]')
Create two data containers with some dummy values:
>>> from yaw.correlation.paircounts import PatchedCount
>>> count1 = PatchedCount.zeros(bins, n_patches=5, auto=False)
>>> count1.counts += 1  # set all counts to the dummy value 1
>>> count2 = PatchedCount.zeros(bins, n_patches=5, auto=False)
>>> count2.counts += 2  # set all counts to the dummy value 2
>>> count2
PatchedCount(n_bins=2, z='0.100...0.300', shape=(5, 5, 2))
Sum the pair counts and compare different methods:
>>> summed = count1 + count2
>>> summed
PatchedCount(n_bins=2, z='0.100...0.300', shape=(5, 5, 2))
>>> sum([count1, count2]) == summed
True
>>> (summed.counts == 3).all()
True
Rescale the pair counts:
>>> count1 * 2.0
PatchedCount(n_bins=2, z='0.100...0.300', shape=(5, 5, 2))
>>> count1 * 2.0 == count2
True
Select a subset of all redshift bins or all spatial patches:
>>> from yaw.examples import patched_count
>>> patched_count
PatchedCount(n_bins=30, z='0.070...1.420', shape=(64, 64, 30))
Note how the indicated shape changes when a patch subset is selected:
>>> patched_count.patches[:10]
PatchedCount(n_bins=30, z='0.070...1.420', shape=(10, 10, 30))
Note how the indicated redshift range and shape change when a bin subset is selected:
>>> patched_count.bins[:3]
PatchedCount(n_bins=3, z='0.070...0.205', shape=(64, 64, 3))
An example of iteration over bins, which yields instances with a single redshift bin:
>>> for zbin in patched_count.bins:
...     print(zbin)
...     break  # just show the first item
PatchedCount(n_bins=1, z='0.070...0.115', shape=(64, 64, 1))
Construct a new instance from an existing pair count array.
- Parameters:
binning (pandas.IntervalIndex) – The redshift binning applied to the data.
counts (NDArray) – Internal data array containing the pair counts between spatial patches in bins of redshift. The array must be 3-dimensional with shape (N, N, K), where N is the number of spatial patches, and K is the number of redshift bins. Same as as_array().
- Keyword Arguments:
auto (bool) – Whether the data originates from an autocorrelation measurement.
Methods
__init__(binning, counts, *, auto) – Construct a new instance from an existing pair count array.
as_array() – Get the underlying data as contiguous array.
concatenate_bins(*data) – Concatenate pair count data containers with equal patches.
concatenate_patches(*data) – Concatenate pair count data containers with equal redshift binning.
from_file(path) – Create a class instance by deserialising data from a HDF5 file.
from_hdf(source) – Create a class instance by deserialising data from a HDF5 group.
get_binning() – Get the underlying, exact redshift bin intervals.
get_sum(*args, **kwargs)
is_compatible(other[, require]) – Check whether this instance is compatible with another instance.
keys() – Array of patch index pairs with non-zero pair counts.
sample_sum([config]) – Compute the sum of the data over all patches and samples thereof.
set_measurement(key, item) – Set the counts value in all redshift bins for a pair of patch indices.
sum([axis]) – Shorthand for PatchedCount.counts.sum().
to_file(path) – Serialise the class instance to a new HDF5 file.
to_hdf(dest) – Serialise the class instance into an existing HDF5 group.
values() – Array of non-zero pair count values.
zeros(binning, n_patches, *, auto[, dtype]) – Create a new instance where all elements of the counts array are initialised to zero.
Attributes
auto – Whether the stored data are from an autocorrelation measurement.
bins – An Indexer attribute that supports iteration over the bins or selecting a subset of the bins.
closed – Specifies on which side the redshift bin intervals are closed, can be: left, right, both, neither.
dtype – The numpy data type of the underlying data.
dz – Get the width of the redshift bins as array.
edges – Get the edges of the redshift bins as flat array.
mids – Get the centers of the redshift bins as array.
n_bins – Get the number of redshift bins.
n_patches – Get the number of spatial patches.
ndim – The number of dimensions of underlying data if viewed as array.
patches – An Indexer attribute that supports iteration over the spatial patches or selecting a subset of the patches.
shape – The shape of underlying data if viewed as array.
size – The number of items in the underlying data if viewed as array.
counts – Internal data array containing the pair counts between spatial patches in bins of redshift.
- as_array() NDArray[source]#
Get the underlying data as contiguous array.
The array is 3-dimensional with shape (N, N, K), where N is the number of spatial patches, and K is the number of redshift bins.
- auto = False#
Whether the stored data are from an autocorrelation measurement.
- property bins: Indexer[int | slice | Sequence, PatchedCount]#
An Indexer attribute that supports iteration over the bins or selecting a subset of the bins.
The indexer always returns new container instances with the indexed data subset or the current item when iterating.
Warning
Indexing rules for a one-dimensional numpy array apply, however if the resulting binning is not contiguous or contains repeated bins, some operations on the returned container may fail.
- Returns:
yaw.core.containers.Indexer
- property closed: str#
Specifies on which side the redshift bin intervals are closed, can be: left, right, both, neither.
- concatenate_bins(*data: PatchedCount) PatchedCount[source]#
Concatenate pair count data containers with equal patches.
The data is merged by appending the data along the redshift binning axis.
Note
Necessary condition for merging is that the patch numbers are identical and that the merged binning is contiguous and non-overlapping. Cannot merge cross- with autocorrelation containers.
- Parameters:
*data – Containers of same type that are appended along the redshift bin axis of this container.
- Returns:
New instance of this container with combined data.
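The merge along the redshift axis can be pictured with plain NumPy arrays. This is only a sketch of the array operation described above, not the library's implementation; the array names and sizes are made up for illustration.

```python
import numpy as np

# Hypothetical count arrays for 3 patches each. The first covers 2 redshift
# bins, the second covers 1 further bin that is assumed to be contiguous.
counts_low = np.ones((3, 3, 2))
counts_high = np.full((3, 3, 1), 2.0)

# Appending along the last (redshift bin) axis leaves the patch axes unchanged.
merged = np.concatenate([counts_low, counts_high], axis=2)
print(merged.shape)
```

The patch dimensions must agree for the concatenation to work, which mirrors the requirement of identical patch numbers stated in the note.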
- concatenate_patches(*data: PatchedCount) PatchedCount[source]#
Concatenate pair count data containers with equal redshift binning.
The data is merged by extending the dimension of the patch axes. The resulting data array will be a block matrix of the input data arrays, i.e. all elements with correlations between different inputs set to zero.
Note
Necessary condition for merging is that the redshift binning of all inputs is identical. Cannot merge cross- with autocorrelation containers.
- Parameters:
*data – Containers of same type that are appended to the patch dimension of this container.
- Returns:
New instance of this container with combined data.
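The block matrix layout described above can be sketched with plain NumPy. This illustrates the resulting array structure only, under the assumption of two inputs with 2 and 3 patches; it is not the library's code.

```python
import numpy as np

n_bins = 2
first = np.ones((2, 2, n_bins))          # hypothetical input with 2 patches
second = np.full((3, 3, n_bins), 2.0)    # hypothetical input with 3 patches

# Allocate the merged array and fill the diagonal blocks. Entries that would
# correlate patches from different inputs stay at zero.
n_total = first.shape[0] + second.shape[0]
merged = np.zeros((n_total, n_total, n_bins))
merged[:2, :2] = first
merged[2:, 2:] = second
print(merged.shape)
```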
- counts: NDArray#
Internal data array containing the pair counts between spatial patches in bins of redshift.
The array is 3-dimensional with shape (N, N, K), where N is the number of spatial patches, and K is the number of redshift bins. Same as
as_array().
- property dtype: DTypeLike#
The numpy data type of the underlying data.
- property dz: ndarray[Any, dtype[float64]]#
Get the width of the redshift bins as array.
- property edges: ndarray[Any, dtype[float64]]#
Get the edges of the redshift bins as flat array.
- classmethod from_file(path: TypePathStr) _Thdf#
Create a class instance by deserialising data from a HDF5 file.
- Parameters:
path (pathlib.Path, str) – Path to the HDF5 file that contains the necessary data.
- Returns:
HDFSerializable
- classmethod from_hdf(source: Group) PatchedCount[source]#
Create a class instance by deserialising data from a HDF5 group.
- Parameters:
source (h5py.Group) – Group in an opened HDF5 file that contains the serialised data.
- Returns:
HDFSerializable
- get_binning() IntervalIndex[source]#
Get the underlying, exact redshift bin intervals.
- Returns:
pandas.IntervalIndex
- get_sum(*args, **kwargs)#
Deprecated since version 2.3.1: Renamed to
sample_sum().
- is_compatible(other: _Tbinned, require: bool = False) bool#
Check whether this instance is compatible with another instance.
Ensures that both objects are instances of the same class and that the redshift binning is identical.
- Parameters:
other (BinnedQuantity) – Object instance to compare to.
require (bool, optional) – Raise a ValueError if any of the checks fail.
- Returns:
bool
- keys() NDArray[source]#
Array of patch index pairs with non-zero pair counts.
The index pairs are ordered by first, then second index. The returned array is of shape (M, 2), where M is the number of patch pairs that contain non-zero entries in any of the redshift bins.
- property mids: ndarray[Any, dtype[float64]]#
Get the centers of the redshift bins as array.
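The relation between the binning properties dz, edges and mids can be sketched with NumPy. The edge values below are made up, and the formulas are the usual definitions of bin width and bin centre rather than a copy of the library's code.

```python
import numpy as np

# Hypothetical flat array of bin edges for two redshift bins.
edges = np.array([0.1, 0.2, 0.4])

dz = np.diff(edges)                      # width of each bin
mids = (edges[:-1] + edges[1:]) / 2.0    # centre of each bin
print(dz, mids)
```

For K bins the edges array has K + 1 entries, while dz and mids have K entries each.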
- property n_bins: int#
Get the number of redshift bins.
- property n_patches: int#
Get the number of spatial patches.
- property ndim: int#
The number of dimensions of underlying data if viewed as array.
- property patches: Indexer[int | slice | Sequence, PatchedCount]#
An Indexer attribute that supports iteration over the spatial patches or selecting a subset of the patches.
The indexer always returns new container instances with the indexed data subset or the current item when iterating.
Note
Indexing rules for a one-dimensional numpy array apply.
- Returns:
yaw.core.containers.Indexer
- sample_sum(config: ResamplingConfig | None = None) SampledData#
Compute the sum of the data over all patches and samples thereof.
Returns a data container with the sum in each redshift bin and samples generated from the patches using the resampling method specified in the configuration parameter.
- Parameters:
config (ResamplingConfig) – Specifies the resampling method and its customisation parameters.
- Returns:
SampledData
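As a rough illustration of what "samples generated from the patches" can mean, the sketch below computes leave-one-out jackknife samples from a plain (N, N, K) array. Both the choice of jackknife and the way a left-out patch removes its row and column are assumptions made for this example; the actual behaviour is determined by the ResamplingConfig and may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patches, n_bins = 4, 2
# Hypothetical pair counts between patches in each redshift bin.
counts = rng.integers(0, 10, size=(n_patches, n_patches, n_bins)).astype(float)

# Sum over all patch pairs, one value per redshift bin.
total = counts.sum(axis=(0, 1))

# Leave-one-out samples: drop every pair that involves patch i (assumption).
samples = np.empty((n_patches, n_bins))
for i in range(n_patches):
    keep = np.arange(n_patches) != i
    samples[i] = counts[keep][:, keep].sum(axis=(0, 1))
print(total, samples.shape)
```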
- set_measurement(key: PatchIDs | tuple[int, int], item: NDArray)[source]#
Set the counts value in all redshift bins for a pair of patch indices.
- Parameters:
key (yaw.core.containers.PatchIDs, tuple) – Pair of patch indices for which the new values are set.
item (NDArray) – Values to set, must be an array with length matching the number of redshift bins.
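In terms of the underlying (N, N, K) array, setting a measurement for one patch pair amounts to writing a length-K vector at a fixed pair of patch indices. The sketch below shows this on a plain NumPy array with made-up values; it does not call the library.

```python
import numpy as np

n_patches, n_bins = 4, 3
counts = np.zeros((n_patches, n_patches, n_bins))

# One value per redshift bin for the hypothetical patch pair (1, 2).
item = np.array([5.0, 7.0, 9.0])
i, j = 1, 2
counts[i, j, :] = item
print(counts[i, j])
```

Only the entry for the ordered pair (1, 2) changes; the transposed pair (2, 1) is left untouched in this sketch.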
- property shape: tuple[int]#
The shape of underlying data if viewed as array.
- property size: int#
The number of items in the underlying data if viewed as array.
- sum(axis: int | tuple[int] | None = None, **kwargs) NDArray[source]#
Shorthand for PatchedCount.counts.sum().
- Parameters:
axis (tuple, int, optional) – Axis over which the internal 3-dimensional data array is summed.
**kwargs – Keyword arguments passed to numpy.ndarray.sum().
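Since the method forwards to numpy.ndarray.sum(), the effect of the axis argument can be shown on a plain array of the same (N, N, K) layout. The values here are arbitrary.

```python
import numpy as np

# Hypothetical counts for 2 patches and 3 redshift bins.
counts = np.arange(12, dtype=float).reshape(2, 2, 3)

total = counts.sum()                 # scalar sum over everything
per_bin = counts.sum(axis=(0, 1))    # sum over patch pairs, one value per bin
per_pair = counts.sum(axis=2)        # sum over bins, one value per patch pair
print(total, per_bin, per_pair.shape)
```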
- to_file(path: TypePathStr) None#
Serialise the class instance to a new HDF5 file.
- Parameters:
path (pathlib.Path, str) – Path at which the HDF5 file is created.
- to_hdf(dest: Group) None[source]#
Serialise the class instance into an existing HDF5 group.
- Parameters:
dest (h5py.Group) – Group in which the serialised data structures are created.
- values() NDArray[source]#
Array of non-zero pair count values.
The values are ordered in the same way as the indices returned by
keys().
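One way to picture the pairing of keys() and values() is to select the patch pairs that have a non-zero count in any redshift bin. The sketch below does this with NumPy on a made-up array. The shape of the values array (one row of K counts per selected pair) is an assumption of this example and is not confirmed by the text above.

```python
import numpy as np

counts = np.zeros((3, 3, 2))
counts[0, 0] = [1.0, 0.0]
counts[2, 1] = [0.0, 4.0]
counts[0, 2] = [2.0, 3.0]

# Patch pairs with a non-zero entry in any redshift bin.
nonzero = np.any(counts != 0, axis=2)

# argwhere scans in row-major order, i.e. by first, then second index.
keys = np.argwhere(nonzero)
# Boolean indexing uses the same order, so rows line up with the keys.
values = counts[nonzero]
print(keys, values)
```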
- classmethod zeros(binning: IntervalIndex, n_patches: int, *, auto: bool, dtype: DTypeLike = <class 'numpy.float64'>) PatchedCount[source]#
Create a new instance where all elements of the counts array are initialised to zero.
- Parameters:
binning (pandas.IntervalIndex) – Redshift binning for the container, determines size of last data array dimension.
n_patches (int) – Number of spatial patches, determines the size of the first two data array dimensions.
- Keyword Arguments:
auto (bool) – Whether the data originates from an autocorrelation measurement.
dtype (DTypeLike, optional) – Data type to use for the internal data array.