yaw.core.SampledData#

class yaw.core.SampledData(binning: IntervalIndex, data: NDArray, samples: NDArray, method: str)[source]#

Bases: BinnedQuantity

Container for data and resampled data with redshift binning.

Contains the redshift binning, data vector, and resampled data vector (e.g. jackknife or bootstrap samples). The resampled values are used to compute error estimates and covariance/correlation matrices.

Parameters:

binning (pandas.IntervalIndex) – The redshift binning applied to the data.
data (NDArray) – The data values, one for each redshift bin.
samples (NDArray) – The resampled data values (e.g. jackknife or bootstrap samples).
method (str) – The resampling method used, see ResamplingConfig for available options.

The container supports addition and subtraction, which return a new instance of the container holding the modified data. This requires that both operands are compatible (same binning and same sampling). The operations are applied to the data and samples attributes.

Furthermore, the container supports indexing and iteration over the redshift bins using the SampledData.bins attribute. When iterating, this attribute yields instances of SampledData that each contain a single bin. Slicing and indexing follow the same rules as for the underlying data NDArray. Refer to CorrData for some indexing and iteration examples.
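The per-bin iteration described above can be pictured with plain arrays. This is a minimal sketch, not the yaw implementation: `iter_bins` is a hypothetical helper, and it assumes the samples array has shape (# samples, # bins) as documented for the samples attribute.

```python
import numpy as np
import pandas as pd


def iter_bins(binning, data, samples):
    """Yield the binning, data, and samples restricted to one bin at a time.

    Hypothetical helper for illustration, not part of yaw.
    """
    for i in range(len(binning)):
        # Slices of length one (rather than integer indices) keep the bin axis,
        # so every item has the same layout as the full arrays.
        one = slice(i, i + 1)
        yield binning[one], data[one], samples[:, one]


bins = pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3])
data = np.array([1.0, 2.0])
samples = np.arange(10, dtype=float).reshape(5, 2)  # (n_samples, n_bins)

items = list(iter_bins(bins, data, samples))
```

Each item keeps a bin axis of length one, which mirrors the statement that iteration yields containers with a single bin rather than bare scalars.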

Examples

Create a redshift binning:

>>> import pandas as pd
>>> bins = pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3])
>>> bins
IntervalIndex([(0.1, 0.2], (0.2, 0.3]], dtype='interval[float64, right]')

Create sample data with a value of 1 in each bin, plus five mock jackknife samples drawn from a normal distribution centered on 1:

>>> import numpy as np
>>> n_bins, n_samples = len(bins), 5
>>> data = np.ones(n_bins)
>>> samples = np.random.normal(1.0, size=(n_samples, n_bins))

Create the container:

>>> import yaw
>>> values = yaw.core.SampledData(bins, data, samples, method="jackknife")
>>> values
SampledData(n_bins=2, z='0.100...0.300', n_samples=5, method='jackknife')

Add the container to itself and verify that the values are doubled:

>>> summed = values + values
>>> summed.data
array([2., 2.])

The same applies to the samples:

>>> summed.samples / values.samples
array([[2., 2.],
       [2., 2.],
       [2., 2.],
       [2., 2.],
       [2., 2.]])

Methods

`__init__`(binning, data, samples, method)
`concatenate_bins`(*data)	Concatenate pair count data containers with equal patches.
`get_binning`()	Get the underlying, exact redshift bin intervals.
`get_correlation`()	Get the correlation matrix of the values as a data frame, with the redshift bin intervals as index and column labels.
`get_covariance`()	Get the covariance matrix of the values as a data frame, with the redshift bin intervals as index and column labels.
`get_data`()	Get the data as `pandas.Series` with the binning as index.
`get_error`()	Get the error estimate of the values (from the diagonal of the covariance matrix) as a series, with the redshift bin intervals as index.
`get_samples`()	Get the samples as `pandas.DataFrame` with the binning as index.
`is_compatible`(other[, require])	Check whether this instance is compatible with another instance.

Attributes

`bins`	An `Indexer` attribute that supports iteration over the bins or selecting a subset of the bins.
`closed`	Specifies on which side the redshift bin intervals are closed, can be: `left`, `right`, `both`, `neither`.
`dz`	Get the width of the redshift bins as array.
`edges`	Get the edges of the redshift bins as flat array.
`error`	The uncertainty (standard error) of the data.
`mids`	Get the centers of the redshift bins as array.
`n_bins`	Get the number of redshift bins.
`n_samples`	Number of samples used for error estimate.
`binning`	The redshift bin intervals.
`data`	The data values, one for each redshift bin.
`samples`	Samples of the data values, shape (# samples, # bins).
`method`	The resampling method used.
`covariance`	Covariance matrix automatically computed from the resampled values.

binning: IntervalIndex#: The redshift bin intervals.

property bins: Indexer[int | slice | Sequence, _Tdata]#

An Indexer attribute that supports iteration over the bins or selecting a subset of the bins.

The indexer always returns a new container instance that holds either the indexed subset of the data or, when iterating, the current bin.

Warning

The indexing rules for a one-dimensional numpy array apply. However, if the resulting binning is not contiguous or contains repeated bins, some operations on the returned container may fail.

Returns:: yaw.core.containers.Indexer
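To see which index expressions lead to the situations named in the warning, the binning itself can be inspected with pandas. The sketch below is an illustration on a bare `pandas.IntervalIndex`; `is_contiguous` is a hypothetical helper and not a yaw function.

```python
import numpy as np
import pandas as pd


def is_contiguous(binning: pd.IntervalIndex) -> bool:
    """True if every bin starts exactly where the previous one ends."""
    left = binning.left.to_numpy()
    right = binning.right.to_numpy()
    return bool(np.array_equal(right[:-1], left[1:]))


bins = pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3, 0.4])

sliced = bins[0:2]       # neighbouring bins, still contiguous
picked = bins[[0, 2]]    # skips the middle bin, leaves a gap
repeated = bins[[1, 1]]  # the same bin twice

contiguous_flags = [is_contiguous(b) for b in (sliced, picked, repeated)]
has_duplicates = not repeated.is_unique
```

Plain slices with unit step preserve contiguity, whereas list-based (fancy) indexing can introduce gaps or duplicates.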

property closed: str#: Specifies on which side the redshift bin intervals are closed, can be: left, right, both, neither.

concatenate_bins(*data: _Tdata) → _Tdata[source]#

Concatenate pair count data containers with equal patches.

The data is merged by appending the data along the redshift binning axis.

Note

A necessary condition for merging is that the patch numbers are identical and that the merged binning is contiguous and non-overlapping. Cross- and autocorrelation containers cannot be merged with each other.

Parameters:: *data – Containers of the same type that are appended to this container along the redshift binning axis.
Returns:: New instance of this container with combined data.
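The merge along the redshift binning axis can be sketched with plain arrays. This is an illustration under stated assumptions, not the yaw implementation: `concatenate_along_bins` is a hypothetical helper, and "equal patches" is taken here to mean an equal number of samples in every input.

```python
import numpy as np
import pandas as pd


def concatenate_along_bins(parts):
    """Merge (binning, data, samples) tuples along the redshift bin axis.

    Hypothetical helper for illustration, not part of yaw.
    """
    binnings, data, samples = zip(*parts)
    # Assumption: every input must provide the same number of samples.
    if len({s.shape[0] for s in samples}) != 1:
        raise ValueError("all inputs must hold the same number of samples")
    if len({b.closed for b in binnings}) != 1:
        raise ValueError("all inputs must use the same interval closure")
    left = np.concatenate([b.left.to_numpy() for b in binnings])
    right = np.concatenate([b.right.to_numpy() for b in binnings])
    # Contiguous and non-overlapping: each bin starts where the previous ends.
    if not np.array_equal(right[:-1], left[1:]):
        raise ValueError("merged binning is not contiguous")
    merged = pd.IntervalIndex.from_arrays(left, right, closed=binnings[0].closed)
    # Data has shape (n_bins,), samples has shape (n_samples, n_bins).
    return merged, np.concatenate(data), np.concatenate(samples, axis=1)


low = (pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3]), np.ones(2), np.ones((5, 2)))
high = (pd.IntervalIndex.from_breaks([0.3, 0.4]), np.zeros(1), np.zeros((5, 1)))
binning, data, samples = concatenate_along_bins([low, high])
```

Passing the inputs in the wrong order, for example `[high, low]`, fails the contiguity check in this sketch, which corresponds to the condition stated in the note.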

covariance: NDArray#: Covariance matrix automatically computed from the resampled values.
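For orientation, a covariance matrix can be estimated from resampled values with the textbook jackknife and bootstrap estimators. The sketch below is not taken from the yaw source: `covariance_from_samples` is a hypothetical function, and the normalisation factors are the standard ones for each method and are an assumption about what the container computes.

```python
import numpy as np


def covariance_from_samples(samples, method="jackknife"):
    """Textbook covariance estimate from samples of shape (n_samples, n_bins).

    Hypothetical helper; the normalisation is an assumption, not yaw's code.
    """
    samples = np.asarray(samples, dtype=float)
    n_samples = samples.shape[0]
    # Deviation of every sample from the mean over all samples, per bin.
    diff = samples - samples.mean(axis=0)
    if method == "jackknife":
        norm = (n_samples - 1) / n_samples
    elif method == "bootstrap":
        norm = 1.0 / (n_samples - 1)
    else:
        raise ValueError(f"unknown resampling method: {method!r}")
    return norm * (diff.T @ diff)


rng = np.random.default_rng(seed=0)
samples = rng.normal(1.0, 0.1, size=(5, 2))  # five samples, two bins
cov_jackknife = covariance_from_samples(samples, method="jackknife")
cov_bootstrap = covariance_from_samples(samples, method="bootstrap")
```

The jackknife factor is larger than the bootstrap one because jackknife samples are strongly correlated with each other, so their scatter underestimates the true uncertainty without the rescaling.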

data: NDArray#: The data values, one for each redshift bin.

property dz: ndarray[Any, dtype[float64]]#: Get the width of the redshift bins as array.

property edges: ndarray[Any, dtype[float64]]#: Get the edges of the redshift bins as flat array.
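The bin geometry properties (dz, edges, and mids) relate to the binning in a simple way. The sketch below derives the equivalent quantities directly from a `pandas.IntervalIndex`; it illustrates the relationships and is not the yaw implementation.

```python
import numpy as np
import pandas as pd

bins = pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3])

left = bins.left.to_numpy()
right = bins.right.to_numpy()

# For contiguous bins the flat edge array has n_bins + 1 entries.
edges = np.append(left, right[-1])
# Bin widths and bin centers, one value per bin.
dz = right - left
mids = (left + right) / 2.0
# Side on which the intervals are closed, here the pandas default.
closed = bins.closed
```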

property error: NDArray#

The uncertainty (standard error) of the data.

Returns:: NDArray

get_binning() → IntervalIndex[source]#

Get the underlying, exact redshift bin intervals.

Returns:: pandas.IntervalIndex

get_correlation() → DataFrame[source]#

Get the correlation matrix of the values as a data frame, with the redshift bin intervals as index and column labels.

Returns:: pandas.DataFrame

get_covariance() → DataFrame[source]#

Get the covariance matrix of the values as a data frame, with the redshift bin intervals as index and column labels.

Returns:: pandas.DataFrame
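The relation between the covariance and correlation data frames can be sketched as follows. The covariance values are made up for illustration, and the construction is an assumption about the layout described above (bin intervals as both index and column labels), not a copy of the yaw code.

```python
import numpy as np
import pandas as pd

bins = pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3])
# Made-up covariance matrix for two redshift bins.
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])

# Correlation: covariance divided by the outer product of the standard deviations.
sigma = np.sqrt(np.diag(cov))
corr = cov / np.outer(sigma, sigma)

cov_frame = pd.DataFrame(cov, index=bins, columns=bins)
corr_frame = pd.DataFrame(corr, index=bins, columns=bins)
```

By construction the correlation matrix has ones on its diagonal, and its off-diagonal elements lie between -1 and 1 for a valid covariance matrix.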

get_data() → Series[source]#: Get the data as pandas.Series with the binning as index.

get_error() → Series[source]#

Get the error estimate of the values (from the diagonal of the covariance matrix) as a series, with the redshift bin intervals as index.

Returns:: pandas.Series
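As a sketch of how such a series relates to the covariance matrix, assume that the quoted error is the square root of the covariance diagonal (the usual definition of a standard error, stated here as an assumption). The covariance values are made up for illustration.

```python
import numpy as np
import pandas as pd

bins = pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3])
# Made-up covariance matrix for two redshift bins.
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])

# Assumption: the error is the square root of the variance in each bin.
errors = pd.Series(np.sqrt(np.diag(cov)), index=bins)
```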

get_samples() → DataFrame[source]#: Get the samples as pandas.DataFrame with the binning as index. The columns are labelled numerically and each represents one of the samples.
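The layout described for this data frame (bins as rows, one numbered column per sample) implies a transpose of the samples array, which has shape (# samples, # bins). A minimal sketch of that layout with plain pandas, not the yaw code itself:

```python
import numpy as np
import pandas as pd

bins = pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3])
samples = np.arange(10, dtype=float).reshape(5, 2)  # (n_samples, n_bins)

# Transpose so that rows follow the binning and each column is one sample.
frame = pd.DataFrame(samples.T, index=bins)
```

Selecting column `k` of this frame returns the values of sample `k` across all redshift bins.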

is_compatible(other: SampledData, require: bool = False) → bool[source]#

Check whether this instance is compatible with another instance.

Ensures that both objects are instances of the same class, that the redshift binning is identical, that the number of samples agrees, and that the resampling method is identical.

Parameters:

other (BinnedQuantity) – Object instance to compare to.
require (bool, optional) – Raise a ValueError if any of the checks fail.

Returns:

bool
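The four checks listed above can be sketched on a stand-in class. Everything in this block is hypothetical: `Sampled` is a minimal substitute for the container and `is_compatible` is a free function written for illustration, so the order of the checks and the error message are assumptions.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class Sampled:
    """Hypothetical stand-in for the container, not the yaw class."""
    binning: pd.IntervalIndex
    data: np.ndarray
    samples: np.ndarray  # shape (n_samples, n_bins)
    method: str


def is_compatible(this, other, require=False):
    """Return True if all four checks pass; optionally raise on failure."""
    if type(this) is not type(other):
        failed = "class"
    elif not this.binning.equals(other.binning):
        failed = "redshift binning"
    elif this.samples.shape[0] != other.samples.shape[0]:
        failed = "number of samples"
    elif this.method != other.method:
        failed = "resampling method"
    else:
        return True
    if require:
        raise ValueError(f"containers differ in {failed}")
    return False


bins = pd.IntervalIndex.from_breaks([0.1, 0.2, 0.3])
a = Sampled(bins, np.ones(2), np.ones((5, 2)), "jackknife")
b = Sampled(bins, np.ones(2), np.ones((5, 2)), "jackknife")
c = Sampled(bins, np.ones(2), np.ones((7, 2)), "jackknife")
d = Sampled(bins, np.ones(2), np.ones((5, 2)), "bootstrap")

results = [is_compatible(a, other) for other in (b, c, d)]
```

With `require=True` the same call raises a `ValueError` instead of returning `False`, which is convenient as a guard before arithmetic on two containers.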

method: str#: The resampling method used.

property mids: ndarray[Any, dtype[float64]]#: Get the centers of the redshift bins as array.

property n_bins: int#: Get the number of redshift bins.

property n_samples: int#: Number of samples used for error estimate.

samples: NDArray#: Samples of the data values, shape (# samples, # bins).