yaw.catalogs.scipy.ScipyCatalog

class yaw.catalogs.scipy.ScipyCatalog(data: DataFrame, ra_name: str, dec_name: str, *, patch_name: str | None = None, patch_centers: BaseCatalog | Coordinate | None = None, n_patches: int | None = None, redshift_name: str | None = None, weight_name: str | None = None, cache_directory: str | None = None, progress: bool = True)

Bases: BaseCatalog

An implementation of BaseCatalog that counts pairs with a wrapper around scipy.spatial.cKDTree. The wrapper is implemented in yaw.catalogs.scipy.kdtree. Fully supports caching.

Note

This is currently the default backend and has the best support and performance. Currently, trees cannot be shared across the multiprocessing interface and must be rebuilt every time a patch is used for pair counting again.
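The following is a minimal sketch of how pairs can be counted on the sphere with scipy.spatial.cKDTree. It is not the code in yaw.catalogs.scipy.kdtree; the helper names radec_to_xyz, angle_to_chord and count_pairs are hypothetical. The sketch assumes that sky positions are mapped to unit vectors, so that an angular separation corresponds to a Euclidean chord length.

```python
import numpy as np
from scipy.spatial import cKDTree


def radec_to_xyz(ra, dec):
    """Convert sky coordinates in radians to unit vectors on the sphere."""
    ra = np.asarray(ra, dtype=np.float64)
    dec = np.asarray(dec, dtype=np.float64)
    cos_dec = np.cos(dec)
    return np.column_stack(
        [cos_dec * np.cos(ra), cos_dec * np.sin(ra), np.sin(dec)]
    )


def angle_to_chord(theta):
    """Convert angular separations in radians to Euclidean chord lengths."""
    return 2.0 * np.sin(0.5 * np.asarray(theta, dtype=np.float64))


def count_pairs(ra1, dec1, ra2, dec2, theta_min, theta_max):
    """Count cross pairs with separation in (theta_min, theta_max] radians."""
    # hypothetical helper: both trees are built on every call, similar to
    # the rebuild per patch described in the note above
    tree1 = cKDTree(radec_to_xyz(ra1, dec1))
    tree2 = cKDTree(radec_to_xyz(ra2, dec2))
    # count_neighbors returns cumulative pair counts for each radius
    radii = angle_to_chord([theta_min, theta_max])
    inner, outer = tree1.count_neighbors(tree2, radii)
    return int(outer - inner)
```

Subtracting the cumulative count at the inner radius from the count at the outer radius leaves only the pairs inside the annulus.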

Build a catalogue from in-memory data.

Catalogs should be instantiated through the factory class, see yaw.catalogs.NewCatalog.from_dataframe().

Methods

__init__(data, ra_name, dec_name, *[, ...])

Build a catalogue from in-memory data.

correlate(config, binned[, other, linkage, ...])

Count pairs between objects at a given separation and in bins of redshift.

from_cache(cache_directory[, progress])

Restore the catalogue from its cache directory.

from_file(filepath, patches, ra, dec, *[, ...])

Build a catalogue from a data file.

get_max_redshift()

Get the maximum redshift or None if not available.

get_min_redshift()

Get the minimum redshift or None if not available.

get_totals()

Get an array of the sum of weights or number of objects in each patch.

has_redshifts()

Indicates whether the redshifts attribute holds data.

has_weights()

Indicates whether the weights attribute holds data.

is_loaded()

Indicates whether the catalog data is loaded.

load()

Permanently load data from cache into memory.

true_redshifts(config[, sampling_config, ...])

Compute a histogram of the object redshifts from the binning defined in the provided configuration.

unload()

Unload data from memory if a disk cache is provided.

Attributes

centers

Get a vector of sky coordinates of the patch centers in radians.

dec

Get an array of the declination values in radians.

ids

Return a list of unique patch indices in the catalog.

n_patches

The number of spatial patches of this catalogue.

patch

Get the patch index of each object as an array.

pos

Get a vector of the object sky positions in radians.

ra

Get an array of the right ascension values in radians.

radii

Get a vector of angular separations in radians that describe the patch sizes.

redshifts

Get the redshifts as an array or None if not available.

total

Get the sum of weights or the number of objects if weights are not available.

weights

Get the object weights as an array or None if not available.

property centers: CoordSky

Get a vector of sky coordinates of the patch centers in radians.

Returns:

yaw.core.coordinates.CoordSky

correlate(config: Configuration, binned: bool, other: ScipyCatalog | None = None, linkage: PatchLinkage | None = None, progress: bool = False) → NormalisedCounts | dict[str, NormalisedCounts]

Count pairs between objects at a given separation and in bins of redshift.

If another catalog instance is passed to other, pairs are formed between the two catalogues (cross-correlation); otherwise pairs are formed within this catalogue (autocorrelation). Pairs are counted in bins of redshift, as defined in the configuration object (config). Only pairs within fixed angular scales are considered. These scales are computed from the physical scales in the configuration and the midpoint of the current redshift bin.
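As an illustration of the conversion from a physical to an angular scale, the sketch below divides a transverse physical scale by the angular diameter distance at the bin midpoint. It is not the library's implementation: the function names, the flat ΛCDM model, the default parameters h0=70 and omega_m=0.3, and the unit of kpc are all assumptions made for this example. The actual cosmological model is taken from the configuration.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def angular_diameter_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=2048):
    """Angular diameter distance in Mpc for an assumed flat LCDM cosmology."""
    zs = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + (1.0 - omega_m))
    # trapezoidal rule for the comoving distance integral
    dz = zs[1] - zs[0]
    integral = dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    comoving = C_KM_S / h0 * integral
    return comoving / (1.0 + z)


def physical_to_angle(r_kpc, z_low, z_high, **cosmo):
    """Angle in radians subtended by r_kpc at the midpoint of a redshift bin."""
    z_mid = 0.5 * (z_low + z_high)
    return (r_kpc / 1000.0) / angular_diameter_distance_mpc(z_mid, **cosmo)
```

Because the angle is evaluated once per bin at its midpoint, all pairs in that bin share the same angular limits.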

Parameters:
  • config (yaw.Configuration) – Configuration object that defines measurement scales, redshift binning, cosmological model, and various backend specific parameters.

  • binned (bool) – Whether to apply the redshift binning to the second catalogue (see other).

  • other (Catalog instance, optional) – Second catalog instance used for cross-catalogue pair counting. Catalogue must use the same backend.

  • linkage (PatchLinkage, optional) – Linkage object that defines which patches must be correlated for the given scales and which patch combinations can be skipped. Can be used for the scipy backend to count pairs consistently between multiple catalogue instances.

  • progress (bool) – Show a progress indication, depends on backend.

There are three different modes of operation that are determined by the combination of the binned and other parameters:

  1. If no second catalogue is provided, pairs are counted within the catalogue while applying the redshift binning.

  2. If a second catalogue is provided and binned=True, pairs are counted between the catalogues and the binning is applied to both catalogues.

  3. If a second catalogue is provided and binned=False, the redshift binning is not applied to the second catalogue; otherwise the behaviour is the same as in mode 2.

The catalogue of the instance that calls correlate() always has the redshift binning applied.
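The three modes can be sketched as a selection of the objects that enter a single redshift bin. This is an illustration with plain NumPy, not the backend code. The function name select_for_bin and the half-open bin convention [z_low, z_high) are assumptions.

```python
import numpy as np


def select_for_bin(z_first, z_second, z_low, z_high, binned):
    """Return boolean masks of the objects that enter one redshift bin.

    z_first belongs to the calling catalogue, z_second to the optional
    second catalogue (None stands for an autocorrelation).
    """
    z_first = np.asarray(z_first, dtype=np.float64)
    # the calling catalogue is always binned
    mask_first = (z_first >= z_low) & (z_first < z_high)
    if z_second is None:
        # mode 1: pairs are counted within the binned catalogue
        return mask_first, mask_first
    z_second = np.asarray(z_second, dtype=np.float64)
    if binned:
        # mode 2: the binning is applied to both catalogues
        mask_second = (z_second >= z_low) & (z_second < z_high)
    else:
        # mode 3: the second catalogue enters every bin in full;
        # only the length of z_second is used here
        mask_second = np.ones(len(z_second), dtype=bool)
    return mask_first, mask_second
```

In mode 3 the same full set of objects from the second catalogue is paired with each redshift slice of the calling catalogue.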

property dec: NDArray[np.float64]

Get an array of the declination values in radians.

classmethod from_cache(cache_directory: str, progress: bool = False) → ScipyCatalog

Restore the catalogue from its cache directory.

Catalogs should be instantiated through the factory class, see yaw.catalogs.NewCatalog.from_cache().

classmethod from_file(filepath: str, patches: str | int | BaseCatalog | Coordinate, ra: str, dec: str, *, redshift: str | None = None, weight: str | None = None, sparse: int | None = None, cache_directory: str | None = None, file_ext: str | None = None, progress: bool = False, **kwargs) → BaseCatalog

Build a catalogue from a data file.

Catalogs should be instantiated through the factory class, see yaw.catalogs.NewCatalog.from_file().

get_max_redshift() → float

Get the maximum redshift or None if not available.

get_min_redshift() → float

Get the minimum redshift or None if not available.

get_totals() → NDArray[np.float64]

Get an array of the sum of weights or number of objects in each patch.
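A minimal sketch of such per-patch totals with numpy.bincount is shown below. The function name patch_totals and its arguments are hypothetical and do not mirror the internals of this class.

```python
import numpy as np


def patch_totals(patch, n_patches, weights=None):
    """Sum of weights per patch, or the object count if weights is None."""
    patch = np.asarray(patch, dtype=np.int64)
    # minlength keeps empty patches in the output with a total of zero
    totals = np.bincount(patch, weights=weights, minlength=n_patches)
    return totals.astype(np.float64)
```

Passing minlength ensures that the result has one entry per patch even if the patches with the highest indices contain no objects.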

has_redshifts() → bool

Indicates whether the redshifts attribute holds data.

has_weights() → bool

Indicates whether the weights attribute holds data.

property ids: list[int]

Return a list of unique patch indices in the catalog.

is_loaded() → bool

Indicates whether the catalog data is loaded.

Always True if no cache is used. If the catalog is unloaded, data will be read from cache every time data is accessed.

load() → None

Permanently load data from cache into memory.

Raises a CachingError if no cache is configured.

property n_patches: int

The number of spatial patches of this catalogue.

property patch: NDArray[np.int64]

Get the patch index of each object as an array.

property pos: CoordSky

Get a vector of the object sky positions in radians.

Returns:

yaw.core.coordinates.CoordSky

property ra: NDArray[np.float64]

Get an array of the right ascension values in radians.

property radii: DistSky

Get a vector of angular separations in radians that describe the patch sizes.

The radius of the patch is defined as the maximum angular distance of any object from the patch center.

Returns:

yaw.core.coordinates.DistSky
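The radius definition above can be sketched with plain NumPy as follows. The function names are hypothetical and the haversine formula is one possible way to compute the angular distance; the class itself returns a DistSky object rather than a float.

```python
import numpy as np


def angular_distance(ra1, dec1, ra2, dec2):
    """Great-circle distance in radians, using the haversine formula."""
    sin_ddec = np.sin(0.5 * (dec2 - dec1))
    sin_dra = np.sin(0.5 * (ra2 - ra1))
    hav = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    # clip guards against rounding slightly outside of [0, 1]
    return 2.0 * np.arcsin(np.sqrt(np.clip(hav, 0.0, 1.0)))


def patch_radius(ra, dec, center_ra, center_dec):
    """Maximum angular distance of any object from the patch center."""
    ra = np.asarray(ra, dtype=np.float64)
    dec = np.asarray(dec, dtype=np.float64)
    return float(np.max(angular_distance(ra, dec, center_ra, center_dec)))
```

For three objects on the equator at right ascensions 0.0, 0.1 and 0.3 rad and a center at 0.1 rad, the most distant object sets the radius.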

property redshifts: NDArray[np.float64] | None

Get the redshifts as an array or None if not available.

property total: float

Get the sum of weights or the number of objects if weights are not available.

true_redshifts(config: Configuration, sampling_config: ResamplingConfig | None = None, progress: bool = False) → HistData

Compute a histogram of the object redshifts from the binning defined in the provided configuration.

Parameters:
  • config (Configuration) – Defines the bin edges used for the histogram.

  • sampling_config (ResamplingConfig, optional) – Specifies the spatial resampling for error estimates.

  • progress (bool) – Show a progress bar.

Returns:

Object holding the redshift histogram

Return type:

HistData
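The idea of a redshift histogram with spatially resampled variants can be sketched as below. This is an illustration only: the function name is hypothetical, the result is a pair of plain arrays rather than a HistData object, and the leave-one-patch-out jackknife is assumed here as one possible resampling scheme.

```python
import numpy as np


def jackknife_histograms(redshifts, patch, edges):
    """Histogram of all redshifts plus one leave-one-patch-out sample each."""
    redshifts = np.asarray(redshifts, dtype=np.float64)
    patch = np.asarray(patch, dtype=np.int64)
    # one histogram per patch, in order of increasing patch index
    per_patch = np.array(
        [np.histogram(redshifts[patch == i], bins=edges)[0]
         for i in np.unique(patch)]
    )
    total = per_patch.sum(axis=0)
    # removing one patch at a time yields the jackknife samples
    return total, total - per_patch
```

The scatter between the leave-one-out samples can then be turned into an error estimate for each bin.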

unload() → None

Unload data from memory if a disk cache is provided.

property weights: NDArray[np.float64]

Get the object weights as an array or None if not available.