Starting a new project
The command line tool is designed to operate on separate projects. A project uses a fixed set of parameters (e.g. the redshift binning and the correlation measurement scales) to compute clustering redshifts from a single reference sample and one or more unknown data sets that overlap with it spatially.
Note
Using multiple reference samples as input for a project is currently not supported; however, the outputs from different projects can be merged.
New projects are created with the yaw_cli init <path> subcommand, where <path>
specifies a directory (which must not exist yet) in which all data products are
stored and managed. This command sets most of the parameters for the
correlation measurements, including the measurement scales and the redshift
binning, as well as optional parameters such as the cosmological model used for
distance calculations and the automatic generation of spatial patches. A list
of all command line arguments can be obtained by typing
$ yaw_cli init --help
usage: yaw_cli init [-h] [-v] [-s <file>] [--backend {scipy,treecorr}]
                    [--cache-path <path>] [--n-patches <int>]
                    [--cosmology {WMAP1,WMAP3,WMAP5,WMAP7,WMAP9,Planck13,Planck15,Planck18}]
                    --ref-path <file> --ref-ra <str> --ref-dec <str> --ref-z
                    <str> [--ref-w <str>] [--ref-patch <str>] [--ref-cache]
                    [--rand-path <file>] [--rand-ra <str>] [--rand-dec <str>]
                    [--rand-z <str>] [--rand-w <str>] [--rand-patch <str>]
                    [--rand-cache] --rmin [<float> ...] --rmax [<float> ...]
                    [--rweight <float>] [--rbin-num <int>]
                    [--zbins [<float> ...]]
                    [--method {comoving,linear,logspace}] [--zmin <float>]
                    [--zmax <float>] [--zbin-num <int>] [--thread-num <int>]
                    [--no-crosspatch] [--rbin-slop <float>]
                    <path>

Initialise and create a project directory with a configuration. Specify the
reference sample data and optionally randoms.

positional arguments:
  <path>                project directory, must not exist

options:
  -h, --help            show this help message and exit
  -v, --verbose         show additional information in terminal, repeat to
                        show debug messages
  -s <file>, --setup <file>
                        optionl setup YAML file (e.g. from 'yaw_cli run -d')
                        with base configuration that is overwritten by
                        arguments below

additional arguments:
  --backend {scipy,treecorr}
                        backend used for pair counting (default: scipy)
  --cache-path <path>   non-standard location for the cache directory (e.g. on
                        faster storage, default: [project directory]/cache)
  --n-patches <int>     split all input data into this number of spatial
                        patches for covariance estimation (default: patch
                        index for catalogs)
  --cosmology {WMAP1,WMAP3,WMAP5,WMAP7,WMAP9,Planck13,Planck15,Planck18}
                        cosmological model used for distance calculations (see
                        astropy.cosmology, default: Planck15)

reference (data):
  specify the reference (data) input file

  --ref-path <file>     input file path
  --ref-ra <str>        column name of right ascension
  --ref-dec <str>       column name of declination
  --ref-z <str>         column name of redshift
  --ref-w <str>         column name of object weight
  --ref-patch <str>     column name of patch assignment index
  --ref-cache           cache the data in the project's cache directory

reference (random):
  specify the reference (random) input file (optional)

  --rand-path <file>    input file path
  --rand-ra <str>       column name of right ascension
  --rand-dec <str>      column name of declination
  --rand-z <str>        column name of redshift
  --rand-w <str>        column name of object weight
  --rand-patch <str>    column name of patch assignment index
  --rand-cache          cache the data in the project's cache directory

measurement scales:
  sets the physical scales for the correlation measurements

  --rmin [<float> ...]  (list of) lower scale limit in kpc (pyhsical)
  --rmax [<float> ...]  (list of) upper scale limit in kpc (pyhsical)
  --rweight <float>     weight galaxy pairs by their separation to power
                        'rweight' (default: no weighting applied)
  --rbin-num <int>      number of bins in log r used (i.e. resolution) to
                        compute distance weights (default: 50)

redshift binning:
  sets the redshift binning for the clustering redshifts

  --zbins [<float> ...]
                        list of custom redshift bin edges, if method is set to
                        'manual'
  --method {comoving,linear,logspace}
                        redshift binning method, 'logspace' means equal size
                        in log(1+z) (default: linear)
  --zmin <float>        lower redshift limit (default: None)
  --zmax <float>        upper redshift limit (default: None)
  --zbin-num <int>      number of redshift bins (default: 30)

backend specific:
  parameters that are specific to pair counting backends

  --thread-num <int>    default number of threads to use (default: all)
  --no-crosspatch       whether to count pairs across patch boundaries (scipy
                        backend only)
  --rbin-slop <float>   TreeCorr 'rbin_slop' parameter (default: 0.01),
                        without 'rweight' this just a single radial bin,
                        otherwise 'rbin_num'
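The --rweight and --rbin-num options work together: pairs are weighted by a power of their separation, and that weight is resolved on a grid of logarithmic separation bins between --rmin and --rmax. The pure-Python sketch below illustrates the idea under stated assumptions. The function names are hypothetical, evaluating the weight at each bin's logarithmic centre is an assumption of this sketch, and it is not yaw's actual pair-counting code.

```python
import math


def log_bin_edges(rmin, rmax, rbin_num):
    """Return rbin_num + 1 edges equally spaced in log(r) between rmin and rmax."""
    lo, hi = math.log(rmin), math.log(rmax)
    step = (hi - lo) / rbin_num
    return [math.exp(lo + i * step) for i in range(rbin_num + 1)]


def weighted_pair_count(pair_counts, edges, rweight=None):
    """Combine binned pair counts into one, optionally separation-weighted, total.

    pair_counts[i] is the number of pairs with separation in [edges[i], edges[i+1]).
    Without rweight every pair counts once; otherwise each bin is weighted by
    r**rweight evaluated at its logarithmic centre (an assumption of this sketch).
    """
    if rweight is None:
        return float(sum(pair_counts))
    total = 0.0
    for count, r_lo, r_hi in zip(pair_counts, edges[:-1], edges[1:]):
        r_mid = math.sqrt(r_lo * r_hi)  # geometric mean = bin centre in log(r)
        total += count * r_mid**rweight
    return total


# Hypothetical values: scales of 100 to 1000 kpc, 50 bins, weight r^-1.
edges = log_bin_edges(100.0, 1000.0, 50)
counts = [1] * 50  # placeholder pair counts, one pair per bin
print(weighted_pair_count(counts, edges, rweight=-1.0))
```

Without a weight, the per-bin resolution carries no extra information, which is consistent with the help text for --rbin-slop stating that only a single radial bin is used when 'rweight' is not set.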
Note
The redshift binning is configured through one of two mutually exclusive parameter groups. The binning must be specified by either:
--zbins, i.e. providing a list of bin edges, or
--zmin, --zmax (and optionally --zbin-num and --method), i.e. providing parameters from which a binning is generated automatically.
If both are provided, --zbins is ignored.
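The two ways of specifying the binning can be sketched in plain Python. The 'linear' and 'logspace' rules follow the help text (equal width in z and in log(1+z), respectively). Treating 'comoving' as equal width in comoving distance is an assumption, and the flat ΛCDM parameters (h0, om) are illustrative stand-ins, not astropy's Planck15 object that yaw_cli uses by default. make_bins and the other names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def linear_edges(zmin, zmax, n):
    """n bins of equal width in z."""
    dz = (zmax - zmin) / n
    return [zmin + i * dz for i in range(n + 1)]


def logspace_edges(zmin, zmax, n):
    """n bins of equal width in log(1 + z)."""
    lo, hi = math.log1p(zmin), math.log1p(zmax)
    step = (hi - lo) / n
    return [math.expm1(lo + i * step) for i in range(n + 1)]


def comoving_distance(z, h0=67.7, om=0.31, steps=200):
    """Comoving distance in Mpc for an assumed flat LCDM model (Simpson's rule)."""
    def inv_e(x):
        return 1.0 / math.sqrt(om * (1.0 + x) ** 3 + (1.0 - om))

    h = z / steps  # steps must be even for Simpson's rule
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    return C_KM_S / h0 * total * h / 3.0


def comoving_edges(zmin, zmax, n, **cosmo):
    """n bins of equal width in comoving distance, inverted by bisection."""
    dmin = comoving_distance(zmin, **cosmo)
    dmax = comoving_distance(zmax, **cosmo)
    edges = [zmin]
    for i in range(1, n):
        target = dmin + i * (dmax - dmin) / n
        lo, hi = zmin, zmax
        for _ in range(50):
            mid = 0.5 * (lo + hi)
            if comoving_distance(mid, **cosmo) < target:
                lo = mid
            else:
                hi = mid
        edges.append(0.5 * (lo + hi))
    edges.append(zmax)
    return edges


def make_bins(zbins=None, zmin=None, zmax=None, zbin_num=30, method="linear"):
    """Generated binning takes precedence: if zmin and zmax are set, zbins is ignored."""
    if zmin is not None and zmax is not None:
        builders = {"linear": linear_edges, "logspace": logspace_edges,
                    "comoving": comoving_edges}
        return builders[method](zmin, zmax, zbin_num)
    if zbins is None:
        raise ValueError("provide either zbins or both zmin and zmax")
    return list(zbins)


print(make_bins(zmin=0.1, zmax=1.0, zbin_num=3, method="logspace"))
```

Equal width in log(1+z) produces bins that widen with redshift, while the assumed comoving rule yields bins that widen even faster at high redshift, since comoving distance grows sublinearly with z.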
The reference sample
Since the reference sample of a project is fixed, it is already specified at
this stage by providing an input path (--ref-path) and the required column
names for right ascension (--ref-ra) and declination (--ref-dec), in degrees,
and for per-object redshifts (--ref-z). Per-object weights (--ref-w) are
optional.
Similarly, a random catalogue for the reference sample can be provided using the
corresponding --rand-* arguments. Note that the reference randoms also
require per-object redshifts. If no reference randoms are provided, randoms for
the unknown sample are required instead (see yaw_cli cross).
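Putting the options together, a complete init call with reference data and reference randoms might look like the following. It is written as a Python argument list so that it can be passed to subprocess; every file name, column name and numeric value is a hypothetical placeholder, and only flags listed in the help output above are used. The optional --ref-w, --ref-patch and --rand-w columns are omitted.

```python
import shlex

# All paths, column names and numbers below are hypothetical placeholders.
cmd = [
    "yaw_cli", "init", "my_project",  # project directory, must not exist yet
    # reference sample (data): input file and column names
    "--ref-path", "reference.fits",
    "--ref-ra", "ra", "--ref-dec", "dec", "--ref-z", "z",
    "--ref-cache",
    # reference randoms: also require a redshift column
    "--rand-path", "reference_randoms.fits",
    "--rand-ra", "ra", "--rand-dec", "dec", "--rand-z", "z",
    "--rand-cache",
    # measurement scales in kpc
    "--rmin", "100", "--rmax", "1000",
    # automatically generated redshift binning (no --zbins)
    "--zmin", "0.07", "--zmax", "1.4", "--zbin-num", "30", "--method", "linear",
    # automatic k-means patches
    "--n-patches", "64",
]

print(shlex.join(cmd))  # shell-ready command line
# To run it, e.g.: import subprocess; subprocess.run(cmd, check=True)
```

Placing the project path directly after init avoids it possibly being read as an extra value of a list option such as --rmax, which accepts several floats.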
Spatial patches and caching
It is important to specify consistent spatial patches for a project, since these are used to compute uncertainty estimates and covariances. There are two options:
Generate the patches automatically (--n-patches) using a k-means clustering algorithm. The code ensures that all data and random catalogues share the same patch centers.
Provide manual patch assignments from columns with integer patch indices (--ref-patch and --rand-patch). The code only checks that the patches align roughly; the user must ensure that they are consistent across all input samples.
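The automatic option can be illustrated with a minimal k-means on the unit sphere: centres are computed once and then reused to assign every catalogue, so all samples share identical patches. This is a sketch under stated assumptions, not yaw's implementation; the plain Lloyd-style iteration, the fixed iteration count and seed, and computing the centres from the randoms are all choices made for illustration, and the function names are hypothetical.

```python
import math
import random


def radec_to_xyz(ra_deg, dec_deg):
    """Convert sky coordinates in degrees to a 3D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def nearest(point, centers):
    """Index of the closest centre: the largest dot product is the smallest angle."""
    return max(range(len(centers)),
               key=lambda k: sum(p * c for p, c in zip(point, centers[k])))


def kmeans_centers(points, n_patches, n_iter=20, seed=0):
    """Lloyd-style k-means on the unit sphere, returning n_patches unit vectors."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_patches)  # initialise from random objects
    for _ in range(n_iter):
        sums = [[0.0, 0.0, 0.0] for _ in range(n_patches)]
        for p in points:
            k = nearest(p, centers)
            for j in range(3):
                sums[k][j] += p[j]
        new_centers = []
        for k, s in enumerate(sums):
            norm = math.sqrt(sum(v * v for v in s))
            # project the mean back onto the sphere; keep the old centre if empty
            new_centers.append(tuple(v / norm for v in s) if norm > 0 else centers[k])
        centers = new_centers
    return centers


def assign_patches(points, centers):
    """Patch index for every object, using the shared centres."""
    return [nearest(p, centers) for p in points]


# Hypothetical catalogues: random points in a 60 x 20 deg field.
rng = random.Random(1)
randoms = [radec_to_xyz(rng.uniform(0, 60), rng.uniform(-10, 10)) for _ in range(400)]
data = [radec_to_xyz(rng.uniform(0, 60), rng.uniform(-10, 10)) for _ in range(100)]
centers = kmeans_centers(randoms, n_patches=4)   # centres from the randoms only
data_patches = assign_patches(data, centers)     # same centres for the data
random_patches = assign_patches(randoms, centers)
```

Working with unit vectors avoids the wrap-around of right ascension at 0/360 degrees, since the nearest centre is found from the angle between vectors rather than from raw coordinates.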
Warning
For performance reasons it is highly recommended to cache all input data
sets using the flags --ref-cache and --rand-cache. For more details
refer to Caching.
Outputs
The init subcommand creates a new project directory in which all data products
are stored. The configuration is written to the newly created YAML file
setup.yaml, together with a declaration of the input files and the processing
steps applied (see next page). Logs for debugging are stored in setup.log, and
the patch center coordinates are stored in patch_centers.dat. Finally, the
redshift distribution of the reference sample is computed and stored as
true/nz_reference.*.