Classical scripting#

After initialising a new project directory, a number of processing steps can be applied, each implemented in a separate subcommand of the yaw_cli script:

$ yaw_cli [subcommand]

Each subcommand provides an overview of its command line arguments, which can be displayed with yaw_cli [subcommand] -h or yaw_cli [subcommand] --help. A summary of these is provided in the sections below.

Execution order#

Many subcommands depend on the outputs of previous steps, so they should be called in a specific order:

  • Project setup: init (always required)

  • Counting pairs with cross and/or auto (additionally ztrue on simulations)

  • Removing cached data with cache --drop, estimating redshifts from pair counts with zcc

  • Creating check plots: plot.

The order of commands within each of the groups above does not matter and, except for init, none of the steps above are required.
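As an illustration of this ordering, the sketch below chains the subcommands for a project that has already been initialised. The project path, file name, and column names are hypothetical placeholders; only flags that appear in the help texts on this page are used. The commands are printed rather than executed by default, so the sketch can be inspected safely before a real run.

```shell
#!/bin/sh
# Dry run by default: each command is printed instead of executed.
# Export YAW_CLI=yaw_cli to actually run the steps.
: "${YAW_CLI:=echo yaw_cli}"

run_pipeline() {
    project="$1"  # project directory, created beforehand with `yaw_cli init`

    # Group 2: count pairs (hypothetical file and column names).
    $YAW_CLI cross "$project" \
        --unk-path unknown.fits --unk-ra ra --unk-dec dec --unk-cache || return 1
    $YAW_CLI auto "$project" || return 1

    # Group 3: drop the cache and estimate redshifts from the pair counts.
    $YAW_CLI cache "$project" --drop || return 1
    $YAW_CLI zcc "$project" || return 1

    # Group 4: create the check plots.
    $YAW_CLI plot "$project"
}

run_pipeline ./my_project
```

Each step returns early on failure, so a later subcommand is never started after one of its dependencies has failed.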

Note

If a subcommand finds no input data at all, a warning is issued and the process exits normally.

yaw_cli cross#

Description

Responsible for computing crosscorrelations by counting pairs between the reference and unknown samples in bins of redshift and storing the counts. Since the main parameters are already configured with yaw_cli init, this command solely specifies the unknown sample data and (optionally) random catalogues.

The unknown sample is specified by providing one or more input paths (e.g. to process tomographic bins) with --unk-path, together with the required column names for right ascension (--unk-ra) and declination (--unk-dec), in degrees. Weights (--unk-w) are optional.

Similarly, the random sample(s), one for each input catalogue in --unk-path, can be provided using the corresponding --rand-* arguments.
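A minimal sketch of such a call for two tomographic bins with matching randoms is shown below. The file names and column names are hypothetical; the flags are taken from the help text of this subcommand. The project directory is passed first so that it is not mistaken for another entry of the --unk-path or --rand-path lists. The command is printed instead of executed unless YAW_CLI is set to yaw_cli.

```shell
#!/bin/sh
# Dry run by default; export YAW_CLI=yaw_cli to execute.
: "${YAW_CLI:=echo yaw_cli}"

cross_tomographic() {
    project="$1"  # existing project directory
    # One random catalogue per unknown catalogue, given in the same order.
    # File and column names below are placeholders.
    $YAW_CLI cross "$project" \
        --unk-path bin1.fits bin2.fits \
        --unk-ra ra --unk-dec dec --unk-w weight \
        --rand-path rand1.fits rand2.fits \
        --rand-ra ra --rand-dec dec
}

cross_tomographic ./my_project
```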

Note

If weights are provided, the total sum of weights in each subset is stored in the special file bin_weights.dat. Otherwise this file lists the total number of objects in each subset.

Inputs

Reference data (and random) sample, unknown data (and random) sample(s).

Outputs

Pair counts between reference and unknown sample(s). Stored per patch and redshift bin as HDF5 files, one file for each unknown sample subset and scale, at estimate/[scale]/cross_[subset].hdf, where subset is a running index.

Depends on

Dependants

auto (if computing unknown sample autocorrelation), zcc, ztrue

Note

It is possible to provide redshift point estimates (--unk-z / --rand-z), e.g. when using simulated data, however these are only relevant for the auto and ztrue subcommands.

$ yaw_cli cross --help
usage: yaw_cli cross [-h] [-v] [--threads <int>] [--progress] [--rr]
                     --unk-path <file> [<file> ...] --unk-ra <str> --unk-dec
                     <str> [--unk-z <str>] [--unk-w <str>] [--unk-patch <str>]
                     [--unk-idx <int> [<int> ...]] [--unk-cache]
                     [--rand-path <file> [<file> ...]] [--rand-ra <str>]
                     [--rand-dec <str>] [--rand-z <str>] [--rand-w <str>]
                     [--rand-patch <str>] [--rand-idx <int> [<int> ...]]
                     [--rand-cache]
                     <directory>

Specify the unknown data sample(s) and optionally randoms. Measure the angular
cross-correlation function amplitude with the reference sample in bins of
redshift.

positional arguments:
  <directory>           project directory, must exist

options:
  -h, --help            show this help message and exit
  -v, --verbose         show additional information in terminal, repeat to
                        show debug messages
  --threads <int>       number of threads to use (default: from configuration)
  --progress            show a progress bar if the backend supports it
  --rr                  compute random-random pair counts if both randoms are
                        available

unknown (data):
  specify the unknown (data) input file

  --unk-path <file> [<file> ...]
                        (list of) input file paths (e.g. if the data sample is
                        binned tomographically)
  --unk-ra <str>        column name of right ascension
  --unk-dec <str>       column name of declination
  --unk-z <str>         column name of redshift
  --unk-w <str>         column name of object weight
  --unk-patch <str>     column name of patch assignment index
  --unk-idx <int> [<int> ...]
                        integer index to identify the input files (or bins)
                        provided with [--unk-path] (default: 1, 2, ...)
  --unk-cache           cache the data in the project's cache directory

unknown (random):
  specify the unknown (random) input file (optional)

  --rand-path <file> [<file> ...]
                        (list of) input file paths (e.g. if the data sample is
                        binned tomographically)
  --rand-ra <str>       column name of right ascension
  --rand-dec <str>      column name of declination
  --rand-z <str>        column name of redshift
  --rand-w <str>        column name of object weight
  --rand-patch <str>    column name of patch assignment index
  --rand-idx <int> [<int> ...]
                        integer index to identify the input files (or bins)
                        provided with [--rand-path] (default: 1, 2, ...)
  --rand-cache          cache the data in the project's cache directory

yaw_cli auto#

Description

Responsible for computing autocorrelations in bins of redshift by counting pairs in the reference or unknown sample(s) and storing the counts. This subcommand accepts just a few arguments, most importantly --which. If the value is ref (the default), computes the reference sample autocorrelation. If the value is unk, computes the autocorrelation for each of the unknown samples. The flag --no-rr signals to skip counting the random-random pairs.
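The two modes can be sketched as follows, assuming a hypothetical project directory. The second call additionally assumes that the unknown sample was registered with cross including redshift columns (--unk-z / --rand-z). Commands are printed rather than executed unless YAW_CLI is set to yaw_cli.

```shell
#!/bin/sh
# Dry run by default; export YAW_CLI=yaw_cli to execute.
: "${YAW_CLI:=echo yaw_cli}"

auto_both() {
    project="$1"  # existing project directory
    # Reference sample autocorrelation (--which ref is the default).
    $YAW_CLI auto "$project" || return 1
    # Unknown sample autocorrelation, skipping random-random pair counts.
    $YAW_CLI auto "$project" --which unk --no-rr
}

auto_both ./my_project
```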

Inputs

Either reference data and random sample, or unknown data and random sample(s).

Outputs

Autocorrelation pair counts for the reference (and possibly unknown) sample(s). Stored per patch and redshift bin as HDF5 files and for each scale. When computing the reference sample autocorrelation, data is stored at estimate/[scale]/auto_reference.hdf. When computing the unknown sample autocorrelation, data is stored for each subset at estimate/[scale]/auto_unknown_[subset].hdf, where subset is a running index.

Depends on

cross (if computing unknown sample autocorrelation)

Dependants

zcc

Note

When computing the unknown sample autocorrelation, --unk-z and --rand-z must be provided when specifying the unknown sample with the cross subcommand.

$ yaw_cli auto --help
usage: yaw_cli auto [-h] [-v] [--threads <int>] [--progress]
                    [--which {ref,unk}] [--no-rr]
                    <directory>

Measure the angular autocorrelation function amplitude of the reference
sample. Can be applied to the unknown sample if redshift point-estimates are
available.

positional arguments:
  <directory>        project directory, must exist

options:
  -h, --help         show this help message and exit
  -v, --verbose      show additional information in terminal, repeat to show
                     debug messages
  --threads <int>    number of threads to use (default: from configuration)
  --progress         show a progress bar if the backend supports it
  --which {ref,unk}  for which sample the autocorrelation should be computed
                     (default: ref, requires redshifts [--*-z] for data and
                     random sample)
  --no-rr            do not compute random-random pair counts

yaw_cli ztrue#

Description

Computes histograms of the true redshift distribution of the unknown sample(s) if a redshift column (--unk-z) is provided in cross. The typical use case is measuring clustering redshifts on simulated datasets, where the true redshifts are known and a consistently measured distribution is of interest for comparison.
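For a simulated catalogue, the two steps could look like the sketch below. The file name mock.fits and the column name z_true are hypothetical. Commands are printed rather than executed unless YAW_CLI is set to yaw_cli.

```shell
#!/bin/sh
# Dry run by default; export YAW_CLI=yaw_cli to execute.
: "${YAW_CLI:=echo yaw_cli}"

true_redshifts() {
    project="$1"  # existing project directory
    # Register the unknown sample including its true redshift column
    # (placeholder file and column names).
    $YAW_CLI cross "$project" \
        --unk-path mock.fits --unk-ra ra --unk-dec dec --unk-z z_true || return 1
    # Histogram the true redshifts of the unknown sample.
    $YAW_CLI ztrue "$project"
}

true_redshifts ./my_project
```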

Inputs

Unknown data sample(s).

Outputs

Histogram counts, samples and a covariance, stored as ASCII files with file extensions .dat, .smp, and .cov at true/nz_true_[subset].*, where subset is a running index.

Depends on

cross

Dependants

plot

$ yaw_cli ztrue --help
usage: yaw_cli ztrue [-h] [-v] [--threads <int>] [--progress] <directory>

Compute the redshift distributions of the unknown data sample(s), which
requires providing point-estimate redshifts for the catalog.

positional arguments:
  <directory>      project directory, must exist

options:
  -h, --help       show this help message and exit
  -v, --verbose    show additional information in terminal, repeat to show
                   debug messages
  --threads <int>  number of threads to use (default: from configuration)
  --progress       show a progress bar if the backend supports it

yaw_cli cache#

Print a summary of the data catalogues stored in the cache directory. When providing the --drop flag, deletes the cached data catalogues.

Warning

After running yaw_cli cache --drop, none of cross, auto, or ztrue can be run anymore if they require catalogues that were loaded using the --*-cache flags.
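A cautious pattern is to inspect the cache summary before deleting anything, as in this sketch with a hypothetical project directory. Commands are printed rather than executed unless YAW_CLI is set to yaw_cli.

```shell
#!/bin/sh
# Dry run by default; export YAW_CLI=yaw_cli to execute.
: "${YAW_CLI:=echo yaw_cli}"

clear_cache() {
    project="$1"  # existing project directory
    # Print the summary of the cache directory first.
    $YAW_CLI cache "$project" || return 1
    # Then delete all cached catalogues.
    $YAW_CLI cache "$project" --drop
}

clear_cache ./my_project
```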

$ yaw_cli cache --help
usage: yaw_cli cache [-h] [-v] [--drop] <directory>

Get a summary of the project's cache directory (location, size, etc.) or
remove entries with --drop.

positional arguments:
  <directory>    project directory, must exist

options:
  -h, --help     show this help message and exit
  -v, --verbose  show additional information in terminal, repeat to show debug
                 messages
  --drop         drop all cache entries

yaw_cli zcc#

Description

Converts pair counts to correlation function estimates for each measurement scale. Produces clustering redshift estimates and stores them as ASCII files. The outputs depend on the available inputs:

  • If any autocorrelation has been measured with auto, produces a correlation function estimate in bins of redshift. Pair counts are resampled using patches to estimate uncertainties and covariances.

  • If the crosscorrelations have been measured with cross, produces a clustering redshift estimate in the same way. If available, the reference and unknown sample autocorrelation function(s) are used to mitigate galaxy bias.

The command’s arguments specify the correlation estimator used to convert pair counts to correlation functions. Other arguments specify the spatial resampling method used for uncertainty and covariance estimates. By default, all autocorrelation function data is used for bias mitigation. To omit correcting for the reference or unknown sample biases, the flags --no-bias-ref and --no-bias-unk can be provided.

Note

The script can be run multiple times with different arguments. Each run can be tagged using the --tag argument; the default tag is fid. Data from each tag are stored in separate output directories (see the output naming convention below). Each run is also recorded with its respective tag in the setup.yaml file.
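The sketch below runs zcc three times with different settings and tags. The tag names boot and nobias as well as the bootstrap settings are arbitrary examples; the flags come from the help text of this subcommand. Commands are printed rather than executed unless YAW_CLI is set to yaw_cli.

```shell
#!/bin/sh
# Dry run by default; export YAW_CLI=yaw_cli to execute.
: "${YAW_CLI:=echo yaw_cli}"

zcc_variants() {
    project="$1"  # existing project directory with pair counts
    # Default settings, stored under the default tag "fid".
    $YAW_CLI zcc "$project" || return 1
    # Bootstrap resampling instead of jackknife (example values).
    $YAW_CLI zcc "$project" --tag boot \
        --method bootstrap --n-boot 1000 --seed 42 || return 1
    # No galaxy bias mitigation at all.
    $YAW_CLI zcc "$project" --tag nobias --no-bias-ref --no-bias-unk
}

zcc_variants ./my_project
```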

Inputs

Pair count files produced by cross and/or auto.

Outputs

Clustering redshift estimates, samples and a covariance, stored as ASCII files with file extensions .dat, .smp, and .cov. The estimates are produced for each scale and tag separately at estimate/[scale]/[tag]/nz_cc_[subset].*, where subset is a running index. Same for any measured autocorrelation functions, but using auto_reference.* and auto_unknown_[subset].* as file name templates.
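To locate the estimates of a given tag across all scales, a shell glob over this naming convention is sufficient. The sketch below assumes that the estimate/ directory lives directly inside the project directory, which is not stated explicitly here; the helper name list_nz_estimates is hypothetical.

```shell
#!/bin/sh
# List the clustering redshift estimate files (.dat) of one tag for all scales.
# Assumption: estimate/ is located directly inside the project directory.
list_nz_estimates() {
    project="$1"
    tag="${2:-fid}"  # "fid" is the default tag of zcc
    for f in "$project"/estimate/*/"$tag"/nz_cc_*.dat; do
        # Skip the unexpanded pattern if nothing matches.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
    done
}

list_nz_estimates ./my_project
```

Replacing the .dat extension in the pattern with .smp or .cov lists the samples or covariances instead.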

Depends on

cross and/or auto

Dependants

plot

$ yaw_cli zcc --help
usage: yaw_cli zcc [-h] [-v] [--tag TAG] [--no-bias-ref] [--no-bias-unk]
                   [--est-cross {PH,DP,HM,LS}] [--est-auto {PH,DP,HM,LS}]
                   [--method {jackknife,bootstrap}] [--no-crosspatch]
                   [--n-boot <int>] [--global-norm] [--seed <int>]
                   <directory>

Compute clustering redshift estimates for the unknown data sample(s),
optionally mitigating galaxy bias estimated from any measured autocorrelation
function.

positional arguments:
  <directory>           project directory, must exist

options:
  -h, --help            show this help message and exit
  -v, --verbose         show additional information in terminal, repeat to
                        show debug messages
  --tag TAG             unique identifier for different configurations
                        (default: fid)
  --no-bias-ref         whether to mitigate the reference sample bias using
                        its autocorrelation function (if available)
  --no-bias-unk         whether to mitigate the unknown sample bias using its
                        autocorrelation functions (if available)

correlation estimators:
  configure estimators for the different types of correlation functions

  --est-cross {PH,DP,HM,LS}
                        correlation estimator for crosscorrelations (default:
                        LS or DP)
  --est-auto {PH,DP,HM,LS}
                        correlation estimator for autocorrelations (default:
                        LS or DP)

resampling:
  configure the resampling used for covariance estimates

  --method {jackknife,bootstrap}
                        resampling method for covariance estimates (default:
                        jackknife)
  --no-crosspatch       whether to include cross-patch pair counts when
                        resampling
  --n-boot <int>        number of bootstrap samples (default: 500)
  --global-norm         normalise pair counts globally instead of patch-wise
  --seed <int>          random seed for bootstrap sample generation (default:
                        12345)

yaw_cli plot#

Description

Generates automatic check plots of the clustering redshift estimates and sample autocorrelations as a function of redshift. If available, adds the measured true redshift distributions from ztrue to the plot of the redshift estimates. Each plot shows all combinations of measurement scales and tags (see zcc), which may result in a very crowded plot. The reference sample autocorrelation plot produces a single panel, whereas the unknown sample and clustering redshift estimates produce multiple panels, one for each subset provided.

Inputs

Correlation function and clustering redshift estimates produced by zcc, as well as redshift distributions from ztrue.

Outputs

Check plots in the estimate/ directory. They are named nz_estimate.png for the clustering redshift estimate and auto_reference.png / auto_unknown.png for the reference / unknown sample autocorrelations, respectively.

Depends on

zcc, ztrue

Dependants

$ yaw_cli plot --help
usage: yaw_cli plot [-h] [-v] <directory>

Plot the autocorrelations and redshift estimates into the 'estimate'
directory.

positional arguments:
  <directory>    project directory, must exist

options:
  -h, --help     show this help message and exit
  -v, --verbose  show additional information in terminal, repeat to show debug
                 messages