Using the command tools#

The command line tool yaw_cli (separate installation required) operates on a single, unified output directory, in which configuration, input and output data are organised automatically.

Creating a new project#

We start by creating a new project called output with the yaw_cli init command and set the minimum required configuration parameters. We also define the input reference sample and random catalog and list the required column names.

Finally, we split the data catalogs into 32 spatial patches, which later allow us to estimate uncertainties for our clustering redshift measurements. By default, these patches are created automatically using a k-means clustering algorithm.

$ yaw_cli init output \
    --rmin 100 --rmax 1000 \
    --zmin 0.07 --zmax 1.42 \
    --ref-path reference.fits \
    --ref-ra ra \
    --ref-dec dec \
    --ref-z z \
    --rand-path random.fits \
    --rand-ra ra \
    --rand-dec dec \
    --rand-z z \
    --n-patches 32
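
Since the init command records the input paths in the project, it can help to confirm that the catalogs exist before creating it. The following is a minimal sketch, not part of yaw_cli: check_inputs is a hypothetical helper, and the file names are simply the ones used in the example above.

```shell
# Hypothetical helper (not part of yaw_cli): report every catalog path that
# does not exist and return a non-zero status if any is missing.
check_inputs() {
    missing=0
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            echo "missing input catalog: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Only proceed to yaw_cli init when both catalogs are present.
if check_inputs reference.fits random.fits; then
    echo "inputs found, ready to run yaw_cli init"
fi
```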

Note

Every project uses a unique reference sample. We cannot change the reference sample after creating the project.

Measuring correlations#

Next we measure the cross-correlation of the reference sample with the unknown catalog, which we specify when running the yaw_cli cross command. Note that we can in principle provide as many input files with --unk-path as we like (e.g. tomographic bins).

In the same way we measure the autocorrelation function of the reference sample to mitigate its galaxy bias evolution. In our case, the yaw_cli auto command takes no further inputs since most run parameters, including the reference sample, are already configured at this point.

$ yaw_cli cross output \
    --unk-path unknown.fits \
    --unk-ra ra \
    --unk-dec dec
$ yaw_cli auto output

Note

These two tools only measure the correlation pair counts. Here the cross-correlation contains the data-data and data-random counts, whereas the autocorrelation by default also includes the random-random pair counts.

Getting the clustering redshifts#

Finally, we transform the pair counts into correlation functions and obtain the clustering redshift estimate with the yaw_cli zcc command. We also create a simple check plot with the yaw_cli plot command.

$ yaw_cli zcc output
$ yaw_cli plot output
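
The measurement steps so far can be chained in a small script. This is only a sketch under a few assumptions: run and measure_project are hypothetical helper names, DRY_RUN is a variable introduced here for illustration, and the project is assumed to have been created with yaw_cli init beforehand. The yaw_cli commands and flags are exactly those shown above.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run the steps from this page in order, stopping at the first failure.
measure_project() {
    # $1: project directory created earlier with yaw_cli init
    project=$1
    run yaw_cli cross "$project" --unk-path unknown.fits --unk-ra ra --unk-dec dec &&
    run yaw_cli auto "$project" &&
    run yaw_cli zcc "$project" &&
    run yaw_cli plot "$project"
}

# Preview the four commands without executing them.
DRY_RUN=1 measure_project output
```

Calling measure_project output without DRY_RUN=1 would execute the commands instead of printing them.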

That is all. The project directory should now contain a number of files, the most important of which are:

output/
├─ estimate/
│  ├─ kpc100t1000/
│  │  └─ fid/
│  │     ├─ auto_reference.dat
│  │     └─ nz_cc_1.dat
│  ├─ auto_reference.png
│  └─ nz_estimate.png
├─ setup.log
└─ setup.yaml

The setup.yaml file is a YAML configuration file that records all configuration, inputs and applied tasks, which makes this run reproducible.

The estimate directory contains the automatically generated check plots of the redshift estimate (nz_estimate.png) and of the reference sample autocorrelation function (auto_reference.png), which is a proxy for the galaxy bias.

The data products are stored in kpc100t1000/fid, the default name for our choice of scales. They are named nz_cc_1.dat (redshift estimate) and auto_reference.dat (reference autocorrelation) and are accompanied by a covariance matrix and jackknife samples in separate files.
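
To locate the data products from a script, the paths can be assembled from the scale limits. The sketch below assumes that the scale directory follows the pattern kpc<rmin>t<rmax>, which is inferred from the name kpc100t1000 for --rmin 100 and --rmax 1000 in this example and is not a documented rule. scale_dir and product_paths are hypothetical helper names.

```shell
# Assumption: the scale directory is named kpc<rmin>t<rmax>, as inferred
# from kpc100t1000 in the example project.
scale_dir() {
    printf 'kpc%st%s\n' "$1" "$2"
}

# Print the expected paths of the two main data products.
product_paths() {
    # $1: project directory, $2: rmin, $3: rmax
    base="$1/estimate/$(scale_dir "$2" "$3")/fid"
    printf '%s\n' "$base/auto_reference.dat" "$base/nz_cc_1.dat"
}

product_paths output 100 1000
```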

Tomographic binning and other subsets#

If the unknown sample is split into different subsets, e.g. tomographic redshift bins, these can be processed easily with yaw_cli by providing a list of unknown (and optionally random) data catalogs, e.g.:

$ yaw_cli cross output \
    --unk-path unknown1.fits unknown2.fits unknown3.fits \
    --unk-ra ra \
    --unk-dec dec

This would produce clustering redshift estimates for three subsets of the unknown data, in each case using the same reference sample as before. The redshift estimates in estimate/kpc100t1000/fid are numbered automatically (counting from 1) and are called nz_cc_1.dat, nz_cc_2.dat, and nz_cc_3.dat for this example. The automatically generated check plot will contain three panels instead of one.
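
The numbering scheme can be sketched as a small helper that lists the estimate file expected for each unknown catalog. The helper name expected_estimates is hypothetical, and the sketch assumes that the numbers follow the order in which the catalogs are given on the command line, which the text above does not state explicitly.

```shell
# Hypothetical helper: map each unknown catalog to its expected estimate file.
# Assumption: files are numbered from 1 in the order the catalogs are passed.
expected_estimates() {
    # $1: directory holding the estimates; remaining arguments: unknown catalogs
    dir=$1
    shift
    i=1
    for catalog in "$@"; do
        printf '%s -> %s/nz_cc_%d.dat\n' "$catalog" "$dir" "$i"
        i=$((i + 1))
    done
}

expected_estimates output/estimate/kpc100t1000/fid \
    unknown1.fits unknown2.fits unknown3.fits
```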