Code structure¶
cosmofit is structured the following way:
a pipeline: compute things given some parameters
emulators: to (optionally) emulate the pipeline
profilers: find likelihood/posterior best fit, 1D profiles and 2D contours
samplers: sample the posterior
cosmofit can be run from Python directly, as examplified in notebooks. This makes it easy to query and plot any quantity calculated by the pipeline for any given set of parameters, which is also helpful for debugging purposes. Yet, it will usually be most convienent to work with configuration files.
Configuration file¶
Below is an example configuration file.
# First, let us define our pipeline, i.e. a sequence of calculators
# (named like, theory, pt, cosmo here)
pipeline:
# We open the namespace "QSO": all parameters will be automatically prefixed by "QSO"
# (except if those are defined in higher-level namespace)
QSO:
like: # Calculator name, which is purely arbitrary
# We need to specify where our likelihood "like" is defined, in the format module.Class --* it can be anywhere!
# We could also specify a pythonpath: location where to search for the said module
class: cosmofit.likelihoods.galaxy_clustering.PowerSpectrumMultipolesLikelihood
init: # Whatever parameters used for initialization of the likelihood class
# Note the f'' syntax: data_dir is replaced by its value in the configuration file (scroll down!)
data: f'{data_dir}/data.npy' # data path
covariance: f'{data_dir}/mock_*.npy' # list of mocks to use for on-the-fly covariance computation
klim: # k-limits
0: [0.02, 0.2]
2: [0.02, 0.2]
4: [0.02, 0.2]
kstep: 0.01 # k-binning
zeff: 0.5
fiducial: DESI # fiducial cosmology
# Since loading all mocks take a couple of seconds, save the likelihood to disk...
save: f'{output_dir}/QSO.like.npy'
# ... it will be directly reloaded by specifying load: True
#load: True
# This calculator has a method "plot" (to plot theory vs. data power spectrum)
# Give arguments there: it will be called when running 'cosmofit do config.yaml'
plot: f'{output_dir}/power.png'
theory:
class: cosmofit.theories.galaxy_clustering.LPTMomentsTracerPowerSpectrumMultipoles
# Some parameters are attached to this calculator
# Those are already specified in the yaml file in cosmofit/theories/galaxy_clustering directory,
# but we can update them here:
params:
b1:
# Reset b1 prior as a normal distribution of mean 1. and standard deviation 3.
prior:
dist: norm
loc: 1.
scale: 3.
# Look at the wildcard syntax, which captures all alpha0, alpha2, etc. parameters
alpha*:
# '.auto' is a keyword, meaning this parameter is analytically solved for
# '.auto' is equivalent to '.best' when running likelihood profiling (parameter is adjusted to minimize chi2 at each step)
# '.auto' is equivalent to '.marg' when sampling the posterior (parameter is marginalized over)
derived: '.auto'
sn*:
derived: '.auto'
pt:
class: cosmofit.theories.galaxy_clustering.LPTPowerSpectrumMultipoles
#load: f'{output_dir}/emulator.npy'
# This calculator is computationally expensive, so we compute an emulator for it; to do so, run e.g.:
# mpiexec -np 20 cosmofit emulate config.yaml
# Then, uncomment the 'load' line above
emulator:
init:
# Specify the derivative order for each parameter
# By default ('*'), 1, and for q's (qpar and qper), second order
order: {'*': 1, 'q*': 2}
save: f'{output_dir}/emulator.npy'
shapefit:
# Shapefit parameterization
class: cosmofit.theories.galaxy_clustering.ShapeFitPowerSpectrumParameterization
cosmo:
class: cosmofit.theories.primordial_cosmology.Cosmoprimo
# What emulator engine to use; here, Taylor expansion
emulate:
class: TaylorEmulator
# What sampler to use; here emcee
sample:
class: cosmofit.samplers.EmceeSampler
init:
#nwalkers: 40
# How many chains to run in parallel (rest of processes for internal likelihood parallelization)
chains: 1
max_tries: 10
run:
max_iterations: 10000
# Dump to disk every 100 step
check_every: 100
# Perform convergence checks
check: True
save: f'{output_dir}/chain_*.npy'
# What profiler to use; here minuit
profile:
class: cosmofit.profilers.MinuitProfiler
maximize:
# How many posterior maximation to run (from different seeds)
niterations: 10
save: f'{output_dir}/profiles.npy'
# Summary for chains
summarize:
source:
fn: f'{output_dir}/chain_0.npy'
burnin: 0.5
plot_triangle: f'{output_dir}/triangle.png'
plot_trace: f'{output_dir}/trace.png'
plot_autocorrelation_time: f'{output_dir}/autocorrelation_time.png'
plot_gelman_rubin:
fn: f'{output_dir}/gelman_rubin.png'
nsplits: 4
plot_geweke: f'{output_dir}/geweke.png'
# Summary for profiles
summarize:
source: f'{output_dir}/profiles.npy'
stats: f'{output_dir}/stats.tex'
# Run the pipeline at a given point (here, mean of the chains)
do:
source: f'{output_dir}/chain_0.npy'
do: plot
# Any variable that you can use anywhere
data_dir: _pk
# Note the '${}' syntax: ${HOME} will be replaced by the environment variable
output_dir: f'${HOME}/_fs'
If you are familiar with Cobaya, this should not look so different, but there are some notable differences. Before going into the details, let us just specify that:
cosmofit install config.yaml # installs all dependencies
cosmofit emulate config.yaml # builds the emulator for cosmofit.theories.galaxy_clustering.LPTPowerSpectrumMultipoles
# Then, just uncomment load: True in pipeline.QSO.pt section of config.yaml
cosmofit profile config.yaml # runs best fits
cosmofit summarize config.yaml # print bestfits
cosmofit sample config.yaml # sample posterior
# Re-arrange summary for chains as last one in config.yaml (last definition is kept)
cosmofit summarize config.yaml # some chain plots