All traceplots (and other plots) are subsampled to keep this file below 100MB; rejection plateaus can’t be seen in traceplots
lwidth
, gwidth
prior is not approx.
\(\frac{1}{x}\) but essentially a
positive constant prior for reasonable values of lwidth
,
gwidth
.initial.fit.EM
is not MAP; if anything it’s MLE
but it’s not even that, it is more like a heuristic / Blackbox.prop.NormPropToWidth
calculates the proposal
in Appendix D.Simple synthetic signal has 4 peaks. Plotted later.
library("spectralmc")
# get synthetic data
data <- data.synthetic.stormyclouds(seed = 11, kp = 4, noise = 0.001)
x <- data$x
y <- data$y
Highly uninformative priors, uniform prior for position
pos
and a essentially positive constant prior for
lwidth
and gwidth
. Improper prior for
amplitude amp
, forces amp
to be positive.
Noise of the signal, noise_sigma
, is assumed to be known
and set to the true noise sigma present in the data. Framework allows to
easily model noise_sigma
as stochastic variable aswell, not
done here.
logprior <- LogPrior(
amp = prior.Positive(),
lwidth = prior.Gamma(1.01, 0.01), #shape rate
gwidth = prior.Gamma(1.01, 0.01), #shape rate
pos = prior.Unif(0, 1000), # min, max
a = prior.Norm(0, 0.00001), # mean, sd
b = prior.Unif(-1000, 1000)
)
fit_kp <- 4 # number of peaks to fit
signal_model <- voigt.model
posterior <- Posterior(Loglikeli(signal_model, noise_sigma = 0.001), logprior)
Starting point of the chain is calculated with a heuristic based on
the expectation-maximization algorithm. Essentially a blackbox for now.
Proposal for metropolis hastings (mhmc) is a simple normal distribution
with a diagonal covariance matrix. Implemented by sampling independently
for every variable type such amp
, pos
and
b
. See Proposal
in code. The mhmc algorithm is
run for 40000 iterations.
good_start <- initial.fit.EM(x, y, fit_kp, seed = 43423)
|================================================================================| 100% elap: 1s eta: 0s
proposal <- Proposal(
is_symmetric = TRUE, #flag proposal as symmetric; speeds up chains.mhmc
amp = prop.Norm(0.0005), #sd
lwidth = prop.Norm(0.003),
gwidth = prop.Norm(0.003),
pos = prop.Norm(0.0005),
a = prop.Norm(0.00000005),
b = prop.Norm(0.00002)
)
res <- chains.mhmc(x, y, good_start, posterior, proposal, iter = 40000, print_bar = FALSE)
|================================================================================| 100% elap: 20s eta: 0s
plt.fit.chain.interactive
Plots for each iteration the fit induced by the sample. If
step_width
is specified it only plots iterations which are
multiples of step_width
. Allows to understand why a chain
gets stuck (for example 2 peaks fitted as one), identify missing peaks
and visualize the burn-in process.
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 4000)
plt.traceplot
Groups traceplots for different peaks by parameter type. While the normal coda plots are needed for publications, there are simply to many plots (100+) when fitting many peaks.
Usage Example: We want to figure out the lwidth
of the
peak at around pos = 486
. First we plot the
`pos
traceplot, quickly we can then identify the peak
around pos = 486
to be pos[1]
. Zoom to
magnify
plt.traceplot(res$samples, "pos", steps=2000)
Now that we know that the peak is 1
we can look at the
lwidth
plot, and extract the lwidth[1]
trace
by double clicking the legend. Thus we can read of the
lwidth
to be 1.08
. This is near impossible to
do with coda plots for a high number of peaks.
plt.traceplot(res$samples, "lwidth", steps=2000)
All other traceplots are in the Appendix A
plt.posterior
Another very useful diagnostic is the posterior plot, which plots the
logposterior value of the i-th sample. This diagnostic should be roughly
monotonically increasing during burn-in and flat afterwards. It is a
computational cheap diagnostic that generalizes to a high number of
peaks as well. In the plot below (zoom in on the top) one can see that
the chain is still not burned-in, as the posterior is not flat. This is
in fact reflected in a slow drift of lwidths
which are
still quite wrong for the wide peaks. This is not trivial to see using
traceplots and other diagnostics. Zoom to magnify
plt.posterior(res$post_vals, steps = 1000)
plt.rolling_acceptions
We might also be interested in the acceptance rate. In the case that our chain hasn’t fully burned in yet a rolling mean / windowed version of the acceptance rate is useful.
plt.rolling_acceptions(res$acceptions, windowsize = 500, steps = 1000)
res_MHMC1 <- res # save for appendix
Finding a good proposal distribution, such as the one above, takes a lot of hyperparameter tuning.
The thinner and higher a peak is the preciser its pos
,
lwidth
and gwidth
becomes. Thus thinner peaks
need finer proposals. The traceplots (shown above) for the thinnest peak
are very good (pos[1]
, lwidth[1]
,
gwidth[1]
) but the proposal is too fine for the other peaks
(any other trace). Hence there is drift in their traces
(i.e. pos[2]
). Due to the thin peak a coarser proposal
yields a massive drop in the acceptance rate and isn’t a viable solution
to the problem either. Consequently there is no way to find
hyperparameters that have high acceptance and a low drift in the
traces.
One simple fix that gets around this problem is to have an asymmetric
proposal for which the standard deviation (sd
) of the
pos
, gwidth
and lwidth
proposal
is given by
sd <- unit_sd * sqrt(gwidth^2 + lwidth^2)/unit_width
(with hyperparameters unit_sd
and unit_width
)
(implementation in Appendix
D). Thus wider peaks have a bigger sd
and thinner peaks
a smaller sd
. This proposal actually is rather important
and fixes issues that occur in the burn-in of the chain and is therefore
also used in the following section ‘Burn-in Problems’.
proposal2 <- Proposal(
is_symmetric = FALSE,
amp = prop.Norm(0.0005), #sd
lwidth = prop.NormPropToWidth(0.003, 1.4),
gwidth = prop.NormPropToWidth(0.003, 1.4), # unit_sd, unit_width. Such that for a peak of width = unit_width the sd is unit_sd. Scales linearly for different peak widths.
pos = prop.NormPropToWidth(0.0005, 1.4),
a = prop.Norm(0.00000005),
b = prop.Norm(0.00002) #b = prop.Unif(0.02)
)
res <- chains.mhmc(x, y, good_start, posterior, proposal2, iter = 40000, print_bar = FALSE)
|================================================================================| 100% elap: 25s eta: 0s
The resulting chain is far better, which becomes obvious when looking
at the taceplot for lwidth
and comparing it to the
traceplot for lwidth
using an symmetric proposal
(here). All other
diagnostics in the Appendix
B
# Compare this to previous 'Trace plot of lwidth'
plt.traceplot(res$samples, "lwidth", steps=2000)
res_MHMC2 <- res #save for appendix
Below is a chain shown that is stuck in a local max / saddle point. Why is it stuck? The starting point is already burned-in as it is a sample from the burned-in chain from the following section ‘Burn-in Problems’. Obtaining this burned-in chain takes roughly a million iterations and 5 different proposals. The traceplots and posterior plots are flat which heavily suggests that the chain is indeed burned-in, this too confirms the assumption that the starting point is already burned in.
However if this is the case, the chain is stuck in a quite bad local max / saddle point, only fitting 3 peaks. As shown below, the chain can’t leave the saddle point even after 300k iterations. There is no way to fix this trivially with a local method (any local proposal) as this is an innate property of the posterior.
More precisely, the reason for this issue here is that there is a
misplaced peak (pos[3]
) instead of it being at the location
of the first twin peaks it is located in a flat region of the
signal.
This might be a saddle point and resolve itself after a few million iterations. However it is in-feasible to calculate so many iterations.
start <- list(
pos = c(139.4636, 486.4108, 873.3819, 419.1684),
gwidth = c(25.5351788, 0.9830332, 266.8899655, 2.7727781),
lwidth = c(0.05998143, 1.08466880, 25.27266471, 9.86956400),
amp = c(2.0895949, 2.8062770, 0.8353660, 0.5985949),
a = -2.748646e-06,
b = 20.01264
)
# non symmetric proposal to allow for better position diffusion
proposal2 <- Proposal(
is_symmetric = FALSE,
amp = prop.Norm(0.0005),
lwidth = prop.NormPropToWidth(0.003, 1.4),
gwidth = prop.NormPropToWidth(0.003, 1.4), # unit_sd, unit_width. Such that for a peak of width = unit_width the sd is unit_sd. Scales linearly for different peak widths.
pos = prop.NormPropToWidth(0.0005, 1.4),
a = prop.Norm(0.00000005),
b = prop.Norm(0.00002)
)
res <- chains.mhmc(x, y, start, posterior, proposal2, iter = 300000, print_bar = FALSE)
|================================================================================| 100% elap: 5m eta: 0s
In the following plots we can see how it fails to fit the twin peaks,
and in the traceplot of pos
the missing/misplaced peak can
be found at pos approx. 900
. All other diagnostics are in
the Appendix C
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 60000)
plt.traceplot(res$samples, "pos", steps = 2000)
res_MHMC3 <- res
Major burn-in problems when starting from an problem agnostic
starting point that is random (prop.voigt.rnd_start
). A
coarse proposal is needed for a fast burn-in. This however causes a tiny
acceptance rate once burned in. Thus different proposals for the burn-in
and burned-in state are used. If we were to use the fine proposal from
the previous sections the burn-in is 10+ million iterations long.
Instead we start with a very coarse proposal and make it every few 10000
iterations finer. Clearly this is quite messy and very time intensive as
one has to tune, in this case, 5 proposals.
In the following very long burn-in code, a few interesting things occur. First a peak (peak 3) moves in a terrible position due to a odd saddle point in the posterior, this can be seen very obviously in the second interactive fit plots. Secondly, in the early stages of the burn-in two peaks fit the large peak with exactly the same position. It takes 400k iterations with the special asymmetric proposal to pull them apart and to get one of them to fit the small peak next to the big one.
start <- prop.voigt.rnd_start(343, fit_kp)
# first proposal, burns-in b.
b_burnin <- Proposal(
is_symmetric = TRUE,
amp = prop.Norm(0.0005),
lwidth = prop.Norm(0.003),
gwidth = prop.Norm(0.003),
pos = prop.Norm(0.005),
a = prop.Norm(0.00000005),
b = prop.Norm(0.2)
)
res <- chains.mhmc(x, y, start, posterior, b_burnin, iter = 1000, print_bar = FALSE)
|================================================================================| 100% elap: 1s eta: 0s
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 200)
start <- res$samples[[length(res$samples)]]
# second proposal, burns-in pos. interestingly as plt.fit.chain.interactive shows one peak moves into the the wrong direction.
# This is due to the non-trivial local-maxima / saddle point structure of the posterior.
pos_burnin <- Proposal(
is_symmetric = TRUE,
amp = prop.Norm(0.0005),
lwidth = prop.Norm(0.003),
gwidth = prop.Norm(0.003),
pos = prop.Norm(0.5),
a = prop.Norm(0.00000005),
b = prop.Norm(0.00002)
)
res <- chains.mhmc(x, y, start, posterior, pos_burnin, iter = 20000, print_bar = FALSE)
|================================================================================| 100% elap: 20s eta: 0s
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 1000)
start <- res$samples[[length(res$samples)]]
# burn in for widths
width_burnin <- Proposal(
is_symmetric = TRUE,
amp = prop.Norm(0.0005),
lwidth = prop.Norm(0.009),
gwidth = prop.Norm(0.009),
pos = prop.Norm(0.04),
a = prop.Norm(0.00000005),
b = prop.Norm(0.00002)
)
res <- chains.mhmc(x, y, start, posterior, width_burnin, iter = 20000, print_bar = FALSE)
|================================================================================| 100% elap: 18s eta: 0s
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 1000)
start <- res$samples[[length(res$samples)]]
# next burn in
next_burnin <- Proposal(
is_symmetric = TRUE,
amp = prop.Norm(0.0005), #sd
lwidth = prop.Norm(0.009),
gwidth = prop.Norm(0.009), # can also use prop.Unif(radius) for uniform proposal
pos = prop.Norm(0.04),
a = prop.Norm(0.00000005),
b = prop.Norm(0.00002) #b = prop.Unif(0.02)
)
res <- chains.mhmc(x, y, start, posterior, next_burnin, iter = 100000, print_bar = FALSE)
|================================================================================| 100% elap: 1m eta: 0s
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 5000)
start <- res$samples[[length(res$samples)]]
asym_burnin <- Proposal(
is_symmetric = FALSE,
amp = prop.Norm(0.0005),
lwidth = prop.NormPropToWidth(0.003, 1.4),
gwidth = prop.NormPropToWidth(0.003, 1.4), # unit_sd, unit_width. Such that for a peak of width = unit_width the sd is unit_sd. Scales linearly for different peak widths.
pos = prop.NormPropToWidth(0.0005, 1.4),
a = prop.Norm(0.00000005),
b = prop.Norm(0.00002)
)
# almost final burn in (please)
res <- chains.mhmc(x, y, start, posterior, asym_burnin, iter = 150000, print_bar = FALSE)
|================================================================================| 100% elap: 2m eta: 0s
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 5000)
start <- res$samples[[length(res$samples)]]
# can't be burned in yet as posterior isn't flat
plt.posterior(res$post_vals)
# non symmetric proposal to allow for better position diffusion
proposal2 <- Proposal(
is_symmetric = FALSE,
amp = prop.Norm(0.0005),
lwidth = prop.NormPropToWidth(0.003, 1.4),
gwidth = prop.NormPropToWidth(0.003, 1.4), # unit_sd, unit_width. Such that for a peak of width = unit_width the sd is unit_sd. Scales linearly for different peak widths.
pos = prop.NormPropToWidth(0.0005, 1.4),
a = prop.Norm(0.00000005),
b = prop.Norm(0.00002)
)
start <- res$samples[[length(res$samples)]]
res <- chains.mhmc(x, y, start, posterior, proposal2, iter = 300000, print_bar = FALSE)
|================================================================================| 100% elap: 5m eta: 0s
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 60000)
plt.traceplot(res$samples, "pos", steps = 2000)
plt.traceplot(res$samples, "amp", steps = 2000)
plt.traceplot(res$samples, "gwidth", steps = 2000)
plt.traceplot(res$samples, "lwidth", steps = 2000)
plt.posterior(res$post_vals, steps = 1000)
plt.rolling_acceptions(res$acceptions, windowsize = 1000, steps = 1000)
Plots for the first Chain.
plt.traceplot(res_MHMC1$samples, "gwidth", steps=2000)
plt.traceplot(res_MHMC1$samples, "amp", steps=2000)
plt.traceplot(res_MHMC1$samples, "a", steps=2000)
plt.traceplot(res_MHMC1$samples, "b", steps=2000)
Plots for the second Chain.
plt.fit.chain.interactive(x, y, res_MHMC2$samples, signal_model, step_width = 5000)
plt.traceplot(res_MHMC2$samples, "pos", steps=2000)
plt.traceplot(res_MHMC2$samples, "gwidth", steps=2000)
plt.traceplot(res_MHMC2$samples, "amp", steps=2000)
plt.traceplot(res_MHMC2$samples, "a", steps=2000)
plt.traceplot(res_MHMC2$samples, "b", steps=2000)
plt.posterior(res_MHMC2$post_vals, steps = 1000)
plt.rolling_acceptions(res_MHMC2$acceptions, windowsize = 1000, steps = 1000)
plt.fit.chain.interactive(x, y, res_MHMC3$samples, signal_model, step_width = 5000)
plt.traceplot(res_MHMC3$samples, "pos", steps=2000)
plt.traceplot(res_MHMC3$samples, "gwidth", steps=2000)
plt.traceplot(res_MHMC3$samples, "amp", steps=2000)
plt.traceplot(res_MHMC3$samples, "a", steps=2000)
plt.traceplot(res_MHMC3$samples, "b", steps=2000)
plt.posterior(res_MHMC3$post_vals, steps = 1000)
plt.rolling_acceptions(res_MHMC3$acceptions, windowsize = 1000, steps = 1000)
# prop.NormPropToWidth essentially does, where x could be pos, lwidth or gwidth:
width <- sqrt(params$gwidth^2 + params$lwidth^2)
sd <- width / self$unit_width * self$unit_sd
rnorm(length(x), mean = x, sd)
Ideas: - Change of variables; instead of lwidth, gwidth use FWHM and eta.
RWTH Aachen, philipp.meissner@rwth-aachen.de↩︎
This work was developed under a Seed Fund Project (2021) of the RWTH Aachen University, funded under "the Excellence Strategy of the Federal Government and the Länder". Project: "Spectra-Bayes: A Bayesian statistical machine learning model for spectral reconstruction" (Project Leaders: Prof. Dr. Maria Kateri, Prof. Dr.-Ing. Hans-Jürgen Koß) ↩︎