RJMCMC-burnin Gibbs Sampler
(fits exactly) >>
Gibbs Sampler
As Pdf
Data
library(spectralmc)
true_kp <- 9
fit_kp <- 9
noise_sigma <- 0.001
signal_model <- voigt.model
data <- data.synthetic.stormyclouds(seed = 53252, kp = true_kp, noise = 0.001)
x <- data$x
y <- data$y
Gibbs Sampler
Uses a collapsed (marginalizes over a
, b
,
amp
numerically) gibbs sampler which approximates the
conditionals with slice sampling. Also re-parameterizes density with
(sqrt(gwidth^2+lwidth^2), gwidth-lwidth)
instead of
(gwidth, lwitdh)
if
min(gwidth/lwidth, lwidth/gwidth) > 0.05
(otherwise
transformation becomes unstable). Uses a Likelihood-ratio test to stop
slice sampling early. Slice sampling rerun a few times (3-5) to generate
draws from which one is sampled according to their posterior density
(prevents slice sampling failure).
- Takes 1-4s per peak and iteration
- 22h runtime for this notebook
- Thus only usable with a remote cluster (I wont use anymore of my
google.colab quota for this)
- short burn-in
- start_point is essentially irrelevant (modulo the label switching
problem, gelman-rubin will be terrible due to that)
- small auto-correlation (looks like it in traces at least)
- fits ok
- proposal free
- non-informative priors
- prior_a = Unif(1e-2, 1e2)
- prior_b = Unif(-Inf, Inf)
- prior_amp = Unif(0, Inf)
- prior_pos = Unif(100, 900)
- prior_lwidth = Unif(0, Inf)
- prior_gwidth = Unif(0, Inf)
- pos, lwidth, gwidth prior can be chosen arbitrarily; not done here
due to time constraints
max_a <- 1e-2
lower <- c(-Inf, -max_a, 0) #lower_b, lower_a, lower_amp; encodes lower bounds of prior.Unif
upper <- c(Inf, max_a, Inf) #upper_b, upper_a, upper_amp; encodes lower bounds of prior.Unif
start_params <- prop.voigt.rnd_start(seed = 33, kp = fit_kp)
res <- chains.collapsed_slicer_gibbs(
x, y, signal_model,
start_params, lower, upper, noise_sigma,
iter = 1000, print_bar = TRUE, half_steps = TRUE, decorr_trafo = TRUE
)
Results
Fit
All
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 50)
Burn-in
plt.fit.chain.interactive(x, y, res$samples[seq(1, 40)], signal_model, step_width = 2)
Traceplots
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 10)
plt.traceplot(res$samples, "pos")
plt.traceplot(res$samples, "amp")
plt.traceplot(res$samples, "lwidth")
plt.traceplot(res$samples, "gwidth")
plt.traceplot(res$samples, "a")
plt.traceplot(res$samples, "b")
RJMCMC-burnin Gibbs
Sampler
Gibbs Sampler starts by fitting one peak and adds a peak every
add_peak_every
iterations until fit_kp
is
reached at which point it just becomes a normal gibbs sampler. Helps
resolving the peak deconvolution problem seen above and is an strong
hint on the power of a full RJMCMC implementation.
- Better fit than EM
- Peak deconvolution (
pos[1]
and pos[7]
)
takes ages to resolve
peak_spawner <- function(){
params <- list(
amp = rnorm(1, 1, 0.001),
lwidth = rnorm(1, 0.1, 0.001),
gwidth = rnorm(1, 4, 0.001),
pos = runif(1, 100, 900),
a = 0,
b = 0
)
return(params)
}
res <- chains.rjmcmc_like_collapsed_slicer_gibbs(
x,
y,
signal_model,
peak_spawner,
lower,
upper,
noise_sigma,
fit_kp = fit_kp,
add_peak_every = 10,
iter = 5000,
print_bar = TRUE,
half_steps = TRUE,
decorr_trafo = TRUE,
)
Results
Fit
All
plt.fit.chain.interactive(x, y, res_rjmcmc$samples, signal_model, step_width = 40)
Burn-in
plt.fit.chain.interactive(x, y, res_rjmcmc$samples[seq(1, 100)], signal_model, step_width = 2)
Traceplots
plt.traceplot(res_rjmcmc$samples, "pos")
plt.traceplot(res_rjmcmc$samples, "amp")
plt.traceplot(res_rjmcmc$samples, "lwidth")
plt.traceplot(res_rjmcmc$samples, "gwidth")
plt.traceplot(res_rjmcmc$samples, "a")
plt.traceplot(res_rjmcmc$samples, "b")
Todo
Critical Bug amp, a, b are not sampled instead
they are set to roughly their means (not really either); Easy to
fix.
Not a bug: Uses a safeguard which prevents
messing up the fit if the slice sampler fails, this however destroys the
gibbs sampler property and causes rejection plateaus.
Done: nu, be transformation becomes bad (why?)
if either lwidth or gwidth is really small compared to the other, should
switch to nu,be only if
min(lwidth/gwidth, gwidth/lwidth) > 0.05
Peak deconvolution is the central problem (i.e. fitting strongly
overlapping peaks). See the second fit where only 1 Peak is misplaced.
RJMCMC would most likely instantly solve this problem as it would
add an peak on the right flank shortly increasing \(kp\) to 10 and then removing the misplaced
peak putting \(kp\) to 9 again. This is
an massive advantage that seems to be impossible to model with fixed
\(kp\). If a given \(kp\) is needed the \(kp\) prior could be slowly changed to
degenerate to only the given value of \(kp\).
Done prop.voigt.rnd_start spawns to wide peaks
for gibbs (the tuning for the spawner is very impactful)
Do Speed ups Develop heuristic to evaluate
pmvnorm fast; better slice sampler; better early termination of slice
sampler. Run profiler to figure out if pmvnorm is actually the
bottleneck or X construction. If it is X could use caching and speed up
code by 90%
---
title: "Collapsed Gibbs Sampler with Slice Sampling"
author: Jan Meißner^[RWTH Aachen, philipp.meissner@rwth-aachen.de]
output:
  html_notebook:
    toc: true
    toc_depth: 5
    toc_float:
      collapsed: false
    number_sections: true
---

<script type="text/javascript">
  function jump_header(key){
    $(':header:contains('+key+')')[0].scrollIntoView();
  }
  
  function jump_marked(key){
    $('#' + key)[0].scrollIntoView();
  }

</script>
**RJMCMC-burnin Gibbs Sampler <a onclick='jump_marked("exactgibbs");'>(fits exactly)</a> >> Gibbs Sampler**

# Data


```{r}
library(spectralmc)

true_kp <- 9
fit_kp <- 9
noise_sigma <- 0.001
signal_model <- voigt.model

data <- data.synthetic.stormyclouds(seed = 53252, kp = true_kp, noise = 0.001)
x <- data$x
y <- data$y
```

# Gibbs Sampler
Uses a collapsed (marginalizes over `a`, `b`, `amp` numerically)
gibbs sampler which approximates the conditionals with slice sampling. Also re-parameterizes density with `(sqrt(gwidth^2+lwidth^2), gwidth-lwidth)` instead of `(gwidth, lwitdh)` if `min(gwidth/lwidth, lwidth/gwidth) > 0.05` (otherwise transformation becomes unstable). Uses a Likelihood-ratio test to stop slice sampling early. Slice sampling rerun a few times (3-5) to generate draws from which one is sampled according to their posterior density (prevents slice sampling failure).

- Takes 1-4s per peak and iteration
  - **30-40h runtime** for this notebook
  - Thus only usable with a remote cluster (I wont use anymore of my google.colab quota for this)
- short burn-in
- start_point is essentially irrelevant (modulo the label switching problem, gelman-rubin will be terrible due to that)
- small auto-correlation (looks like it in traces at least)
- fits ok
- proposal free
- non-informative priors
  - prior_a = Unif(1e-2, 1e2)
  - prior_b = Unif(-Inf, Inf)
  - prior_amp = Unif(0, Inf)
  - prior_pos = Unif(100, 900)
  - prior_lwidth = Unif(0, Inf)
  - prior_gwidth = Unif(0, Inf)
- pos, lwidth, gwidth prior can be chosen arbitrarily; not done here due to time constraints
```{r eval = FALSE}
max_a <- 1e-2
lower <- c(-Inf, -max_a, 0) #lower_b, lower_a, lower_amp; encodes lower bounds of prior.Unif
upper <- c(Inf, max_a, Inf) #upper_b, upper_a, upper_amp; encodes lower bounds of prior.Unif

start_params <- prop.voigt.rnd_start(seed = 33, kp = fit_kp)

res <- chains.collapsed_slicer_gibbs(
        x, y, signal_model,
        start_params, lower, upper, noise_sigma,
        iter = 1000, print_bar = TRUE, half_steps = TRUE, decorr_trafo = TRUE
)
```
```{r  echo=FALSE}
setwd('C:/Users/Jan/Desktop/ISW_SpectraBayes/ISW_SpectraBayes/bayes_mcmc')
res <- list()
res$samples_with_half <- readRDS("notebooks/collapsed_slicer_transformed_gibbs/samples_500_noRJMCMC.RData")
remove_half_steps <- function(samples, fit_kp) {
  samples[seq(from=1, to=length(samples), by=fit_kp + 1)]
}
res$samples <- remove_half_steps(res$samples, fit_kp = fit_kp)

```
## Results

### Fit

#### All
```{r}
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 50)
```

#### Burn-in
```{r}
plt.fit.chain.interactive(x, y, res$samples[seq(1, 40)], signal_model, step_width = 2)
```
### Traceplots
```{r}
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 10)
```
```{r}
plt.traceplot(res$samples, "pos")
plt.traceplot(res$samples, "amp")
plt.traceplot(res$samples, "lwidth")
plt.traceplot(res$samples, "gwidth")
plt.traceplot(res$samples, "a")
plt.traceplot(res$samples, "b")
```
# RJMCMC-burnin Gibbs Sampler
Gibbs Sampler starts by fitting one peak and adds a peak every `add_peak_every` iterations until `fit_kp` is reached at which point it just becomes a normal gibbs sampler. Helps resolving the peak deconvolution problem seen above and is an strong hint on the power of a full RJMCMC implementation.

- Better fit than EM
- Peak deconvolution (`pos[1]` and `pos[7]`) takes ages to resolve
```{r eval = FALSE}
peak_spawner <- function(){
  params <- list(
          amp = rnorm(1, 1, 0.001),
          lwidth = rnorm(1, 0.1, 0.001),
          gwidth = rnorm(1, 4, 0.001),
          pos = runif(1, 100, 900),
          a = 0,
          b = 0
  )
  return(params)
}

res <- chains.rjmcmc_like_collapsed_slicer_gibbs(
        x,
        y,
        signal_model,
        peak_spawner,
        lower,
        upper,
        noise_sigma,
        fit_kp = fit_kp,
        add_peak_every = 10,
        iter = 5000,
        print_bar = TRUE,
        half_steps = TRUE,
        decorr_trafo = TRUE,
)
```
```{r  echo=FALSE}
remove_half_steps_rjmcmc <- function(true_samples, fit_kp, add_peak_every, iter = 5000, half_steps = TRUE) {
  res_list <- list()
  tot_index <- 0

  curr <- list(pos = c(0))
  samples <- list()
  #####
  tot_index <- tot_index + 1
  if (length(true_samples)<tot_index) {return(res_list)}
  res_list <- c(res_list, list(true_samples[[tot_index]]))
  #####
  samples <- c(samples, list(curr))
  for (i in seq(iter)) {

    if (i %% add_peak_every == 0 && i > 0 && length(curr$pos) < fit_kp){
      curr$pos <- c(curr$pos, 0)
    }

    kp <- length(curr$pos)
    for (k in seq(kp)) {
      if (half_steps) {
        #####
        tot_index <- tot_index + 1
        if (length(true_samples)<tot_index) {return(res_list)}
        #res_list <- c(res_list, list(true_samples[[tot_index]]))
        #####
        samples <- c(samples, list(curr))
      } #only if half steps
    }
    #####
    tot_index <- tot_index + 1
    if (length(true_samples)<tot_index) {return(res_list)}
    res_list <- c(res_list, list(true_samples[[tot_index]]))
    #####
    samples <- c(samples, list(curr))
  }
  return(true_samples)
}

remove_half_steps <- function(samples, fit_kp) {
  samples[seq(from=1, to=length(samples), by=fit_kp + 1)]
}

pad_with_zeros <- function(samples, fit_kp){
  for (i in seq_along(samples)){
    sample <- samples[[i]]
    for (name in names(sample)){
      if (name != 'a' && name != 'b' && name != 'amp'){
        sample[[name]] <- c(rep(1, fit_kp-length(sample[[name]])), sample[[name]])
      }
      if (name == 'amp'){
        sample[[name]] <- c(rep(0, fit_kp-length(sample[[name]])), sample[[name]])
      }
    }
    samples[[i]] <- sample
  }
  samples
}
setwd('C:/Users/Jan/Desktop/ISW_SpectraBayes/ISW_SpectraBayes/bayes_mcmc')
res_rjmcmc <- list()
res_rjmcmc$samples_with_half <- readRDS("notebooks/collapsed_slicer_transformed_gibbs/samples_1_2k_1044.RData")
res_rjmcmc$sec <- readRDS("notebooks/collapsed_slicer_transformed_gibbs/samples.RData")

res_rjmcmc$samples1 <- remove_half_steps_rjmcmc(res_rjmcmc$samples_with_half, fit_kp = fit_kp, add_peak_every = 10, iter = 10000, half_steps = TRUE)
res_rjmcmc$samples2 <- remove_half_steps(res_rjmcmc$sec, fit_kp = fit_kp)

res_rjmcmc$samples1 <- pad_with_zeros(res_rjmcmc$samples1, fit_kp = fit_kp)
res_rjmcmc$samples2 <- pad_with_zeros(res_rjmcmc$samples2, fit_kp = fit_kp)

res_rjmcmc$samples <- c(res_rjmcmc$samples1, res_rjmcmc$samples2[seq(2, length(res_rjmcmc$samples2))])

# 1-2k has 24h execution but due to infer_c slower take 15h
```
## Results

### Fit
#### All
<div id="exactgibbs"></div>
```{r}
plt.fit.chain.interactive(x, y, res_rjmcmc$samples, signal_model, step_width = 40)
```
#### Burn-in
```{r}
plt.fit.chain.interactive(x, y, res_rjmcmc$samples[seq(1, 100)], signal_model, step_width = 2)
```
### Traceplots
```{r}
plt.traceplot(res_rjmcmc$samples, "pos")
plt.traceplot(res_rjmcmc$samples, "amp")
plt.traceplot(res_rjmcmc$samples, "lwidth")
plt.traceplot(res_rjmcmc$samples, "gwidth")
plt.traceplot(res_rjmcmc$samples, "a")
plt.traceplot(res_rjmcmc$samples, "b")
```

# Todo
- **Critical Bug** amp, a, b are not sampled instead they are set to roughly their means (not really either); Easy to fix.
- **Not a bug**: Uses a safeguard which prevents messing up the fit if the slice sampler fails, this however destroys the gibbs sampler property and causes rejection plateaus.
- **Done**: nu, be transformation becomes bad (why?) if either lwidth or gwidth is really small compared to the other, should switch to nu,be only if `min(lwidth/gwidth, gwidth/lwidth) > 0.05`
<!--
- Unclean: Instead of trying to regularize the -Inf in logprobs away just reject the slice sampler and rerun until it does find a new point.
- How to deal with peak degeneracy? i.e. same pos gwidth lwidth but amps are split between two peaks -> runs deeper than this connected to peak deconvolution problem -> increase fit_kp slowly like in EM (but then this isnt a gibbs anymore)
- Peak deconvolution is actually the last massive problem, that is when peaks strongly interact with each other and the posterior becomes strongly correlated. The gibbs sampler becomes as usual very slow. Not possible to fix with hastings as proposal is impossible to find. A subproblem of that is that sometimes a single peak is fit with as many as 3 peaks, gibbs (and hastings) can't resolve this issue easily. One possible way to fix this is to set amp prior that penalizes larger `amps`; gaussian trunct prior could be used but not very satisfactory prior other priors cant really be used due to collapse
- **Idea:**(!amp is not height of peak! amp*Voigt(...), voigt has an width dependent height factor aswell!) Set prior on amp to something like $$\mathcal{N}(x | -5, sd=10) 1_{x \in [0, \infty)} $$ to penalize larger amps such that there is a local gradient in the posterior that rewards fittings peaks with as little peaks as possible.
-->
- Peak deconvolution is the central problem (i.e. fitting strongly overlapping peaks). See the second fit where only 1 Peak is misplaced. RJMCMC would most likely instantly solve this problem as it would *add* an peak on the right flank shortly increasing $kp$ to 10 and then removing the misplaced peak putting $kp$ to 9 again. This is an massive advantage that seems to be impossible to model with fixed $kp$. If a given $kp$ is needed the $kp$ prior could be slowly changed to degenerate to only the given value of $kp$.

- **Done** prop.voigt.rnd_start spawns to wide peaks for gibbs (the tuning for the spawner is very impactful)
- **Do Speed ups** Develop heuristic to evaluate pmvnorm fast; better slice sampler; better early termination of slice sampler. Run profiler to figure out if pmvnorm is actually the bottleneck or X construction. If it is X could use caching and speed up code by 90%
