RJMCMC-burnin Gibbs Sampler (fits exactly) >> Gibbs Sampler

As Pdf

1 Data

library(spectralmc)

true_kp <- 9
fit_kp <- 9
noise_sigma <- 0.001
signal_model <- voigt.model

data <- data.synthetic.stormyclouds(seed = 53252, kp = true_kp, noise = 0.001)
x <- data$x
y <- data$y

2 Gibbs Sampler

Uses a collapsed (marginalizes over a, b, amp numerically) gibbs sampler which approximates the conditionals with slice sampling. Also re-parameterizes density with (sqrt(gwidth^2+lwidth^2), gwidth-lwidth) instead of (gwidth, lwitdh) if min(gwidth/lwidth, lwidth/gwidth) > 0.05 (otherwise transformation becomes unstable). Uses a Likelihood-ratio test to stop slice sampling early. Slice sampling rerun a few times (3-5) to generate draws from which one is sampled according to their posterior density (prevents slice sampling failure).

  • Takes 1-4s per peak and iteration
    • 22h runtime for this notebook
    • Thus only usable with a remote cluster (I wont use anymore of my google.colab quota for this)
  • short burn-in
  • start_point is essentially irrelevant (modulo the label switching problem, gelman-rubin will be terrible due to that)
  • small auto-correlation (looks like it in traces at least)
  • fits ok
  • proposal free
  • non-informative priors
    • prior_a = Unif(1e-2, 1e2)
    • prior_b = Unif(-Inf, Inf)
    • prior_amp = Unif(0, Inf)
    • prior_pos = Unif(100, 900)
    • prior_lwidth = Unif(0, Inf)
    • prior_gwidth = Unif(0, Inf)
  • pos, lwidth, gwidth prior can be chosen arbitrarily; not done here due to time constraints
max_a <- 1e-2
lower <- c(-Inf, -max_a, 0) #lower_b, lower_a, lower_amp; encodes lower bounds of prior.Unif
upper <- c(Inf, max_a, Inf) #upper_b, upper_a, upper_amp; encodes lower bounds of prior.Unif

start_params <- prop.voigt.rnd_start(seed = 33, kp = fit_kp)

res <- chains.collapsed_slicer_gibbs(
        x, y, signal_model,
        start_params, lower, upper, noise_sigma,
        iter = 1000, print_bar = TRUE, half_steps = TRUE, decorr_trafo = TRUE
)

2.1 Results

2.1.1 Fit

2.1.1.1 All

plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 50)

2.1.1.2 Burn-in

plt.fit.chain.interactive(x, y, res$samples[seq(1, 40)], signal_model, step_width = 2)

2.1.2 Traceplots

plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 10)
plt.traceplot(res$samples, "pos")
plt.traceplot(res$samples, "amp")
plt.traceplot(res$samples, "lwidth")
plt.traceplot(res$samples, "gwidth")
plt.traceplot(res$samples, "a")
plt.traceplot(res$samples, "b")

3 RJMCMC-burnin Gibbs Sampler

Gibbs Sampler starts by fitting one peak and adds a peak every add_peak_every iterations until fit_kp is reached at which point it just becomes a normal gibbs sampler. Helps resolving the peak deconvolution problem seen above and is an strong hint on the power of a full RJMCMC implementation.

  • Better fit than EM
  • Peak deconvolution (pos[1] and pos[7]) takes ages to resolve
peak_spawner <- function(){
  params <- list(
          amp = rnorm(1, 1, 0.001),
          lwidth = rnorm(1, 0.1, 0.001),
          gwidth = rnorm(1, 4, 0.001),
          pos = runif(1, 100, 900),
          a = 0,
          b = 0
  )
  return(params)
}

res <- chains.rjmcmc_like_collapsed_slicer_gibbs(
        x,
        y,
        signal_model,
        peak_spawner,
        lower,
        upper,
        noise_sigma,
        fit_kp = fit_kp,
        add_peak_every = 10,
        iter = 5000,
        print_bar = TRUE,
        half_steps = TRUE,
        decorr_trafo = TRUE,
)

3.1 Results

3.1.1 Fit

3.1.1.1 All

plt.fit.chain.interactive(x, y, res_rjmcmc$samples, signal_model, step_width = 40)

3.1.1.2 Burn-in

plt.fit.chain.interactive(x, y, res_rjmcmc$samples[seq(1, 100)], signal_model, step_width = 2)

3.1.2 Traceplots

plt.traceplot(res_rjmcmc$samples, "pos")
plt.traceplot(res_rjmcmc$samples, "amp")
plt.traceplot(res_rjmcmc$samples, "lwidth")
plt.traceplot(res_rjmcmc$samples, "gwidth")
plt.traceplot(res_rjmcmc$samples, "a")
plt.traceplot(res_rjmcmc$samples, "b")

4 Todo

  • Critical Bug amp, a, b are not sampled instead they are set to roughly their means (not really either); Easy to fix.

  • Not a bug: Uses a safeguard which prevents messing up the fit if the slice sampler fails, this however destroys the gibbs sampler property and causes rejection plateaus.

  • Done: nu, be transformation becomes bad (why?) if either lwidth or gwidth is really small compared to the other, should switch to nu,be only if min(lwidth/gwidth, gwidth/lwidth) > 0.05

  • Peak deconvolution is the central problem (i.e. fitting strongly overlapping peaks). See the second fit where only 1 Peak is misplaced. RJMCMC would most likely instantly solve this problem as it would add an peak on the right flank shortly increasing \(kp\) to 10 and then removing the misplaced peak putting \(kp\) to 9 again. This is an massive advantage that seems to be impossible to model with fixed \(kp\). If a given \(kp\) is needed the \(kp\) prior could be slowly changed to degenerate to only the given value of \(kp\).

  • Done prop.voigt.rnd_start spawns to wide peaks for gibbs (the tuning for the spawner is very impactful)

  • Do Speed ups Develop heuristic to evaluate pmvnorm fast; better slice sampler; better early termination of slice sampler. Run profiler to figure out if pmvnorm is actually the bottleneck or X construction. If it is X could use caching and speed up code by 90%


  1. RWTH Aachen, ↩︎

  2. This work was developed under a Seed Fund Project (2021) of the RWTH Aachen University, funded under "the Excellence Strategy of the Federal Government and the Länder". Project: "Spectra-Bayes: A Bayesian statistical machine learning model for spectral reconstruction" (Project Leaders: Prof. Dr. Maria Kateri, Prof. Dr.-Ing. Hans-Jürgen Koß) ↩︎

---
title: "Collapsed Gibbs Sampler with Slice Sampling"
author: Jan Meißner^[RWTH Aachen, philipp.meissner@rwth-aachen.de]
output:
  html_notebook:
    toc: true
    toc_depth: 5
    toc_float:
      collapsed: false
    number_sections: true
---

<script type="text/javascript">
  function jump_header(key){
    $(':header:contains('+key+')')[0].scrollIntoView();
  }
  
  function jump_marked(key){
    $('#' + key)[0].scrollIntoView();
  }

</script>
**RJMCMC-burnin Gibbs Sampler <a onclick='jump_marked("exactgibbs");'>(fits exactly)</a> >> Gibbs Sampler**

# Data


```{r}
library(spectralmc)

true_kp <- 9
fit_kp <- 9
noise_sigma <- 0.001
signal_model <- voigt.model

data <- data.synthetic.stormyclouds(seed = 53252, kp = true_kp, noise = 0.001)
x <- data$x
y <- data$y
```

# Gibbs Sampler
Uses a collapsed (marginalizes over `a`, `b`, `amp` numerically)
gibbs sampler which approximates the conditionals with slice sampling. Also re-parameterizes density with `(sqrt(gwidth^2+lwidth^2), gwidth-lwidth)` instead of `(gwidth, lwitdh)` if `min(gwidth/lwidth, lwidth/gwidth) > 0.05` (otherwise transformation becomes unstable). Uses a Likelihood-ratio test to stop slice sampling early. Slice sampling rerun a few times (3-5) to generate draws from which one is sampled according to their posterior density (prevents slice sampling failure).

- Takes 1-4s per peak and iteration
  - **30-40h runtime** for this notebook
  - Thus only usable with a remote cluster (I wont use anymore of my google.colab quota for this)
- short burn-in
- start_point is essentially irrelevant (modulo the label switching problem, gelman-rubin will be terrible due to that)
- small auto-correlation (looks like it in traces at least)
- fits ok
- proposal free
- non-informative priors
  - prior_a = Unif(1e-2, 1e2)
  - prior_b = Unif(-Inf, Inf)
  - prior_amp = Unif(0, Inf)
  - prior_pos = Unif(100, 900)
  - prior_lwidth = Unif(0, Inf)
  - prior_gwidth = Unif(0, Inf)
- pos, lwidth, gwidth prior can be chosen arbitrarily; not done here due to time constraints
```{r eval = FALSE}
max_a <- 1e-2
lower <- c(-Inf, -max_a, 0) #lower_b, lower_a, lower_amp; encodes lower bounds of prior.Unif
upper <- c(Inf, max_a, Inf) #upper_b, upper_a, upper_amp; encodes lower bounds of prior.Unif

start_params <- prop.voigt.rnd_start(seed = 33, kp = fit_kp)

res <- chains.collapsed_slicer_gibbs(
        x, y, signal_model,
        start_params, lower, upper, noise_sigma,
        iter = 1000, print_bar = TRUE, half_steps = TRUE, decorr_trafo = TRUE
)
```
```{r  echo=FALSE}
setwd('C:/Users/Jan/Desktop/ISW_SpectraBayes/ISW_SpectraBayes/bayes_mcmc')
res <- list()
res$samples_with_half <- readRDS("notebooks/collapsed_slicer_transformed_gibbs/samples_500_noRJMCMC.RData")
remove_half_steps <- function(samples, fit_kp) {
  samples[seq(from=1, to=length(samples), by=fit_kp + 1)]
}
res$samples <- remove_half_steps(res$samples, fit_kp = fit_kp)

```
## Results

### Fit

#### All
```{r}
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 50)
```

#### Burn-in
```{r}
plt.fit.chain.interactive(x, y, res$samples[seq(1, 40)], signal_model, step_width = 2)
```
### Traceplots
```{r}
plt.fit.chain.interactive(x, y, res$samples, signal_model, step_width = 10)
```
```{r}
plt.traceplot(res$samples, "pos")
plt.traceplot(res$samples, "amp")
plt.traceplot(res$samples, "lwidth")
plt.traceplot(res$samples, "gwidth")
plt.traceplot(res$samples, "a")
plt.traceplot(res$samples, "b")
```
# RJMCMC-burnin Gibbs Sampler
Gibbs Sampler starts by fitting one peak and adds a peak every `add_peak_every` iterations until `fit_kp` is reached at which point it just becomes a normal gibbs sampler. Helps resolving the peak deconvolution problem seen above and is an strong hint on the power of a full RJMCMC implementation.

- Better fit than EM
- Peak deconvolution (`pos[1]` and `pos[7]`) takes ages to resolve
```{r eval = FALSE}
peak_spawner <- function(){
  params <- list(
          amp = rnorm(1, 1, 0.001),
          lwidth = rnorm(1, 0.1, 0.001),
          gwidth = rnorm(1, 4, 0.001),
          pos = runif(1, 100, 900),
          a = 0,
          b = 0
  )
  return(params)
}

res <- chains.rjmcmc_like_collapsed_slicer_gibbs(
        x,
        y,
        signal_model,
        peak_spawner,
        lower,
        upper,
        noise_sigma,
        fit_kp = fit_kp,
        add_peak_every = 10,
        iter = 5000,
        print_bar = TRUE,
        half_steps = TRUE,
        decorr_trafo = TRUE,
)
```
```{r  echo=FALSE}
remove_half_steps_rjmcmc <- function(true_samples, fit_kp, add_peak_every, iter = 5000, half_steps = TRUE) {
  res_list <- list()
  tot_index <- 0

  curr <- list(pos = c(0))
  samples <- list()
  #####
  tot_index <- tot_index + 1
  if (length(true_samples)<tot_index) {return(res_list)}
  res_list <- c(res_list, list(true_samples[[tot_index]]))
  #####
  samples <- c(samples, list(curr))
  for (i in seq(iter)) {

    if (i %% add_peak_every == 0 && i > 0 && length(curr$pos) < fit_kp){
      curr$pos <- c(curr$pos, 0)
    }

    kp <- length(curr$pos)
    for (k in seq(kp)) {
      if (half_steps) {
        #####
        tot_index <- tot_index + 1
        if (length(true_samples)<tot_index) {return(res_list)}
        #res_list <- c(res_list, list(true_samples[[tot_index]]))
        #####
        samples <- c(samples, list(curr))
      } #only if half steps
    }
    #####
    tot_index <- tot_index + 1
    if (length(true_samples)<tot_index) {return(res_list)}
    res_list <- c(res_list, list(true_samples[[tot_index]]))
    #####
    samples <- c(samples, list(curr))
  }
  return(true_samples)
}

remove_half_steps <- function(samples, fit_kp) {
  samples[seq(from=1, to=length(samples), by=fit_kp + 1)]
}

pad_with_zeros <- function(samples, fit_kp){
  for (i in seq_along(samples)){
    sample <- samples[[i]]
    for (name in names(sample)){
      if (name != 'a' && name != 'b' && name != 'amp'){
        sample[[name]] <- c(rep(1, fit_kp-length(sample[[name]])), sample[[name]])
      }
      if (name == 'amp'){
        sample[[name]] <- c(rep(0, fit_kp-length(sample[[name]])), sample[[name]])
      }
    }
    samples[[i]] <- sample
  }
  samples
}
setwd('C:/Users/Jan/Desktop/ISW_SpectraBayes/ISW_SpectraBayes/bayes_mcmc')
res_rjmcmc <- list()
res_rjmcmc$samples_with_half <- readRDS("notebooks/collapsed_slicer_transformed_gibbs/samples_1_2k_1044.RData")
res_rjmcmc$sec <- readRDS("notebooks/collapsed_slicer_transformed_gibbs/samples.RData")

res_rjmcmc$samples1 <- remove_half_steps_rjmcmc(res_rjmcmc$samples_with_half, fit_kp = fit_kp, add_peak_every = 10, iter = 10000, half_steps = TRUE)
res_rjmcmc$samples2 <- remove_half_steps(res_rjmcmc$sec, fit_kp = fit_kp)

res_rjmcmc$samples1 <- pad_with_zeros(res_rjmcmc$samples1, fit_kp = fit_kp)
res_rjmcmc$samples2 <- pad_with_zeros(res_rjmcmc$samples2, fit_kp = fit_kp)

res_rjmcmc$samples <- c(res_rjmcmc$samples1, res_rjmcmc$samples2[seq(2, length(res_rjmcmc$samples2))])

# 1-2k has 24h execution but due to infer_c slower take 15h
```
## Results

### Fit
#### All
<div id="exactgibbs"></div>
```{r}
plt.fit.chain.interactive(x, y, res_rjmcmc$samples, signal_model, step_width = 40)
```
#### Burn-in
```{r}
plt.fit.chain.interactive(x, y, res_rjmcmc$samples[seq(1, 100)], signal_model, step_width = 2)
```
### Traceplots
```{r}
plt.traceplot(res_rjmcmc$samples, "pos")
plt.traceplot(res_rjmcmc$samples, "amp")
plt.traceplot(res_rjmcmc$samples, "lwidth")
plt.traceplot(res_rjmcmc$samples, "gwidth")
plt.traceplot(res_rjmcmc$samples, "a")
plt.traceplot(res_rjmcmc$samples, "b")
```

# Todo
- **Critical Bug** amp, a, b are not sampled instead they are set to roughly their means (not really either); Easy to fix.
- **Not a bug**: Uses a safeguard which prevents messing up the fit if the slice sampler fails, this however destroys the gibbs sampler property and causes rejection plateaus.
- **Done**: nu, be transformation becomes bad (why?) if either lwidth or gwidth is really small compared to the other, should switch to nu,be only if `min(lwidth/gwidth, gwidth/lwidth) > 0.05`
<!--
- Unclean: Instead of trying to regularize the -Inf in logprobs away just reject the slice sampler and rerun until it does find a new point.
- How to deal with peak degeneracy? i.e. same pos gwidth lwidth but amps are split between two peaks -> runs deeper than this connected to peak deconvolution problem -> increase fit_kp slowly like in EM (but then this isnt a gibbs anymore)
- Peak deconvolution is actually the last massive problem, that is when peaks strongly interact with each other and the posterior becomes strongly correlated. The gibbs sampler becomes as usual very slow. Not possible to fix with hastings as proposal is impossible to find. A subproblem of that is that sometimes a single peak is fit with as many as 3 peaks, gibbs (and hastings) can't resolve this issue easily. One possible way to fix this is to set amp prior that penalizes larger `amps`; gaussian trunct prior could be used but not very satisfactory prior other priors cant really be used due to collapse
- **Idea:**(!amp is not height of peak! amp*Voigt(...), voigt has an width dependent height factor aswell!) Set prior on amp to something like $$\mathcal{N}(x | -5, sd=10) 1_{x \in [0, \infty)} $$ to penalize larger amps such that there is a local gradient in the posterior that rewards fittings peaks with as little peaks as possible.
-->
- Peak deconvolution is the central problem (i.e. fitting strongly overlapping peaks). See the second fit where only 1 Peak is misplaced. RJMCMC would most likely instantly solve this problem as it would *add* an peak on the right flank shortly increasing $kp$ to 10 and then removing the misplaced peak putting $kp$ to 9 again. This is an massive advantage that seems to be impossible to model with fixed $kp$. If a given $kp$ is needed the $kp$ prior could be slowly changed to degenerate to only the given value of $kp$.

- **Done** prop.voigt.rnd_start spawns to wide peaks for gibbs (the tuning for the spawner is very impactful)
- **Do Speed ups** Develop heuristic to evaluate pmvnorm fast; better slice sampler; better early termination of slice sampler. Run profiler to figure out if pmvnorm is actually the bottleneck or X construction. If it is X could use caching and speed up code by 90%
