Help for package BayesSIM

Type:

Package

Title:

Integrated Interface of Bayesian Single Index Models using 'nimble'

Version:

0.2.1

Author:

Seowoo Jung [aut, cre], Eun-kyung Lee [aut]

Maintainer:

Seowoo Jung <jsw1347@ewha.ac.kr>

Description:

Provides tools for fitting Bayesian single index models with flexible choices of priors for both the index and the link function. The package implements model estimation and posterior inference using efficient MCMC algorithms built on the 'nimble' framework, allowing users to specify, extend, and simulate models in a unified and reproducible manner. The following methods are implemented in the package: Antoniadis et al. (2004) https://www.jstor.org/stable/24307224, Wang (2009) <doi:10.1016/j.csda.2008.12.010>, Choi et al. (2011) <doi:10.1080/10485251003768019>, Dhara et al. (2019) <doi:10.1214/19-BA1170>, McGee et al. (2023) <doi:10.1111/biom.13569>.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Depends:

R (≥ 4.1), nimble

Imports:

coda, magrittr, MASS, methods, mvtnorm, patchwork, tidyr, ggplot2, dplyr

Suggests:

testthat (≥ 3.0.0)

Config/testthat/edition:

Encoding:

UTF-8

LazyData:

true

NeedsCompilation:

Packaged:

2025-12-14 15:05:06 UTC; user

RoxygenNote:

7.3.3

Repository:

CRAN

Date/Publication:

2025-12-15 08:10:14 UTC

BayesSIM: Integrated Interface of Bayesian Single Index Models using 'nimble'

Description

Provides tools for fitting Bayesian single index models with flexible choices of priors for both the index and the link function. The package implements model estimation and posterior inference using efficient MCMC algorithms built on the 'nimble' framework, allowing users to specify, extend, and simulate models in a unified and reproducible manner. The following methods are implemented in the package: Antoniadis et al. (2004) https://www.jstor.org/stable/24307224, Wang (2009) doi:10.1016/j.csda.2008.12.010, Choi et al. (2011) doi:10.1080/10485251003768019, Dhara et al. (2019) doi:10.1214/19-BA1170, McGee et al. (2023) doi:10.1111/biom.13569.

Author(s)

Maintainer: Seowoo Jung jsw1347@ewha.ac.kr

Authors:

Eun-kyung Lee lee.eunk@ewha.ac.kr

Integrated Function for Bayesian Single-Index Regression

Description

Fitting a single–index model Y_i \sim \mathcal{N}(f(X_i'\theta), \sigma^2), i = 1,\cdots,n in single integrated function.

Usage

BayesSIM(
  formula,
  data,
  indexprior = "fisher",
  link = "bspline",
  prior = NULL,
  init = NULL,
  method = "FB",
  lowerB = NULL,
  upperB = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

BayesSIM_setup(
  formula,
  data,
  indexprior = "fisher",
  link = "bspline",
  prior = NULL,
  init = NULL,
  method = "FB",
  lowerB = NULL,
  upperB = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

## S3 method for class 'bsim'
print(x, digits = 3, ...)

Arguments

formula

an object of class formula. Interaction term is not allowed for single-index model.

data

an data frame.

indexprior

Index vector prior among "fisher" (default), "sphere", "polar", "spike".

link

Link function among "bspline" (default), "gp"

prior

Optional named list of prior settings. Further descriptions are in every specific model's Details section.

init

Optional named list of initial values. If the values are not assigned, they are randomly sampled from prior or designated value. Further descriptions are in every specific model's Details section.

method

Character, gpSphere model has 3 different types of sampling method, fully Bayesian method ("FB"), empirical Bayes approach ("EB"), and empirical Gibbs sampler ("EG"). Assign one sampler method. Empirical sampling approach is recommended in high-dimensional data. By default, fully Bayesian approach is assigned.

lowerB

This parameter is only for gpSphere model. Numeric vector of element-wise lower bounds for the "L-BFGS-B" method. When the empirical Bayes or Gibbs sampler method is used, the marginal likelihood is optimized via optim(method = "L-BFGS-B"). The vector must be ordered as c(index vector, lengthscale, amp, sigma2). Note that sigma2 is included only for the empirical Bayes method (omit it for Gibbs). By default, the lower bounds are -1 for the index vector and -1e2 for logarithm of lengthscale, amp, and (if present) sigma2.

upperB

This parameter is only for gpSphere model. Numeric vector of element-wise upper bounds for the "L-BFGS-B" method. When the empirical Bayes or Gibbs sampler method is used, the marginal likelihood is optimized via optim(method = "L-BFGS-B"). The vector must be ordered as c(index vector, lengthscale, amp, sigma2). Note that sigma2 is included only for the empirical Bayes method (omit it for Gibbs). By default, the upper bounds are 1 for the index vector and 1e2 for logarithm of lengthscale, amp, and (if present) sigma2.

monitors

Optional character vector of additional monitor nodes. To check the names of the nodes, fit the model_setup function and then inspect the variable names stored in the model object using getVarMonitor.

niter

Integer. Total MCMC iterations (default 10000).

nburnin

Integer. Burn-in iterations (default 1000).

thin

Integer. Thinning for monitors (default 1).

nchain

Integer. Number of MCMC chains (default 1).

setSeed

Logical or numeric argument. Further details are provided in runMCMC setSeed argument.

x

A fitted BayesSIM object.

digits

Number of digits to display.

...

Additional arguments.

Details

Integrated function for Bayesian single-index model. Default model is von-Mises Fisher distribution for index vector with B-spline link function.

Value

A list typically containing:

coefficients: Mean estimates of index vector. Return list of model_setup does not include it.
ses: Mean standard error of index vector. Return list of model_setup does not include it.
residuals: Residuals with mean estimates of fitted values. Return list of model_setup does not include it.
fitted.values: Mean estimates of fitted values. Return list of model_setup does not include it.
linear.predictors: Mean estimates of single-index values. Return list of model_setup does not include it.
gof: Goodness-of-fit. Return list of model_setup function does not include it.
samples: Posterior draws of variables for computing fitted values of the model, including \theta, \sigma^2. Return list of model_setup does not include it.
input: List of data used in modeling, formula, input values for prior and initial values, and computation time without compiling.
defModel: Nimble model object.
defSampler: Nimble sampler object.
modelName: Name of the model.

Examples


set.seed(123)
n <- 200; d <- 4
theta <- c(2, 1, 1, 1); theta <- theta / sqrt(sum(theta^2))
f <- function(u) u^2 * exp(u)
sigma <- 0.5
X <- matrix(runif(n * d, -1, 1), nrow = n)
index_vals <- as.vector(X %*% theta)
y <- f(index_vals) + rnorm(n, 0, sigma)
simdata <- data.frame(x = X, y = y)
colnames(simdata) <- c(paste0("X", 1:4), "y")

# One tool version - bsFisher
fit1 <- BayesSIM(y ~ ., data = simdata,
                 niter = 5000, nburnin = 1000,
                 nchain = 1)

# Split version- bsFisher
models <- BayesSIM_setup(y ~ ., data = simdata)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                   nchains = 1, setSeed = TRUE, inits = initList,
                   summary = TRUE, samplesAsCodaMCMC = TRUE)
fit2 <- as_bsim(models, mcmc.out)
summary(fit2)

Simulated Single–index Data (n = 200, p = 4)

Description

Synthetic data from a single–index model y = f(X'\theta) + \varepsilon with f(u) = u^2 \exp(u) and \varepsilon \sim N(0,\sigma^2). The index vector is \theta = (2,1,1,1) / \| (2,1,1,1) \|_2 and \sigma = 0.5.

Usage

data(DATA1)

Format

X: Numeric matrix of size 200 \times 4, entries i.i.d. Unif(-1, 1)

y: Numeric vector of length 200.

Examples

data(DATA1)
str(DATA1)

Goodness of Fit for `BayesSIM`

Description

Generic function applied to BayesSIM. It extracts goodness of fit of the BayesSIM.

Usage

GOF(object)

## S3 method for class 'bsim'
GOF(object, ...)

Arguments

object

A fitted object of BayesSIM or individual model.

...

Additional arguments passed to other methods.

Value

Mean squared error of model with mean of MCMC sample is applied.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)

# 1. One tool version
fit_one <- BayesSIM(y ~ ., data = simdata2,
                    niter = 5000, nburnin = 1000, nchain = 1)

# Check median index vector estimates with standard errors
coef(fit_one, method = "median", se = TRUE)

# Fitted index values of median prediction
fitted(fit_one, type = "linpred", method = "median")

# Residuals of median prediction
residuals(fit_one, method = "median")

# Summary of the model
summary(fit_one)

# Convergence diagnostics
nimTraceplot(fit_one)

# Goodness of fit
GOF(fit_one)

# Fitted plot
plot(fit_one)

# Prediction with 95% credible interval at new data
newx <- data.frame(X1 = rnorm(3), X2 = rnorm(3), X3 = rnorm(3), X4 = rnorm(3))
pred <- predict(fit_one, newdata = newx, interval = "credible", level = 0.95)
plot(pred)


# 2. Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

Additional Functions for Spline

Description

Additional Functions for Spline

Usage

SplineState

Format

An object of class list of length 1.

Construct a Fitted Model Object from Model Setup and MCMC Output

Description

Create a fitted bsim object by combining a BayesSIM setup object with MCMC samples returned by runMCMC().

Usage

as_bsim(setup, mcmc.out)

Arguments

setup

A BayesSIM setup object, typically the output of a _setup function.

mcmc.out

MCMC output corresponding to the result of a call to runMCMC().

Details

This function is mainly intended for workflows where the model structure and the MCMC sampling are performed separately. It collects the MCMC draws across chains, and returns an object of class "bsim" that can be used with generic functions such as summary(), plot(), and predict().

Value

An object of class "bsim" containing posterior samples, point estimates, fitted values, and related model information.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                   nchains = 1, setSeed = TRUE, inits = initList,
                   summary = TRUE, samplesAsCodaMCMC = TRUE)
fit2 <- as_bsim(models, mcmc.out)
summary(fit2)

Bayesian Single-Index Regression with B-Spline Link and von Mises-Fisher Prior

Description

Fits a single-index model Y_i \sim \mathcal{N}(f(X_i'\theta), \sigma^2), i = 1,\cdots,n where the link f(\cdot) is represented by B-spline and the index vector \theta has von Mises–Fisher prior.

Usage

bsFisher(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

bsFisher_setup(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

Arguments

formula

an object of class formula. Interaction term is not allowed for single-index model.

data

an data frame.

prior

Optional named list of prior settings. Further descriptions are in Details section.

init

Optional named list of initial values. If the values are not assigned, they are randomly sampled from prior or designated value. Further descriptions are in Details section.

monitors

niter

Integer. Total MCMC iterations (default 10000).

nburnin

Integer. Burn-in iterations (default 1000).

thin

Integer. Thinning for monitors (default 1).

nchain

Integer. Number of MCMC chains (default 1).

setSeed

Logical or numeric argument. Further details are provided in runMCMC setSeed argument.

Details

Model The single–index model uses a m-order polynomial spline with k interior knots as follows:

f(t) = \sum_{j=1}^{m+k} B_j(t)\,\beta_j

on [a, b] with t_i = X_i' \theta, i = 1,\cdots, n and \|\theta\|_2 = 1. \{\beta_j\}_{j=1}^{m+k} are spline coefficients and a_\theta, b_\theta are boundary knots where a_{\theta} = min(t_i, i = 1, \cdots, n) - \delta, and b_{\theta} = max(t_i, i = 1,\cdots, n) + \delta.

Priors

von Mises–Fisher prior on the index \theta with direction and concentration.
Conditioned on \theta and \sigma^2, the link coefficients \beta = (\beta_1,\ldots,\beta_{m+k})^\top follow normal distribution with estimated mean vector \hat{\beta}_{\theta} = (X_{\theta}'X_{\theta})^{-1}X_{\theta}'Y and covariance \sigma^2 (X_{\theta}^\top X_{\theta})^{-1}, where X_{\theta} is the B-spline basis design matrix.
Inverse gamma prior on \sigma^2 with shape parameter a_\sigma and rate parameter b_\sigma.

Sampling Random walk metropolis algorithm is used for index vector \theta. Given \theta, \sigma^2 and \beta are sampled from posterior distribution. Further sampling method is described in Antoniadis et al(2004).

Prior hyper-parameters These are the prior hyper-parameters set in the function. You can define new values for each parameter in prior_param.

Index vector: von Mises–Fisher prior for the projection vector \theta (index). index_direction is a unit direction vector of the von Mises–Fisher distribution. By default, the estimated vector from projection pursuit regression is assigned. index_dispersion is the positive concentration parameter. By default, 150 is assigned.
Link function: B-spline basis and coefficient of B-spline setup.
- basis: For the basis of B-spline, link_basis_df is the number of basis functions (default 21), link_basis_degree is the spline degree (default 2) and link_basis_delta is a small jitter for boundary knots spacing control (default 0.001).
- beta: For the coefficient of B-spline, multivariate normal prior is assigned with mean link_beta_mu, and covariance link_beta_cov. By default, \mathcal{N}_p\!\big(0, \mathrm{I}_p\big) is assigned.
Error variance (sigma2): An Inverse gamma prior is assigned to \sigma^2 where sigma2_shape is shape parameter and sigma2_rate is rate parameter of inverse gamma distribution. (default sigma2_shape = 0.001, sigma2_rate = 100)

Initial values These are the initial values set in the function. You can define new values for each initial value in init_param.

Index vector: Initial unit index vector \theta. By default, the vector is randomly sampled from the von Mises–Fisher prior.
Link function: Initial spline coefficients (link_beta). By default, \big(X_{\theta}^\top X_{\theta} + \rho\, \mathrm{I}\big)^{-1} X_{\theta}^\top Y is computed, where X_{\theta} is the B-spline basis design matrix.
Error variance (sigma2): Initial scalar error variance (default 0.01).

Value

A list typically containing:

coefficients: Mean estimates of index vector. Return list of model_setup does not include it.
ses: Mean standard error of index vector. Return list of model_setup does not include it.
residuals: Residuals with mean estimates of fitted values. Return list of model_setup does not include it.
fitted.values: Mean estimates of fitted values. Return list of model_setup does not include it.
linear.predictors: Mean estimates of single-index values. Return list of model_setup does not include it.
gof: Goodness-of-fit. Return list of model_setup function does not include it.
samples: Posterior draws of variables for computing fitted values of the model, including \theta, \sigma^2. Return list of model_setup does not include it.
input: List of data used in modeling, formula, input values for prior and initial values, and computation time without compiling.
defModel: Nimble model object.
defSampler: Nimble sampler object.
modelName: Name of the model.

References

Antoniadis, A., Grégoire, G., & McKeague, I. W. (2004). Bayesian estimation in single-index models. Statistica Sinica, 1147-1164.

Hornik, K., & Grün, B. (2014). movMF: an R package for fitting mixtures of von Mises-Fisher distributions. Journal of Statistical Software, 58, 1-31.

Examples


set.seed(123)
n <- 200; d <- 4
theta <- c(2, 1, 1, 1); theta <- theta / sqrt(sum(theta^2))
f <- function(u) u^2 * exp(u)
sigma <- 0.5
X <- matrix(runif(n * d, -1, 1), nrow = n)
index_vals <- as.vector(X %*% theta)
y <- f(index_vals) + rnorm(n, 0, sigma)
simdata <- data.frame(x = X, y = y)
colnames(simdata) <- c(paste0("X", 1:4), "y")

# One tool version
fit1 <- bsFisher(y ~ ., data = simdata,
                 niter = 5000, nburnin = 1000, nchain = 1)

# Split version
models <- bsFisher_setup(y ~ ., data = simdata)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                   nchains = 1, setSeed = TRUE, inits = initList,
                   summary = TRUE, samplesAsCodaMCMC = TRUE)
fit2 <- as_bsim(models, mcmc.out)
summary(fit2)

Bayesian Single-Index Regression with B-Spline Link and One-to-One Polar Transformation

Description

Fits a single-index model Y_i \sim \mathcal{N}(f(X_i'\theta), \sigma^2), i = 1,\cdots,n where the link f(\cdot) is represented by B-spline link function and the index vector \theta is parameterized on the unit sphere via a one-to-one polar transformation.

Usage

bsPolar(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

bsPolar_setup(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

Arguments

formula

an object of class formula. Interaction term is not allowed for single-index model.

data

an data frame.

prior

Optional named list of prior settings. Further descriptions are in Details section.

init

Optional named list of initial values. If the values are not assigned, they are randomly sampled from prior or designated value. Further descriptions are in Details section.

monitors

niter

Integer. Total MCMC iterations (default 10000).

nburnin

Integer. Burn-in iterations (default 1000).

thin

Integer. Thinning for monitors (default 1).

nchain

Integer. Number of MCMC chains (default 1).

setSeed

Logical or numeric argument. Further details are provided in runMCMC setSeed argument.

Details

Model The single–index model uses a m-order polynomial spline with k interior knots as follows:

f(t) = \sum_{j=1}^{m+k} B_j(t)\,\beta_j

on [a, b] with t_i = X_i' \theta, i = 1,\cdots, n and \|\theta\|_2 = 1. \{\beta_j\}_{j=1}^{m+k} are spline coefficient and a_\theta, b_\theta are boundary knots where a_\theta = min(t_i, i = 1, \cdots, n) - \delta, and b_\theta = max(t_i, i = 1,\cdots, n) + \delta. \theta lies on the unit sphere (\|\theta\|_2=1) to ensure identifiability and is parameterized via a one-to-one polar transformation with angle \psi_1,...,\psi_{p-1}, where p is the dimension of predictor. Sampling is performed on the angular parameters \\psi defining the index vector.

The mapping is

\begin{aligned} \theta_1 &= \sin(\psi_1),\\ \theta_i &= \Big(\prod_{j=1}^{i-1}\cos(\psi_j)\Big)\sin(\psi_i), \quad i=2,\dots,p-1,\\ \theta_p &= \prod_{j=1}^{p-1}\cos(\psi_j). \end{aligned}

The vector is then scaled to unit length.

Priors

\psi is p-1 dimension of polar angle of index vector and re-scaled Beta distribution on [0, \pi] is assigned.
Conditioned on \theta and \sigma^2, the link coefficients \beta = (\beta_1,\ldots,\beta_{m+k})^\top follow normal distribution with estimated mean vector \hat{\beta}_{\theta} = (X_{\theta}'X_{\theta})^{-1}X_{\theta}'Y and covariance \sigma^2 (X_{\theta}^\top X_{\theta})^{-1}, where X_{\theta} is the B-spline basis design matrix.
Inverse gamma prior on \sigma^2 with shape parameter a_\sigma and rate parameter b_\sigma.

Sampling Samplers are automatically assigned by nimble.

Prior hyper-parameters These are the prior hyper-parameters set in the function. You can define new values for each parameter in prior_param.

Index vector: Only shape parameter index_psi_alpha of p-1 dimension vector is needed since rate parameters is computed to satisfy \theta_{j, \text{MAP}}. By default, the shape parameter for each element of the polar vector is set to 5000.
Link function: B-spline basis and coefficient of B-spline setup.
- basis: For the basis of B-spline, link_basis_df is the number of basis functions (default 21), link_basis_degree is the spline degree (default 2) and link_basis_delta is a small jitter for boundary-knot spacing control (default 0.001).
- beta: For the coefficient of B-spline, multivariate normal prior is assigned with mean link_beta_mu, and covariance link_beta_cov. By default, \mathcal{N}_p\!\big(0, \mathrm{I}_p\big) is assigned.
Error variance (sigma2): An Inverse gamma prior is assigned to \sigma^2 where sigma2_shape is shape parameter and sigma2_rate is rate parameter of inverse gamma distribution. (default sigma2_shape = 0.001, sigma2_rate = 100)

Initial values These are the initial values set in the function. You can define new values for each initial value in init_param.

Index vector: Initial vector of polar angle index_psi (p-1 dimension). Each element of angle is between 0 and \pi. By default, it is randomly draw from uniform distribution.
Link function: Initial spline coefficients(link_beta). By default, \big(X_{\theta}^\top X_{\theta} + \rho\, \mathrm{I}\big)^{-1} X_{\theta}^\top Y is computed, where X_{\theta} is the B-spline basis design matrix.
Error variance (sigma2): Initial scalar error variance (default 0.01).

Value

A list typically containing:

coefficients: Mean estimates of index vector. Return list of model_setup does not include it.
ses: Mean standard error of index vector. Return list of model_setup does not include it.
residuals: Residuals with mean estimates of fitted values. Return list of model_setup does not include it.
fitted.values: Mean estimates of fitted values. Return list of model_setup does not include it.
linear.predictors: Mean estimates of single-index values. Return list of model_setup does not include it.
gof: Goodness-of-fit. Return list of model_setup function does not include it.
samples: Posterior draws of variables for computing fitted values of the model, including \theta, \sigma^2. Return list of model_setup does not include it.
input: List of data used in modeling, formula, input values for prior and initial values, and computation time without compiling.
defModel: Nimble model object.
defSampler: Nimble sampler object.
modelName: Name of the model.

References

Antoniadis, A., Grégoire, G., & McKeague, I. W. (2004). Bayesian estimation in single-index models. Statistica Sinica, 1147-1164.

Hornik, K., & Grün, B. (2014). movMF: an R package for fitting mixtures of von Mises-Fisher distributions. Journal of Statistical Software, 58, 1-31.

Dhara, K., Lipsitz, S., Pati, D., & Sinha, D. (2019). A new Bayesian single index model with or without covariates missing at random. Bayesian analysis, 15(3), 759.

Examples


set.seed(123)
n <- 200; d <- 4
theta <- c(2, 1, 1, 1); theta <- theta / sqrt(sum(theta^2))
f <- function(u) u^2 * exp(u)
sigma <- 0.5
X <- matrix(runif(n * d, -1, 1), nrow = n)
index_vals <- as.vector(X %*% theta)
y <- f(index_vals) + rnorm(n, 0, sigma)
simdata <- data.frame(x = X, y = y)
colnames(simdata) <- c(paste0("X", 1:4), "y")

# One tool version
fit1 <- bsPolar(y ~ ., data = simdata,
                niter = 5000, nburnin = 1000, nchain = 1)

# Split version
models <- bsPolar_setup(y ~ ., data = simdata)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                   nchains = 1, setSeed = TRUE, inits = initList,
                   samplesAsCodaMCMC = TRUE)
fit2 <- as_bsim(models, mcmc.out)
summary(fit2)

Bayesian Single-Index Regression with B-Spline Link and Half-Unit Sphere Prior

Description

Fits a single-index model Y_i \sim \mathcal{N}(f(X_i'\theta), \sigma^2), i = 1,\cdots,n where the link f(\cdot) is represented by B-spline link and the index vector \theta is on half-unit sphere.

Usage

bsSphere(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

bsSphere_setup(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

Arguments

formula

an object of class formula. Interaction term is not allowed for single-index model.

data

an data frame.

prior

Optional named list of prior settings. Further descriptions are in Details section.

init

Optional named list of initial values. If the values are not assigned, they are randomly sampled from prior or designated value. Further descriptions are in Details section.

monitors

niter

Integer. Total MCMC iterations (default 10000).

nburnin

Integer. Burn-in iterations (default 1000).

thin

Integer. Thinning for monitors (default 1).

nchain

Integer. Number of MCMC chains (default 1).

setSeed

Logical or numeric argument. Further details are provided in runMCMC setSeed argument.

Details

Model The single–index model uses a m-order polynomial spline with k interior knots and intercept as follows:

f(t) = \sum_{j=1}^{1+m+k} B_j(t)\,\beta_j

on [a, b] with t_i = X_i' \theta, i = 1,\cdots, n and \|\theta\|_2 = 1. \{\beta_j\}_{j=1}^{m+k+1} are spline coefficients and a_\theta, b_\theta are boundary knots where a_{\theta} = min(t_i, i = 1, \cdots, n), and b_{\theta} = max(t_i, i = 1,\cdots, n). Variable selection is encoded by a binary vector \nu, equivalently setting components of \theta to zero when \nu_i = 0.

Priors

The variable selection indicator \nu has Beta–Bernoulli hierarchy prior. Beta hyper-prior on the Bernoulli–inclusion probability w, inducing p(\nu) \propto \mathrm{Beta}(r_1 + n_\nu, r_2 + p - n_\nu) where n_\nu = \Sigma_{i=1}^{p}I(\nu_i = 1). r_1, r_2 are shape and rate parameter of beta distribution.
Free‑knot prior: the number of knots k with mean \lambda_k. The knot locations \xi_i, i = 1,...,k have a Dirichlet prior on the scaled interval [0, 1].
Index vector prior is uniform on the half‑sphere of dimension n_\nu with first nonzero positive.
Conjugate normal–inverse-gamma on (\beta, \sigma^2) enables analytic integration for RJMCMC with covariance \tau \Sigma_0.

Sampling Posterior exploration follows a hybrid RJMCMC with six move types: add/remove predictor \nu, update \theta, add/delete/relocate a knot. The \theta update is a random‑walk Metropolis via local rotations in a two‑coordinate subspace. Knot changes use simple proposals with tractable acceptance ratios. Further sampling method is described in Wang (2009).

Prior hyper-parameters These are the prior hyper-parameters set in the function. You can define new values for each parameter in prior_param.

Index vector: index_nu_r1, index_nu_r2 gives the shape and rate parameter of beta-binomial prior, respectively. (default index_nu_r1 = 1, index_nu_r2 = 1).
Link function: B-spline knots, basis and coefficient setup.
- knots: Free-knot prior for the spline. link_knots_lambda_k is the Poisson mean for the number of interior knot k (default 5). link_knots_maxknots is the maximum number of interior knots. If link_knots_maxknots is NULL, the number of interior knots is randomly drawn from a Poisson distribution.
- basis: For the basis of B-spline, link_basis_degree is the spline degree (default 2).
- beta: For the coefficient of B-spline, By default, link_beta_mu is a zero vector, link_beta_tau is set to the sample size, and link_beta_Sigma0 is the identity matrix of dimension 1 + k + m, where k is the number of interior knots and m is the spline order.
Error variance (sigma2): Inverse gamma prior is assigned to \sigma^2 where sigma2_shape is shape parameter and sigma2_rate is rate parameter of inverse gamma distribution. Small values for shape and rate parameters yield a weakly-informative prior on \sigma^{2}. (default sigma2_shape = 0.001, sigma2_rate = 0.001)

Initial values These are the initial values set in the function. You can define new values for each initial value in init_param.

Index vector: index_nu is binary vector indicating active predictors for the index. index is initial unit-norm index vector \theta (automatically normalized if necessary, with the first nonzero element made positive for identifiability). The vector length must match the number of predictor. Ideally, positions where index_nu has a value of 1 should correspond to nonzero elements in \theta; elements corresponding to index_nu = 0 will be set to zero.
Link function: link_k is initial number of interior knots. link_knots is initial vector of interior knot positions in [0, 1], automatically scaled to the true boundary. Length of this vector should be equal to the initial value of k. link_beta is initial vector of spline coefficients. Length should be equal to the initial number of basis functions with intercept (1 + k + m).
Error variance (sigma2): Initial scalar error variance. (default 0.01)

Value

A list typically containing:

coefficients: Mean estimates of index vector. Return list of model_setup does not include it.
ses: Mean standard error of index vector. Return list of model_setup does not include it.
residuals: Residuals with mean estimates of fitted values. Return list of model_setup does not include it.
fitted.values: Mean estimates of fitted values. Return list of model_setup does not include it.
linear.predictors: Mean estimates of single-index values. Return list of model_setup does not include it.
gof: Goodness-of-fit. Return list of model_setup function does not include it.
samples: Posterior draws of variables for computing fitted values of the model, including \theta, \sigma^2. Return list of model_setup does not include it.
input: List of data used in modeling, formula, input values for prior and initial values, and computation time without compiling.
defModel: Nimble model object.
defSampler: Nimble sampler object.
modelName: Name of the model.

References

Wang, H.-B. (2009). Bayesian estimation and variable selection for single index models. Computational Statistics & Data Analysis, 53, 2617–2627.

Hornik, K., & Grün, B. (2014). movMF: an R package for fitting mixtures of von Mises-Fisher distributions. Journal of Statistical Software, 58, 1-31.

Examples


set.seed(123)
n <- 200; d <- 4
theta <- c(2, 1, 1, 1); theta <- theta / sqrt(sum(theta^2))
f <- function(u) u^2 * exp(u)
sigma <- 0.5
X <- matrix(runif(n * d, -1, 1), nrow = n)
index_vals <- as.vector(X %*% theta)
y <- f(index_vals) + rnorm(n, 0, sigma)
simdata <- data.frame(x = X, y = y)
colnames(simdata) <- c(paste0("X", 1:4), "y")

# One tool version
fit1 <- bsSphere(y ~ ., data = simdata,
                 niter = 5000, nburnin = 1000, nchain = 1)

# Split version
models <- bsSphere_setup(y ~ ., data = simdata)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                   nchains = 1, setSeed = TRUE, inits = initList,
                   samplesAsCodaMCMC = TRUE)
fit2 <- as_bsim(models, mcmc.out)
summary(fit2)

Bayesian Single-Index Regression with B-Spline Link and Spike-and-Slab Prior

Description

Usage

bsSpike(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

bsSpike_setup(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

Arguments

formula

an object of class formula. Interaction term is not allowed for single-index model.

data

an data frame.

prior

Optional named list of prior settings. Further descriptions are in Details section.

init

Optional named list of initial values. If the values are not assigned, they are randomly sampled from prior or designated value. Further descriptions are in Details section.

monitors

niter

Integer. Total MCMC iterations (default 10000).

nburnin

Integer. Burn-in iterations (default 1000).

thin

Integer. Thinning for monitors (default 1).

nchain

Integer. Number of MCMC chains (default 1).

setSeed

Logical or numeric argument. Further details are provided in runMCMC setSeed argument.

Details

Model The single–index model uses a m-order polynomial spline with k interior knots as follows:

f(t) = \sum_{j=1}^{m+k} B_j(t)\,\beta_j

on [a, b] with t_i = X_i' \theta, i = 1,\cdots, n and \|\theta\|_2 = 1. \{\beta_j\}_{j=1}^{m+k} are spline coefficient and a_\theta and b_\theta are boundary knots where a_\theta = min(t_i, i = 1, \cdots, n) - \delta, and b_\theta = max(t_i, i = 1,\cdots, n) + \delta. \theta is a p-dimensional index vector subject to a spike-and-slab prior for variable selection with binary indicator variable \nu.

Priors

The variable selection indicator \nu has Beta–Bernoulli hierarchy prior. Beta hyper-prior on the Bernoulli–inclusion probability w, inducing p(\nu) \propto \mathrm{Beta}(r_1 + n_\nu, r_2 + p - n_\nu) where n_\nu = \Sigma_{i=1}^{p}I(\nu_i = 1). r_1, r_2 are shape and rate parameter of beta distribution.
Slab coefficients \theta have normal distribution with zero mean and \sigma_\theta^2 variance.
Conditioned on \theta and \sigma^2, the link coefficients \beta = (\beta_1,\ldots,\beta_{m+k})^\top follow normal distribution with estimated mean vector \hat{\beta}_{\theta} = (X_{\theta}'X_{\theta})^{-1}X_{\theta}'Y and covariance \sigma^2 (X_{\theta}^\top X_{\theta})^{-1}, where X_{\theta} is the B-spline basis design matrix.
Inverse gamma prior on \sigma^2 with shape parameter a_\sigma and rate parameter b_\sigma.

Sampling Samplers are automatically assigned by nimble.

Prior hyper-parameters These are the prior hyper-parameters set in the function. You can define new values for each parameter in prior_param.

Index vector: index_nu_r1, index_nu_r2 gives the shape and rate parameter of beta-binomial prior, respectively. For slab prior, normal distribution with zero mean is assigned for selected variables \theta. index_sigma_theta is for variance of \theta, and it is assigned 0.25 by default.
Link function: B-spline basis and coefficient of B-spline setup.
- basis: For the basis of B-spline, link_basis_df is the number of basis functions (default 21), link_basis_degree is the spline degree (default 2) and link_basis_delta is a small jitter for boundary-knot spacing control (default 0.01).
- beta: For the coefficient of B-spline, multivariate normal prior is assigned with mean link_beta_mu, and covariance link_beta_cov. By default, \mathcal{N}_p\!\big(0, \mathrm{I}_p\big)
Error variance (sigma2): Inverse gamma prior is assigned to \sigma^2 where sigma2_shape is shape parameter and sigma2_rate is rate parameter of inverse gamma distribution. (default sigma2_shape = 0.001, sigma2_rate = 100)

Initial values These are the initial values set in the function. You can define new values for each initial value in init_param.

Index vector:
- index_pi Initial selecting variable probability. (default: 0.5)
- index_nu Initial vector of inclusion indicators . By default, each nu is randomly drawn by Bernoulli(1/2)
- index Initial vector of index. By default, each element of index vector, which is chosen by \nu, is proposed by normal distribution.
Link function: Initial spline coefficients (link_beta). By default, \big(X_{\theta}^\top X_{\theta} + \rho\, \mathrm{I}\big)^{-1} X_{\theta}^\top Y is computed, where X_{\theta} is the B-spline basis design matrix.
Error variance (sigma2): Initial scalar error variance (default 0.01).

Value

A list typically containing:

coefficients: Mean estimates of index vector. Return list of model_setup does not include it.
ses: Mean standard error of index vector. Return list of model_setup does not include it.
residuals: Residuals with mean estimates of fitted values. Return list of model_setup does not include it.
fitted.values: Mean estimates of fitted values. Return list of model_setup does not include it.
linear.predictors: Mean estimates of single-index values. Return list of model_setup does not include it.
gof: Goodness-of-fit. Return list of model_setup function does not include it.
samples: Posterior draws of variables for computing fitted values of the model, including \theta, \sigma^2. Return list of model_setup does not include it.
input: List of data used in modeling, formula, input values for prior and initial values, and computation time without compiling.
defModel: Nimble model object.
defSampler: Nimble sampler object.
modelName: Name of the model.

References

Antoniadis, A., Grégoire, G., & McKeague, I. W. (2004). Bayesian estimation in single-index models. Statistica Sinica, 1147-1164.

Hornik, K., & Grün, B. (2014). movMF: an R package for fitting mixtures of von Mises-Fisher distributions. Journal of Statistical Software, 58, 1-31.

McGee, G., Wilson, A., Webster, T. F., & Coull, B. A. (2023). Bayesian multiple index models for environmental mixtures. Biometrics, 79(1), 462-474.

Examples


set.seed(123)
n <- 200; d <- 4
theta <- c(2, 1, 1, 1); theta <- theta / sqrt(sum(theta^2))
f <- function(u) u^2 * exp(u)
sigma <- 0.5
X <- matrix(runif(n * d, -1, 1), nrow = n)
index_vals <- as.vector(X %*% theta)
y <- f(index_vals) + rnorm(n, 0, sigma)
simdata <- data.frame(x = X, y = y)
colnames(simdata) <- c(paste0("X", 1:4), "y")

# One tool version
fit1 <- bsSpike(y ~ ., data = simdata,
                niter = 5000, nburnin = 1000, nchain = 1)

# Split version
models <- bsSpike_setup(y ~ ., data = simdata)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                   nchains = 1, setSeed = TRUE, inits = initList,
                   summary = TRUE, samplesAsCodaMCMC = TRUE)
fit2 <- as_bsim(models, mcmc.out)
summary(fit2)

Extract Index Vector Coefficients from `BayesSIM`

Description

Computes posterior summaries of the single-index model index vector from a fitted BayesSIM. Users may choose either the posterior mean or median as the point estimate.

Usage

## S3 method for class 'bsim'
coef(object, method = c("mean", "median"), se = FALSE, ...)

Arguments

object

A fitted object of BayesSIM or individual model.

method

Character string indicating the summary statistic to compute. Options are "mean" or "median". Default is "mean".

se

Logical value whether computing standard error for index estimates. If method is "mean", standard deviation of index vector MCMC samples is gained. If method is "median", median absolute deviation of index vector MCMC samples is gained. FALSE is default.

...

Additional arguments passed to other methods.

Value

A numeric vector or data.frame of estimated coefficient and standard error of the index vector.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)

# 1. One tool version
fit_one <- BayesSIM(y ~ ., data = simdata2,
                    niter = 5000, nburnin = 1000, nchain = 1)

# Check median index vector estimates with standard errors
coef(fit_one, method = "median", se = TRUE)

# Fitted index values of median prediction
fitted(fit_one, type = "linpred", method = "median")

# Residuals of median prediction
residuals(fit_one, method = "median")

# Summary of the model
summary(fit_one)

# Convergence diagnostics
nimTraceplot(fit_one)

# Goodness of fit
GOF(fit_one)

# Fitted plot
plot(fit_one)

# Prediction with 95% credible interval at new data
newx <- data.frame(X1 = rnorm(3), X2 = rnorm(3), X3 = rnorm(3), X4 = rnorm(3))
pred <- predict(fit_one, newdata = newx, interval = "credible", level = 0.95)
plot(pred)


# 2. Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

Compile a 'nimble' Model and Its MCMC

Description

Compiles a nimble model object and a corresponding (uncompiled) MCMC algorithm and returns the compiled pair.

Usage

compileModelAndMCMC(object)

Arguments

object

Class BayesSIM_setup object

Details

The function first compiles nimble model object, then compiles nimble sampler. Both compiled model and compiled MCMC samplers are returned as a list.

Value

A list with two elements:

model: the compiled NIMBLE model (external pointer object).
mcmc: the compiled MCMC function/algorithm bound to the model.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)

# 1. One tool version
fit_one <- BayesSIM(y ~ ., data = simdata2,
                    niter = 5000, nburnin = 1000, nchain = 1)


# 2. Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

UCI Concrete Compressive Strength (n = 1030, p = 8)

Description

Concrete compressive strength dataset from the UCI Machine Learning Repository. No missing variables and there are 8 quantitative inputs and 1 quantitative output.

Usage

data(concrete)

Format

Numeric data.frame of size 1030 \times 8.

cement: Numeric. Cement content (kg/m^3).
blast_furnace_slag: Numeric. Blast furnace slag (kg/m^3).
fly_ash: Numeric. Fly ash (kg/m^3).
water: Numeric. Mixing water (kg/m^3).
superplasticizer: Numeric. Superplasticizer (kg/m^3).
coarse_aggreate: Numeric. Coarse aggregate (kg/m^3).
fine_aggregate: Numeric. Fine aggregate (kg/m^3).
age: Numeric. Curing age (days; typically 1–365).
strength: Numeric. Concrete compressive strength (MPa).

Details

Source Concrete Compressive Strength in UCI Machine Learning Repository. This data is integrated by experimental data from 17 different sources to check the realiability of the strength. This dataset compiles experimental concrete mixes from multiple studies and is used to predict compressive strength and quantify how mixture ingredients and curing age affect that strength.

Variables.

Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate: quantities in kg per m^3 of mixture.
Age: curing time in days (1–365).
Target(strength): compressive strength in MPa.

References

Yeh, I. (1998). Concrete Compressive Strength [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5PK67.

Yeh, I. (1998). Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete research, 28(12), 1797-1808.

Examples

data(concrete)
str(concrete)
plot(density(concrete$strength), main = "Concrete compressive strength (MPa)")

Extract Fitted Values from `BayesSIM`

Description

Computes fitted values from a BayesSIM, using either the posterior mean or median of the estimated link function with index values. Fitted values can be returned on the response scale or on the linear predictor scale.

Usage

## S3 method for class 'bsim'
fitted(
  object,
  type = c("response", "linpred"),
  method = c("mean", "median"),
  ...
)

Arguments

object

A fitted object of BayesSIM or individual model.

type

Character string indicating the scale on which fitted values are returned. Default is "response".

"response": fitted response values \hat{y}
"linpred": linear predictor X'\theta

method

Character string specifying the summary statistic used to compute the fitted values. Options are "mean" or "median". Default is "mean".

...

Additional arguments passed to other methods.

Value

A numeric vector of fitted values.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)

# 1. One tool version
fit_one <- BayesSIM(y ~ ., data = simdata2,
                    niter = 5000, nburnin = 1000, nchain = 1)

# Check median index vector estimates with standard errors
coef(fit_one, method = "median", se = TRUE)

# Fitted index values of median prediction
fitted(fit_one, type = "linpred", method = "median")

# Residuals of median prediction
residuals(fit_one, method = "median")

# Summary of the model
summary(fit_one)

# Convergence diagnostics
nimTraceplot(fit_one)

# Goodness of fit
GOF(fit_one)

# Fitted plot
plot(fit_one)

# Prediction with 95% credible interval at new data
newx <- data.frame(X1 = rnorm(3), X2 = rnorm(3), X3 = rnorm(3), X4 = rnorm(3))
pred <- predict(fit_one, newdata = newx, interval = "credible", level = 0.95)
plot(pred)


# 2. Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

Extract Residuals from `BayesSIM`

Description

Returns the model residuals based on the posterior fitted values of a BayesSIM. Residuals can be computed using either the posterior mean or median fitted values.

Usage

## S3 method for class 'bsim'
residuals(object, method = c("mean", "median"), ...)

Arguments

object

A fitted object of BayesSIM or individual model.

method

Character string specifying the summary statistic used to compute the fitted values. Options are "mean" or "median". Default is "mean".

...

Additional arguments passed to other methods.

Value

A numeric vector of residuals. (\mathbf{r} = \mathbf{Y} - \hat{\mathbf{Y}}) \hat{\mathbf{Y}} can be mean or median of MCMC samples.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)

# 1. One tool version
fit_one <- BayesSIM(y ~ ., data = simdata2,
                    niter = 5000, nburnin = 1000, nchain = 1)

# Check median index vector estimates with standard errors
coef(fit_one, method = "median", se = TRUE)

# Fitted index values of median prediction
fitted(fit_one, type = "linpred", method = "median")

# Residuals of median prediction
residuals(fit_one, method = "median")

# Summary of the model
summary(fit_one)

# Convergence diagnostics
nimTraceplot(fit_one)

# Goodness of fit
GOF(fit_one)

# Fitted plot
plot(fit_one)

# Prediction with 95% credible interval at new data
newx <- data.frame(X1 = rnorm(3), X2 = rnorm(3), X3 = rnorm(3), X4 = rnorm(3))
pred <- predict(fit_one, newdata = newx, interval = "credible", level = 0.95)
plot(pred)


# 2. Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

Get nimbleModel and nimbleSampler Object from the Result of `compileModelAndMCMC`

Description

Return compiled nimble model object and a corresponding MCMC samplers.

Usage

get_model(object)

get_sampler(object)

Arguments

object

The result of compileModelAndMCMC function.

Value

get_model returns compiled Nimble model object. get_sampler returns compiled Nimble sampler object, directly using in runMCMC function.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)

# 1. One tool version
fit_one <- BayesSIM(y ~ ., data = simdata2,
                    niter = 5000, nburnin = 1000, nchain = 1)


# 2. Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

Get Initial Value of the Model

Description

Functions for getting list of initial values of the nimble model.

Usage

getInit(object)

Arguments

object

A fitted object of BayesSIM, BayesSIM_setup or individual model.

Details

The list of initial values are returned. This can be helpful to use when you use BayesSIM_setup. You should be aware of that if you want to get more than 1 chain of MCMC samples, you should change nchain argument in BayesSIM_setup. The output of initial values would be different, depending on the number of chain.

You can apply BayesSIM object when you want to check the list of initial values.

Value

BUGS code of the model definition.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)


# Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

Get Definition of the Model

Description

Functions for identifying definition of the nimble model.

Usage

getModelDef(object)

Arguments

object

A fitted object of BayesSIM or individual model.

Details

The function that contain Bayes SIM model structure in nimble. This function is for advanced users.

Value

BUGS code of the model definition.

Examples

simdata2 <- data.frame(DATA1$X, y = DATA1$y)
models <- bsFisher_setup(y ~ ., data = simdata2)
# Get model definition
getModelDef(models)

Retrieve Monitorable Variables

Description

Functions for retrieving the variables that can be monitored.

Usage

getVarMonitor(object, type = c("name", "list"))

Arguments

object

A fitted object of BayesSIM, BayesSIM_setup or individual model.

type

Options for variables. By default, type = "name" is used that it only prints the name of the node. If you put name of the nodes, the MCMC outputs gave you all elements of the variable, in case of the vector. If type = "list", the dimension of the nodes are printed. If you put name and dimension of the nodes, only specific location of vector or matrix can be seen in summary or nimTraceplot.

Details

The function returns a list of variables that can be used in monitors2 in the bayesSIM function. You can also refer to getModelDef to understand the model structure and designate necessary variables. Stochastic nodes of the model are recommended to be monitored.

Value

A vector of variables that can be monitored in the model.

Examples

simdata2 <- data.frame(DATA1$X, y = DATA1$y)
models <- BayesSIM_setup(y ~ ., data = simdata2)

# Get monitorable variables
getVarMonitor(models)
# Get the list of variables with dimension
getVarMonitor(models, type = "list")

Bayesian Single-Index Regression with Gaussian Process Link and von Mises-Fisher Prior

Description

Fits a single–index model Y_i \sim \mathcal{N}(f(X_i'\theta), \sigma^2), i = 1,\cdots,n, where the index \theta lies on the unit sphere with von Mises-Fisher prior, and the link f(\cdot) is represented with Gaussian process.

Usage

gpFisher(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

gpFisher_setup(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

Arguments

formula

an object of class formula. Interaction term is not allowed for single-index model.

data

an data frame.

prior

Optional named list of prior settings. Further descriptions are in Details section.

init

Optional named list of initial values. If the values are not assigned, they are randomly sampled from prior or designated value. Further descriptions are in Details section.

monitors

niter

Integer. Total MCMC iterations (default 10000).

nburnin

Integer. Burn-in iterations (default 1000).

thin

Integer. Thinning for monitors (default 1).

nchain

Integer. Number of MCMC chains (default 1).

setSeed

Logical or numeric argument. Further details are provided in runMCMC setSeed argument.

Details

Model The single-index model uses Gaussian process with zero mean and and covariance kernel \eta \cdot \text{exp}(-\frac{(t_i-t_j)^2}{l}) as a link function, where t_i, t_j, j = 1, \ldots, n is index value. Index vector should be length 1.

Priors

von Mises–Fisher prior on the index \theta with direction and concentration.
Covariance kernel: Amplitude, \eta, follows log normal distribution with mean a_\eta and variance b_\eta. Length-scale parameter follows gamma distribution with shape parameter \alpha_l and rate parameter \beta_l.
Inverse gamma prior on \sigma^2 with shape parameter a_\sigma and rate parameter b_\sigma.

Sampling All sampling parameters' samplers are assigned by nimble.

Prior hyper-parameters These are the prior hyper-parameters set in the function. You can define new values for each parameter in prior_param.

Index vector: von Mises–Fisher prior for the projection vector \theta (index). index_direction is a unit direction vector of the von Mises–Fisher distribution. By default, the estimated vector from projection pursuit regression is assigned. index_dispersion is the positive concentration parameter. By default, 150 is assigned.
Link function:
- Length-scale:Gamma distribution is assigned for length-scale parameter, l. link_lengthscale_shape is shape parameter (default 1/8) and link_lengthscale_rate is rate parameter of lengthscale. (default 1/8)
- Amplitude: Log-normal distribution is assigned for amplitude parameter, \eta. link_amp_a is mean (default -1), and link_amp_b is variance. (default 1)
Error variance (sigma2): An inverse-gamma prior is assigned to \sigma^2 where sigma2_shape is shape parameter and sigma2_rate is rate parameter of inverse gamma distribution. (default sigma2_shape = 1, sigma2_rate = 1)

Initial values These are the initial values set in the function. You can define new values for each initial value in init_param.

Index vector (index): Initial unit index vector \theta. By default, the vector is sampled from the von Mises–Fisher prior.
Link function: link_lengthscale is initial scalar length-scale parameter. (default: 0.1) link_amp is initial scalar amplitude parameter. (default: 1)
Error variance (sigma2): Initial scalar error variance. (default: 1)

Value

A list typically containing:

coefficients: Mean estimates of index vector. Return list of model_setup does not include it.
ses: Mean standard error of index vector. Return list of model_setup does not include it.
residuals: Residuals with mean estimates of fitted values. Return list of model_setup does not include it.
fitted.values: Mean estimates of fitted values. Return list of model_setup does not include it.
linear.predictors: Mean estimates of single-index values. Return list of model_setup does not include it.
gof: Goodness-of-fit. Return list of model_setup function does not include it.
samples: Posterior draws of variables for computing fitted values of the model, including \theta, \sigma^2. Return list of model_setup does not include it.
input: List of data used in modeling, formula, input values for prior and initial values, and computation time without compiling.
defModel: Nimble model object.
defSampler: Nimble sampler object.
modelName: Name of the model.

References

Antoniadis, A., Grégoire, G., & McKeague, I. W. (2004). Bayesian estimation in single-index models. Statistica Sinica, 1147-1164.

Choi, T., Shi, J. Q., & Wang, B. (2011). A Gaussian process regression approach to a single-index model. Journal of Nonparametric Statistics, 23(1), 21-36.

Hornik, K., & Grün, B. (2014). movMF: an R package for fitting mixtures of von Mises-Fisher distributions. Journal of Statistical Software, 58, 1-31.

Examples


set.seed(123)
N <- 60; p <- 2
x1 <- runif(N, -3, 5)
x2 <- runif(N, -3, 5)
beta1 <- 0.45; beta2 <- sqrt(1-beta1^2)
sigma <- sqrt(0.0036)
xlin <- x1*beta1 + x2*beta2
eta <- 0.1*xlin + sin(0.5*xlin)^2
y <- rnorm(N, eta, sigma)
x <- matrix(c(x1, x2), ncol = 2)
simdata <- data.frame(x = x, y = y)
colnames(simdata) <- c("X1", "X2", "y")

# One tool version
fit1 <- gpFisher(y ~ ., data = simdata, nchain = 1, niter = 1000, nburnin = 100)

# Split version
models <- gpFisher_setup(y ~ ., data = simdata, nchain = 1)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 1000, nburnin = 100, thin = 1,
                   nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)
fit2 <- as_bsim(models, mcmc.out)
summary(fit2)

Bayesian Single-Index Regression with Gaussian Process Link and One-to-One Polar Transformation

Description

Fits a single–index model Y_i \sim \mathcal{N}(f(X_i'\theta), \sigma^2), i = 1,\cdots,n, where the index \theta is specified and computed via a one-to-one polar transformation, and the link f(\cdot) is represented with a Gaussian process.

Usage

gpPolar(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

gpPolar_setup(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

gpPolarHigh(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

gpPolarHigh_setup(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

Arguments

formula

an object of class formula. Interaction term is not allowed for single-index model.

data

an data frame.

prior

Optional named list of prior settings. Further descriptions are in Details section.

init

Optional named list of initial values. If the values are not assigned, they are randomly sampled from prior or designated value. Further descriptions are in Details section.

monitors

niter

Integer. Total MCMC iterations (default 10000).

nburnin

Integer. Burn-in iterations (default 1000).

thin

Integer. Thinning for monitors (default 1).

nchain

Integer. Number of MCMC chains (default 1).

setSeed

Logical or numeric argument. Further details are provided in runMCMC setSeed argument.

Details

Model The single–index model is specified as Y_i = f(X_i'{\theta}) + \epsilon_i, where the index vector \theta lies on the unit sphere with (\|\theta\|_2=1) with non-zero first component to ensure identifiability and is parameterized via a one-to-one polar transformation with angle \psi_1,...,\psi_{p-1}.

The mapping is

\begin{aligned} \theta_1 &= \sin(\psi_1),\\ \theta_i &= \Big(\prod_{j=1}^{i-1}\cos(\psi_j)\Big)\sin(\psi_i), \quad i=2,\dots,p-1,\\ \theta_p &= \prod_{j=1}^{p-1}\cos(\psi_j). \end{aligned}

The vector is then scaled to unit length.

Sampling is performed on the angular parameters \theta defining the index vector. The link function f(\cdot) is modeled by a Gaussian process prior with zero mean and an Ornstein–Uhlenbeck (OU) covariance kernel \exp(-\kappa \cdot |t_i - t_j|), i, j = 1,\ldots, n, where \kappa is a bandwidth (smoothness) parameter and t_i, t_j is index value (t_i = X_i'\theta).

Priors

\psi is p-1 dimension of polar angle of index vector and re-scaled Beta distribution on [0, \pi] is assigned for prior.
Prior for \kappa (bandwidth parameter) is discrete uniform of equally spaced grid points in [\kappa_{\text{min}}, \kappa_{\text{max}}].
Inverse gamma prior on \sigma^2 with shape parameter a_\sigma and rate parameter b_\sigma.

Sampling For gpPolar, \theta is sampled by Metropolis-Hastings and updated with f, \kappa is chosen by grid search method that maximizes likelihood, \sigma^2 are sampled from their full conditional distributions using Gibbs sampling. Since \kappa is sampled by grid search, more than 5 dimension of variables gpPolarHigh is recommended. For gpPolarHigh, all sampling parameters' samplers are assigned by nimble.

Prior hyper-parameters These are the prior hyper-parameters set in the function. You can define new values for each parameter in prior_param.

Index vector: Only shape parameter index_psi_alpha of p-1 dimension vector is needed since rate parameters is computed to satisfy \theta_{j, \text{MAP}}. By default, the shape parameter for each element of the polar vector is set to 5000.
Link function: link_kappa_min is minimum value of kappa (default 0.5), link_kappa_max is maximum value of kappa (default 4), and link_kappa_grid_width is space (default 0.1).
Error variance (sigma2): An Inverse gamma prior is assigned to \sigma^2 where sigma2_shape is shape parameter and sigma2_rate is rate parameter of inverse gamma distribution. (default sigma2_shape = 2, sigma2_rate = 0.01)

Initial values These are the initial values set in the function. You can define new values for each initial value in init_param.

Index vector: Initial vector of polar angle index_psi with p-1 dimension. Each element of angle is between 0 and \pi.
Link function: Initial scalar scale parameter of covariance kernel link_kappa. (default: 2)
Error variance (sigma2): Initial scalar error variance. (default: 0.01)

Value

A list typically containing:

coefficients: Mean estimates of index vector. Return list of model_setup does not include it.
ses: Mean standard error of index vector. Return list of model_setup does not include it.
residuals: Residuals with mean estimates of fitted values. Return list of model_setup does not include it.
fitted.values: Mean estimates of fitted values. Return list of model_setup does not include it.
linear.predictors: Mean estimates of single-index values. Return list of model_setup does not include it.
gof: Goodness-of-fit. Return list of model_setup function does not include it.
samples: Posterior draws of variables for computing fitted values of the model, including \theta, \sigma^2. Return list of model_setup does not include it.
input: List of data used in modeling, formula, input values for prior and initial values, and computation time without compiling.
defModel: Nimble model object.
defSampler: Nimble sampler object.
modelName: Name of the model.

References

Dhara, K., Lipsitz, S., Pati, D., & Sinha, D. (2019). A new Bayesian single index model with or without covariates missing at random. Bayesian analysis, 15(3), 759.

Examples


library(MASS)
N <- 100    # Sample Size
p <- 3
mu <- c(0,0,0)
rho <- 0
Cx <- rbind(c(1,rho,rho), c(rho,1,rho), c(rho, rho,1))
X <- mvrnorm(n = N, mu=mu, Sigma=Cx, tol=1e-8)
alpha <- c(1,1,1)
alpha <- alpha/sqrt(sum(alpha^2))
z <- matrix(0,N)
z <- X %*% alpha
z <- z[,1]
sigma <- 0.3
f <- exp(z)
y <- f + rnorm(N, 0, sd=sigma) # adding noise
y <- y-mean(y)
f_all <- f
x_all <- X
y_all <- y
simdata <- cbind(x_all, y, f)
simdata <- as.data.frame(simdata)
colnames(simdata) = c('x1', 'x2', 'x3', 'y','f')

# One tool version
fit1 <- gpPolar(y ~ x1 + x2 + x3, data = simdata,
                niter = 5000, nburnin = 1000, nchain = 1)
fit2 <- gpPolarHigh(y ~ x1 + x2 + x3, data = simdata,
                    niter = 5000, nburnin = 1000, nchain = 1)

# Split version
models1 <- gpPolar_setup(y ~ x1 + x2 + x3, data = simdata)
models2 <- gpPolarHigh_setup(y ~ x1 + x2 + x3, data = simdata)
Ccompile1 <- compileModelAndMCMC(models1)
Ccompile2 <- compileModelAndMCMC(models2)
sampler1 <- get_sampler(Ccompile1)
sampler2 <- get_sampler(Ccompile2)
initList1 <- getInit(models1)
initList2 <- getInit(models2)
mcmc.out1 <- runMCMC(sampler1, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, init = initList1,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)
mcmc.out2 <- runMCMC(sampler2, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, init = initList2,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)
fit1_split <- as_bsim(models1, mcmc.out1)
fit2_split <- as_bsim(models2, mcmc.out2)

Bayesian Single-Index Regression with Gaussian Process Link and Unit Sphere Prior

Description

Fits a single–index model Y_i \sim \mathcal{N}(f(X_i'\theta), \sigma^2), i = 1,\cdots,n, where the index \theta lies on the unit sphere, and the link f(\cdot) is represented with Gaussian process.

Usage

gpSphere(
  formula,
  data,
  prior = NULL,
  init = NULL,
  method = "FB",
  lowerB = NULL,
  upperB = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

gpSphere_setup(
  formula,
  data,
  prior = NULL,
  init = NULL,
  method = "FB",
  lowerB = NULL,
  upperB = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

Arguments

formula

an object of class formula. Interaction term is not allowed for single-index model.

data

an data frame.

prior

Optional named list of prior settings. Further descriptions are in Details section.

init

Optional named list of initial values. If the values are not assigned, they are randomly sampled from prior or designated value. Further descriptions are in Details section.

method

lowerB

upperB

monitors

niter

Integer. Total MCMC iterations (default 10000).

nburnin

Integer. Burn-in iterations (default 1000).

thin

Integer. Thinning for monitors (default 1).

nchain

Integer. Number of MCMC chains (default 1).

setSeed

Logical or numeric argument. Further details are provided in runMCMC setSeed argument.

Details

Priors

von Mises–Fisher prior on the index \theta with direction and concentration.
Covariance kernel: Amplitude, \eta, follows log normal distribution with mean a_\eta and variance b_\eta. Length-scale parameter follows gamma distribution with shape parameter \alpha_l and rate parameter \beta_l.
Inverse-Gamma prior on \sigma^2 with shape parameter a_\sigma and rate parameter b_\sigma.

Sampling In the fully Bayesian approach, \theta, l, and \eta are updated via the Metropolis–Hastings algorithm, while f and \sigma^2 are sampled using Gibbs sampling.

In the empirical Bayes approach, \theta, l, \eta, and \sigma^2 are estimated by maximum a posteriori (MAP), and f is sampled from its full conditional posterior distribution.

In the empirical Gibbs sampler, \theta, l, and \eta are estimated by MAP, whereas f and \sigma^2 are sampled via Gibbs sampling.

For estimation via MAP, effective sample size or potential scale reduction factor is meaningless.

Prior hyper-parameters These are the prior hyper-parameters set in the function. You can define new values for each parameter in prior_param.

Index vector: Nothing to assign.
Link function:
- Length-scale:Gamma distribution is assigned for length-scale parameter, l. link_lengthscale_shape is shape parameter (default 1/8) and link_lengthscale_rate is rate parameter of lengthscale. (default 1/8)
- Amplitude: Log-normal distribution is assigned for amplitude parameter, \eta. link_amp_a is mean (default -1), and link_amp_b is variance. (default 1)
Error variance (sigma2): inverse gamma prior is assigned to \sigma^2 where sigma2_shape is shape parameter and sigma2_rate is rate parameter of inverse gamma distribution. (default sigma2_shape = 1, sigma2_rate = 1)

Initial values These are the initial values set in the function. You can define new values for each initial value in init_param.

Index vector (index): Initial unit index vector. By default, vector is randomly drawn from normal distribution and standardized.
Link function: link_lengthscale is initial scalar length-scale parameter. (default: 0.1) link_amp is initial scalar amplitude parameter. (default: 1)
Error variance (sigma2): Initial scalar error variance. (default: 1)

Value

A list typically containing:

coefficients: Mean estimates of index vector. Return list of model_setup does not include it.
ses: Mean standard error of index vector. Return list of model_setup does not include it.
residuals: Residuals with mean estimates of fitted values. Return list of model_setup does not include it.
fitted.values: Mean estimates of fitted values. Return list of model_setup does not include it.
linear.predictors: Mean estimates of single-index values. Return list of model_setup does not include it.
gof: Goodness-of-fit. Return list of model_setup function does not include it.
samples: Posterior draws of variables for computing fitted values of the model, including \theta, \sigma^2. Return list of model_setup does not include it.
input: List of data used in modeling, formula, input values for prior and initial values, and computation time without compiling.
defModel: Nimble model object.
defSampler: Nimble sampler object.
modelName: Name of the model.

References

Choi, T., Shi, J. Q., & Wang, B. (2011). A Gaussian process regression approach to a single-index model. Journal of Nonparametric Statistics, 23(1), 21-36.

Examples


set.seed(123)
n <- 200; d <- 4
theta <- c(2, 1, 1, 1); theta <- theta / sqrt(sum(theta^2))
f <- function(u) u^2 * exp(u)
sigma <- 0.5
X <- matrix(runif(n * d, -1, 1), nrow = n)
index_vals <- as.vector(X %*% theta)
y <- f(index_vals) + rnorm(n, 0, sigma)
simdata <- data.frame(x = X, y = y)
colnames(simdata) <- c(paste0("X", 1:4), "y")

# One tool version
fit1 <- gpSphere(y ~ ., method = "EB", data = simdata,
                 niter = 1000, nburnin = 100)

# Split version
models <- gpSphere_setup(y ~ ., method = "EB", data = simdata)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 1000, nburnin = 100, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)
fit2 <- as_bsim(models, mcmc.out)
# The estimates computed by MAP - standard error of the esitmate is meaningless.
summary(fit2)

Bayesian Single-Index Regression with Gaussian Process Link and Spike-and-Slab Prior

Description

Fits a single-index model Y_i \sim \mathcal{N}(f(X_i'\theta), \sigma^2), i = 1,\cdots,n, where index vector \theta has a spike and slab prior and the link f(\cdot) is represented by Gaussian process and the

Usage

gpSpike(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

gpSpike_setup(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

Arguments

formula

an object of class formula. Interaction term is not allowed for single-index model.

data

an data frame.

prior

Optional named list of prior settings. Further descriptions are in Details section.

init

Optional named list of initial values. If the values are not assigned, they are randomly sampled from prior or designated value. Further descriptions are in Details section.

monitors

niter

Integer. Total MCMC iterations (default 10000).

nburnin

Integer. Burn-in iterations (default 1000).

thin

Integer. Thinning for monitors (default 1).

nchain

Integer. Number of MCMC chains (default 1).

setSeed

Logical or numeric argument. Further details are provided in runMCMC setSeed argument.

Details

Model The single–index model is specified as Y_i = f(X_i' \theta) + \epsilon_i, where \theta is a p-dimensional index vector subject to a spike-and-slab prior for variable selection. The link function f(\cdot) is modeled using a Gaussian process prior with zero mean and squared exponential covariance kernel K(x_1, x_2) = \exp\{-\rho {(x_1 - x_2)^{T}\theta}^2\}, where \rho determines the smoothness of f. The covariance kernel is re-parameterized to \exp\{-{(x_1 - x_2)^{T}\theta^{*}}^2\} where \rho = ||\theta^{*}|| and \theta = ||\theta||^{-1}\theta^{*}. Therefore, \theta^{*} is sampled in MCMC.

Priors

The variable selection indicator \nu has Beta–Bernoulli hierarchy prior. Beta hyper-prior on the Bernoulli–inclusion probability w, inducing p(\nu) \propto \mathrm{Beta}(r_1 + n_\nu, r_2 + p - n_\nu) where n_\nu = \Sigma_{i=1}^{p}I(\nu_i = 1). r_1, r_2 are shape and rate parameter of beta distribution.
Slab coefficients \theta have normal distribution with zero mean and \sigma_\theta^2 variance.
GP precision \lambda^{-1} follows gamma distribution with shape parameter a_\lambda, and rate parameter b_\lambda.
Error precision (\sigma^2)^{-1} follows gamma distribution with shape parameter a_\sigma, and rate parameter b_\sigma.

Sampling A random walk Metropolis algorithm is used to sample \lambda^{-1} and a Metropolis-Hastings algorithm is used for the main parameters (\theta^{*}, \nu). The variance \sigma^2 is directly sampled from posterior distribution. f is not directly sampled by MCMC.

Prior hyper-parameters These are the prior hyper-parameters set in the function. You can define new values for each parameter in prior_param.

Index vector: index_nu_r1, index_nu_r2 gives the shape and rate parameter of beta-binomial prior, respectively. For slab prior, normal distribution with zero mean is assigned for selected variables \theta. index_sigma_theta is for variance of \theta, and it is assigned 0.25 by default.
Link function: Inverse gamma prior is assigned for hyper-parameters \lambda^{-1} link_inv_lambda_shape is shape parameter and link_inv_lambda_rate is rate parameter of inverse gamma distribution. (default link_inv_lambda_shape = 1, link_inv_lambda_rate = 0.1)
Error variance (sigma2): An Inverse gamma prior is assigned to \sigma^2 where sigma2_shape is shape parameter and sigma2_rate is rate parameter of inverse gamma distribution. (default sigma2_shape = 0.001, sigma2_rate = 100)

Initial values These are the initial values set in the function. You can define new values for each initial value in init_param

Index vector:
- index_pi: Initial selecting variable probability. (default: 0.5)
- index_nu: Initial vector of inclusion indicators . By default, each index_nu is randomly drawn by Bernoulli(1/2)
- index: Initial vector of index. By default, each element of index vector, which is chosen by indicator, is proposed by normal distribution.
Link function: Initial scalar of lambda (link_inv_lambda) for covariance kernel of Gaussian process.
Error variance (sigma2): Initial scalar error variance. (default: 0.01)

Value

A list typically containing:

coefficients: Mean estimates of index vector. Return list of model_setup does not include it.
ses: Mean standard error of index vector. Return list of model_setup does not include it.
residuals: Residuals with mean estimates of fitted values. Return list of model_setup does not include it.
fitted.values: Mean estimates of fitted values. Return list of model_setup does not include it.
linear.predictors: Mean estimates of single-index values. Return list of model_setup does not include it.
gof: Goodness-of-fit. Return list of model_setup function does not include it.
samples: Posterior draws of variables for computing fitted values of the model, including \theta, \sigma^2. Return list of model_setup does not include it.
input: List of data used in modeling, formula, input values for prior and initial values, and computation time without compiling.
defModel: Nimble model object.
defSampler: Nimble sampler object.
modelName: Name of the model.

References

McGee, G., Wilson, A., Webster, T. F., & Coull, B. A. (2023). Bayesian multiple index models for environmental mixtures. Biometrics, 79(1), 462-474.

Examples


set.seed(123)
n <- 200; d <- 4
theta <- c(2, 1, 1, 1); theta <- theta / sqrt(sum(theta^2))
f <- function(u) u^2 * exp(u)
sigma <- 0.5
X <- matrix(runif(n * d, -1, 1), nrow = n)
index_vals <- as.vector(X %*% theta)
y <- f(index_vals) + rnorm(n, 0, sigma)
simdata <- data.frame(x = X, y = y)
colnames(simdata) <- c(paste0("X", 1:4), "y")

# One tool version
fit1 <- gpSpike(y ~ ., data = simdata,
               niter = 5000, nburnin = 1000)

# Split version
models <- gpSpike_setup(y ~ ., data = simdata)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)
fit2 <- as_bsim(models, mcmc.out)
summary(fit2)

Build an Initial Value List for BayesSIM Models

Description

init_param is a convenience helper that constructs a nested initial value list for a given combination of index vector and link function. It starts from the model-specific default prior, and then overwrites only those components for which the user supplies non-NULL arguments.

This allows users to modify selected hyper-parameters without having to know or manually reconstruct the full nested prior list structure.

Usage

init_param(
  indexprior,
  link,
  index = NULL,
  index_nu = NULL,
  index_psi = NULL,
  index_pi = NULL,
  link_beta = NULL,
  link_k = NULL,
  link_knots = NULL,
  link_lengthscale = NULL,
  link_amp = NULL,
  link_kappa = NULL,
  link_inv_lambda = NULL,
  sigma2 = NULL
)

Arguments

indexprior

Character scalar indicating the prior for the index. Typically one of "fisher", "sphere", "polar", or "spike". The valid options mirror those used in the corresponding model functions.

link

Character scalar indicating the link function family. Typically "bspline" for B-spline link functions or "gp" for Gaussian process link functions. The valid options mirror those used in the corresponding model functions.

index, index_nu, index_psi, index_pi

Optional initial values for index and related parameter values.

link_beta, link_k, link_knots, link_lengthscale, link_amp, link_kappa, link_inv_lambda

Optional initial values for components under link functions.

sigma2

Optional numeric scalar giving the initial value of \sigma^2.

Details

init_param(indexprior, link) can be used to obtain the random initial values list for the requested combination of index prior and link function. For any argument that is not NULL, the corresponding field in the nested prior list is overwritten.

The detailed meaning and recommended choices for each initial values depend on the specific model, index vector and link function. For those details, please refer to the documentation of the corresponding model-fitting functions.

Value

A nested list with components index, link, and sigma2.

Examples

## Default initial values for Fisher index + B-spline link:
i0 <- init_param("fisher", "bspline")

## Modify only a few initial values:
i1 <- init_param(
  indexprior = "fisher",
  link       = "bspline",
  index      = c(1, 0, 0),      # initial direction of the index
  link_beta  = rep(0, 21),      # initial values for spline coefficients
  sigma2     = 0.1              # initial value for sigma^2
)

## Example with GP link:
i2 <- init_param(
  indexprior        = "sphere",
  link              = "gp",
  link_lengthscale  = 0.2,      # initial GP length-scale
  link_amp          = 1.5,      # initial GP amplitude
  sigma2            = 1         # initial variance
)

Traceplot for `BayesSIM`

Description

Provides traceplot for objects of BayesSIM.

Usage

nimTraceplot(x, ...)

Arguments

x

A fitted object of BayesSIM or individual model.

...

Further arguments passed to plot.

Value

Traceplots for MCMC samples are displayed. By default, the index vector and error variance are only included in the summary. Extra monitor variables are included, if specified.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)

# 1. One tool version
fit_one <- BayesSIM(y ~ ., data = simdata2,
                    niter = 5000, nburnin = 1000, nchain = 1)

# Check median index vector estimates with standard errors
coef(fit_one, method = "median", se = TRUE)

# Fitted index values of median prediction
fitted(fit_one, type = "linpred", method = "median")

# Residuals of median prediction
residuals(fit_one, method = "median")

# Summary of the model
summary(fit_one)

# Convergence diagnostics
nimTraceplot(fit_one)

# Goodness of fit
GOF(fit_one)

# Fitted plot
plot(fit_one)

# Prediction with 95% credible interval at new data
newx <- data.frame(X1 = rnorm(3), X2 = rnorm(3), X3 = rnorm(3), X4 = rnorm(3))
pred <- predict(fit_one, newdata = newx, interval = "credible", level = 0.95)
plot(pred)


# 2. Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

Plot Method for `BayesSIM`

Description

Produce diagnostic plots for a fitted Bayesian single-index model.

Usage

## S3 method for class 'bsim'
plot(x, method = c("mean", "median"), interval = TRUE, alpha = 0.95, ...)

## S3 method for class 'bsimPred'
plot(x, ...)

Arguments

x

A fitted object of BayesSIM or individual model.

method

Character string specifying the summary used for the posterior fitted values. Options are "mean" or "median". Default is "mean".

interval

A logical value indicating whether a credible interval is included in the fitted plot. Default is TRUE.

alpha

Numeric value between 0 and 1 specifying the credible level. By default, alpha = 0.95 produces a 95% credible interval.

...

Additional arguments passed to underlying plotting functions.

Details

The function internally calls predict() on the fitted model object to obtain posterior summaries of \hat{y}. Predicted value of y is \hat{f}(X'\hat{\theta}).

If interval = TRUE, the function requests posterior credible intervals and overlays them on the fitted curve.
If interval = FALSE, only the posterior mean or median curve is drawn.

Value

The output consists of two plots:

Observed vs Predicted plot: a diagnostic scatter plot comparing actual outcomes with posterior fitted values to visually assess model fit.
Fitted curve plot: posterior \hat{y} as a function of the estimated single index, optionally with 100 \times \alpha % credible intervals.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)

# 1. One tool version
fit_one <- BayesSIM(y ~ ., data = simdata2,
                    niter = 5000, nburnin = 1000, nchain = 1)

# Check median index vector estimates with standard errors
coef(fit_one, method = "median", se = TRUE)

# Fitted index values of median prediction
fitted(fit_one, type = "linpred", method = "median")

# Residuals of median prediction
residuals(fit_one, method = "median")

# Summary of the model
summary(fit_one)

# Convergence diagnostics
nimTraceplot(fit_one)

# Goodness of fit
GOF(fit_one)

# Fitted plot
plot(fit_one)

# Prediction with 95% credible interval at new data
newx <- data.frame(X1 = rnorm(3), X2 = rnorm(3), X3 = rnorm(3), X4 = rnorm(3))
pred <- predict(fit_one, newdata = newx, interval = "credible", level = 0.95)
plot(pred)


# 2. Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

Prediction Method for `BayesSIM`

Description

Generate predictions from a fitted Bayesian single-index model.

Usage

## S3 method for class 'bsim'
predict(
  object,
  newdata = NULL,
  se.fit = FALSE,
  type = c("response", "latent", "index"),
  method = c("mean", "median"),
  interval = c("none", "credible"),
  level = 0.95,
  ...
)

Arguments

object

A fitted object of BayesSIM or individual model.

newdata

Optional data frame for which predictions should be made. If NULL, predictions are returned for the training data.

se.fit

A logical value indicating whether standard errors are required. Default is FALSE.

type

Character string specifying the type of prediction. "response" is the default.

"response": Posterior predictive summaries of the response Y. This corresponds to draws from the posterior predictive distribution Y^{(m)} \sim N(f(X'\theta^{(m)}), \sigma^{2(m)}) and therefore incorporates both the uncertainty in the link function and the variability of the error term for each m^{th} MCMC sample.
"latent": Posterior summaries of the latent mean structure E(Y \mid X) = f^{(m)}(t^{(m)}), where t^{(m)} = X'\theta^{(m)}. Unlike "response", it excludes the noise term and calculated by f^{(m)}(X'\theta^{(m)}) for each m^{th} MCMC sample of \theta.
"index": Posterior summaries of the single index t^{(m)} = X'\theta^{(m)}.

method

Character string determining the posterior summary used for point predictions. Options are "mean" or "median". Default is "mean".

interval

Character string indicating whether a credible interval should be returned. Default is "none".

"none": Return only point predictions.
"credible": Return a 100 \times \text{level}\% credible interval.

level

Numeric value between 0 and 1 specifying the credible level. level = 0.95 yields a 95% credible interval. Default is 0.95.

...

Additional arguments.

Details

This method extracts MCMC posterior samples stored in a BayesSIM object and computes posterior summaries of:

the posterior predictive response Y \mid X (type "response"),
the latent link function evaluation E(Y \mid X) = f(X'\theta) (type "latent"), or
the single index X'\theta (type "index").

The key distinction is that "response" incorporates the posterior variability of the error term \epsilon, whereas "latent" represents the noiseless conditional expectation E(Y \mid X) computed directly from the link function and the posterior draws of \theta.

When interval = "credible", the returned object includes lower and upper credible bounds computed via posterior quantiles for the chosen prediction scale.

If newdata is supplied, predictions are evaluated at the new covariate values by computing the corresponding posterior index t = X'\theta and applying the link function.

Value

A list containing at least the following components:

fitted: Posterior mean or median predictions. If se.fit = TRUE or interval = "credible", standard error and lower, upper bounds of the credible interval is also included.
truey: True response value of test data. When true response value is not available, NULL is saved.
idxValue: Linear index value is saved where \theta is estimated by method.
level: Credible level.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)

# 1. One tool version
fit_one <- BayesSIM(y ~ ., data = simdata2,
                    niter = 5000, nburnin = 1000, nchain = 1)

# Check median index vector estimates with standard errors
coef(fit_one, method = "median", se = TRUE)

# Fitted index values of median prediction
fitted(fit_one, type = "linpred", method = "median")

# Residuals of median prediction
residuals(fit_one, method = "median")

# Summary of the model
summary(fit_one)

# Convergence diagnostics
nimTraceplot(fit_one)

# Goodness of fit
GOF(fit_one)

# Fitted plot
plot(fit_one)

# Prediction with 95% credible interval at new data
newx <- data.frame(X1 = rnorm(3), X2 = rnorm(3), X3 = rnorm(3), X4 = rnorm(3))
pred <- predict(fit_one, newdata = newx, interval = "credible", level = 0.95)
plot(pred)


# 2. Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

Build a Prior List for BayesSIM Models

Description

prior_param is a convenience helper that constructs a nested prior list for a given combination of index prior and link function. It starts from the model-specific default prior, and then overwrites only those components for which the user supplies non-NULL arguments.

This allows users to modify selected hyper-parameters without having to know or manually reconstruct the full nested prior list structure.

Usage

prior_param(
  indexprior,
  link,
  index_direction = NULL,
  index_dispersion = NULL,
  index_nu_r1 = NULL,
  index_nu_r2 = NULL,
  index_psi_alpha = NULL,
  index_sigma_theta = NULL,
  index_r1 = NULL,
  index_r2 = NULL,
  link_basis_df = NULL,
  link_basis_degree = NULL,
  link_basis_delta = NULL,
  link_knots_lambda_k = NULL,
  link_knots_maxknots = NULL,
  link_beta_mu = NULL,
  link_beta_cov = NULL,
  link_beta_tau = NULL,
  link_beta_Sigma0 = NULL,
  link_lengthscale_shape = NULL,
  link_lengthscale_rate = NULL,
  link_amp_a = NULL,
  link_amp_b = NULL,
  link_kappa_min = NULL,
  link_kappa_max = NULL,
  link_kappa_grid_width = NULL,
  link_inv_lambda_shape = NULL,
  link_inv_lambda_rate = NULL,
  sigma2_shape = NULL,
  sigma2_rate = NULL
)

Arguments

indexprior

Character scalar indicating the prior for the index. Typically one of "fisher", "sphere", "polar", or "spike". The valid options mirror those used in the corresponding model functions.

link

index_direction, index_dispersion, index_nu_r1, index_nu_r2, index_psi_alpha, index_sigma_theta, index_r1, index_r2

Optional overrides for hyper-parameters related to the index prior.

link_basis_df, link_basis_degree, link_basis_delta

Optional overrides for the B-spline basis hyper-parameters, such as the effective degrees of freedom, spline degree, and penalty parameter.

link_knots_lambda_k, link_knots_maxknots

Optional overrides for the B-spline knot-selection hyper-parameters in, used for models with adaptive knot placement.

link_beta_mu, link_beta_cov, link_beta_tau, link_beta_Sigma0

Optional overrides for the prior on spline coefficients. The detailed interpretation of these hyper-parameters depends on the specific model and is described in the documentation of each model-fitting function.

link_lengthscale_shape, link_lengthscale_rate

Optional overrides for the hyper-parameters of the GP length-scale prior.

link_amp_a, link_amp_b

Optional overrides for the hyper-parameters of the GP amplitude prior.

link_kappa_min, link_kappa_max, link_kappa_grid_width

Optional overrides for the hyper-parameters in used in models with polar index and GP-type link, to control the grid or support for the concentration parameter \kappa in gpPolar.

link_inv_lambda_shape, link_inv_lambda_rate

Optional overrides for spike-and-slab–type GP link priors.

sigma2_shape, sigma2_rate

Optional overrides for the inverse-gamma prior on the observation variance \sigma^2.

Details

prior_param(indexprior, link) can be used to obtain the default prior list for the requested combination of index prior and link function. For any argument that is not NULL, the corresponding field in the nested prior list is overwritten.

The detailed meaning and recommended choices for each hyper-parameter depend on the specific model, prior of index vector and link function. For those details, please refer to the documentation of the corresponding model-fitting functions.

Value

A nested list with top-level elements index, link, and sigma2, suitable for passing to the prior argument of the various BayesSIM model fitting functions.

Examples

## Default prior for Fisher index + B-spline link:
p0 <- prior_param("fisher", "bspline")

## Modify only a few hyper-parameters:
p1 <- prior_param(
  indexprior          = "fisher",
  link                = "bspline",
  sigma2_shape        = 0.5,
  link_basis_df       = 30,
  index_direction     = c(1, 0, 0)
)

Summarize `BayesSIM`

Description

Provides a summary for BayesSIM.

Usage

## S3 method for class 'bsim'
summary(object, ...)

## S3 method for class 'summary.bsim'
print(x, digits = 3, ...)

Arguments

object

A fitted object of BayesSIM or individual model.

...

Further arguments passed.

x

A summary output of BayesSIM or individual model.

digits

The minimum number of significant digits to be printed.

Details

A list of summary statistics for MCMC samples, including data.frame table for the results. Each row corresponds to a model parameter, and columns report the statistics.

Value

The function summarizes posterior MCMC samples by reporting key statistics, including:

Posterior mean and median
Empirical standard deviation
95% credible interval (lower and upper quantiles)
Potential scale reduction factor (gelman) for multiple chains
Effective sample size (ESS)

By default, the index vector and error variance are only included in the summary. If variable selection methods are used, such as uniform sphere and spike-and-slab prior, the indicator vector (nu) is also included. Note that the potential scale reduction factor for nu can be reported as NaN or Inf, since the indicator rarely changes during the MCMC run.

If the model is fitted with single chain, both all.chain and chain have identical information.

Examples


simdata2 <- data.frame(DATA1$X, y = DATA1$y)

# 1. One tool version
fit_one <- BayesSIM(y ~ ., data = simdata2,
                    niter = 5000, nburnin = 1000, nchain = 1)

# Check median index vector estimates with standard errors
coef(fit_one, method = "median", se = TRUE)

# Fitted index values of median prediction
fitted(fit_one, type = "linpred", method = "median")

# Residuals of median prediction
residuals(fit_one, method = "median")

# Summary of the model
summary(fit_one)

# Convergence diagnostics
nimTraceplot(fit_one)

# Goodness of fit
GOF(fit_one)

# Fitted plot
plot(fit_one)

# Prediction with 95% credible interval at new data
newx <- data.frame(X1 = rnorm(3), X2 = rnorm(3), X3 = rnorm(3), X4 = rnorm(3))
pred <- predict(fit_one, newdata = newx, interval = "credible", level = 0.95)
plot(pred)


# 2. Split version
models <- BayesSIM_setup(y ~ ., data = simdata2)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)

# "fit_split" becomes exactly the same as the class of "fit_one" object and apply generic functions.
fit_split <- as_bsim(models, mcmc.out)

BayesSIM: Integrated Interface of Bayesian Single Index Models using 'nimble'

Description

Author(s)

Integrated Function for Bayesian Single-Index Regression

Description

Usage

Arguments

Details

Value

See Also

Examples

Simulated Single–index Data (n = 200, p = 4)

Description

Usage

Format

Examples

Goodness of Fit for BayesSIM

Description

Usage

Arguments

Value

Examples

Additional Functions for Spline

Description

Usage

Format

Construct a Fitted Model Object from Model Setup and MCMC Output

Description

Usage

Arguments

Details

Value

Examples

Bayesian Single-Index Regression with B-Spline Link and von Mises-Fisher Prior

Description

Usage

Arguments

Details

Value

References

Examples

Bayesian Single-Index Regression with B-Spline Link and One-to-One Polar Transformation

Description

Usage

Arguments

Details

Value

References

Examples

Bayesian Single-Index Regression with B-Spline Link and Half-Unit Sphere Prior

Description

Usage

Arguments

Details

Value

References

Examples

Bayesian Single-Index Regression with B-Spline Link and Spike-and-Slab Prior

Description

Usage

Arguments

Details

Value

References

Examples

Extract Index Vector Coefficients from BayesSIM

Description

Usage

Arguments

Value

Examples

Compile a 'nimble' Model and Its MCMC

Description

Usage

Arguments

Details

Value

See Also

Examples

UCI Concrete Compressive Strength (n = 1030, p = 8)

Goodness of Fit for `BayesSIM`

Extract Index Vector Coefficients from `BayesSIM`

Extract Fitted Values from `BayesSIM`

Extract Residuals from `BayesSIM`

Get nimbleModel and nimbleSampler Object from the Result of `compileModelAndMCMC`