Additional functions

library(cardinalR)

The cardinalR package includes several helper functions, described below.

Generating background noise

The gen_bkgnoise() function allows users to generate multivariate Gaussian noise to serve as background data in high-dimensional spaces.

# Example: Generate 4D background noise
bkg_data <- gen_bkgnoise(n = 500, p = 4, 
                         m = c(0, 0, 0, 0), s = c(2, 2, 2, 2))
head(bkg_data)
#> # A tibble: 6 × 4
#>       x1    x2     x3     x4
#>    <dbl> <dbl>  <dbl>  <dbl>
#> 1  0.172  1.93  1.17   0.341
#> 2 -0.168  2.61 -0.424  3.70 
#> 3 -1.99   2.61  0.526  1.77 
#> 4  1.88   3.05  2.71   2.28 
#> 5  3.90   1.90  1.78   1.77 
#> 6  1.02   1.76  2.75  -0.455

The generated data has independent dimensions with specified means (m) and standard deviations (s).
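
To make "independent dimensions" concrete, the following base R sketch draws each column separately from its own normal distribution. The helper name sketch_bkgnoise() is hypothetical and this is not the cardinalR implementation, only an illustration of the idea under the assumption that m and s hold one value per dimension.

```r
# Hypothetical helper, not part of cardinalR: independent Gaussian columns.
sketch_bkgnoise <- function(n, m, s) {
  stopifnot(length(m) == length(s))
  # Column j is drawn on its own from N(m[j], s[j]^2).
  cols <- lapply(seq_along(m), function(j) rnorm(n, mean = m[j], sd = s[j]))
  names(cols) <- paste0("x", seq_along(m))
  as.data.frame(cols)
}

set.seed(1)
noise <- sketch_bkgnoise(n = 500, m = c(0, 0, 0, 0), s = c(2, 2, 2, 2))

# Sample means and standard deviations should sit near m and s.
colMeans(noise)
sapply(noise, sd)
```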

Randomizing rows

randomize_rows() shuffles the rows of the input data into a random order.

randomized_data <- randomize_rows(bkg_data)
head(randomized_data)
#> # A tibble: 6 × 4
#>      x1    x2     x3     x4
#>   <dbl> <dbl>  <dbl>  <dbl>
#> 1 1.37  1.61  -2.17  -1.73 
#> 2 2.57  0.889  0.677 -0.339
#> 3 3.29  0.418 -1.71  -3.49 
#> 4 0.811 1.93  -1.59   0.487
#> 5 1.61  3.18   1.41  -1.89 
#> 6 2.52  2.41   1.36   3.04
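
Row shuffling of this kind can be sketched in base R by indexing with a random permutation of the row numbers. The helper sketch_randomize_rows() and the toy data frame are hypothetical, and the package function may differ in its details.

```r
# Hypothetical helper, not part of cardinalR: permute the row order.
sketch_randomize_rows <- function(data) {
  # sample(n) returns a random permutation of 1..n.
  data[sample(nrow(data)), , drop = FALSE]
}

set.seed(1)
toy <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))
shuffled <- sketch_randomize_rows(toy)
shuffled
```

Each row stays intact; only the order changes, so id and value remain paired.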

Relocating clusters

relocate_clusters() translates clusters along any dimension(s). It centers each cluster (subtracting its mean) and then adds a translation vector taken from a provided matrix (vert_mat). In the example below, vert_mat has one row per cluster and one column per dimension.

df <- tibble::tibble(
  x1 = rnorm(12),
  x2 = rnorm(12),
  x3 = rnorm(12),
  x4 = rnorm(12),
  cluster = rep(1:3, each = 4)
)

vert_mat <- matrix(c(
  5, 0, 0, 0,
  0, 5, 0, 0,
  0, 0, 5, 0
), nrow = 3, byrow = TRUE)

relocated_df <- relocate_clusters(df, vert_mat)
head(relocated_df)
#> # A tibble: 6 × 5
#>       x1      x2      x3     x4 cluster
#>    <dbl>   <dbl>   <dbl>  <dbl>   <int>
#> 1  4.02  -0.0629 -1.66   -0.284       1
#> 2  6.33  -0.536   0.427   0.373       1
#> 3 -1.48   0.277   5.29    0.408       3
#> 4  1.39   5.35    0.0729 -0.168       2
#> 5  0.775  0.739   4.62   -1.11        3
#> 6 -0.950  4.92   -0.559  -0.598       2
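
The center-then-translate step can be sketched in base R as follows. This is a minimal illustration, not the package code: sketch_relocate() is a hypothetical name, and it assumes the clusters are labelled 1, 2, ... so that cluster i uses row i of vert_mat.

```r
# Hypothetical helper, not part of cardinalR.
# Assumes a `cluster` column with labels 1..k matching the rows of vert_mat.
sketch_relocate <- function(data, vert_mat) {
  coord_cols <- setdiff(names(data), "cluster")
  for (cl in sort(unique(data$cluster))) {
    rows <- data$cluster == cl
    block <- as.matrix(data[rows, coord_cols])
    centered <- sweep(block, 2, colMeans(block))   # subtract the cluster mean
    data[rows, coord_cols] <- sweep(centered, 2, vert_mat[cl, ], "+")  # translate
  }
  data
}

set.seed(1)
toy <- data.frame(x1 = rnorm(12), x2 = rnorm(12), x3 = rnorm(12),
                  x4 = rnorm(12), cluster = rep(1:3, each = 4))
shift_mat <- matrix(c(5, 0, 0, 0,
                      0, 5, 0, 0,
                      0, 0, 5, 0), nrow = 3, byrow = TRUE)
moved <- sketch_relocate(toy, shift_mat)

# After relocation, each cluster mean equals its row of shift_mat.
aggregate(. ~ cluster, data = moved, FUN = mean)
```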

Generating rotation matrices

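The gen_rotation() function creates a rotation matrix in high-dimensional space from a list of planes and angles. Each plane is a pair of axis indices, and the angles are in degrees, as the output below shows (cos 60° = 0.5).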

rotations_4d <- list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
)

rot_mat <- gen_rotation(p = 4, planes_angles = rotations_4d)
rot_mat
#>           [,1]       [,2]         [,3]          [,4]
#> [1,] 0.5000000 -0.8660254 0.000000e+00  0.000000e+00
#> [2,] 0.8660254  0.5000000 0.000000e+00  0.000000e+00
#> [3,] 0.0000000  0.0000000 6.123234e-17 -1.000000e+00
#> [4,] 0.0000000  0.0000000 1.000000e+00  6.123234e-17
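
One way to build such a matrix is to multiply together one plane rotation (a Givens rotation) per entry of the list. The sketch below is an assumption about the construction, not the package source; sketch_rotation() is a hypothetical name, and the sign convention is chosen to match the printed output above.

```r
# Hypothetical helper, not part of cardinalR: product of plane rotations.
sketch_rotation <- function(p, planes_angles) {
  rot <- diag(p)
  for (pa in planes_angles) {
    i <- pa$plane[1]
    j <- pa$plane[2]
    theta <- pa$angle * pi / 180      # degrees to radians
    g <- diag(p)                      # identity except in the (i, j) plane
    g[i, i] <- cos(theta)
    g[i, j] <- -sin(theta)
    g[j, i] <- sin(theta)
    g[j, j] <- cos(theta)
    rot <- rot %*% g
  }
  rot
}

rot <- sketch_rotation(4, list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
))

# A rotation matrix is orthogonal with determinant 1.
all.equal(crossprod(rot), diag(4))
all.equal(det(rot), 1)
```

Because the two planes in this example share no axes, the order of multiplication does not matter here; for overlapping planes it would.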

Normalizing data

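When combining clusters or transforming data geometrically, magnitudes can differ drastically. The normalize_data() function rescales the entire dataset to fit within \([-1, 1]\) by dividing every value by the largest absolute value in the dataset. Because a single scale factor is used for all columns, the relative geometry of the data is preserved.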

norm_data <- normalize_data(bkg_data)
head(norm_data)
#>            x1        x2          x3          x4
#> 1  0.02656759 0.2984105  0.18036334  0.05277507
#> 2 -0.02603029 0.4040477 -0.06554493  0.57150235
#> 3 -0.30738692 0.4042966  0.08132158  0.27364181
#> 4  0.29066189 0.4719561  0.41956323  0.35199022
#> 5  0.60253147 0.2935891  0.27472307  0.27423222
#> 6  0.15783466 0.2721365  0.42606165 -0.07031077
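
Scaling by the global maximum absolute value can be sketched in a few lines of base R. The helper sketch_normalize() and the toy values are hypothetical; this illustrates the rescaling idea rather than reproducing the package code.

```r
# Hypothetical helper, not part of cardinalR: one scale factor for all columns.
sketch_normalize <- function(data) {
  mat <- as.matrix(data)
  as.data.frame(mat / max(abs(mat)))
}

toy <- data.frame(x1 = c(0.5, -4, 2), x2 = c(1, 3, -1))
scaled <- sketch_normalize(toy)
scaled
#>       x1    x2
#> 1  0.125  0.25
#> 2 -1.000  0.75
#> 3  0.500 -0.25
```

The largest absolute value in toy is 4, so every entry is divided by 4 and the extreme value maps to -1.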

Generating cluster locations

To place clusters in different positions, gen_clustloc() generates points in a simplex-like arrangement, so that the cluster centers are as close to mutually equidistant as possible. The output below has p rows and k columns, which suggests that each column holds one cluster center.

centers <- gen_clustloc(p = 4, k = 5)
head(centers)
#>            [,1]      [,2]       [,3]       [,4]       [,5]
#> [1,] -1.2824747  1.448226  0.2216148 -0.9641933  0.5768274
#> [2,]  0.7149715 -0.710979  0.5859915 -0.4137638 -0.1762202
#> [3,]  1.7893379 -1.630507  0.9483273 -0.7277021 -0.3794563
#> [4,]  1.2629620 -1.598145 -1.0087634  1.6116427 -0.2676959
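
To see how close a set of centers is to equidistant, one can inspect the pairwise Euclidean distances. The sketch below copies the matrix printed above and assumes that each column is one cluster center (an inference from the 4 × 5 shape, not something stated by the package).

```r
# Values copied from the printed output above; columns assumed to be centers.
centers_printed <- matrix(c(
  -1.2824747,  1.448226,  0.2216148, -0.9641933,  0.5768274,
   0.7149715, -0.710979,  0.5859915, -0.4137638, -0.1762202,
   1.7893379, -1.630507,  0.9483273, -0.7277021, -0.3794563,
   1.2629620, -1.598145, -1.0087634,  1.6116427, -0.2676959
), nrow = 4, byrow = TRUE)

# dist() works on rows, so transpose to get one row per center.
pairwise <- as.matrix(dist(t(centers_printed)))
round(pairwise, 2)

# Spread of the off-diagonal distances: a small range means near-equidistant.
range(pairwise[upper.tri(pairwise)])
```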

Numeric generators

Two helper functions, gen_nsum() and gen_nproduct(), generate numeric vectors of positive integers that match a user-specified target: gen_nsum() hits a target sum exactly, while gen_nproduct() approximates a target product.

The function gen_nsum(n, k) divides a total sum n into k positive integers. It first assigns an equal base value to each element and then randomly distributes any remainder, ensuring the elements sum exactly to n.

gen_nsum(n = 100, k = 3)
#> [1] 33 33 34
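
The base-plus-remainder scheme described above can be sketched as follows. sketch_nsum() is a hypothetical name, and the choice to give each randomly selected element exactly one extra unit is an assumption about how the remainder is distributed.

```r
# Hypothetical helper, not part of cardinalR.
sketch_nsum <- function(n, k) {
  parts <- rep(n %/% k, k)          # equal base value for every element
  remainder <- n %% k
  if (remainder > 0) {
    # Assumption: hand out the remainder one unit at a time to random elements.
    lucky <- sample.int(k, remainder)
    parts[lucky] <- parts[lucky] + 1
  }
  parts
}

set.seed(1)
parts <- sketch_nsum(n = 100, k = 3)
sum(parts)
#> [1] 100
```

Which element receives the extra unit varies with the random seed, but the total is always n.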

The function gen_nproduct(n, p) aims to produce p positive integers whose product is approximately n. It starts with all elements equal to the rounded \(p^{th}\) root of n and iteratively adjusts elements up or down in a randomized manner until the product is within a small tolerance of n. This accommodates the fact that exact integer solutions for a given product are often impossible.

gen_nproduct(n = 500, p = 4)
#> [1] 4 5 5 5
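
A rough sketch of this kind of randomized adjustment is shown below. It is not the package implementation: sketch_nproduct(), the relative tolerance tol, and the iteration cap max_iter are all hypothetical choices made for illustration.

```r
# Hypothetical helper, not part of cardinalR.
# tol (relative error) and max_iter are illustrative assumptions.
sketch_nproduct <- function(n, p, tol = 0.05, max_iter = 1000) {
  vals <- rep(max(1, round(n^(1 / p))), p)   # start at the rounded p-th root
  best <- vals
  for (iter in seq_len(max_iter)) {
    if (abs(prod(vals) - n) / n <= tol) break
    idx <- sample.int(p, 1)                  # adjust one random element
    if (prod(vals) > n) {
      vals[idx] <- max(1, vals[idx] - 1)     # too large: step down, stay positive
    } else {
      vals[idx] <- vals[idx] + 1             # too small: step up
    }
    # Keep the closest candidate seen so far in case the loop hits max_iter.
    if (abs(prod(vals) - n) < abs(prod(best) - n)) best <- vals
  }
  best
}

set.seed(1)
factors <- sketch_nproduct(n = 500, p = 4)
prod(factors)
#> [1] 500
```

For n = 500 and p = 4 the start is four 5s (product 625), and lowering any one element to 4 gives exactly 500, so the sketch stops after a single step.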