These are helper functions included in the package.
The gen_bkgnoise() function allows users to generate
multivariate Gaussian noise to serve as background data in
high-dimensional spaces.
# Example: Generate 4D background noise
bkg_data <- gen_bkgnoise(n = 500, p = 4,
m = c(0, 0, 0, 0), s = c(2, 2, 2, 2))
head(bkg_data)
#> # A tibble: 6 × 4
#> x1 x2 x3 x4
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.172 1.93 1.17 0.341
#> 2 -0.168 2.61 -0.424 3.70
#> 3 -1.99 2.61 0.526 1.77
#> 4 1.88 3.05 2.71 2.28
#> 5 3.90 1.90 1.78 1.77
#> 6 1.02 1.76 2.75 -0.455
The generated data has independent dimensions with the specified means
(m) and standard deviations (s).
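The idea of independent Gaussian dimensions can be sketched in base R as follows. This is only an illustration under the description above, not the package's implementation; sketch_bkgnoise() and its column naming are hypothetical.

```r
# Hypothetical base-R sketch; not the code behind gen_bkgnoise().
sketch_bkgnoise <- function(n, p, m, s) {
  stopifnot(length(m) == p, length(s) == p)
  # Draw each dimension independently from a normal distribution
  # with its own mean m[j] and standard deviation s[j].
  cols <- lapply(seq_len(p), function(j) rnorm(n, mean = m[j], sd = s[j]))
  names(cols) <- paste0("x", seq_len(p))
  as.data.frame(cols)
}

set.seed(1)
noise <- sketch_bkgnoise(n = 500, p = 4, m = rep(0, 4), s = rep(2, 4))
dim(noise)
```

Because every column is drawn separately, the sample correlations between dimensions should be close to zero for large n.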
randomize_rows() randomly reorders the rows of the input data.
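Row shuffling of this kind is commonly done by indexing with a random permutation. The sketch below assumes that approach; sketch_randomize_rows() is a hypothetical name and may differ from what randomize_rows() does internally.

```r
# Hypothetical sketch: shuffle rows by indexing with a random permutation.
sketch_randomize_rows <- function(data) {
  data[sample(nrow(data)), , drop = FALSE]
}

df_demo <- data.frame(id = 1:6, value = letters[1:6])
set.seed(42)
shuffled <- sketch_randomize_rows(df_demo)
shuffled
```

The rows are the same as in the input; only their order changes.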
relocate_clusters() allows users to translate clusters
in any dimension(s). This is achieved by centering each cluster
(subtracting its mean) and then adding a translation vector from a
provided matrix (vert_mat).
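The centre-then-translate step described above can be sketched in base R. This is a minimal illustration, assuming a column named cluster and one row of the translation matrix per cluster (in sorted cluster order); sketch_relocate() is a hypothetical helper, not the package's code.

```r
# Hypothetical sketch of "centre each cluster, then add a translation vector".
sketch_relocate <- function(data, vert_mat) {
  num_cols <- setdiff(names(data), "cluster")
  ids <- sort(unique(data$cluster))
  stopifnot(nrow(vert_mat) == length(ids), ncol(vert_mat) == length(num_cols))
  for (i in seq_along(ids)) {
    rows <- data$cluster == ids[i]
    block <- as.matrix(data[rows, num_cols])
    # Subtract the cluster mean from every column ...
    centered <- sweep(block, 2, colMeans(block))
    # ... then shift the centred cluster by row i of vert_mat.
    data[rows, num_cols] <- sweep(centered, 2, vert_mat[i, ], FUN = "+")
  }
  data
}

set.seed(1)
toy <- data.frame(x1 = rnorm(6), x2 = rnorm(6), cluster = rep(1:2, each = 3))
shift <- matrix(c(5, 0,
                  0, 5), nrow = 2, byrow = TRUE)
moved <- sketch_relocate(toy, shift)
aggregate(cbind(x1, x2) ~ cluster, data = moved, FUN = mean)
```

After relocation, each cluster's mean coincides with its row of the translation matrix, up to floating-point error.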
df <- tibble::tibble(
x1 = rnorm(12),
x2 = rnorm(12),
x3 = rnorm(12),
x4 = rnorm(12),
cluster = rep(1:3, each = 4)
)
vert_mat <- matrix(c(
5, 0, 0, 0,
0, 5, 0, 0,
0, 0, 5, 0
), nrow = 3, byrow = TRUE)
relocated_df <- relocate_clusters(df, vert_mat)
head(relocated_df)
#> # A tibble: 6 × 5
#> x1 x2 x3 x4 cluster
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 4.02 -0.0629 -1.66 -0.284 1
#> 2 6.33 -0.536 0.427 0.373 1
#> 3 -1.48 0.277 5.29 0.408 3
#> 4 1.39 5.35 0.0729 -0.168 2
#> 5 0.775 0.739 4.62 -1.11 3
#> 6 -0.950 4.92 -0.559 -0.598 2
The gen_rotation() function creates a rotation matrix in
high-dimensional space for the given planes and angles (in degrees).
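One way to picture such a matrix is as a product of plane (Givens) rotations, each acting on a pair of axes. The sketch below assumes angles in degrees and 1-based axis indices in each plane; sketch_rotation() is a hypothetical name and is not claimed to be how gen_rotation() is implemented.

```r
# Hypothetical sketch: compose plane rotations into one p x p matrix.
sketch_rotation <- function(p, planes_angles) {
  rot <- diag(p)
  for (pa in planes_angles) {
    theta <- pa$angle * pi / 180  # assumption: angle supplied in degrees
    i <- pa$plane[1]
    j <- pa$plane[2]
    g <- diag(p)                  # identity except in the (i, j) plane
    g[i, i] <- cos(theta)
    g[j, j] <- cos(theta)
    g[i, j] <- -sin(theta)
    g[j, i] <- sin(theta)
    rot <- g %*% rot
  }
  rot
}

rot_sketch <- sketch_rotation(4, list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
))
# A rotation matrix is orthogonal: R %*% t(R) should be the identity.
all.equal(rot_sketch %*% t(rot_sketch), diag(4))
```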
rotations_4d <- list(
list(plane = c(1, 2), angle = 60),
list(plane = c(3, 4), angle = 90)
)
rot_mat <- gen_rotation(p = 4, planes_angles = rotations_4d)
rot_mat
#> [,1] [,2] [,3] [,4]
#> [1,] 0.5000000 -0.8660254 0.000000e+00 0.000000e+00
#> [2,] 0.8660254 0.5000000 0.000000e+00 0.000000e+00
#> [3,] 0.0000000 0.0000000 6.123234e-17 -1.000000e+00
#> [4,] 0.0000000 0.0000000 1.000000e+00 6.123234e-17
When combining clusters or transforming data geometrically,
magnitudes can differ drastically. The normalize_data()
function rescales the entire dataset to fit within [-1, 1] based on
its maximum absolute value.
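A minimal sketch of this kind of rescaling, assuming a single global scale factor (the largest absolute value anywhere in the data) and all-numeric columns. sketch_normalize() is a hypothetical name, not the package's function.

```r
# Hypothetical sketch: divide everything by the global maximum absolute value.
sketch_normalize <- function(data) {
  mat <- as.matrix(data)
  max_abs <- max(abs(mat))
  if (max_abs == 0) return(data)  # nothing to rescale
  as.data.frame(mat / max_abs)
}

raw <- data.frame(x1 = c(-4, 2, 1), x2 = c(0.5, 8, -2))
scaled <- sketch_normalize(raw)
range(as.matrix(scaled))
```

Using one factor for all columns keeps the relative scale between dimensions intact, unlike column-wise standardisation.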
norm_data <- normalize_data(bkg_data)
head(norm_data)
#> x1 x2 x3 x4
#> 1 0.02656759 0.2984105 0.18036334 0.05277507
#> 2 -0.02603029 0.4040477 -0.06554493 0.57150235
#> 3 -0.30738692 0.4042966 0.08132158 0.27364181
#> 4 0.29066189 0.4719561 0.41956323 0.35199022
#> 5 0.60253147 0.2935891 0.27472307 0.27423222
#> 6 0.15783466 0.2721365 0.42606165 -0.07031077
To place clusters in different positions, gen_clustloc()
generates points in a simplex-like arrangement so that the cluster
centers are as close to equidistant from one another as possible. In
the example below, the result has p rows and k columns, so each column
is one cluster center.
centers <- gen_clustloc(p = 4, k = 5)
head(centers)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1.2824747 1.448226 0.2216148 -0.9641933 0.5768274
#> [2,] 0.7149715 -0.710979 0.5859915 -0.4137638 -0.1762202
#> [3,] 1.7893379 -1.630507 0.9483273 -0.7277021 -0.3794563
#> [4,] 1.2629620 -1.598145 -1.0087634 1.6116427 -0.2676959
Two helper functions, gen_nproduct() and
gen_nsum(), generate numeric vectors of positive integers
whose product approximates, or whose sum equals, a user-specified
target, respectively.
The function gen_nsum(n, k) divides a total sum
n into k positive integers. It first assigns
an equal base value to each element and then randomly distributes any
remainder, ensuring the elements sum exactly to n.
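The base-plus-remainder scheme can be sketched as follows. This is one possible reading of the description, in which each leftover unit goes to a distinct, randomly chosen element; sketch_nsum() is a hypothetical name.

```r
# Hypothetical sketch: split n into k positive integers that sum to n.
sketch_nsum <- function(n, k) {
  stopifnot(k >= 1, n >= k)
  parts <- rep(n %/% k, k)       # equal base value for every element
  remainder <- n %% k
  if (remainder > 0) {
    # Give one extra unit to `remainder` randomly chosen elements.
    lucky <- sample.int(k, remainder)
    parts[lucky] <- parts[lucky] + 1
  }
  parts
}

set.seed(7)
parts <- sketch_nsum(n = 500, k = 3)
sum(parts)
```

With this scheme no two elements differ by more than 1, and the total is always exactly n.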
The function gen_nproduct(n, p) aims to produce
p positive integers whose product is approximately
n. It starts with all elements equal to the rounded p-th root of n and
iteratively adjusts elements up or down in a randomized manner until the
product is within a small tolerance of n. This accommodates
the fact that exact integer solutions for a given product are often
impossible.
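The adjust-until-close loop might look like the sketch below. The relative tolerance tol, the iteration cap max_iter, and the rule "raise a random element when the product is too small, lower one when it is too large" are all assumptions made for illustration; sketch_nproduct() is a hypothetical name and the package's actual update rule may differ.

```r
# Hypothetical sketch: p positive integers whose product is close to n.
sketch_nproduct <- function(n, p, tol = 0.1, max_iter = 1000) {
  # Start from the rounded p-th root of n, never below 1.
  vals <- rep(max(1, round(n^(1 / p))), p)
  for (iter in seq_len(max_iter)) {
    prod_now <- prod(vals)
    if (abs(prod_now - n) <= tol * n) break  # close enough to the target
    idx <- sample.int(p, 1)                  # pick one element at random
    if (prod_now < n) {
      vals[idx] <- vals[idx] + 1
    } else if (vals[idx] > 1) {
      vals[idx] <- vals[idx] - 1             # keep every element >= 1
    }
  }
  vals
}

set.seed(3)
vals <- sketch_nproduct(n = 500, p = 4)
prod(vals)
```

The iteration cap matters: for small values a single step changes the product by a large factor, so the loop can overshoot the tolerance band repeatedly and is not guaranteed to land inside it.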