Simulation of genotypes with no family — sim_genotypes_no

Simulates genotypes without family members, together with the accompanying phenotype data.

Usage

sim_genotypes_no_family(
  n,
  disease,
  path,
  overwrite = TRUE,
  n_blocks = min(n, 350)
)

Arguments

n: Integer specifying the number of genotypes/individuals to simulate.
disease: A list with all the disease parameters. Can be created using the sim_disease() function.
path: Path where the .rds file should be saved, or where an existing .rds file is stored if it is to be overwritten. Do not include the file extension.
overwrite: Logical value determining whether an existing .rds file with the specified name should be overwritten (default is TRUE).
n_blocks: Integer specifying the number of blocks in which to run the simulation (default is min(n, 350), i.e. 350 unless n is smaller). Set it higher if you run into memory issues such as freezing or crashing. A higher n_blocks reduces the memory needed for each block, but slightly slows the calculation.
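
To make the effect of n_blocks more concrete, the sketch below splits n individuals into contiguous blocks of near-equal size. It is only an illustration: block_ranges() is a hypothetical helper, not a function of this package, and it assumes that blocks are formed over individuals, which the documentation does not state.

```r
# Hypothetical helper (not part of the package): split 1..n into n_blocks
# contiguous, near-equal blocks. The default mirrors the documented
# default n_blocks = min(n, 350).
block_ranges <- function(n, n_blocks = min(n, 350)) {
  stopifnot(n >= 1, n_blocks >= 1, n_blocks <= n)
  # Integer block boundaries 0 = b_0 <= b_1 <= ... <= b_n_blocks = n
  bounds <- (seq(0, n_blocks) * n) %/% n_blocks
  data.frame(
    block = seq_len(n_blocks),
    start = bounds[-length(bounds)] + 1,
    end   = bounds[-1]
  )
}

blocks <- block_ranges(100000)          # default: 350 blocks
sizes  <- blocks$end - blocks$start + 1 # individuals per block
range(sizes)                            # 285 286

small <- block_ranges(10)               # n < 350, so n_blocks defaults to n
nrow(small)                             # 10
```

Raising n_blocks shrinks every block, which is why a higher value lowers the memory needed at any one time while adding more iterations.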

Value

Returns a list object, also referred to as an rds object, containing an FBM.code256 with the genotypes, a MAF tibble with information on the SNPs, and a FAM tibble with phenotype information on the genotypes.

Details

Simulating a 100,000 x 100,000 dataset takes up around 9.76 GB of space. The running time depends on several factors, such as the parallelization settings and the speed and number of cores, so we cannot give an accurate estimate of how long a simulation will take. Instead, we simply warn that simulations of large datasets, such as 100,000 x 100,000, may take several hours.

The default n_blocks has been set to 350 because, at this value, a 100,000 x 100,000 simulation uses at most 2 GB of RAM to calculate a single block.

The simulation can be run in parallel if a parallelization plan has been set in the global environment before the function is called. WARNING: with n_blocks set to 350, parallelization uses up to 2 GB of RAM for EACH process when simulating a 100,000 x 100,000 dataset with 2 siblings for each genotype.
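
As a rough planning aid for the warning above, the following sketch estimates how many parallel processes fit in a given RAM budget. It uses the 2 GB per process figure quoted above. max_workers() is a hypothetical helper, and the commented-out plan call assumes the package relies on the future framework, which this page does not confirm.

```r
# Per-process figure quoted in the Details text for a 100,000 x 100,000
# simulation with n_blocks = 350.
gb_per_process <- 2

# Hypothetical helper (not part of the package): number of parallel
# processes that fit in ram_budget_gb. Always returns at least 1, which
# corresponds to running sequentially.
max_workers <- function(ram_budget_gb, gb_per_process = 2) {
  stopifnot(ram_budget_gb > 0, gb_per_process > 0)
  max(1, floor(ram_budget_gb / gb_per_process))
}

workers <- max_workers(16)          # 8 processes fit in 16 GB
peak_gb <- workers * gb_per_process # worst-case RAM use: 16 GB

# Assumption: if parallelization is driven by the future package, a plan
# could be set before calling the simulation, for example:
# future::plan(future::multisession, workers = workers)
```

In practice it may be wise to leave some headroom below the machine's total RAM, since the operating system and the R session itself also need memory.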