
Simulates genotypes for individuals without family members, together with the accompanying phenotype data.

Usage

sim_genotypes_no_family(
  n,
  disease,
  path,
  overwrite = TRUE,
  n_blocks = min(n, 350)
)

Arguments

n

Integer specifying the number of genotypes/individuals to simulate.

disease

A list with all the disease parameters. Can be created using the sim_disease() function.

path

Path where the .rds file should be saved, or where an existing .rds file is stored if it is to be overwritten (DO NOT SPECIFY THE FILE EXTENSION).

overwrite

Boolean value determining whether an existing .rds file with the specified name should be overwritten (default value is TRUE).

n_blocks

Integer determining the number of blocks in which to run the simulation (default value is min(n, 350), i.e. 350 unless n is smaller). Set it higher if you run into memory issues such as freezing or crashing. A higher n_blocks reduces the memory needed for each block, but slightly increases the calculation time.
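As a rough illustration of the n_blocks default, the sketch below reproduces min(n, 350) and shows how n individuals could be divided into that many blocks. The helpers default_n_blocks() and block_sizes() are hypothetical and not part of the package, and the near-equal contiguous split is an assumption; the package may partition the work differently.

```r
# Hypothetical helper: mirrors the documented default, n_blocks = min(n, 350).
default_n_blocks <- function(n) min(n, 350)

# Hypothetical helper: split n individuals into n_blocks near-equal blocks.
# This splitting scheme is an assumption made for illustration only.
block_sizes <- function(n, n_blocks = default_n_blocks(n)) {
  base  <- n %/% n_blocks   # minimum number of individuals per block
  extra <- n %% n_blocks    # the first `extra` blocks hold one more
  rep(c(base + 1, base), times = c(extra, n_blocks - extra))
}

sizes <- block_sizes(100000)
length(sizes)  # number of blocks
range(sizes)   # smallest and largest block
sum(sizes)     # total number of individuals covered
```

Under this assumption, raising n_blocks shrinks every block, which is why a higher value lowers the memory needed per block at the cost of more iterations.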

Value

Returns a list object, also referred to as an rds object, containing an FBM.code256 with the genotypes, a MAF tibble with information on the SNPs, and a FAM tibble with phenotype information for the genotypes.

Details

Simulating a 100,000 x 100,000 dataset will take up around 9.76 GB of space. Because the running time depends on several variables, such as the parallelization settings, core speed and number of cores, we cannot accurately estimate how long the simulation will take. Instead, we simply warn the user that simulations of large datasets, such as 100,000 x 100,000, may take several hours or more.

The default n_blocks parameter has been set to 350 because this is the value at which a 100,000 x 100,000 simulation uses a maximum of 2 GB of RAM to calculate a single block.

The simulation can be run in parallel if a parallelization plan has been set in the global environment prior to execution. WARNING: with parallelization and an n_blocks of 350, EACH process will use up to 2 GB of RAM when running a 100,000 x 100,000 simulation with 2 siblings for each genotype.
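The memory warning can be turned into a simple worker budget before a plan is set. The sketch below assumes that the "parallelization plan" refers to future::plan() from the future package, which this page does not state explicitly, and ram_budget_gb is a hypothetical value that should be replaced with the RAM actually available.

```r
# Up to 2 GB per process, as stated above for n_blocks = 350 and a
# 100,000 x 100,000 simulation.
ram_per_worker_gb <- 2

# Hypothetical amount of RAM to set aside for the simulation.
ram_budget_gb <- 8

# Largest number of parallel processes that fits the budget (at least 1).
workers <- max(1, floor(ram_budget_gb / ram_per_worker_gb))

# Assumption: the plan is set with the future package.
if (requireNamespace("future", quietly = TRUE)) {
  future::plan(future::multisession, workers = workers)

  # ... call sim_genotypes_no_family() here ...

  # Return to sequential processing afterwards.
  future::plan(future::sequential)
}
```

If the budget is too small for the desired number of workers, increasing n_blocks lowers the memory needed per block, as described for the n_blocks argument.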