nievergeltlab of Simulations
7/11/2017 - 9:19 PM

Evaluate the efficiency of meta analysis with small study sizes and rare variants

###Simulate whether or not we'll get valid test stats doing a meta analysis of small samples

case_af <- .1
control_af <- .07

#Sample from the same population such that cases are enumerated in batches of 50
#until we get 10000 cases. Assuming 5% of the population is cases, we need a population of 200k people. Maybe I should just do this in plink?

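#Simulation settings. nsim is not given in the original and is an assumed placeholder;
#ncases and setsize follow the comments above (10000 cases, 50 samples per study)
nsim <- 100
ncases <- 10000
nsubs <- 2*ncases   #equal numbers of cases and controls
setsize <- 50

#Save the meta analysis test stat in the first column, the whole data regression analysis in the second
results <- matrix(ncol=2,nrow=nsim)
for (sim in 1:nsim)
{
 #Rows alternate case, control, so each row is drawn with the matching allele frequency
 genotype <- rbinom(nsubs,2,prob=rep(c(case_af,control_af),ncases))
 dat <- data.frame(V1=rep(c(1,0),ncases),genotype=genotype)

 #Split into arbitrary sets of setsize subjects, do assoc analysis in each
 pres <- ncases/setsize
 stat <- matrix(nrow=pres*2,ncol=2)
 for (set in 1:(2*pres))
 {
  ds <- dat[((set-1)*setsize + 1):(set*setsize),]
  fit <- summary(glm(V1~genotype,family="binomial",data=ds))$coefficients
  #A set with no minor alleles has no genotype row in the table; leave it as NA
  if (nrow(fit) > 1) stat[set,] <- fit[2,1:2]
 }

 #Fixed effects inverse variance weighted meta analysis (assumed from the rma method="FE" call below)
 w <- 1/stat[,2]^2
 est <- sum(w*stat[,1],na.rm=TRUE)/sum(w,na.rm=TRUE)
 sem <- sqrt(1/sum(w,na.rm=TRUE))
 zscore <- est/sem

 #results[sim,1] <- rma(yi=stat[,1],sei=stat[,2],method="FE")$zval
 results[sim,1] <- zscore
 results[sim,2] <- summary(glm(V1~genotype,data=dat,family="binomial"))$coefficients[2,3]
}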

 #Test whether or not the analyses produce different test stats
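#A minimal sketch of one way to run the comparison described above, assuming results is the nsim x 2 matrix
#with the meta analysis z score in column 1 and the whole data z score in column 2.
#compare_stats is a hypothetical helper name, not part of the original script.
#Relative efficiency is taken here as the ratio of mean squared z scores (meta / whole data); that choice is an assumption
compare_stats <- function(results)
{
 #Drop simulations where either statistic is missing
 ok <- complete.cases(results)
 z_meta <- results[ok,1]
 z_full <- results[ok,2]
 list(
  correlation = cor(z_meta,z_full),
  mean_diff = mean(z_meta - z_full),
  rel_efficiency = mean(z_meta^2)/mean(z_full^2)
 )
}
#Usage: compare_stats(results)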

 #When the variant is common, having 50 samples per study results in perhaps a 5% loss in efficiency. This represents a worst case scenario, as most studies have > 50
 #However, at low MAF, inestimable logistic estimates become common - e.g. with AF 10% and 7% in cases and controls respectively, 3% of results had huge SEs

#Conclusion: For common variation, the loss of efficiency is small (perhaps 5% at 50 samples per study). At lower MAFs, efficiency reduces exponentially due to misestimation caused by sparse cell counts
#The realistic loss of efficiency could be an interpolated number between the losses at n=500 and n=50
#With AF 10%, approx 1/25 of results will have at least one study with a mis-estimated parameter
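
#A minimal sketch for counting mis-estimated studies like those described above, assuming stat is the
#per-set matrix of (estimate, SE) built inside the simulation loop.
#count_bad_se is a hypothetical helper, and the SE cutoff of 10 is an arbitrary assumption, not a value from the original analysis
count_bad_se <- function(stat,se_cutoff=10)
{
 se <- stat[,2]
 #Count sets whose SE is missing or implausibly large
 sum(is.na(se) | se > se_cutoff)
}
#Usage inside the simulation loop: any_bad <- count_bad_se(stat) > 0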