nievergeltlab
3/22/2018 - 3:56 PM

Convert 2-column to 1-column probability in PLINK

By default, PLINK's --dosage expects two probability columns per subject (format=2). This converts them to a single dosage column per subject (format=1).


#Get the number of subjects (N) from the .fam file so the awk loop below visits every subject
 nsub=$(wc -l crp_dos_pts_vets_mix_am-qc.hg19.ch.fl.allchr.out.dosage.fam | awk '{print $1}')
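
Before converting, it may be worth checking that every data line really has 3 + 2*N fields (SNP, A1, A2, then two probabilities per subject). The sketch below does that on toy inputs; the files toy.fam and toy.dosage.gz are hypothetical stand-ins, not the real data.

```shell
# Hypothetical toy inputs, created in a temp directory for illustration only.
tmp=$(mktemp -d)
printf 'F1 I1 0 0 1 2\nF2 I2 0 0 2 1\n' > "$tmp/toy.fam"
printf 'SNP A1 A2 F1 I1 F2 I2\nrs1 A G 0.9 0.1 0.0 0.5\n' | gzip > "$tmp/toy.dosage.gz"

# One line per subject in the .fam file.
nsub=$(awk 'END {print NR}' "$tmp/toy.fam")
expected=$((3 + 2 * nsub))

# Count data lines whose field count differs from 3 + 2*N (the header is skipped).
bad=$(gzip -dc "$tmp/toy.dosage.gz" | awk -v n="$expected" 'NR>1 && NF!=n {c++} END {print c+0}')
echo "subjects=$nsub expected_fields=$expected bad_lines=$bad"
```

A non-zero bad_lines count would suggest the file is not in the two-column layout, or that the .fam file does not match it.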

#Keep the original header line from the gzipped PLINK dosage file

 zcat crp_dos_pts_vets_mix_am-qc.hg19.ch.fl.allchr.out.dosage.gz | head -n1 > header.txt

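#Create the format=1 dosage file: for each subject, dosage = 2*P(A1/A1) + P(A1/A2)
#Use explicit printf format strings so a "%" in the SNP or allele fields cannot be misread as a format
 zcat crp_dos_pts_vets_mix_am-qc.hg19.ch.fl.allchr.out.dosage.gz | awk -v s="$nsub" 'NR>1{ printf "%s %s %s", $1, $2, $3; for(i=1; i<=s; i++) printf " %s", $(i*2+2)*2+$(i*2+3); printf "\n" }' | cat header.txt - | gzip > 1dosecrp_dos_pts_vets_mix_am-qc.hg19.ch.fl.allchr.out.dosage.gz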
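
To see what the conversion does, here is a small worked example on a made-up two-subject file (toy2.dosage is hypothetical). Each pair of probabilities is collapsed to 2*P(A1/A1) + P(A1/A2), and the header is passed through unchanged.

```shell
# Hypothetical toy input: header plus two SNPs for two subjects.
tmp=$(mktemp -d)
printf 'SNP A1 A2 F1 I1 F2 I2\nrs1 A G 0.9 0.1 0.0 0.5\nrs2 C T 0.25 0.5 1 0\n' > "$tmp/toy2.dosage"
nsub=2

# Same arithmetic as the pipeline above, with the header handled inside awk.
converted=$(awk -v s="$nsub" 'NR==1 {print; next}
  { printf "%s %s %s", $1, $2, $3
    for (i = 1; i <= s; i++) printf " %s", 2*$(2*i+2) + $(2*i+3)
    printf "\n" }' "$tmp/toy2.dosage")
echo "$converted"
```

For rs1, subject 1 gets 2*0.9 + 0.1 = 1.9 and subject 2 gets 2*0.0 + 0.5 = 0.5.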

#Compare regression outputs between the format=2 and format=1 dosage files
./plink.exe --dosage crp_dos_pts_vets_mix_am-qc.hg19.ch.fl.allchr.out.dosage.gz --fam crp_dos_pts_vets_mix_am-qc.hg19.ch.fl.allchr.out.dosage.fam --logistic --out 2dosetxt
./plink.exe --dosage 1dosecrp_dos_pts_vets_mix_am-qc.hg19.ch.fl.allchr.out.dosage.gz format=1 --fam crp_dos_pts_vets_mix_am-qc.hg19.ch.fl.allchr.out.dosage.fam --logistic --out 1dosetxt
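
One way to compare the two result files is to look at the largest absolute difference in a numeric column. The helper max_abs_diff below is a hypothetical sketch, shown on made-up result files. It assumes both files have one header line, the same columns, and the same row order. The real output file names and column layout depend on the PLINK version, so check them before use.

```shell
# Hypothetical helper: largest absolute difference in column $3 between two
# whitespace-delimited result files ($1 and $2), each with one header line.
max_abs_diff() {
  paste "$1" "$2" | awk -v c="$3" 'NR==1 { half = NF/2; next }
    { d = $c - $(c + half); if (d < 0) d = -d; if (d > m) m = d }
    END { print m + 0 }'
}

# Made-up result files for illustration only.
tmp=$(mktemp -d)
printf 'SNP A1 OR P\nrs1 A 1.20 0.01\nrs2 C 0.80 0.50\n' > "$tmp/res2.txt"
printf 'SNP A1 OR P\nrs1 A 1.20 0.01\nrs2 C 0.85 0.50\n' > "$tmp/res1.txt"

maxdiff=$(max_abs_diff "$tmp/res2.txt" "$tmp/res1.txt" 3)
echo "max abs difference in column 3: $maxdiff"
```

If the conversion is correct, the difference on the real outputs should be zero or close to rounding error.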