s4553711
6/19/2014 - 10:13 PM

Generate Low Complexity Region (LCR) bedfile of masked regions from UCSC repeatmasker data and its complement for use with bcftools


#!/bin/bash
wget 'http://hgdownload.soe.ucsc.edu/goldenPath/ce10/database/rmsk.txt.gz' -O LCR_rmsk.txt.gz
gunzip -kfc LCR_rmsk.txt.gz | grep 'Low_complexity' | cut -f 6,7,8 > LCR_ce10_rmsk.bed
rm LCR_rmsk.txt.gz


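# Generate the complementary set of regions (i.e. NOT low complexity)
# Download C. elegans chromosome sizes; -N omits the column-name header row,
# which does not belong in a bedtools genome file
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e "select chrom, size from ce10.chromInfo" | LC_ALL=C sort -k1,1 > ce10.genome
# bedtools complement expects position-sorted input with chromosomes in the same order as the genome file
LC_ALL=C sort -k1,1 -k2,2n LCR_ce10_rmsk.bed | bedtools complement -i - -g ce10.genome | LC_ALL=C sort -k1,1 -k2,2n > LCR_complement_ce10.bed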
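
A quick sanity check can help confirm that the two BED files partition the genome. The sketch below is not part of the original script: the helper names `bed_total_bases` and `genome_total_bases` are made up for illustration, and it assumes the low-complexity intervals do not overlap one another (if they do, the summed length overcounts, and running `bedtools merge` first would be needed).

```shell
#!/bin/bash
# Hypothetical helper: sum interval lengths (end - start) of a BED file or stdin.
bed_total_bases() {
    awk -F'\t' '{ total += $3 - $2 } END { printf "%d\n", total }' "$@"
}

# Hypothetical helper: sum chromosome sizes from a two-column genome file or stdin.
genome_total_bases() {
    awk -F'\t' '{ total += $2 } END { printf "%d\n", total }' "$@"
}

# Run the check only if the files produced by the script above are present.
if [ -s LCR_ce10_rmsk.bed ] && [ -s LCR_complement_ce10.bed ] && [ -s ce10.genome ]; then
    lcr=$(bed_total_bases LCR_ce10_rmsk.bed)
    comp=$(bed_total_bases LCR_complement_ce10.bed)
    genome=$(genome_total_bases ce10.genome)
    echo "LCR: $lcr  complement: $comp  genome: $genome"
    # Assumes non-overlapping LCR intervals; otherwise the sum exceeds the genome size.
    if [ $((lcr + comp)) -eq "$genome" ]; then
        echo "OK: LCR and complement cover the genome exactly"
    else
        echo "MISMATCH: check sorting, overlaps, or the genome file" >&2
    fi
fi
```

If the totals disagree, likely causes are a header line left in ce10.genome or overlapping intervals in the repeatmasker output.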