meta-sourcetracker
# First, download the FASTQ files.
# As an example we use study ERP002469 in /bioinf/projects/megx/meta_sourcetrack/.
# We start from the file ERP002469.txt downloaded from ENA. The commands below assume
# that the file is tab-delimited (I believe this is the ENA default) and has the following fields. Example:
#PRJEB1786 ERP002469 SAMEA1906452 ERS235496 ERX234720 ERR260132 9606 Homo sapiens Illumina HiSeq 2000 PAIRED ftp.sra.ebi.ac.uk/vol1/fastq/ERR260/ERR260132/ERR260132_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/ERR260/ERR260132/ERR260132_2.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/ERR260/ERR260132/ERR260132_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/ERR260/ERR260132/ERR260132_2.fastq.gz ftp.sra.ebi.ac.uk/vol1/ERA206/ERA206883/fastq/NG-5636_304_1_sequence.fastq.gz;ftp.sra.ebi.ac.uk/vol1/ERA206/ERA206883/fastq/NG-5636_304_2_sequence.fastq.gz ftp.sra.ebi.ac.uk/vol1/ERA206/ERA206883/fastq/NG-5636_304_1_sequence.fastq.gz;ftp.sra.ebi.ac.uk/vol1/ERA206/ERA206883/fastq/NG-5636_304_2_sequence.fastq.gz
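# We need field 11, the first list of ';'-separated FASTQ paths.
# Skip the header line, keep field 11, put each path on its own line, add the ftp:// scheme
# and keep only paths containing '_' (i.e. the _1/_2 files of paired runs):
tail -n +2 ERP002469.txt | cut -f 11 | tr ';' '\n' | awk '{print "ftp://"$0}' | grep _ > links_ena.txt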
# The file links_ena.txt contains:
#ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR260/ERR260132/ERR260132_1.fastq.gz
#ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR260/ERR260132/ERR260132_2.fastq.gz
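# A sketch of the same extraction that looks the column up by name instead of by position,
# so it keeps working if ENA adds or reorders columns. Assumptions: the first line of the
# report is a tab-separated header, and the FASTQ paths are in a column named fastq_ftp.
# ena_fastq_links is a hypothetical helper name. Unlike the one-liner above it does not
# filter on '_', so single-end files are kept as well.

```shell
# Print one ftp:// URL per FASTQ file listed in an ENA run report.
# $1 = report file, $2 = column name (assumed default: fastq_ftp)
ena_fastq_links() {
  local report="$1" column="${2:-fastq_ftp}"
  awk -F'\t' -v col="$column" '
    NR == 1 {                          # find the requested column in the header
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column " col " not found" > "/dev/stderr"; exit 1 }
      next
    }
    {
      n = split($idx, paths, ";")      # a run may list several files (_1/_2)
      for (j = 1; j <= n; j++)
        if (paths[j] != "") print "ftp://" paths[j]
    }' "$report"
}
# Usage: ena_fastq_links ERP002469.txt > links_ena.txt
```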
# Then download the files listed in links_ena.txt with wget, curl or aria2.
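# A minimal sketch of the download step with wget, assuming the FASTQ files should end up
# directly inside the study folder (ERP002469/) that sge_runner.sh processes.
# fetch_ena_links is a hypothetical helper name; the wget flags are -c (resume partial
# downloads), -P (target directory) and -i (read URLs, one per line, from a file).

```shell
# Download every URL in a list file into a destination folder.
# $1 = list of URLs (e.g. links_ena.txt), $2 = destination folder (e.g. ERP002469)
fetch_ena_links() {
  local list="$1" dest="$2"
  mkdir -p "$dest" || return 1
  wget -c -P "$dest" -i "$list"
}
# Usage (from /bioinf/projects/megx/meta_sourcetrack):
#   fetch_ena_links links_ena.txt ERP002469
# An aria2 alternative running 4 downloads in parallel:
#   aria2c -c -j 4 -d ERP002469 -i links_ena.txt
```

Because -c resumes interrupted transfers, the same command can simply be rerun after a dropped connection.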
# Once the files are downloaded, run ./scripts/meta_sourcetracker/sge_runner.sh
# from /bioinf/projects/megx/meta_sourcetrack, passing the name of the study folder
# to process (here ERP002469):
./scripts/meta_sourcetracker/sge_runner.sh ERP002469
# The script distributes the jobs across the cluster, currently at most 5 simultaneous jobs because of our storage space restrictions.
# The most interesting results are in two files per run.
# ERR260132_kaiju_report.txt reports the read classification at genus level:
# % reads genus
# ---------------------------------
# 15.79 1038842 Bacteroides
# 7.66 504018 Eubacterium
# 4.73 311178 Roseburia
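# A sketch for pulling the main genera out of such a report. It assumes, from the example
# above, that each data line starts with the percentage and the read count followed by
# the genus name (which may contain spaces); the header, the dashed separator and any
# other non-numeric lines are skipped. genera_above is a hypothetical helper name.

```shell
# Print "percent<TAB>reads<TAB>genus" for every genus with at least $2 percent of reads.
# $1 = kaiju genus report, $2 = minimum percentage (default 1)
genera_above() {
  local report="$1" min_pct="${2:-1}"
  awk -v min="$min_pct" '
    $1 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 ~ /^[0-9]+$/ && $1 + 0 >= min + 0 {
      pct = $1; reads = $2
      $1 = ""; $2 = ""             # drop the two numeric columns ...
      sub(/^[ \t]+/, "")           # ... and keep the (possibly multi-word) name
      printf "%s\t%s\t%s\n", pct, reads, $0
    }' "$report"
}
# Usage: genera_above ERR260132_kaiju_report.txt 5
```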
# ERR260132_kaiju_domain.txt summarises the number of classified and unclassified reads and the read counts per domain:
# Classified:5633122
# UnClassified:944306
# Viruses:4266
# Archaea:5799
# Bacteria:5590819
# Eukaryota:16645
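# A sketch that turns the counts in a *_kaiju_domain.txt file into percentages, assuming
# every line has the form Name:count as in the example above. Classified is reported as a
# share of all reads (Classified + UnClassified) and each domain as a share of the
# classified reads. domain_summary is a hypothetical helper name.

```shell
# Print percentages from a kaiju domain summary file ($1).
domain_summary() {
  awk -F: '
    NF >= 2 { count[$1] = $2 + 0; order[++n] = $1 }
    END {
      total = count["Classified"] + count["UnClassified"]
      if (total == 0 || count["Classified"] == 0) exit 1   # avoid dividing by zero
      printf "Classified\t%.2f%% of all reads\n", 100 * count["Classified"] / total
      for (i = 1; i <= n; i++) {
        name = order[i]
        if (name == "Classified" || name == "UnClassified") continue
        printf "%s\t%.2f%% of classified reads\n", name, 100 * count[name] / count["Classified"]
      }
    }' "$1"
}
# Usage: domain_summary ERR260132_kaiju_domain.txt
```

With the counts above this reports 85.64% of all reads as classified and 99.25% of the classified reads as Bacteria.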