- The hypothesis stated above is false (read the post).
- In bowtie2, the best MAPQ a true multiread (AS = XS) can get is 1, regardless of how low or high its multiplicity. This occurs when there are 0 or 1 mismatches over perfect base calls in the read, or when AS = XS is no lower than -6. When there are 2-5 mismatches over perfect base calls (or AS = XS <= -12, i.e. -12 to -30.6), the MAPQ becomes 0.
- If someone wanted to exclude "true multireads" from their data set, using MAPQ >= 2 would work.
- However, this would also exclude any uniquely mapping reads with >=4 mismatches over high quality bases.
- In terms of high quality bases and unireads, MAPQ >= 3 allows up to 3 mismatches, MAPQ >= 23 allows up to 2 mismatches, MAPQ >= 40 allows up to 1 mismatch, and MAPQ >= 42 allows 0 mismatches. There will also be other "maxireads" in most or all of these sets.
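
The filtering described in the points above can be sketched in a few lines of Python. This is only an illustration, not the method used in the post: the helper names (`parse_sam_line`, `is_true_multiread`, `keep_read`) and the three SAM records are made up for the example, and it assumes bowtie2's usual output, where MAPQ is the fifth SAM column and `AS:i` / `XS:i` are optional tags (with `XS:i` absent when no second-best alignment was found).

```python
def parse_sam_line(line):
    """Return (mapq, AS, XS) for one SAM alignment line.

    AS or XS is None when the corresponding tag is absent.
    """
    fields = line.rstrip("\n").split("\t")
    mapq = int(fields[4])  # column 5 of a SAM record is MAPQ
    tags = {}
    for tag in fields[11:]:  # optional fields start at column 12
        name, typ, value = tag.split(":", 2)
        if name in ("AS", "XS") and typ == "i":
            tags[name] = int(value)
    return mapq, tags.get("AS"), tags.get("XS")


def is_true_multiread(as_score, xs_score):
    """A 'true multiread' here means the best and second-best scores tie."""
    return as_score is not None and xs_score is not None and as_score == xs_score


def keep_read(line, min_mapq=2):
    """Keep a read only if its MAPQ reaches the chosen cutoff."""
    mapq, _, _ = parse_sam_line(line)
    return mapq >= min_mapq


# Hypothetical records, invented for illustration only.
EXAMPLE_SAM = [
    "read1\t0\tchr1\t100\t1\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXS:i:0",
    "read2\t0\tchr1\t200\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0",
    "read3\t0\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:-12\tXS:i:-12",
]

if __name__ == "__main__":
    for record in EXAMPLE_SAM:
        mapq, as_score, xs_score = parse_sam_line(record)
        name = record.split("\t", 1)[0]
        print(name, mapq, is_true_multiread(as_score, xs_score), keep_read(record))
```

With the default cutoff of 2, both tied-score records are dropped and the record without an `XS:i` tag is kept. Raising `min_mapq` trades away uniquely mapping reads with more mismatches, as the list above describes.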
My test using bowtie2 on the uORF ribo-seq datasets:
with MAPQ > 31, the number of retained reads approaches what bowtie reports as "uniquely mapped". However, many people say that this report is not accurate anyway.