samtools view -H Capture the FLAG.Īs we discussed earlier, the FLAG field in the BAM format encodes several key pieces of information regarding how an alignment aligned to the reference genome. One can ask the view command to report solely the header by using the -H option. The “header” in a BAM file records important information regarding the reference genome to which the reads were aligned, as well as other information about how the BAM has been processed. samtools view | wc -l Inspect the header.
samtools view -? Count the total number of alignments. You can use the detailed help to get a better sense of what each character in the human “readable” FLAG means. Let us start by inspecting the first five alignments in our BAM in detail. samtools view | head -n 5 Let’s make the FLAG more readable It’s main function, not surprisingly, is to allow you to convert the binary (i.e., easy for the computer to read and process) alignments in the BAM file view to text-based SAM alignments that are easy for humans to read and process. The samtools view command is the most versatile tool in the samtools package. How many alignments are there in this region? samtools view 1:33000000-34000000 | wc -l For now, just do it without understanding. To do this, we use the samtools view command, which we will give proper treatment in the next section. Now, let’s exploit the index to extract alignments from the 33rd megabase of chromosome 1.
List ( ls) the contents of the current directory and look for the new index file. This will create an additional “index” file. Moreover, indexing is required by genome viewers such as IGV so that the viewers can quickly display alignments in each genomic region to which you navigate. Indexing a genome sorted BAM file allows one to quickly extract alignments overlapping particular genomic regions. Notice anything different about the coordinates of the alignments? Now let’s check the order: samtools view | head That is, ordered positionally based upon their alignment coordinates on each chromosome. It must be sorted such that the alignments occur in “genome order”. samtools view sample.bam | headĭoing anything meaningful such as calling variants or visualizing alignments in IGV) requires that the BAM is further manipulated. In other words, the BAM file is in the order that the sequences occurred in the input FASTQ files. When you align FASTQ files with all current sequence aligners, the alignments produced are in random order with respect to their position in the reference genome. samtools view sample.bam | head samtools “sort” Now, we can use the samtools view command to convert the BAM to SAM so we mere mortals can read it. samtools view -S -b sample.sam > sample.bam Samtools follows the UNIX convention of sending its output to the UNIX STDOUT, so we need to use a redirect operator (“>”) to create a BAM file from the output. We must also say that we want the output to be BAM (by default it produces BAM) with the -b option. We must specify that our input is in SAM format (by default it expects BAM) using the -S option. To convert SAM to BAM, we use the samtools view command. However, it is consequently very difficult for humans to read. The binary format is much easier for computer programs to work with. To do anything meaningful with alignment data from BWA or other aligners (which produce text-based SAM output), we need to first convert the SAM to its binary counterpart, BAM format. Samtools depth Converting SAM to BAM with samtools “view” curl > Īs you can see, there are multiple “subcommands” and for samtools to work you must tell it which subcommand you want to use. cd samtools-demoĭownload the example gzipped SAM file I have provided. you may already have a src directoryĬreate a new directory from your home directory called “samtools-demo”.
#BAM FILE FORMAT HOW TO#
Our goal is to work through examples that demonstrate how to explore, process and manipulate SAM and BAM files with the samtools software package.įor future reference, use the samtools documentation.