MapNext: A software tool for spliced and unspliced alignments and SNP detection of short sequence reads

Next-generation sequencing technologies provide exciting new avenues for transcriptomics and population genomics research. There is an increasing need to perform spliced and unspliced alignments of short transcript reads against a reference genome and to detect SNPs from sequences of population samples.

MapNext provides four main analyses: (i) unspliced alignment and clustering of reads, (ii) spliced alignment of transcriptomic reads, (iii) SNP detection and calculation of SNP frequency from population sequences, and (iv) storage of the results in a database for more flexible querying and further analysis.

Download:

Source code: mapnext.cpp
format_reference.pl   get_splice_seq.pl   get_splice_posi.pl
manual_mapnext.pdf
1.0-bit(x86_64, Linux)

Important notes:
Read the manual first.
Format your reference sequence file using format_reference.pl.
Convert your reads file to Unix line endings with dos2unix when working on a Unix operating system.
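Where dos2unix is not installed, the same line-ending conversion can be approximated in a few lines of Python. This is only a minimal sketch: the function names and file paths are hypothetical, and it assumes the reads file is small enough to read into memory in one go.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Replace Windows (CRLF) line endings with Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path to dst_path with LF line endings.

    Both paths are placeholders chosen by the caller; the file is read
    and written in binary mode so no other bytes are altered.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_unix_line_endings(data))
```

For example, convert_file("reads_dos.fa", "reads.fa") would write a converted copy that can then be passed to mapnext.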

Testing data:
Short reads mapping:
Reference sequence file: Chromosome 1 of Arabidopsis
Solexa reads file: Solexa reads   (We simulated 1893118 reads of 35 bp from 5796 coding DNA sequences of chromosome 1 of Arabidopsis thaliana as the query dataset.)
Unspliced alignment results: Unspliced alignment of simulated reads
Spliced alignment results: Spliced alignment of simulated reads  (-w 9)
Reads cluster results: Cluster of simulated reads
Command: mapnext -s reads.fa -g NC_003070.fa   for unspliced alignment
Command: mapnext -s reads.fa -g NC_003070.fa -t y -w 9   for both unspliced and spliced alignment
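The procedure used to simulate the 35 bp reads is not described on this page. As an illustration only, the sketch below draws fixed-length reads from a single coding sequence, assuming uniformly distributed start positions and no sequencing errors. The function name, parameters, and seed handling are hypothetical and are not part of MapNext.

```python
import random


def simulate_reads(sequence, n_reads, read_len=35, seed=0):
    """Sample n_reads substrings of length read_len from sequence.

    Assumptions (not taken from MapNext): start positions are uniform
    over all valid offsets, and reads contain no sequencing errors.
    Returns a list of (start, read) tuples with 0-based start offsets.
    """
    if len(sequence) < read_len:
        return []  # sequence too short to yield a full-length read
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    last_start = len(sequence) - read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # inclusive on both ends
        reads.append((start, sequence[start:start + read_len]))
    return reads
```

Because such reads are drawn from coding sequences, a read that spans an exon junction will not match the genome contiguously, which is the case the spliced alignment mode is meant to handle.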

SNP detection of simulated population sequences:
Population sequences file: Population sequences (50 haploid individuals, 2162 simulated true SNPs)
Reference sequence file: Reference sequence for population sequences
Reads file: Simulated Solexa reads  (6X coverage per individual)
Real SNP sites: Simulated real SNP sites
Candidate SNPs detected by MapNext: Candidate SNPs results
Accuracy of SNP detection at 6X per individual: SNP statistics  (6X coverage per individual)
Command: mapnext -a reads.fq -g ref.fa -n y -e 50 -f 0.01
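Detection accuracy on the simulated data can be checked by comparing the candidate SNP positions against the simulated real SNP sites. The sketch below shows one way to do this, assuming both lists have already been parsed into collections of positions. The file formats are not specified here, so parsing is left out, and the function name and returned keys are hypothetical.

```python
def snp_detection_stats(true_sites, candidate_sites):
    """Compare candidate SNP positions with the known true SNP positions.

    Both arguments are iterables of hashable site identifiers, for
    example integer positions or (sequence_name, position) tuples.
    """
    true_set = set(true_sites)
    cand_set = set(candidate_sites)
    tp = len(true_set & cand_set)  # true SNPs that were detected
    fp = len(cand_set - true_set)  # candidates that are not true SNPs
    fn = len(true_set - cand_set)  # true SNPs that were missed
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        # fraction of true SNPs recovered
        "sensitivity": tp / len(true_set) if true_set else 0.0,
        # fraction of candidates that are true SNPs
        "precision": tp / len(cand_set) if cand_set else 0.0,
    }
```

With the simulated dataset, true_sites would hold the 2162 simulated SNP positions and candidate_sites the positions reported by MapNext.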

SNP detection of population sequences:
Reference sequence file: 75 genes of Sonneratia alba
Solexa reads of population samples: Solexa reads   (8161413 short reads of 35 bp, generated by an Illumina-Solexa Genome Analyzer from 75 genes (80 Kbp in total) of 100 Sonneratia alba individuals.)
Alignment results: Unspliced alignment of short reads
Candidate SNPs detected by MapNext: Candidate SNPs results

Command: mapnext -a reads.fastaq -c ref.fasta -n y -e 800 -f 0.01