[minimap2] How to install and run minimap2

Hello, this is Han Ju-hyeon (한주현).

In this post I will walk through installing and running minimap2, the new mapping tool from Heng Li.

1. Downloading minimap2

2. Installing minimap2

3. Running minimap2

1. Downloading minimap2

1) Open the project site

https://github.com/lh3/minimap2

2) Download the source

I used git clone to download it.

On the GitHub page, copy the repository's git URL.

Then download it from a terminal with git clone.

$ git clone https://github.com/lh3/minimap2.git

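2. Installing minimap2

1) Change into the downloaded minimap2 directory.

2) Build it with make. This produces the minimap2 executable in that directory.

3) After the build, run ./minimap2. If you see output like the following, the installation succeeded!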

$ ./minimap2
Usage: minimap2 [options] <target.fa>|<target.idx> [query.fa] [...]
Options:
  Indexing:
    -H           use homopolymer-compressed k-mer (preferrable for PacBio)
    -k INT       k-mer size (no larger than 28) [15]
    -w INT       minizer window size [10]
    -I NUM       split index for every ~NUM input bases [4G]
    -d FILE      dump index to FILE []
  Mapping:
    -f FLOAT     filter out top FLOAT fraction of repetitive minimizers [0.0002]
    -g NUM       stop chain enlongation if there are no minimizers in INT-bp [5000]
    -G NUM       max intron length (effective with -xsplice; changing -r) [200k]
    -F NUM       max fragment length (effective with -xsr or in the fragment mode) [800]
    -r NUM       bandwidth used in chaining and DP-based alignment [500]
    -n INT       minimal number of minimizers on a chain [3]
    -m INT       minimal chaining score (matching bases minus log gap penalty) [40]
    -X           skip self and dual mappings (for the all-vs-all mode)
    -p FLOAT     min secondary-to-primary score ratio [0.8]
    -N INT       retain at most INT secondary alignments [5]
  Alignment:
    -A INT       matching score [2]
    -B INT       mismatch penalty [4]
    -O INT[,INT] gap open penalty [4,24]
    -E INT[,INT] gap extension penalty; a k-long gap costs min{O1+k*E1,O2+k*E2} [2,1]
    -z INT[,INT] Z-drop score and inversion Z-drop score [400,200]
    -s INT       minimal peak DP alignment score [80]
    -u CHAR      how to find GT-AG. f:transcript strand, b:both strands, n:don't match GT-AG [n]
  Input/Output:
    -a           output in the SAM format (PAF by default)
    -Q           don't output base quality in SAM
    -L           write CIGAR with >65535 ops at the CG tag
    -R STR       SAM read group line in a format like '@RG\tID:foo\tSM:bar' []
    -c           output CIGAR in PAF
    --cs[=STR]   output the cs tag; STR is 'short' (if absent) or 'long' [none]
    --MD         output the MD tag
    --eqx        write =/X CIGAR operators
    -Y           use soft clipping for supplementary alignments
    -t INT       number of threads [3]
    -K NUM       minibatch size for mapping [500M]
    --version    show version number
  Preset:
    -x STR       preset (always applied before other options; see minimap2.1 for details) []
                 - map-pb/map-ont: PacBio/Nanopore vs reference mapping
                 - ava-pb/ava-ont: PacBio/Nanopore read overlap
                 - asm5/asm10/asm20: asm-to-ref mapping, for ~0.1/1/5% sequence divergence
                 - splice: long-read spliced alignment
                 - sr: genomic short-read mapping
 
See `man ./minimap2.1' for detailed description of these and other advanced command-line options.
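
The download and installation steps above can be sketched as one small shell function. This is only a sketch under a few assumptions: the function name build_minimap2 and its optional target-directory argument are made up for illustration, and it assumes git, make, a C compiler, and the zlib development files are already installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of minimap2): clone, build, and smoke-test.
build_minimap2() {
    local dest="${1:-minimap2}"   # where to clone; defaults to ./minimap2
    # Clone only if the directory does not exist yet.
    if [ ! -d "$dest" ]; then
        git clone https://github.com/lh3/minimap2.git "$dest" || return 1
    fi
    # make builds the minimap2 executable inside the source directory.
    make -C "$dest" || return 1
    # Print the version as a quick check that the binary runs.
    "$dest/minimap2" --version
}

# Usage (downloads and compiles, so it is not run automatically):
# build_minimap2 "$HOME/tools/minimap2"
```

Since make does not copy the binary anywhere, you either call it by its path (./minimap2) or add the directory to PATH yourself.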

3. Running minimap2

1) Build an index for the reference file

Go to the directory that contains ucsc.hg19.fasta and run the following command.

$ minimap2 -d ucsc.hg19.mmi ucsc.hg19.fasta

The command prints a log like the following.

$ minimap2 -d ucsc.hg19.mmi ucsc.hg19.fasta  
[M::mm_idx_gen::75.576*1.76] collected minimizers
[M::mm_idx_gen::90.135*1.96] sorted minimizers
[M::main::102.770*1.83] loaded/built the index for 93 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 93
[M::mm_idx_stat::103.674*1.82] distinct minimizers: 100029963 (38.72% are singletons); average occurrences: 5.458; average spacing: 5.746
[M::main] Version: 2.14-r886-dirty
[M::main] CMD: minimap2 -d ucsc.hg19.mmi ucsc.hg19.fasta
[M::main] Real time: 103.849 sec; CPU: 189.092 sec; Peak RSS: 11.213 GB
 

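Indexing took about 104 seconds of wall-clock time (just under 2 minutes), with a peak memory use of about 11.2 GB.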
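
Because the usage line accepts either <target.fa> or <target.idx>, the saved .mmi file can be passed in place of the FASTA file so the index is not rebuilt on every run. One caveat, as far as I understand the minimap2 manual: indexing parameters such as -k and -w are fixed when the index is built, and presets such as sr set their own values, so it is safer to build the index with the same preset you will map with. The sketch below follows that idea. The function names index_name and ensure_index and the "<reference>.<preset>.mmi" naming scheme are my own, not something minimap2 provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive an index file name from a reference FASTA and a preset.
index_name() {
    local ref="$1" preset="$2"
    local base="${ref%.gz}"                     # drop a trailing .gz, if any
    base="${base%.fasta}"; base="${base%.fa}"   # drop the FASTA extension
    printf '%s.%s.mmi\n' "$base" "$preset"
}

# Build the index once per preset, then reuse it on later runs.
ensure_index() {
    local ref="$1" preset="$2" idx
    idx="$(index_name "$ref" "$preset")"
    if [ ! -s "$idx" ]; then
        # -x applies the preset before -d dumps the index to a file.
        # Anything minimap2 prints is sent to stderr so that stdout
        # carries only the index path.
        minimap2 -x "$preset" -d "$idx" "$ref" >&2 || return 1
    fi
    printf '%s\n' "$idx"
}

# Usage:
# idx="$(ensure_index ucsc.hg19.fasta sr)"
# minimap2 -ax sr "$idx" sample_1.fastq.gz sample_2.fastq.gz > aln.sam
```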

2) Run the mapping

minimap2 -ax sr \
  -t <Thread> \
  -R <ReadGroup> \
  ucsc.hg19.fasta \
  sample_1.fastq.gz \
  sample_2.fastq.gz

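This writes the alignments in SAM format to standard output.

Note the first option of the mapping command, -ax sr: -a selects SAM output, and -x sr is the preset for short genomic paired-end reads.

Refer to the list below and use the preset that matches your data.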
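
Since the SAM output goes to standard output, it can be piped straight into another tool instead of being written as a large text file. The following is a minimal sketch, assuming samtools is installed (it is not covered in this post). The function names make_read_group and map_to_bam and the argument order are hypothetical; the read group format follows the '@RG\tID:foo\tSM:bar' example in the usage text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format a read group line for -R, with a literal
# backslash-t between fields as shown in the minimap2 usage text.
make_read_group() {
    local id="$1" sample="$2"
    printf '@RG\\tID:%s\\tSM:%s\n' "$id" "$sample"
}

# Map paired-end short reads and write a coordinate-sorted, indexed BAM.
# Assumes minimap2 and samtools are both on PATH.
map_to_bam() {
    local ref="$1" r1="$2" r2="$3" out="$4" sample="$5" threads="${6:-3}"
    local rg
    rg="$(make_read_group "$sample" "$sample")"
    minimap2 -ax sr -t "$threads" -R "$rg" "$ref" "$r1" "$r2" \
        | samtools sort -@ "$threads" -o "$out" - \
        && samtools index "$out"
}

# Usage:
# map_to_bam ucsc.hg19.fasta sample_1.fastq.gz sample_2.fastq.gz sample.bam sample 8
```

Note that without set -o pipefail the pipeline's exit status reflects only samtools sort, so a failure inside minimap2 may go unnoticed.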

# use presets (no test data)
./minimap2 -ax map-pb ref.fa pacbio.fq.gz > aln.sam       # PacBio genomic reads
./minimap2 -ax map-ont ref.fa ont.fq.gz > aln.sam         # Oxford Nanopore genomic reads
./minimap2 -ax asm20 ref.fa pacbio-ccs.fq.gz > aln.sam    # PacBio CCS genomic reads
./minimap2 -ax sr ref.fa read1.fa read2.fa > aln.sam      # short genomic paired-end reads
./minimap2 -ax splice ref.fa rna-reads.fa > aln.sam       # spliced long reads (strand unknown)
./minimap2 -ax splice -uf -k14 ref.fa reads.fa > aln.sam  # noisy Nanopore Direct RNA-seq
./minimap2 -ax splice -uf -C5 ref.fa query.fa > aln.sam   # Final PacBio Iso-seq or traditional cDNA
./minimap2 -cx asm5 asm1.fa asm2.fa > aln.paf             # intra-species asm-to-asm alignment
./minimap2 -x ava-pb reads.fa reads.fa > overlaps.paf     # PacBio read overlap
./minimap2 -x ava-ont reads.fa reads.fa > overlaps.paf    # Nanopore read overlap
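
If you script several data types, the choice of preset in the list above can be wrapped in a small lookup. This is just a sketch: the function name pick_preset and the data-type labels on the left are my own shorthand, and only the preset names on the right come from minimap2.

```shell
#!/usr/bin/env bash
# Hypothetical lookup from a data-type label to a minimap2 preset name.
pick_preset() {
    case "$1" in
        pacbio)      echo map-pb ;;    # PacBio genomic reads
        nanopore)    echo map-ont ;;   # Oxford Nanopore genomic reads
        pacbio-ccs)  echo asm20 ;;     # PacBio CCS genomic reads
        short-reads) echo sr ;;        # short genomic paired-end reads
        rna-long)    echo splice ;;    # spliced long reads
        assembly)    echo asm5 ;;      # intra-species asm-to-asm alignment
        *) echo "unknown data type: $1" >&2; return 1 ;;
    esac
}

# Usage:
# ./minimap2 -ax "$(pick_preset nanopore)" ref.fa ont.fq.gz > aln.sam
```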
 

On a small test dataset, minimap2 ran in about 20% less time than bwa.

(The mapping results themselves differ, of course.)

That wraps up installing and running minimap2, the new mapper from Heng Li.

I hope you found it useful.

See you next time!
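
If you want to repeat this kind of comparison on your own data, a rough sketch is below. The function names time_cmd and percent_reduction are hypothetical, and bwa mem is an assumption on my part, since the exact bwa command used for the test is not recorded here. The timing has one-second resolution, so it only makes sense for runs that take a while.

```shell
#!/usr/bin/env bash
# Percent reduction in run time of a new tool relative to a baseline.
percent_reduction() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

# Wall-clock seconds taken by an arbitrary command (its output is discarded).
time_cmd() {
    local start end
    start="$(date +%s)"
    "$@" > /dev/null 2>&1
    end="$(date +%s)"
    echo $(( end - start ))
}

# Usage (bwa mem also needs a prior "bwa index ref.fa"):
# t_bwa="$(time_cmd bwa mem ref.fa read1.fq read2.fq)"
# t_mm2="$(time_cmd minimap2 -ax sr ref.fa read1.fq read2.fq)"
# percent_reduction "$t_bwa" "$t_mm2"
# percent_reduction 100 80 prints 20.0
```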

References:

https://github.com/lh3/minimap2


Han Ju-hyeon (한주현)

License: CC BY-NC-ND (Attribution, NonCommercial, NoDerivatives)

생물정보학자의 블로그 (A Bioinformatician's Blog)