[Biopython] 4.2 Sequence Record objects - creating them from FASTA and GenBank files

Hello, this is Han Ju-hyun.

Today we'll look at how to create SeqRecord objects from FASTA and GenBank files.

In the previous post we covered the SeqRecord object itself:

http://korbillgates.tistory.com/86

A SeqRecord object holds a sequence together with its annotations and other related information.

Last time we built a SeqRecord object by typing everything in by hand, like this:

from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
 
simple_seq = Seq("GATC")
simple_seq_r = SeqRecord(simple_seq)
print(simple_seq_r)

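ID: <unknown id>
Name: <unknown name>
Description: <unknown description>
Number of features: 0
Seq('GATC', Alphabet())

(The outputs in this post come from an older Biopython release. Since Biopython 1.78, which removed Bio.Alphabet, the last line prints as Seq('GATC') with no alphabet argument.)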

This time, let's create SeqRecord objects from a FASTA file and from a GenBank file.

1. Creating a SeqRecord object from a FASTA file

Download the following file for the exercise.

https://raw.githubusercontent.com/biopython/biopython/master/Tests/GenBank/NC_005816.fna

from Bio import SeqIO
record = SeqIO.read("NC_005816.fna", "fasta")
print(record)

ID: gi|45478711|ref|NC_005816.1|
Name: gi|45478711|ref|NC_005816.1|
Description: gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence
Number of features: 0
Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', SingleLetterAlphabet())

The SeqIO.read function takes two arguments.

1) The file name

2) The file format

Pass these two arguments and you get a SeqRecord object with very little effort.
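One thing worth knowing about SeqIO.read is that it expects the input to contain exactly one record. The sketch below tries this out on two tiny made-up FASTA strings held in memory (the names seq1 and seq2 and their sequences are invented for illustration, and Biopython is assumed to be installed). For files with several records, SeqIO.parse is the usual alternative.

```python
from io import StringIO

from Bio import SeqIO

# Two tiny made-up FASTA texts, kept in memory instead of on disk.
one_record = ">seq1 first toy record\nGATC\n"
two_records = one_record + ">seq2 second toy record\nCCGG\n"

# SeqIO.read accepts a file name or an open handle and returns one SeqRecord.
record = SeqIO.read(StringIO(one_record), "fasta")
print(record.id, str(record.seq))

# With more than one record in the input, SeqIO.read raises ValueError.
try:
    SeqIO.read(StringIO(two_records), "fasta")
    read_failed = False
except ValueError:
    read_failed = True

# SeqIO.parse iterates over the records one by one instead.
ids = [rec.id for rec in SeqIO.parse(StringIO(two_records), "fasta")]
print(read_failed, ids)
```

Wrapping the text in StringIO is only a convenience for the demo; with a real file you would pass the file name, as in the example above.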

As shown below, you can reach the attributes of a SeqRecord object with dot notation.

print(record.id)

gi|45478711|ref|NC_005816.1|
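In the FASTA output above, ID and Name are identical and Description repeats the whole title line. That matches a simple rule: the first whitespace-delimited word of the header becomes the ID, and the full header becomes the description. Here is a rough pure-Python sketch of that rule. The function split_fasta_header is a hypothetical helper written for this post, not Biopython's own code.

```python
def split_fasta_header(header_line):
    """Split a FASTA title line into (id, description).

    Hypothetical helper that mimics the result seen in the output above.
    """
    # Drop the leading ">" and surrounding whitespace.
    title = header_line[1:] if header_line.startswith(">") else header_line
    title = title.strip()
    # The ID is the first whitespace-delimited word (empty if there is none).
    parts = title.split(None, 1)
    seq_id = parts[0] if parts else ""
    # The description keeps the whole title, ID included.
    return seq_id, title


header = (
    ">gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus "
    "str. 91001 plasmid pPCP1, complete sequence"
)
seq_id, description = split_fasta_header(header)
print(seq_id)
print(description)
```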

print(record.dbxrefs, record.annotations, record.letter_annotations, record.features)

[] {} {} []

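So a SeqRecord object created from a FASTA file has nothing in dbxrefs, annotations, letter_annotations, or features. A FASTA file only carries a title line and a sequence, after all.

2. Creating a SeqRecord object from a GenBank file

Download the following file for the exercise.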

https://raw.githubusercontent.com/biopython/biopython/master/Tests/GenBank/NC_005816.gb

from Bio import SeqIO
record = SeqIO.read("NC_005816.gb", "genbank")
print(record)

ID: NC_005816.1
Name: NC_005816
Description: Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence
Database cross-references: Project:58037
Number of features: 41
/molecule_type=DNA
/topology=circular
/data_file_division=BCT
/date=21-JUL-2008
/accessions=['NC_005816']
/sequence_version=1
/gi=45478711
/keywords=['']
/source=Yersinia pestis biovar Microtus str. 91001
/organism=Yersinia pestis biovar Microtus str. 91001
/taxonomy=['Bacteria', 'Proteobacteria', 'Gammaproteobacteria', 'Enterobacteriales', 'Enterobacteriaceae', 'Yersinia']
/references=[Reference(title='Genetics of metabolic variations between Yersinia pestis biovars and the proposal of a new biovar, microtus', ...), Reference(title='Complete genome sequence of Yersinia pestis strain 91001, an isolate avirulent to humans', ...), Reference(title='Direct Submission', ...), Reference(title='Direct Submission', ...)]
/comment=PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence was derived from AE017046.
COMPLETENESS: full length.
Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', IUPACAmbiguousDNA())

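Again, the SeqIO.read function takes two arguments.

1) The file name

2) The file format

In the FASTA example above we passed "fasta" as the second argument,

and this time we passed "genbank".

So as long as you pass a file format that Biopython supports, creating a SeqRecord object is easy.

For the list of file formats Biopython supports, see the following link.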

http://www.biopython.org/wiki/SeqIO#file-formats
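If you switch between FASTA and GenBank files often, you may want to pick the format string from the file extension. The sketch below is one way to do that. Note that guess_seqio_format and the EXTENSION_TO_FORMAT table are hypothetical names made up for this post; Biopython does not guess the format for you, and the extensions listed are only common conventions.

```python
from pathlib import Path

# Hypothetical lookup table: extend it with the extensions you actually use.
EXTENSION_TO_FORMAT = {
    ".fna": "fasta",
    ".fa": "fasta",
    ".fasta": "fasta",
    ".gb": "genbank",
    ".gbk": "genbank",
}


def guess_seqio_format(filename):
    """Return the SeqIO format string for a file name, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"no format known for extension {suffix!r}") from None


print(guess_seqio_format("NC_005816.fna"))
print(guess_seqio_format("NC_005816.gb"))
```

The returned string can then be passed as the second argument, for example SeqIO.read(filename, guess_seqio_format(filename)).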

print(repr(record.seq))           # repr() shows the abbreviated Seq form
print(record.id)
print(record.description)
print(record.letter_annotations)
print(len(record.annotations))    # number of annotation entries
print(record.annotations["source"])
print(record.dbxrefs)
print(len(record.features))

Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', IUPACAmbiguousDNA())
NC_005816.1
Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence
{}
13
Yersinia pestis biovar Microtus str. 91001
['Project:58037']
41

Each of these attributes can be accessed with dot notation.

Today we looked at how to create SeqRecord objects in Biopython from FASTA and GenBank files.

See you next time!
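If you don't have the GenBank file at hand, the same dot access can be tried on a small record built by hand. Everything in this sketch is made up for illustration (the ID toy.1, the annotation values, and the single "gene" feature), and it assumes Biopython is installed. A record read from a real GenBank file exposes the same attributes.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# A made-up record carrying the kinds of fields a GenBank file fills in.
record = SeqRecord(
    Seq("GATCGATC"),
    id="toy.1",
    name="toy",
    description="made-up record for illustration",
    annotations={"molecule_type": "DNA", "source": "synthetic example"},
    features=[SeqFeature(FeatureLocation(0, 4), type="gene")],
)

# annotations is a dictionary, so individual entries are looked up by key.
print(record.annotations["source"])
print(len(record.annotations))

# features is a list of SeqFeature objects.
for feature in record.features:
    print(feature.type, feature.location)
```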

License: Attribution, NonCommercial, NoDerivatives