Datasets of the manuscript "Rational design of profile HMMs for sensitive and specific sequence detection with case studies applied to viruses, bacteriophages, and casposons"
<strong>DATASETS</strong>

Rational design of profile HMMs for sensitive and specific sequence detection with case studies applied to viruses, bacteriophages, and casposons

Liliane S. Oliveira, Alejandro Reyes, Bas E. Dutilh and Arthur Gruber<sup>*</sup>

* Correspondence: argruber@usp.br (AG); Tel. +55 11 3091 7274

Here we provide the <em>Microviridae</em>, <em>Flavivirus</em>, and casposon datasets used throughout this work:

Microviridae folder
  conserved_HMMs - profile HMMs constructed with TABAJARA in Conservation mode for <em>Microviridae</em>
  discriminative_HMMs - profile HMMs constructed with TABAJARA in Discrimination mode for <em>Microviridae</em>
  sequences - sequence datasets and their multiple sequence alignments
    Microviridae_113-seq_training_set.fasta - 113 VP1 sequences covering the diversity of the <em>Microviridae</em> family
    Microviridae_113-seq.aln - multiple sequence alignment of the 113-protein dataset
    Microviridae_1836-seq_testset.fasta - 1,836 sequences of the major capsid protein (VP1), comprising 501 <em>Alpavirinae</em>, 1,040 <em>Gokushovirinae</em>, and 295 <em>Pichovirinae</em> sequences
    Microviridae_1866-seq.aln - multiple sequence alignment of the 1,866-protein <em>Microviridae</em> dataset used in the experiment of Figure 4

Flavivirus folder
  conserved_HMMs - profile HMMs constructed with TABAJARA in Conservation mode for <em>Flavivirus</em>
  discriminative_HMMs - profile HMMs constructed with TABAJARA in Discrimination mode for <em>Flavivirus</em>
    full-length - models constructed from full-length protein sequences
    short - models constructed from selected short alignment blocks of the protein sequences
  sequences - sequence datasets and their multiple sequence alignments
    Flavivirus_127-seq_training_set.fasta - 127 polyprotein sequences covering the species diversity of the genus <em>Flavivirus</em>
    Flavivirus_127-seq.aln - multiple sequence alignment of the 127-protein dataset
    Flavivirus_6364-seq_testset.fasta - 6,364 sequences covering the species diversity of <em>Flavivirus</em>: 3,919 of dengue virus (DENV), 327 of Zika virus (ZIKV), 63 of yellow fever virus (YFV), and 2,055 of other available flaviviruses
    Flavivirus_6364-seq.aln - multiple sequence alignment of the 6,364-protein <em>Flavivirus</em> dataset

Casposons folder
  casposon_generic_HMMs - profile HMMs constructed with TABAJARA in Discrimination mode for the generic detection of all casposons and their discrimination from CRISPRs
  casposon_family_discriminative_HMMs - profile HMMs constructed with TABAJARA in Discrimination mode to discriminate among casposon families and from CRISPRs
  sequences - sequence datasets and their multiple sequence alignments
    casposons_crisprs.fasta - 106 <em>bona fide</em> Cas1 sequences, derived from 52 CRISPRs and 54 casposons
    casposon_family_discrimination.aln - multiple sequence alignment of the 52 <em>bona fide</em> CRISPR and 54 casposon sequences, using the nomenclature required to run TABAJARA for the discrimination of each casposon family
    casposons_crisprs_discrimination.aln - multiple sequence alignment of the 52 <em>bona fide</em> CRISPR and 54 casposon sequences, using the nomenclature required to run TABAJARA for the discrimination between CRISPRs and casposons
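After downloading the archive, the stated sequence counts can be compared with the FASTA files as a quick sanity check. The sketch below is a minimal Python example, assuming the .fasta files are plain, uncompressed FASTA in which every record begins with a '>' header line. The helper names (count_fasta_records, check_compositions, check_folder) are hypothetical and are not part of TABAJARA. The .aln files are not checked because their alignment format is not stated here. The sketch also confirms that the stated group sizes add up to each total (501 + 1,040 + 295 = 1,836; 3,919 + 327 + 63 + 2,055 = 6,364; 52 + 54 = 106).

```python
"""Sanity-check the sequence counts stated in the dataset description.

Assumption: each .fasta file is plain (uncompressed) FASTA and every record
starts with a header line beginning with '>'. Helper names are hypothetical.
"""
from pathlib import Path
from typing import Iterable

# Expected record counts, taken from the dataset description.
EXPECTED_COUNTS = {
    "Microviridae_113-seq_training_set.fasta": 113,
    "Microviridae_1836-seq_testset.fasta": 1836,
    "Flavivirus_127-seq_training_set.fasta": 127,
    "Flavivirus_6364-seq_testset.fasta": 6364,
    "casposons_crisprs.fasta": 106,
}

# Stated per-group compositions; each should sum to the dataset total.
COMPOSITIONS = {
    "Microviridae_1836-seq_testset.fasta": {
        "Alpavirinae": 501, "Gokushovirinae": 1040, "Pichovirinae": 295,
    },
    "Flavivirus_6364-seq_testset.fasta": {
        "DENV": 3919, "ZIKV": 327, "YFV": 63, "other": 2055,
    },
    "casposons_crisprs.fasta": {"CRISPR": 52, "casposon": 54},
}


def count_fasta_records(lines: Iterable[str]) -> int:
    """Return the number of FASTA records, i.e. lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def check_compositions() -> dict[str, bool]:
    """Return {filename: True/False} depending on whether the parts sum to the total."""
    return {
        name: sum(parts.values()) == EXPECTED_COUNTS[name]
        for name, parts in COMPOSITIONS.items()
    }


def check_folder(folder: Path) -> dict[str, tuple[int, int]]:
    """Return {filename: (observed, expected)} for expected files found under folder.

    Files are searched recursively, so the exact folder layout does not matter.
    """
    results: dict[str, tuple[int, int]] = {}
    for name, expected in EXPECTED_COUNTS.items():
        for path in folder.rglob(name):
            with path.open() as handle:
                results[name] = (count_fasta_records(handle), expected)
    return results


if __name__ == "__main__":
    print(check_compositions())
```

To check a local copy, call check_folder(Path("path/to/extracted/archive")), where the path is a placeholder for wherever the archive was extracted. Any entry whose observed count differs from the expected one points to a truncated download or a differently formatted file.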
Topic:
Bacteriophages and microbial interactions
Citations:
0
Citations per year:
No citation data available
Altmetrics:
0
Source information:
Source: Zenodo (CERN European Organization for Nuclear Research)