The data correspond to the article entitled "dNTPs and adjuvant reagent solutions in 3’ RACE improve the characterization of noncanonical RNA SARS-CoV-2 genomes."

R1. RACE 3’ Primer Blast Alignment. Contains BLAST alignments against the GenBank database using the consensus nucleotide sequence from the 3’ end of the SARS-CoV-2 genome and the polylinker. In addition, an illustration of the restriction enzyme pattern of the 3’ RACE primer RV30AkCOVID19 and the MALDI-TOF analysis of its synthesis is included. The red box indicates the nucleotide sequence of the polylinker, and the yellow box represents the 3’ RACE primer along with the result of primer synthesis and purification. A graphic representation of the procedure for SARS-CoV-2 genome cDNA synthesis and the design of the 3’ RACE RV30AkCOVID19 primer is also included. The rectangle with vertical lines and the dots represent the 3’ RACE RV30AkCOVID19 primer and the polylinker, respectively, in the region complementary to the 3’ UTR end. The arrow represents the reverse transcriptase during complementary strand synthesis. The scissors represent the RNases used in purification. The black spheres and magnets indicate the magnetic purification process.

R2. Reads and assemblies of SARS-CoV-2 genomes. The folder "1) Reads - Ion torrent" contains the reads obtained from sequencing via Ion Torrent technology and the reagents used in this study. The folder named "2) FastQC" contains the FastQC results of the Ion Torrent sequencing. In the file name, the number indicates the sample, and the letters "RNA" indicate sequencing according to the Ion Torrent protocol. The cDNA synthesis procedures for this study follow this nomenclature: dNTPs-R = dNTPs SARS-CoV-2 solution, DES-R = denaturation reagent, and COM PRO = commercial procedure. The folders named "3) IRMA" and "4) Bowtie2" contain the assemblies of the genomes. Also included are the regions and/or codons lost in genomes 07dN120320 and 27sT122620, and the mutations and amino acid substitutions of the SARS-CoV-2 genomes.
In addition, an Excel document with the nucleotide ratios of each characterized SARS-CoV-2 genome is included.

R3. BLAST alignment of assembled SARS-CoV-2 genomes. Contains two folders named "BLAST - IRMA" and "BLAST - Bowtie2," which contain plain text documents with the results of the BLAST alignment for the genomes obtained with each of the assemblies.

R4. Pangolin v1.16 and Nextclade v2.9.1 lineages for SARS-CoV-2 genomes. Contains the folders "Pangolin and Nextclade (Bowtie2)" and "Pangolin and Nextclade (IRMA)." Each folder shows the data obtained with the Pangolin v1.16 and Nextclade v2.9.1 software for the classification of the genomes reported in this study, which were assembled with the IRMA and Bowtie2 software.

R5. Reference genome alignment and assembled genomes. Contains the folders "1) IRMA genomes," "2) Bowtie2 genomes," and "3) Genomes 07dN120320 and 27St122620." The files show the sequences and alignments of the examined genomes (the file name indicates the analyzed genome) relative to the SARS-CoV-2 reference genome in both FASTA and Clustal W formats.

R6. Programmed −1 Ribosomal Frameshifting Structure. The folder "1) Gibbs free energy 2D" contains a plain text document indicating the secondary structures of the open reading frame stimulation element in dot-bracket format. The folder "2) modeling Data Modeling 3D" contains the information for generating the structure of folder 1 in 3D.

R7. SARS-CoV-2 Database.
1) GISAID_sequences.zip is a Zip file containing a folder named GISAID, which in turn contains plain text documents with the genomes of each variant indicated in the file name of each document.
2) The depuration of sequences_GISAID contains two subfolders. The first subfolder, named "1) SARS-CoV-2 complete genome," contains plain text documents with the genomes downloaded from GISAID without undetermined nucleotides. The file name of each document corresponds to the analyzed variant.
The subfolder "2) SARS-CoV-2 eliminate genome" contains the sequences eliminated from subfolder 1 because they differed from the majority of the analyzed sequences.
3) SARS-CoV-2 consensus variants. Contains plain text documents with consensus sequences for each variant, with frequency thresholds of 20 and 100 indicated in the file name of each document.
4) SARS-CoV-2 alignment consensus variants. Contains two subfolders, with the number indicating the alignment frequency threshold. The "Alignment 20_" subfolder contains four documents named "with Ns," which correspond to FASTA and Clustal formats with undetermined nucleotides, whereas the files named "without" do not have undetermined nucleotides. The "100_" folder has the same file pattern as the previous folder.
5) SARS-CoV-2 codons alignment consensus variants and nc-sgRNA. Contains a document with the alignment of the genomes characterized in this study with the reference genome of SARS-CoV-2. A subfolder named "SARS-CoV-2 codons nc-sgRNA" shows the alignment of each nc-sgRNA obtained in this study with the reference genome, and the file name corresponds to the nc-sgRNA. The subfolder "SARS-CoV-2 Geneious Prime" contains four documents. Each document includes the graphical representation of the alignment of the nc-sgRNA obtained with each treatment for the synthesis of SARS-CoV-2 cDNA with respect to the reference genome. The three documents marked with the numbers 25, 50, and 100 correspond to the percentage of identity with respect to the number of annotations relative to the reference genome, which is indicated in the title of each document.
6) Variant Alignment – Ns. Contains eight documents in FASTA and Clustal formats with the alignments of the SARS-CoV-2 genomes obtained in this study against the reference genome and against genomes containing undetermined nucleotides of the Gamma, Lambda, Mu, and Omicron variants.

R8. Phylogeny SARS-CoV-2.
Contains two subfolders with the results of the phylogenetic analyses, conducted via the maximum likelihood method, of the genomes characterized in this study compared with the variants. The subfolder named "Phylogeny with Ns" contains the analysis of genomes with undetermined nucleotides, whereas "Phylogeny without Ns" contains the analysis of complete genomes.
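
Several folders above separate genomes "with Ns" from those "without Ns." As a minimal sketch of how a user of this dataset might reproduce that split on any FASTA file, the Python below parses FASTA text and partitions records by whether they contain an undetermined nucleotide. It is not the authors' procedure: the function names, the example records, and the assumption that "N" is the only undetermined symbol are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; join them in upper case.
            records[header] += line.upper()
    return records


def split_by_undetermined(
    records: Dict[str, str], symbol: str = "N"
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (records containing `symbol`, records free of it).

    Assumption: `symbol` ("N" by default) is the only code treated as
    an undetermined nucleotide; other IUPAC ambiguity codes are ignored.
    """
    with_ns = {h: s for h, s in records.items() if symbol in s}
    without_ns = {h: s for h, s in records.items() if symbol not in s}
    return with_ns, without_ns


# Hypothetical example records, not taken from the dataset.
example = """>genome_A
ACGTNNACGT
>genome_B
ACGTACGT
ACGT
""".splitlines()

with_ns, without_ns = split_by_undetermined(parse_fasta(example))
print(sorted(with_ns), sorted(without_ns))
```

To apply the same split to a file from the dataset, pass an open file handle to `parse_fasta` in place of the example lines.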
Topic:
RNA and protein synthesis mechanisms
Source information:
Source: Zenodo (CERN European Organization for Nuclear Research)