<strong>R1: Establishment and curation of neuropeptide sequences</strong> GenBank was searched for members of the LW, APGW, RPCH, AKH, CRZ, and GnRH neuropeptide families using 10 keywords: the neuropeptide name, the precursor abbreviation, the full name of the precursor, the full name of the precursor followed by the word “prepropeptide,” and combinations of these terms. The candidate sequences were downloaded from GenBank in FASTA format. The AKH family was further classified into the groups described in the literature and according to peptide length (number of amino acids) and sequence. In addition, the ACP hybrid family was identified in GenBank by BLAST alignment. <strong>C00: Neuropeptide Precursor.</strong> This folder contains eight subfolders, each named with the abbreviation of a neuropeptide family; only the AKH folder contains four further subfolders. All folders contain the same types of files: three text files, each named with the neuropeptide abbreviation and the type of result it holds. Files whose names include “<em>with codes</em>” contain the sequences labeled with the codes generated for this study, whereas files whose names include “<em>Full</em>” contain the GenBank search results obtained with the 10 keywords listed above. The results for each individual keyword are stored as separate files in a folder named “<em>Fasta Keywords</em>.” Files whose names include “<em>selected EA</em>” contain the sequences selected for the evolutionary analyses. <strong>C01: BLAST ACP.</strong> The text file named “00 BLAST ACP” contains the results of a BLAST search against the NCBI database that used as query the adipokinetic hormone/corazonin-related peptide (ACP) from the transcriptome of <em>Callinectes toxotes</em>. The file named “01 ACP Selected” contains the precursors selected for this study. 
All sequences are in FASTA format and carry the codes listed in Supplementary Material 3, “<em>Database Sequences</em>.” The file named “<em>02 ACP selected EA</em>” contains the ACP precursors of other species that were used in the evolutionary analyses of <em>C. toxotes</em> ACP. The PDF file titled “<em>03 ACP ProP 1.0 Serv</em>” contains the proteolytic cleavage sites predicted with the ProP 1.0 Server for the precursors in “<em>02 ACP selected EA</em>.” <strong>C02: BLAST VP.</strong> This folder contains the results of BLAST searches against the NCBI database that used as queries the virtual peptide sequences reported by Martinez-Perez et al. (2007). It holds seven text files, each named after the precursor and the species in which it was identified. The PDF document named “<em>Virtual peptides ProP 1.0 Serv</em>” contains the proteolytic cleavage sites predicted for these precursors with the ProP 1.0 Server. <strong>C03: Curation of sequences with software.</strong> This folder contains three subfolders, one for each software tool used in this study to detect the neuropeptide sequences with the corresponding keywords. The folder named “<em>BioDataToolKit</em>” contains six subfolders, each named with the abbreviation of a neuropeptide. These hold files with the sequences downloaded from GenBank and Microsoft Excel files with the details generated by the software; each file is named after the keywords used for the search. BioDataToolKit is available at https://github.com/rduarte24/BiodataToolkit. The folder named “<em>Pro1.0Server</em>” is organized by neuropeptide family in the same way as the “<em>BioDataToolKit</em>” folder. 
In the “<em>Pro1.0Server</em>” folder, however, each neuropeptide subfolder contains one file with the relevant sequences and another with the endoproteolytic cleavage sites of the neuropeptide precursors predicted by the software. The folder named “<em>Proteios</em>” contains seven files; each file name indicates the precursor analyzed with the software, and each file contains the identified sequences in FASTA format. Proteios is available at https://github.com/Martin-Munive/Proteios. <strong>C04: Neuropeptide precursors for evolutionary analysis.</strong> This folder contains the sequences of the neuropeptide precursors used to build the phylogenetic trees in Supplementary Materials 4 and 7, with each file named after the corresponding neuropeptide. <strong>R2: Transcriptome BLAST</strong> Microsoft Excel file containing the BLAST alignments performed with the sequences of the AKH/CRZ-related peptide (ACP) from <em>C. toxotes</em> and corazonin (CRZ) from <em>C. arcuatus</em>. The spreadsheets named <em>C. toxotes</em> and <em>C. arcuatus</em> contain the following information: Column A, neuropeptide name; Column B, species name; Columns C–G, BLAST alignment results; Column H, GenBank protein accession number; Column I, precursor sequence. <strong>R3: Construction of neuropeptide database</strong> Microsoft Excel file with information on the database and a detailed description of each neuropeptide precursor analyzed in this study. The file contains seven spreadsheet tabs, described below. <strong>Neuropeptides.</strong> Column A, sequence numbering in descending order; Column B, neuropeptide name; Column C, identification code used in this study; Column D, accession number; Columns E–G, species taxonomy; Columns H–L, GenBank sequence description; Columns M–N, literature reference and link. 
<strong>Taxonomy.</strong> Taxonomic description of each species examined, obtained from the NCBI database. <strong>Sequences evolutionary anal.</strong> Column C contains the codes developed for this work, Column D the GenBank accession codes of each neuropeptide, and Columns E and F the taxonomic details of each species. <strong>Table of differences.</strong> Column B lists the codes of identical sequences, and Column C the code of the sequence selected for this study. <strong>Codes deleted.</strong> This tab lists the deleted accession codes and the corresponding species names, without details on the properties of the neuropeptide precursors. <strong>Sequences Paper.</strong> Neuropeptide sequences reported in previous publications and later deposited in GenBank. Sequences marked with asterisks have not previously been reported in public databases. The codes used in this study to designate the sequences are also included. <strong>Keywords.</strong> Keywords used in the GenBank searches to retrieve the members of each neuropeptide family. <strong>R4: <em>In silico</em> validation, alignments, and phylogenetic relationships</strong> Phylogenetic trees and results of the individual IQ-TREE runs for each neuropeptide family, obtained with the DNA-LM and Kalign parameters. The folder named “<em>RUN</em>” contains the “<em>DNALM and kalign 2.0 default parameters</em>” subfolders; each contains 11 subfolders named after the neuropeptide families, with the results obtained with IQ-TREE. The folder named “<em>Trees</em>” contains the “<em>DNALM and kalign 2.0 default parameters</em>” folders with the phylogenetic trees of each neuropeptide family, drawn with iTOL.
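The “Table of differences” tab in R3 pairs the codes of identical sequences with the code of the sequence retained for the study. The source does not describe how identical sequences were detected; the Python sketch below shows one straightforward way to derive such a mapping from a FASTA file, assuming that “identical” means an exact, case-insensitive match of the full amino-acid string and that the first code encountered is the one kept. The function names, codes, and toy sequences are hypothetical and are not taken from the dataset.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {code: sequence}, keeping file order.

    The code is the first whitespace-delimited token of each header line;
    wrapped sequence lines are joined and uppercased.
    """
    records: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records.setdefault(current, [])
        elif current is not None:
            records[current].append(line)
    return {code: "".join(parts).upper() for code, parts in records.items()}


def table_of_differences(records: dict[str, str]) -> dict[str, str]:
    """Map each duplicate code to the code retained for the same sequence.

    Assumption (not stated in the source): the first code seen for a given
    sequence is the representative that is kept.
    """
    kept: dict[str, str] = {}        # sequence -> retained code
    duplicates: dict[str, str] = {}  # duplicate code -> retained code
    for code, seq in records.items():
        if seq in kept:
            duplicates[code] = kept[seq]
        else:
            kept[seq] = code
    return duplicates


if __name__ == "__main__":
    example = """>SEQ_001 hypothetical species A
ACDEFGHIK
>SEQ_002 hypothetical species B
ACDEF
GHIK
>SEQ_003 hypothetical species C
ACDEFGHIR
"""
    print(table_of_differences(read_fasta(example)))  # {'SEQ_002': 'SEQ_001'}
```

Because the rule is an exact string match, sequences that differ only by a terminal stop symbol or by alignment gaps would not be merged; whether the study normalized such cases is not stated.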
<strong>R5: BLAST alignment of the virtual peptide precursors</strong> Results of the BLAST alignment of the virtual peptides described by Martinez-Perez et al. (2007) against the sequences in GenBank. The files follow the same naming scheme as those in the folder named “<em>Carpeta 02 BLAST VP</em>” in Repository 1. <strong>R6: Alignment of neuropeptide precursors</strong> “<em>DNALM and Kalign 2.0 default parameter</em>” folders. Each of these folders contains the alignments of the examined neuropeptide precursors of each family, in subfolders named after the corresponding neuropeptide. The remaining files contain the alignments with the sequences arranged in ascending evolutionary order and are named after the corresponding neuropeptide. The file named “<em>All Sequence FASTA</em>” contains all sequences used in this study in FASTA format. <strong>R7: Phylogenetic clustering of the precursors</strong> “<em>DNALM and Kalign 2.0 default parameter</em>” folders. Both folders contain the phylogenetic clustering results shown in Supplementary Material 6, obtained with the DNA-LM and Kalign parameters and the IQ-TREE software. All analyses were run on the GUANE-1 supercomputer (Universidad Industrial de Santander). The clustering results for each precursor are stored in folders named after that precursor. This repository also contains Figure 6 of the main manuscript. Additionally, a folder entitled “Orthofinder and Robinson-Foulds” contains the analyses performed with the OrthoFinder software and the Robinson-Foulds metric.
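R7 includes an analysis based on the Robinson-Foulds metric, which compares two tree topologies by counting the bipartitions (splits) that occur in one tree but not in the other. The source does not say which implementation or normalization was used, so the Python sketch below only illustrates the unrooted distance for two trees on the same leaf set. Trees are written as nested tuples instead of Newick strings, an assumption made to keep the example dependency-free; the function names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical representation: a tree is a leaf name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names under a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the nontrivial splits of the tree, ignoring where it is rooted.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so a split read from either side gets one canonical form.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    splits: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        rest = all_leaves - clade
        # Trivial splits (one leaf on a side) are shared by all trees; skip them.
        if len(clade) >= 2 and len(rest) >= 2:
            splits.add(rest if ref in clade else clade)
        return clade

    walk(tree)
    return splits


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Unrooted Robinson-Foulds distance: size of the symmetric difference of splits."""
    if leaves(t1) != leaves(t2):
        raise ValueError("both trees must have the same leaf set")
    return len(bipartitions(t1) ^ bipartitions(t2))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    t3 = ("A", ("B", ("C", ("D", "E"))))  # same unrooted topology as t1
    print(robinson_foulds(t1, t2))  # 2
    print(robinson_foulds(t1, t3))  # 0
```

Two unrooted binary trees with n leaves each have n − 3 nontrivial splits, so the distance is at most 2(n − 3); dividing by that value gives a normalized score between 0 and 1. Whether raw or normalized values were reported is not stated in the source.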
Topic:
Genetics and Neurodevelopmental Disorders
Source information:
Source: Zenodo (CERN European Organization for Nuclear Research)