To understand how biological information within DNA and RNA biosignals is preserved, several works have proposed that these biosequences can be identified as codewords of BCH error-correcting codes over GF(4) (F<sub>4</sub>). The mathematical tool currently used to perform this identification has limitations: one is that all binary primitive polynomials must be known, and another is that the cyclic code construction restricts the nucleotide sequence lengths that can be handled. In this work, a novel algorithm for identifying odd-length nucleotide sequences as codewords of BCH codes over GF(4) is presented. As a result, more than 270 cDNA sequences of nine different lengths, which could not be considered by the previous algorithm, are identified as codewords of 35 BCH codes.
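The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the generic membership test for a cyclic code over GF(4): a sequence is a codeword exactly when its polynomial is divisible by the generator polynomial. The nucleotide labeling `NUCLEOTIDE_TO_GF4`, the helper names, and the toy length-3 generator g(x) = x + a are assumptions chosen for illustration and are not taken from the paper.

```python
# GF(4) = {0, 1, a, a^2} encoded as the integers 0, 1, 2, 3, with a^2 = a + 1.
# Addition is bitwise XOR; multiplication uses the table below.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]
GF4_INV = {1: 1, 2: 3, 3: 2}

# Hypothetical nucleotide-to-GF(4) labeling (an assumption; other labelings are possible).
NUCLEOTIDE_TO_GF4 = {"A": 0, "C": 1, "G": 2, "T": 3}


def poly_remainder(dividend, divisor):
    """Remainder of dividend / divisor over GF(4).

    Polynomials are coefficient lists, lowest degree first.
    The divisor's last (leading) coefficient must be nonzero.
    """
    rem = list(dividend)
    deg_d = len(divisor) - 1
    lead_inv = GF4_INV[divisor[-1]]
    for i in range(len(rem) - 1, deg_d - 1, -1):
        factor = GF4_MUL[rem[i]][lead_inv]
        if factor:
            # Subtract (= XOR in characteristic 2) factor * x^(i - deg_d) * divisor.
            for j, coeff in enumerate(divisor):
                rem[i - deg_d + j] ^= GF4_MUL[factor][coeff]
    return rem[:deg_d]


def is_codeword(sequence, generator):
    """True if the nucleotide sequence, read as a polynomial, is divisible by generator."""
    symbols = [NUCLEOTIDE_TO_GF4[base] for base in sequence.upper()]
    return not any(poly_remainder(symbols, generator))


# Toy generator g(x) = a + x for a length-3 cyclic code (x + a divides x^3 - 1 over GF(4)).
TOY_GENERATOR = [2, 1]

if __name__ == "__main__":
    for seq in ("GCA", "AGC", "GCC"):
        print(seq, is_codeword(seq, TOY_GENERATOR))
```

In this toy setting, `GCA` corresponds to g(x) itself and `AGC` to its cyclic shift x·g(x), so both pass the test, while `GCC` leaves a nonzero remainder. A realistic search would additionally iterate over candidate generator polynomials and nucleotide labelings, which this sketch does not attempt.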
Topic:
Machine Learning in Bioinformatics
Citations:
1
Altmetrics:
0
Source information:
Source: 2022 IEEE International Symposium on Information Theory (ISIT)