The lossless compression problem consists of constructing a uniquely decodable encoding of an alphabet that assigns to each string of symbols the shortest possible code length. Finding this smallest representation of data can reduce storage space, data transfer time, and the number of processing operations within a computer. This makes lossless compression a central goal in computer science and a significant challenge in the development of many technological solutions. Information theory, for its part, has established the mathematical formalism needed to study quantitative measures of information, such as Shannon entropy, and has found its place within lossless compression by providing theoretical tools for studying the models that describe data sources in coding theory. Moreover, the close relationship between information theory and lossless compression has motivated many authors to devise ways of measuring the information content of data through file compression. This has led to interesting applications of lossless compression in machine learning, particularly in the classification of natural-language texts and DNA strings. In this paper, a monographic review of how information theory is applied to lossless compression is presented. To this end, several implementations of lossless compression in coding theory are presented, together with their analysis. The proofs, graphs, algorithms, and implementations in this paper generalize some of the most important results about binary encodings stated in the literature to alphabets of arbitrary size.
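As a minimal illustration of the central quantity mentioned above (not an implementation from the paper), the empirical Shannon entropy of a string over an arbitrary alphabet can be estimated from symbol frequencies:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Empirical Shannon entropy of s, in bits per symbol.

    H = -sum over symbols of p(x) * log2(p(x)),
    where p(x) is the relative frequency of symbol x in s.
    """
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A string over a 4-symbol alphabet with uniform frequencies
# attains the maximum entropy log2(4) = 2 bits per symbol;
# a constant string has entropy 0.
```

This per-symbol entropy lower-bounds the average code length (in bits per symbol) achievable by any uniquely decodable code for a memoryless source with those symbol probabilities.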
Finally, an application of lossless compression to machine learning is presented: the classification of natural languages through the LZ77 coding algorithm, which is used to estimate information measures well known in the literature; these estimates serve as a distance metric for comparing the languages with one another. The result of the classification is presented in the form of phylogenetic trees of natural languages.
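One well-known compression-based distance of the kind described above is the normalized compression distance (NCD). The sketch below (an illustration, not the paper's exact procedure) uses Python's `zlib`, whose DEFLATE format belongs to the LZ77 family, as the compressor:

```python
import zlib

def clen(data: bytes) -> int:
    """Compressed length of data under DEFLATE (an LZ77-family coder)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 suggest x and y share much structure; values near 1
    suggest they are unrelated.
    """
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Pairwise distances computed this way between text samples can be fed to a standard hierarchical clustering routine to produce tree diagrams analogous to the phylogenetic trees reported in the paper.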