Highlights
• Data science can be applied in agriculture to optimize the processing and management of biological data and to improve decision-making.
• The data science field includes raw data management and automation, data set expansion, visualization tools, and analysis techniques.
• In this work, we propose a step-by-step approach to the proper and automated use of data science in agriculture.

Abstract
Data science (DS) is one of the most versatile areas, with applications in any field of knowledge. It allows the optimization of different processes of daily life and permits the analysis of massive amounts of data. DS combines computer programming with mathematical and statistical tools in multiple environments such as Python, Julia, and R, among others. Here, we propose a protocol that applies DS tools to the organization, visualization, and analysis of historical data from sugarcane production systems in the tropics, as a basis for identifying patterns associated with sucrose. The protocol consists of four phases: (i) data collection and organization; (ii) data management, cleaning, and incorporation of new variables; (iii) visualization tools; and (iv) analysis and modeling based on a multi-approach strategy that combines a frequentist model (generalized linear model), a regularized regression model (Lasso), and machine learning models (AutoML). Each phase was implemented using multiple algorithms and techniques to automate processes such as queries, numerical calculations, sorting, grouping, splitting, pivoting, totaling, concatenation, cleaning, visualization, and model fitting, using the free software Python and libraries including Pandas, NumPy, Plotly, Matplotlib, SciPy, PySpark, Scikit-learn, and Statsmodels, among others. Each phase allowed the elimination of variables that obscured the analysis, based on criteria such as Pearson correlation, exploratory analysis, and modeling.
The variables that added value to the analysis were identified: soil-related variables contributed the least, and climatic variables were the most informative. Our results present an alternative to traditional analyses in the agricultural sector, based on a step-by-step protocol for the responsible use of DS to understand the behavior and historical temporal patterns of sucrose in sugarcane.
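
As a rough illustration of how phases (ii) and (iv) might fit together, the sketch below cleans a small table, screens predictors by Pearson correlation, and fits a Lasso model. It is not the protocol's actual code: the column names, the 0.95 correlation threshold, the helper `drop_collinear`, and the data themselves are hypothetical. The synthetic response is built from the two climate columns by design, so the output shows only the mechanics and says nothing about the study's findings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for historical field records (hypothetical names and values).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "temperature": rng.normal(24.0, 2.0, n),
    "rainfall": rng.normal(100.0, 30.0, n),
    "soil_ph": rng.normal(6.0, 0.5, n),
})
# A near-duplicate column, to give the correlation screen something to remove.
df["rainfall_copy"] = df["rainfall"] + rng.normal(0.0, 0.1, n)
# Response built from the climate columns only (an assumption of this sketch).
df["sucrose"] = (
    12.0
    + 0.5 * (df["temperature"] - 24.0)
    - 0.02 * (df["rainfall"] - 100.0)
    + rng.normal(0.0, 0.3, n)
)

# Phase (ii): simulate missing records, then drop incomplete rows.
df.loc[::50, "temperature"] = np.nan
clean = df.dropna().reset_index(drop=True)


def drop_collinear(frame, threshold=0.95):
    """Drop the later column of any predictor pair with |Pearson r| > threshold."""
    corr = frame.corr(method="pearson").abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return frame.drop(columns=to_drop), to_drop


X, dropped = drop_collinear(clean.drop(columns="sucrose"))
y = clean["sucrose"]

# Phase (iv): Lasso on standardized predictors, penalty chosen by 5-fold CV.
X_scaled = StandardScaler().fit_transform(X)
model = LassoCV(cv=5, random_state=0).fit(X_scaled, y)
coefs = pd.Series(model.coef_, index=X.columns)

print("Dropped by correlation screen:", dropped)
print(coefs.sort_values(key=np.abs, ascending=False))
```

Standardizing before the Lasso fit puts the coefficients on a common scale, so their magnitudes can be compared when ranking variables. A generalized linear model or an AutoML search could be fitted to the same `X` and `y` for the multi-approach comparison.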