For this activity, a module based on a Decision Tree (DT) was implemented that supports both classification and regression tasks. The module takes as input a dataset composed of numerical and categorical variables and builds a model that predicts the value of a target variable by learning simple decision rules inferred from the input features.
Decision trees are well-known, intuitive machine learning algorithms. They represent decisions and their outcomes through a tree structure: each internal node corresponds to a test on a specific feature, each branch represents an outcome of that test, and each leaf contains the class label (for classification) or the predicted value (for regression).
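The node, branch, and leaf structure described above can be illustrated with a minimal Python sketch. It is not the module's actual implementation: the `Node` class, the `predict` function, and the hand-built example tree are hypothetical, and the sketch assumes binary tests of the form "feature <= threshold" on numerical features only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Node:
    """Hypothetical tree node: either an internal test or a leaf."""
    # Internal node: test "sample[feature] <= threshold".
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch followed when the test is true
    right: Optional["Node"] = None   # branch followed when the test is false
    # Leaf: class label (classification) or numeric value (regression).
    value: Union[str, float, None] = None

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, sample: Dict[str, float]) -> Union[str, float, None]:
    """Follow the branches from the root until a leaf is reached."""
    while not node.is_leaf():
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.value


# Hand-built regression tree with illustrative thresholds and leaf values.
tree = Node(
    feature="X1", threshold=2.5,
    left=Node(value=10.0),
    right=Node(
        feature="X2", threshold=0.5,
        left=Node(value=20.0),
        right=Node(value=30.0),
    ),
)
```

Calling `predict(tree, {"X1": 3.0, "X2": 0.9})` walks the right branch at the root and again at the `X2` test, ending at the leaf that holds 30.0. A categorical test would need a different comparison (for example, set membership), which this sketch leaves out.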
Assuming a dataset described by four input variables, X1, X2, X3, and X4, the regression tree constructed by the module iteratively partitions the feature space into regions that are homogeneous with respect to the target variable, which yields an interpretable model of the relationships between the input variables and the outcome. The figure below shows an example of a regression tree generated on the dataset.
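A single partitioning step can be sketched as follows. The text does not state which splitting criterion the module uses, so this example assumes a common choice for regression trees: pick the threshold on one numerical feature that minimizes the sum of squared errors (SSE) of the target in the two resulting groups. The function names `sse` and `best_split` are hypothetical.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, cost) of the best split on one feature.

    Candidate thresholds are midpoints between consecutive distinct
    feature values. Returns None when all feature values are equal.
    """
    pairs = sorted(zip(x, y))
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Illustrative data: the target jumps between x = 2 and x = 3.
result = best_split([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 5.0, 5.0])
```

On this toy data, the split at 2.5 separates the targets into two groups with zero within-group error. A full tree builder would apply the same search to every feature, keep the best split overall, and repeat on each resulting subset until a stopping condition (such as a maximum depth or minimum number of samples) is met.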