MURAL - Maynooth University Research Archive Library



    Extensions to Bayesian tree-based machine learning algorithms


    Prado, Estevão B. (2022) Extensions to Bayesian tree-based machine learning algorithms. PhD thesis, National University of Ireland Maynooth.

    Abstract

    Bayesian additive regression trees (BART) is a Bayesian tree-based algorithm which can provide high predictive accuracy in both classification and regression problems. Unlike other machine learning algorithms based on an ensemble of trees, such as random forests and gradient boosting, BART is not based on recursive partitioning. Rather, it is a fully Bayesian model built upon a likelihood function and carefully specified prior distributions. In this thesis, we propose methodological extensions to BART to deal with two main limitations of tree-based methods: the limited ability to fit smooth functions, which is inherent in how tree-based methods are built, and the lack of adequate mechanisms for quantifying, in an interpretable fashion, the impact of certain inputs of primary interest on the output. First, we present an extension that deals with linear effects at the terminal-node level. By considering piecewise linear functions instead of piecewise constants, local linearities are captured more efficiently and fewer trees are required to achieve equal or better performance than BART. Second, motivated by an agricultural application, we develop a semi-parametric BART model in which marginal genotype and environment effects are estimated along with their interactions. Finally, motivated by data collected in 2019 under the seventh cycle of the quadrennial Trends in International Mathematics and Science Study, we extend semi-parametric models based on BART, which generally assume that the sets of covariates in the linear predictor and in the BART model are mutually exclusive, to account for shared covariates. In particular, we change the tree-generation moves in BART to deal with bias/confounding between the parametric and non-parametric components, even when they have covariates in common.
    Item Type: Thesis (PhD)
    Keywords: Extensions; Bayesian; tree-based; machine learning; algorithms
    Academic Unit: Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 17285
    Depositing User: IR eTheses
    Date Deposited: 06 Jun 2023 15:02
    URI: https://mu.eprints-hosting.org/id/eprint/17285
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
