Decomposable neuro symbolic regression with uncertainty awareness

dc.contributor.advisor: Chairperson, Graduate Committee: John Sheppard
dc.contributor.author: Morales Luna, Giorgio L.
dc.date.accessioned: 2025-12-24T19:59:31Z
dc.date.available: 2025-12-24T19:59:31Z
dc.date.issued: 2025
dc.description.abstract: One of the fundamental goals of science is to discover laws that provide causal explanations for the observable world. Such discoveries may stem from distilling experimental data into analytical equations that allow interpretation of their underlying natural laws. This process is known as equation learning or symbolic regression (SR). However, most SR methods prioritize minimizing prediction error over identifying the governing equations, often producing overly complex or inaccurate expressions. Notably, they struggle to identify the functional form that explains the relationship between each variable and the system's response. To address this challenge, this dissertation presents a decomposable SR method that generates interpretable multivariate expressions by leveraging transformer models, genetic algorithms (GAs), and genetic programming (GP). In particular, our interpretable SR method distills a trained "opaque" regression model into mathematical expressions that serve as explanations of its computed function. It employs a Multi-Set Transformer model to generate multiple univariate symbolic skeletons that characterize how each variable influences the opaque model's response. The performance of the generated skeletons is evaluated using a GA-based approach to select a subset of high-quality candidates before incrementally merging them via a GP-based cascade procedure that preserves their original skeleton structure. The final multivariate skeletons undergo coefficient optimization via a GA. We evaluated our method on problems with controlled and varying degrees of noise, demonstrating lower or comparable interpolation and extrapolation errors compared to two GP-based and two neural SR methods. Unlike these methods, our approach consistently learned expressions that matched the original mathematical structure. Complementing this effort, we explore the role of uncertainty quantification in enhancing symbolic model reliability.
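The coefficient-optimization step described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the skeleton form c0*sin(c1*x) + c2, the synthetic data, and all GA hyperparameters below are hypothetical, chosen only to show how a GA can fit the constants of a fixed symbolic skeleton to an opaque model's output.

```python
import numpy as np

# Illustrative sketch only (not the dissertation's implementation): a simple
# GA fits the coefficients of a *fixed* univariate skeleton to sampled data.
rng = np.random.default_rng(0)

def skeleton(x, c):
    # Hypothetical skeleton with unknown constants c0, c1, c2.
    return c[0] * np.sin(c[1] * x) + c[2]

x = np.linspace(-3, 3, 200)
y = skeleton(x, np.array([2.0, 1.5, -0.5]))  # stand-in "opaque model" output

def fitness(c):
    return -np.mean((skeleton(x, c) - y) ** 2)  # negative MSE: higher is better

pop = rng.uniform(-3, 3, size=(80, 3))
for _ in range(200):
    scores = np.array([fitness(c) for c in pop])
    children = [pop[np.argmax(scores)]]  # elitism: keep the best unchanged
    while len(children) < len(pop):
        # Binary tournament selection for each parent.
        i, j = rng.integers(0, len(pop), 2)
        p1 = pop[i] if scores[i] > scores[j] else pop[j]
        i, j = rng.integers(0, len(pop), 2)
        p2 = pop[i] if scores[i] > scores[j] else pop[j]
        # Uniform crossover followed by Gaussian mutation.
        mask = rng.random(3) < 0.5
        children.append(np.where(mask, p1, p2) + rng.normal(0.0, 0.1, 3))
    pop = np.array(children)

best = pop[np.argmax([fitness(c) for c in pop])]
```

In the full method, the skeletons themselves come from the Multi-Set Transformer and the GP-based cascade merge; the sketch above covers only the final constant-fitting stage on a single, already-known skeleton.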
We investigate the use of prediction interval-generation neural networks to model total and potential epistemic uncertainty, and introduce an adaptive sampling strategy designed to minimize it. By integrating an uncertainty-aware sampling process guided by Gaussian process surrogates, we aim to reduce uncertainty not only in model predictions but also in the symbolic expressions extracted from them. This broader perspective highlights the importance of uncertainty awareness in SR, especially when symbolic models are intended for decision-making under limited or costly experimentation, such as in precision agriculture and other scientific domains.
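The uncertainty-aware sampling loop can likewise be sketched. Again this is an illustrative assumption rather than the dissertation's actual algorithm: a one-dimensional Gaussian process surrogate with an RBF kernel proposes each new "experiment" wherever its predictive variance (epistemic uncertainty) is largest; the target function and all constants are hypothetical.

```python
import numpy as np

# Illustrative sketch only: GP-surrogate-guided adaptive sampling that
# repeatedly queries the point of maximum predictive variance.
def rbf(a, b, length_scale=0.5):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Standard GP regression equations via a Cholesky factorization.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)  # k(x, x) = 1 for this RBF kernel
    return mean, var

f = lambda x: np.sin(3.0 * x)           # stand-in for a costly experiment
x_train = np.array([-1.0, 1.0])
y_train = f(x_train)
candidates = np.linspace(-2, 2, 401)

for _ in range(8):                      # adaptive sampling loop
    _, var = gp_posterior(x_train, y_train, candidates)
    x_new = candidates[np.argmax(var)]  # query the most uncertain location
    x_train = np.append(x_train, x_new)
    y_train = np.append(y_train, f(x_new))

_, final_var = gp_posterior(x_train, y_train, candidates)
```

Each iteration spends one experiment where the surrogate is least certain, which is the sense in which sampling "minimizes" epistemic uncertainty under a limited query budget.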
dc.identifier.uri: https://scholarworks.montana.edu/handle/1/19441
dc.language.iso: en
dc.publisher: Montana State University - Bozeman, College of Engineering
dc.rights.holder: Copyright 2025 by Morales Luna, Giorgio Luigi
dc.subject.lcsh: Regression analysis
dc.subject.lcsh: Genetic programming (Computer science)
dc.subject.lcsh: Genetic algorithms
dc.subject.lcsh: Artificial intelligence
dc.subject.lcsh: Uncertainty
dc.title: Decomposable neuro symbolic regression with uncertainty awareness
dc.type: Dissertation
mus.data.thumbpage: 130
thesis.degree.committeemembers: Members, Graduate Committee: Joseph A. Shaw; Matthew Revelle; Sean Yaw
thesis.degree.department: Computing
thesis.degree.genre: Dissertation
thesis.degree.name: PhD
thesis.format.extentfirstpage: 1
thesis.format.extentlastpage: 261

Files

Original bundle
Name: morales-luna-decomposable-2025.pdf
Size: 29.21 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 825 B
Format: Item-specific license agreed upon to submission