Decomposable neuro-symbolic regression with uncertainty awareness

Publisher

Montana State University - Bozeman, College of Engineering

Abstract

One of the fundamental goals of science is to discover laws that provide causal explanations for the observable world. Such discoveries may stem from distilling experimental data into analytical equations that allow interpretation of their underlying natural laws. This process is known as equation learning or symbolic regression (SR). However, most SR methods prioritize minimizing prediction error over identifying the governing equations, often producing overly complex or inaccurate expressions. Notably, they struggle to identify the functional form that explains the relationship between each variable and the system's response. To address this challenge, this dissertation presents a decomposable SR method that generates interpretable multivariate expressions by leveraging transformer models, genetic algorithms (GAs), and genetic programming (GP). In particular, our interpretable SR method distills a trained "opaque" regression model into mathematical expressions that serve as explanations of its computed function. It employs a Multi-Set Transformer model to generate multiple univariate symbolic skeletons that characterize how each variable influences the opaque model's response. The performance of the generated skeletons is evaluated using a GA-based approach to select a subset of high-quality candidates before incrementally merging them via a GP-based cascade procedure that preserves their original skeleton structure. The final multivariate skeletons undergo coefficient optimization via a GA. We evaluated our method on problems with controlled and varying degrees of noise, demonstrating lower or comparable interpolation and extrapolation errors compared to two GP-based and two neural SR methods. Unlike these methods, our approach consistently learned expressions that matched the original mathematical structure. Complementing this effort, we explore the role of uncertainty quantification in enhancing symbolic model reliability.
We investigate the use of prediction interval-generation neural networks to model total and potential epistemic uncertainty, and introduce an adaptive sampling strategy designed to minimize it. By integrating an uncertainty-aware sampling process guided by Gaussian process surrogates, we aim to reduce uncertainty not only in model predictions but also in the symbolic expressions extracted from them. This broader perspective highlights the importance of uncertainty awareness in SR, especially when symbolic models are intended for decision-making under limited or costly experimentation, such as in precision agriculture and other scientific domains.
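The final stage of the pipeline described above fits the placeholder coefficients of a fixed symbolic skeleton with a GA. The following is a minimal sketch of that idea under stated assumptions: the skeleton form `c0*sin(c1*x) + c2`, the `opaque_response` stand-in, and all GA hyperparameters are illustrative, not the dissertation's actual Multi-Set Transformer outputs or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the trained "opaque" model's response along one
# variable (the real pipeline queries a trained regression model, not a
# known closed form).
def opaque_response(x):
    return 2.0 * np.sin(1.5 * x) + 0.5

# Example univariate skeleton with placeholder coefficients:
# f(x) = c0 * sin(c1 * x) + c2
def skeleton(c, x):
    return c[0] * np.sin(c[1] * x) + c[2]

def neg_mse(c, x, y):
    # GA fitness: negative mean squared error (higher is better).
    return -np.mean((skeleton(c, x) - y) ** 2)

def ga_fit_coefficients(x, y, pop_size=60, n_gen=200, sigma=0.1):
    # Random initial population of coefficient vectors.
    pop = rng.uniform(-3.0, 3.0, size=(pop_size, 3))
    for _ in range(n_gen):
        scores = np.array([neg_mse(c, x, y) for c in pop])
        elite = pop[np.argsort(scores)[-pop_size // 2:]]         # selection
        pairs = elite[rng.integers(len(elite), size=(pop_size, 2))]
        children = pairs.mean(axis=1)                            # crossover
        children += rng.normal(0.0, sigma, size=children.shape)  # mutation
        children[0] = elite[-1]                                  # elitism
        pop = children
    scores = np.array([neg_mse(c, x, y) for c in pop])
    return pop[np.argmax(scores)]

x = np.linspace(-3, 3, 200)
y = opaque_response(x)
best = ga_fit_coefficients(x, y)
```

Because the skeleton's structure is frozen, the GA searches only the low-dimensional coefficient space, which is what lets the recovered expression match the original mathematical form rather than drifting toward arbitrary bloated trees.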
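The uncertainty-aware sampling loop outlined above can be sketched with a Gaussian process surrogate that repeatedly queries the candidate point of highest predictive (epistemic) variance. Everything below is an illustrative assumption rather than the dissertation's actual setup: the RBF kernel and length scale, the noise jitter, the 1-D domain, and the `experiment` function standing in for a costly measurement.

```python
import numpy as np

def rbf(a, b, length_scale=0.5):
    # Squared-exponential kernel between two 1-D input vectors.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_variance(x_train, x_cand, noise=1e-4):
    # Predictive variance of a zero-mean GP: k(x,x) - k*^T K^-1 k*.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_cand, x_train)
    alpha = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.sum(Ks * alpha.T, axis=1)  # k(x,x) = 1 for this kernel
    return np.maximum(var, 0.0)

# Hypothetical "expensive experiment"; queried only at the chosen points.
def experiment(x):
    return np.sin(3.0 * x)

x_train = np.array([0.1, 0.9])
y_train = experiment(x_train)
candidates = np.linspace(0.0, 1.0, 101)

initial_max_var = gp_posterior_variance(x_train, candidates).max()
for _ in range(8):
    var = gp_posterior_variance(x_train, candidates)
    x_next = candidates[np.argmax(var)]   # highest epistemic uncertainty
    x_train = np.append(x_train, x_next)
    y_train = np.append(y_train, experiment(x_next))
final_max_var = gp_posterior_variance(x_train, candidates).max()
```

Each added observation can only shrink the GP's posterior variance, so greedily sampling at the variance peak drives down the surrogate's worst-case epistemic uncertainty, and with it the uncertainty in any symbolic expression later distilled from the model.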
