Sampling Bounds for Topological Descriptors

Date

2024-04

Publisher

Undergraduate Scholars Program

Abstract

Topological descriptors such as the Euler characteristic curve and persistence diagrams are increasingly used to represent complex data. Recent studies suggest that a carefully selected set of these descriptors can encode geometric and topological information about shapes in d-dimensional space. In practice, epsilon-nets are used to sample data, which leads to two extremes. In oversampling, epsilon is small enough to guarantee a complete representation but can make computation inefficient. In undersampling, epsilon is chosen without a grounded rationale, which gives faster computation but risks an incomplete shape description with no theoretical guarantees. This research investigates oversampling and undersampling and how prevalent each is across synthetic and real-world datasets. It experimentally verifies that theory-guided approaches oversample excessively and examines the consequences of undersampling. We establish lower bounds on the number of descriptors required for exact encodings and explore the trade-offs of undersampling, offering insight into the resulting information loss and its impact on the overall shape representation.
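The two ingredients the abstract pairs, a descriptor computed along a direction and an epsilon-net that controls how many directions are sampled, can be illustrated with a small sketch. It assumes, as an illustration only, a simplicial complex in the plane filtered by a height function (a lower-star filtration) and an epsilon-net that samples directions on the unit circle. The function names `height_filtration_ecc` and `circle_epsilon_net` are hypothetical, and this setup is not presented as the method used in this research.

```python
import math


def height_filtration_ecc(vertices, simplices, direction, thresholds):
    """Euler characteristic curve of a planar simplicial complex filtered by height.

    Illustrative assumption: a simplex enters the filtration at the height of its
    highest vertex (a lower-star filtration of the height function).

    vertices:   list of (x, y) coordinates
    simplices:  list of tuples of vertex indices; every face of a simplex must be listed
    direction:  unit vector (dx, dy) defining the height function
    thresholds: heights t at which chi(K_t) is evaluated
    """
    heights = [x * direction[0] + y * direction[1] for x, y in vertices]
    # Each k-simplex (k + 1 vertices) contributes (-1)^k to the Euler characteristic.
    entries = [(max(heights[i] for i in s), (-1) ** (len(s) - 1)) for s in simplices]
    return [sum(sign for h, sign in entries if h <= t) for t in thresholds]


def circle_epsilon_net(eps):
    """Evenly spaced unit directions on the circle forming an eps-net.

    Illustrative assumption: the net samples directions, and distance is the angle
    between directions. With n = ceil(pi / eps) samples, every direction lies within
    pi / n <= eps radians of its nearest sample.
    """
    n = math.ceil(math.pi / eps)
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]


if __name__ == "__main__":
    # A filled triangle: 3 vertices, 3 edges, 1 face.
    verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    cx = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
    print(height_filtration_ecc(verts, cx, (0.0, 1.0), [-0.5, 0.0, 1.0]))  # [0, 1, 1]
    # Smaller eps means more directions, hence more descriptors to compute.
    for eps in (0.5, 0.25, 0.1):
        print(eps, len(circle_epsilon_net(eps)))
```

In this sketch, each sampled direction yields one descriptor, so the number of descriptors grows roughly in proportion to 1/eps on the circle. Choosing eps smaller than necessary is the oversampling cost the abstract describes, while a larger eps leaves gaps between sampled directions.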
