Content-based recommendation via topic modeling and social network analysis
Publisher
Montana State University - Bozeman, College of Engineering
Abstract
The exponential growth of digital information and scholarly output has increased the need for intelligent systems that can identify relevant, interpretable, and equitable connections among entities. Traditional content-based recommender systems focus on item similarity but often overlook the relational structures that govern how knowledge and expertise are organized. This dissertation advances content-based recommendation by integrating topic modeling and social network analysis into a unified framework that represents semantic similarity as a network. The framework models relationships among entities through their topical proximity, enabling recommendations that are not only accurate but also transparent and structurally grounded. The research is organized around four interrelated questions. The first investigates how topic modeling can be combined with network analysis to construct topic-based collaboration graphs that reveal latent research communities. By transforming document-level topic distributions into author-level profiles, this approach defines weighted network edges through topical similarity, producing networks that balance cohesion and diversity. The second question extends this representation to hierarchical community detection, introducing Nested Hierarchical Louvain (NH-Louvain) and Spectral Hierarchical Agglomerative Clustering (Spectral-HAC). These methods uncover multilevel community structures, allowing recommendations to operate at different granularities, from tightly focused collaborators within a subcommunity to broader interdisciplinary groups. The third question addresses a broader data imbalance problem, demonstrated through the case of publication imbalance, where prolific authors dominate the content space and bias recommendation outcomes. A cloning-based strategy was developed to represent such authors by multiple topical instances, each reflecting a distinct research direction. 
Clone-LDA and Clone-BERT variants reduce dominance effects, improve thematic diversity, and enhance background representation in the generated networks. The fourth question evaluates the framework's accuracy, stability, and content-based explainability, assessed through hold-out and perturbation experiments. Results show that topic-based similarity remains stable under missing information and that hierarchical and cloned models yield balanced, semantically coherent communities. To operationalize these findings, the ScholarNode prototype system was developed, providing an interactive, explainable interface that links recommendations to their underlying topical and community evidence. Together, these contributions establish a principled foundation for topic-driven, network-aware recommender systems. The integrated framework advances understanding of how semantic and relational information interact, while the ScholarNode implementation shows its practical feasibility. Beyond the scholarly domain, the same design principles, representing content similarity as a network, detecting hierarchical communities, addressing imbalance, and supporting content-based explanations, can generalize to other content-rich environments. This research thus lays the groundwork for recommender systems that emphasize clarity, diversity, and balanced representation.
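As an illustration of the first research question, the step from document-level topic distributions to author-level profiles and similarity-weighted edges might be sketched as follows. This is a minimal sketch and not the dissertation's implementation: the abstract does not state how profiles are aggregated or which similarity measure is used, so the mean aggregation, the cosine similarity, the 0.5 edge threshold, and all function and variable names here are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def author_profiles(doc_topics, doc_authors):
    """Average each author's document-level topic distributions (assumed aggregation)."""
    sums = {}
    counts = defaultdict(int)
    for doc_id, dist in doc_topics.items():
        for author in doc_authors[doc_id]:
            if author not in sums:
                sums[author] = [0.0] * len(dist)
            sums[author] = [s + p for s, p in zip(sums[author], dist)]
            counts[author] += 1
    return {a: [s / counts[a] for s in v] for a, v in sums.items()}


def cosine(u, v):
    """Cosine similarity between two topic vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def similarity_edges(profiles, threshold=0.5):
    """Keep an edge between two authors when their topical similarity meets the threshold."""
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        weight = cosine(profiles[a], profiles[b])
        if weight >= threshold:
            edges[(a, b)] = weight
    return edges


# Toy data (hypothetical): three documents over three topics.
doc_topics = {
    "d1": [0.8, 0.1, 0.1],
    "d2": [0.6, 0.3, 0.1],
    "d3": [0.1, 0.1, 0.8],
}
doc_authors = {"d1": ["alice", "bob"], "d2": ["alice"], "d3": ["carol"]}

profiles = author_profiles(doc_topics, doc_authors)
edges = similarity_edges(profiles, threshold=0.5)
```

On this toy input only alice and bob, whose profiles both concentrate on the first topic, end up connected. The resulting weighted edge list is the kind of input that a community detection step could then take.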