Apriori approach to graph-based clustering of text documents

dc.contributor.advisor: Chairperson, Graduate Committee: Rafal A. Angryk [en]
dc.contributor.author: Hossain, Mahmud Shahriar [en]
dc.date.accessioned: 2013-06-25T18:37:52Z
dc.date.available: 2013-06-25T18:37:52Z
dc.date.issued: 2008 [en]
dc.description.abstract: This thesis report introduces a new technique of document clustering based on frequent senses. The developed system, named GDClust (Graph-Based Document Clustering) [1], works with frequent senses rather than dealing with frequent keywords used in traditional text mining techniques. GDClust presents text documents as hierarchical document-graphs and uses an Apriori paradigm to find the frequent subgraphs, which reflect frequent senses. Discovered frequent subgraphs are then utilized to generate accurate sense-based document clusters. We propose a novel multilevel Gaussian minimum support strategy for candidate subgraph generation. Additionally, we introduce another novel mechanism called Subgraph-Extension mining that reduces the number of candidates and overhead imposed by the traditional Apriori-based candidate generation mechanism. GDClust utilizes an English language thesaurus (WordNet [2]) to construct document-graphs and exploits graph-based data mining techniques for sense discovery and clustering. It is an automated system and requires minimal human interaction for the clustering purpose. [en]
dc.identifier.uri: https://scholarworks.montana.edu/handle/1/1506 [en]
dc.language.iso: en [en]
dc.publisher: Montana State University - Bozeman, College of Engineering [en]
dc.rights.holder: Copyright 2008 by Mahmud Shahriar Hossain [en]
dc.subject.lcsh: Document clustering [en]
dc.subject.lcsh: Algorithms [en]
dc.subject.lcsh: Graphic methods [en]
dc.subject.lcsh: Graph theory [en]
dc.subject.lcsh: Computer programs [en]
dc.title: Apriori approach to graph-based clustering of text documents [en]
dc.type: Thesis [en]
thesis.catalog.ckey: 1327464 [en]
thesis.degree.committeemembers: Members, Graduate Committee: John Paxton; Hunter Lloyd [en]
thesis.degree.department: Computer Science. [en]
thesis.degree.genre: Thesis [en]
thesis.degree.name: MS [en]
thesis.format.extentfirstpage: 1 [en]
thesis.format.extentlastpage: 67 [en]
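
Illustrative sketch (not part of the repository record): the abstract describes an Apriori-style, level-wise search for subgraphs that recur across document-graphs. As a rough illustration only, and not the thesis's GDClust implementation, the Python below applies the classic Apriori join-and-prune loop to documents modeled as sets of edges. The function name `frequent_edge_sets`, the fixed `min_support` threshold (used here in place of the multilevel Gaussian minimum support the abstract mentions), the toy edges, and the simplification that an edge set need not form a connected subgraph are all assumptions made for this example.

```python
from itertools import combinations


def frequent_edge_sets(doc_graphs, min_support):
    """Level-wise (Apriori-style) search for edge sets shared by many documents.

    doc_graphs: list of sets of edges; each edge is a (parent, child) tuple.
    min_support: minimum number of documents that must contain an edge set.
    Simplification (assumption): edge sets are not checked for connectivity.
    """
    def support(edge_set):
        # Count the documents whose graph contains every edge in edge_set.
        return sum(1 for g in doc_graphs if edge_set <= g)

    # Level 1: keep the single edges that meet the support threshold.
    all_edges = set().union(*doc_graphs)
    current = {frozenset([e]) for e in all_edges}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        # Join step: merge pairs of frequent k-sets that differ in one edge.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k))}
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent


# Toy document-graphs (hypothetical edges, loosely in the spirit of a
# concept hierarchy such as WordNet).
docs = [
    {("entity", "animal"), ("animal", "dog")},
    {("entity", "animal"), ("animal", "cat")},
    {("entity", "animal"), ("animal", "dog"), ("entity", "plant")},
]

for edges, count in frequent_edge_sets(docs, min_support=2).items():
    print(sorted(edges), count)
```

With this toy data, the edge pair (entity, animal) and (animal, dog) occurs in two of the three documents, so it survives at a support threshold of 2, while edges seen in only one document are dropped at the first level and never extended.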

Files

HossainM0508.pdf (433.31 KB, Adobe Portable Document Format)