Apriori approach to graph-based clustering of text documents
dc.contributor.advisor | Chairperson, Graduate Committee: Rafal A. Angryk | en |
dc.contributor.author | Hossain, Mahmud Shahriar | en |
dc.date.accessioned | 2013-06-25T18:37:52Z | |
dc.date.available | 2013-06-25T18:37:52Z | |
dc.date.issued | 2008 | en |
dc.description.abstract | This thesis introduces a new document clustering technique based on frequent senses. The developed system, named GDClust (Graph-Based Document Clustering) [1], works with frequent senses rather than the frequent keywords used in traditional text mining techniques. GDClust represents text documents as hierarchical document-graphs and uses an Apriori paradigm to find the frequent subgraphs, which reflect frequent senses. The discovered frequent subgraphs are then used to generate accurate sense-based document clusters. We propose a novel multilevel Gaussian minimum support strategy for candidate subgraph generation. Additionally, we introduce another novel mechanism, called Subgraph-Extension mining, that reduces the number of candidates and the overhead imposed by the traditional Apriori-based candidate generation mechanism. GDClust uses an English-language thesaurus (WordNet [2]) to construct document-graphs and exploits graph-based data mining techniques for sense discovery and clustering. It is an automated system that requires minimal human interaction to perform clustering. | en |
dc.identifier.uri | https://scholarworks.montana.edu/handle/1/1506 | en |
dc.language.iso | en | en |
dc.publisher | Montana State University - Bozeman, College of Engineering | en |
dc.rights.holder | Copyright 2008 by Mahmud Shahriar Hossain | en |
dc.subject.lcsh | Document clustering | en |
dc.subject.lcsh | Algorithms | en |
dc.subject.lcsh | Graphic methods | en |
dc.subject.lcsh | Graph theory | en |
dc.subject.lcsh | Computer programs | en |
dc.title | Apriori approach to graph-based clustering of text documents | en |
dc.type | Thesis | en |
thesis.catalog.ckey | 1327464 | en |
thesis.degree.committeemembers | Members, Graduate Committee: John Paxton; Hunter Lloyd | en |
thesis.degree.department | Computer Science. | en |
thesis.degree.genre | Thesis | en |
thesis.degree.name | MS | en |
thesis.format.extentfirstpage | 1 | en |
thesis.format.extentlastpage | 67 | en |