Framework for creating a large-scale content-based image retrieval (CBIR) system for solar data analysis
The launch of NASA's Solar Dynamics Observatory mission started a new age of high-quality solar image analysis. The mission generates over 1.5 terabytes of solar images per day, at ten times the resolution of high-definition television, so analyzing them by hand is simply impossible for scientists. Storage is a second important problem: only one full copy of this repository exists in the world, so an alternate, compressed representation of these images is of vital importance. Current automated image processing approaches in solar physics are dedicated entirely to analyzing individual types of solar phenomena and do not allow researchers to conveniently query the whole Solar Dynamics Observatory repository for images similar to those of interest. We developed a content-based image retrieval system that can automatically analyze and retrieve multiple different types of solar phenomena. This will fundamentally change the way researchers look for solar images, much as Google changed the way people search the internet. During the development of our system, we created a framework that allows researchers to tweak and develop their own content-based image retrieval systems for different domain-specific applications with great ease and with a deeper understanding of the representation of domain-specific image data. This framework incorporates many different aspects of image processing and information retrieval: image parameter extraction for a reduced representation of solar images, image parameter evaluation to validate the image parameters used, evaluation of multiple dissimilarity measures for more accurate data analysis, analyses of dimensionality reduction methods to help reduce storage and processing costs, and indexing and retrieval algorithms for faster and more efficient search.
The capabilities of this framework have never been available together as a comprehensive, open-source software package. With these capabilities, we achieved a deeper understanding of our solar data and validated each step in the creation of our solar content-based image retrieval system with an exhaustive evaluation. The contributions of our framework will allow researchers to tweak and develop new content-based image retrieval systems for other domains (e.g., astronomy, medicine) and will allow astrophysics research to migrate from the individual analysis of solar phenomena to larger-scale data analyses.
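To make the pipeline stages named above more concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes a grid-based segmentation with per-cell mean and standard deviation as the image parameters, Euclidean distance as the dissimilarity measure, and a linear scan in place of a real index. All function names, the grid size, and the synthetic images are hypothetical and chosen only for illustration; the dimensionality reduction stage is omitted.

```python
import math
import random
from statistics import fmean, pstdev


def extract_parameters(image, grid=4):
    """Reduce a square 2D image (list of rows) to per-cell mean and std. deviation.

    Assumption: the image side length is divisible by `grid`.
    """
    cell = len(image) // grid
    features = []
    for gy in range(grid):
        for gx in range(grid):
            pixels = [
                image[y][x]
                for y in range(gy * cell, (gy + 1) * cell)
                for x in range(gx * cell, (gx + 1) * cell)
            ]
            features.append(fmean(pixels))
            features.append(pstdev(pixels))
    return features


def euclidean(a, b):
    """One possible dissimilarity measure; others can be swapped in."""
    return math.dist(a, b)


def retrieve(query, index, k=3, dissimilarity=euclidean):
    """Return the k (name, distance) pairs closest to the query (linear scan)."""
    ranked = sorted(
        ((name, dissimilarity(query, vec)) for name, vec in index.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


def synthetic_image(seed, size=16, bright_cell=None):
    """Hypothetical stand-in for a solar image: faint noise plus one bright patch."""
    rng = random.Random(seed)
    img = [[rng.random() * 0.1 for _ in range(size)] for _ in range(size)]
    if bright_cell is not None:
        cy, cx = bright_cell
        for y in range(cy * 4, cy * 4 + 4):
            for x in range(cx * 4, cx * 4 + 4):
                img[y][x] += 1.0
    return img


# Build a tiny "repository" of feature vectors, then query it.
index = {
    "quiet_a": extract_parameters(synthetic_image(1)),
    "quiet_b": extract_parameters(synthetic_image(2)),
    "active_a": extract_parameters(synthetic_image(3, bright_cell=(1, 2))),
}
query = extract_parameters(synthetic_image(4, bright_cell=(1, 2)))
results = retrieve(query, index, k=2)
print(results[0][0])
```

In this sketch each 16 x 16 image is stored as a 32-value vector instead of 256 pixels, which illustrates the reduced representation the abstract refers to. The `dissimilarity` argument is a parameter so that alternative measures can be compared, in the spirit of the evaluation described above.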