Large-Scale Image Analytics Using Deep Learning
Friday, 19 December 2014
High-resolution land cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Few studies demonstrate the state of the art in deriving very high resolution (VHR) land cover products. In addition, most methods rely heavily on commercial software that is difficult to scale to the region of study (e.g., continental to global extents). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. Moreover, VHR satellite datasets are on the order of terabytes, and features extracted from these datasets are on the order of petabytes. In our present study, we have acquired the National Agriculture Imagery Program (NAIP) dataset for the Continental United States at a spatial resolution of 1 m. These data come as image tiles (a total of a quarter million image scenes, each with ~60 million pixels) and have a total size of ~100 terabytes for a single acquisition. Features extracted from the entire dataset would amount to ~8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted in the principles of "deep learning" to delineate the percentage of tree cover. To perform image analytics in such a granular system, it is necessary to devise an intelligent archiving and query system for image retrieval, file structuring, metadata processing and filtering of all available image scenes. Using the Open NASA Earth Exchange (NEX) initiative, which is a partnership with Amazon Web Services (AWS), we have developed an end-to-end architecture for designing the database and the deep belief network (following the DistBelief computing model) to solve the grand challenge of scaling this process across a quarter million NAIP tiles that cover the entire Continental United States.
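The data volumes quoted above can be cross-checked with a short back-of-envelope calculation. The sketch below uses only the figures stated in the abstract and rests on two assumptions that are not in the original: the ~60 million pixels is read as a per-scene figure, and terabytes and petabytes are taken as decimal units (10^12 and 10^15 bytes). The function and constant names are illustrative only.

```python
# Back-of-envelope check of the data volumes quoted in the abstract.
# Assumptions (not from the source): ~60 million pixels is a per-scene
# figure, and TB/PB are decimal units (1e12 and 1e15 bytes).

SCENES = 250_000                 # "a quarter million image scenes"
PIXELS_PER_SCENE = 60e6          # "~60 million pixels" (assumed per scene)
RAW_BYTES = 100e12               # "~100 terabytes for a single acquisition"
FEATURE_BYTES = (8e15, 10e15)    # "~8-10 petabytes" of extracted features


def total_pixels(scenes: int, pixels_per_scene: float) -> float:
    """Total pixel count across all scenes."""
    return scenes * pixels_per_scene


def implied_bytes_per_pixel(raw_bytes: float, pixels: float) -> float:
    """Average storage per pixel implied by the quoted raw size."""
    return raw_bytes / pixels


def feature_expansion(feature_bytes: float, raw_bytes: float) -> float:
    """How many times larger the feature set is than the raw imagery."""
    return feature_bytes / raw_bytes


if __name__ == "__main__":
    pixels = total_pixels(SCENES, PIXELS_PER_SCENE)
    print(f"total pixels:      {pixels:.2e}")
    print(f"bytes per pixel:   {implied_bytes_per_pixel(RAW_BYTES, pixels):.2f}")
    low, high = (feature_expansion(f, RAW_BYTES) for f in FEATURE_BYTES)
    print(f"feature expansion: {low:.0f}x to {high:.0f}x")
```

Under these assumptions the raw imagery works out to roughly 1.5 x 10^13 pixels, and the extracted features are about two orders of magnitude larger than the imagery itself, which is why feature extraction is treated as a distributed step rather than a single-machine one.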
The AWS core components that we use to solve this problem are DynamoDB along with S3 for database query and storage, the ElastiCache shared memory architecture for image segmentation, Elastic MapReduce (EMR) for image feature extraction, and memory-optimized Elastic Compute Cloud (EC2) instances for the learning algorithm.
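To make the archiving and query step more concrete, here is a minimal sketch of filtering tile metadata and grouping the matches into work units for downstream processing. It is not the authors' implementation: a plain in-memory list stands in for the DynamoDB table, and the record fields (`tile_id`, `state`, `year`, `s3_key`), the function names, and the example keys are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class TileRecord:
    """Hypothetical metadata row for one image tile."""
    tile_id: str
    state: str    # e.g. a US state code (assumed field)
    year: int     # acquisition year (assumed field)
    s3_key: str   # object key of the image in storage (assumed field)


def filter_tiles(records: Iterable[TileRecord], state: str, year: int) -> List[TileRecord]:
    """Select the tiles for one state and acquisition year."""
    return [r for r in records if r.state == state and r.year == year]


def batches(items: List[TileRecord], size: int) -> Iterator[List[TileRecord]]:
    """Split the selected tiles into fixed-size work units."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Made-up records for illustration only.
    index = [
        TileRecord("t1", "CA", 2012, "naip/ca/2012/t1.tif"),
        TileRecord("t2", "CA", 2012, "naip/ca/2012/t2.tif"),
        TileRecord("t3", "CA", 2010, "naip/ca/2010/t3.tif"),
        TileRecord("t4", "NV", 2012, "naip/nv/2012/t4.tif"),
        TileRecord("t5", "CA", 2012, "naip/ca/2012/t5.tif"),
    ]
    selected = filter_tiles(index, state="CA", year=2012)
    for unit in batches(selected, size=2):
        print([r.tile_id for r in unit])
```

Keeping the metadata filter separate from the batching step means the same selection logic could feed any of the later stages (segmentation, feature extraction, learning) without those stages needing to know how tiles are indexed.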