GC13H-1253
Interactively Improving Agricultural Field Mapping in Sub-Saharan Africa with Crowd-Sourcing and Active Learning

Monday, 14 December 2015
Poster Hall (Moscone South)
Stephanie R Debats, Lyndon D Estes and Kelly K Caylor, Princeton University, Princeton, NJ, United States
Abstract:
As satellite imagery becomes increasingly available, management of large image databases becomes more important for efficient image processing. We have developed a computer vision-based classification algorithm to distinguish smallholder agricultural land cover in Sub-Saharan Africa, using a set of high-resolution images from South Africa as a case study. For supervised classification, smallholder agriculture, with its ambiguous patterns of small, irregular fields, requires a wide range of training data samples to adequately describe the variability in appearance. We employ crowd-sourcing to obtain new training data to expand the geographic range of our algorithm. A crowd-sourcing user is asked to hand-digitize the boundaries of agricultural fields in an assigned 1 km² image. Yet random assignment of images to users could result in a highly redundant training data set with limited discriminative power. Furthermore, larger training data sets require more users to hand-digitize fields, which raises costs on crowd-sourcing platforms such as Amazon Mechanical Turk, and they lengthen algorithm training times, which raises computing costs.

Therefore, we employ an active learning approach to interactively select the most informative images to be hand-digitized for training data by crowd-sourcing users, based on changes in algorithm accuracy. We investigate image similarity measures used in content-based image retrieval systems, which quantify the distance (e.g., Euclidean or Manhattan distance) between feature vectors extracted from two images in a variety of feature spaces to determine how similar their content is. We determine the minimum training data set needed to maximize algorithm accuracy, and we automate the selection of additional training images for classifying a new target image that expands the geographic range of our algorithm.
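The similarity-based selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each image has already been reduced to a fixed-length feature vector (the vectors, image IDs, and function names here are hypothetical), and it covers only ranking candidate images by Euclidean or Manhattan distance to a target image and choosing the closest ones that have not yet been hand-digitized. The accuracy-driven part of the active learning loop is omitted.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 (Euclidean) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """L1 (Manhattan) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_candidates(
    target: Vector,
    candidates: Dict[str, Vector],
    metric: Callable[[Vector, Vector], float] = euclidean,
) -> List[str]:
    """Return candidate image IDs ordered from most to least similar to the target.

    Ties in distance are broken by image ID so the ordering is deterministic.
    """
    return sorted(candidates, key=lambda img_id: (metric(target, candidates[img_id]), img_id))


def select_for_digitizing(
    target: Vector,
    candidates: Dict[str, Vector],
    labeled: Set[str],
    k: int,
    metric: Callable[[Vector, Vector], float] = euclidean,
) -> List[str]:
    """Hypothetical helper: pick up to k of the most similar images not yet hand-digitized."""
    ranked = rank_candidates(target, candidates, metric)
    return [img_id for img_id in ranked if img_id not in labeled][:k]


if __name__ == "__main__":
    # Hypothetical 3-bin feature vectors (e.g., normalized histograms) for illustration only.
    target_image = [0.2, 0.5, 0.3]
    pool = {
        "tile_a": [0.2, 0.4, 0.4],
        "tile_b": [0.9, 0.05, 0.05],
        "tile_c": [0.25, 0.5, 0.25],
    }
    print(rank_candidates(target_image, pool))
    print(select_for_digitizing(target_image, pool, labeled={"tile_c"}, k=1, metric=manhattan))
```

Here both metrics rank `tile_c` closest to the target, then `tile_a`, then `tile_b`; with `tile_c` already labeled, the next image sent for hand-digitizing would be `tile_a`. Manhattan distance sums absolute differences and so is less dominated by a single large per-feature difference than Euclidean distance, which is one reason to compare both.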