IN54A-01
Forget the hype or reality. Big data presents new opportunities in Earth Science.
Abstract:
Earth science is arguably one of the most mature science disciplines, one that constantly acquires, curates, and uses large volumes of highly varied data. We dealt with big data before there was "big data". For example, while the EOS program was being developed in the 1980s, the EOS Data and Information System (EOSDIS) was developed to manage the vast amount of data acquired by the EOS fleet of satellites. EOSDIS has remained a shining example of a modern science data system for the past two decades. With the explosion of the internet, the use of social media, and the deployment of sensors everywhere, the big data era has brought new challenges. First, Google developed its search algorithm and a distributed data management system. The open source communities quickly followed and developed the Hadoop file system to facilitate MapReduce workloads. The internet continues to generate tens of petabytes of data every day. There is a significant shortage of algorithms and knowledgeable manpower to mine the data. In response, the federal government developed big data programs that fund research and development projects and training programs to tackle these new challenges.
Meanwhile, compared with the internet data explosion, the Earth science big data problem has become quite small. Nevertheless, the big data era presents an opportunity for Earth science to evolve. We have learned about MapReduce algorithms, in-memory data mining, machine learning, graph analysis, and semantic web technologies. How do we apply these new technologies to our discipline and bring the hype down to Earth? In this talk, I will discuss how we might apply some of the big data technologies to our discipline and solve many of our challenging problems. More importantly, I will propose a new Earth science data system architecture to enable new types of scientific inquiries.