Social Media as Seismic Networks for the Earthquake Damage Assessment
Monday, 15 December 2014
The growing popularity of online platforms, based on user-generated content, is gradually creating a digital world that mirrors the physical world. In the paradigm of crowdsensing, the crowd becomes a distributed network of sensors that allows us to understand real life events at a quasi-real-time rate. The SoS-Social Sensing project [http://socialsensing.it/] exploits the opportunistic crowdsensing, involving users in the sensing process in a minimal way, for social media emergency management purposes in order to obtain a very fast, but still reliable, detection of emergency dimension to face. First of all we designed and implemented a decision support system for the detection and the damage assessment of earthquakes. Our system exploits the messages shared in real-time on Twitter. In the detection phase, data mining and natural language processing techniques are firstly adopted to select meaningful and comprehensive sets of tweets. Then we applied a burst detection algorithm in order to promptly identify outbreaking seismic events. Using georeferenced tweets and reported locality names, a rough epicentral determination is also possible. The results, compared to Italian INGV official reports, show that the system is able to detect, within seconds, events of a magnitude in the region of 3.5 with a precision of 75% and a recall of 81,82%. We then focused our attention on damage assessment phase. We investigated the possibility to exploit social media data to estimate earthquake intensity. We designed a set of predictive linear models and evaluated their ability to map the intensity of worldwide earthquakes. The models build on a dataset of almost 5 million tweets exploited to compute our earthquake features, and more than 7,000 globally distributed earthquakes data, acquired in a semi-automatic way from USGS, serving as ground truth. We extracted 45 distinct features falling into four categories: profile, tweet, time and linguistic. We run diagnostic tests and simulations on generated models to assess their significance and avoid overfitting. Overall results show a correlation between the messages shared in social media and intensity estimations based on online survey data (CDI).