When data arrive as curves: an overview of Functional Data Analysis methods in oceanography

David Nerini, Aix-Marseille University, Mediterranean Institute of Oceanography, Marseille, France; Etienne Pauthenet, Sorbonne Université, LOCEAN‐IPSL, CNRS/IRD/MNHN, Paris, France; Pascal Monestiez, INRA (Institut National de la Recherche Agronomique), Avignon, France; Christophe Guinet, Centre d’Etudes Biologiques de Chizé (CEBC), UMR 7372 Université de la Rochelle-CNRS, Villiers en Bois, France; Fabien Roquet, University of Gothenburg, Department of Marine Sciences, Gothenburg, Sweden; Madec Gurvan, LOCEAN-IPSL, CNRS/IRD/MNHN/Sorbonne Université, Paris, France; Frédéric Ménard, IRD, France; Christophe Menkes, IRD/LOCEAN, Nouméa, New Caledonia; and Arnaud Bertrand, Institut de Recherche pour le Développement (IRD), France
The last two decades have seen a massive increase in in situ observational data acquired at very high frequency in many areas of environmental science, especially in oceanography. This emergence of new data is strongly related to progress in the miniaturization of electronics and in computing, both in storage capacity and in the energy efficiency of sensors. Thanks to their small size and energy autonomy, new observing devices operate in unprecedented situations and have opened the way to experiments that were unimaginable in the era of mechanical instruments. One common feature of these data is that observations often arrive as sampled profiles. Nowadays, with the improvement of observing technologies, the ocean science community faces massive multivariate datasets made up of heterogeneous data (missing values, sparse datasets, varying sampling designs, ...), which makes data analysis and synthesis a challenging task. Functional Data Analysis (FDA) is the branch of statistics that develops methods for analyzing datasets of variables indexed along a continuum (time, depth, frequency, ...). Theoretical work in FDA has generalized the usual multivariate methods (PCA, CCA, linear model, kriging, non-parametric statistics, ...) to the case where data lie in functional spaces. Surprisingly, few applications have been conducted in oceanography, even though most data in environmental sciences can be handled as curves or profiles. This work proposes a general overview of FDA methods through many examples of big data analysis (output profiles from numerical ocean general circulation models, fishery acoustic profiles, hydrographic profiles sampled from equipped elephant seals).
After reviewing the issues frequently encountered when dealing with functional data, we will show that functional methods offer many advantages, such as 1) including information on curve shape in a statistical analysis, 2) removing the variability due to sampling devices through smoothing steps, 3) fixing sampling design problems by constructing continuous data, and 4) dealing with massive and complex datasets using adequate dimension reduction methods.
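As an illustration of points 2) to 4), the following is a minimal sketch, not the authors' implementation, of how irregularly sampled profiles might be turned into continuous curves and then summarized by a functional PCA. It assumes a Legendre polynomial basis, a maximum depth of 1000 m, and synthetic temperature-like profiles; the function names `fit_profile` and `functional_pca` and all numerical settings are hypothetical choices made for this example only.

```python
import numpy as np
from numpy.polynomial import legendre


def fit_profile(depth, value, n_basis=6, depth_max=1000.0):
    """Least-squares projection of one sampled profile onto a Legendre basis.

    Missing values (NaN) are skipped, so each profile may have its own
    sampling design. depth_max is an assumed common maximum depth.
    """
    depth = np.asarray(depth, dtype=float)
    value = np.asarray(value, dtype=float)
    ok = ~np.isnan(value)
    x = 2.0 * depth[ok] / depth_max - 1.0  # map [0, depth_max] to [-1, 1]
    design = legendre.legvander(x, n_basis - 1)
    coef, *_ = np.linalg.lstsq(design, value[ok], rcond=None)
    return coef


def functional_pca(coefs):
    """Functional PCA computed on basis coefficients (one row per profile).

    Legendre polynomials are orthogonal on [-1, 1] with squared norm
    2 / (2k + 1), so the L2 metric reduces to a diagonal weighting.
    """
    n_profiles, n_basis = coefs.shape
    gram = 2.0 / (2.0 * np.arange(n_basis) + 1.0)  # diagonal of the Gram matrix
    scaled = (coefs - coefs.mean(axis=0)) * np.sqrt(gram)
    cov = scaled.T @ scaled / n_profiles
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1]  # sort modes by decreasing variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = scaled @ eigvec  # principal component scores
    modes = eigvec / np.sqrt(gram)[:, None]  # eigenfunctions, as basis coefficients
    return eigval, scores, modes


# Synthetic example (made-up data): 40 profiles, each with its own depths.
rng = np.random.default_rng(0)
profiles = []
for _ in range(40):
    n_obs = int(rng.integers(30, 80))
    depth = np.sort(rng.uniform(0.0, 1000.0, n_obs))
    amplitude = 10.0 + 2.0 * rng.standard_normal()
    value = amplitude * np.exp(-depth / 200.0) + 0.05 * rng.standard_normal(n_obs)
    profiles.append((depth, value))

coefs = np.array([fit_profile(d, v) for d, v in profiles])
eigval, scores, modes = functional_pca(coefs)
print("share of variance per mode:", eigval[:3] / eigval.sum())
```

Once every profile is represented by the same small set of coefficients, profiles with different sampling designs become directly comparable, and the dimension reduction operates on a 40 x 6 table rather than on the raw samples. A spline basis with a roughness penalty would be a common alternative to the polynomial basis assumed here.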