Prof. Dr. Illia Horenko
Institute of Computational Science
Via Giuseppe Buffi 13
+41 58 666 4123
A central scientific issue in our work programme, which recurs in several of the individual projects, is the unbiased characterization of observation, measurement, and simulation data. Over the past several years, Prof. Horenko has developed non-parametric, non-stationary, non-homogeneous data analysis techniques which, in our view, are among the most advanced methodologies in this field. Besides introducing some fundamentally new techniques, he has also tested them on and applied them to real-life data from a range of application areas that are part of the CRC 1114, such as Meteorology and Bio-Informatics, with related publications in high-ranking journals. He has already accounted for the challenges that arise from the sheer amount of data that have to be processed in real-life applications to obtain robust and credible results, and his group has produced implementations of these data analysis algorithms that are ready for high-performance computing.
Prof. Horenko's FEM-BV family of time series analysis techniques allows for systematic time-dependent model identification when assumptions of stationarity or homogeneity of some underlying statistics are not justifiable. Finite Element Methods are employed in the numerical representation of indicator functions for the space-time domains of applicability of different models from a common model class. These indicators are regularized using a Bounded Variation constraint, hence the acronym FEM-BV. The choice of the model class from which to select the individual models in each of the regimes depends on the type of data considered. Implemented versions include Vector Auto-Regressive models with eXternal influences (VARX) with finite memory depth, K-means for geometric clustering of continuous data, Empirical Orthogonal Function (EOF) decompositions for model reduction of continuous data in high-dimensional vector spaces, Markov/Bernoulli models for discrete (categorical) data, and Generalized Extreme-Value distributions (GEV) for regression analysis with emphasis on extreme event characteristics.
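As an illustration of the description above, the regime-indicator formulation can be sketched as a constrained minimization. The notation is an assumption introduced here and does not appear in the text above: $x_t$ is the data at time $t$, $\theta_i$ the parameters of the $i$-th local model, $g$ the local model-error function determined by the chosen model class, $\gamma_i(t)$ the indicator of regime $i$, $K$ the number of regimes, and $C$ the bound on the variation of each indicator.

```latex
\[
  \min_{\theta_1,\dots,\theta_K,\;\gamma_1,\dots,\gamma_K}\;
  \sum_{t=1}^{T}\sum_{i=1}^{K} \gamma_i(t)\, g(x_t,\theta_i)
\]
subject to
\[
  \gamma_i(t) \ge 0, \qquad
  \sum_{i=1}^{K}\gamma_i(t) = 1 \quad \text{for all } t, \qquad
  \sum_{t=1}^{T-1}\bigl|\gamma_i(t+1)-\gamma_i(t)\bigr| \le C
  \quad \text{for all } i .
\]
```

In this sketch, the Bounded Variation constraint is the last inequality: it limits how often each indicator may change and thereby enforces persistent regimes. The Finite Element part would correspond to expanding each $\gamma_i$ in a finite set of basis functions on the time (or space-time) grid, which is omitted here for brevity.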
The number of different spatio-temporal regimes, the model parameters to be chosen within these regimes, such as memory depth and number of EOFs, and the indicator functions signalling activation of the respective models are all determined simultaneously in a global optimization procedure. This yields a judicious compromise between low residuals in reproducing the data of a training set on the one hand, and the demand for the smallest possible overall number of free parameters of the complete model on the other. The optimization is based on a new non-parametric modified Akaike Information Criterion (mAIC) and may be interpreted as a constructive implementation of ``Occam's Razor''. Because the optimization directly addresses a scalar model-error functional that characterizes the distance between model and data, the problem remains solvable in high dimensions. Versions of this methodology have been applied successfully to a variety of data from different application areas.
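The trade-off between low residuals and few free parameters can be illustrated with a deliberately simplified sketch. It is not Prof. Horenko's implementation: it uses hard (0/1) indicators with K-means as the local model, replaces the Bounded Variation constraint by a fixed cost per regime switch (solved by dynamic programming), and uses a standard Gaussian AIC-type score as a stand-in for the mAIC. The names `fit_regimes`, `aic_like_score`, and `switch_cost` are hypothetical.

```python
import numpy as np

def fit_regimes(x, K, switch_cost, n_iter=50, seed=0):
    """Hard-indicator regime clustering of a time series x with shape (T, d).

    Alternates (a) a dynamic-programming assignment in which every regime
    switch costs `switch_cost` and (b) a centroid update per regime.
    """
    rng = np.random.default_rng(seed)
    T = x.shape[0]
    centers = x[rng.choice(T, size=K, replace=False)].astype(float)
    labels = None
    jump = switch_cost * (1.0 - np.eye(K))       # zero cost for staying put
    for _ in range(n_iter):
        # Local model error: squared distance of x_t to centroid i.
        cost = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        acc = cost.copy()                        # acc[t, j]: best cost ending in j
        back = np.zeros((T, K), dtype=int)       # back-pointers for the path
        for t in range(1, T):
            trans = acc[t - 1][:, None] + jump   # trans[i, j]: arrive in j from i
            back[t] = trans.argmin(axis=0)
            acc[t] = cost[t] + trans.min(axis=0)
        new_labels = np.empty(T, dtype=int)
        new_labels[-1] = acc[-1].argmin()
        for t in range(T - 1, 0, -1):            # trace the optimal path back
            new_labels[t - 1] = back[t, new_labels[t]]
        if labels is not None and np.array_equal(new_labels, labels):
            break                                # assignments have converged
        labels = new_labels
        for i in range(K):
            members = labels == i
            if members.any():                    # leave empty regimes unchanged
                centers[i] = x[members].mean(axis=0)
    residual = float(((x - centers[labels]) ** 2).sum())
    return labels, centers, residual

def aic_like_score(x, labels, residual, K):
    """Gaussian AIC-type score: fit term plus twice the parameter count."""
    n = x.size
    n_switches = int(np.count_nonzero(np.diff(labels)))
    n_params = K * x.shape[1] + n_switches       # centroids plus switch times
    return n * np.log(max(residual / n, 1e-300)) + 2 * n_params

# Synthetic data (an assumption for illustration): one regime change at t = 100.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.3, 100), rng.normal(5.0, 0.3, 100)])[:, None]

scores = {}
for K in (1, 2, 3):
    labels, centers, residual = fit_regimes(x, K, switch_cost=1.0)
    scores[K] = aic_like_score(x, labels, residual, K)
print(scores)  # lower is better
```

The candidate with the lowest score would be selected. In the sketch, the switch cost and the parameter count play the roles that the variation bound and the information criterion play in the text above; the actual procedure determines these quantities jointly rather than by the simple loop over `K` used here.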
Project cooperations with the Mercator Fellow:
- Within Project A01, an appropriate version of these techniques will serve as an independent, data-based method for optimizing hierarchical multi-scale stochastic precipitation models, and for the quantitative data-based evaluation of the project's hypotheses and theoretical derivations.
- In the most recent work, the model identification procedures have been generalized to incorporate new data successively as they become available over time. This extension is based on Bayesian learning ideas and complements the framework of the Data Assimilation project A02. Here, Prof. Horenko's techniques could be used to represent and assimilate the effects of possibly unobserved external influences, which materialize in the FEM-BV family of models as regime changes in the identified FEM-BV indicator functions.
- Project B01 will generate a wealth of three-dimensional displacement fields from laboratory ``earthquakes''. The FEM-BV-VARX techniques, in combination with model reduction in terms of spatial patterns, e.g., through EOF-decompositions, will allow for a detailed characterization of these data that goes considerably beyond what is currently available in this laboratory setting. At the same time, this methodology will provide insights into connections between the measured three-dimensional fields and displacements measured at the surface. This is important because only surface displacements can be measured directly in the field. The three-dimensional displacements under an observed surface are ``non-observed degrees of freedom'' in the sense of the FEM-BV technology, and their influence is reflected in potential model regime changes. In conjunction with the three-dimensional laboratory measurements, there is a unique opportunity to establish a direct, quantifiable connection between such three-dimensional processes and the surface displacements.
- Project B04 investigates the compact representation of complex data using tensor-product decompositions, considering direct numerical simulations of turbulent flows and experimental as well as simulation data from project B01 (see above). There are two routes of fruitful developments in conjunction with Prof. Horenko's data analysis techniques in this project. The first route simply consists in a mutual benchmarking of the data compression capability of the tensor product decompositions with what is achievable using Prof. Horenko's EOF-based multiple-regime representations. The second route of development involves extending the data analysis technology by incorporating tensor product representations in the data representation ansatz. Tensor product decompositions could replace the EOF-based decompositions in cases where the data reveal scale self-similarities.
- Prof. Horenko will also guide the data-based development of triggering mechanisms for unstable updrafts in Project C06.