Mechanics & Industry
Volume 25, 2024
Article Number: 9
Number of pages: 12
DOI: https://doi.org/10.1051/meca/2024001
Published online: 15 March 2024
Regular Article
Empowering optimal transport matching algorithm for the construction of surrogate parametric metamodel
^{1} PIMM Lab, Arts et Métiers Institute of Technology, 155 Boulevard de l’Hôpital, 75013 Paris, France
^{2} Safran Tech, Department of Digital Sciences and Technologies, 1 rue des Jeunes Bois, 78117 Châteaufort, France
^{3} CNRS@CREATE LTD, 1 Create Way, #08-01 CREATE Tower, 138602 Singapore
^{*} e-mail: maurine.jacot@ensam.eu
Received: 10 August 2023
Accepted: 10 January 2024
Resolving Partial Differential Equations (PDEs) through numerical discretization methods such as the Finite Element Method presents persistent challenges associated with computational complexity, despite achieving a satisfactory solution approximation. To surmount these computational hurdles, interpolation techniques are employed to precompute models offline, enabling rapid online solutions within a metamodel. Probability distribution frameworks play a crucial role in data modeling across fields such as physics, statistics, and machine learning. Optimal Transport (OT) has emerged as a robust approach for probability distribution interpolation thanks to its ability to account for spatial dependencies and continuity. However, interpolating in high-dimensional spaces encounters challenges stemming from the curse of dimensionality. This article offers insights into the application of OT, addressing the associated challenges and proposing a novel methodology. The approach exploits the distinctive arrangement of an ANOVA-based sampling to interpolate between more than two distributions using a step-by-step matching algorithm. Subsequently, the ANOVA-PGD method is employed to construct the metamodel, providing a comprehensive solution to the complexities inherent in distribution interpolation.
Key words: Model order reduction / optimal transport / proper generalized decomposition / barycentric projection
© M. Jacot, et al., Published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Many problems in science and engineering can be modelled mathematically by a system of Partial Differential Equations (PDEs). Generally, solving these equations numerically requires a numerical method, e.g. the Finite Element (FE) method, which approximates the problem solution in each element of the mesh covering the domain [1]. This process may be time-consuming and computationally expensive, especially when a new configuration of the problem parameters has to be evaluated.
One possible approach is to use interpolation to build offline a model that can rapidly compute online a new solution of the problem at any given point of the parametric space, without requiring expensive computational resources. These techniques are already used in a wide range of scientific and engineering applications, such as parametrised Model Order Reduction techniques (pMOR) [2], because the ability to evaluate solutions in near real-time leads to a more effective exploration of the parametric space.
However, interpolating in a high-dimensional space faces the so-called curse of dimensionality [3]. Many methodologies exploiting sparsity assumptions have been proposed to alleviate these difficulties [4–6] and succeed in constructing high-dimensional models from limited data.
Furthermore, the choice of a suitable interpolation technique, and the resulting performance, is impacted by how the data is represented and the purpose for which it is used.
In the fields of physics, statistics, and machine learning, the framework of probability distributions is widely used for modelling and representing data by associating a probability measure with a collection of samples that form a solution [7]. This makes it possible to describe and model the statistical characteristics of the data and to develop a probabilistic model for predicting and interpolating efficiently.
The continuous field description of a collection of samples is viewed as a continuous probability density function assigning a probability density (also called weight) to each point of a regular grid that discretises the space (usually a Euclidean space). In the Finite Element method, the solution of a problem is typically represented by a continuous field (e.g. velocity, stress), whose nodal values are defined at fixed points in space where the physical quantity is evaluated. The particle cloud description of data refers to describing the probability distribution by a set of particles, each characterised by a mass and a position. Figure 1 illustrates the two types of descriptions.
One of the most accurate and mathematically rigorous approaches to interpolate between distributions in the particle cloud description is to follow the movement of individual particles, within the Optimal Transport (OT) framework [8].
Optimal Transport, also known through the Wasserstein distance or transportation distance, was formalised in 1781 by the French mathematician Gaspard Monge; it is a mathematical framework that deals with the optimal way to transport mass from one probability distribution to another [9]. The goal is to find the most efficient way to transform one distribution into another while minimizing a certain cost or distance metric.
The basic idea is to consider the two probability distributions as piles of mass, and the transportation plan as a way to move mass from one pile to another. The cost of transporting mass from one point to another is typically given by a cost function, and the optimal transport problem involves finding the optimal way to move mass that minimizes the total cost [10, 11].
Despite recent advances in the field, such as entropy-regularised Optimal Transport and, more precisely, the Sinkhorn-Knopp algorithm [12], solving an Optimal Transport problem can be computationally expensive and is not well suited to an online approach [13]. First, if distributions are matched using Optimal Transport theory in their continuous descriptions, it involves tackling an online transportation problem that requires solving partial differential equations. Thus, to decrease this complexity, distributions in their continuous form need to be discretized into a collection of particles. However, the choice of discretization can have a significant impact on accuracy and efficiency.
A technique for discretization uses the Smoothed Particle Hydrodynamics (SPH) decomposition, as explained in [14]. In this method, all particles have the same mass, and their positions are established by solving an optimization problem. In contrast, an alternative method based on the Finite Element discretization eliminates the need for solving an optimization problem: the positions of the particles coincide with the nodes of the Finite Element discretization, and the mass assigned to each particle is directly proportional to the magnitude of the continuous field at the corresponding particle location. Ultimately, even once the distributions are discretized, performing online Optimal Transport between collections of particles remains challenging, as it entails solving a discrete linear programming problem, which can be both numerically costly and statistically inefficient in an offline-online approach [15].
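As an illustration of the Finite-Element-based discretization, the following minimal sketch (with made-up node coordinates and nodal field values) assigns to each mesh node a particle whose mass is proportional to the normalised field value at that node:

```python
import numpy as np

# Hypothetical FE mesh: particles sit on the mesh nodes and carry a mass
# proportional to the (normalised) nodal value of the continuous field.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # node coordinates
f = np.array([2.0, 1.0, 1.0])                       # nodal field values
mass = f / f.sum()                                  # normalised weights
particles = list(zip(mass, X))                      # (mass, position) pairs
```

No optimization problem is solved here, which is precisely the appeal of this discretization over the SPH one.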
In practice, interpolation often involves considering more than two distributions simultaneously. The concept of interpolation between more than two distributions is relevant in various fields, including Machine Learning and Statistics. In this context, the Euclidean distance is often used as a measure of dissimilarity or similarity between these distributions, as it is applied in [16]. When interpolating between more than two distributions using the Wasserstein distance, the mathematical theory does not have a solid foundation. In fact, the goal is to define a matching between an arbitrary number of discrete densities, where the distributions are discretized into collections of particles [17].
Some approaches have been developed to overcome this problem, in particular the Wasserstein barycenters [18], which aim to find a representative that lies “in between” a set of given collections of particles. However, this method is often computationally intractable and cannot ensure a fast real-time (online) solution for new data points, due to the complexity of the problem and its formulation; moreover, it is difficult to operate in highly multidimensional settings.
The aim of this paper is to provide a reliable approach for real-time evaluation of the distributions of a parametric problem, by using Optimal Transport to construct a parametric metamodel. Once the metamodel is constructed, it can be deployed for real-time use: the model is capable of making predictions or providing insights on new data as it becomes available, without significant delays.
The approach proceeds in two phases: an offline training in the parametric space and an online application.
In the offline phase, we introduce a structured sampling, represented as a cross-centred sampling, that explores the influence of each dimension of the parametric space 𝒟. Numerical simulations are performed for each point of the sampling (problem parameters), solving a PDE defined in a domain and subjected to initial and boundary conditions. To manipulate the model efficiently and apply transport, the data has to be transformed into a particle cloud description, i.e. expressed in the form of particles with a mass and a position. Performing the SPH decomposition indeed involves solving an optimization problem, which can be computationally demanding. However, despite this computational cost, the structure of the SPH decomposition is often more efficient than alternative methods.
To avoid the computational cost of discretizing each continuous distribution with the SPH decomposition, we introduce a new methodology in which only the first distribution is discretized following the SPH decomposition. The optimal matching is then performed by a step-by-step matching algorithm over the proposed sampling, where the other distributions are discretized based on the Finite Element localization and the matching problem is solved using the notion of barycentric projection [19].
With this approach, the dimensionality of the matching problem is reduced: the distributions are matched in pairs rather than all simultaneously. Finally, the parametric displacement field between the solutions associated with the sampling points is learned using a Model Order Reduction technique, the ANOVA-PGD [6].
While the distinct arrangement of the sampling provides an opportunity to learn the parametric displacement field with reduced computational effort, these sampling points may not cover the entire parametric space. This limitation can result in errors in the predictions made by the metamodel during online operation. To enhance the accuracy of the surrogate parametric metamodel, additional sampling points are generated using a Latin Hypercube Sampling (LHS); the associated solutions are computed and then decomposed into particles. The difference between the projected positions of the particles and their actual positions is used to compute a correction component. The surrogate parametric metamodel is then self-enriched by taking this correction into account. Testing points are then used to verify the accuracy of the corrected metamodel.
Finally, the self-enriched parametric surrogate metamodel is used online to predict the particles’ positions, and therefore the solution of the problem, for any point of the parametric space.
The paper is organised as follows. Section 2 first presents, in a nutshell, the theory of Optimal Transport. In Section 3, the construction of the parametric surrogate metamodel using the ANOVA-PGD technique is detailed, as well as its enrichment; the step-by-step matching algorithm using a structured sampling is also presented. Section 4 presents the global methodology and the error evaluation. Two physical problems are introduced in Section 5 and the proposed methodology is applied to them. Conclusions and remarks are finally discussed in Section 6.
Fig. 1 (left) Continuous field description of data associated with a discretised probability density (also called weight) for each point of the regular grid. (right) Particle cloud description associated with a set of particles (point clouds). 
2 Theory of optimal transport
The objective of this section is to revisit the mathematical and numerical aspects of Optimal Transport. We restrict ourselves to the fundamentals that will be discussed later and used in the proposed methodology. For further information regarding the theory, readers can refer to [20], [21], [22], to cite a few.
2.1 Monge’s problem
The Optimal Transport problem was first introduced in 1781 by the mathematician Gaspard Monge. His idea consists in finding the cheapest way to transport a pile of sand to a hole in the ground while minimising the distance the sand must travel.
We consider two discrete measures α and β on two compact metric spaces (𝒳, 𝒴), corresponding to the source and target distributions respectively, $\alpha =\sum _{i=1}^{n}{a}_{i}{\delta}_{{x}_{i}}$ and $\beta =\sum _{j=1}^{m}{b}_{j}{\delta}_{{y}_{j}}$. In our case 𝒳 and 𝒴 are both subspaces of a common larger metric space and are endowed with the same metric. The Monge Problem (MP) searches for a map T : {x_{1}, x_{2}, …, x_{n}} → {y_{1}, y_{2}, …, y_{m}} that associates each point x_{i} to a single point y_{j}, while minimising the sum of the costs c(x_{i},y_{j}) = d(x_{i},y_{j})^{2}. T must satisfy the mass conservation $\forall j\in [m],\ {b}_{j}=\sum _{i:T\left({x}_{i}\right)={y}_{j}}{a}_{i}$, where the push-forward measure T_{#}α is defined by ${T}_{\#}\alpha =\sum _{i=1}^{n}{a}_{i}{\delta}_{T\left({x}_{i}\right)}$. The Monge Problem can be formulated as a non-convex optimisation problem:
$$\text{(MP)}\quad M(\alpha ,\beta )={\mathrm{min}}_{{T}_{\#}\alpha =\beta}\sum _{i=1}^{n}c\left({x}_{i},T\left({x}_{i}\right)\right).$$(1)
2.2 Kantorovich relaxation and linear programming problem
The Monge formulation defines an ill-posed problem, since sometimes there is no transport map satisfying the constraint T_{#}α = β. The main idea of Kantorovich is the relaxation of the deterministic transport map: at any point x_{i}, the mass can be split and transported from a source towards several targets. Thus, we adopt a probabilistic transportation. Numerically, the Kantorovich relaxation leads to a strictly convex problem that is easier to solve. Instead of a map T, a transportation plan γ ∈ Γ(α,β) is used, where Γ(α,β) is the set of measure couplings between two discrete probability measures α and β. Specifically, γ : (𝒳, 𝒴) → ℝ_{+} is a function that satisfies ∫ γ(x, y)dxdy = 1, ∫ γ(x, •)dx = β and ∫ γ( •,y)dy = α. For the sake of simplicity, γ also refers to the discretised version of the transport plan, which belongs to ${\mathbb{R}}_{+}^{n\times m}$. Then, to define the optimal transport plan γ* ∈ Γ(α, β), the Kantorovich Problem (KP) reads:
$$(\text{KP}){\gamma}^{*}=\underset{\gamma \in \Gamma (\alpha ,\beta )}{\mathrm{argmin}}{\displaystyle \sum _{i=1}^{n}{\displaystyle \sum _{j=1}^{m}c}}\left({x}_{i},{y}_{j}\right){\gamma}_{i,j}.$$(2)
where c(x, y)dγ(x, y) is the transport cost associated with a transportation plan γ ∈ Γ(α, β) and can be interpreted as the work needed to move one unit of mass from x to y. The p-Wasserstein distance for p ∈ [1, ∞[ measures the similarity between distributions and is one natural way of defining the cost:
$${\mathcal{W}}_{p}(\alpha ,\beta )={\left({\mathrm{min}}_{\gamma \in \Gamma (\alpha ,\beta )}{\displaystyle \sum _{i,j}{\gamma}_{i,j}}d{\left({x}_{i},{y}_{j}\right)}^{p}\right)}^{1/p}$$(3)
The linear problem in equation (2) can be solved using the network simplex in 𝒪 ((n + m)nm log(n + m)) [23], as proposed by the solver of the open-source Python library POT [24], which returns the optimal transport plan γ*.
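As a minimal, self-contained sketch of the discrete Kantorovich problem of equation (2), the linear program below is solved with SciPy's `linprog` on toy data (the positions and weights are made up; in practice a dedicated OT solver such as the one in POT would be the efficient choice):

```python
import numpy as np
from scipy.optimize import linprog

# Toy source/target: n and m points in the plane with uniform weights.
rng = np.random.default_rng(0)
n, m = 4, 5
x = rng.random((n, 2))          # source positions x_i
y = rng.random((m, 2))          # target positions y_j
a = np.full(n, 1.0 / n)         # source weights a_i
b = np.full(m, 1.0 / m)         # target weights b_j

# Squared Euclidean cost c(x_i, y_j) = d(x_i, y_j)^2, flattened row-wise.
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

# Marginal constraints: rows of gamma sum to a, columns sum to b.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0        # sum_j gamma_ij = a_i
for j in range(m):
    A_eq[n + j, j::m] = 1.0                 # sum_i gamma_ij = b_j
res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
              bounds=(0, None), method="highs")
gamma = res.x.reshape(n, m)                 # optimal plan gamma*
```

The recovered plan is non-negative and reproduces both marginals, which is exactly the feasibility set Γ(α, β).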
Fig. 2 Linear displacement interpolation between the source distribution α and the target distribution β. 
2.3 Interpolation using optimal transport
OT defines a distance 𝒲_{p} between two discrete probability measures α and β. To interpolate between two distributions, the minimisation of the cost yields the map T which matches each point of the source α to one in the target distribution β. Then, the linear displacement interpolation is obtained by moving each mass along the line defined by (1 − t)x_{i} + tT(x_{i}) for t ∈ [0,1], as illustrated in Figure 2. For 𝒲_{2}, this interpolation is called displacement interpolation [25].
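For two already-matched clouds of equal size, the linear displacement interpolation can be sketched as follows (the positions and the matching `T_idx` are hypothetical):

```python
import numpy as np

# Matched point clouds: T maps source point i to target point T_idx[i]
# (a permutation, i.e. a Monge map for equal-size, equal-mass clouds).
x = np.array([[0.0, 0.0], [1.0, 0.0]])      # source positions
y = np.array([[0.0, 1.0], [2.0, 1.0]])      # target positions
T_idx = np.array([0, 1])                    # hypothetical matching

def displacement_interpolation(t):
    """Move each mass along (1 - t) * x_i + t * T(x_i)."""
    return (1.0 - t) * x + t * y[T_idx]

mid = displacement_interpolation(0.5)       # cloud halfway between x and y
```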
Once the displacement interpolation between two measures is defined, it can be generalised to an interpolation between more measures using the concept of Wasserstein barycenters, as presented in [11].
2.4 Notion of barycentric projection
In the context of the Kantorovich formulation, the map T is not defined; therefore an extra step is required to perform the interpolation, the so-called barycentric projection [19]. Once an optimal coupling γ* has been computed, one needs to map x_{i} from 𝒳 to 𝒴. For each source sample, the barycentric mapping is expressed as follows:
$${x}_{i}\in \mathcal{X}\mapsto \frac{1}{{\displaystyle \sum _{j}{\gamma}_{i,j}^{*}}}{\displaystyle \sum _{j}{\gamma}_{i,j}^{*}}{y}_{j}\in \mathcal{Y}$$(4)
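Equation (4) is simply a γ-weighted average of the target positions; a minimal sketch with a made-up coupling γ*:

```python
import numpy as np

# Hypothetical optimal plan gamma* (rows: source samples, cols: targets).
gamma = np.array([[0.3, 0.2, 0.0],
                  [0.0, 0.1, 0.4]])
y = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])   # target positions y_j

# Barycentric projection of eq. (4): each x_i is mapped to the
# gamma-weighted average of the targets it sends mass to.
proj = (gamma @ y) / gamma.sum(axis=1, keepdims=True)
```

When γ* splits the mass of a source sample across several targets, the projection places it at their weighted barycentre, which is what makes the Kantorovich plan usable for interpolation.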
3 Offline learning approach using optimal transport
In the fields of statistics and machine learning, Optimal Transport has gained significant attention due to its ability to compare, align and model probability distributions while respecting the underlying geometry and structure of the data. It is therefore an appealing route to create parametric models when more traditional methods fail due to the presence of localised features in the data.
When dealing with many parameters, obtaining a precise approximation of the system requires a sufficient number of sampling points generated from an N_{𝒟}-dimensional sampling. However, this leads to solving optimal transport with many distributions, and therefore to high-dimensional matching problems. Learning transportation plans in this context is challenging due to the curse of dimensionality. One effective approach to address this issue is the concept of the Wasserstein barycenter, which aims to find a representative distribution that lies “in between” a set of given probability distributions; it can be thought of as a way to compute a central or average distribution that captures the essential characteristics of the input distributions. However, the computational complexity of this approach can be high, especially in highly multidimensional settings, and it cannot ensure a fast real-time (online) solution for new data points. Therefore, finding an offline representation of the solution to the OT problem and learning its parametric dependencies is essential to evaluate online a new configuration of the system. In addition, a strategy to compute transportation plans efficiently is crucial since, as the number of parameters and therefore of data samples increases, traditional optimization methods become computationally expensive or even infeasible.
Motivated by this idea, our offline learning approach follows a step-by-step matching algorithm based on the structure of an adequate sampling. Point clouds are matched successively in pairs based on the notion of barycentric projection. The distinct arrangement of the sampling provides a way to learn the displacement field between numerous sets of point clouds. Finally, we introduce a nonlinear regression technique, the ANOVA-PGD, to learn the parametric model of the transportation plan. This technique is presented in this section.
3.1 Definition of a structured sampling
Computing parametric solutions based on surrogates (metamodels or response surfaces) requires defining a design of experiments (also called sampling) to work with. We consider the parametric space 𝒟 where N_{𝒟} parameters (also known as dimensions) are introduced, $p=\left({p}^{1},\dots ,{p}^{{N}_{\mathcal{D}}}\right)$. The chosen sampling is a centred cross that explores the influence of each dimension individually [6]; it is composed of B sampling points, b ∈ 〚B〛, arranged around the so-called anchor point p_{a}. Figure 3 shows an example of this sampling with N_{𝒟} = 3, used to demonstrate the proposed methodology.
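A cross-centred design of this kind can be generated as below; the function name, branch half-width and number of points per branch are illustrative assumptions:

```python
import numpy as np

def cross_centred_sampling(p_anchor, half_width, n_b):
    """Cross-centred DoE: n_b points on each of the two branches per
    dimension, all other coordinates fixed at the anchor point."""
    p_anchor = np.asarray(p_anchor, dtype=float)
    points = [p_anchor.copy()]
    for dim in range(p_anchor.size):
        for step in np.linspace(-half_width, half_width, 2 * n_b + 1):
            if np.isclose(step, 0.0):
                continue                      # anchor already stored
            q = p_anchor.copy()
            q[dim] += step
            points.append(q)
    return np.array(points)

doe = cross_centred_sampling([0.5, 0.5, 0.5], 0.4, 2)   # N_D = 3
```

Each non-anchor point varies a single parameter, which is what later lets the anchored-ANOVA terms be learned one dimension at a time.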
Fig. 3 Cross-centred sampling (DoE) considered as example, with anchor point p_{a}. 
3.2 Step-by-step matching algorithm based on barycentric projection
Our approach relies on a step-by-step algorithm designed to address the challenge of estimating transportation plans in a high-dimensional space. While the Optimal Transport framework, as explained in Section 2, is centered around matching two distributions, extending this matching to more than two distributions introduces increased complexity. Additionally, if the distributions are represented by a continuous field, discretizing them into a set of particles can be computationally expensive, especially when using the Smoothed Particle Hydrodynamics (SPH) decomposition.
To mitigate these challenges, we operate within the defined parametric space 𝒟 and leverage a distinctive arrangement of sampling. The distributions at the anchor point are discretized following the SPH decomposition, while the other distributions are decomposed using the Finite Element discretization. Subsequently, each pair of point clouds is matched sequentially by solving a classical Optimal Transport problem. The results are then projected using a barycentric projection, facilitating interpolation between the two distributions. This principle is described in Figure 4.
We first provide some essential notation. Let B be a set of points defining a cross-centred sampling as shown in Figure 3. B is decomposed into N_c overlapping subsets, n_c ∈ 〚N_c〛, which correspond to the branches of the cross. Each branch contains the associated N_b sampling points with the same anchor point, n_b ∈ 〚N_b〛, and an associated solution of a high-fidelity problem represented by a continuous field, ${f}_{{n}_{b}}^{{n}_{c}}:{\Omega}_{N}\subset {\mathbb{R}}^{2}\to \mathbb{R}$, with ƒ_{0} the solution at the anchor point. Each dimension of the problem contains N_c = 2 branches. Figure 5 shows the different parts that compose the sampling presented in Figure 3, with the associated high-fidelity solutions.
For every n_c ∈ 〚N_c〛, our step-by-step matching algorithm follows a four-step procedure:
High-fidelity solutions: High-fidelity solutions are generated based on the resolution of PDEs with the FE method, for instance, on the points of the cross-centred sampling presented in Figure 5, which is expected to express the individual effect of each dimension. Solutions of the problem are denoted ${f}_{{n}_{b}}^{{n}_{c}}:{\Omega}_{N}\subset {\mathbb{R}}^{2}\to \mathbb{R}$ and correspond to the solution at each node j ∈ 〚N〛 of the mesh. The mesh is the same for all the solutions, and the spatial coordinates of each node, denoted (x_{j}, y_{j}), are stored in a vector X as shown in Figure 6. Then, the solutions are normalised over the domain Ω; the resulting distribution ${\phi}_{{n}_{b}}^{{n}_{c}}$ corresponds, at each node of the mesh, to the solution divided by its integral over the domain.

Discretisation into point clouds: The distribution at the anchor point φ_{0} is discretised, based on the SPH decomposition, into a 2D point cloud denoted ${\overline{\phi}}_{0}$. For the other distributions, we consider that the particles are positioned on the FE nodes and that their masses are proportional to the high-fidelity solution at each point.
For that, we consider a numerical solution ƒ as ƒ : Ω ⊂ ℝ^{2} → ℝ^{+}. The solution is first normalised over the domain to bring features to a similar scale, such that:
$$\phi =\frac{f}{I}\quad \text{with}\quad I={\int}_{\Omega}f\,d\Omega .$$(5)
Then, following the Smoothed-Particle Hydrodynamics (SPH) method, the continuous function is discretised into a set of points referred to as particles [17].
Indeed, the idea is to decompose φ into a sum of P Gaussian functions, i ∈ 〚P〛, centred on the coordinates of each particle, located at µ_{x} and µ_{y} and stored in the vector µ ∈ ℝ^{(P×2)}. The sum of all the Gaussian functions should represent the approximated distribution φ, such that:
$$\phi \approx \overline{\phi}={\displaystyle \sum _{i=1}^{P}{G}_{{\mu}_{i},\sigma}}(\text{x}),$$(6)
where ${G}_{{\mu}_{i},\sigma}$ is a 2D Gaussian with standard deviation σ (a model hyper-parameter) and mean µ_{i}. This decomposition is obtained by minimizing $\Vert \overline{\phi}-\phi {\Vert}^{2}$.
Transport plan: An optimal transport plan ${\gamma}^{*1,{n}_{c}}$ is computed by solving the Kantorovich problem of equation (2) between the particles’ positions ${\mu}_{0}^{{n}_{c}}$ and the distribution ${\phi}_{1}^{{n}_{c}}$, as presented in Figure 7.
Projection: Particles are projected from µ_{0} using the notion of barycentric projection, following the methodology presented in Figure 4. The projection is performed between each pair of discretized distributions $\left({\mu}_{{n}_{b}}^{{n}_{c}},{\mu}_{{n}_{b}+1}^{{n}_{c}}\right)$ by computing the transport plan between ${\mu}_{{n}_{b}}^{{n}_{c}}$ and ${\phi}_{{n}_{b}+1}^{{n}_{c}}$ iteratively.
Each point cloud representation of a distribution ${\phi}_{{n}_{b}}^{{n}_{c}}$ is characterised by the coordinates µ_{x} and µ_{y} of each particle, stored in the vector ${\mu}_{{n}_{b}}^{{n}_{c}}$. A natural way to estimate the particles’ positions ${\mu}_{{n}_{b}+1}^{{n}_{c}}$ is to approximate them using the optimal transport plan ${\gamma}^{*{n}_{c},{n}_{b}}$ between the source distribution ${\phi}_{{n}_{b}}^{{n}_{c}}$, represented by a point cloud, and the target distribution ${\phi}_{{n}_{b}+1}^{{n}_{c}}$, represented by a continuous field. Relying on the notion of barycentric projection presented in Section 2.4, the mapping of the ith component of ${\mu}_{{n}_{b}+1}^{{n}_{c}}$, which allows reconstructing the distribution ${\overline{\phi}}_{{n}_{b+1}}^{{n}_{c}}$, is approximated as below and presented in Figure 8:
$${\left({\mu}_{{n}_{b}+1}^{{n}_{c}}\right)}_{i}=\frac{1}{{\displaystyle \sum _{j=1}^{m}{\gamma}_{ij}^{*{n}_{c},{n}_{b}}}}{\displaystyle \sum _{j=1}^{m}{\gamma}_{ij}^{*{n}_{c},{n}_{b}}}{(X)}_{j}$$(7)
where X is the vector that contains the spatial coordinates of the mesh (x_{j}, y_{j}), as shown in Figure 6, and ${\gamma}^{*{n}_{c},{n}_{b}}$ is the transportation plan between ${\mu}_{{n}_{b}}^{{n}_{c}}$ and ${\phi}_{{n}_{b}+1}^{{n}_{c}}$.
Then, the four-step procedure is repeated for each part n_c of the sampling. At the end, we obtain the particles’ positions µ_{B} for all the B distributions corresponding to the sampling points presented in Figure 3. The particles’ positions allow recovering the distributions φ_{B} using equation (6). The number of particles P and the standard deviation σ of each particle are fixed by the decomposition of the first distribution φ_{0} and are the same for all the distributions.
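The SPH-type decomposition of equation (6) boils down to a least-squares fit of the particle positions; below is a minimal sketch with a single Gaussian particle, where the grid, σ, the target bump and the use of `scipy.optimize.minimize` (in place of a dedicated SPH solver) are all illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Grid discretising the domain and a target density phi (here itself a
# Gaussian bump, so the fit can be exact up to optimiser tolerance).
g = np.linspace(0.0, 1.0, 30)
X, Y = np.meshgrid(g, g)

def gaussian(mu, sigma):
    return np.exp(-((X - mu[0]) ** 2 + (Y - mu[1]) ** 2) / (2 * sigma ** 2))

sigma = 0.1                                  # hyper-parameter of eq. (6)
phi = gaussian([0.6, 0.4], sigma)            # target distribution

def residual(mu_flat):
    """|| phi_bar - phi ||^2 for P particles stored row-wise in mu_flat."""
    mus = mu_flat.reshape(-1, 2)
    phi_bar = sum(gaussian(mu, sigma) for mu in mus)
    return ((phi_bar - phi) ** 2).sum()

# One particle (P = 1), started away from the optimum.
res = minimize(residual, x0=np.array([0.45, 0.55]), method="Nelder-Mead")
```

With more particles the same objective is minimised over all P positions at once, which is why this discretisation is the expensive one.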
Fig. 4 (left) Optimal matching between two distributions represented by point clouds. (middle) Barycentric projection focusing on one matched particle of the first point cloud. (right) Particles’ optimal matching following the barycentric projection, which allows interpolating between these two collections of particles. 
Fig. 6 Example of a finite element mesh with the associated spatial coordinates. 
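One branch of the step-by-step matching procedure above can be sketched end-to-end as below; the mesh, the branch distributions and the anchor particles are made up, and the Kantorovich problem is solved with a plain linear program rather than an optimised OT solver:

```python
import numpy as np
from scipy.optimize import linprog

def ot_plan(a, b, C):
    """Optimal plan of the discrete Kantorovich problem (eq. (2))."""
    n, m = C.shape
    A = np.zeros((n + m, n * m))
    for i in range(n):
        A[i, i * m:(i + 1) * m] = 1.0        # row marginals = a
    for j in range(m):
        A[n + j, j::m] = 1.0                 # column marginals = b
    res = linprog(C.ravel(), A_eq=A, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.x.reshape(n, m)

# Hypothetical branch of the cross: FE node coordinates X (shared mesh)
# and one normalised nodal distribution per sampling point of the branch.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
branch = [np.array([0.7, 0.1, 0.1, 0.1]),    # phi_1 on the branch
          np.array([0.1, 0.1, 0.1, 0.7])]    # phi_2 on the branch

mu = np.array([[0.1, 0.1], [0.2, 0.1]])      # SPH particles at the anchor
w = np.full(len(mu), 1.0 / len(mu))          # particle masses (conserved)
positions = [mu]
for phi in branch:                            # match pairs step by step
    C = ((mu[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    gamma = ot_plan(w, phi, C)
    mu = (gamma @ X) / gamma.sum(axis=1, keepdims=True)   # eq. (7)
    positions.append(mu)
```

Each iteration only ever matches one point cloud against one nodal distribution, which is the pairwise reduction of the matching problem described above.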
3.3 Construction of the surrogate parametric metamodel
Real-time applications require fast and accurate computations, often with limited resources. Hence, the parametric displacement field between the points of the sampling is learned using non-intrusive Model Order Reduction techniques. We seek a sparse approximation that allows estimating the particles’ parametric displacement, in order to represent the solution for any parameter choice without having to compute the associated solution.
Fig. 7 Optimal matching between two point clouds (collections of particles). 
Fig. 8 Projection of particles ${\mu}_{{n}_{b}+1}^{{n}_{c}}$ from ${\mu}_{{n}_{b}}^{{n}_{c}}$ using the notion of barycentric projection proposed in equation (7). 
3.3.1 Model Order Reduction using ANOVA-Proper Generalized Decomposition (ANOVA-PGD)
The ANOVA-PGD consists in combining the Analysis of Variance and the sparse Proper Generalized Decomposition (sPGD) to approximate solutions which depend on multiple parameters.
The ANOVA-PGD works by decomposing the solution as a sum of functions defined with respect to the anchor point p_{a} [26], and then using interpolation to approximate the terms depending on a single dimension (parameter), defining N_{𝒟} one-dimensional approximation problems. The difference between this approximation and the data, referred to as the residual, is then learned using the sparse PGD (sPGD).
We apply the ANOVA-PGD to learn the particles’ positions stored in a vector µ, which allows reconstructing the distributions $\overline{\phi}$. Then, thanks to the sampling, it is possible to define a parametric vector µ(p), where p are the parameters of the problem in the parametric space 𝒟. The ANOVA decomposition of a function µ(p) is an orthogonal decomposition based on the analysis of variance, written as a sum of orthogonal functions [6]:
$$\begin{array}{rl}\mu (p)=\mu \left({p}^{1},\dots ,{p}^{{N}_{\mathcal{D}}}\right)=&{\mu}_{0}+{\displaystyle \sum _{i=1}^{{N}_{\mathcal{D}}}{\mu}_{i}}\left({p}^{i}\right)\\ &+{\displaystyle \sum _{1\le {i}_{1}<{i}_{2}\le {N}_{\mathcal{D}}}{\mu}_{{i}_{1},{i}_{2}}}\left({p}^{{i}_{1}},{p}^{{i}_{2}}\right)+\dots +{\mu}_{1,\dots ,{N}_{\mathcal{D}}}\left({p}^{1},\dots ,{p}^{{N}_{\mathcal{D}}}\right)\end{array}$$(8)
The anchored-ANOVA allows computing all the one-dimensional functions from the DoE introduced in Section 3.1 and presented in Figure 3. The remaining terms of the ANOVA decomposition are then grouped into the residual, a function of all the variables, defined as:
$${\mu}_{RES}(p)=\mu (p)-{\mu}_{ANOVA}(p),$$(9)
where µ_{RES} is approximated with the sparse Proper Generalized Decomposition [27] and µ_{ANOVA}(p) contains the first terms of the sum, such that:
$${\mu}_{ANOVA}(p)={\mu}_{0}+{\displaystyle \sum _{i=1}^{{N}_{\mathcal{D}}}{\mu}_{i}}\left({p}^{i}\right),$$(10)
where the constant term µ_{0} is the point cloud at the anchor point p_{a}.
Once the model is built, µ can be obtained for any choice of the parameters, and φ can be recovered from µ using the formula in (6). To recover f from φ, the total mass I(p) is learned with a different surrogate parametric metamodel following the same methodology (ANOVA-PGD based regression).
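The anchored-ANOVA part of the surrogate (eq. (10), without the sPGD residual) can be sketched on a scalar toy function; the stand-in model `f`, the branch grid and the anchor point are illustrative assumptions:

```python
import numpy as np

def f(p):                                     # stand-in "high-fidelity" model
    return p[0] ** 2 + np.sin(p[1])

p_a = np.array([0.5, 0.5])                    # anchor point
mu_0 = f(p_a)                                 # constant term of eq. (10)

# One branch of the cross per dimension: sample f varying one parameter
# at a time, the other held at the anchor value.
branches = []
for dim in range(2):
    grid = np.linspace(0.0, 1.0, 9)
    vals = []
    for gval in grid:
        q = p_a.copy()
        q[dim] = gval
        vals.append(f(q) - mu_0)              # mu_i(p^i) = f - anchor value
    branches.append((grid, np.array(vals)))

def mu_anova(p):
    """mu_0 + sum_i mu_i(p^i), each mu_i interpolated on its branch."""
    return mu_0 + sum(np.interp(p[i], g, v) for i, (g, v) in enumerate(branches))
```

Because the toy function is additively separable, the one-dimensional terms already capture it and the residual learned by the sPGD would be near zero; for coupled functions the residual carries the interaction terms of equation (8).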
Fig. 9 Example of enrichment of the cross-centred sampling points introduced in Figure 3. The blue points are the additional sampling points from an LHS. 
3.3.2 Enrichment and correction of the parametric surrogate metamodel
The question that arises now is: how to ensure that the surrogate parametric metamodel predicts the particles’ positions µ at new points of the parametric space which fall outside of the cross, and how to correct it?
One possibility is to consider additional parameter configurations based on a Latin Hypercube Sampling (LHS), with the associated high-fidelity solutions, to self-enrich the parametric surrogate metamodel. One example of LHS is depicted in Figure 9, represented by the blue points. The sampling points are split into two sets: the training set B_{train}, b_{train} ∈ 〚B_{train}〛, used to self-enrich the parametric model, and the testing set B_{test}, b_{test} ∈ 〚B_{test}〛, used to evaluate the approximation.
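Such additional configurations can be drawn with SciPy's quasi-Monte Carlo module; the sample size, split and dimension below are illustrative:

```python
import numpy as np
from scipy.stats import qmc

# Latin Hypercube Sampling of the parametric space D = [0, 1]^3,
# split into training and testing sets (sizes are made up).
sampler = qmc.LatinHypercube(d=3, seed=0)
samples = sampler.random(n=20)                # points in [0, 1)^3
B_train, B_test = samples[:15], samples[15:]
```

For a physical parametric space, `qmc.scale` would map the unit-cube points onto the actual parameter bounds.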
For each new sampling point, the parametric reduced order model trained with the cross-centred sampling predicts the particles' positions ${\widehat{\mu}}_{{b}_{train}}$, which allow reconstructing the distribution ${\widehat{\phi}}_{{b}_{train}}$. The transport plan between the predicted distribution and the reference one, resulting from the normalised high-fidelity model distribution ${\phi}_{{b}_{train}}$, is calculated. The displacement associated with this transport plan represents a correction to add to the predicted particles' positions ${\widehat{\mu}}_{{b}_{train}}$. This procedure is performed for every new point in the sampling, and the correction is then learned as the residual of the ANOVA-PGD presented in Section 3.3.1. A larger training sample leads to a better enrichment, and therefore to more accurate predictions for new, unseen configurations.
To sum up, for each new parameter configuration b_{train} ∈ 〚B_{train}〛, the model is corrected following a four-step process:
Enrichment: New high-fidelity solutions associated with the additional sampling points are generated and normalised as ${\phi}_{{b}_{train}}$ on the domain Ω. The normalised solutions correspond to the weight at each node of the mesh X = (x_{j},y_{j}).
Prediction: The surrogate parametric metamodel introduced in Section 3.3.1 (Eq. (10)) is used to predict ${\widehat{\phi}}_{{b}_{train}}$ from the particles' positions ${\widehat{\mu}}_{{b}_{train}}$ of the Gaussian functions for each new simulation b_{train}.
Transport plan: An Optimal Transport plan is computed between each predicted distribution ${\widehat{\phi}}_{{b}_{train}}$ and the corresponding normalised high-fidelity distribution ${\phi}_{{b}_{train}}$ by solving the Kantorovich problem (Eq. (2)).
Correction and training: Using the barycentric projection presented in equation (7), the predicted particles' positions ${\widehat{\mu}}_{{b}_{train}}$ are corrected and denoted ${\widehat{\mu}}_{{b}_{train}}^{c}$. The surrogate parametric metamodel is then self-enriched with ${\widehat{\mu}}_{{b}_{train}}^{c}$ from the LHS sampling.
In detail, this approach corrects the prediction of the initial surrogate parametric metamodel by adding a correction term to each particle's displacement, denoted ${\left({\delta}_{{b}_{train}}\right)}_{i}$ for the i-th particle and obtained from the matching between the output of the initial ROM and the new high-fidelity simulations. This enrichment is necessary to include the effect of the interactions between the parameters on the particles' positions. The corrected particles' positions are obtained as:
$$\left(\widehat{\mu}^{c}_{b_{train}}\right)_{i}=\left(\widehat{\mu}_{b_{train}}\right)_{i}+\left(\delta_{b_{train}}\right)_{i},$$(11)
as illustrated in Figure 10.
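A minimal sketch of this correction step follows. The paper solves the Kantorovich problem (e.g. with the POT library); for two equal-size uniform point clouds the optimal plan reduces to an assignment, so the sketch below uses SciPy's Hungarian solver and then applies the matching of equations (7) and (11). The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_correction(mu_pred, mu_ref):
    """Correct predicted particle positions against a reference cloud (eq. (11)).

    For uniform weights and equal particle counts, the Kantorovich plan is a
    permutation; the barycentric projection (eq. (7)) then maps each predicted
    particle onto its matched reference particle.
    """
    # Squared Euclidean cost between every predicted/reference particle pair
    C = np.sum((mu_pred[:, None, :] - mu_ref[None, :, :]) ** 2, axis=-1)
    rows, cols = linear_sum_assignment(C)   # optimal matching
    delta = mu_ref[cols] - mu_pred[rows]    # correction term (delta_b_train)_i
    return mu_pred + delta, delta           # corrected positions (eq. (11))
```

The corrected positions are what the ANOVA-PGD residual is subsequently trained on.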
4 Model overview and online evaluation
In this section, we first summarise the methodology presented so far for the offline phase. Then, the online model is evaluated on the testing configurations and the errors are calculated.
4.1 Model overview
The construction of the parametric model comprises two phases: an offline phase, in which the surrogate is constructed, and an online evaluation phase. In the offline phase, the aim is to build a surrogate parametric metamodel by combining Optimal Transport and the ANOVA-PGD technique, so that the solution can be predicted online for any set of parameters in the parametric space 𝒟.
The model is built by the following steps:
High-fidelity simulations: A database of high-fidelity solutions f_{b} from the parametric space 𝒟 is obtained using a cross-centred sampling (DoE). Continuous solutions are normalised and denoted as distributions φ_{b}.
SPH decomposition and matching: The distribution at the anchor point φ_{0} is decomposed into P particles and, by dividing the sampling into subsets, a step-by-step matching is performed using Optimal Transport and the barycentric projection. The particles' positions in each distribution b ∈ 〚B〛 are then obtained and stored in the vector µ_{b}.
Surrogate parametric metamodel training: The ANOVA-PGD technique is used to construct a surrogate parametric metamodel of the particles' positions. The inputs are the parameter values of the problem p (cross-centred sampling points) and the outputs are the predicted particles' positions µ_{B}. The norm I_{b} used for the normalisation of f_{b} is also learned with another metamodel.
Surrogate parametric metamodel enrichment and correction: B_{train} additional sampling points are generated with a sparse sampling technique such as LHS and used to run high-fidelity simulations. The surrogate parametric metamodel then predicts the particles' positions ${\widehat{\mu}}_{{B}_{train}}$ at these points, and is corrected by adding to these predictions a term coming from the matching between the predictions and the normalised high-fidelity solutions. The ANOVA-PGD is self-enriched with the predicted and corrected particles' positions, denoted ${\widehat{\mu}}_{{B}_{train}}^{c}$. New sampling points B_{test} from the Latin Hypercube Sampling proposed in Figure 9 are used to check and quantify the accuracy of the model.
Online evaluation: The surrogate parametric metamodel can be used for any set of unknown parameters and predicts the distribution $\widehat{\phi}$ from the particles' positions, as presented in equation (6). To recover $\widehat{f}$, $\widehat{\phi}$ is multiplied by the predicted norm $\widehat{I}$.
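The online evaluation can be sketched as follows, under one plausible reading of equation (6): the predicted distribution is a normalised mixture of isotropic Gaussians centred at the predicted particle positions, rescaled by the predicted mass $\widehat{I}$. The function name and the exact normalisation are assumptions.

```python
import numpy as np

def reconstruct_field(X, mu_hat, sigma, I_hat):
    """Evaluate the predicted field f_hat at the mesh nodes.

    X      : mesh node coordinates, shape (N, 2)
    mu_hat : predicted particle positions, shape (P, 2)
    """
    # Squared distance from every node to every particle
    d2 = np.sum((X[:, None, :] - mu_hat[None, :, :]) ** 2, axis=-1)
    phi_hat = np.exp(-d2 / (2.0 * sigma**2)).sum(axis=1)
    phi_hat /= 2.0 * np.pi * sigma**2 * mu_hat.shape[0]  # normalise the Gaussian mixture
    return I_hat * phi_hat                               # denormalise with the predicted mass
```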
Fig. 10 Methodology of the enrichment and correction of the surrogate parametric metamodel for the new sampling point b_{train}.
4.2 Error evaluation
To evaluate the accuracy of the surrogate parametric metamodel, the B_{test} sampling points b_{test} ∈ 〚B_{test}〛 from the LHS in Figure 9 are used. The predicted solutions ${\widehat{f}}_{{b}_{test}}$ are compared with the reference solutions ${f}_{{b}_{test}}$ obtained from the high-fidelity simulations. To recover the denormalised prediction ${\widehat{f}}_{{b}_{test}}$, we first reconstruct ${\widehat{\phi}}_{{b}_{test}}$ from the predicted particles' positions ${\widehat{\mu}}_{{b}_{test}}$ following equation (6), and then multiply the result by the predicted mass ${\widehat{I}}_{{b}_{test}}$. The relative error is calculated as follows:
$$E_{rr}=\frac{\Vert f_{b_{test}}-\widehat{f}_{b_{test}}\Vert^{2}}{\Vert f_{b_{test}}\Vert^{2}}.$$(12)
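With nodal values on a common mesh, equation (12) is a one-liner; the sketch below assumes the fields are stored as flat NumPy arrays.

```python
import numpy as np

def relative_error(f_ref, f_pred):
    """Relative squared L2 error of equation (12)."""
    return np.linalg.norm(f_ref - f_pred) ** 2 / np.linalg.norm(f_ref) ** 2
```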
5 Numerical applications
In order to illustrate and analyse the proposed methodology, two different 2D heat problems are studied. In both cases, we consider a rectangular plate of dimensions [0, 3] × [0, 3] made of an isotropic conductive material. The 2D heat equation is solved in the domain Ω using the Finite Element (FE) method implemented in FEniCS [28]. We consider the parametric space 𝒟 where three parameters (also known as dimensions) are introduced, p = (p^{1},p^{2},p^{3}), which correspond respectively to Sx and Sy, the coordinates of the Gaussian heat source, and to the initial temperature T_{0}. The sampling is the one presented in Figure 9, with the parameter intervals given in Table 1. The finite element mesh is shown in Figure 6.
5.1 Parametric 2D heat equation problem with one Gaussian heat source
First, we consider one unique heat source with the parameters p = (Sx, Sy, T_{0}). Then, the 2D problem reads:
$$\begin{cases}k\left(\frac{\partial^{2}T}{\partial x^{2}}+\frac{\partial^{2}T}{\partial y^{2}}\right)=\rho C_{P}\frac{\partial T}{\partial t} & \text{in } \Omega\times\left[0,t_{f}\right),\\ \nabla T\cdot\mathbf{n}=0 & \text{on } \partial\Omega,\\ T(x,y,t=0)=T_{0}\,\frac{1}{2\pi\sigma^{2}}\exp\left(-\frac{(x-Sx)^{2}+(y-Sy)^{2}}{2\sigma^{2}}\right) & \text{in } \Omega,\end{cases}$$(13)
In this case, for each parameter configuration p_{m}, m ∈ 〚M〛, where M = B + B_{train} is the number of sampling points of the structured sampling plus the number of sampling points of the LHS, as shown in Figure 9, T(x, y, t = t_{f})_{m} is a high-fidelity solution. After normalisation, these solutions are denoted φ_{m}. Then, based on the methodology described in Section 4, a surrogate parametric metamodel is created and returns the predicted particles' positions ${\widehat{\mu}}_{M}$. The number of particles and the standard deviation are fixed at P = 1500 particles and σ = 0.035. In the sPGD part, M_{modes} = 100 modes were used with a classical polynomial basis. To evaluate the relative error of the surrogate parametric metamodel, equation (12) is used with B_{test} = 6 distributions. Results are presented in Table 2.
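For illustration, the Gaussian initial condition of equation (13) can be evaluated on a regular grid over the plate (FEniCS would interpolate the same expression onto the finite element mesh); the parameter values used below are hypothetical.

```python
import numpy as np

def initial_temperature(x, y, Sx, Sy, T0, sigma):
    """Gaussian initial temperature field of equation (13)."""
    r2 = (x - Sx) ** 2 + (y - Sy) ** 2
    return T0 / (2.0 * np.pi * sigma**2) * np.exp(-r2 / (2.0 * sigma**2))

# Evaluate on a 101 x 101 grid over the [0, 3] x [0, 3] domain
xs, ys = np.meshgrid(np.linspace(0.0, 3.0, 101), np.linspace(0.0, 3.0, 101))
T_init = initial_temperature(xs, ys, Sx=1.5, Sy=1.5, T0=20.0, sigma=0.1)
```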
Figure 11 shows a comparison between the reference solutions f from the high-fidelity model and the reconstructed predicted solutions $\widehat{f}$ from the surrogate parametric metamodel for B_{test} = 6.
Fig. 11 Comparison between the reference solutions and the predicted solutions for B_{test} = 6. 
Table 2. Relative errors obtained between the predicted solutions and the reference solutions for B_{test} = 6.
5.2 Parametric 2D heat equation problem with two Gaussian heat sources
In the previous physical problem, the solution can be easily approximated by a Model Order Reduction technique because it is localised around a single Gaussian heat source. To evaluate our approach, a more complex numerical example is formulated in this section by considering two Gaussian heat sources. The 2D heat problem, with the parameters p = (Sx, Sy, T_{0}) sampled as shown in Figure 9, then reads:
$$\begin{cases}k\left(\frac{\partial^{2}T}{\partial x^{2}}+\frac{\partial^{2}T}{\partial y^{2}}\right)=\rho C_{P}\frac{\partial T}{\partial t} & \text{in } \Omega\times\left[0,t_{f}\right),\\ \nabla T\cdot\mathbf{n}=0 & \text{on } \partial\Omega,\\ T(x,y,t=0)=T_{0}\,\frac{1}{2\pi\sigma^{2}}\exp\left(-\frac{(x-Sx)^{2}+(y-Sy)^{2}}{2\sigma^{2}}\right) & \text{in } \Omega,\end{cases}$$(14)
As in the first case, for each parameter configuration p_{m}, m ∈ 〚M〛, where M = B + B_{train} is the number of sampling points of the structured sampling plus the number of sampling points of the LHS, as shown in Figure 9, T(x, y, t = t_{f})_{m} is a high-fidelity solution. After normalisation, these solutions are denoted φ_{m}. Then, based on the methodology presented in Section 4, a surrogate parametric metamodel is created and returns the predicted particles' positions ${\widehat{\mu}}_{M}$. The number of particles and the standard deviation are fixed at P = 1500 particles and σ = 0.05. In the sPGD part, M_{modes} = 100 modes were used with a classical polynomial basis. To evaluate the error of the surrogate parametric metamodel, equation (12) is used with B_{test} = 6 distributions. Results are presented in Table 3.
Then, Figure 12 shows a comparison between reference solutions f from the highfidelity model and reconstructed predicted solutions $\widehat{f}$ from the surrogate parametric metamodel for B_{test} = 6.
6 Conclusion
In this paper, with the aim of speeding up the offline stage and increasing the accuracy of the former methodology, we addressed Optimal Transport as a step-by-step matching algorithm, matching solutions in pairs rather than all together.
Relying on the ANOVA-PGD and its corresponding sampling strategy, the dimensionality of the complex matching problem is drastically reduced, leading to an easily solved optimisation problem. Moreover, based on the notion of barycentric projection, we no longer need to decompose all the solutions into Gaussian functions, but only the one at the ANOVA anchor point. The ANOVA-PGD methodology also keeps the amount of offline computation reasonable, as the sampling grows only linearly with the number of parameters.
Finally, the ANOVA-PGD based surrogate is self-enriched with a classical LHS, relying again on the barycentric projection. This method enables a particle-cloud representation of the data, which is not typically employed for this kind of problem. To evaluate its efficiency, work is in progress to extend the approach to a complex industrial problem.
Table 3. Relative errors obtained between the predicted solutions and the reference solutions for B_{test} = 6.
Fig. 12 Comparison between reference solutions and predicted solutions for B_{test} = 6. 
Funding
The work described in this paper was performed in the frame of the MORPHO project which has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement no. 101006854. This research was supported by the French National Association for Research and Technology (ANRT) through the research grant CIFRE 2020/0310.
Conflict of interest
The authors declare no conflict of interest related to this submission.
References
1. O.C. Zienkiewicz, P.B. Morice, The Finite Element Method in Engineering Science, McGraw-Hill, London, 1971
2. F. Chinesta, P. Ladeveze, E. Cueto, A short review on model order reduction based on proper generalized decomposition, Arch. Comput. Methods Eng. 18, 395 (2011)
3. M. Verleysen, D. François, The curse of dimensionality in data mining and time series prediction, in: Computational Intelligence and Bioinspired Systems: 8th International Work-Conference on Artificial Neural Networks, IWANN 2005, Vilanova i la Geltrú, Barcelona, Spain, June 8-10, 2005, Proceedings 8, Springer, 2005, pp. 758-770
4. A. Belloni, V. Chernozhukov, L1-penalized quantile regression in high-dimensional sparse models, 2011
5. G. Yi, J.Q. Shi, T. Choi, Penalized Gaussian process regression and classification for high-dimensional nonlinear data, Biometrics 67, 1285-1294 (2011)
6. A. Sancarlos, V. Champaney, J.L. Duval, E. Cueto, F. Chinesta, PGD-based advanced nonlinear multiparametric regressions for constructing metamodels at the scarce-data limit, [arXiv:2103.05358], 2021
7. U. Wilensky, Paradox, programming, and learning probability: a case study in a connected mathematics framework, J. Math. Behav. 14, 253-280 (1995)
8. C. Villani et al., Optimal Transport: Old and New, Vol. 338, Springer, 2009
9. Y. Rubner, C. Tomasi, L.J. Guibas, The earth mover's distance as a metric for image retrieval, Int. J. Comput. Vis. 40, 99 (2000)
10. M. Perrot, N. Courty, R. Flamary, A. Habrard, Mapping estimation for discrete optimal transport, Adv. Neural Inf. Process. Syst. 29, 4197-4205 (2016)
11. M. Cuturi, A. Doucet, Fast computation of Wasserstein barycenters, in: International Conference on Machine Learning, PMLR, 2014, pp. 685-693
12. P.A. Knight, The Sinkhorn-Knopp algorithm: convergence and applications, SIAM J. Matrix Anal. Appl. 30, 261-275 (2008)
13. P. Dvurechensky, A. Gasnikov, A. Kroshnin, Computational optimal transport: complexity by accelerated gradient descent is better than by Sinkhorn's algorithm, in: International Conference on Machine Learning, PMLR, 2018, pp. 1367-1376
14. X. Bian, Z. Li, G.E. Karniadakis, Multi-resolution flow simulations by smoothed particle hydrodynamics via domain decomposition, J. Comput. Phys. 297, 132-155 (2015)
15. J. Weed, F. Bach, Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance, 2019
16. M. Jacot, V. Champaney, F. Chinesta, J. Cortial, Parametric damage mechanics empowering structural health monitoring of 3D woven composites, Sensors 23, 1946 (2023)
17. S. Torregrosa, V. Champaney, A. Ammar, V. Herbert, F. Chinesta, Surrogate parametric metamodel based on optimal transport, Math. Comput. Simul. 194, 36-63 (2022)
18. J. Rabin, G. Peyré, J. Delon, M. Bernot, Wasserstein barycenter and its application to texture mixing, in: Scale Space and Variational Methods in Computer Vision: Third International Conference, SSVM 2011, Ein-Gedi, Israel, May 29-June 2, 2011, Revised Selected Papers 3, Springer, 2012, pp. 435-446
19. N. Deb, P. Ghosal, B. Sen, Rates of estimation of optimal transport maps using plug-in estimators via barycentric projections, Adv. Neural Inf. Process. Syst. 34, 29736-29753 (2021)
20. G. Peyré, M. Cuturi et al., Computational optimal transport, Center for Research in Economics and Statistics Working Papers, 2017-86, 2017
21. G. Peyré, M. Cuturi et al., Computational optimal transport: with applications to data science, Found. Trends Mach. Learn. 11, 355-607 (2019)
22. C. Villani, Topics in Optimal Transportation, Vol. 58, American Mathematical Society, 2021
23. N. Bonneel, M. van de Panne, S. Paris, W. Heidrich, Displacement interpolation using Lagrangian mass transport, in: Proceedings of the 2011 SIGGRAPH Asia Conference, 2011, pp. 1-12
24. R. Flamary, N. Courty, A. Gramfort, M.Z. Alaya, A. Boisbunon, S. Chambon, L. Chapel, A. Corenflos, K. Fatras, N. Fournier et al., POT: Python Optimal Transport, J. Mach. Learn. Res. 22, 3571-3578 (2021)
25. R.J. McCann, A convexity principle for interacting gases, Adv. Math. 128, 153-179 (1997)
26. M. El Fallaki Idrissi, F. Praud, V. Champaney, F. Chinesta, F. Meraghni, Multiparametric modeling of composite materials based on non-intrusive PGD informed by multi-scale analyses: application for real-time stiffness prediction of woven composites, Compos. Struct. 302, 116228 (2022)
27. R. Ibáñez, E. Abisset-Chavanne, A. Ammar, D. González, E. Cueto, A. Huerta, J.L. Duval, F. Chinesta et al., A multidimensional data-driven sparse identification technique: the sparse proper generalized decomposition, Complexity 2018 (2018)
28. I. Torre, Diffusive solver: a diffusion-equations solver based on FEniCS, [arXiv:2011.04351], 2020
Cite this article as: M. Jacot, V. Champaney, S. Torregrosa Jordan, J. Cortial, F. Chinesta, Empowering optimal transport matching algorithm for the construction of surrogate parametric metamodel, Mechanics & Industry 25, 9 (2024)