Mechanics & Industry
Volume 24, 2023, Article Number 34, 21 pages
DOI: https://doi.org/10.1051/meca/2023029
Published online: 15 September 2023
Research Article
A neural network-based data-driven local modeling of spot-welded plates under impact
^{1} Université Paris-Saclay, CentraleSupélec, ENS Paris-Saclay, CNRS, LMPS – Laboratoire de Mécanique Paris-Saclay, 91190 Gif-sur-Yvette, France
^{2} EPF School of Engineering, 94230 Cachan, France
^{3} IUF, Institut Universitaire de France, Paris, France
^{4} Altair Engineering France, 92160 Antony, France
^{5} Stellantis, 78140 Vélizy-Villacoublay, France
^{*} e-mail: afsal.pulikkathodi@ens-paris-saclay.fr
Received: 28 February 2023
Accepted: 27 July 2023
Solving large structural problems with multiple complex localized behaviors is extremely challenging. To address this difficulty, both intrusive and non-intrusive Domain Decomposition Methods (DDM) have been developed in the past, where the refined (local) model is solved separately in its own space and time scales. In this work, the Finite Element Method (FEM) at the local scale is replaced with a data-driven Reduced Order Model (ROM) to further decrease the computational time. The reduced model aims to create a low-cost, accurate and efficient mapping from interface velocities to interface forces and to enable the prediction of their time evolution. The present work proposes a modeling technique based on the Physics-Guided Architecture of Neural Networks (PGANNs), which incorporates physical variables other than input/output variables into the neural network architecture. We develop this approach on a 2D plate with a hole as well as on a 3D case with spot-welded plates undergoing fast deformation, both representing nonlinear elastoplasticity problems. Neural networks are trained using simulation data generated by explicit dynamic FEM solvers. The PGANN results are in good agreement with the FEM solutions for both test cases, including those in the training dataset as well as unseen data, provided the loading type is present in the training set.
Key words: Artificial neural networks / data-driven modelling / local/global coupling / explicit dynamics / physics-guided architecture
© A. Pulikkathodi et al., Published by EDP Sciences, 2023
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
High-fidelity simulations of large structures containing many localized features such as spot-welds [1] or bolted joints remain a major scientific and industrial challenge. The mechanical behavior of these localized features has traditionally been addressed using simplified models to circumvent the complexities associated with numerical analysis procedures. For spot-welds, these models typically employ rigid or flexible beams with coincident nodes [2,3]. However, these simplified modeling approaches often fall short in accurately predicting failure. Therefore, in order to capture the complex and nonlinear behavior of these localized features accurately, refined 3D elements are necessary. Due to the discrepancies in space and time scales between the global response of the structure and the localized phenomena, such mesh refinements drastically increase the computational cost. This arises not only from the increase in the number of degrees of freedom (dofs) but also from the requirement to reduce the time step to satisfy the Courant-Friedrichs-Lewy (CFL) condition [4] of the explicit time integration process.
Numerous numerical methods devoted to multiscale computing have emerged over the last three decades to address this problem. They are mostly based on Domain Decomposition techniques such as the primal BDD method [5], the dual FETI method [6], the mixed LATIN scheme [7], or the Arlequin framework [8]. These are well-suited to problems in which the refinement zones are fixed. In the case of rapidly evolving problems, non-intrusive domain decomposition techniques based on local/global approaches have proven to be effective (e.g., local plasticity [9,10], local crack propagation [11]). An overview of such techniques can be found in [12]. They allow the global mesh to remain unchanged while the local problem is refined in space and time. The non-intrusive local/global strategy relies on an iterative exchange of interface quantities between global and local computations. It was successfully applied to the coupling of 2D and 3D models in thin composite panels with local stress concentration and debonding [13]. The same approach was extended to explicit dynamics in [14,15]. Submodelling techniques are another non-intrusive option widely available in Finite Element Method (FEM) software and used in industry. However, they are referred to as "one-way coupling" because the data exchange occurs only from global to local, and the global solution is not updated after the local solution has been modified.
The FEM simulations of local problems can be computationally expensive due to the requirement for refined space and time scales. To address the aforementioned challenge, it is proposed here to replace the original, time-consuming simulation by a reduced model in order to increase the computational performance. Model Order Reduction (MOR) [16] seems to be an appealing choice to overcome this issue. Projection-based MOR methods, which rely on linear transformations with some additional constraints, such as Proper Orthogonal Decomposition (POD) or the Reduced-Basis technique [17], are widely used. However, they can hardly be used to model highly nonlinear phenomena and often require prior knowledge of the governing equations of the physics. Neural Networks (NNs) based on machine learning have proven to be highly effective for learning nonlinear manifolds. NNs often outperform physics-based models in many disciplines (e.g., materials science [18], applied physics [19], biomedical science [20], computational biology [21]) in terms of prediction accuracy and capability to capture the underlying nonlinear input-output relationship for complex systems. A main limitation of this technique is the correct selection of the network tuning parameters.
Training NNs requires a vast amount of data that must contain a rich input-output relationship, which is not available in the majority of engineering problems. To address this issue, researchers have developed a large variety of methods for integrating physical principles into NN models. Integrating physical principles into NN models provides several key advantages. It ensures physical consistency by aligning predictions with fundamental physics laws and constraints. This improves generalization, allowing accurate predictions with limited data. A detailed review of these methods applied in various fields can be found in [22]. They can be broadly classified into four main categories: (i) physics-guided loss function, (ii) physics-guided initialization, (iii) physics-guided architecture, and (iv) hybrid physics-Machine Learning (ML) models. The idea behind the physics-guided loss function is to incorporate physical constraints into the loss function of NN models [23,24]. In physics-guided initialization, physical or other contextual knowledge is used to inform the initialization of the weights, so that model training can be accelerated and may require fewer training samples [25]. In physics-guided architecture, the key idea is to incorporate physics-based guidance into the architecture design, making the NN more interpretable; this also allows typically missing features to be included in the NN model [26,27]. In hybrid modelling, the focus has been on augmenting ML models specifically; numerous approaches combine physics-based models with ML models where both operate simultaneously [28].
In an explicit dynamic local/global coupling framework, the input and output of the local problem are the interface velocities and interface reaction forces, respectively. However, these quantities are highly noisy in nature, making it difficult to obtain direct input-output relationships. To address this fundamental challenge, we propose to use the Physics-Guided Architecture of Neural Networks (PGANNs). The key idea behind this approach is to inject other physical variables of the whole local domain, such as displacement, stress, strain, plastic strain etc., at the local scale between the input and output layers of NNs in order to improve learning within the solution space. By including these additional physical variables, we provide the network with relevant domain-specific information that can potentially enhance optimization and training efficiency. It should be noted that the architecture of the proposed NN itself operates as a black box, as its internal mechanisms are not explicitly designed to resemble physics equations.
In this article, we focus on the injection of displacement as well as effective plastic strain as intermediate variables, as displayed in Figure 4. The neural network (NN) architecture begins by reconstructing the displacement of the local domain from the input. Next, an autoencoder extracts the most relevant features of the displacement, known as the Latent Vector (LV). Similarly, another autoencoder is employed to capture the significant features of the effective plastic strain, forming its corresponding LV. The dynamics of these LVs is then modeled using a Long Short-Term Memory (LSTM) network. Finally, the predicted displacement at the next global time step is mapped to the interface reaction forces.
The paper is organised as follows: Section 2 summarizes the non-intrusive local/global coupling reference problem in explicit dynamics. Section 3 details the proposed PGANN used to metamodel the local problem. In Section 4, the proposed method is tested on a numerical example of an elastoplastic plate-with-hole geometry discretized using 2D elements and undergoing rapid deformation. In Section 5, the method is tested on a spot-welded plate geometry discretized using 3D elements. Finally, Section 6 presents conclusions and outlooks of the proposed work.
2 Problem formulation
In this study, we briefly present a method for addressing the non-intrusive local/global coupling in explicit dynamics [29]. To illustrate the method, we consider the problem configuration depicted in Figure 1. As shown in the figure, the overall domain, denoted Ω, is partitioned into two subdomains: the local region, denoted Ω_{l}, and the complementary region, denoted Ω_{C}. The local region Ω_{l} may contain fine geometric features or large gradients, which require a refined mesh in both space and time. Conversely, the complementary region Ω_{C} only requires a coarse mesh in both space and time. The interface between the two regions is denoted by Г. In this problem, the displacement 𝒰_{D} is prescribed on the Dirichlet boundary ∂Ω_{𝒰}, the external force f_{ext} is applied on the Neumann boundary ∂Ω_{f}, and the body force f_{Ω} may be applied in the domain Ω.
Using a displacementbased finite element method with the heterogeneous spatial discretization described in Figure 2a (left), the problem can be expressed as follows.
$$\begin{array}{lll}M\ddot{U}={F}^{\text{ext}}-{F}^{\mathrm{int}}\hfill & \text{over}\hfill & \left({\text{\Omega}}_{C}\cup {\text{\Omega}}_{l}\right)\times \left[{t}_{0},{t}_{end}\right]\hfill \\ U=\overline{U}\hfill & \text{along}\hfill & \partial {\text{\Omega}}_{\mathcal{U}}\times \left[{t}_{0},{t}_{end}\right]\hfill \\ \left\{U,\dot{U}\right\}=\left\{{U}_{0},{V}_{0}\right\}\hfill & \text{over}\hfill & {\text{\Omega}}_{C}\cup {\text{\Omega}}_{l}\text{ at }{t}_{0}\hfill \end{array}$$(1)
where M is the lumped mass matrix, F^{ext} and F^{int} are, respectively, the external and internal force vectors, and Ω_{C} ∪ Ω_{l} represents the union of two different homogeneous finite element discretizations that span the entire domain. U represents the vector containing the nodal displacement (Ü and $\dot{U}$ are the associated acceleration and velocity respectively). The initial and final times are denoted t_{0} and t_{end}.
The central difference technique is used to implement an explicit time integration scheme. The time integration scheme is not unconditionally stable, so the CFL condition governs the selection of the time step size. The critical time step ∆t_{cr} is given by:
$$\text{\Delta}{t}_{cr}=\frac{2}{{\omega}_{max}}$$(2)
where ω_{max} is the maximum eigenfrequency of the problem. The value of ω_{max} is inversely proportional to the smallest element size. As a result, the time integration step is reduced because of the refined spatial mesh.
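To make the CFL constraint concrete, the critical time step of equation (2) can be estimated for a single element. A minimal sketch, assuming the standard 1D bar estimate ω_max ≈ 2c/h and an illustrative 1 mm element size (neither assumption comes from the paper, which obtains ω_max from the full eigenproblem):

```python
import numpy as np

# Critical time step, eq. (2): dt_cr = 2 / omega_max.
# For a 1D bar element, omega_max ~ 2*c/h, where c = sqrt(E/rho) is the
# elastic wave speed and h the smallest element size (illustrative estimate).
E, rho = 210e9, 7800.0        # steel properties from Section 4, in SI units
c = np.sqrt(E / rho)          # wave speed, roughly 5.2e3 m/s
h = 1.0e-3                    # assumed smallest element size: 1 mm
omega_max = 2.0 * c / h
dt_cr = 2.0 / omega_max       # reduces to h / c for this 1D estimate
```

Halving h halves dt_cr, which is exactly why a refined local mesh forces a smaller time step than the coarse global one.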
In a non-intrusive local/global coupling framework, the global model extends over the whole structure with a global mesh and never changes (Fig. 2a). The local analysis is carried out with a more refined mesh where the boundary conditions are derived from the global problem. Two different time steps ∆t_{ɡ} and ∆t_{l} are applied in the two partitions Ω_{C} and Ω_{l} (Fig. 2b). Separate explicit dynamics analyses of the two meshes are performed concurrently, allowing the models to run with their own time increment. The continuity of velocities between the two models is ensured along the interface Г through Lagrange multipliers [9]. Substituting the local model into the global model is achieved by iteratively exchanging the velocities and forces at the interface Г, as displayed in Figure 2a. The global domain, shown with a coarse mesh on the left, is initially solved at a global time scale τ_{ɡ}. The velocity at the interface is then transferred to solve the local problem as a Dirichlet problem at the local time scale τ_{l}. The global problem is then resolved by applying the reaction forces from the local problem at the interface. This process is repeated until the difference in reaction forces reaches a predefined tolerance ē. It was shown in [30] that for explicit dynamic problems, the global computation may be performed only once per global time step, while a repeated solution is needed only for the local problems. This finding serves as a motivation for our development of an NN-based ROM for the local problem, which significantly reduces the overall iteration cost.
The non-intrusive local/global coupling strategy is not implemented in this article; integrating the NN-based ROM with explicit FEM solvers is work in progress. In this context, our present goal is to develop a reduced model of the local problem based on NNs, which can predict the interface forces at time t + ∆t_{ɡ}, given the boundary conditions of the local problem at previous global time steps. More precisely, as shown in Figure 3, given the interface velocities and other model parameters at time instants ${t}_{g}^{n}$, ${t}_{g}^{n-1}$, ${t}_{g}^{n-2}$, ${t}_{g}^{n-3}$, the metamodel should be able to accurately predict the interface forces at ${t}_{g}^{n+1}$. It is worth noticing that the number of past time steps considered is arbitrary, and a greater amount of historical information generally improves the predictive performance of the model. However, it is important to consider the storage limitations associated with retaining long histories.
Data: Initial conditions, time step, total simulation time, ē
while t_{g} < t_{end} do
    // Solve the global problem
    M_{g}Ü_{g} = F_{g}^{ext} − F_{g}^{int};
    while e > ē do
        // Local computation using velocities extracted at Г from the global problem
        M_{l}Ü_{l} = F_{l}^{ext} − F_{l}^{int};
        // Compute residual and global correction acceleration at the interface Г
        r_{Г} = r_{l,Г} − r_{C,Г} = (F_{l,Г}^{ext} − M_{l,Г}Ü_{l,Г} − F_{l,Г}^{int}) − (F_{C,Г}^{ext} − M_{C,Г}Ü_{C,Г} − F_{C,Г}^{int});
        M_{g,Г}Ü_{g,Г}^{corr} = r_{Г};
        // Update global acceleration at the interface Г
        Ü_{g,Г} = Ü_{g,Г} + Ü_{g,Г}^{corr};
        Compute e;
    end
    // Global stabilization and update acceleration [30]
    // Update time
    t_{g} = t_{g} + ∆t_{g}
end
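The iterative interface exchange in the algorithm above can be sketched with scalar stand-ins for the two solvers. This is a schematic fixed-point version for illustration only, not the paper's FEM implementation; the two lambda callbacks are toy linear interface operators:

```python
def local_global_iteration(solve_global, solve_local, r0=0.0, e_bar=1e-8, max_iter=100):
    """Schematic interface exchange: alternate global and local solves until
    the interface force residual e drops below the tolerance e_bar."""
    r = r0
    for k in range(1, max_iter + 1):
        u = solve_global(r)        # global solve with current interface reaction
        r_new = solve_local(u)     # local Dirichlet solve driven by u
        e = abs(r_new - r)         # residual at the interface
        if e < e_bar:
            return r_new, k
        r = r_new
    return r, max_iter

# Toy scalar operators: global interface compliance and local interface stiffness
k_g, k_l, f_ext = 2.0, 1.0, 1.0
r_conv, n_it = local_global_iteration(lambda r: (f_ext - r) / k_g,
                                      lambda u: k_l * u)
```

For these values the fixed point is r = (f_ext − r)/k_g, i.e. r = 1/3, and the iteration converges geometrically since k_l/k_g < 1.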
Fig. 1 Heterogeneous discretization in space of the reference problem. 
Fig. 2 An illustration of the nonintrusive local/global coupling method. 
Fig. 3 Structure of the reduced model: the NN inputs are the interface velocity and interface nodal position at time instants ${t}_{g}^{n}$, ${t}_{g}^{n-1}$, ${t}_{g}^{n-2}$, ${t}_{g}^{n-3}$ in the global time scale, denoted using dashed arrows, while the output is the interface forces at time ${t}_{g}^{n+1}$, denoted using a solid arrow. 
Fig. 4 A schematic for the architecture of the PGANN framework. 
3 Proposed physicsguided NN architecture (PGANN)
3.1 Structure of PGANN
In this section, we shall elaborate on the proposed PGANN model. Due to the lack of a robust input-output relationship, it is challenging to train a NN using only the velocity and reaction forces of the interface. This is primarily due to their noisy nature and their dependence on other model parameters, such as material and geometric parameters. To address this issue, a new layer containing information about the displacement 𝒰_{l} of the local problem is inserted between the input and output layers.
Plastic deformation of materials is a complex process that exhibits history-dependent behavior. Recent studies in [31,32] have investigated the application of Recurrent Neural Networks (RNNs) in modeling path-dependent plasticity models. These RNNs, which are an extension of traditional NNs, have shown promising results in predicting complex history-dependent plasticity. However, their architecture is more complex, requiring a large amount of historical data for accurate predictions. It was shown in [33] that traditional NN architectures can also capture path-dependent behavior effectively, provided that a set of internal variables representing the material history is appropriately included in the training process. Inspired by this work, we adopt a similar methodology in this study to avoid the need for an extensive history of data while still achieving accurate predictions. In metal plasticity, the history of plastic deformation is often characterized by a scalar quantity called the effective plastic strain, which is given by
$${\overline{\epsilon}}_{p}={\displaystyle \int \Vert {\dot{\epsilon}}_{p}\Vert}\text{\hspace{0.17em}}dt$$(3)
where ${\dot{\epsilon}}_{p}$ is the plastic strain rate. The examples presented in this study use the effective plastic strain ${\overline{\epsilon}}_{p}$ as the internal variable.
An important aspect of our method is the preprocessing step, in which the actual FEM grid is interpolated to a Cartesian grid to represent the local displacement. Thus, the interpolated displacement can be treated analogously to a colour image, where the three components of displacement (x, y, and z) play the role of the three colour channels, allowing the use of convolutional NN architectures. As a result, the memory and computation costs depend primarily on the size of the interpolated grid rather than the size of the actual FEM grid. A schematic of the training pipeline for the PGANN model is presented in Figure 4. It consists of four separately trained NNs. In the following paragraphs of this section, we shall explain the procedures for training all four networks separately.
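The interpolation step can be sketched with SciPy's `griddata` on a toy unstructured node set. The node cloud, the displacement field, and the unit-square geometry below are illustrative assumptions; only the 40×40 target grid follows the plate example of Section 4.1:

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
# Toy stand-in for the unstructured FEM nodes of the local domain (unit square)
nodes = rng.random((500, 2))
# Smooth two-component nodal "displacement" field, for illustration only
u_nodes = np.stack([np.sin(np.pi * nodes[:, 0]),
                    np.cos(np.pi * nodes[:, 1])], axis=1)

# Regular 40x40 Cartesian grid, as used for the plate example in Section 4.1
gx, gy = np.meshgrid(np.linspace(0, 1, 40), np.linspace(0, 1, 40))
pts = np.column_stack([gx.ravel(), gy.ravel()])

# Interpolate each displacement component onto the grid: the result is
# treated like an image with one channel per displacement component
u_img = np.stack([griddata(nodes, u_nodes[:, c], pts,
                           method="linear", fill_value=0.0).reshape(40, 40)
                  for c in range(2)], axis=-1)
```

Points of the regular grid falling outside the convex hull of the FEM nodes receive `fill_value`; in a real pipeline the grid would be fitted to the local domain so that this only affects the hole region.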
The first NN (NN I) of the architecture is trained to capture the nonlinear relationship between the input variables, specifically the velocities and displacements ${u}_{\text{\Gamma}}^{t}$ of the interface nodes, at a specific time t, and the corresponding displacement of the local model ${u}_{l}^{t}$. It is important to highlight that while this study focuses on these specific input variables, the approach is not limited exclusively to this set of variables. Other model parameters, such as material parameters or geometric parameters, can also be included as input variables, further enhancing the flexibility and applicability of the methodology. The input layers are concatenated to form a dense layer. A few dense layers and convolutional layers are then added between the concatenated layer and the output layer ${u}_{l}^{t}$. The number of these added layers and their properties, such as the number of filters used and filter size, are the main hyperparameters to be optimized in this model; this is detailed in Section 4.2.1.
The second NN, referred to as (NN II), utilizes autoencoders to encode the local solution into a reduced manifold known as a latent vector z^{t}. Autoencoders are trained by setting the target layers equal to the input layers, where the size of the middle layer (latent vector) is much smaller than that of the input/output layers. The main objective is to extract the most dominant features of the system, while effectively filtering out the noise generated during explicit dynamics simulation. This reduced manifold enhances the training performance of the subsequent RNN model (NN III), which is responsible for predicting the system time evolution. Within (NN II), two autoencoders are employed and trained separately. The first autoencoder reduces the displacement ${u}_{l}^{t}$ to its corresponding latent vector ${z}_{u}^{t}$ using its encoder component ϕ_{𝒰}, while the decoder component $\phi {\prime}_{u}$ converts the latent vector ${z}_{u}^{t}$ back to the original displacement ${u}_{l}^{t}$. Similarly, the second autoencoder reduces the ${\overline{\epsilon}}_{p,l}^{t}$ to its corresponding latent vector ${z}_{{\overline{\epsilon}}_{p}}^{t}$ using an encoder component ${\phi}_{{\overline{\epsilon}}_{p}}$, and a decoder component converts the latent vector back to the original ${\overline{\epsilon}}_{p,l}^{t}$. During the online phase, when the trained model is utilized, the first explicit cycle initializes the ${\overline{\epsilon}}_{p,l}$ with zero, assuming no plastic deformation has occurred. The predicted ${\overline{\epsilon}}_{p,l}^{t+\text{\Delta}{t}_{g}}$ is then used as input for the subsequent cycle. Since we are not interested in observing ${\overline{\epsilon}}_{p,l}$ during the online phase, only its latent vector ${z}_{{\overline{\epsilon}}_{p}}^{t+\text{\Delta}{t}_{g}}$ needs to be stored at the end of each cycle. 
This approach avoids the need for evaluating ${\phi}_{{\overline{\epsilon}}_{p}}$ and $\phi {\prime}_{{\overline{\epsilon}}_{p}}$, thereby reducing the computational cost involved.
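A minimal convolutional autoencoder of the kind described for NN II might look as follows in Keras. The layer counts, filter sizes, and the latent dimension of 16 are illustrative assumptions, not the tuned architecture of the paper; only the (40,40,3) displacement-image shape follows Section 4.1:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 16  # assumed latent-vector size

# Encoder phi_u: (40,40,3) displacement image -> latent vector z_u
inp = layers.Input(shape=(40, 40, 3))
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inp)  # (20,20,16)
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)    # (10,10,32)
x = layers.Flatten()(x)
z = layers.Dense(latent_dim, name="latent")(x)

# Decoder phi'_u: latent vector -> reconstructed displacement image
x = layers.Dense(10 * 10 * 32, activation="relu")(z)
x = layers.Reshape((10, 10, 32))(x)
x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
rec = layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="linear")(x)

autoencoder = Model(inp, rec)
encoder = Model(inp, z)                 # phi_u alone, used online to extract z_u
autoencoder.compile(optimizer="adam", loss="mse")  # target equals input when training
```

Training then fits `autoencoder` with the displacement snapshots as both input and target; the second autoencoder for the effective plastic strain follows the same pattern with a single channel.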
The third NN (NN III) is a RNN that learns the temporal evolution of the latent variable. More specifically, the goal is to predict the latent vector at time t + ∆t_{ɡ} based on the latent vectors from previous global time steps. The input of this network consists of a sequence of latent vectors (${z}_{u}$ and ${z}_{{\overline{\epsilon}}_{p}}$) from past global time steps, while the output is a sequence of latent vectors at future local time steps. From this sequence, the latent vector corresponding to ∆t_{ɡ} is selected. It is important to note that here the purpose of the RNN is not to learn the path dependency of elastoplastic materials. We employ an encoder-decoder (also known as Sequence-to-Sequence) type LSTM architecture. This means that the model will not output a vector sequence directly. Instead, the model consists of two submodels: the encoder and the decoder. The encoder reads and summarizes the input sequence information into what are referred to as the internal state vectors. At each time step, the encoder receives an input element from the sequence and updates its internal state based on the current input and the previous hidden state. The LSTM memory cells maintain a memory of past inputs, which enables them to retain information over longer sequences. The outputs of the encoder are discarded and only the internal states are kept. The decoder reads the final state of the encoder and makes a one-step prediction for each element of the output sequence (Fig. 5). We enforce that the input of the decoder at each time step is the output from the previous time step, which helps to train the decoder faster. The fact that the output sequence corresponds to local time steps allows prediction for a variable global time step ∆t_{ɡ} using simple interpolation. The shapes of input and output are specific to the case, therefore the implementation will be detailed in Section 4.2.4.
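A sketch of such an encoder-decoder LSTM in Keras, with four past global steps in and five local-scale latent vectors out as in Figure 5. The latent dimension and LSTM cell size are assumptions, and the decoder input stands for the teacher-forced previous outputs used during training:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 16   # assumed latent-vector size
n_past = 4        # past global time steps fed to the encoder (Fig. 5)
n_future = 5      # local-scale latent vectors predicted by the decoder (Fig. 5)
units = 64        # assumed LSTM cell size

# Encoder: reads the past latent-vector sequence; only its internal states
# (h, c) are kept, the encoder outputs themselves are discarded
enc_in = layers.Input(shape=(n_past, latent_dim))
_, h, c = layers.LSTM(units, return_state=True)(enc_in)

# Decoder: initialized with the encoder states, emits one prediction per
# element of the output sequence (teacher forcing at training time)
dec_in = layers.Input(shape=(n_future, latent_dim))
dec_seq = layers.LSTM(units, return_sequences=True)(dec_in, initial_state=[h, c])
out = layers.TimeDistributed(layers.Dense(latent_dim))(dec_seq)

seq2seq = Model([enc_in, dec_in], out)
```

At inference time the decoder would instead be stepped one element at a time, feeding each prediction back as the next input.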
The final NN (NN IV) is responsible for mapping the predicted local nodal displacement ${u}_{l}^{t+\text{\Delta}{t}_{g}}$ and the predicted interface velocity ${u}_{\text{\Gamma}}^{t+\text{\Delta}{t}_{g}}$ at time t + ∆t_{ɡ} to the corresponding interface reaction forces ${r}_{\text{\Gamma}}^{t+\text{\Delta}{t}_{g}}$. The architecture used is similar to that of the encoder part φ_{𝒰} of the autoencoder: first, the multidimensional displacement layer is transformed into a dense layer using convolutional layers, then merged with the other model parameter layers to output the interface forces.
Fig. 5 Example of proposed RNN architecture used. The network is trained by taking the latent vectors from the past four global time steps and the following five latent vectors of local scale as output. 
3.2 Training configuration and metrics
The quantitative performance of each NN framework is evaluated using a metric called Normalized Mean Absolute Error (NMAE). NMAE is calculated for each component k between the actual value and the predicted value by the NN, from the initial time to the final time. The overall NMAE is obtained by summing the NMAEs of each component. The NMAE is defined as:
$$\text{NMAE}={\displaystyle \sum _{k=1}^{R}\left(\frac{{\displaystyle {\sum}_{i=1}^{N}\left|{y}_{k}^{\left(i\right)}-{\tilde{y}}_{k}^{\left(i\right)}\right|}}{{\displaystyle {\sum}_{i=1}^{N}\left|{y}_{k}^{\left(i\right)}\right|}}\right)}$$(4)
where R is the total number of components of the dynamical system, N is the total number of time steps in the evolution of the dynamical system, y is the true solution and ỹ is the solution predicted by the NN.
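Equation (4) translates directly into a few lines of NumPy; a sketch assuming the snapshots are stored as an (N, R) array with time steps along the first axis:

```python
import numpy as np

def nmae(y_true, y_pred):
    """Normalized Mean Absolute Error of eq. (4): for each of the R
    components, sum |y - y_tilde| over the N time steps, normalize by
    sum |y|, then sum over the components. Arrays are shaped (N, R)."""
    num = np.sum(np.abs(y_true - y_pred), axis=0)   # per-component error
    den = np.sum(np.abs(y_true), axis=0)            # per-component normalization
    return float(np.sum(num / den))
```

Note that, as written, the metric is undefined for a component that is identically zero over the whole time range; such components would need to be excluded or regularized.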
Throughout this study, we employ the mean squared error (MSE) as the loss function, Rectified Linear Unit (ReLU) activation function for all hidden layers and linear activation function for the output layer in all NNs. The ReLU activation function σ can be defined as:
$$\sigma \left(a\right)=\mathrm{max}\left(0,a\right)$$(5)
where a is the input of the node.
The NN training has been performed in the Python environment using TensorFlow. We used the TensorFlow 2.0 API tf.distribute.MultiWorkerMirroredStrategy to distribute the training across multiple machines (up to 4), each equipped with a single NVIDIA Quadro RTX 4000 GPU. Source codes are available at https://github.com/afsalpt1/ROMPGANN. Using the TensorFlow API tf.data.Dataset, efficient input pipelines are written for the data generated in Section 4.1. The pipeline allows for easy access to the data in order to distribute the training across multiple machines. We set the training batch size to 128 to ensure better generalization of each batch. To train the NNs, the Adam optimization algorithm is used. In the following section, the proposed method is implemented on a plate-with-hole example.
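A minimal input pipeline of the kind described, using tf.data with the batch size of 128 stated above; the toy arrays stand in for the actual snapshot data of Section 4.1:

```python
import numpy as np
import tensorflow as tf

# Toy snapshot arrays standing in for the generated simulation data
x = np.random.rand(512, 360).astype("float32")   # e.g. 120 interface nodes x 3
y = np.random.rand(512, 360).astype("float32")

# Input pipeline with the batch size of 128 used in the paper; under
# tf.distribute.MultiWorkerMirroredStrategy (model built inside
# strategy.scope()), this dataset is sharded across workers automatically
ds = (tf.data.Dataset.from_tensor_slices((x, y))
        .shuffle(512, seed=0)
        .batch(128)
        .prefetch(tf.data.AUTOTUNE))
n_batches = sum(1 for _ in ds)
```

With 512 toy samples and a batch size of 128 the pipeline yields four batches per epoch; prefetching overlaps data preparation with training steps.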
4 A first validation example: plate with a hole
In the present section we consider an elementary problem to test and validate the PGANN architecture for metamodelling of a highly nonlinear local problem. Let us consider a rectangular 2D domain with a hole in the centre, as shown in Figure 6. The region surrounding the hole is taken as the local domain Ω_{l}, where plastic deformation is expected during the loading process, and the remaining part refers to Ω_{C} where only elastic deformation occurs.
The elastic behavior is defined by the Young modulus E = 210 GPa, the Poisson ratio ν = 0.3, and the material density ρ = 7,800 kg/m^{3}. The isotropic elastoplastic material is described using the Johnson-Cook material model. Neglecting the effects of temperature and strain rate, the expression for the stress is given by:
$$\sigma =a+b{\epsilon}_{p}^{n}$$(6)
where ∈_{p} is the plastic strain. The Johnson-Cook parameter a, hardening modulus b, and hardening exponent n are defined as
$$\begin{array}{lll}a={\sigma}_{y},\hfill & b=\frac{{\sigma}_{u}}{n{\epsilon}_{u}^{\left(n-1\right)}},\hfill & n=\frac{{\sigma}_{u}{\epsilon}_{u}}{{\sigma}_{u}-{\sigma}_{y}}\hfill \end{array}$$(7)
where σ_{y} is the yield stress, σ_{u} is the engineering Ultimate Tensile Stress (UTS), and ∈_{u} is the engineering strain at UTS. For simulation purposes, the following values are considered: a = 0.792 GPa, b = 0.51 GPa, and n = 0.26.
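With these values, equation (6) is straightforward to evaluate; a small helper for illustration:

```python
import numpy as np

def johnson_cook_stress(eps_p, a=0.792, b=0.51, n=0.26):
    """Temperature- and rate-independent Johnson-Cook flow stress of
    eq. (6), sigma = a + b * eps_p**n, with the GPa values from Section 4."""
    return a + b * np.power(eps_p, n)

# At zero plastic strain the flow stress equals the yield stress a
sigma_y = johnson_cook_stress(0.0)
```

Because n < 1, the hardening slope b·n·ε_p^(n−1) is steep at small plastic strains and flattens as the strain grows.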
The dimensions of the computational domain shown in Figure 6 are H_{x} = 300 mm, H_{y} = 100 mm. The diameter of the hole is d = 40 mm, and the length of the local domain is H_{l} = 60 mm. The structure is clamped on the left boundary ∂Ω_{L}, and the displacements are imposed on the right boundary ∂Ω_{R}.
Fig. 6 Model of a plate with hole. The local domain, shown in yellow, is a refined region near a geometrical detail (a hole). 
4.1 Data generation and postprocessing
The data required for training the NNs are obtained by solving the reference model shown in Figure 6 using a nonlinear explicit FEM dynamic solver. The simulation spans from an initial time t_{0} =0 ms to a final time t_{end} = 2.5 ms. To discretize the time domain, a constant time step ∆t_{l} = 2.3 × 10^{−4} ms is employed, calculated based on (2). Consequently, each simulation comprises approximately 10,800 time steps. For simplicity it is assumed here that the ratio ∆t_{ɡ}/∆t_{l} is equal to 10.
The first dataset comprises five principal loading directions, namely translational displacements (s_{x},s_{y},s_{z}) and rotational angles (θ_{x},θ_{y}). Each load direction undergoes eight simulations with varying magnitudes, including four positive and four negative magnitudes, resulting in a total of 40 simulations. The second dataset is created to enrich the first dataset by including additional loading directions. Unlike the singledirection loading in the first set, the second dataset involves simultaneous monotonic loading in two directions, as outlined in Table 1. This results in a total of 24 distinct loading directions. For each combination of load directions, four simulations are performed, encompassing two positive and two negative magnitudes, thereby generating additional 96 simulations.
Consequently, the second dataset comprises a total of 136 simulations. Further information regarding the magnitude and direction of loading can be found in Appendix A. While alternative approaches like Design of Experiment (DoE) methodologies could be employed to efficiently generate optimal data parameters, it is important to note that the simplified load cases used in this article primarily serve to present the core concept of our strategy. In reality, the load cases are much more complex and can be derived from the local/global boundaries of a coupled model.
In this particular example, snapshots are collected at each local time step. At the interface Г, velocities (v_{Г} = (v_{Г,x}, v_{Г,y}, v_{Г,z})), rotational velocities (ω_{Г} = (ω_{Г,x}, ω_{Г,y}, ω_{Г,z})), and reaction forces (r_{Г} = (r_{Г,x}, r_{Г,y}, r_{Г,z})) are recorded. Additionally, the nodal displacement (𝒰_{l} = (𝒰_{x}, 𝒰_{y}, 𝒰_{z})) and the elemental effective plastic strain (${\overline{\epsilon}}_{p}$) of the local domain are collected. The model contains a total of 120 interface nodes; thus, each of the interface quantities v, ω, and r is an array of size 120 × 3 (x, y and z components). The displacement on the actual FEM grid is interpolated to a regular Cartesian grid of size 40 × 40, as shown in Figure 7. To improve the training performance, all data are standardized to the range [0, 1] using Min-Max scaling. The network is trained using 80% of the total data, while the remaining portion is divided equally between the cross-validation set (10%) and the test set (10%).
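The scaling and the 80/10/10 split can be sketched as follows; the snapshot matrix is a toy stand-in, and keeping the per-feature minimum and range allows predictions to be mapped back to physical units afterwards:

```python
import numpy as np

def minmax_scale(x):
    """Scale each feature of the snapshot matrix to [0, 1], returning the
    per-feature minimum and range so predictions can be un-scaled later."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant features
    return (x - lo) / span, lo, span

# Toy snapshot matrix, then the train / cross-validation / test split
x = np.random.default_rng(0).random((1000, 6))
xs, lo, span = minmax_scale(x)
train, cv, test = np.split(xs, [800, 900])   # 80% / 10% / 10%
```

In practice the scaling parameters are computed on the training set only and reused for the validation and test sets, so that no information leaks from held-out data.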
4.2 Training and verification of PGANN
In this section, we explore the individual training and verification of all four subnetworks of the proposed PGANNs. First, we outline a systematic approach for determining the optimal hyperparameters of a NN. The hyperparameters we focus on include the number of layers, the number of neurons or filters in dense or convolutional layers, the choice of activation functions, the type of loss function, the learning rate, and the batch size, among others. Then, to evaluate the accuracy of the trained model, four additional test cases are generated in addition to the dedicated test dataset comprising 10% of the overall data; these cases are included in neither the first nor the second dataset. To avoid confusion, we will refer to the former as the test dataset and the latter as test cases. Table 2 provides details regarding the direction and magnitude of the load applied on the boundary ∂Ω_{R} for each test case.
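The accuracy metric used throughout the following subsections is the normalized mean absolute error (NMAE). Its exact normalization is not spelled out in this excerpt, so the sketch below assumes a common convention: the mean absolute error divided by the range of the reference data.

```python
import numpy as np

def nmae(pred, ref):
    """Normalized mean absolute error: MAE divided by the range of the
    reference data (an assumed convention; the paper's exact
    normalization may differ)."""
    return np.mean(np.abs(pred - ref)) / (ref.max() - ref.min())

ref = np.array([0.0, 1.0, 2.0, 3.0])
pred = np.array([0.1, 1.1, 1.9, 3.1])
err = nmae(pred, ref)  # mean |error| = 0.1, range = 3
```

Normalizing by the range makes errors comparable across quantities of very different scales (displacements, plastic strains, interface forces), which is why the same metric can be reported for all four subnetworks.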
The first three test cases are evaluated using the NNs trained on the first dataset. The first two test cases involve loading directions that were present in the training data, but with different, unseen magnitudes: the first test case has a magnitude within the range of the training data, while the second has a larger magnitude. The third and fourth test cases have the same loading magnitude and direction and correspond to combination loading. The third test case is evaluated on the model trained using the first training set, while the fourth is evaluated on the model trained using the second training set.
Training datasets.
Test cases used for verification of PGANN architectures for 2D example.
Fig. 7 Grid used to feed displacement to NN (green), and actual grid used by FEM solver (blue). 
4.2.1 NN I
Figure 8 illustrates the initial proposed reference architecture for NN I. It comprises three input layers: ${\upsilon}_{\text{\Gamma}}^{t}$, ${w}_{\text{\Gamma}}^{t}$, and the displacement of interface nodes ${u}_{\text{\Gamma}}^{t}$. After concatenating the input layers, a dense layer of size 1,600 is added, which is then reshaped into a 3D tensor of shape (5,5,64). Subsequently, three deconvolution layers (Conv2DTranspose) with 64, 32 and 16 filters, respectively, are added to achieve an output shape of (40,40,3). Each deconvolutional layer uses filters with a kernel size of 2 and a stride of 2.
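The shape progression through the decoder can be verified with a little arithmetic. For a transposed convolution with 'valid' padding the output size is (in − 1) × stride + kernel; with kernel = stride = 2 this coincides with the 'same'-padding result of in × stride, so either padding convention yields the same sizes here. A small sketch:

```python
def deconv_out(size, kernel=2, stride=2):
    """Spatial output size of a transposed convolution with 'valid'
    padding; for kernel == stride this equals size * stride."""
    return (size - 1) * stride + kernel

# (5, 5, 64) -> three Conv2DTranspose layers -> (40, 40, 3)
size = 5
for filters in (64, 32, 16):
    size = deconv_out(size)
# size doubles at each layer: 5 -> 10 -> 20 -> 40
```

This confirms that three stride-2 transposed convolutions upsample the (5,5) latent tensor exactly onto the 40 × 40 displacement grid of Figure 7.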
When working with NNs, two crucial hyperparameters to tune are the learning rate and batch size. The learning rate determines the step size at each iteration of the optimization process, while the batch size refers to the number of training examples processed in a single forward and backward pass. If the learning rate is set too high, the optimization process may oscillate or fail to converge; if it is too low, training may become extremely slow. A larger batch size can provide computational efficiency, as more examples are processed in parallel, but it may lead to suboptimal generalization. Smaller batch sizes, on the other hand, can introduce more noise and result in slower convergence. Figure 9a shows the NMAE plotted against the learning rate of the Adam optimizer for a batch size of 512; it shows that a value of 5 × 10^{−5} is an ideal learning rate. Similarly, Figure 9b shows the NMAE plotted against the batch size for a learning rate of 5 × 10^{−5}; it indicates that a batch size of 128 is optimal for this scenario.
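The selection procedure behind Figure 9 can be sketched as an exhaustive search over candidate (learning rate, batch size) pairs. The paper does not describe its search loop, so the `grid_search` helper and the toy objective below are purely illustrative stand-ins for a full training run.

```python
import itertools

def grid_search(train_fn, learning_rates, batch_sizes):
    """Train/evaluate each (learning rate, batch size) pair and keep the
    configuration with the lowest validation NMAE. `train_fn` stands in
    for a full training run returning the validation NMAE."""
    best = (None, float("inf"))
    for lr, bs in itertools.product(learning_rates, batch_sizes):
        score = train_fn(lr, bs)
        if score < best[1]:
            best = ((lr, bs), score)
    return best

# Toy objective with a minimum at (5e-5, 128), mimicking Figure 9
toy = lambda lr, bs: abs(lr - 5e-5) * 1e3 + abs(bs - 128) * 1e-5
config, score = grid_search(toy, [5e-4, 5e-5, 5e-6], [64, 128, 512])
```

In practice each evaluation is an expensive training run, so the two hyperparameters are swept one at a time here (learning rate first, then batch size) rather than over the full product grid.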
Next, we focus on optimizing the number of layers used in the training network. This involves incorporating one deconvolutional layer with the same shape as its preceding layer at position A, B, or C, as depicted in Figure 10. The results are tabulated in Table 3 and indicate that adding a single layer at positions A and B leads to the lowest NMAE values. The effect of incorporating multiple layers at positions A and B is shown in Figure 10, which indicates that adding eight such layers is optimal. The inclusion of dense layers after each input layer does not appear to have a significant impact on the training process.
After 1,000 epochs, the NMAEs of the optimized NN I on the first and second test datasets (10% of the total data) are 6.52 × 10^{−4} and 4.8 × 10^{−3}, respectively. Furthermore, we investigated the robustness of the trained NN I on unseen data using the test cases presented in Table 2. Figure 11 displays the displacement predicted by NN I and the corresponding reference displacement (FEM) at t = 2.5 ms for each test case. Even though the test cases were unseen during training, the predicted and FEM solutions are in excellent agreement for the first two test cases, whose loading directions were seen during training. Based on these two cases in Figure 11, NN I achieved good performance not only on loadings within the range it was trained on, but also on loadings far outside that range. Based on the results of test cases 3 and 4, it can be deduced that the predicted solution using NN I is in good agreement with the FEM solution only if similar loading directions are present in the training set. From the aforementioned results, we can also conclude that the trained NN I model generalizes well to unseen loading magnitudes, provided the loading direction is present in the training set.
Effect of added layers at positions A, B and C.
Fig. 8 A schematic for the architecture of NN I. 
Fig. 9 Effect of learning rate and batch size on the learning. 
Fig. 10 Effect of NMAE and training time on the number of added layers at positions A&B. 
Fig. 11 Comparison between the reconstructed displacement using NN I of the PGANN framework vs. baseline model (FEM) for various test cases. The displacements shown are at the final time of each solution, whereas the NMAE value indicates the mean of all time steps. 
Fig. 12 Schematics of the architecture of NN II: Converting displacement input tensor to vector via convolutional layers and merging with other inputs, obtaining latent vector through dense layer, and generating outputs via dense and convolutional layers. 
4.2.2 NN II : reduction of u_{l}
The architecture utilized to train the first autoencoder is shown in Figure 12. The encoder component (φ_{u}) has the following structure. It begins with three input layers: ${\upsilon}_{\text{\Gamma}}^{t}$, ${w}_{\text{\Gamma}}^{t}$, and ${u}_{l}^{t}$. Following the input layer ${u}_{l}^{t}$, three convolution layers (Conv2D) with 16, 32, and 64 filters and a stride of 2 are incorporated to achieve an output shape of (5,5,64). This output is then flattened to form a dense layer with a size of 1,600. For the input layers ${\upsilon}_{\text{\Gamma}}^{t}$ and ${w}_{\text{\Gamma}}^{t}$, several dense layers are added. Subsequently, these layers are concatenated with the dense layer derived from the input ${u}_{l}^{t}$ to form the latent vector ${z}_{u}^{t}$. The decoder component ($\phi {\prime}_{u}$) has the same structure as the encoder but in reverse order.
In a similar manner to the previous section of training NN I, various hyperparameters such as learning rate, batch size, number of convolutional layers are optimized.
Using a learning rate of 5 × 10^{−5} and a batch size of 256 yielded optimal results for both datasets. By comparing the error and its corresponding training cost, we concluded that it is best not to add layers at positions A, B, and C. Additionally, we found that including two dense layers after ${\upsilon}_{\text{\Gamma}}^{t}$ and ${w}_{\text{\Gamma}}^{t}$ resulted in the best performance. One of the main hyperparameters to optimize in an autoencoder is the size of the latent vector ${z}_{u}^{t}$. Figure 13a shows the performance with respect to the size of the latent vector after 100 epochs. The model performs best with 64 latent variables; however, since the NMAE does not decrease significantly beyond a size of 50, we choose a latent vector of size 50 as the optimal value. Figure 13b displays the latent vector of test case 1 with size 50. After 1,000 epochs, the NMAEs of the optimized model on the first and second test datasets are 6.38 × 10^{−4} and 4.38 × 10^{−3}, respectively.
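The reasoning used to prefer a size-50 latent vector over the marginally better size-64 one can be written down as a simple selection rule: take the smallest latent dimension whose NMAE is within a tolerance of the best observed NMAE. The helper name, the tolerance, and the NMAE values below are illustrative, not from the paper.

```python
def pick_latent_size(nmae_by_size, rel_tol=0.1):
    """Choose the smallest latent dimension whose NMAE is within
    `rel_tol` (relative) of the best observed NMAE."""
    best = min(nmae_by_size.values())
    ok = [s for s, e in nmae_by_size.items() if e <= best * (1 + rel_tol)]
    return min(ok)

# Hypothetical NMAE-vs-size curve shaped like Figure 13a
curve = {16: 3.0e-3, 32: 1.5e-3, 50: 1.05e-3, 64: 1.0e-3}
chosen = pick_latent_size(curve)
```

Preferring the smaller latent size keeps the LSTM of NN III cheaper, since its input width is exactly the latent dimension.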
Figure 14 shows the performance of the network and the result of reconstructed displacement using 50 latent variables. From the first two test cases, it can be observed that, although the input is scaled by a large magnitude, the network is able to reconstruct local displacement accurately just using 50 latent variables. However, the reconstruction is poor in test case 3. Notably, in test case 4, where a combination loading similar to that in the training set is present, the reconstruction performance improves.
The use of skip connections in an autoencoder architecture can have a significant impact on its performance [34]. Skip connections allow for the flow of information from the input layer directly to the output layer, bypassing the bottleneck or compressed representation in the middle layers. This allows for the preservation of important information that may be lost during the compression process. Here it was observed that the use of skip connections did not result in a significant reduction of NMAEs.
Fig. 13 Training of NN II. 
Fig. 14 Comparison between the reconstructed displacement using NN II of PGANN framework vs. reference model (FEM) for various test cases. 
4.2.3 NN II : reduction of ${\overline{\epsilon}}_{p}$
The reference architecture of the proposed autoencoder, designed to reduce the dimension of the effective plastic strain of the local problem (${\overline{\epsilon}}_{p,l}$), consists of two main components: the encoder (${\phi}_{{\overline{\epsilon}}_{p}}$) and the decoder ($\phi {\prime}_{{\overline{\epsilon}}_{p}}$), as shown in Figure 15. The encoder takes ${\overline{\epsilon}}_{p,l}$ as input data and maps it to a lower-dimensional latent vector ${z}_{{\overline{\epsilon}}_{p}}^{t}$. The input layer has a size of 300, corresponding to each element of the local mesh. In this case, we opted to use only dense layers instead of convolutional layers, given the relatively small size of the input layer. The decoder, which mirrors the architecture of the encoder, aims to reconstruct ${\overline{\epsilon}}_{p,l}$ from ${z}_{{\overline{\epsilon}}_{p}}^{t}$.
The most crucial hyperparameter to optimize in this model is the size of the latent vector ${z}_{{\overline{\epsilon}}_{p}}^{t}$. Figure 16a displays the NMAE in relation to the size of the latent vector after 500 epochs with batch size 256 and learning rate 5 × 10^{−5}. Notably, the best performance is achieved when employing a latent vector of size 16. Additionally, the impact of adding n layers between two layers of the reference architecture, each having the same size as the layer before, is shown in Figure 16b. It is obvious that n = 1 is the ideal value. After 2,000 epochs, the NMAE of the optimized model on the first and second test datasets are 6.6 × 10^{−3} and 4.1 × 10^{−3} respectively. Table 4 shows the performance of the network and the result of reconstructed ${\overline{\epsilon}}_{p,l}$ using 12 latent variables.
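The dimensionality-reduction idea behind this dense autoencoder can be illustrated with its linear analogue: a truncated SVD, which is the optimal linear autoencoder. The sketch below is not the paper's nonlinear network; it uses synthetic snapshot data that genuinely lives on a low-dimensional subspace, so the reconstruction is near-exact.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for plastic-strain snapshots: 500 snapshots of 300
# element values that lie on a 16-dimensional subspace by construction.
modes = rng.standard_normal((16, 300))
coeffs = rng.standard_normal((500, 16))
snaps = coeffs @ modes

# Truncated SVD: encoder = projection on Vk, decoder = expansion in Vk.
U, S, Vt = np.linalg.svd(snaps, full_matrices=False)
Vk = Vt[:16]                      # basis of the 16-dimensional latent space
z = snaps @ Vk.T                  # encode: (500, 16) latent vectors
recon = z @ Vk                    # decode back to (500, 300)
err = np.abs(recon - snaps).mean() / (snaps.max() - snaps.min())
```

A nonlinear autoencoder plays the same role but can follow curved low-dimensional manifolds, which is why it can outperform such linear reductions on elastoplastic response data.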
NMAE on NN II training of ${\overline{\epsilon}}_{p,l}$ for various test cases.
Fig. 15 Schematic representation of the architecture of NN II: An autoencoder for converting the ${\overline{\epsilon}}_{p,l}$ input vector to a latent vector through dense layers, and its reconstruction. 
Fig. 16 Training of NN II. 
4.2.4 NN III
In this example, the goal of NN III is to predict the LV at ${t}_{g}^{n+1}$ given the LVs at ${t}_{g}^{n-3}$, ${t}_{g}^{n-2}$, ${t}_{g}^{n-1}$ and ${t}_{g}^{n}$. The encoder-decoder LSTM architecture takes two sequential inputs: the latent vector z_{u} and the latent vector ${z}_{{\overline{\epsilon}}_{p}}$ from past instances of the local time scale. During the online phase, as the LVs are only available on the global time scale, a cubic interpolation method is employed to interpolate the LVs onto the local time scale. This interpolation approach also enables the handling of varying time steps for the global problem during the online stage. In this specific implementation, the LSTM is fed with the previous 30 time steps, resulting in input shapes of [31,50] for z_{u} and [31,16] for ${z}_{{\overline{\epsilon}}_{p}}$, respectively.
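The global-to-local resampling step can be sketched as follows. A single least-squares cubic polynomial stands in here for the cubic interpolation used in the paper (which is presumably spline-based); the function name and the time values are illustrative.

```python
import numpy as np

def cubic_resample(t_global, z_global, t_local):
    """Resample one latent component from the coarse global time grid to
    the fine local grid with a cubic fit. A single least-squares cubic
    polynomial stands in for spline-based cubic interpolation."""
    c = np.polyfit(t_global, z_global, deg=3)
    return np.polyval(c, t_local)

t_g = np.linspace(0.0, 2.5, 11)        # coarse global time steps (ms)
z_g = 0.3 * t_g**3 - t_g + 0.5         # one latent variable sampled globally
t_l = np.linspace(0.0, 2.5, 101)       # fine local time steps
z_l = cubic_resample(t_g, z_g, t_l)
```

Because the resampling is defined in continuous time, the global solver is free to change its time step online: NN III always receives latent histories on its own fixed local grid.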
The data required to train NN III are generated using the encoder components (φ_{u} and ${\phi}_{{\overline{\epsilon}}_{p}}$) of the trained NN II. The first dataset is created using the FEM simulations with the load applied in a single direction. Each simulation consists of 10,800 time steps. To create a single input-output data pair, a sequence of 61 time steps is selected; the first 31 are allotted to the input and the remaining 30 to the output. By selecting various combinations, a total of 9,000 input-output pairs are generated from one simulation, resulting in 360,000 input-output pairs from 40 simulations. A second dataset is also created with enriched loading cases, yielding 1,800,000 input-output pairs.
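The windowing described above can be sketched as a sliding window over each encoded simulation. The helper name and the toy sequence length are illustrative; the 31-step input and 30-step output widths follow the text.

```python
import numpy as np

def make_pairs(latent_seq, n_in=31, n_out=30, stride=1):
    """Slice a latent-vector time series of shape (T, d) into overlapping
    input/output windows: n_in past steps in, the next n_out steps out."""
    T = latent_seq.shape[0]
    X, Y = [], []
    for start in range(0, T - n_in - n_out + 1, stride):
        X.append(latent_seq[start:start + n_in])
        Y.append(latent_seq[start + n_in:start + n_in + n_out])
    return np.stack(X), np.stack(Y)

seq = np.random.rand(200, 50)          # one simulation: 200 steps, 50 latents
X, Y = make_pairs(seq)                 # overlapping 61-step windows
```

With a stride of 1 a 10,800-step simulation would yield up to 10,740 windows; the 9,000 pairs per simulation reported above correspond to a subsampled selection of these combinations.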
The number of LSTM units used in both the encoder and decoder parts of the NN III is one of the most crucial hyperparameters to optimize in this network. Because both input and output have the same shape, here we use the same number of LSTM units in both the encoder and decoder. This design allows for a seamless flow of information and promotes consistency between the encoding and decoding processes. Figure 17 shows the relationship between the number of LSTM units and two key metrics, namely the NMAE and training time for the first dataset over 500 epochs. To achieve a balance between computational time and NMAE, the optimal number of LSTM units is 512. After 1,000 epochs, the NMAE of the optimized NN III (with 512 LSTM units) on the first and second test datasets (10% of total data) is 4.56 × 10^{−7} and 1.10 × 10^{−6}, respectively.
Figure 18 displays the predictions of the latent vector z_{u} using NN III trained for 1,000 epochs on the second dataset with 256, 512, and 756 LSTM units; the data are randomly selected from the test dataset of the second dataset, and 5 of the 50 latent variables are plotted. With 512 LSTM units the prediction is reasonably accurate, while with 756 LSTM units it matches the ground truth well, as shown in Figure 18c. The predictions with 128 and 1024 LSTM units are not shown, since they match the ground truth poorly and excellently, respectively. Similarly, Figure 19 showcases the predictions of ${z}_{{\overline{\epsilon}}_{p}}$ using NN III with the same configuration. Of the 12 latent variables, the four that vary with time are chosen for plotting. It is evident from the figure that using 756 LSTM units yields the best results.
We analyze the robustness of the trained NN III with 756 LSTM units on unseen data with inputs different from the two training datasets on which it was trained. We use the same test configurations as in Section 4.2.1. Table 5 summarizes the performance of NN III for the unseen test cases. The predicted evolution of the LV given its past is shown in Figure 20. From Figures 20a and 20b, it can be clearly seen that the evolution of the LV with time is predicted very accurately by the proposed NN III, even though data with such high magnitudes are not available in the training set.
NMAE on NN III training for various test cases.
Fig. 17 Optimization of the number of LSTM units in NN III 
4.2.5 NN IV training and evaluation
The reference NN (first guess) model used to train NN IV is displayed in Figure 21. In this architecture, the input displacement layer ${u}_{l}^{t}$ of size [40 × 3] is encoded to a dense layer of size 1,600 using convolutional layers. The encoded layer is combined with the other input layers ${\upsilon}_{\text{\Gamma}}^{t}$ and ${w}_{\text{\Gamma}}^{t}$. The number of dense layers added between the output layer ${r}_{\text{\Gamma}}^{t}$ and the combined layer is one of the hyperparameters we optimize in this section. After 1,000 iterations, the NMAEs of the optimized NN IV on the first and second test datasets are 2.4 × 10^{−3} and 1.02 × 10^{−2}, respectively.
The trained model is verified on the same test configurations as in Section 4.2.1. The performance of the trained NN IV for the various test configurations is tabulated in Table 6. Figure 22 compares the reconstructed ${r}_{\text{\Gamma}}^{t}$ using NN IV with the actual ${r}_{\text{\Gamma}}^{t}$ from FEM. It is evident from the reconstructions of the first two test cases that the trained NN IV captures the variations in the force very well, even for unseen large magnitudes. However, the third test case shows that the reconstruction using NN IV is poor when an unseen direction of loading is present. From the last two test cases, it can be inferred that the reconstruction improves when combination loading is added to the training set.
Fig. 18 Prediction of the latent vector z_{u} using NN III with 256, 512, and 756 LSTM units: given the LV of past time steps (0–30), the NN III predicts the LVs of next 30 time steps, with dashed and solid lines indicating predicted and ground truth, respectively. 
Fig. 19 Prediction of the latent vector ${z}_{{\overline{\epsilon}}_{p}}$ using NN III with 256, 512, and 756 LSTM units. Dashed lines represent the predicted latent vectors, while solid lines represent the ground truth. 
5 Application to spotwelded plates
In this section, we extend the previously introduced techniques to an industrial application in three dimensions. The car body-in-white (BIW) is assembled by spot-welding numerous sheet metals together. Due to heat treatment during the spot-welding process, these welds might have complex properties. In order to accurately model these localized nonlinear behaviors, refined 3D elements are required near the spot-welded region. Due to the high computational cost, these refinements are avoided in a full vehicle crash simulation, and simplified models (1D elements) are traditionally used. With numerous spot-welds in a car structure, replacing the fine 3D FEM model with a data-driven ROM in localized spot-welded zones can significantly reduce the computational cost.
5.1 Data generation
Let us consider a spotweld domain as shown in Figure 23. The edges of bottom plates are clamped and the loading is applied on the edges of the top plate for 1.5 ms. The integration time step is set to a constant value Δt_{l} = 5.0 × 10^{−5} ms, calculated based on (2).
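Equation (2) is not reproduced in this excerpt; for an explicit solver the critical time step typically follows the CFL condition [4], dt ≤ L_min/c with the wave speed c = √(E/ρ). The sketch below is a rough 1D estimate under that assumption, with illustrative steel properties and element size, not the paper's actual parameters.

```python
import math

def stable_time_step(l_min, E, rho, safety=0.9):
    """Critical time step of an explicit solver from the CFL condition:
    dt <= L_min / c, with wave speed c = sqrt(E / rho). A standard 1D
    estimate; the paper's equation (2) may include additional factors."""
    c = math.sqrt(E / rho)          # elastic wave speed (m/s)
    return safety * l_min / c       # stable time step (s)

# Illustrative steel properties and a sub-millimetre element size
dt = stable_time_step(l_min=3e-4, E=210e9, rho=7800)
```

With these illustrative values the estimate lands at roughly 5 × 10^{−8} s, i.e. the same order as the Δt_l = 5.0 × 10^{−5} ms quoted above, which shows why refined local meshes force such small explicit time steps.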
The training dataset was constructed using a total of 60 simulation cases, using various loading magnitudes and directions. The minimum and maximum magnitudes of loading applied in each direction are listed in Table 7.
These extreme values denote the approximate maximum and minimum load capacity of the material, respectively, before failure occurs. The first 22 simulations correspond to instances in which the loading is applied in a single direction, while the remaining simulation cases correspond to loading in two different directions simultaneously.
In this study, snapshots of the simulation were created by saving the global interface velocities ${\upsilon}_{{\text{\Gamma}}_{g}}$, the global interface forces ${r}_{{\text{\Gamma}}_{g}}$, and the displacement of the local domain u_{l} at selected time steps. Each simulation contains approximately 3,000 time steps. Unlike the previous example presented in Section 4, the interface quantities are recorded at the global interface nodes. The model contains a total of 24 global interface nodes. Furthermore, the local displacement u_{l} was interpolated to a regular three-dimensional Cartesian grid of size 31 × 31 × 8. As in the previous example, the data is partitioned such that 80% of the total dataset is used for training the network, while the remaining 20% is divided between cross-validation and test datasets, with each set comprising 10% of the total data.
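The mapping of scattered nodal values onto the regular 31 × 31 × 8 grid can be sketched as follows. Nearest-neighbour lookup is used here as a simple stand-in for the interpolation scheme, which the paper does not specify; the function name and array sizes are illustrative.

```python
import numpy as np

def to_regular_grid(nodes, values, shape=(31, 31, 8)):
    """Map scattered nodal values onto a regular Cartesian grid by
    nearest-neighbour lookup, a simple stand-in for the interpolation
    used to build the 31 x 31 x 8 CNN input tensor."""
    lo, hi = nodes.min(axis=0), nodes.max(axis=0)
    axes = [np.linspace(lo[k], hi[k], shape[k]) for k in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    grid_pts = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    # index of the closest FEM node for every grid point
    d2 = ((grid_pts[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    return values[nearest].reshape(*shape, values.shape[1])

nodes = np.random.rand(500, 3)          # scattered FEM node coordinates
disp = np.random.rand(500, 3)           # nodal displacement (x, y, z)
u_grid = to_regular_grid(nodes, disp)   # tensor fed to the Conv3D layers
```

The regular grid is what makes Conv3D layers applicable: the unstructured FEM mesh has no natural tensor layout, whereas the interpolated field does.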
5.2 Training and verification of PGANN
In this section, we detail the process of individually training each section of the proposed PGANN, as well as its verification on unseen data. In the context of a 3D structural problem, the input layer of the PGANN comprises two components, ${\upsilon}_{{\text{\Gamma}}_{g}}$ and ${u}_{{\text{\Gamma}}_{g}}$, and the output is ${r}_{{\text{\Gamma}}_{g}}$. Unlike the previous example, the information on plasticity is not fed into the LSTM; this mainly reduces the complexity of the architecture. The training strategy employed for the current models is similar to the one outlined in Section 4. However, in contrast to the model discussed in the previous example, the current model uses higher-dimensional convolutional layers, specifically Conv3D and Conv3DTranspose. The NMAE of each component of the trained PGANN after 2,000 epochs on the test cases is presented in Table 8.
Similar to the previous example, each hyperparameter is optimized. The most critical hyperparameter to optimize in this study is the size of the latent vector. The NMAE of the NN II after 600 epochs with respect to the size of the latent vector is shown in Figure 24a. It is observed that the NN II demonstrates optimal performance when using a latent vector of 50 variables. A representative latent vector of size 50 from one of the training cases is presented in Figure 24b. It is obvious that the majority of the components of the latent vector are timedependent.
To evaluate the effectiveness of the trained model, four additional test cases were generated with randomly selected magnitudes and direction that were not present in the training data set. The direction and magnitude of the applied displacement on the edges of the top boundary are reported in Table 9. Note that unlike the previous example, the magnitudes of the test cases have been limited to the range of magnitudes used in the training dataset to prevent failure of material.
In this study, we investigate the robustness of the trained PGANN on unseen data using the test cases presented in Table 9. To evaluate the performance of the PGANNs, we compare the displacement predicted by the NN (NN I and NN II) with the reference displacement obtained from FEM. The results of this comparison are presented in Figure 25, where the displacement at the final time step t = 1.5 ms of each test case is displayed. The NMAEs corresponding to each case are shown in brackets. Additionally, we also evaluate the prediction of the LVs using NN III, which was trained with 756 LSTM units. The results of this evaluation are displayed in Figure 26.
Furthermore, we compare the reconstructed ${r}_{{\text{\Gamma}}_{g}}^{t}$ using NN IV and the actual ${r}_{{\text{\Gamma}}_{g}}^{t}$ from FEM, as shown in Figure 27.
From the results of these evaluations, it can be observed that the solution from the Neural Network is in good agreement with its reference solution. This suggests that the PGANNs trained in this study are robust and can provide accurate predictions on unseen data.
Fig. 20 Prediction of the latent vector ${z}_{{\overline{\epsilon}}_{p}}$ using NN III for various test cases. The dashed and solid lines indicate predicted and reference values, respectively. 
Fig. 21 Schematics of the NN IV architecture, converting input ${u}_{l}^{t}$ tensor to vector via convolutional layers and merging with other inputs, and obtaining the output ${t}_{\text{\Gamma}}^{t}$ through dense layers. 
Fig. 22 Prediction of interface forces using NN IV for various test cases: displaying only 4 out of 360 variables with dashed and solid lines indicating predicted and reference values, respectively. 
Fig. 23 Model of a spotwelded plate. 
Prediction error on NN IV training for various test cases.
Parameters of training data.
NMAE of test dataset.
Test cases used for verification of PGANN architectures for 3D example.
Fig. 24 Training of NN II 
Fig. 25 Comparison between the reconstructed displacement; using NN I and NN II of PGANN framework vs. baseline model (FEM) for various test cases. The displacements shown are at the final time, whereas the NMAE values shown in brackets indicate the mean of all time steps. 
Fig. 26 Prediction of the latent vector ${z}_{{\overline{\epsilon}}_{p}}$ using NN III for various test cases. The solid and dashed lines indicate predicted and reference values, respectively. 
Fig. 27 Prediction of interface forces using NN IV for various test cases: only 4 out of 360 variables are displayed in different colours, with dashed and solid lines indicating predicted and reference values, respectively. 
6 Conclusion
The application of PGANNs to the metamodeling of local nonlinear structures was shown in this work. We developed this approach on a 2D plate with a hole as well as a 3D case with spot-welded plates undergoing fast deformation, representing nonlinear elastoplasticity problems. In the proposed network architecture, we introduced a pair of displacement and effective plastic strain layers between the input and output. We compared the results obtained from the PGANN architecture to those of traditional FEM methods to demonstrate its ability to generate physically consistent and generalizable solutions; the network yields promising results even for unseen magnitudes of loading. Despite the success exhibited by the PGANN approach, we have found that it faces challenges when dealing with unseen types of loading, when a similar loading direction is not present in the training set. The network architecture trained on single load cases is less accurate on problems with combination loading types. We found that the reconstruction can be improved by adding more complex loading data to the training set.
Future benchmarking with other model order reduction tools such as POD will provide valuable insights into the effectiveness and limitations of the PGANN approach. Looking ahead, future research will be needed to further improve the accuracy and performance of the PGANN approach, as well as to assess its industrial applicability. To achieve this, we plan to use design of experiment tools to generate the parameters required for data generation. Another objective is to enrich the training set with various model parameters such as material and geometric parameters, including plate thickness and spot-weld diameter. Additionally, non-intrusive coupling of the NN-based reduced model with an explicit FEM solver is a work in progress. We also plan to investigate the use of Physics-Informed Neural Networks to impose basic principles of physics on the metamodel, thereby improving its physical consistency and alleviating the training data requirement.
Acknowledgments
The authors gratefully acknowledge the support provided by CNRS, EPF, ALTAIR, and STELLANTIS for this research.
Appendix A
Datasets used to train/crossvalidate/test the plate with a hole example.
References
 A. Reille, V. Champaney, F. Daim, Y. Tourbier, N. Hascoet, D. Gonzalez, E. Cueto, J.L. Duval, F. Chinesta, Learning data-driven reduced elastic and inelastic models of spot-welded patches, Mech. Ind. 22, 32 (2021)
 P. Salvini, F. Vivio, V. Vullo, A spot weld finite element for structural modelling, Int. J. Fatigue 22, 645–656 (2000)
 A. Rupp, K. Storzel, V. Grubisic, Computer aided dimensioning of spot-welded automotive structures, SAE Tech. Pap. 950711 (1995)
 R. Courant, K. Friedrichs, H. Lewy, On the partial difference equations of mathematical physics, IBM J. Res. Dev. 11, 215–234 (1967)
 J. Mandel, Balancing domain decomposition, Int. J. Numer. Methods Biomed. Eng. 9, 233–241 (1993)
 C. Farhat, F.X. Roux, A method of finite element tearing and interconnecting and its parallel solution algorithm, Int. J. Numer. Meth. Eng. 32, 1205–1227 (1991)
 P. Ladevèze, O. Loiseau, D. Dureisseix, A micro-macro and parallel computational strategy for highly heterogeneous structures, Int. J. Numer. Meth. Eng. 52, 121–138 (2001)
 H. Ben Dhia, G. Rateau, The Arlequin method as a flexible engineering design tool, Int. J. Numer. Methods Eng. 62, 1442–1462 (2005)
 L. Gendre, O. Allix, P. Gosselet, Non-intrusive and exact global/local techniques for structural problems with local plasticity, Comput. Mech. 44, 233–245 (2009)
 L. Gendre, O. Allix, P. Gosselet, A two-scale approximation of the Schur complement and its use for non-intrusive coupling, Int. J. Numer. Meth. Eng. 87, 889–905 (2011)
 J.C. Passieux, J. Réthoré, A. Gravouil, M.C. Baietto, Local/global non-intrusive crack propagation simulation using a multigrid XFEM solver, Comput. Mech. 56, 1381–1393 (2013)
 M. Duval, J.C. Passieux, M. Salaün, et al., Non-intrusive coupling: recent advances and scalable nonlinear domain decomposition, Arch. Computat. Methods Eng. 23, 17–38 (2016)
 G. Guguin, O. Allix, P. Gosselet, On the computation of plate assemblies using realistic 3D joint model: a non-intrusive approach, Adv. Model. Simul. Eng. Sci. 3 (2016)
 T. Chantrait, J. Rannou, A. Gravouil, Low intrusive coupling of implicit and explicit time integration schemes for structural dynamics: application to low energy impacts on composite structures, Finite Elem. Anal. Des. 86, 23–33 (2014)
 O. Bettinotti, O. Allix, U. Perego, V. Oancea, B. Malherbe, Simulation of delamination under impact using a global-local method in explicit dynamics, Finite Elem. Anal. Des. 125, 1–13 (2017)
 F. Chinesta, A. Huerta, G. Rozza, K. Willcox, Model order reduction, in: E. Stein, R. de Borst, T. Hughes (Eds.), The Encyclopedia of Computational Mechanics, 2nd ed., John Wiley & Sons Ltd., 2015
 G. Rozza, D.B.P. Huynh, A.T. Patera, Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations, Arch. Computat. Methods Eng. 15, 229 (2008)
 S.K. Kauwe, J. Graser, A. Vazquez, T.D. Sparks, Machine learning prediction of heat capacity for solid inorganics, Integr. Mater. Manuf. Innov. 7, 43–51 (2018)
 P. Baldi, K. Bauer, C. Eng, P. Sadowski, D. Whiteson, Jet substructure classification in high-energy physics with deep neural networks, Phys. Rev. D 93, 9 (2016)
 C. Tesche, C.N. De Cecco, S. Baumann, et al., Coronary CT angiography-derived fractional flow reserve: machine learning algorithm versus computational fluid dynamics modeling, Radiology 288, 64–72 (2018)
 B. Alipanahi, A. Delong, M. Weirauch, B.J. Frey, Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nat. Biotechnol. 33, 831–838 (2015)
 J. Willard, X. Jia, S. Xu, et al., Integrating scientific knowledge with machine learning for engineering and environmental systems, ACM Comput. Surv. 55, 1–37 (2022)
 M. Raissi, P. Perdikaris, G.E. Karniadakis, Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys. 378, 686–707 (2019)
 F. As'ad, P. Avery, C. Farhat, A mechanics-informed artificial neural network approach in data-driven constitutive modelling, AIAA 2022-0100, AIAA SCITECH 2022 Forum, 2022
 J.S. Read, X. Jia, J. Willard, A.P. Appling, et al., Process-guided deep learning predictions of lake water temperature, Water Resour. Res. 55, 9173–9190 (2019)
 P. Sturmfels, S. Rutherford, M. Angstadt, et al., A domain guided CNN architecture for predicting age from structural brain images, arXiv:1808.04362 (2018)
 A. Daw, R.Q. Thomas, C.C. Carey, J.S. Read, A.P. Appling, A. Karpatne, Physics-guided architecture (PGA) of neural networks for quantifying uncertainty in lake temperature modelling, arXiv:1911.02682 (2019)
 F. Hamilton, A.L. Lloyd, K.B. Flores, Hybrid modeling and prediction of dynamical systems, PLoS Comput. Biol. 13, e1005655 (2017)
 T. Belytschko, W.K. Liu, B. Moran, Nonlinear Finite Elements for Continua and Structures, John Wiley & Sons, Ltd, 2000
 O. Bettinotti, O. Allix, U. Perego, V. Oancea, B. Malherbe, A fast weakly intrusive multiscale method in explicit dynamics, Int. J. Numer. Methods Eng. 100, 577–595 (2014)
 M. Mozaffar, R. Bostanabad, W. Chen, K. Ehmann, J. Cao, M. Bessa, Deep learning predicts path-dependent plasticity, Proc. Natl. Acad. Sci. U.S.A. 116, 26414–26420 (2019)
 M.B. Gorji, M. Mozaffar, J.N. Heidenreich, J. Cao, D. Mohr, On the potential of recurrent neural networks for modeling path dependent plasticity, J. Mech. Phys. Solids 143, 103972 (2020)
 F. Masi, I. Stefanou, P. Vannucci, V. MaffiBerthier, Thermodynamicsbased artificial neural networks for constitutive modelling, J. Mech. Phys. Solids 147, 1–28 (2021) [Google Scholar]
 C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z.B. Wojna, Rethinking the inception architecture for computer vision, CoRR, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 2818–2826 2016. [Google Scholar]
Cite this article as: A. Pulikkathodi, E. Lacazedieu, L. Chamoin, J.P. Berro Ramirez, L. Rota, and M. Zarroug, A neural network-based data-driven local modeling of spot-welded plates under impact, Mechanics & Industry 24, 34 (2023)
All Figures
Fig. 1 Heterogeneous discretization in space of the reference problem. 

Fig. 2 An illustration of the non-intrusive local/global coupling method. 

Fig. 3 Structure of the reduced model: the NN inputs are the interface velocity and interface nodal position at time instances ${t}_{g}^{n}$, ${t}_{g}^{n-1}$, ${t}_{g}^{n-2}$, ${t}_{g}^{n-3}$ in the global time scale, denoted by dashed arrows, while the output is the interface forces at time ${t}_{g}^{n+1}$, denoted by a solid arrow. 

Fig. 4 A schematic for the architecture of the PGANN framework. 

Fig. 5 Example of the proposed RNN architecture. The network is trained by taking the latent vectors from the past four global time steps as input and the following five latent vectors of the local scale as output. 

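The input/output windowing described in the Figure 5 caption (latent vectors from the past four global time steps in, the following five local-scale latent vectors out) can be sketched in plain NumPy. The trajectory length, latent dimension, and the `make_windows` helper below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def make_windows(latents, n_past=4, n_future=5):
    """Slice a latent-vector trajectory into (input, target) pairs:
    n_past past vectors as input, the following n_future vectors as target."""
    X, Y = [], []
    for i in range(n_past, len(latents) - n_future + 1):
        X.append(latents[i - n_past:i])   # past window (input)
        Y.append(latents[i:i + n_future]) # future window (target)
    return np.stack(X), np.stack(Y)

# Hypothetical trajectory: 20 time steps of an 8-dimensional latent vector.
traj = np.random.rand(20, 8)
X, Y = make_windows(traj)
print(X.shape, Y.shape)  # (12, 4, 8) (12, 5, 8)
```

Each training sample thus pairs a sliding window of past latent states with the states the recurrent network must predict.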
Fig. 6 Model of a plate with a hole. The local domain, shown in yellow, is a refined region near a geometrical detail (a hole). 

Fig. 7 Grid used to feed displacement to NN (green), and actual grid used by FEM solver (blue). 

Fig. 8 A schematic for the architecture of NN I. 

Fig. 9 Effect of learning rate and batch size on learning. 

Fig. 10 Effect of the number of added layers at positions A & B on NMAE and training time. 

Fig. 11 Comparison between the reconstructed displacement using NN I of the PGANN framework vs. the baseline model (FEM) for various test cases. The displacements shown are at the final time of each solution, whereas the NMAE value indicates the mean over all time steps. 

Fig. 12 Schematic of the architecture of NN II: the displacement input tensor is converted to a vector via convolutional layers and merged with the other inputs; a latent vector is obtained through a dense layer, and the outputs are generated via dense and convolutional layers. 

Fig. 13 Training of NN II. 

Fig. 14 Comparison between the reconstructed displacement using NN II of PGANN framework vs. reference model (FEM) for various test cases. 

Fig. 15 Schematic representation of the architecture of NN II: An autoencoder for converting the ${\overline{\epsilon}}_{p,l}$ input vector to a latent vector through dense layers, and its reconstruction. 

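As a minimal illustration of the dense autoencoder sketched in Figure 15, the forward pass below compresses a strain-like vector into a low-dimensional latent vector and reconstructs it. The layer sizes, tanh activation, and untrained random weights are assumptions made for the sketch, not the paper's trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # One fully connected layer with tanh activation.
    return np.tanh(x @ w + b)

# Hypothetical sizes: 120 plastic-strain components compressed to a
# 10-dimensional latent vector z, then reconstructed by the decoder.
n_in, n_latent = 120, 10
W_enc, b_enc = rng.standard_normal((n_in, n_latent)) * 0.1, np.zeros(n_latent)
W_dec, b_dec = rng.standard_normal((n_latent, n_in)) * 0.1, np.zeros(n_in)

eps_p = rng.standard_normal(n_in)        # input strain-like vector
z = dense(eps_p, W_enc, b_enc)           # encoder: latent vector
eps_p_rec = dense(z, W_dec, b_dec)       # decoder: reconstruction
print(z.shape, eps_p_rec.shape)          # (10,) (120,)
```

Training would fit the weights by minimizing the reconstruction error between `eps_p` and `eps_p_rec`; only the forward pass is shown here.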
Fig. 16 Training of NN II. 

Fig. 17 Optimization of the number of LSTM units in NN III. 

Fig. 18 Prediction of the latent vector z_{u} using NN III with 256, 512, and 756 LSTM units: given the LVs of the past time steps (0–30), NN III predicts the LVs of the next 30 time steps, with dashed and solid lines indicating predicted and ground truth values, respectively. 

Fig. 19 Prediction of the latent vector ${z}_{{\overline{\epsilon}}_{p}}$ using NN III with 256, 512, and 756 LSTM units. Dashed lines represent the predicted latent vectors, while solid lines represent the ground truth. 

Fig. 20 Prediction of the latent vector ${z}_{{\overline{\epsilon}}_{p}}$ using NN III for various test cases. The dashed and solid lines indicate predicted and reference values, respectively. 

Fig. 21 Schematic of the NN IV architecture: the input ${u}_{l}^{t}$ tensor is converted to a vector via convolutional layers and merged with the other inputs, and the output ${t}_{\text{\Gamma}}^{t}$ is obtained through dense layers. 

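A rough NumPy sketch of the data flow the Figure 21 caption describes: convolve the displacement input, flatten, merge with the other inputs, and map to interface forces through a dense layer. All sizes, the kernel, and the random weights below are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d(x, kernel):
    """Valid-mode 1D cross-correlation of a signal with a kernel."""
    windows = sliding_window_view(x, len(kernel))
    return windows @ kernel

rng = np.random.default_rng(0)
u = rng.standard_normal(32)              # sampled displacement input u_l^t
kernel = rng.standard_normal(5)
features = np.tanh(conv1d(u, kernel))    # convolutional layer -> 28 features
other = rng.standard_normal(4)           # e.g. merged velocity/position inputs
merged = np.concatenate([features, other])
W = rng.standard_normal((merged.size, 360)) * 0.05
t_gamma = merged @ W                     # dense layer -> 360 interface forces
print(t_gamma.shape)                     # (360,)
```

The 360-component output matches the number of interface-force variables quoted in the Figure 22 caption; the rest of the pipeline (multiple conv/dense layers, trained weights) is omitted for brevity.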
Fig. 22 Prediction of interface forces using NN IV for various test cases: displaying only 4 out of 360 variables with dashed and solid lines indicating predicted and reference values, respectively. 

Fig. 23 Model of a spotwelded plate. 

Fig. 24 Training of NN II. 

Fig. 25 Comparison between the reconstructed displacement using NN I and NN II of the PGANN framework vs. the baseline model (FEM) for various test cases. The displacements shown are at the final time, whereas the NMAE values shown in brackets indicate the mean over all time steps. 

Fig. 26 Prediction of the latent vector ${z}_{{\overline{\epsilon}}_{p}}$ using NN III for various test cases. The dashed and solid lines indicate predicted and reference values, respectively. 

Fig. 27 Prediction of interface forces using NN IV for various test cases: displaying only 4 out of 360 variables in different colours, with dashed and solid lines indicating predicted and reference values, respectively. 
