Accelerated optimizations of an electromagnetic acoustic transducer with artificial neural networks as metamodels

Electromagnetic acoustic transducers (EMATs) are noncontact transducers generating ultrasonic waves directly in the conductive sample. Despite the advantages, their transduction efficiencies are relatively low, so it is imperative to build accurate multiphysics models of EMATs and optimize the structural parameters accordingly, using a suitable optimization algorithm. The optimizing process often involves a large number of runs of the computationally expensive numerical models, so metamodels as substitutes for the real numerical models are helpful for the optimizations. In this work the focus is on the artificial neural networks as the metamodels of an omnidirectional EMAT, including the multilayer feedforward networks trained with the basic and improved back propagation algorithms and the radial basis function networks with exact and nonexact interpolations. The developed neural-network programs are tested on an example problem. Then the model of an omnidirectional EMAT generating Lamb waves in a linearized steel plate is introduced, and various approaches to calculate the amplitudes of the displacement component waveforms are discussed. The neural-network metamodels are then built for the EMAT model and compared to the displacement component amplitude (or ratio of amplitudes) surface data on a discrete grid of the design variables as the reference, applying a multifrequency model with FFT (fast Fourier transform)/IFFT (inverse FFT) processing. Finally the two-objective optimization problem is formulated with one objective function minimizing the ratio of the amplitude of the S0-mode Lamb wave to that of the A0 mode, and the other objective function minimizing as the negative amplitude of the A0 mode. Pareto fronts in the criterion space are solved with the neural-network models and the total time consumption is greatly decreased. From the study it could be observed that the radial basis function network with exact interpolation has the best performance considering its accuracy of approximation and the time required to build the metamodel.


Introduction
Nondestructive testing & evaluation (NDT & E) is the field in which different methods based on quite different physical mechanisms are employed to check critical components in various structures for defects or other potential threats, without harming the components or the structures in any way.The testing methods diversely include electromagnetic methods, ultrasonic methods, X-rays and so on.Structural health monitoring (SHM) is a closely related field where NDT & E methods could be applied to collect information about parameters related to structural performance.Since large-scale structures occupy large areas, in SHM the monitoring system is often distributed in nature, and a number of sensors or transducers are used (Gao et al., 2016).In both NDT & E systems and SHM systems where ultrasonic waves are employed, the ultrasonic sensors or transducers are basic and at the same time critical components.Traditionally the piezoelectric transducers are used to generate and receive ultrasonic waves, while they rely on liquid coupling to transfer the mechanical energy into the components under investigation, and sometimes this requirement of coupling is not convenient.So some noncontact transduction methods have become popular in recent years for NDT & E and SHM applications based on ultrasonic waves.
Published by Copernicus Publications on behalf of the AMA Association for Sensor Technology.
S. Wang et al.: Optimizations of an EMAT with ANN metamodels Electromagnetic acoustic transducers (EMATs) are new transducers relying on electromagnetic effects to generate ultrasonic waves directly in the tested metal samples, according to Lorentz force or magnetostriction mechanism.EMATs are noncontact transducers, so they are promising in many situations where traditional piezoelectric transducers are not convenient, such as testing hot or moving samples.Composed of magnets and coils, they are also very flexible and capable of generating various kinds of waves such as the bulk waves and many types of guided waves (Thompson, 1973(Thompson, , 1979;;Edwards et al., 2006).
Despite the obvious advantages, the energy transduction efficiencies of EMATs are relatively low compared to their piezoelectric counterparts (Hirao and Ogi, 2003).Often the signal of EMATs is at the level of several microvolts.With this situation, one imperative problem is to study the mechanism of the transducers further and build accurate theoretical models, and then try to design them optimally based on the models.The operations of EMATs are multiphysical in nature, and sometimes even nonlinearity exists, so their modeling is always a difficult task.
Here we mainly focus on the numerical models.Transient analysis of a meander-coil EMAT placed on isotropic nonferromagnetic half-space, assuming uniform static magnetic field was conducted in Ludwig and Dai (1991).The controlling eddy-current equations were studied in detail in Jafari-Shapoorabadi et al. (2001) and it was argued that the previous work using the total current divided by the cross section area of the conductor as the source current density was equivalently applying the incomplete equation, and this meant ignoring the skin effect and proximity effect, while we proved the opposite in Wang et al. (2016a) via customizing the underlying integrodifferential or normal differential equations.The finite element method (FEM) package COMSOL was used to build the electromagnetic model of a meander EMAT, and the simulated Lorentz force was exported to another package, Abaqus, as the driving force to excite Lamb waves in Dhayalan and Balasubramaniam (2010).The above-mentioned modeling work only involves nonmagnetic materials.There is also some initial work on modeling EMATs used to test magnetic material, which we will not discuss further here.
With proper models of EMATs, the next step is to optimize their design so as to maximize the testing performance.The work on optimizations of EMATs is still rare.A parametric study of an EMAT composed of a racetrack coil was conducted in Mirkhani et al. (2004) by varying the ratio of the width of the magnet to the width of the coil, and it was found that if this ratio was set at 1.2, the amplitude of the ultrasonic beam would be improved.One design variable and one objective function were used in this optimization, accomplished only through observation of a set of curves corresponding to different design variables, instead of using a real optimization algorithm.A spiral coil EMAT was optimized using genetic algorithm optimization procedure in the global optimization toolbox of Matlab in Seher et al. (2014Seher et al. ( , 2015)).The ratio of the amplitude of the A0-mode Lamb waves to that of the S0 mode was selected as the objective function to be maximized, i.e., preferably generating the A0 mode.
In the evolutionary optimization algorithms, a complex numerical model is evaluated a great number of times, which is very time-consuming.In order to resolve this problem, an alternative mathematical model could be built to approximate the original computationally expensive numerical model, i.e., the evaluations of the original model could be replaced by the evaluations of the computationally less expensive approximate model, called a metamodel, surrogate model, etc. Implemented with polynomials, kriging, radial basis functions, and so on, the metamodels are already applied in electromagnetic device design and optimization (Sykulski, 2008), but seldom found in the field of ultrasonic NDT, or optimization of EMATs, so we explore one type of metamodel, the artificial neural networks here in the context of EMAT design and optimization.Obviously neural networks are used widely in many different situations, and being used as a metamodel is only one of their applications.
In this work we focus on the artificial neural networks as function approximators, or as metamodels of computationally expensive numerical models.Multilayer feedforward networks and radial basis function networks are both considered.Their performances for function approximations are tested on an example problem, with programs developed by the authors.Then modeling of an EMAT with COMSOL is discussed briefly, and methods to calculate the amplitudes of the displacement components of the generated Lamb waves are introduced.Finally the neural networks are applied as metamodels in the optimizations of the EMAT to accelerate the whole process.

Artificial neural networks as function approximators
There are many parameters in the finite-element model of an EMAT, such as the geometrical parameters, the magnitude and number of periods of the tone-burst waveform of the current excitation signal, the strength of the magnet, the material parameters, and so on.We can select a subset of the parameters as the inputs of a forward model.Similarly, with the FEM model, different quantities could be solved, including the spatial distributions of the magnetic, eddy current, force and displacement fields at some time instants; the waveforms of the displacement, stress and strain components at a point; some extracted features of the waveforms; etc.Some of the solved quantities can serve as the outputs of the forward model.Artificial neural networks are widely used in different fields.We are interested here in their abilities to approximate computationally expensive functions or the forward models of EMATs.Two different kinds of neural networks are considered.The first kind is the multilayer feedforward network, and the second kind is the radial basis function network.These two networks have different structures and different approaches to implementing approximations of functions.We review important concepts and algorithms of these networks and test them on a mathematical function.All the related code were implemented by the authors in Matlab.In the descriptions of the networks, vectors and matrices are used whenever possible to simplify the expressions.

Multilayer feedforward network
The structure of a two-layer feedforward neural network is shown in Fig. 1.There are N 1 neurons in the first (hidden) layer and N 2 neurons in the second (output) layer.Superscript is used to indicate the number of the layer.The vector x is the input column vector with N I elements, and y is the corresponding output column vector with N O = N 2 elements.In layer 1, W 1 is the N 1 × N I weight matrix, arranged such that the elements of the ith row correspond to the weights of the input elements x 1 , . .., x N I into the ith neuron.Then for the column vector W 1 x, each element is a linear combination of x 1 , . .., x N I .The term b 1 is the bias column vector with one element for each neuron.The term n 1 = W 1 x + b 1 is the net input to the transfer function f 1 of layer 1, of which the output a 1 = f 1 (n 1 ) is also the input to layer 2. Similar equations exist for layer 2, i.e., the final output is y = f 2 (n 2 ), in which the net input is n 2 = W 2 a 1 + b 2 .Bold font is used for f 1 and f 2 to stress that these functions accept a vector and generate a vector.Operating on one element of the net input vector, the scalar transfer function f can take the following typical forms (Hagan et al., 2014).
For the purpose of function approximation, we use the two-layer network in Fig. 1.The transfer function of the first layer could be the log-sigmoid function or the tangentsigmoid function, while the transfer function of the second layer is the linear function.N I is the number of input variables, and N 2 = N O is the number of output variables.Next we must choose N 1 , the number of neurons in the hidden layer, and apply some training algorithm to compute the best possible W 1 , b 1 , W 2 and b 2 .

Basic back propagation
A widely used training algorithm for the multilayer neural network is the back propagation algorithm.Suppose the set of known input vectors is x 1 , . .., x Q , and the corresponding set of known target vectors is t 1 , . .., t Q .In iteration k of the training process, one randomly selected input vector x(k) is presented to the network, so the corresponding error vector is e(k) = t(k) − y(k).The performance index to minimize for the current iteration is According to the steepest descendent algorithm, the updating equations for the weight matrix and bias vector of layer m is in which α is the learning rate.Here we define a 0 = x in Fig. 1. s m is the vector of sensitivity with element s m i = ∂P ∂n m i , i.e., derivative of the performance index P with respect to the ith net input of layer m.The sensitivity of every layer is solved according to the recurrence equation, in which, where f m is the scalar transfer function of layer m.From Eq. ( 4), the sensitivity vectors of previous layers could be calculated from the sensitivity vector of the last www.j-sens-sens-syst.net/6/269/2017/layer recurrently, from which the name "back propagation" derives.
As the starting point for the equation, the sensitivity vector for the last layer (layer M) is To evaluate the capability of the two-layer network, trained with the basic back propagation algorithm, to approximate functions, we test the developed program on the following problem (Branin function) with two inputs and one output.
The contour plot of this function is shown in Fig. 2a. Figure 2b is the contour plot of the approximation by the multilayer neural network trained with the basic back propagation algorithm.The training data are sampled on a 30 × 30 grid of the two input variables.The transfer function of the first layer is the log-sigmoid function.A total of 60 neurons are used in the hidden layer, and 4 × 10 6 iterations are applied.The learning rate is α = 0.005.The original function in Fig. 2a and the approximation in Fig. 2b are both generated on a 60 × 60 grid.

Improved back propagation
The basic back propagation algorithm is based on the steepest-descendent algorithm.Newton's method could be applied to achieve improvement on the basic algorithm, called the Levenberg-Marquardt (LM) back propagation (Hagan et al., 2014).All the training data are presented to the network at the same time.The error vector corresponding to the qth input x q is e q = t q − y q .The components of Q error vectors are arranged into one column vector, e = e 1,1 , . .., e N M ,1 , e 1,2 , . .., e N M ,Q T .
For element e i,j , subscript i represents position in the error vector e j .The total number of error components in e is The performance index to minimize is in which k is the number of iteration.
The unknown variables to solve, including the weights and the biases, are also arranged into one column vector: The total number of variables is The Jacobian matrix is defined as The updating equation for the unknown variables is in which µ is a small value added to the diagonal elements of J T J to ensure that it is invertible.
The most difficult part is calculating the J matrix.The elements of J could be solved with sensitivity back propagation just like in the basic back propagation algorithm.The processing here is more complex because there are N e sensitivity vectors to propagate, instead of one sensitivity vector per iteration in the basic algorithm.This training algorithm of the two-layer network is also implemented and tested on the example problem.
The result of approximation of the test function with the multilayer network trained with the LM algorithm is shown in Fig. 2c.A total of 60 neurons are used in the hidden layer, and 800 iterations are applied.

Radial basis function network
Another candidate neural network for function approximation is the radial basis function neural network (RBFNN).The basic structure of RBFNN is shown in Fig. 3.There are N C neurons in the first (radial basis function, hidden) layer, and N 2 neurons in the second (linear, output) layer.In the RBFNN, the matrix C no longer contains weights applied to the elements of the input vector.Instead, the distance (represented with || • ||) between the input vector and each row of the matrix C is computed to generate a N C × 1 column vector, which is then multiplied element by element (represented with .×)with the scale vector s.
The transfer function of the radial basis function layer is Gauss function, which is a bell-shaped function concentrated at n = 0. Then for the first RBFd layer, we have in which c i is the transposition of the ith row of C, i.e., The transfer function of the second layer is the linear function defined in Eq. ( 1), so element k of the output vector y is Since the transfer function f 1 is a local function around scalar 0, equation Eq. ( 16) states that the output component y k is a linear combination of N C local functions, each concentrated around c i and scaled by s i , and finally biased by b k .The distance operator || • || makes the output of the transfer function f 1 symmetric about center vector c i , from which the name "radial basis function" derives.
Training of the RBFNN is quite different from that of the general multilayer networks.Normally the process is divided into two stages.In the first stage, the centers (rows of C) and scales (elements of s) are selected according to some criterion.Generally the centers should be distributed evenly in the input space.The scales should be set so that the adjacent basis functions overlap somewhat with each other.In the programs developed for this work, a spread parameter is used to specify the scale as s value.In the second stage, C and s are fixed, and W and b are solved.

Exact interpolation
One common scheme is to select the centers of the radial basis functions as the input vectors.Then the number of centers N C will be equal to the number of the known input vectors Q.For now we focus on one output component, y k .If we let the function approximation pass through the known input/output data, assuming b = 0, in which the second subscript of y is the index of the input vectors.
In matrix form, in which the element of matrix In the system of equations Eq. ( 17), the number of unknown weights and the number of equations are equal, so the weights could be solved with matrix inversion as w k = R −1 y k .An exact interpolation is obtained, because the approximation passes all the known data pairs.If there are multiple outputs (N O > 1), the previous process could be repeated for every output component in the set {y 1 , y 2 , . .., y N O }.
The disadvantage of the approach of exact interpolation is that we need as many radial basis functions as the number of the input vectors, so there will be too many basis functions or neurons in the hidden layer when many input vectors are available.Another problem is that if the data contain noise, the exact interpolation will lead to overfitting.This is better demonstrated with a simple problem. Figure 4 shows the function y = sin(x), x ∈ [0, 2π ], its samples with noise (level of 0.1), exact interpolation with RBFNN and the nonexact interpolation.
The result of approximation of the test function with the RBF network with exact interpolation is shown in Fig. 2d.The input vectors are generated on the 30 × 30 grid, and they are also used as the centers of the radial basis functions.The spread parameter is 0.5, so that the value of the basis function drops to 1/2 of the maximum value at a distance of 0.5 from the center.
In the process of structural optimization, the geometry of the EMAT model is continuously changing, so that the mesh of the model or, more specifically, the number of the elements, the coordinates of the nodes, etc. are also changing.In this way a noise or random error exists in the optimization process.For this reason, it is necessary to check the nonexact interpolation and compare the result with that of the exact interpolation to see the effect of meshing noise.

Nonexact interpolation
With the disadvantages of the version of the RBF network using exact interpolation, one modification is to use fewer radial basis functions in the hidden layer.The centers of the basis functions are not required to be in the set of the known input vectors.
We focus on one output component y k .The performance index is in which e q = t k,q − y k,q .
The augmented vector of unknown variables is in which w k is transposition of the kth row of W as For input vector x q , the augmented input to the second layer is Next we define Then the vector of weights and bias of the second output layer is solved as in which The term ρ denotes a small value added to diagonal elements of U T U to prevent overfitting.The result of approximation of the test function with the RBF network with nonexact interpolation is shown in Fig. 2e.The input vectors are generated on the 30 × 30 grid.The centers of the radial basis functions are generated on the 30 × 30 grid.The spread parameter is 0.2.

Time-and frequency-domain modeling of an omnidirectional EMAT
We consider an omnidirectional EMAT composed of a spiral coil and a cylindrical permanent magnet, used to generate omnidirectional Lamb waves in a plate.The coil is composed of tightly wound copper wires, instead of forming a meander pattern, so both S0-mode and A0-mode Lamb waves will be generated.In this work, the aim is to preferably generate A0-mode Lamb waves, so we built the model bearing this preference in mind.Justification of this choice of wave mode could be found in Huthwaite et al. (2013), where it was explained that the thickness variations were more sensitive with the A0 wave mode.
In this section, the formulations of an axisymmetric EMAT are given first (further details can be found in Wang et al., 2016b), and then the numerical model is described.

Formulations of an axisymmetric EMAT
The basic equations for the electromagnetic field simulation in the study of EMATs are Maxwell's equations (Ida, 2007).
These equations are Faraday's law, Ampere-Maxwell's law, Gauss's law for electric fields and Gauss's law for magnetic fields, respectively.E is the electric field, B is the magnetic flux density, t is the time variable, H is the magnetic field strength, J is the current density, D is the electric flux density, and ρ is the charge density.The term ∇× denotes curl of vector, and ∇• is divergence of vector.Since the frequency in normal EMAT operation is no higher than several MHz, the term ∂D ∂t in Ampere-Maxwell's law could be neglected.To solve Maxwell's equations, another set of equations called the constitutive equations is needed: in which µ is the magnetic permeability, and is the dielectric constant.
Because the magnetic flux density is solenoidal, magnetic vector potential (MVP) A is introduced through The following Columbo gauge is applied to define the MVP completely: With the MVP, the equation describing the eddy-current phenomenon is in which ∇ 2 is the vector Laplacian operator, σ is the conductivity, and J s is the source current density.The eddy-current density, not written explicitly in Eq. ( 34), is J e = −σ ∂A ∂t .This equation holds where there is a conductor and a source current flows inside of this conductor.At the region without the source current, the J s term is dropped.Where there is no conductor, like in the air, the negative eddy-current density term σ ∂A ∂t (or −J e ) is also dropped.The magnetic permeability µ appears in the denominator in Eq. ( 34), which implies that the material is isotropic.Besides this, an additional assumption in this work is that the material is linear, so that µ is fixed (uniformly distributed) in every material of the model.www.j-sens-sens-syst.net/6/269/2017/For 2-D axisymmetric field simplification assuming J s is perpendicular to the r-z plane, we have J s = J sφ e φ , with J sφ as the φ component of vector J s , and e φ as the unit vector along the φ axis.We also have A = A φ e φ .The term ∇ 2 A (the vector Laplacian operator applied on vector A) has to be treated carefully because, unlike in the Cartesian coordinates, we have in cylindrical coordinates Instead, the φ component of in which the scalar Laplacian operator applied on scalar A φ is (37) In a 2-D axisymmetric model, only the A φ component is nonzero, and it is independent of φ: Then the φ component of ∇ 2 A is simplified as From now on, A φ will be written as A and J sφ be written as J s for simplicity.With this notation, the vector Eq. ( 34) is transformed to the φ component scalar equation: This equation is a diffusion equation describing the eddycurrent phenomenon in cylindrical coordinates.Now the eddy-current density term is J e = −σ ∂A ∂t .Next, it is crucial to correctly decide the source current density term J s .One inaccurate way is to define total current divided by cross section area of the source conductor as the source current density.This definition was borrowed from magnetostatic simulation of magnetic field generated by a steady electric current.In alternating current simulation, the externally applied total current i should be the integral of the sum of the source current density J s and the eddy-current density J e , i = S (J s + J e ) dS, (41) in which S is the cross section of the source conductor.Unlike in the 2-D planar model, J s is no longer uniformly distributed in the cross section of the source conductor in cylindrical coordinates; instead, the product of J s and r is a constant for a particular conductor, i.e., J s is inversely proportional to r: where C is a constant with a different value for each source conductor.Combined with Eq. ( 41), the constant C could be derived as so the original eddy-current equation Eq. ( 40) in cylindrical coordinates could now be written as For steady-state or frequency-domain analysis, the phasor notation is adopted: in which the dots on A and i indicate they are complex phasors, ω is the angular frequency, and j is the imaginary unit.This steady-state equation was proposed in Preis (1983), following the work on the similar steady-state integrodifferential equation in a 2-D planar model in Konrad (1981).
In Wang et al. (2016b), we proposed solving the above integrodifferential equations in time and frequency domains via customizing the underlying equations in COMSOL package.
The model of an EMAT is multiphysics in nature.Besides the above electromagnetic equations, the equations describing the generation and propagation of the ultrasonic waves in an elastic solid are as follows (Auld, 1990).
These equations are equation of motion, Hook's law and strain-displacement relation, respectively.T is the stress tensor, ρ is the density (symbol is used to differentiate it from charge density in Maxwell's equations), F is the body force, c is the stiffness tensor, S is the strain tensor, and u is the displacement vector.The symbol : is the double dot product of a fourth rank tensor c and a second rank tensor S, and ∇ s u is the symmetric part of the gradient of the vector u.
For homogenous and isotropic media, from the above elastodynamic equations, Navier's equation could be derived as Here, λ and µ are Lamé constants.The symbols in µ is used to differentiate it from the magnetic permeability.
The link between the electromagnetic equations and the elastodynamic equations is the Lorentz force defined as in which B is the total magnetic flux density composed of the static flux density B 0 of the bias magnet and the dynamic flux density B d generated by the excitation coil.For moderate magnitude exciting current, the B d term is usually very small compared to B 0 .

The omnidirectional EMAT model
The complete EMAT model is composed of one magnetostatic submodel describing the magnetic field of the permanent magnet, one eddy-current submodel analyzing the eddycurrent phenomenon accompanied by the skin and proximity effects, and one elastodynamic submodel for the simulation of generation and propagation of Lamb waves in the plate.The two electromagnetic submodels share one geometry containing the air, the inner section of the plate, the copper wires and the permanent magnet, as in Fig. 5.Note that in this geometry only the inner section of the full plate is modeled.The elastodynamic submodel has its own geometry, only containing the full plate.The Lorentz forces calculated from the two electromagnetic submodels are transferred to the elastodynamic submodel as the driving forces of Lamb waves.With these two geometries, the structure of the EMAT model is very clear, compared to some previous work.Additionally, we can use different meshing rules for these two geometries, according to the respective physics.This two-geometry treatment is valid because the Lorentz forces are confined in the region of the plate just under the transducer.
In Fig. 5, it is only necessary to consider the region where r > 0, since this is an axisymmetric model.The testing frequency is 50 kHz.As in Seher et al. (2014), the relative magnetic permeability of the steel plate is 160, i.e., it is a simplified linear material.The conductivity is 4.032 MS m −1 .The thickness of the plate is 10 mm.The remanent magnetic flux density of the magnet is set to 1.3 T along the positive direction of the z axis.R M is the radius of the magnet, and l M is the liftoff distance of the magnet from its base to the top of the coil.The two parameters of R M and l M will be used as the design variables in the optimizations based on the genetic algorithm and the neural-network metamodels, while all the other parameters are fixed.
R C is the average radius of the coil decided as in which λ is the wavelength of the desired Lamb wave mode.
For the EMAT on the steel plate, n is chosen to be 1, i.e., R C = λ 4 , similar to Seher et al. (2015).

Air
Permanent magnet The geometry of the electromagnetic submodels.This geometry is used for magnetostatic analysis and eddy-current analysis.
As stated previously, we want to selectively generate A0mode Lamb waves.For A0-mode Lamb waves in a steel plate of 10 mm thickness at 50 kHz, from the dispersion curves generated with a program we developed, the phase velocity is 1867.78m s −1 , and then the wavelength λ is 37.36 mm.
W C is the radial width of the coil (difference between the outer and inner radii of the coil).The coil is composed of two layers of copper wires with conductivity as 5.998 ×10 7 S m −1 .The wires form an array of 23 columns and 2 rows, as in Wilcox et al. (2005).The wires have rectangular cross sections.The radial width of each wire is 0.3 mm, and the radial gap between adjacent wires in the same layer is 0.1 mm.The axial height of the wire and the gap between the two layers are both 0.1 mm.We have chosen to model each wire individually, so that all the formulations in Sect.3.1 are applicable.
The geometry of the elastodynamic submodel simply contains a full plate with a radius of 1.2 m.Young's modulus is 200 × 10 9 Pa, Poisson's ratio is 0.33, and the density is 7850 kg m −3 .The observation point to record the displacement components in the simulations is located at 60 cm from the z axis, in the middle plane of the plate.From the displacement wave structures of Lamb waves with the specified frequency and plate thickness, at the middle plane of the plate, the displacement component u = u r only corresponds to the S0 mode, while the other component w = u z only corresponds to the A0 mode.
The boundaries of the submodels must be handled with care.In the geometry for the electromagnetic submodels, there is a layer of infinite elements at the air boundary sim-ulating an air region extending infinitely.In the geometry of the elastodynamic submodel, the top and bottom boundaries of the full plate are free boundaries without constraints or loads.For a transient time-domain analysis, the outer edge of the plate (at r = 1.2 m) is also a free boundary.If the full plate is long enough in the radial direction and the total time of simulation is limited to a proper value, the reflections from the end of the plate can be avoided.For a frequency-domain analysis, an extra perfectly matched layer (PML) must be added to the end of the plate so that the energy in the plate can dissipate.
To further increase the accuracy of the model, fillets are added to the sharp corners of the magnet and the wires, so that the geometrical singularities are removed, while at the same time the number of elements and hence the complexity of the model is also increased.

Evaluations of the amplitudes of the displacement components
In optimizations of the EMAT, the amplitudes of the displacement components will be used to calculate the objective functions, so we must decide how to evaluate the amplitudes.We explore three approaches as in the following subsections.

Amplitudes from time-domain simulations
A natural choice is to do time-domain simulations and record the time waveforms of the displacement components and then calculate the envelopes of the waveforms and solve the maximum values as the amplitudes.These time-dependent simulations require small time steps to ensure the convergence, and thus are very time-consuming.Since we will apply evolutionary optimization algorithms, a great number of evaluations of the objective functions will be needed, so this approach is not practical.Nevertheless the time waveforms will be simulated, serving as references to test the other approaches.In this work, the number of time steps is usually set as 6000, for the tone-burst excitation signal x(t) composed of 5 sinusoidal periods modulated with a Hanning window function.

Amplitudes from frequency-domain simulations and the FFT/IFFT processing
The second approach is to transform the excitation timedomain burst signal into its spectrum with FFT, input them into the frequency-domain model of the EMAT, transform the resulting modified spectrum back into time domain with IFFT to obtain the time waveforms of the displacement components, in which F represents the Fourier transform, F −1 is the inverse Fourier transform, x(t) is the input tone-burst signal, H u (ω, R M , l M ) is the frequency response for the displacement component u, and H w (ω, R M , l M ) is the frequency response for the displacement component w.R M and l M are included to stress that these frequency responses change with the design variables, while the input signal x(t) is fixed.From the converted time waveforms, the envelopes are calculated and the peaks of the envelopes are solved as the desired amplitudes.
The spectrum of the excitation signal is bell-shaped and concentrated around the center frequency, so we could focus on the part of spectrum which is bigger than some predefined threshold value.For example, this threshold might be defined as a percentage of the maximum value of the spectrum.Generally tens of or even several spectrum components are enough for this approach, so it is less time-consuming than the first approach where an excessive number of steps are required.As an example, u and w waveforms from the time-dependent simulation and the frequency-domain model with FFT/IFFT processing are compared in Fig. 6.The design variables are selected as R M = 8 mm and l M = 1 mm.The threshold to select the frequency components is 10 %, that is, only the frequency components higher than 10 % of the peak value of the spectrum are used, and others are discarded.With this threshold, 11 components around the center frequency are kept.From the figure, we can see that the time waveforms from the multifrequency model with FFT/IFFT processing are very close to the waveforms directly solved from the transient analysis.From various tests, we found that this is only possible when we model each wire of the coil individually.For other current sources, the waveforms from the two approaches are different.
One important condition to apply the frequency model is that the whole model must be linear.This is satisfied only when the excitation current is small so that the dynamic magnetic field generated by the coil could be ignored.

Amplitudes from frequency-domain simulations with one single frequency
Since a narrow banded signal with spectrum around its center frequency is used in the testing, another possibility is using one single frequency in the frequency-domain model, i.e., resorting to a steady-state solution of the EMAT model.The solved displacement components are complex phasors, so their absolute values are used as the amplitudes in which ω c is the center frequency in radian, u is the complex phasor of u, and ẇ is the complex phasor of w.This approach, as an approximation of the second approach, is valid because the spectrum of the excitation is bell-shaped and concentrated around the center frequency ω c , and the frequency responses H u and H w change slowly around the center frequency.This method of evaluation of the amplitudes of the displacement components is the fastest.However, it should be stressed that this method is an approximation and so is not suitable when higher accuracy is desired.

Metamodels of the EMAT implemented with neural networks
With the time-domain and frequency-domain EMAT models, as well as the discussed approaches to obtain the amplitudes of the displacement components at the observation point, we can already calculate the objective functions of the optimization problem and solve the problem with some optimization algorithm.The approach using the frequency model with a single frequency seems the best because it is the fastest.One problem is that the amplitudes from the singlefrequency approach are only approximations.Now with the metamodels implemented with the artificial neural networks, we can resort to the more time-consuming multifrequency approach with FFT/IFFT processing, i.e., solving some sample points with the multifrequency approach, then building metamodels with these sample points, and finally employing these metamodels in the optimization algorithm.
With the design variables as the inputs, we are concerned with two outputs.The first is A u A w , the ratio of the amplitude of u waveform to that of w waveform, and the second is A w , the amplitude of w waveform.A u A w and −A w will be used as the two objective functions to minimize in the next section.
Since there are only two inputs, we can draw the surfaces of the outputs with respect to the two inputs.Figure 7 shows the A u A w and A w surfaces solved with the multifrequency model with FFT/IFFT processing, on a 50 × 50 grid of the design variables.This figure and the corresponding set of data will be used as a reference to test the performances of the neural-network metamodels.On a computer installed with Intel Xeon CPU E5-2650 at 2.30 GHz and 128 GB RAM, running the Windows 10 operating system, the total time to solve the 2500 samples is 478 071.7 s.In the A u A w surface, there is a valley along the l M axis, indicating that A u A w is mainly determined by the R M value.Obviously the minimum value of A u A w should be in this valley.Next we will build the neural-network metamodels based on a training set generated on a 25 × 25 grid of the design variables, i.e., 625 uniformly distributed samples will be used as the known training data.The time to calculate the samples of the training set is around 33 h.
First we explore the multilayer network with the basic BP algorithm.A total of 200 neurons are used in the hidden layer, the number of iterations is 6 × 10 7 , and the learning rate is 0.02.The transfer function of the hidden layer is the log-sigmoid function.The transfer function of the output layer is the linear function.In one test run of the program, the training time for the A u A w function is 24 141.458 s, and the training time for the A w function is 22 258.787s.After the networks are built, the A u A w and A w functions are predicted with the networks on a 60 × 60 grid of the design variables, as shown in Fig. 8.The values of A u A w and A w are both scaled to [−1, 1] before approximation with the network and the predicted values are scaled back accordingly to be drawn in the figure.From the figure it could be observed that the A u A w surface is smoother than it should be at the valley, with Fig. 7 as the reference, although the number of iterations is already big.From various tests with different parameters, we found that the multilayer network with the basic BP algorithm is difficult to train.With the error of approximation, the performance of this network might not be satisfactory.
Then we test the multilayer network with the LM algorithm.200 neurons are used in the hidden layer, and the number of iterations is 2000.The transfer functions are the same as in the previous network trained with the BP algorithm.In one run of the program, the training time for the A u A w function is 1775.372s, and the training time for the A w function is 1763.376s.After the networks are built, the functions are predicted with the networks on a 60 × 60 grid of the design variables, as shown in Fig. 9.The values of A u A w and A w are also scaled in the same way.From the figure, we can see that the multilayer networks trained with the LM algorithm approximate the surfaces better, compared to the networks trained with the basic BP algorithm.
Next we try the RBFNN with exact interpolation.All the 625 samples are used as the centers of the basis functions.The spread parameter is selected as 0.1 (normalized).In one run of the program, the time to build the network for the A u A w function is 0.457 s, and the time to build the network for the A w function is 0.372 s. Figure 10 shows the A u A w and A w surfaces predicted with the built networks, on a 60 × 60 grid of the design variables.From this figure, it could be observed that the performance of the RBFNN with exact interpolation is excellent.One notable phenomenon is that the noise in the surfaces relating to changing meshes are not big enough to cause overfitting.Compared to the multilayer feedforward networks, the additional advantage of the RBFNN is that the RBFNN model could be built very fast through simple algebraic calculations, instead of time-consuming iterative trainings.
Finally we test the RBFNN with nonexact interpolation.All the 625 samples are used as the centers of the basis func-tions.The spread parameter is selected as 0.18 (normalized).In one run of the program, the time to build the network for the A u A w function is 0.436 s, and the time to build the network for the A w function is 0.447 s. Figure 11 shows the A u A w and A w surfaces predicted with the networks, on a 60×60 grid of the design variables.From this figure, we can see that A u A w is somewhat smoother than it should be in the valley, with reference to Fig. 7.So the performance with nonexact interpolation is not as good as that with exact interpolation.
The built network models will be used in the optimizations of the EMAT.

Multiobjective optimizations of the EMAT accelerated with the neural-network metamodels
In this section we consider the multiobjective optimizations of the omnidirectional EMAT.The design variables are the lift-off of the magnet l M and its radius R M .The goal of optimization is to selectively generate the A0-mode Lamb waves.
With the special structure of the EMAT, inevitably both the S0 mode and the A0 mode will be generated at the same time.So one objective function to minimize could be set as the ratio of the amplitude of the S0 mode to the amplitude of the A0 mode.Using the ratio as the objective function only tells one side of the story.We also want to maximize the A0 mode directly, or set the negative amplitude of the A0 mode as the other objective function to minimize.With these considerations, the problem of multiobjective optimization of the EMAT could be formulated as The design variables R M and l M have upper and lower bounds as R M ∈ [2.5, 15] mm and l M ∈ [1, 3] mm.
In multiobjective optimizations, it will be difficult to obtain one single solution, because the multiple objective functions often conflict with each other, so generally we could only obtain a set of solutions in which no solution is better than other solutions on every objective functions.This set of solutions are nondominated solutions, and they form a Pareto front in the criterion space where the first objective function is used as the x axis and the second objective function is used as the y axis, in the problem where there are two objective functions.
A popular method to solve the approximated Pareto front in the criterion space is the multiobjective genetic algorithm (MOGA; Deb, 2001).With the MOGA, the set of solutions on the Pareto front can be solved with one single run of the program.We have developed the MOGA program to implement the multiobjective optimizations of the EMAT discussed in this work.The implemented genetic operators include uniform mutation, nonuniform mutation, Gaussian mutation, whole arithmetic crossover, simple crossover and single arithmetic crossover.Real coding is employed.As described in Wang et al. (2017), the internal status of the MOGA program is tracked carefully to avoid unnecessary evaluations of the objective functions, so as to decrease the number of evaluations and the total time needed to accomplish the optimization.This is possible because the program is stochastic in nature, so when the value of an individual is not changed by the genetic operation, the corresponding objective function should not be computed.
The results of the multiobjective optimizations using the MOGA and the neural networks as the metamodels of the EMAT are shown in Fig. 12.The number of generations in the MOGA is 300, and the number of individuals in the population is 30.Total time of optimization and the number of evaluations of the objective functions for different networks and algorithms from the test runs of the MOGA program are summarized in Table 1.By comparison, if we optimize the www.j-sens-sens-syst.net/6/269/2017/ S. Wang et al.: Optimizations of an EMAT with ANN metamodels EMAT using the multifrequency approach without the proposed metamodels, nearly 9000 evaluations of the objective functions will cost around 20 days.One notable detail about Table 1 is that the number of evaluations of the objective functions is different for different networks and algorithms, even with the same number of generations and individuals in the genetic algorithm.This is understandable considering the mechanism we employed to reduce the number of evaluations of the objective functions.By tracking the internal status of the program and avoiding unnecessary evaluations, we incorporate additional randomness in the program, so the number of evaluations is not fixed.
The scatter plot of the values of the objective functions solved on the 25 × 25 discrete grid of the design variables is also drawn in Fig. 12 as comparison.Obviously the Pareto front should be approximately the tangent line at the lower left corner of the scatter plot.From the figure, it could be observed that the RBFNN with exact interpolation has the best performance.The multilayer network trained with the LM algorithm has a similar performance.The RBFNN with nonexact interpolation is acceptable, except near the leftmost of the Pareto front.The performance of the multilayer network trained with the basic back propagation algorithm is poor compared to the other networks, since the solved solutions already enter the region of the scatter points corresponding to the data on the discrete grid of the design variables.This is normal because with this network, the performance of approximation is poor with the selected training parameters.

Conclusions
In this work, we explored the artificial neural networks as metamodels of an EMAT model for the purpose of optimizations.
We are mostly interested in two kinds of neural networks: the multilayer feedforward networks (trained with the basic back propagation algorithm and the LM algorithm) and the radial basis function networks (with exact and nonexact interpolations).The basic structures and the training algorithms  of these networks are reviewed, and the programs we have developed for this work are tested on an example problem.
Then the formulations of an axisymmetric EMAT were given, and the model of an omnidirectional EMAT for the generation of Lamb waves on a linearized steel plate was introduced; only the Lorentz force was considered.The axisymmetric model was divided into two geometries and three submodels.Radial and axial displacement components, corresponding to the S0-mode and the A0-mode Lamb waves respectively, of an observation point at the middle plane of the plate were recorded to solve the two objective functions, one defined as the ratio of the amplitude of the S0 mode to that of the A0 mode, and the other defined as the negative amplitude of the A0 mode.
Next, three different approaches to calculate the amplitudes of the displacement components were discussed.The first one is solving the peaks of the envelopes of the time waveforms from time-domain simulations, which involves small time steps and is very time-consuming.The second one is transforming the input excitation signal into its spectrum and feeding the spectrum to the frequency model, then transforming the modified spectrum back to the time domain and solving the peaks of the envelopes of the time waveforms.The third one is using the magnitudes of the phasors of the displacement components directly, i.e., using a single frequency in the frequency model.The waveforms from the first and second approaches were compared.The reason why the third approach could be applied as an approximation when evaluating the amplitudes was also explained.
With the data on a discrete grid of the design variables as the reference, A u /A w and A w surfaces were solved for the EMAT predicted with the neural networks.It could be observed that with the multilayer network trained with the basic BP algorithm, the A u /A w is smoother than it should be in the valley.With the multilayer network trained with the LM algorithm, the predicted surfaces were close to the reference surfaces.With the RBFNN with exact interpolation, the performance of approximation is good while with nonexact interpolation, the value of A u /A w is again smoother in the valley.For the multilayer networks, much work has to be done to tune the parameters of the network, i.e., the number of neurons in the hidden layer, and the number of iterations.The total time required to obtain the approximation with the RBFNN with exact interpolation is also much less than that with the multilayer networks, especially when the basic BP algorithm is applied.
Finally the obtained neural-network models, as approximations of the objective functions with the multifrequency approach, are applied in the multiobjective optimizations.The first objective is the ratio of the amplitude of the u waveform (S0 mode) to that of the w waveform (A0 mode), and the second objective is the negative amplitude of the w waveform.The multiobjective optimizations are accomplished with a MOGA program we have developed specifically for optimizations of EMATs.With the neural-network metamodels, we can use the more time-consuming multifrequency model, instead of using the approximate singlefrequency model.From the results of the multiobjective optimizations, the performance of the RBFNN with exact interpolation is the best, next comes the multilayer network trained with the LM algorithm, then the RBFNN with nonexact interpolation.The performance of the multilayer network trained with the basic back propagation algorithm is the poorest.
Besides the work described above, the authors would like to suggest some topics to explore further, as potential future studies for interested readers.Some examples are as follows: 1.In this work implemented with our own code, the two objective functions are solved with their respective networks (two networks for two objective functions).What if we build all the neural networks with two outputs in the output layer directly?
2. In this work the parameters of the networks are mainly found by trial and error.Can we incorporate a systematic way to find the best parameters for the neural networks automatically?
3. Implementation of multiobjective optimizations of the EMATs with constraints.Anyway in applications, we will often meet with complex constraints, instead of the simple upper and lower bounds studied in this work.
All the listed topics demand modifications to the neuralnetwork and/or the MOGA programs.

Figure 2 .
Figure 2. Test function and the approximations.

Figure 4 .
Figure 4. Exact and nonexact interpolations of a sine signal with noise.

Figure 6 .
Figure 6.The u and w waveforms from time-dependent simulation and frequency-domain model with FFT/IFFT processing.

Figure 7 .Figure 8 .
Figure 7.A /A w and A surfaces from multifrequency model with FFT/IFFT processing.

Figure 9 .Figure 10 .
Figure 9.A u /A w and A w surfaces predicted with the MLP with LM algorithm.

Figure 11 .
Figure 11.A u /A w and A w surfaces predicted with the RBFNN with nonexact interpolation.

Figure 12 .
Figure 12.Pareto fronts solved with the MOGA and the neuralnetwork metamodels and scatter plot of the objective function values solved on the discrete grid of the design variables.
Considering this, we want to approximate the function or forward model and substitute the approximation for the real forward model in the optimizations, where a great number of evaluations of the function are required.
Then this forward model is used in the optimizations in an iterative way.The forward model is a black-box function mapping the inputs (design variables in the optimizations) to the outputs (objective functions in the optimizations).Evaluations of the black-box function are of-ten computationally expensive.

Table 1 .
Total time of optimization and number of evaluations of the objective functions.