AI-assisted reliability life prediction model for wafer-level packaging using the random forest method


We applied an artificial intelligence (AI) algorithm, the random forest (RF) model, together with finite element simulations to predict the reliability life of wafer-level packaging (WLP). Due to the rapid growth and increasingly fast design cycles of integrated circuits, it is imperative to shorten the development time of electronic packaging. This study focuses on packaging reliability analysis and prediction. In recent years, package reliability analysis has been performed using finite element method simulations, which reduce the required number of accelerated thermal cyclic tests. Compared with conventional ball grid array-type packaging, WLP has become the mainstream packaging type due to its small form factor, batch-type manufacturing process and low cost. We applied the RF model, a machine learning algorithm, to predict the reliability life of WLP. The finite element procedure, theory and mesh size were validated by a set of experiments, and a large dataset was generated for AI training purposes through the finite element simulations. The RF method was built using Python®. A fast and robust AI model for WLP reliability assessment can be achieved once the AI training accuracy falls within the target range; the designer only needs to input the geometry of each WLP component to obtain the reliability life cycle. WLP structural optimization can thus be easily achieved. The AI model also significantly shortens the design cycle to meet current design demands.


INTRODUCTION
Electronic packaging plays an important role in the development of the semiconductor industry and the rapid growth of integrated circuits. Wafer-level packaging (WLP) has become the mainstream packaging type in recent years due to its small form factor, batch manufacturing process and relatively low cost compared to conventional ball grid array-type packaging. The majority of the semiconductor industry currently uses accelerated temperature cyclic tests (ATCTs) to evaluate packaging reliability. Most damage to the electronic package is due to a thermal expansion coefficient mismatch between the chip and substrate that causes excessive thermal stress/strain on the solder ball. The completion of a single set of ATCTs is time consuming and costly; thus, finite element method (FEM) simulations [2] have been used to minimize the number of ATCTs required for reliability analysis of electronic packaging, reducing both time and costs. However, simulation analysis requires a large amount of memory storage space and high-performance hardware, and the computation time increases significantly with FEM model complexity. To avoid these shortcomings, we propose an artificial intelligence (AI)-assisted design-on-simulation methodology for the reliability assessment of electronic packaging. We apply a machine learning algorithm, the random forest (RF) method, for the reliability life prediction of WLP. Real-time reliability assessment of electronic packaging can thus be achieved to meet time-to-market demand. Machine learning can be used to determine the correlation between the inputs and results of a dataset and build a final trained model to predict the outcome of any unknown input.
Data processing tends to differ between AI algorithms [3], and choosing a suitable algorithm is the most important step in the initial stage of machine learning. The RF algorithm was introduced by Breiman [5], who combined classification trees [4] into RFs by randomizing the variables (columns) and data (rows) used to generate multiple decision trees, and then aggregating all of the decision tree results. RF is known for its multivariable prediction accuracy without excessive computational effort. Furthermore, RF is less prone to overfitting and performs relatively stably when there are missing values or non-equilibrium data. The RF model is regarded as one of the best algorithms available today [6].

MACHINE LEARNING ALGORITHM
There are three types of machine learning: supervised learning, unsupervised learning and reinforcement learning. Because the input samples are labeled, supervised learning algorithms are suitable for creating an electronic package lifetime prediction model.
Supervised learning is divided into two categories: regression and classification. Regression is used to predict continuous data; classification is used to predict discrete data. Because the lifetime values of packaging are continuous data, algorithms that perform regression well are more suitable for this research.

ANN and random forest
Artificial neural networks (ANNs) were inspired by the biological neural networks that constitute animal brains. An ANN consists of three layers: an input layer, a hidden layer and an output layer, as shown in Fig. 1. An ANN is based on a collection of connected units or nodes called neurons. Each neuron can receive a signal from the input layer or another neuron, process it and then send the signal to other connected neurons. Neurons typically have weights and activation functions, and they adjust as learning progresses, as shown in Fig. 2.
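The weighted-sum-plus-activation behavior of a single neuron described above can be sketched in a few lines of Python; the sigmoid activation and the numeric weights here are illustrative choices, not values taken from the paper:

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, passed through
    a sigmoid activation (a common choice; the paper does not specify one)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))

# Hypothetical two-input neuron; learning would adjust weights and bias.
print(round(neuron([1.0, 2.0], [0.5, -0.25], 0.1), 4))  # → 0.525
```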

CART decision tree
A classification and regression tree (CART) is a binary tree that performs either regression or classification learning, as shown in Fig. 3. Each root node represents an input variable (x) and a split point on that variable. The leaf nodes of the tree contain an output variable (y), which is used to make a prediction. A CART follows a forward-growing and backward-pruning process to arrive at the optimal tree. Different approaches are used to split the regression tree and the classification tree. The common method for building a regression model is to minimize the least-squares error criterion when choosing the optimal cutting point (j, s), where j is the feature index and s is the split value of feature j. The input space is divided into M regions, R1, R2, . . . , RM, and the average output value in each region serves as the regression model output for that region.
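The least-squares choice of the cutting point (j, s) described above can be sketched as an exhaustive search over features and candidate split values; this is a minimal illustration, not the paper's implementation:

```python
import numpy as np

def best_split(X, y):
    """Exhaustively search for the cutting point (j, s) that minimizes the
    summed squared error of the two resulting regions, each of which is
    predicted by its mean output value."""
    best_j, best_s, best_err = None, None, np.inf
    for j in range(X.shape[1]):            # feature index j
        for s in np.unique(X[:, j]):       # candidate split value s
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            err = ((left - left.mean()) ** 2).sum() + \
                  ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_j, best_s, best_err = j, s, err
    return best_j, best_s

# Toy data with an obvious break between x <= 2 and x > 2.
X = np.array([[1.0], [2.0], [10.0], [11.0]])
y = np.array([1.0, 1.0, 5.0, 5.0])
print(best_split(X, y))
```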
After the growing process, the output contained in each leaf node is pure. This purity allows the model to perform well on training samples but judge unknown samples poorly, a situation known as "overfitting," which can be corrected by pruning. There are generally two pruning methods: pre-pruning and post-pruning. Pre-pruning sets stopping criteria during the growing process to control the tree depth; post-pruning grows the tree fully in its entirety and then trims the tree nodes in a bottom-up approach. Many techniques are available for post-pruning, the most common of which is cost-complexity pruning:
Cα(T) = C(T) + α|T|, (1)
where C(T) is the training/learning error, α is a regularization parameter and |T| is the number of leaf nodes of tree T.
The new data or cross-validation is then used to calculate the least-squares error of each tree to select the best CART regression model.
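A toy numerical comparison shows how the cost-complexity criterion trades training error against tree size; the error values and leaf counts below are hypothetical, chosen only to illustrate the criterion:

```python
def cost_complexity(training_error, n_leaves, alpha):
    """C_alpha(T) = C(T) + alpha * |T|: training error plus a size penalty."""
    return training_error + alpha * n_leaves

# Hypothetical numbers: the fully grown tree fits the training data
# perfectly but has many leaves; the pruned subtree trades a little
# training error for a much smaller size.
full_tree   = cost_complexity(training_error=0.0, n_leaves=40, alpha=0.05)
pruned_tree = cost_complexity(training_error=0.8, n_leaves=8,  alpha=0.05)
print(round(full_tree, 2), round(pruned_tree, 2))  # → 2.0 1.2
```

At this α the pruned subtree has the lower cost, so cost-complexity pruning would prefer it; at α = 0 the fully grown tree always wins, which is why the regularization parameter matters.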

Random forest
An RF [5] is a collection of decision trees {h(x, Θk), k = 1, 2, . . .}, where the Θk are independent, identically distributed random vectors. These tree models are usually composed of unpruned CARTs. Each CART randomly selects one or more features from all of the input features as split criteria, so each CART in the forest is overfit to its chosen features. The RF growing process is as follows:
1. Training set preparation: The bagging method [7] is used to build a training subset for each tree. Because of bagging, there is a certain amount of repetition in the training subsets, which helps avoid local optimal solutions.
2. Branching of a regression CART: The bagged training subsets generate numerous unpruned CARTs. Feature branches for each tree are randomly selected (without repetition) to participate in the splitting, and the best cut is chosen by the least-squares error. Randomly selecting features reduces the correlation between the trees and improves the classification accuracy of each tree, which improves the performance of the RF.
3. Establishment of a forest: An RF is composed of numerous unpruned CARTs. For the RF regression model, the prediction is the average of the output values of all trees.
4. Model prediction efficiency: Breiman [5] showed that as the number of CARTs in the forest increases, the generalization error PE* of the RF converges to a limiting value (Eq. (2)); the RF algorithm is therefore not prone to overfitting as the number of trees grows. PE* represents the classification error rate of the trained model and is bounded by PE* ≤ ρ(1 − S²)/S² (Eq. (3)), where ρ is the mean correlation between trees and S is the classification strength of each tree. Reducing the correlation ρ between trees or increasing the strength S of each tree lowers this bound and thus improves the prediction accuracy of the RF.
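The growing process above maps directly onto scikit-learn's RandomForestRegressor (the paper only states that Python® was used, so the library choice and the synthetic data here are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (300, 4))                     # four geometry-like features
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(0, 0.05, 300)

# Step 1: bootstrap=True bags a training subset for each tree.
# Step 2: max_features limits the features considered at each split.
# Step 3: the forest's regression output is the average over all trees.
rf = RandomForestRegressor(n_estimators=50, max_features="sqrt",
                           bootstrap=True, random_state=0).fit(X, y)
print(rf.predict(X[:1]))
```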

SOLDER BALL RELIABILITY PREDICTION
We used the Coffin-Manson strain-based model [8] as the life prediction model. The Coffin-Manson strain-based model is widely used to estimate solder ball fatigue life, and the empirical Coffin-Manson strain model is used as in Eq. (4):
Nf = C(Δεp)^η, (4)
where Nf is the mean number of cycles to failure, Δεp is the equivalent plastic strain range per cycle, and C and η are empirical constants, which for SAC305 material are 0.235 and −1.75, respectively.
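With the SAC305 constants given above and the standard empirical Coffin-Manson form, the life prediction reduces to a one-line computation (only the constants come from the paper; the example strain value is illustrative):

```python
def coffin_manson_life(delta_eps_p, C=0.235, eta=-1.75):
    """Mean cycles to failure N_f = C * (delta_eps_p)**eta, with the
    SAC305 constants from the paper as defaults."""
    return C * delta_eps_p ** eta

# A larger equivalent plastic strain range gives a shorter fatigue life.
print(round(coffin_manson_life(0.01), 1))  # → 743.1
```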

PURPOSE OF THE WLP FEM MODEL
Reliability testing data are not easy to obtain because of the time-consuming procedure and high costs of TCT experiments and fabrication. To obtain a reliability assessment model that correctly predicts WLP with different dimensions, we built an FEM model of WLP using ANSYS®. The lifetime predictions of several FEM models were then validated against the TCT experimental results. A training dataset was then created following the verified finite element procedure, theory and mesh size. The RF model was coded in Python®.

Finite element modeling of WLP
The WLP FEM model was established in ANSYS®. The model was built in 2D on a diagonal plane (Fig. 4), following the symmetric characteristics of WLP, under a plane strain assumption. As shown in Fig. 7, the components within the model include the silicon chip, stress buffer layer, RDL (redistribution layer), UBM (under bump metallurgy), copper pad and solder ball. The solder ball profile was predicted using Surface Evolver [9], as shown in Fig. 5. We selected the PLANE182 element for the solder ball because it handles large deformation well; the PLANE42 element was used for the other components, which were assumed to be linearly elastic. The temperature-dependent Young's modulus is shown in Fig. 6 [10]. All of the material properties are listed in Table 1, together with the packaging size parameters.
All of the WLP materials are assumed to be isotropic and linearly elastic except for the solder joint, which is considered to be temperature dependent and nonlinear with thermal loading ranging from −40 to 125°C, which follows the JEDEC standard (JESD22-A104-B, condition G) [11]. The boundary conditions are shown in Fig. 8.

Mesh size control
Critical mesh size control is a key point when using an FEM model to verify packaging reliability [12][13][14]. The critical mesh size was measured as the average distance between the four nodes of the critical element, located at the corner of the solder joints. In this study, the suitable mesh size for the Coffin-Manson strain method was 12.5 μm in width and 7.5 μm in height, as shown in Fig. 14.

Validation of FEM model
We used ANSYS® to create a 2D WLP model with the solder ball material fixed as SAC305. After mesh size control, the FEM model correctly predicts the packaging lifetime, which means that this verified FEM model can be used to generate a reasonable machine learning dataset. We did not apply creep analysis to assess plastic strain in the adopted finite element analysis because the thermal cycling ramp rates were very similar across our five test cases. The maximum equivalent plastic strain was saturated after eight cycles and was therefore used for the reliability assessment and substituted into the Coffin-Manson equation (Eq. (4)) to predict the fatigue life of the solder ball, with the fatigue ductility coefficient C = 0.235 and the fatigue ductility exponent η = −1.75 for SAC305 [12]. Five test vehicles (TV1-TV5) were verified using the same solution procedure, theory and mesh size, and correspond to the five verified FEM models shown in Figs 9-13. The size information of these five test vehicles is given in Table 3, and a comparison of experimental and predicted values is listed in Table 2.
The maximum stress location of the simulations is the same as the experimental results (Fig. 15); thus, the FEM model can be trusted to make predictions.

Machine learning dataset
Many packaging geometry parameters (e.g. upper pad diameter, lower pad diameter, chip thickness, ball pitch, ball diameter, buffer layer thickness) are known to affect WLP lifetime, as shown in Fig. 16. We built a training dataset with reference to the TV2 geometry. The premise assumptions of the prediction are as follows: (1) electronic packaging process differences are not considered; (2) the package type is WLP; (3) the solder ball material is SAC305; and (4) only plastic strain is considered in the packaging. We chose the four most important impact factors as the feature vector in the training dataset (Table 4): the upper pad diameter, lower pad diameter, chip thickness and stress buffer layer thickness. There were a total of 913 data points in the training dataset and 15 interpolated (non-repeating) data points for testing, as shown in Table 5.
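Assembling such a training dataset amounts to sweeping the four geometry features and recording the simulated lifetime for each combination. A minimal sketch follows; the parameter values and the file name `wlp_dataset.csv` are hypothetical, and the lifetime column would be filled by the validated ANSYS runs:

```python
import csv
import itertools

# Hypothetical parameter sweep around the TV2 geometry (values in um are
# illustrative, not the paper's actual design grid).
upper_pad  = [190, 210, 230]
lower_pad  = [190, 210, 230]
chip_thick = [300, 350, 400]
buffer_th  = [5, 10, 15]

with open("wlp_dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["upper_pad", "lower_pad", "chip_thickness",
                     "buffer_thickness", "lifetime_cycles"])
    for combo in itertools.product(upper_pad, lower_pad, chip_thick, buffer_th):
        # lifetime_cycles would come from the validated FEM run for this
        # geometry; left blank here.
        writer.writerow(list(combo) + [""])
```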

RANDOM FOREST RELIABILITY PREDICTION MODEL
Random forest model
The aim of this study is to apply machine learning to WLP reliability analysis. Figure 17 shows a flow chart of the training procedure and data preprocessing, which includes filling in missing values and data standardization. In the process of building the RF model, parameter estimation was achieved using a grid search with cross-validation.
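A grid search with cross-validation over the two tuned parameters can be sketched with scikit-learn (an assumed library choice; the candidate values and synthetic data here are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (120, 4))
y = X.sum(axis=1) + rng.normal(0, 0.05, 120)

# Exhaustive grid over the tree count and the random initial value,
# scored by 5-fold cross-validation.
grid = GridSearchCV(
    RandomForestRegressor(),
    param_grid={"n_estimators": [50, 100, 160],
                "random_state": [0, 42, 132]},
    cv=5)
grid.fit(X, y)
print(grid.best_params_)
```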

Data preprocessing
The creation of a useful training dataset is important for predictive modeling. Each algorithm suits different data preprocessing methods. We used two data preprocessing methods: filling the missing values and standardization. The RF algorithm is not prone to overfitting due to its interior randomness [7]. However, RF uses the feature vectors to continuously segment the training dataset; thus, a more complete training dataset is required for good predictive performance. That is why the missing values must be filled, and standardization helps the RF optimize the training and testing results:
z(i) = (x(i) − μx)/σx, (7)
where x(i) is the value being standardized, μx is the mean of the distribution and σx is the standard deviation of the distribution. As shown in Eq. (7), a random variable is standardized by subtracting the distribution mean from the value and then dividing this difference by the standard deviation.
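Eq. (7) is the usual z-score transform, which can be written in a few lines of standard-library Python:

```python
import statistics

def standardize(values):
    """z-score: subtract the distribution mean and divide by its
    (population) standard deviation, as in Eq. (7)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

# e.g. pad diameters in um; the output has zero mean and unit variance.
print(standardize([190, 210, 230]))
```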

Model performance
The optimized output model shows that the RF model fits the training dataset well. Predicting the test dataset using the same RF model also yields good results, with a prediction error of <3%. The lifetime differences of the testing data are shown in Table 6.
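The reported <3% figure is a relative lifetime difference between the RF prediction and the FEM simulation, which can be computed as follows (a hypothetical helper, not the paper's code):

```python
def prediction_error_pct(predicted, simulated):
    """Relative difference between the RF-predicted and FEM-simulated
    lifetimes, in percent."""
    return abs(predicted - simulated) / simulated * 100.0

# e.g. a prediction of 970 cycles against a simulated 1000 cycles:
print(round(prediction_error_pct(970, 1000), 2))  # → 3.0
```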

Characteristics of the random forest
Model parameters
RF is a fast, flexible and simple algorithm. In this study, two parameters had to be tuned during the training process: n_estimators, the number of trees in the forest, and the random initial value, which determines the random selection behavior. Figure 18 illustrates that the prediction accuracy on the testing data increases with increasing tree number and saturates when n_estimators is ∼50. The graph also shows that the RF algorithm does not overfit. Figure 19 shows that the effect of the random initial value is irregular; thus, a grid search is required to obtain the best combination of parameters. In this case, the best combination is an n_estimators of 160 with a random initial value of 132.

Data preprocessing effect
Previous studies [5,15] have mentioned that RFs do not require data preprocessing. However, the results of this study confirm that two preprocessing methods are effective for RFs, namely, filling missing values and standardization.
Filling missing values can complement the lack of information in the training dataset to provide a more complete learning environment for machine learning, which improves the model prediction accuracy. In this study, there were 576 data points in the original training dataset. Poor training effects inferred that the number of feature vectors was insufficient. The model was therefore optimized by filling missing values, as shown in Table 7.
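Mean imputation is one simple way to fill missing values; the paper does not state which filling strategy was used, so the example below is an assumption:

```python
import statistics

def fill_missing(column):
    """Replace None entries with the column mean (a hypothetical filling
    strategy; the paper does not specify which one was applied)."""
    present = [v for v in column if v is not None]
    mean = statistics.fmean(present)
    return [mean if v is None else v for v in column]

# e.g. lifetime values in cycles with one missing entry:
print(fill_missing([939, None, 592, 1265]))  # → [939, 932.0, 592, 1265]
```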
Data standardization leads to a consistent standard deviation of the feature vectors. In this study, the standardized training dataset reduced the local error of the testing data, such as test point no. 6 in Table 8. Standardization has a positive optimizing effect on the RF regression model. However, the RF algorithm still offers good predictive performance even without standardization, which is one of the advantages of this approach.
The predictive performance of the RF algorithm can be attributed to three key factors: the number of training data points (filling of missing values), the number of internal decision trees and the decision tree classification (number of feature vectors). In addition to the nature of the algorithm itself, data preprocessing can also affect the performance of the prediction model. Filling missing values is the primary consideration when using RFs to handle regression problems and is the main factor for changing the predictive model performance.
Figure 17 Flow chart of the training process.

CONCLUSIONS
We used an RF algorithm to successfully predict the life of an electronic package. The following recommendations are provided for handling regression problems with the RF algorithm:
1. The training dataset must provide sufficient information for decision tree branching.
2. The number of decision trees in the forest should not be too small (>50 trees).

3. Standardization helps improve predictive performance and reduce local errors.
4. Overfitting does not readily occur with the RF algorithm.
Figure 19 Relationship between random initial values and model prediction accuracy.
The RF algorithm is logically intuitive, has few model parameters and is easy to operate. The model training time is not strongly affected by the amount of data, although good predictive performance requires a sufficiently large training dataset.
FEM analyses may produce different results due to discrepancies in domain knowledge, the maturity of the theoretical background and the mastery of modeling technique, which can create serious problems for packaging design. This study applies the RF algorithm and a large empirical dataset to predict WLP electronic package lifetimes. The proposed AI-assisted design methodology consistently obtains reliable results for WLP reliability prediction. Its advantage is that the computer can make fast and accurate predictions on its own, which reduces the cost of reliability tests and the hardware required. The RF algorithm is a good choice given the numerous factors that affect packaging reliability.
Table 8 Prediction results before and after standardization (with missing values filled).

Simulation lifetime (cycles) | Without standardization | With standardization