
Deep Learning: Generative Adversarial Networks (GAN)


Contents

Background (why?)

Concept

Adversarial Attacks

Model Architecture

Regularization Techniques

References

Background (why?)

In classification tasks, training machine learning and deep learning models requires large amounts of real-world data. In some cases, it is difficult to obtain a sufficient amount of real data, or the time and human resources that can be invested are simply limited.

Concept

Generative adversarial networks were first proposed by Goodfellow et al. and are now widely used in many fields such as computer vision (CV) and image processing.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672–2680.

GANs were proposed in 2014 and have become increasingly active in recent years. They are mainly used for data augmentation, addressing the problem of how to generate artificial, natural-looking samples through an implicit generative model that mimics real-world data, so that the number of training data samples can be increased [122].

K. G. Hartmann, R. T. Schirrmeister, and T. Ball, "EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals," arXiv preprint arXiv:1806.01875, 2018.

Adversarial Attacks

Despite their excellent performance, deep learning models are vulnerable to adversarial attacks, in which carefully designed small perturbations (often hard for the human eye or a computer program to detect) are added to benign examples, misleading the deep learning model and causing a drastic drop in performance. This phenomenon was first discovered in computer vision [159] and quickly attracted widespread attention [160][161][162].
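
As a concrete illustration of such a perturbation, the following is a minimal sketch of the well-known Fast Gradient Sign Method (not code from the cited papers); a PyTorch model, a differentiable loss, and labelled inputs are assumed:

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.01):
    # Fast Gradient Sign Method: add a small perturbation epsilon * sign(grad)
    # in the direction that increases the loss on the true label y, which can
    # mislead the model while remaining hard to notice for small epsilon.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```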

Zhang and Wu [164] were the first to study adversarial attacks in EEG-based BCIs. They considered three different attack scenarios:

1) White-box attacks, where the attacker has access to all information of the target model, including its architecture and parameters;

2) Black-box attacks, where the attacker can observe the target model’s responses to inputs;

3) Gray-box attacks, where the attacker knows some but not all information about the target model, e.g., the training data that the target model is tuned on, instead of its architecture and parameters.

They showed that three popular CNN models in EEG-based BCIs, i.e., EEGNet [165], DeepCNN and ShallowCNN [166], can all be effectively attacked in all three scenarios.

References:

[159] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.

[164] X. Zhang and D. Wu, "On the vulnerability of CNN classifiers in EEG-based BCIs," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 27, no. 5, pp. 814–825, 2019.

Model Architecture

A GAN consists of two neural networks, a generator and a discriminator, that are trained simultaneously and compete with each other in a zero-sum game framework. The generative network learns to map from a latent space to a data distribution of interest, while the discriminator network distinguishes candidates produced by the generator from the true data distribution. The generative network's training objective is to increase the error rate of the discriminator network, i.e., to "fool" the discriminator by producing novel candidates that the discriminator identifies as not synthesized (i.e., as part of the true data distribution) [9]. Specifically, the framework includes two models trained simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data. Through this adversarial process, the generative model is iteratively refined so that it better fits the real data distribution. The objective function for jointly training the two networks is:
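
The formula itself is not reproduced on this page; for reference, the standard minimax objective of Goodfellow et al. is:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$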

The overall framework is shown in Fig. 7.

Generator network: tries to fool the discriminator by generating real-looking samples. Specifically, the generator (θg) wants to minimize the objective so that D(G(z)) is close to 1, i.e., the discriminator is fooled into thinking the generated G(z) is real.

Discriminator network: tries to distinguish real samples from generated ones. Specifically, the discriminator (θd) wants to maximize the objective so that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake).
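
A minimal sketch of one alternating training step, assuming PyTorch networks G and D (with D ending in a sigmoid so its output is a probability), one optimizer per network, and a batch of real samples; following common practice, the generator update uses the non-saturating loss (maximizing log D(G(z))) rather than minimizing log(1 − D(G(z))) directly:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_step(G, D, opt_G, opt_D, real, latent_dim=100):
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator update: push D(x) toward 1 and D(G(z)) toward 0.
    z = torch.randn(batch, latent_dim)
    fake = G(z).detach()                      # do not backpropagate into G here
    loss_D = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator update: "fool" D by pushing D(G(z)) toward 1.
    z = torch.randn(batch, latent_dim)
    loss_G = bce(D(G(z)), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```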

The goal of training these two networks is to generate a collection of samples from the pre-trained generator and to use those samples for other tasks (e.g., classification).
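
For example, a trained generator could be used to enlarge a labelled training set roughly as follows (a sketch with illustrative names and sizes; it assumes G was trained on samples of a single class, so every synthetic sample receives that class label):

```python
import torch

def augment_with_gan(G, real_x, real_y, n_synth=256, latent_dim=100, synth_label=1):
    # Draw synthetic samples from the trained generator and append them,
    # together with the assumed class label, to the real training data.
    with torch.no_grad():
        z = torch.randn(n_synth, latent_dim)   # latent vectors
        synthetic_x = G(z)                      # generated samples
    synthetic_y = torch.full((n_synth,), synth_label, dtype=real_y.dtype)
    return torch.cat([real_x, synthetic_x]), torch.cat([real_y, synthetic_y])
```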

Regularization Techniques

To address the overfitting problem, we introduce additional information into the network using three suitable regularization techniques, i.e., L1 regularization, L2 regularization, and dropout.

1) L1 regularization of the network. It is known that even though the number of possible interactions among configuration options is exponential, a very large portion of the potential interactions has no influence on the performance of software systems [16]. This means that only a small number of parameters have a significant impact on the model; in other words, the parameters of the neural network could be sparse. L1 regularization implements feature selection by assigning zero weight to insignificant input features and non-zero weight to useful features, so we can use L1 regularization to encourage this sparsity. As shown in Eq. 6, the idea of L1 regularization is to add an L1 penalty on the parameters of every hidden layer.
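
Since the paper's Eq. 6 is not reproduced here, the following sketch only illustrates the general idea of adding an L1 penalty on the network parameters to the task loss (the model, loss criterion, and coefficient lambda_l1 are placeholders, not values from the paper):

```python
import torch

def loss_with_l1(model, criterion, inputs, targets, lambda_l1=1e-4):
    # Task loss plus an L1 penalty over all parameters; the penalty drives
    # unimportant weights toward exactly zero, encouraging sparsity.
    task_loss = criterion(model(inputs), targets)
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    return task_loss + lambda_l1 * l1_penalty
```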

2) L2 regularization of the network. Although L2 regularization does not produce sparsity, it forces the weights to be small. L2 regularization can reflect how different features affect the output by assigning a different weight to each feature. It is arguably the most popular technique in machine learning for combating overfitting. In our model, we apply L2 regularization to the parameters of every hidden layer. The formula is given in Eq. 6.
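
In PyTorch-style code, an L2 penalty is typically applied through the optimizer's weight_decay argument; the layer sizes and coefficient below are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

# weight_decay adds an L2 penalty on the weights to the objective,
# which keeps the weights small without making them exactly zero.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```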

3) Dropout technique of the network. A deep neural network for software systems with a large number of parameters is computationally heavy and can suffer from serious overfitting. Dropout is another technique for addressing this problem: during training it randomly drops units (along with their connections) from the neural network, which prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, the effect of averaging the predictions of all these thinned networks can be approximated simply by using a single unthinned network with smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods [29]. With this benefit, we apply the dropout technique in every hidden layer.
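
A small sketch of dropout applied after every hidden layer, as described above (layer sizes and the dropout rate of 0.5 are illustrative, not the paper's configuration):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.5),   # drop units at random during training
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 1),
)
# model.train() enables dropout; model.eval() disables it at test time, which
# approximates averaging the predictions of the exponentially many thinned networks.
```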

After applying these three regularization techniques and conducting experiments, we select the best one for our network. According to our experiments (described in Section 4.6), L2 regularization performs best among the three regularization techniques. Therefore, we choose L2 regularization in our PERF-AL network.

References

Yangyang Shu, Yulei Sui, Hongyu Zhang, and Guandong Xu. 2020. Perf-AL: Performance Prediction for Configurable Software through Adversarial Learning. In ESEM '20: ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), October 8–9, 2020, Bari, Italy. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3382494.3410677
