Fast Image Style Transfer (Part 1): The Image Generation Network

2018-08-19

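An earlier post, "[Paper Notes] Image Style Transfer" (【paper笔记】图像风格迁移), outlined how image style transfer works. This series walks through reproducing it step by step with TensorFlow as the deep learning framework. The first piece to implement is the image generation network.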

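1. TensorFlow Prerequisites

1.1 `tf.pad(tensor, paddings, mode='CONSTANT', constant_values=0)`

Pads a tensor according to the given mode. Take a 2-D tensor as an example: the paddings `[[0, 1], [1, 2]]` add 0 rows before and 1 row after along dimension 0, and 1 column before and 2 columns after along dimension 1.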

import tensorflow as tf  # the examples in this post use the TensorFlow 1.x API

x = tf.ones(shape=[3, 2])
# [[1. 1.]
#  [1. 1.]
#  [1. 1.]]
x_padded = tf.pad(x, [[0, 1], [1, 2]])
# [[0. 1. 1. 0. 0.]
#  [0. 1. 1. 0. 0.]
#  [0. 1. 1. 0. 0.]
#  [0. 0. 0. 0. 0.]]
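The network code later in this post pads with mode='REFLECT' rather than with zeros. As a rough illustration of what that mode does along a single axis, here is a minimal pure-Python sketch. `reflect_pad_1d` is a hypothetical helper written for this explanation, not a TensorFlow function. It mirrors the values around the first and last elements without repeating the edge value itself.

```python
def reflect_pad_1d(row, before, after):
    """Mirror `row` around its first and last elements (edge value not repeated)."""
    n = len(row)
    if before >= n or after >= n:
        raise ValueError("reflect padding must be smaller than the row length")
    left = [row[i] for i in range(before, 0, -1)]   # row[before], ..., row[1]
    right = [row[n - 2 - i] for i in range(after)]  # row[n-2], row[n-3], ...
    return left + list(row) + right


padded = reflect_pad_1d([1, 2, 3, 4], before=2, after=1)
print(padded)  # [3, 2, 1, 2, 3, 4, 3]
```

Compared with zero padding, the border is filled with plausible pixel values, which is why the convolution layers below use it.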

1.2 `tf.where(condition, x=None, y=None)`

Works like np.where(): picks elements from x or y depending on condition. For example, it can be used to replace NaN values in a tensor with 0.

import numpy as np
x = tf.constant([1, 2, np.nan])
y = tf.where(tf.equal(x, x), x, tf.zeros_like(x))  # NaN is not equal to itself
# [1. 2. 0.]

1.3 `tf.nn.moments(x, axes, shift=None, name=None, keep_dims=False)`

Computes the mean and variance of x over the given axes.

x = tf.constant([1., 2., 3., 4., 5., 6.], shape=[3, 2])
# [[1. 2.]
#  [3. 4.]
#  [5. 6.]]
mean, var = tf.nn.moments(x, [0, 1])
# mean, var = 3.5, 2.9166667
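The numbers above can be checked by hand. The short sketch below recomputes them in plain Python and shows that the variance is the population variance (divide by N, not N - 1). The variable names are only for illustration.

```python
values = [1., 2., 3., 4., 5., 6.]

# Mean over all six elements (axes [0, 1] covers the whole 3x2 tensor)
mean = sum(values) / len(values)

# Population variance: average squared distance from the mean
var = sum((v - mean) ** 2 for v in values) / len(values)

print(mean, round(var, 7))  # 3.5 2.9166667
```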

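1.4 `tf.stack(values, axis=0)`

Stacks a list of tensors along the given axis. The result has one more dimension than each element. If N elements each have shape (A, B, C), then axis == 0 gives a result of shape (N, A, B, C), and axis == 1 gives (A, N, B, C).

# Each element has shape (2,), N = 3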
x = tf.constant([1, 4])
y = tf.constant([2, 5])
z = tf.constant([3, 6])
stack0 = tf.stack([x, y, z], axis=0)
# [[1 4]
#  [2 5]
#  [3 6]]
stack1 = tf.stack([x, y, z], axis=1)
# [[1 2 3]
#  [4 5 6]]
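The shape rule can be stated as "insert N at position axis". A small sketch of that rule, using a hypothetical helper `stacked_shape` that only manipulates shape tuples and does not call TensorFlow:

```python
def stacked_shape(element_shape, n, axis):
    """Shape of the result when n tensors of element_shape are stacked along axis."""
    shape = list(element_shape)
    shape.insert(axis, n)  # the new dimension of length n goes at position `axis`
    return tuple(shape)


print(stacked_shape((2,), 3, axis=0))       # (3, 2)
print(stacked_shape((2,), 3, axis=1))       # (2, 3)
print(stacked_shape((4, 5, 6), 3, axis=1))  # (4, 3, 5, 6)
```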

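1.5 `tf.slice(input_, begin, size)`

Extracts a slice from a tensor. input_ is the tensor to slice, begin gives the starting index in each dimension, and size gives the number of elements to take in each dimension. Both begin and size must have one entry per dimension of input_.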

x = tf.constant(np.arange(9), shape=[3, 3])
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
tf.slice(x, [1, 0], [2, 3])
# [[3 4 5]
#  [6 7 8]]
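In terms of ordinary Python slicing, each dimension is cut as `begin : begin + size`. A size of -1 means "everything that remains in this dimension", which the network code at the end of this post relies on. The sketch below mimics this for nested lists. `slice_2d` is a hypothetical helper for illustration only.

```python
def slice_2d(rows, begin, size):
    """Mimic tf.slice on a 2-D nested list; size -1 means 'to the end'."""
    r0, c0 = begin
    n_rows, n_cols = size
    r1 = len(rows) if n_rows == -1 else r0 + n_rows
    c1 = len(rows[0]) if n_cols == -1 else c0 + n_cols
    return [row[c0:c1] for row in rows[r0:r1]]


x = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]]

print(slice_2d(x, [1, 0], [2, 3]))   # [[3, 4, 5], [6, 7, 8]]
print(slice_2d(x, [0, 1], [-1, 2]))  # [[1, 2], [4, 5], [7, 8]]
```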

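2. Implementing the Layers

2.1 Defining `conv2d`

To handle pixels at the image border better, the input is padded in REFLECT mode. Neural networks usually represent a batch of images as a 4-D tensor: the first dimension indexes the image, the second is the image height, the third is the image width, and the fourth is the number of channels. The convolution function therefore only needs to pad the second and third dimensions.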

def conv2d(x, input_depth, output_depth, ksize, strides, mode='REFLECT'):
    with tf.variable_scope('conv'):
        shape = [ksize, ksize, input_depth, output_depth]  # square kernel
        weight = tf.Variable(tf.truncated_normal(shape, stddev=0.1))
        # Pad only height and width, by half the kernel size on each side
        pad = ksize // 2
        x_padded = tf.pad(x, [[0, 0], [pad, pad], [pad, pad], [0, 0]], mode=mode)

        return tf.nn.conv2d(x_padded, weight, strides=[1, strides, strides, 1], padding='VALID', name='conv')
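Because the padding is done by hand and the convolution itself uses 'VALID', the output size follows from simple arithmetic. The sketch below works it out for one spatial dimension, assuming an odd kernel size. `conv_output_size` is a hypothetical helper, not part of the post's network code.

```python
def conv_output_size(size, ksize, strides):
    """Output height/width of conv2d above for one spatial dimension."""
    padded = size + 2 * (ksize // 2)        # manual padding on both sides
    return (padded - ksize) // strides + 1  # 'VALID' convolution


print(conv_output_size(276, ksize=9, strides=1))  # 276
print(conv_output_size(276, ksize=3, strides=2))  # 138
print(conv_output_size(138, ksize=3, strides=2))  # 69
```

With stride 1 the size is preserved, and with stride 2 it is halved, which is the behaviour the network in Section 3 depends on.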

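2.2 Defining `instance_norm`

InstanceNorm standardizes each sample on its own (zero mean, unit standard deviation). In the code below the statistics are computed per sample and per channel, over the height and width dimensions. It is similar to BatchNorm: it can speed up convergence, improve the network's capacity for nonlinear fitting, and noticeably improve the quality of the style transfer result.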

def instance_norm(x):
    epsilon = 1e-9
    mean, var = tf.nn.moments(x, [1, 2], keep_dims=True)
    
    return tf.div(tf.subtract(x, mean), tf.sqrt(tf.add(var, epsilon)))

A tiny smoothing term is added inside the square root in the denominator to avoid division by zero.
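To make the effect concrete, the following sketch applies the same formula to the pixels of a single channel of a single image, flattened into a plain list. `normalize_channel` is a hypothetical helper for illustration. The real function does this for every sample and channel at once.

```python
import math


def normalize_channel(values, epsilon=1e-9):
    """Standardize one channel of one sample: (v - mean) / sqrt(var + epsilon)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + epsilon) for v in values]


normed = normalize_channel([1.0, 2.0, 3.0, 4.0])
new_mean = sum(normed) / len(normed)
new_var = sum((v - new_mean) ** 2 for v in normed) / len(normed)
print(abs(new_mean) < 1e-9, abs(new_var - 1.0) < 1e-6)  # True True

# A constant channel has zero variance; epsilon keeps the division finite.
print(normalize_channel([5.0, 5.0, 5.0, 5.0]))  # [0.0, 0.0, 0.0, 0.0]
```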

2.3 Defining `relu`

This is a thin wrapper around tf.nn.relu() that additionally sets any NaN value to 0.

def relu(x):
    out = tf.nn.relu(x)
    # Replace NaN with 0: comparing NaN with NaN yields False
    return tf.where(tf.equal(out, out), out, tf.zeros_like(out))
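The trick relies on a property of IEEE floating point: NaN is the only value that is not equal to itself. The same idea in plain Python, with a hypothetical helper `nan_to_zero`:

```python
nan = float("nan")
print(nan == nan)  # False


def nan_to_zero(values):
    """Keep v where v == v holds, otherwise (v is NaN) use 0.0."""
    return [v if v == v else 0.0 for v in values]


print(nan_to_zero([1.0, 2.0, nan]))  # [1.0, 2.0, 0.0]
```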

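2.4 Defining `residual`

The residual layer is shown in the figure below.

The residual network (ResNet) was proposed by Kaiming He (何恺明) et al. and won the 2015 ImageNet competition, with a classification error on that dataset below the commonly cited human-level figure. The network is very deep, reaching 152 layers, and its most important building block is the residual layer.

If the weights become very small, the residual layer effectively learns the identity function ($H(x)\approx x$). This is easy to learn and does not hurt the network's performance, so adding more layers to a ResNet tends not to push the training error up the way it does in a plain network without residual blocks.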

def residual(x, input_depth, ksize, strides):
    with tf.variable_scope('residual'):
        conv1 = conv2d(x, input_depth, input_depth, ksize, strides)
        conv2 = conv2d(relu(conv1), input_depth, input_depth, ksize, strides)
        residual = x + conv2
        
        return residual
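The identity argument above can be seen in a scalar toy version of the block, where each convolution is replaced by a single multiplication. This is only a sketch of the idea. `toy_residual`, `w1`, and `w2` are hypothetical names and not part of the network.

```python
def toy_residual(x, w1, w2):
    """Scalar stand-in for residual(): x + w2 * relu(w1 * x)."""
    hidden = max(0.0, w1 * x)  # first "conv" followed by relu
    return x + w2 * hidden     # second "conv" plus the skip connection


print(toy_residual(3.0, w1=0.0, w2=0.0))  # 3.0  (zero weights: exact identity)
print(toy_residual(3.0, w1=0.5, w2=0.5))  # 3.75 (nonzero weights add a correction)
```

Note that the skip connection `x + conv2` only works when both tensors have the same shape, which is why the network in Section 3 always calls residual() with strides of 1.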

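2.5 Defining `resize_conv2d`

resize_conv2d() plays the role of a deconvolution layer. The name "deconvolution" (deconv) is somewhat misleading, because the operation is still essentially a convolution. Without padding at the borders, a convolution shrinks the tensor, which is why convolutional networks typically become deeper and narrower layer by layer. Sometimes, however, the tensor has to be enlarged, for example to turn a deep, narrow tensor back into a larger 3-channel tensor, that is, an image. The approach used here is to first upsample the tensor well beyond the target size and then convolve it. The result is smaller than the upsampled tensor but still larger than the tensor before upsampling.

For more on deconvolution, see the blog post 深度学习 | 反卷积/转置卷积的理解 transposed conv/deconv, which explains it well.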

def resize_conv2d(x, input_depth, output_depth, ksize, strides, training):
    # Upsample first, then convolve
    with tf.variable_scope('conv_transpose'):
        # The original program uses numeric (static) sizes during training and
        # tensor (dynamic) sizes at prediction time. This version always reads the
        # static shape, so `training` is unused and height/width must be known.
        height = x.get_shape()[1].value
        width = x.get_shape()[2].value

        new_height = height * strides * 2
        new_width = width * strides * 2

        x_resized = tf.image.resize_images(x, [new_height, new_width], tf.image.ResizeMethod.NEAREST_NEIGHBOR)

        return conv2d(x_resized, input_depth, output_depth, ksize, strides)
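The net effect on the spatial size can be worked out by hand: the tensor is enlarged by a factor of `strides * 2`, then the strided convolution divides it by `strides` again. A sketch of that arithmetic for one dimension, assuming an odd kernel size. `resize_conv_output_size` is a hypothetical helper.

```python
def resize_conv_output_size(size, ksize, strides):
    """Output height/width of resize_conv2d above for one spatial dimension."""
    resized = size * strides * 2            # nearest-neighbor upsampling
    padded = resized + 2 * (ksize // 2)     # manual padding inside conv2d
    return (padded - ksize) // strides + 1  # 'VALID' strided convolution


print(resize_conv_output_size(69, ksize=3, strides=2))   # 138
print(resize_conv_output_size(138, ksize=3, strides=2))  # 276
```

With strides of 2 the layer therefore doubles the height and width, undoing one stride-2 convolution.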

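3. Building the Image Generation Network

The network to be built is shown in the figure below.

In short, it consists of 3 convolution layers, 5 residual layers, and 3 deconvolution layers.

Using the layer functions defined in Section 2, the image generation network is assembled according to the structure in the figure.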

def net(image, training):
    # Pad a border around the image to reduce edge artifacts
    image = tf.pad(image, [[0, 0], [10, 10], [10, 10], [0, 0]], mode='REFLECT')
    
    # 3 convolution layers; depth: 3 --> 32 --> 64 --> 128
    with tf.variable_scope('conv1'):
        conv1 = relu(instance_norm(conv2d(image, 3, 32, 9, 1)))
    with tf.variable_scope('conv2'):
        conv2 = relu(instance_norm(conv2d(conv1, 32, 64, 3, 2)))
    with tf.variable_scope('conv3'):
        conv3 = relu(instance_norm(conv2d(conv2, 64, 128, 3, 2)))
        
    # 5 residual blocks
    with tf.variable_scope('res1'):
        res1 = residual(conv3, 128, 3, 1)
    with tf.variable_scope('res2'):
        res2 = residual(res1, 128, 3, 1)
    with tf.variable_scope('res3'):
        res3 = residual(res2, 128, 3, 1)
    with tf.variable_scope('res4'):
        res4 = residual(res3, 128, 3, 1)
    with tf.variable_scope('res5'):
        res5 = residual(res4, 128, 3, 1)
        
    # Use "deconvolution" (upsample, then convolve) to regenerate the image
    with tf.variable_scope('deconv1'):
        deconv1 = relu(instance_norm(resize_conv2d(res5, 128, 64, 3, 2, training)))
    with tf.variable_scope('deconv2'):
        deconv2 = relu(instance_norm(resize_conv2d(deconv1, 64, 32, 3, 2, training)))
    with tf.variable_scope('deconv3'):
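        # By this point the feature map is already back at the size of the (padded)
        # input image, so a plain convolution is used instead of another deconvolution
        deconv3 = tf.nn.tanh(instance_norm(conv2d(deconv2, 32, 3, 9, 1)))
    # deconv3 lies in (-1, 1); rescale it to (0, 255)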
    y = (deconv3 + 1) * 127.5
    
    # Remove the border that was added at the start to reduce edge artifacts
    height = tf.shape(y)[1]
    width = tf.shape(y)[2]
    y = tf.slice(y, [0, 10, 10, 0], tf.stack([-1, height - 20, width - 20, -1]))

    return y
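To see how the pieces fit together, it helps to trace the height (or width) through the whole network. The sketch below repeats the size arithmetic of conv2d and resize_conv2d for one spatial dimension. The helper names are hypothetical and the code only computes sizes, it does not run the network.

```python
def conv_size(size, ksize, stride):
    """Size after conv2d: pad ksize // 2 per side, then a 'VALID' convolution."""
    return (size + 2 * (ksize // 2) - ksize) // stride + 1


def resize_conv_size(size, ksize, stride):
    """Size after resize_conv2d: upsample by stride * 2, then conv2d."""
    return conv_size(size * stride * 2, ksize, stride)


def trace_sizes(size, border=10):
    """List the spatial size after each size-changing step of net()."""
    sizes = [size]
    size += 2 * border                       # initial REFLECT border
    sizes.append(size)
    for ksize, stride in [(9, 1), (3, 2), (3, 2)]:
        size = conv_size(size, ksize, stride)
        sizes.append(size)
    # The five residual blocks use stride 1 and leave the size unchanged.
    for _ in range(2):
        size = resize_conv_size(size, 3, 2)
        sizes.append(size)
    size = conv_size(size, 9, 1)             # final 9x9 convolution
    sizes.append(size)
    size -= 2 * border                       # crop the border again
    sizes.append(size)
    return sizes


print(trace_sizes(256))
# [256, 276, 276, 138, 69, 138, 276, 276, 256]
```

For a 256-pixel side the output matches the input. Under the same arithmetic, a side whose padded length is not divisible by 4 can come out slightly different, for example 250 gives 252.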

4. References

[1] He Zhiyuan (何之源). 21个项目玩转深度学习[M]. Beijing: Publishing House of Electronics Industry, 2018.
[2] jdefla. 深度学习 | 反卷积/转置卷积的理解 transposed conv/deconv[EB/OL]. https://blog.csdn.net/u014722627/article/details/60574260.
[3] moverzp. 【paper笔记】图像风格迁移[EB/OL]. http://moverzp.com/2018/08/10/%E3%80%90paper%E7%AC%94%E8%AE%B0%E3%80%91%E5%9B%BE%E5%83%8F%E9%A3%8E%E6%A0%BC%E8%BF%81%E7%A7%BB/.
[4] Gatys L A, Ecker A S, Bethge M. A Neural Algorithm of Artistic Style[J]. arXiv preprint arXiv:1508.06576, 2015.
[5] Johnson J, Alahi A, Fei-Fei L. Perceptual Losses for Real-Time Style Transfer and Super-Resolution[C]// European Conference on Computer Vision. Springer, Cham, 2016: 694-711.
[6] https://tensorflow.google.cn/tutorials/
