TensorFlow 2.0 — RNN
RNN cell
Input: vectors $\mathbf{x}_{(t)}$ and $\mathbf{y}_{(t-1)}$
A single layer with a single neuron:

```python
model = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=[None, 1])
])
```
By default, the SimpleRNN layer uses the hyperbolic tangent activation function.
The initial state $\mathbf{h}_{(init)}$ is set to 0.
In a simple RNN, this output is also the new state $\mathbf{h}_{(0)}$.
To make the layer return one output per time step, you must set return_sequences=True.
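A quick shape check (sizes are arbitrary):

```python
import numpy as np
from tensorflow import keras

X = np.random.rand(4, 10, 1)  # [batch size, time steps, features]

last_only = keras.layers.SimpleRNN(5)                         # default: last output only
full_seq  = keras.layers.SimpleRNN(5, return_sequences=True)  # one output per time step

print(last_only(X).shape)  # (4, 5)
print(full_seq(X).shape)   # (4, 10, 5)
```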
RNN layer
The input is the same as for a single RNN cell, but there are more parameters: each RNN cell has its own weights for $\mathbf{x}$ and for $\mathbf{y}$, and since a layer contains multiple RNN cells, there are multiple such sets of weights.
Memory Cell
New output, namely the hidden state: $\mathbf{h}_{(t)} = f(\mathbf{x}_{(t)}, \mathbf{h}_{(t-1)})$
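A minimal NumPy sketch of this recurrence (the weight names Wx/Wh are illustrative, not Keras internals):

```python
import numpy as np

# h_t = tanh(x_t @ Wx + h_{t-1} @ Wh + b), with the same weights at every time step
T, n_inputs, n_units = 5, 3, 4
Wx = np.random.randn(n_inputs, n_units)
Wh = np.random.randn(n_units, n_units)
b = np.zeros(n_units)

X = np.random.randn(T, n_inputs)   # one sequence of T time steps
h = np.zeros(n_units)              # h_(init) = 0
for t in range(T):
    h = np.tanh(X[t] @ Wx + h @ Wh + b)   # new state from x_(t) and h_(t-1)
```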
Deep RNNs
stack multiple layers of cells
```python
model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20, return_sequences=True),
    keras.layers.SimpleRNN(1)
])
```
Handling Long Sequences
Leading to : Unstable Gradients Problem
nonsaturating activation functions (e.g., ReLU) may not help as much here; in fact, they may actually lead the RNN to be even more unstable during training
Well, suppose Gradient Descent updates the weights in a way that increases the outputs slightly at the first time step. Because the same weights are used at every time step, the outputs at the second time step may also be slightly increased, and those at the third, and so on until the outputs explode—and a nonsaturating activation function does not prevent that.
Solution:
Layer Normalization : it is very similar to Batch Normalization, but instead of normalizing across the batch dimension, it normalizes across the features dimension.
Layer Normalization is applied before the activation function.
A cell must also have a state_size attribute and an output_size attribute.
```python
class LNSimpleRNNCell(keras.layers.Layer):
    def __init__(self, units, activation="tanh", **kwargs):
        super().__init__(**kwargs)
        self.state_size = units
        self.output_size = units
        self.simple_rnn_cell = keras.layers.SimpleRNNCell(units, activation=None)
        self.layer_norm = keras.layers.LayerNormalization()
        self.activation = keras.activations.get(activation)

    def call(self, inputs, states):
        outputs, new_states = self.simple_rnn_cell(inputs, states)
        norm_outputs = self.activation(self.layer_norm(outputs))
        return norm_outputs, [norm_outputs]
```
```python
model = keras.models.Sequential([
    keras.layers.RNN(LNSimpleRNNCell(20), return_sequences=True, input_shape=[None, 1]),
    keras.layers.RNN(LNSimpleRNNCell(20), return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])
```
- all recurrent layers (except for keras.layers.RNN) and all cells provided by Keras have a dropout hyperparameter and a recurrent_dropout hyperparameter: the former defines the dropout rate to apply to the inputs (at each time step), and the latter defines the dropout rate for the hidden states
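For example (the dropout rates here are arbitrary):

```python
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.LSTM(20, return_sequences=True, input_shape=[None, 1],
                      dropout=0.2, recurrent_dropout=0.2),   # input / hidden-state dropout
    keras.layers.LSTM(20, dropout=0.2, recurrent_dropout=0.2),
    keras.layers.Dense(1)
])
```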
LSTM
- its state is split into two vectors: h(t) (the short-term state) and c(t) (the long-term state)
```python
model = keras.models.Sequential([
    keras.layers.LSTM(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.LSTM(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])
```
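Both states can be inspected by asking the layer to return them (batch of 32, 10 time steps, 1 feature, all assumed):

```python
import tensorflow as tf
from tensorflow import keras

lstm = keras.layers.LSTM(20, return_sequences=True, return_state=True)
outputs, h, c = lstm(tf.random.normal([32, 10, 1]))
# outputs: (32, 10, 20); h: (32, 20) short-term state; c: (32, 20) long-term state
```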
GRU cells
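GRU cells are a simplified LSTM variant (a single state vector and two gates); in Keras they are a drop-in replacement, e.g. (sizes are arbitrary):

```python
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.GRU(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.GRU(20),
    keras.layers.Dense(1)
])
```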
WaveNet
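A rough sketch of the WaveNet idea for sequence modeling: a stack of dilated causal 1D convolutions (filter counts and dilation rates here are illustrative):

```python
from tensorflow import keras

model = keras.models.Sequential()
model.add(keras.layers.InputLayer(input_shape=[None, 1]))
for rate in (1, 2, 4, 8) * 2:                       # doubling dilation rates, repeated twice
    model.add(keras.layers.Conv1D(filters=20, kernel_size=2, padding="causal",
                                  activation="relu", dilation_rate=rate))
model.add(keras.layers.Conv1D(filters=10, kernel_size=1))  # final 1x1 convolution
```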
2021-03 update
TF1
Recurrent Neural Networks
Shape of the input X:
X: Tensor("embedding_lookup/Identity:0", shape=(1, 75, 100), dtype=float32), i.e. [batch size, click-sequence length, embedding size of each click]
```python
with tf.variable_scope("bilstm", reuse=reuse):
```
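A rough TF1 sketch of a bidirectional LSTM over an input of shape (batch, 75, 100); the hidden size and function name are assumptions, not the original project code:

```python
import tensorflow as tf  # TF 1.x

def bilstm_encode(X, hidden_size=100, reuse=False):
    # X: (batch, seq_len, emb_dim), e.g. (1, 75, 100)
    with tf.variable_scope("bilstm", reuse=reuse):
        cell_fw = tf.nn.rnn_cell.LSTMCell(hidden_size)
        cell_bw = tf.nn.rnn_cell.LSTMCell(hidden_size)
        (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
            cell_fw, cell_bw, X, dtype=tf.float32)
        return tf.concat([out_fw, out_bw], axis=-1)  # (batch, seq_len, 2 * hidden_size)
```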
https://zhuanlan.zhihu.com/p/43041436
LSTMCell:

```python
__init__(
    num_units,
    use_peepholes=False,
    cell_clip=None,
    initializer=None,
    num_proj=None,
    proj_clip=None,
    num_unit_shards=None,
    num_proj_shards=None,
    forget_bias=1.0,
    state_is_tuple=True,
    activation=None,
    reuse=None,
    name=None,
    dtype=None)
```
Parameter descriptions:
- num_units: the number of units in the LSTM cell, i.e. the number of hidden-layer neurons.
- use_peepholes: boolean; set to True to enable peephole connections.
- cell_clip: optional float; if provided, the cell state is clipped to this value before the cell output activation.
- initializer: optional; initializer for the weight and projection matrices.
- num_proj: optional int; the output dimensionality of the projection matrix. If None, no projection is performed.
- proj_clip: optional float; if num_proj > 0 and proj_clip is provided, the projected values are clipped element-wise to the range [-proj_clip, proj_clip].
- num_unit_shards: deprecated.
- num_proj_shards: deprecated.
- forget_bias: float; bias added to the forget gate. Must be manually set to 0.0 when restoring from a checkpoint trained with CudnnLSTM.
- state_is_tuple: if True, the accepted and returned state is a 2-tuple of c_state and m_state; if False, they are concatenated along the column axis. The latter behavior is about to be deprecated.
- activation: activation function of the inner states. Default: tanh.
- reuse: boolean; whether to reuse variables in an existing scope. If not True and the existing scope already has the given variables, an error is raised.
- name: string; the name of the layer. Layers with the same name share weights, but to avoid errors this requires reuse=True.
- dtype: the default dtype of this layer. The default of None means it uses the dtype of the first input. Required if build is called before call.
Source: 大雄没有叮当猫 on CSDN (CC 4.0 BY-SA): https://blog.csdn.net/u013230189/article/details/82811066
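A minimal TF1 usage sketch of LSTMCell (sizes are arbitrary):

```python
import tensorflow as tf  # TF 1.x

batch_size, seq_len, emb_dim, num_units = 32, 75, 100, 128
X = tf.placeholder(tf.float32, [batch_size, seq_len, emb_dim])

cell = tf.nn.rnn_cell.LSTMCell(num_units)            # state_is_tuple=True by default
outputs, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
# outputs: (batch_size, seq_len, num_units)
# state:   LSTMStateTuple(c=(batch_size, num_units), h=(batch_size, num_units))
```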
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
dynamic RNN:

```python
tf.nn.dynamic_rnn(
    cell,
    inputs,
    sequence_length=None,
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)
```
Detailed explanation: https://zhuanlan.zhihu.com/p/43041436
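A small sketch of the effect of sequence_length: steps beyond the given length are not computed and produce zero outputs (values are illustrative):

```python
import numpy as np
import tensorflow as tf  # TF 1.x

# Two sequences padded to length 4; the second one is only 2 steps long.
X = tf.constant(np.random.randn(2, 4, 3).astype(np.float32))

cell = tf.nn.rnn_cell.LSTMCell(num_units=5)
outputs, state = tf.nn.dynamic_rnn(cell, X, sequence_length=[4, 2], dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(outputs)
    print(out[1, 2:])   # all zeros: steps past sequence_length are skipped
```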
https://blog.csdn.net/u010223750/article/details/71079036
RNN-related links
1. A complete illustrated guide to RNNs, RNN variants, Seq2Seq, and the attention mechanism: https://zhuanlan.zhihu.com/p/28054589
TF2
keras.layers.RNN:
In TF2, tf.nn.dynamic_rnn is deprecated; its replacement is keras.layers.RNN(cell), as the deprecation notice in the TF source shows:

```python
@deprecation.deprecated(
    None,
    "Please use `keras.layers.RNN(cell)`, which is equivalent to this API")
@tf_export(v1=["nn.dynamic_rnn"])
@dispatch.add_dispatch_support
def dynamic_rnn(cell,
                inputs,
                sequence_length=None,
                initial_state=None,
                dtype=None,
                parallel_iterations=None,
                swap_memory=False,
                time_major=False,
                scope=None):
```
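A minimal TF2 equivalent, wrapping a cell in keras.layers.RNN (sizes are arbitrary):

```python
import tensorflow as tf

layer = tf.keras.layers.RNN(tf.keras.layers.LSTMCell(64), return_sequences=True)
outputs = layer(tf.random.normal([32, 75, 100]))   # (32, 75, 64)
```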
tf.keras.layers.LSTMCell
```python
tf.keras.layers.LSTMCell(
    units, activation='tanh', recurrent_activation='sigmoid', use_bias=True,
    kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
    bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None,
    recurrent_regularizer=None, bias_regularizer=None, kernel_constraint=None,
    recurrent_constraint=None, bias_constraint=None, dropout=0.0,
    recurrent_dropout=0.0, **kwargs
)
```
| Argument | Description |
|---|---|
| `units` | Positive integer, dimensionality of the output space. |
| `activation` | Activation function to use. Default: hyperbolic tangent (`tanh`). If you pass `None`, no activation is applied (i.e. "linear" activation: `a(x) = x`). |
| `recurrent_activation` | Activation function to use for the recurrent step. Default: `sigmoid`. If you pass `None`, no activation is applied (i.e. "linear" activation: `a(x) = x`). |
| `use_bias` | Boolean (default `True`), whether the layer uses a bias vector. |
| `kernel_initializer` | Initializer for the `kernel` weights matrix, used for the linear transformation of the inputs. Default: `glorot_uniform`. |
| `recurrent_initializer` | Initializer for the `recurrent_kernel` weights matrix, used for the linear transformation of the recurrent state. Default: `orthogonal`. |
| `bias_initializer` | Initializer for the bias vector. Default: `zeros`. |
| `unit_forget_bias` | Boolean (default `True`). If True, add 1 to the bias of the forget gate at initialization. Setting it to true will also force `bias_initializer="zeros"`. This is recommended in Jozefowicz et al. |
| `kernel_regularizer` | Regularizer function applied to the `kernel` weights matrix. Default: `None`. |
| `recurrent_regularizer` | Regularizer function applied to the `recurrent_kernel` weights matrix. Default: `None`. |
| `bias_regularizer` | Regularizer function applied to the bias vector. Default: `None`. |
| `kernel_constraint` | Constraint function applied to the `kernel` weights matrix. Default: `None`. |
| `recurrent_constraint` | Constraint function applied to the `recurrent_kernel` weights matrix. Default: `None`. |
| `bias_constraint` | Constraint function applied to the bias vector. Default: `None`. |
| `dropout` | Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. Default: 0. |
| `recurrent_dropout` | Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state. Default: 0. |
Call arguments:
- `inputs`: A 2D tensor, with shape `[batch, feature]`.
- `states`: List of 2 tensors that correspond to the cell's units, both of shape `[batch, units]`; the first tensor is the memory state from the previous time step, and the second is the carry state from the previous time step. For timestep 0, the initial state provided by the user is fed to the cell.
- `training`: Python boolean indicating whether the layer should behave in training mode or in inference mode. Only relevant when `dropout` or `recurrent_dropout` is used.
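A small sketch of calling the cell directly for a single time step (shapes assumed):

```python
import tensorflow as tf

cell = tf.keras.layers.LSTMCell(units=4)
x_t = tf.random.normal([2, 8])                                    # [batch, feature]
states = cell.get_initial_state(batch_size=2, dtype=tf.float32)   # [memory h, carry c], each (2, 4)
output, new_states = cell(x_t, states)                            # output == new h, shape (2, 4)
```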
https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTMCell
RNN
```python
tf.keras.layers.RNN(
    cell, return_sequences=False, return_state=False, go_backwards=False,
    stateful=False, unroll=False, time_major=False, **kwargs
)
```
| Argument | Description |
|---|---|
| `cell` | An RNN cell instance or a list of RNN cell instances. An RNN cell is a class that has: a `call(input_at_t, states_at_t)` method returning `(output_at_t, states_at_t_plus_1)` (the `call` method can also take the optional argument `constants`, see the note on passing external constants); a `state_size` attribute, which can be a single integer (single state), a list/tuple of integers (one size per state), or a TensorShape / tuple of TensorShapes to represent a high-dimensional state; an `output_size` attribute, a single integer or TensorShape representing the shape of the output (for backward compatibility, if this attribute is missing it is inferred from the first element of `state_size`); and a `get_initial_state(inputs=None, batch_size=None, dtype=None)` method that creates a tensor to be fed to `call()` as the initial state when the user does not specify one, with shape `[batch_size, cell.state_size]` (if this method is not implemented, the RNN layer creates a zero-filled tensor of that size). If `cell` is a list of RNN cell instances, the cells are stacked on top of each other in the RNN, resulting in an efficient stacked RNN. |
| `return_sequences` | Boolean (default `False`). Whether to return the last output in the output sequence, or the full sequence. |
| `return_state` | Boolean (default `False`). Whether to return the last state in addition to the output. |
| `go_backwards` | Boolean (default `False`). If True, process the input sequence backwards and return the reversed sequence. |
| `stateful` | Boolean (default `False`). If True, the last state for each sample at index i in a batch will be used as the initial state for the sample of index i in the following batch. |
| `unroll` | Boolean (default `False`). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences. |
| `time_major` | The shape format of the `inputs` and `outputs` tensors. If True, the inputs and outputs will be in shape `(timesteps, batch, ...)`, whereas in the False case, it will be `(batch, timesteps, ...)`. Using `time_major = True` is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form. |
| `zero_output_for_mask` | Boolean (default `False`). Whether the output should use zeros for the masked timesteps. Note that this field is only used when `return_sequences` is True and a mask is provided. It can be useful if you want to reuse the raw output sequence of the RNN without interference from the masked timesteps, e.g. when merging bidirectional RNNs. |
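Passing a list of cells gives a stacked RNN; a small sketch (sizes are arbitrary):

```python
import tensorflow as tf

cells = [tf.keras.layers.LSTMCell(64), tf.keras.layers.LSTMCell(32)]
stacked = tf.keras.layers.RNN(cells, return_sequences=True, return_state=True)
outputs, *states = stacked(tf.random.normal([8, 20, 10]))
# outputs: (8, 20, 32); states: one [h, c] pair per cell
```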
```
ValueError: Could not find matching function to call loaded from the SavedModel.
Got: Positional arguments (5 total):
```
The cause of the error is a bug in TF 2.1:
```python
inputs = tf.keras.Input(shape=([75]), dtype='float32', name="input_wv")
```
Fix: upgrade to TF 2.2.