Tensorflow 2 Review -- RNN

Posted on 2020-07-04, edited on 2021-03-16, in DeepLearning

Tensorflow 2.0: RNN

RNN cell

Input: vectors $\mathbf{X}_{(t-1)}$ and $\mathbf{y}_{(t-1)}$

A single layer, with a single neuron:

model = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=[None, 1])
])

By default, the SimpleRNN layer uses the hyperbolic tangent activation function.

the initial state h(init) is set to 0

In a simple RNN, this output is also the new state h0.

To return one output per time step, you must set return_sequences=True.
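
For instance, a minimal sketch (random input, arbitrary sizes assumed) of how return_sequences changes the output shape:

import numpy as np
from tensorflow import keras

X = np.random.rand(2, 10, 1)  # [batch, time steps, features]

last_only = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=[None, 1])])
per_step = keras.models.Sequential([
    keras.layers.SimpleRNN(1, return_sequences=True, input_shape=[None, 1])])

print(last_only.predict(X).shape)  # (2, 1)     -> only the last time step
print(per_step.predict(X).shape)   # (2, 10, 1) -> one output per time step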

RNN layer

The input is the same as for a single RNN cell, but the number of parameters grows: each RNN cell has its own weights for $\mathbf{X}$ and for $\mathbf{y}$, so a layer made of multiple RNN cells carries multiple sets of these weights.
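
As a quick check of the parameter count (a sketch with arbitrarily chosen sizes): a SimpleRNN layer with 20 units on 1-dimensional inputs has 20 input weights, 20x20 recurrent weights and 20 biases, i.e. 440 trainable parameters.

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, input_shape=[None, 1])])
model.summary()
# SimpleRNN params: 20*1 (input weights) + 20*20 (recurrent weights) + 20 (biases) = 440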


Memory Cell

New output, namely the hidden state: $\mathbf{h}_{(t)} = f(\mathbf{x}_{(t)}, \mathbf{h}_{(t-1)})$
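
A minimal NumPy sketch of one such step, assuming the SimpleRNN-style recurrence $\mathbf{h}_{(t)} = \tanh(\mathbf{x}_{(t)} \mathbf{W}_x + \mathbf{h}_{(t-1)} \mathbf{W}_h + \mathbf{b})$ with arbitrary sizes:

import numpy as np

batch, n_inputs, n_units = 4, 3, 5
x_t = np.random.randn(batch, n_inputs)
h_prev = np.zeros((batch, n_units))        # initial state set to 0
Wx = np.random.randn(n_inputs, n_units)
Wh = np.random.randn(n_units, n_units)
b = np.zeros(n_units)

h_t = np.tanh(x_t @ Wx + h_prev @ Wh + b)  # new hidden state, also the output
print(h_t.shape)  # (4, 5)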


Deep RNNs

stack multiple layers of cells


model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20),
    keras.layers.Dense(1)
])

Handling Long Sequences

Leading to: the unstable gradients problem.

nonsaturating activation functions (e.g., ReLU) may not help as much here; in fact, they may actually lead the RNN to be even more unstable during training

Well, suppose Gradient Descent updates the weights in a way that increases the outputs slightly at the first time step. Because the same weights are used at every time step, the outputs at the second time step may also be slightly increased, and those at the third, and so on until the outputs explode—and a nonsaturating activation function does not prevent that.

Solution:

  • Layer Normalization : it is very similar to Batch Normalization, but instead of normalizing across the batch dimension, it normalizes across the features dimension.

  • It is applied before the activation function.

A cell must also have a state_size attribute and an output_size attribute.

class LNSimpleRNNCell(keras.layers.Layer):
    def __init__(self, units, activation="tanh", **kwargs):
        super().__init__(**kwargs)
        self.state_size = units
        self.output_size = units
        self.simple_rnn_cell = keras.layers.SimpleRNNCell(units, activation=None)
        self.layer_norm = keras.layers.LayerNormalization()
        self.activation = keras.activations.get(activation)
    def call(self, inputs, states):
        # new_states[0] is equal to outputs
        outputs, new_states = self.simple_rnn_cell(inputs, states)
        norm_outputs = self.activation(self.layer_norm(outputs))
        return norm_outputs, [norm_outputs]

model = keras.models.Sequential([
    keras.layers.RNN(LNSimpleRNNCell(20), return_sequences=True,
                     input_shape=[None, 1]),
    keras.layers.RNN(LNSimpleRNNCell(20), return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])
  • All recurrent layers (except for keras.layers.RNN) and all cells provided by Keras have a dropout hyperparameter and a recurrent_dropout hyperparameter: the former defines the dropout rate to apply to the inputs (at each time step), and the latter defines the dropout rate for the hidden states, as in the sketch below.
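
A small sketch of how these two hyperparameters are passed (the rates are arbitrary):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.LSTM(20, return_sequences=True, input_shape=[None, 1],
                      dropout=0.2,             # dropout applied to the inputs at each time step
                      recurrent_dropout=0.2),  # dropout applied to the hidden states
    keras.layers.LSTM(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])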

LSTM

  • Its state is split into two vectors: h(t) (short-term state) and c(t) (long-term state).
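
To see both state vectors, a quick sketch using return_state (input shape assumed):

import tensorflow as tf
from tensorflow import keras

lstm = keras.layers.LSTM(20, return_state=True)
X = tf.random.normal([2, 10, 1])            # [batch, time steps, features]
output, h_t, c_t = lstm(X)                  # output equals h_t at the last step
print(output.shape, h_t.shape, c_t.shape)   # (2, 20) (2, 20) (2, 20)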


model = keras.models.Sequential([
    keras.layers.LSTM(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.LSTM(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])

GRU cells

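A GRU layer is used the same way as an LSTM layer; a minimal sketch mirroring the model above:

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.GRU(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.GRU(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])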

WaveNet
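
A rough WaveNet-style sketch (filter counts and dilation rates chosen arbitrarily): stack causal 1D convolutions with doubling dilation rates so the receptive field grows exponentially with depth.

from tensorflow import keras

model = keras.models.Sequential()
model.add(keras.layers.InputLayer(input_shape=[None, 1]))
for rate in (1, 2, 4, 8) * 2:                 # dilation rates 1,2,4,8, repeated twice
    model.add(keras.layers.Conv1D(filters=20, kernel_size=2, padding="causal",
                                  activation="relu", dilation_rate=rate))
model.add(keras.layers.Conv1D(filters=10, kernel_size=1))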

2021-03 Update

TF1

Recurrent Neural Networks

Shape of the input X:

X: Tensor("embedding_lookup/Identity:0", shape=(1, 75, 100), dtype=float32), i.e. [batch size, click-sequence length, embedding size of each click]

with tf.variable_scope("bilstm", reuse=reuse):
    forward_output, _ = tf.nn.dynamic_rnn(
        tf.contrib.rnn.LSTMCell(self.hidden_size[0],
                                initializer=self.initializer,
                                reuse=tf.AUTO_REUSE),
        X,
        dtype=tf.float32,
        sequence_length=length,  # length = 75, i.e. the length of the click sequence
        scope="RNN_forward")

https://zhuanlan.zhihu.com/p/43041436

LSTMCell:

__init__(
    num_units,
    use_peepholes=False,
    cell_clip=None,
    initializer=None,
    num_proj=None,
    proj_clip=None,
    num_unit_shards=None,
    num_proj_shards=None,
    forget_bias=1.0,
    state_is_tuple=True,
    activation=None,
    reuse=None,
    name=None,
    dtype=None)

Parameter descriptions:

num_units: the number of units in the LSTM cell, i.e. the number of hidden-layer neurons.
use_peepholes: Boolean; if set to True, peephole connections are used.
cell_clip: optional float; if provided, the cell state is clipped to this value before the cell output activation.
initializer: optional; initializer for the weight and projection matrices.
num_proj: optional int; the output dimensionality of the projection matrix. If None, no projection is performed.
proj_clip: optional float; if num_proj > 0 and proj_clip is provided, the projected values are clipped element-wise to the range [-proj_clip, proj_clip].
num_unit_shards: deprecated.
num_proj_shards: deprecated.
forget_bias: float; bias added to the forget gate. It must be set to 0.0 manually when restoring from a checkpoint trained with CudnnLSTM.
state_is_tuple: if True, the accepted and returned states are 2-tuples of c_state and m_state; if False, they are concatenated along the column axis. The latter behavior is about to be deprecated.
activation: activation function of the inner states. Defaults to tanh.
reuse: Boolean; whether to reuse variables in an existing scope. If it is not True and the existing scope already has the given variables, an error is raised.
name: String; the name of the layer. Layers with the same name share weights, but to avoid mistakes this requires reuse=True.
dtype: the default dtype of this layer. Defaults to None, which means the dtype of the first input is used. Required if build is called before call.
The parameter descriptions above are from the CSDN blogger 大雄没有叮当猫, licensed under CC 4.0 BY-SA.
Original link: https://blog.csdn.net/u013230189/article/details/82811066
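
To illustrate num_proj (a TF1 sketch with arbitrary sizes, not from the original post): with a projection, the hidden/output size becomes num_proj while the cell state keeps num_units.

import tensorflow as tf

cell = tf.nn.rnn_cell.LSTMCell(num_units=100, num_proj=50)
inputs = tf.random_normal([1, 300])
state = cell.zero_state(batch_size=1, dtype=tf.float32)
outputs, new_state = cell(inputs, state)
print(outputs.shape)      # (1, 50)  projected output
print(new_state.c.shape)  # (1, 100) cell state keeps num_units
print(new_state.h.shape)  # (1, 50)  hidden state is projected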

import tensorflow as tf

batch_size = 1
embedding_dim = 300

inputs = tf.Variable(tf.random_normal([batch_size, embedding_dim]))

# State tuple (c, h); both parts have shape [batch_size, num_units]
previous_state = (tf.Variable(tf.random_normal([batch_size, 100])),
                  tf.Variable(tf.random_normal([batch_size, 100])))

lstmcell = tf.nn.rnn_cell.LSTMCell(100)

# LSTMStateTuple is ordered (c, h), so unpack the new state accordingly
outputs, (c_state, h_state) = lstmcell(inputs, previous_state)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(outputs))
print(outputs.shape)  # (1, 100)
print(h_state.shape)  # (1, 100)
print(c_state.shape)  # (1, 100)


[[-0.03908409 -0.15413918 -0.5543078 -0.16429745 0.15187332 0.22674957
0.0491709 0.05772216 -0.44460103 -0.4685788 -0.20345497 -0.3048153
0.08147392 -0.3003875 0.1934505 -0.2632878 -0.4045717 -0.27238455
0.5274059 0.3149494 -0.04591073 0.2029352 -0.7432734 0.24162863
0.02106089 -0.05926758 -0.38088006 0.00200358 -0.17426118 0.02138741
0.13410263 -0.7480866 0.59177715 -0.16158845 0.10526363 -0.43394142
-0.11014693 -0.02479873 0.12916292 0.3426005 0.3468578 0.03081424
-0.5923045 -0.73410743 -0.3768449 0.18405321 -0.35003117 -0.04348066
0.37911254 -0.35261196 -0.21207377 0.5164869 0.09950166 -0.02072151
-0.32580587 -0.20204493 0.04182163 -0.551953 0.5776422 -0.15258075
-0.19567099 -0.46144682 0.10801785 0.2929367 0.3800717 -0.10385328
-0.23831426 -0.27831694 -0.26867956 -0.52392566 -0.35053068 -0.12362379
-0.47640064 0.29312813 -0.63410735 0.27830732 -0.00418854 -0.06247476
0.38740724 0.12090401 -0.34354135 0.26057196 -0.3492871 0.28602415
-0.32755297 -0.48350778 0.03378379 -0.45831716 -0.33049983 0.5797502
-0.57983845 -0.5102703 -0.05812724 -0.06680895 0.10354684 -0.22407153
-0.00116582 -0.17466179 -0.24506229 0.40669897]]
(1, 100)
(1, 100)
(1, 100)

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

dynamic RNN:

tf.nn.dynamic_rnn(
    cell,
    inputs,
    sequence_length=None,
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)

Detailed explanation: https://zhuanlan.zhihu.com/p/43041436

# coding=utf-8
import tensorflow as tf
import numpy as np

# Create input data: [batch size, max effective sequence length, embedding size per step]
X = np.random.randn(2, 10, 8)

# The second example has an effective length of 6
X[1, 6:] = 0

print("Second example:", X[1])
# Effective lengths: the first sequence is 10 steps long, the second is 6.
# Outputs beyond step 6 are simply set to 0, and last_states for steps 7-10
# just repeat the output of step 6, which saves a fair amount of computation.
X_lengths = [10, 6]

# The last dimension of the output equals num_units
cell = tf.contrib.rnn.BasicLSTMCell(num_units=64, state_is_tuple=True)

outputs, last_states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)

result = tf.contrib.learn.run_n(
    {"outputs": outputs, "last_states": last_states},
    n=1,
    feed_dict=None)

np.set_printoptions(suppress=True)

print("outputs shape: ", result[0]['outputs'].shape)
print("Output of the second example: ", result[0]['outputs'][1].shape)
for it in result[0]['outputs'][1]:
    print(it)
print("last_states: ", result[0]['last_states'])

assert result[0]["outputs"].shape == (2, 10, 64)

# In the second example, outputs beyond step 6 (steps 7-10) should be 0
assert (result[0]["outputs"][1, 7, :] == np.zeros(cell.output_size)).all()

https://blog.csdn.net/u010223750/article/details/71079036

RNN-related links

1. A fully illustrated guide to RNNs, RNN variants, Seq2Seq, and the attention mechanism: https://zhuanlan.zhihu.com/p/28054589

TF2

keras.layers.RNN:

@deprecation.deprecated(
    None,
    "Please use `keras.layers.RNN(cell)`, which is equivalent to this API")
@tf_export(v1=["nn.dynamic_rnn"])
@dispatch.add_dispatch_support
def dynamic_rnn(cell,
                inputs,
                sequence_length=None,
                initial_state=None,
                dtype=None,
                parallel_iterations=None,
                swap_memory=False,
                time_major=False,
                scope=None):
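
As the deprecation notice says, tf.nn.dynamic_rnn maps to keras.layers.RNN in TF2. A rough equivalent of the earlier bilstm snippet (hidden size and input shape assumed):

import tensorflow as tf

hidden_size = 100
X = tf.random.normal([1, 75, 100])    # [batch, click-sequence length, embedding size]

forward_layer = tf.keras.layers.RNN(
    tf.keras.layers.LSTMCell(hidden_size),
    return_sequences=True)            # one output per time step, like dynamic_rnn
forward_output = forward_layer(X)
print(forward_output.shape)           # (1, 75, 100)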

tf.keras.layers.LSTMCell

tf.keras.layers.LSTMCell(
    units, activation='tanh', recurrent_activation='sigmoid',
    use_bias=True, kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros', unit_forget_bias=True,
    kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None,
    kernel_constraint=None, recurrent_constraint=None, bias_constraint=None,
    dropout=0.0, recurrent_dropout=0.0, **kwargs
)
Arguments:

  • units: Positive integer, dimensionality of the output space.
  • activation: Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
  • recurrent_activation: Activation function to use for the recurrent step. Default: sigmoid. If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
  • use_bias: Boolean (default True), whether the layer uses a bias vector.
  • kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs. Default: glorot_uniform.
  • recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state. Default: orthogonal.
  • bias_initializer: Initializer for the bias vector. Default: zeros.
  • unit_forget_bias: Boolean (default True). If True, add 1 to the bias of the forget gate at initialization. Setting it to True will also force bias_initializer="zeros". This is recommended in Jozefowicz et al.
  • kernel_regularizer: Regularizer function applied to the kernel weights matrix. Default: None.
  • recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix. Default: None.
  • bias_regularizer: Regularizer function applied to the bias vector. Default: None.
  • kernel_constraint: Constraint function applied to the kernel weights matrix. Default: None.
  • recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix. Default: None.
  • bias_constraint: Constraint function applied to the bias vector. Default: None.
  • dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. Default: 0.
  • recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state. Default: 0.

Call arguments:

  • inputs: A 2D tensor, with shape of [batch, feature].
  • states: List of 2 tensors corresponding to the cell's units. Both have shape [batch, units]; the first tensor is the memory state from the previous time step, the second tensor is the carry state from the previous time step. For timestep 0, the initial state provided by the user will be fed to the cell.
  • training: Python boolean indicating whether the layer should behave in training mode or in inference mode. Only relevant when dropout or recurrent_dropout is used.
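
A single-timestep call of the cell, following the call arguments above (sizes assumed):

import tensorflow as tf

cell = tf.keras.layers.LSTMCell(4)
x_t = tf.random.normal([2, 8])                   # [batch, feature]
states = [tf.zeros([2, 4]), tf.zeros([2, 4])]    # [memory state, carry state] at t-1
output, new_states = cell(x_t, states)
print(output.shape)                              # (2, 4)
print(new_states[0].shape, new_states[1].shape)  # (2, 4) (2, 4)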

https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTMCell

RNN

tf.keras.layers.RNN(
    cell, return_sequences=False, return_state=False, go_backwards=False,
    stateful=False, unroll=False, time_major=False, **kwargs
)
Arguments:

  • cell: An RNN cell instance or a list of RNN cell instances. An RNN cell is a class that has:
    - A call(input_at_t, states_at_t) method, returning (output_at_t, states_at_t_plus_1). The call method of the cell can also take the optional argument constants (see the docs' note on passing external constants).
    - A state_size attribute. This can be a single integer (single state), in which case it is the size of the recurrent state, or a list/tuple of integers (one size per state). It can also be a TensorShape or a tuple/list of TensorShapes, to represent a high-dimension state.
    - An output_size attribute. This can be a single integer or a TensorShape, representing the shape of the output. For backward compatibility, if this attribute is not available on the cell, the value is inferred from the first element of state_size.
    - A get_initial_state(inputs=None, batch_size=None, dtype=None) method that creates a tensor to be fed to call() as the initial state if the user did not specify one. The returned initial state should have shape [batch_size, cell.state_size]; the cell might create a tensor full of zeros, or of other values, depending on its implementation. inputs is the input tensor to the RNN layer, which should contain the batch size as its shape[0] (possibly None during graph construction), and also dtype; either inputs or the pair batch_size and dtype is provided. For backward compatibility, if this method is not implemented by the cell, the RNN layer creates a zero-filled tensor of size [batch_size, cell.state_size].
    If cell is a list of RNN cell instances, the cells are stacked on top of each other, resulting in an efficient stacked RNN.
  • return_sequences: Boolean (default False). Whether to return the last output in the output sequence, or the full sequence.
  • return_state: Boolean (default False). Whether to return the last state in addition to the output.
  • go_backwards: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
  • stateful: Boolean (default False). If True, the last state for each sample at index i in a batch is used as the initial state for the sample at index i in the following batch.
  • unroll: Boolean (default False). If True, the network is unrolled, otherwise a symbolic loop is used. Unrolling can speed up an RNN, although it tends to be more memory-intensive; it is only suitable for short sequences.
  • time_major: The shape format of the inputs and outputs tensors. If True, inputs and outputs have shape (timesteps, batch, ...); if False, (batch, timesteps, ...). Using time_major=True is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation, but most TensorFlow data is batch-major, so by default this layer accepts input and emits output in batch-major form.
  • zero_output_for_mask: Boolean (default False). Whether the output should use zeros for the masked timesteps. Only used when return_sequences is True and a mask is provided. Useful when you want to reuse the raw output sequence of the RNN without interference from the masked timesteps, e.g. when merging bidirectional RNNs.
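
As mentioned in the cell argument, passing a list of cells stacks them into one stacked RNN; a small sketch (sizes arbitrary):

import tensorflow as tf

cells = [tf.keras.layers.LSTMCell(32), tf.keras.layers.LSTMCell(32)]
layer = tf.keras.layers.RNN(cells, return_sequences=True)

X = tf.random.normal([2, 10, 8])   # [batch, time steps, features]
outputs = layer(X)
print(outputs.shape)               # (2, 10, 32) -> the last cell's units set the output size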

ValueError: Could not find matching function to call loaded from the SavedModel. Got: Positional arguments (5 total):


ValueError: Could not find matching function to call loaded from the SavedModel. Got:
Positional arguments (5 total):
* Tensor("inputs:0", shape=(None, 75, 100), dtype=float32)
* None
* True
* None
* None
Keyword arguments: {}

Expected these arguments to match one of the following 4 option(s):

Option 1:
Positional arguments (5 total):
* TensorSpec(shape=(None, 75, 100), dtype=tf.float32, name='inputs')
* TensorSpec(shape=(None, 75), dtype=tf.bool, name='mask')
* True
* None
* None
Keyword arguments: {}

Option 2:
Positional arguments (5 total):
* TensorSpec(shape=(None, 75, 100), dtype=tf.float32, name='inputs')
* TensorSpec(shape=(None, 75), dtype=tf.bool, name='mask')
* False
* None
* None
Keyword arguments: {}

Option 3:
Positional arguments (5 total):
* [TensorSpec(shape=(None, None, 100), dtype=tf.float32, name='inputs/0')]
* None
* False
* None
* None
Keyword arguments: {}

Option 4:
Positional arguments (5 total):
* [TensorSpec(shape=(None, None, 100), dtype=tf.float32, name='inputs/0')]
* None
* True
* None
* None
Keyword arguments: {}


The error is caused by a bug in TF 2.1:

inputs = tf.keras.Input(shape=([75]), dtype='float32', name="input_wv")
# With mask_zero=True, the model cannot be loaded after being saved as a pb (type error)
word_vectors = tf.keras.layers.Embedding(input_dim=word_num, output_dim=embedding_size, name='embedding_layer',
                                         trainable=True, mask_zero=True, weights=[w2v])(inputs)
# mask_word_vectors = tf.keras.layers.Masking(mask_value=0)(word_vectors)
forward_output = tf.keras.layers.RNN(
    tf.keras.layers.LSTMCell(100, kernel_initializer=keras.initializers.Orthogonal(),
                             recurrent_initializer=keras.initializers.Orthogonal()),
    return_sequences=True,
    return_state=False,
    stateful=False,
)(word_vectors)
model = tf.keras.Model(inputs=[inputs], outputs=[word_vectors, forward_output], name="LSTMAttn")
model.summary()

Solution: upgrade to TF 2.2.
