TensorFlow 2.0 — RNN
RNN cell
Input: vectors $\mathbf{x}_{(t)}$ and $\mathbf{y}_{(t-1)}$
A single layer with a single neuron:

```python
model = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=[None, 1])
])
```
By default, the SimpleRNN layer uses the hyperbolic tangent activation function.
The initial state $\mathbf{h}_{(init)}$ is set to 0.
In a simple RNN, this output is also the new state $\mathbf{h}_{(0)}$.
To make the layer return one output per time step, you must set return_sequences=True.
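A quick shape check (sizes are arbitrary):

```python
import numpy as np
from tensorflow import keras

X = np.random.rand(4, 10, 1)  # [batch size, time steps, features]

last_only = keras.layers.SimpleRNN(5)                         # default: last output only
full_seq  = keras.layers.SimpleRNN(5, return_sequences=True)  # one output per time step

print(last_only(X).shape)  # (4, 5)
print(full_seq(X).shape)   # (4, 10, 5)
```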
RNN layer
The input is the same as for a single RNN cell, but there are more parameters: each RNN cell has its own weights for $\mathbf{x}$ and for $\mathbf{y}$, and since a layer contains multiple RNN cells, there are multiple such sets of weights.
Memory Cell
New output, namely the hidden state: $\mathbf{h}_{(t)} = f(\mathbf{x}_{(t)}, \mathbf{h}_{(t-1)})$
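A minimal NumPy sketch of this recurrence (the weight names Wx/Wh are illustrative, not Keras internals):

```python
import numpy as np

# h_t = tanh(x_t @ Wx + h_{t-1} @ Wh + b), with the same weights at every time step
T, n_inputs, n_units = 5, 3, 4
Wx = np.random.randn(n_inputs, n_units)
Wh = np.random.randn(n_units, n_units)
b = np.zeros(n_units)

X = np.random.randn(T, n_inputs)   # one sequence of T time steps
h = np.zeros(n_units)              # h_(init) = 0
for t in range(T):
    h = np.tanh(X[t] @ Wx + h @ Wh + b)   # new state from x_(t) and h_(t-1)
```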
Deep RNNs
stack multiple layers of cells
```python
model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20, return_sequences=True),
    keras.layers.SimpleRNN(1)
])
```
Handling Long Sequences
Leading to : Unstable Gradients Problem
nonsaturating activation functions (e.g., ReLU) may not help as much here; in fact, they may actually lead the RNN to be even more unstable during training
Well, suppose Gradient Descent updates the weights in a way that increases the outputs slightly at the first time step. Because the same weights are used at every time step, the outputs at the second time step may also be slightly increased, and those at the third, and so on until the outputs explode—and a nonsaturating activation function does not prevent that.
Solution:
Layer Normalization : it is very similar to Batch Normalization, but instead of normalizing across the batch dimension, it normalizes across the features dimension.
Layer Normalization is applied before the activation function.
A cell must also have a state_size attribute and an output_size attribute.
```python
class LNSimpleRNNCell(keras.layers.Layer):
    def __init__(self, units, activation="tanh", **kwargs):
        super().__init__(**kwargs)
        self.state_size = units
        self.output_size = units
        self.simple_rnn_cell = keras.layers.SimpleRNNCell(units, activation=None)
        self.layer_norm = keras.layers.LayerNormalization()
        self.activation = keras.activations.get(activation)

    def call(self, inputs, states):
        outputs, new_states = self.simple_rnn_cell(inputs, states)
        norm_outputs = self.activation(self.layer_norm(outputs))
        return norm_outputs, [norm_outputs]
```
```python
model = keras.models.Sequential([
    keras.layers.RNN(LNSimpleRNNCell(20), return_sequences=True, input_shape=[None, 1]),
    keras.layers.RNN(LNSimpleRNNCell(20), return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])
```
- all recurrent layers (except for keras.layers.RNN) and all cells provided by Keras have a dropout hyperparameter and a recurrent_dropout hyperparameter: the former defines the dropout rate to apply to the inputs (at each time step), and the latter defines the dropout rate for the hidden states
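For example (the dropout rates here are arbitrary):

```python
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.LSTM(20, return_sequences=True, input_shape=[None, 1],
                      dropout=0.2, recurrent_dropout=0.2),   # input / hidden-state dropout
    keras.layers.LSTM(20, dropout=0.2, recurrent_dropout=0.2),
    keras.layers.Dense(1)
])
```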
LSTM
- its state is split into two vectors: h(t) (the short-term state) and c(t) (the long-term state)
```python
model = keras.models.Sequential([
    keras.layers.LSTM(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.LSTM(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])
```
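Both states can be inspected by asking the layer to return them (batch of 32, 10 time steps, 1 feature, all assumed):

```python
import tensorflow as tf
from tensorflow import keras

lstm = keras.layers.LSTM(20, return_sequences=True, return_state=True)
outputs, h, c = lstm(tf.random.normal([32, 10, 1]))
# outputs: (32, 10, 20); h: (32, 20) short-term state; c: (32, 20) long-term state
```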
GRU cells
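GRU cells are a simplified LSTM variant (a single state vector and two gates); in Keras they are a drop-in replacement, e.g. (sizes are arbitrary):

```python
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.GRU(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.GRU(20),
    keras.layers.Dense(1)
])
```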
WaveNet
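A rough sketch of the WaveNet idea for sequence modeling: a stack of dilated causal 1D convolutions (filter counts and dilation rates here are illustrative):

```python
from tensorflow import keras

model = keras.models.Sequential()
model.add(keras.layers.InputLayer(input_shape=[None, 1]))
for rate in (1, 2, 4, 8) * 2:                       # doubling dilation rates, repeated twice
    model.add(keras.layers.Conv1D(filters=20, kernel_size=2, padding="causal",
                                  activation="relu", dilation_rate=rate))
model.add(keras.layers.Conv1D(filters=10, kernel_size=1))  # final 1x1 convolution
```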
2021-03 update
TF1
Recurrent Neural Networks
Shape of the input X:
X: Tensor("embedding_lookup/Identity:0", shape=(1, 75, 100), dtype=float32), i.e. [batch size, click-sequence length, embedding size of each click]
```python
with tf.variable_scope("bilstm", reuse=reuse):
```
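A rough TF1 sketch of a bidirectional LSTM over an input of shape (batch, 75, 100); the hidden size and function name are assumptions, not the original project code:

```python
import tensorflow as tf  # TF 1.x

def bilstm_encode(X, hidden_size=100, reuse=False):
    # X: (batch, seq_len, emb_dim), e.g. (1, 75, 100)
    with tf.variable_scope("bilstm", reuse=reuse):
        cell_fw = tf.nn.rnn_cell.LSTMCell(hidden_size)
        cell_bw = tf.nn.rnn_cell.LSTMCell(hidden_size)
        (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
            cell_fw, cell_bw, X, dtype=tf.float32)
        return tf.concat([out_fw, out_bw], axis=-1)  # (batch, seq_len, 2 * hidden_size)
```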
https://zhuanlan.zhihu.com/p/43041436
LSTMCell:

```python
__init__(
    num_units,
    use_peepholes=False,
    cell_clip=None,
    initializer=None,
    num_proj=None,
    proj_clip=None,
    num_unit_shards=None,
    num_proj_shards=None,
    forget_bias=1.0,
    state_is_tuple=True,
    activation=None,
    reuse=None,
    name=None,
    dtype=None)
```
Parameter descriptions:
- num_units: the number of units in the LSTM cell, i.e. the number of hidden-layer neurons.
- use_peepholes: boolean; set to True to enable peephole connections.
- cell_clip: optional float; if provided, the cell state is clipped to this value before the cell output activation.
- initializer: optional; initializer for the weight and projection matrices.
- num_proj: optional int; the output dimensionality of the projection matrix. If None, no projection is performed.
- proj_clip: optional float; if num_proj > 0 and proj_clip is provided, the projected values are clipped element-wise to the range [-proj_clip, proj_clip].
- num_unit_shards: deprecated.
- num_proj_shards: deprecated.
- forget_bias: float; bias added to the forget gate. Must be manually set to 0.0 when restoring from a checkpoint trained with CudnnLSTM.
- state_is_tuple: if True, the accepted and returned state is a 2-tuple of c_state and m_state; if False, they are concatenated along the column axis. The latter behavior is about to be deprecated.
- activation: activation function of the inner states. Default: tanh.
- reuse: boolean; whether to reuse variables in an existing scope. If not True and the existing scope already has the given variables, an error is raised.
- name: string; the name of the layer. Layers with the same name share weights, but to avoid errors this requires reuse=True.
- dtype: the default dtype of this layer. The default of None means it uses the dtype of the first input. Required if build is called before call.
Source: 大雄没有叮当猫 on CSDN (CC 4.0 BY-SA): https://blog.csdn.net/u013230189/article/details/82811066
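A minimal TF1 usage sketch of LSTMCell (sizes are arbitrary):

```python
import tensorflow as tf  # TF 1.x

batch_size, seq_len, emb_dim, num_units = 32, 75, 100, 128
X = tf.placeholder(tf.float32, [batch_size, seq_len, emb_dim])

cell = tf.nn.rnn_cell.LSTMCell(num_units)            # state_is_tuple=True by default
outputs, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
# outputs: (batch_size, seq_len, num_units)
# state:   LSTMStateTuple(c=(batch_size, num_units), h=(batch_size, num_units))
```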
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
dynamic RNN:

```python
tf.nn.dynamic_rnn(
    cell,
    inputs,
    sequence_length=None,
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)
```
Detailed explanation: https://zhuanlan.zhihu.com/p/43041436
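A small sketch of the effect of sequence_length: steps beyond the given length are not computed and produce zero outputs (values are illustrative):

```python
import numpy as np
import tensorflow as tf  # TF 1.x

# Two sequences padded to length 4; the second one is only 2 steps long.
X = tf.constant(np.random.randn(2, 4, 3).astype(np.float32))

cell = tf.nn.rnn_cell.LSTMCell(num_units=5)
outputs, state = tf.nn.dynamic_rnn(cell, X, sequence_length=[4, 2], dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(outputs)
    print(out[1, 2:])   # all zeros: steps past sequence_length are skipped
```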
https://blog.csdn.net/u010223750/article/details/71079036
RNN-related links
1. A complete illustrated guide to RNNs, RNN variants, Seq2Seq, and the attention mechanism: https://zhuanlan.zhihu.com/p/28054589
TF2
keras.layers.RNN:
In TF2, tf.nn.dynamic_rnn is deprecated; its replacement is keras.layers.RNN(cell), as the deprecation notice in the TF source shows:

```python
@deprecation.deprecated(
    None,
    "Please use `keras.layers.RNN(cell)`, which is equivalent to this API")
@tf_export(v1=["nn.dynamic_rnn"])
@dispatch.add_dispatch_support
def dynamic_rnn(cell,
                inputs,
                sequence_length=None,
                initial_state=None,
                dtype=None,
                parallel_iterations=None,
                swap_memory=False,
                time_major=False,
                scope=None):
```
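A minimal TF2 equivalent, wrapping a cell in keras.layers.RNN (sizes are arbitrary):

```python
import tensorflow as tf

layer = tf.keras.layers.RNN(tf.keras.layers.LSTMCell(64), return_sequences=True)
outputs = layer(tf.random.normal([32, 75, 100]))   # (32, 75, 64)
```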
tf.keras.layers.LSTMCell
```python
tf.keras.layers.LSTMCell(
    units, activation='tanh', recurrent_activation='sigmoid', use_bias=True,
    kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
    bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None,
    recurrent_regularizer=None, bias_regularizer=None, kernel_constraint=None,
    recurrent_constraint=None, bias_constraint=None, dropout=0.0,
    recurrent_dropout=0.0, **kwargs
)
```
| Argument | Description |
|---|---|
| `units` | Positive integer, dimensionality of the output space. |
| `activation` | Activation function to use. Default: hyperbolic tangent (`tanh`). If you pass `None`, no activation is applied (i.e. "linear" activation: `a(x) = x`). |
| `recurrent_activation` | Activation function to use for the recurrent step. Default: `sigmoid`. If you pass `None`, no activation is applied (i.e. "linear" activation: `a(x) = x`). |
| `use_bias` | Boolean (default `True`), whether the layer uses a bias vector. |
| `kernel_initializer` | Initializer for the `kernel` weights matrix, used for the linear transformation of the inputs. Default: `glorot_uniform`. |
| `recurrent_initializer` | Initializer for the `recurrent_kernel` weights matrix, used for the linear transformation of the recurrent state. Default: `orthogonal`. |
| `bias_initializer` | Initializer for the bias vector. Default: `zeros`. |
| `unit_forget_bias` | Boolean (default `True`). If True, add 1 to the bias of the forget gate at initialization. Setting it to true will also force `bias_initializer="zeros"`. This is recommended in Jozefowicz et al. |
| `kernel_regularizer` | Regularizer function applied to the `kernel` weights matrix. Default: `None`. |
| `recurrent_regularizer` | Regularizer function applied to the `recurrent_kernel` weights matrix. Default: `None`. |
| `bias_regularizer` | Regularizer function applied to the bias vector. Default: `None`. |
| `kernel_constraint` | Constraint function applied to the `kernel` weights matrix. Default: `None`. |
| `recurrent_constraint` | Constraint function applied to the `recurrent_kernel` weights matrix. Default: `None`. |
| `bias_constraint` | Constraint function applied to the bias vector. Default: `None`. |
| `dropout` | Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. Default: 0. |
| `recurrent_dropout` | Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state. Default: 0. |
Call arguments:
- `inputs`: A 2D tensor, with shape `[batch, feature]`.
- `states`: List of 2 tensors that correspond to the cell's units, both of shape `[batch, units]`; the first tensor is the memory state from the previous time step, and the second is the carry state from the previous time step. For timestep 0, the initial state provided by the user is fed to the cell.
- `training`: Python boolean indicating whether the layer should behave in training mode or in inference mode. Only relevant when `dropout` or `recurrent_dropout` is used.
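A small sketch of calling the cell directly for a single time step (shapes assumed):

```python
import tensorflow as tf

cell = tf.keras.layers.LSTMCell(units=4)
x_t = tf.random.normal([2, 8])                                    # [batch, feature]
states = cell.get_initial_state(batch_size=2, dtype=tf.float32)   # [memory h, carry c], each (2, 4)
output, new_states = cell(x_t, states)                            # output == new h, shape (2, 4)
```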
https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTMCell
RNN
```python
tf.keras.layers.RNN(
    cell, return_sequences=False, return_state=False, go_backwards=False,
    stateful=False, unroll=False, time_major=False, **kwargs
)
```
| Argument | Description |
|---|---|
| `cell` | An RNN cell instance or a list of RNN cell instances. An RNN cell is a class that has: a `call(input_at_t, states_at_t)` method returning `(output_at_t, states_at_t_plus_1)` (the `call` method can also take the optional argument `constants`, see the note on passing external constants); a `state_size` attribute, which can be a single integer (single state), a list/tuple of integers (one size per state), or a TensorShape / tuple of TensorShapes to represent a high-dimensional state; an `output_size` attribute, a single integer or TensorShape representing the shape of the output (for backward compatibility, if this attribute is missing it is inferred from the first element of `state_size`); and a `get_initial_state(inputs=None, batch_size=None, dtype=None)` method that creates a tensor to be fed to `call()` as the initial state when the user does not specify one, with shape `[batch_size, cell.state_size]` (if this method is not implemented, the RNN layer creates a zero-filled tensor of that size). If `cell` is a list of RNN cell instances, the cells are stacked on top of each other in the RNN, resulting in an efficient stacked RNN. |
| `return_sequences` | Boolean (default `False`). Whether to return the last output in the output sequence, or the full sequence. |
| `return_state` | Boolean (default `False`). Whether to return the last state in addition to the output. |
| `go_backwards` | Boolean (default `False`). If True, process the input sequence backwards and return the reversed sequence. |
| `stateful` | Boolean (default `False`). If True, the last state for each sample at index i in a batch will be used as the initial state for the sample of index i in the following batch. |
| `unroll` | Boolean (default `False`). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences. |
| `time_major` | The shape format of the `inputs` and `outputs` tensors. If True, the inputs and outputs will be in shape `(timesteps, batch, ...)`, whereas in the False case, it will be `(batch, timesteps, ...)`. Using `time_major = True` is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form. |
| `zero_output_for_mask` | Boolean (default `False`). Whether the output should use zeros for the masked timesteps. Note that this field is only used when `return_sequences` is True and a mask is provided. It can be useful if you want to reuse the raw output sequence of the RNN without interference from the masked timesteps, e.g. when merging bidirectional RNNs. |
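Passing a list of cells gives a stacked RNN; a small sketch (sizes are arbitrary):

```python
import tensorflow as tf

cells = [tf.keras.layers.LSTMCell(64), tf.keras.layers.LSTMCell(32)]
stacked = tf.keras.layers.RNN(cells, return_sequences=True, return_state=True)
outputs, *states = stacked(tf.random.normal([8, 20, 10]))
# outputs: (8, 20, 32); states: one [h, c] pair per cell
```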
```
ValueError: Could not find matching function to call loaded from the SavedModel.
Got: Positional arguments (5 total):
```
The cause of the error is a bug in TF 2.1:
```python
inputs = tf.keras.Input(shape=([75]), dtype='float32', name="input_wv")
```
Fix: upgrade to TF 2.2.