An Estimator encapsulates four main actions:
- training
- evaluation
- prediction
- export for serving
Estimators also provide capabilities that are still being built into tf.keras:
- Parameter server based training
- Full TFX integration.
Pre-made Estimators encapsulate many existing network architectures, which makes it convenient to experiment with different model structures.
The steps for writing an Estimator program are as follows:
1. Write the data-processing interface (an input function):

```python
def input_fn(dataset):
    ...  # manipulate dataset, extracting the feature dict and the label
    return feature_dict, label
```

2. Define the feature columns: use the tf.feature_column API to specify each feature's name, type, preprocessing, and so on.
```python
# Define three numeric feature columns.
population = tf.feature_column.numeric_column('population')
crime_rate = tf.feature_column.numeric_column('crime_rate')
median_education = tf.feature_column.numeric_column(
    'median_education',
    normalizer_fn=lambda x: x - global_education_mean)  # includes preprocessing
```
3. Define an Estimator:

```python
# Instantiate an estimator, passing the feature columns.
estimator = tf.estimator.LinearClassifier(
    feature_columns=[population, crime_rate, median_education])
```
4. Call the training, evaluation, or other methods:

```python
# `input_fn` is the function created in Step 1
estimator.train(input_fn=my_training_set, steps=2000)
```
Building an Estimator from a Keras model:

```python
keras_mobilenet_v2 = tf.keras.applications.MobileNetV2(...)
```
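A hedged sketch of the full Keras-to-Estimator conversion via `tf.keras.estimator.model_to_estimator` (the input shape, classification head, and compile settings here are illustrative assumptions, not taken from the original snippet):

```python
import tensorflow as tf

# Build a Keras model: MobileNetV2 backbone plus a small head.
# weights=None avoids downloading pretrained weights for this sketch.
keras_mobilenet_v2 = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights=None)
model = tf.keras.Sequential([
    keras_mobilenet_v2,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Convert the compiled Keras model into an Estimator.
est = tf.keras.estimator.model_to_estimator(keras_model=model)
```

The resulting `est` exposes the usual Estimator methods (`train`, `evaluate`, `predict`), so the rest of an Estimator pipeline works unchanged.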
feature_columns
https://www.tensorflow.org/tutorials/structured_data/feature_columns
Feature columns are typically used for feature engineering on structured data; they are generally not needed for image or text data.
With feature columns you can convert categorical features into one-hot encodings, build bucketized features from continuous features, generate crossed features from multiple features, and so on.
To create feature columns, call functions in the tf.feature_column module. The nine most commonly used functions in the module are listed below. All nine return either a Categorical-Column or a Dense-Column object, except bucketized_column, which inherits from both classes.
Note: every Categorical Column type must ultimately be converted into a Dense Column type via indicator_column before it can be fed into a model!
- numeric_column: numeric column, the most commonly used.
- bucketized_column: bucketized column, built from a numeric column; one numeric column can yield multiple one-hot-encoded features.
- categorical_column_with_identity: categorical identity column, one-hot encoded; equivalent to a bucketized column in which each bucket holds a single integer.
- categorical_column_with_vocabulary_list: categorical vocabulary column, one-hot encoded, with the vocabulary given as a list.
- categorical_column_with_vocabulary_file: categorical vocabulary column, with the vocabulary given in a file.
- categorical_column_with_hash_bucket: hashed column, used when the set of integers or the vocabulary is large.
- indicator_column: indicator column, built from a Categorical Column, one-hot encoded.
- embedding_column: embedding column, built from a Categorical Column; the embedding vector's parameters are learned. A common rule of thumb is to set the embedding dimension to the fourth root of the number of categories.
- crossed_column: crossed column, can be built from any categorical column except categorical_column_with_hash_bucket.
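To make the list concrete, here is a small sketch combining a few of these column types (the column names and toy data are made up) and feeding them into a model through `tf.keras.layers.DenseFeatures`:

```python
import tensorflow as tf

# Toy feature dict (hypothetical column names and values).
features = {
    'age': tf.constant([[25.0], [40.0], [63.0]]),
    'city': tf.constant([['london'], ['paris'], ['tokyo']]),
}

# Numeric column, then bucketized into age ranges (one-hot per bucket).
age = tf.feature_column.numeric_column('age')
age_buckets = tf.feature_column.bucketized_column(age, boundaries=[30, 50])

# Categorical vocabulary column, wrapped in indicator_column so it
# becomes a dense one-hot column the model can consume.
city = tf.feature_column.categorical_column_with_vocabulary_list(
    'city', ['london', 'paris', 'tokyo'])
city_onehot = tf.feature_column.indicator_column(city)

# DenseFeatures concatenates all dense columns into one tensor:
# 3 age buckets + 3 city one-hot slots = 6 columns per example.
layer = tf.keras.layers.DenseFeatures([age_buckets, city_onehot])
dense = layer(features)
print(dense.shape)
```

Each row ends up with exactly two hot entries, one from the bucketized age and one from the city one-hot encoding.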
What’s the difference between a Tensorflow Keras Model and Estimator?
The biggest difference is distributed training:
Distribution
You can conduct distributed training across multiple servers with the Estimators API, but not with Keras API.
From the Tensorflow Keras Guide, it says that:
The Estimators API is used for training models for distributed environments.
And from the Tensorflow Estimators Guide, it says that:
You can run Estimator-based models on a local host or on a distributed multi-server environment without changing your model. Furthermore, you can run Estimator-based models on CPUs, GPUs, or TPUs without recoding your model.
PS: Keras does handle low-level operations, it's just not very standard. Its backend (`import keras.backend as K`) contains lots of functions that wrap around the backend functions. They're meant to be used in custom layers, custom metrics, custom loss functions, etc.
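For instance, a custom loss function written with backend ops might look like the following minimal sketch (using tf.keras's bundled backend; the loss itself, a mean squared logarithmic error, is just an illustration):

```python
import tensorflow as tf
from tensorflow.keras import backend as K  # tf.keras's backend module

# A custom loss built entirely from backend ops.
def msle(y_true, y_pred):
    return K.mean(K.square(K.log(1.0 + y_pred) - K.log(1.0 + y_true)))

# Identical predictions and targets give zero loss.
loss = msle(tf.constant([1.0, 2.0]), tf.constant([1.0, 2.0]))
print(float(loss))
```

Such a function can be passed directly as `loss=msle` in `model.compile`.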
Multi-worker training with Estimator
Note: While you can use Estimators with the tf.distribute API, it's recommended to use Keras with tf.distribute; see multi-worker training with Keras. Estimator training with tf.distribute.Strategy has limited support. (So Keras is the more convenient option?)
When using Estimator for multi-worker training, it is necessary to shard the dataset by the number of workers to ensure model convergence. The input data is sharded by worker index, so that each worker processes 1/num_workers distinct portions of the dataset.
Another reasonable approach to achieve convergence would be to shuffle the dataset with distinct seeds at each worker.
```python
BUFFER_SIZE = 10000
```
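Following that pattern, a sketch of a worker-sharded input function (the in-memory toy dataset and its shapes are placeholders; in a real cluster `tf.distribute.InputContext` supplies the worker count and index):

```python
import tensorflow as tf

BUFFER_SIZE = 10000
BATCH_SIZE = 64

def input_fn(mode, input_context=None):
    # Toy in-memory dataset standing in for real training data.
    data = tf.data.Dataset.from_tensor_slices(
        (tf.random.uniform([256, 28, 28, 1]),
         tf.random.uniform([256], maxval=10, dtype=tf.int32)))
    if input_context:
        # Shard by worker index so each worker sees 1/num_workers of
        # the data. (The alternative mentioned above: shuffle with a
        # distinct seed per worker, e.g. seed=input_context.input_pipeline_id.)
        data = data.shard(input_context.num_input_pipelines,
                          input_context.input_pipeline_id)
    return data.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
```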
Write the layers, the optimizer, and the loss function for training. This tutorial defines the model with Keras layers, similar to the multi-GPU training tutorial.
```python
LEARNING_RATE = 1e-4
```
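A sketch of the Keras-layer model and optimizer this step refers to (the architecture follows the style of the multi-GPU tutorial, but the exact layers here are illustrative assumptions):

```python
import tensorflow as tf

LEARNING_RATE = 1e-4

def build_model():
    # Simple convnet over 28x28 grayscale images (illustrative).
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu',
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10)
    ])

# Estimator-style tutorials use a v1 optimizer inside model_fn.
optimizer = tf.compat.v1.train.GradientDescentOptimizer(LEARNING_RATE)
```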
Note: Although the learning rate is fixed in this example, in general it may be necessary to adjust the learning rate based on the global batch size. (I.e., scaling the learning rate dynamically with the global batch size is often more appropriate.)
It is also possible to distribute the evaluation via eval_distribute:

```python
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
```
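A hedged sketch of wiring the strategy into an Estimator through `RunConfig`, with `eval_distribute` set alongside `train_distribute` (the placeholder `model_fn` exists only to make the wiring runnable):

```python
import tensorflow as tf

# Multi-worker strategy; reads TF_CONFIG in a real cluster and falls
# back to a single local worker when run standalone.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

config = tf.estimator.RunConfig(
    train_distribute=strategy,  # distribute training
    eval_distribute=strategy)   # distribute evaluation too

def model_fn(features, labels, mode):
    # Minimal placeholder model_fn, just to show the wiring.
    logits = tf.keras.layers.Dense(10)(features)
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            labels, logits, from_logits=True))
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss)
    train_op = tf.compat.v1.train.GradientDescentOptimizer(1e-4).minimize(
        loss, tf.compat.v1.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

classifier = tf.estimator.Estimator(
    model_fn=model_fn, model_dir='/tmp/multiworker', config=config)
```

With both fields set, `tf.estimator.train_and_evaluate` distributes the evaluation pass as well as training.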