Finetune XLNET-Bahasa#
This tutorial is available as an IPython notebook at Malaya/finetune/xlnet.
In this notebook, I am going to show how to finetune the pretrained XLNET-Bahasa using Tensorflow Estimator.
TF-Estimator is a module created by the Tensorflow team that is well suited to training a model over a very long period, since it saves checkpoints to the model directory and can resume from them.
[2]:
# !pip3 install tensorflow==1.15 xlnet-tensorflow
Download pretrained model#
The pretrained checkpoints are listed at https://github.com/huseinzol05/Malaya/tree/master/pretrained-model/xlnet#download. In this example, we are going to use the BASE size. Just uncomment the lines below to download the pretrained model, tokenizer and config.
[4]:
# !wget https://f000.backblazeb2.com/file/malaya-model/bert-bahasa/xlnet-base-500k-20-10-2020.gz
# !wget https://raw.githubusercontent.com/huseinzol05/Malaya/master/pretrained-model/preprocess/sp10m.cased.v9.model
# !wget https://raw.githubusercontent.com/huseinzol05/Malaya/master/pretrained-model/xlnet/config/xlnet-base_config.json
# !tar -zxf xlnet-base-500k-20-10-2020.gz
!ls
sp10m.cased.v9.model xlnet-base-500k-20-10-2020.gz
tf-estimator-text-classification.ipynb xlnet-base_config.json
xlnet-base
[5]:
!ls xlnet-base
model.ckpt-500000.data-00000-of-00001 model.ckpt-500000.meta
model.ckpt-500000.index xlnet-base_config.json
There is a helper module, malaya/finetune/utils.py, that helps us train the model on a single GPU or on multiple GPUs.
[6]:
import sys
sys.path.insert(0, '../')
import utils
Load dataset#
We are just going to train on a very small Bahasa news sentiment dataset.
[7]:
import pandas as pd
df = pd.read_csv('../sentiment-data-v2.csv')
df.head()
[7]:
| | label | text |
|---|---|---|
| 0 | Negative | Lebih-lebih lagi dengan kemudahan internet da... |
| 1 | Positive | boleh memberi teguran kepada parti tetapi perl... |
| 2 | Negative | Adalah membingungkan mengapa masyarakat Cina b... |
| 3 | Positive | Kami menurunkan defisit daripada 6.7 peratus p... |
| 4 | Negative | Ini masalahnya. Bukan rakyat, tetapi sistem |
[8]:
labels = df['label'].values.tolist()
texts = df['text'].values.tolist()
unique_labels = sorted(list(set(labels)))
unique_labels
[8]:
['Negative', 'Positive']
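The position of each label in `unique_labels` is what later becomes the integer class id. A minimal sketch of that mapping, using a dictionary lookup instead of `list.index`; `sample_labels` is made-up data standing in for the CSV column and `label2id` is a hypothetical name, not part of the notebook:

```python
# Hypothetical stand-in for df['label'].values.tolist().
sample_labels = ['Negative', 'Positive', 'Negative', 'Positive', 'Negative']

# Same ordering as in the notebook: sorted unique label strings.
unique_labels = sorted(set(sample_labels))

# Map each label string to its integer class id.
label2id = {label: i for i, label in enumerate(unique_labels)}

# Encode the labels; equivalent to unique_labels.index(label) for each row.
encoded = [label2id[label] for label in sample_labels]
print(unique_labels)
print(encoded)
```

For two classes the difference is negligible; the dictionary only starts to matter when there are many labels.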
[10]:
import numpy as np
import tensorflow as tf
from xlnet import model_utils
from xlnet import xlnet
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/model_utils.py:295: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
[11]:
import sentencepiece as spm
from xlnet.prepro_utils import preprocess_text, encode_ids
sp_model = spm.SentencePieceProcessor()
sp_model.Load('sp10m.cased.v9.model')
SEG_ID_A = 0
SEG_ID_B = 1
SEG_ID_CLS = 2
SEG_ID_SEP = 3
SEG_ID_PAD = 4
special_symbols = {
    '<unk>': 0,
    '<s>': 1,
    '</s>': 2,
    '<cls>': 3,
    '<sep>': 4,
    '<pad>': 5,
    '<mask>': 6,
    '<eod>': 7,
    '<eop>': 8,
}
VOCAB_SIZE = 32000
UNK_ID = special_symbols['<unk>']
CLS_ID = special_symbols['<cls>']
SEP_ID = special_symbols['<sep>']
MASK_ID = special_symbols['<mask>']
EOD_ID = special_symbols['<eod>']

def tokenize_fn(text):
    # Normalize the text (keeping the case) and encode it into sentencepiece ids.
    text = preprocess_text(text, lower = False)
    return encode_ids(sp_model, text)

def token_to_ids(text, maxlen = 512):
    tokens_a = tokenize_fn(text)
    # Reserve 2 positions for the <sep> and <cls> tokens.
    if len(tokens_a) > maxlen - 2:
        tokens_a = tokens_a[: (maxlen - 2)]
    segment_id = [SEG_ID_A] * len(tokens_a)
    tokens_a.append(SEP_ID)
    tokens_a.append(CLS_ID)
    segment_id.append(SEG_ID_A)
    segment_id.append(SEG_ID_CLS)
    # 0.0 marks a real token; padding added later is marked with 1.0.
    input_mask = [0.0] * len(tokens_a)
    assert len(tokens_a) == len(input_mask) == len(segment_id)
    return {
        'input_id': tokens_a,
        'input_mask': input_mask,
        'segment_id': segment_id,
    }
input_id, integer representation of the tokenized words, with ids assigned based on the sentencepiece weightage.
input_mask, attention masking. During training, shorter sequences are padded, and the padded positions get mask value 1 (real tokens get 0), so the model does not learn the padded values as part of the context. https://github.com/zihangdai/xlnet/blob/master/classifier_utils.py#L113
segment_id, used for text pair classification. In this case, we can simply put 0.
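To make the masking convention concrete, here is a small plain-Python sketch of right-padding a batch. The pad values (0 for `input_id`, 1.0 for `input_mask`, 4 for `segment_id`) mirror the ones used later in this notebook; `pad_batch` is a hypothetical helper and the token ids are made up for illustration:

```python
SEG_ID_PAD = 4  # segment id used for padded positions, as in the notebook

def pad_batch(examples):
    """Right-pad every example to the longest sequence in the batch."""
    max_len = max(len(e['input_id']) for e in examples)
    batch = {'input_id': [], 'input_mask': [], 'segment_id': []}
    for e in examples:
        n_pad = max_len - len(e['input_id'])
        batch['input_id'].append(e['input_id'] + [0] * n_pad)
        # 0.0 marks a real token, 1.0 marks padding.
        batch['input_mask'].append(e['input_mask'] + [1.0] * n_pad)
        batch['segment_id'].append(e['segment_id'] + [SEG_ID_PAD] * n_pad)
    return batch

# Two made-up examples of different lengths (ids are illustrative only).
examples = [
    {'input_id': [10, 11, 4, 3], 'input_mask': [0.0] * 4, 'segment_id': [0, 0, 0, 2]},
    {'input_id': [12, 4, 3], 'input_mask': [0.0] * 3, 'segment_id': [0, 0, 2]},
]
batch = pad_batch(examples)
print(batch['input_mask'])
```

Only the shorter example receives a trailing 1.0 in its mask; the longest example in the batch keeps an all-zero mask.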
[12]:
token_to_ids(texts[0])
[12]:
{'input_id': [1620,
13,
5177,
53,
33,
2808,
3168,
24,
3400,
807,
21,
16179,
31,
742,
578,
17153,
9,
4,
3],
'input_mask': [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
'segment_id': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]}
TF-Estimator#
TF-Estimator requires 2 parts,
Input pipeline, https://www.tensorflow.org/api_docs/python/tf/data/Dataset
Model definition, https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator
[13]:
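def generate():
    # Loop over the dataset forever; the training loop decides when to stop.
    while True:
        for i in range(len(texts)):
            # Skip very short strings.
            if len(texts[i]) > 5:
                d = token_to_ids(texts[i])
                d['label'] = [unique_labels.index(labels[i])]
                d.pop('tokens', None)
                yield d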
[14]:
g = generate()
next(g)
[14]:
{'input_id': [1620,
13,
5177,
53,
33,
2808,
3168,
24,
3400,
807,
21,
16179,
31,
742,
578,
17153,
9,
4,
3],
'input_mask': [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
'segment_id': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
'label': [0]}
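Because the generator never ends, calling `list()` on it would hang. A minimal sketch of inspecting a few items safely with `itertools.islice`; the texts and labels below are made up, and the yielded dictionary is simplified to avoid needing the tokenizer:

```python
from itertools import islice

# Made-up stand-ins for the notebook's `texts` and `labels` lists.
texts = ['ayat pertama yang panjang', 'ok', 'ayat kedua yang panjang']
labels = ['Negative', 'Positive', 'Positive']
unique_labels = sorted(set(labels))

def generate():
    # Loop forever so the training loop, not the data, decides when to stop.
    while True:
        for text, label in zip(texts, labels):
            if len(text) > 5:  # skip very short strings, as in the notebook
                yield {'text': text, 'label': [unique_labels.index(label)]}

# Take a finite number of items from the endless generator.
first_three = list(islice(generate(), 3))
print([d['label'] for d in first_three])
```

The short string `'ok'` is skipped, and the third item is the first text again because the generator wraps around.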
The input pipeline must be a function that returns a function,
def get_dataset(batch_size = 32, shuffle_size = 32):
    def get():
        # build the tf.data.Dataset here
        return dataset
    return get
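The reason for the nested function is that the training loop calls the input function itself, so settings such as the batch size have to be captured by an outer function. A plain-Python sketch of the pattern, with list slicing in place of `tf.data`; `make_input_fn` and `fake_train` are hypothetical names used only for illustration:

```python
def make_input_fn(data, batch_size=2):
    # Outer function: captures the settings.
    def input_fn():
        # Inner function: takes no arguments and builds the batches when called.
        return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    return input_fn

def fake_train(input_fn):
    # Stand-in for a training loop that calls the input function itself.
    return [len(batch) for batch in input_fn()]

train_input_fn = make_input_fn(list(range(5)), batch_size=2)
print(fake_train(train_input_fn))
```

Nothing is built until the inner function is called, which is the property the estimator relies on when it constructs the dataset inside its own graph.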
[15]:
def get_dataset(batch_size = 32, shuffle_size = 32):
    def get():
        dataset = tf.data.Dataset.from_generator(
            generate,
            {'input_id': tf.int32, 'input_mask': tf.float32, 'segment_id': tf.int32, 'label': tf.int32},
            output_shapes = {
                'input_id': tf.TensorShape([None]),
                'input_mask': tf.TensorShape([None]),
                'segment_id': tf.TensorShape([None]),
                'label': tf.TensorShape([None])
            },
        )
        dataset = dataset.shuffle(shuffle_size)
        dataset = dataset.padded_batch(
            batch_size,
            padded_shapes = {
                'input_id': tf.TensorShape([None]),
                'input_mask': tf.TensorShape([None]),
                'segment_id': tf.TensorShape([None]),
                'label': tf.TensorShape([None])
            },
            padding_values = {
                'input_id': tf.constant(0, dtype = tf.int32),
                'input_mask': tf.constant(1.0, dtype = tf.float32),
                'segment_id': tf.constant(4, dtype = tf.int32),
                'label': tf.constant(0, dtype = tf.int32),
            },
        )
        return dataset
    return get
Test data pipeline using tf.Session#
[17]:
tf.reset_default_graph()
sess = tf.InteractiveSession()
iterator = get_dataset()()
iterator = iterator.make_one_shot_iterator().get_next()
WARNING:tensorflow:From <ipython-input-17-2f00f4f10c26>:4: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
[18]:
iterator
[18]:
{'input_id': <tf.Tensor 'IteratorGetNext:0' shape=(?, ?) dtype=int32>,
'input_mask': <tf.Tensor 'IteratorGetNext:1' shape=(?, ?) dtype=float32>,
'segment_id': <tf.Tensor 'IteratorGetNext:3' shape=(?, ?) dtype=int32>,
'label': <tf.Tensor 'IteratorGetNext:2' shape=(?, ?) dtype=int32>}
[19]:
sess.run(iterator)
[19]:
{'input_id': array([[1084, 791, 835, ..., 0, 0, 0],
[ 256, 8993, 9, ..., 0, 0, 0],
[8110, 87, 1743, ..., 0, 0, 0],
...,
[ 767, 250, 51, ..., 0, 0, 0],
[ 398, 8269, 742, ..., 9, 4, 3],
[3593, 21, 7901, ..., 0, 0, 0]], dtype=int32),
'input_mask': array([[0., 0., 0., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.],
...,
[0., 0., 0., ..., 1., 1., 1.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 1., 1., 1.]], dtype=float32),
'segment_id': array([[0, 0, 0, ..., 4, 4, 4],
[0, 0, 0, ..., 4, 4, 4],
[0, 0, 0, ..., 4, 4, 4],
...,
[0, 0, 0, ..., 4, 4, 4],
[0, 0, 0, ..., 0, 0, 2],
[0, 0, 0, ..., 4, 4, 4]], dtype=int32),
'label': array([[0],
[0],
[0],
[1],
[0],
[1],
[0],
[1],
[0],
[1],
[1],
[0],
[1],
[1],
[1],
[0],
[1],
[0],
[1],
[1],
[1],
[1],
[1],
[1],
[1],
[0],
[0],
[1],
[0],
[1],
[0],
[1]], dtype=int32)}
Model definition#
The model definition must be a function that accepts 4 parameters,
def model_fn(features, labels, mode, params):
[22]:
kwargs = dict(
    is_training = True,
    use_tpu = False,
    use_bfloat16 = False,
    dropout = 0.1,
    dropatt = 0.1,
    init = 'normal',
    init_range = 0.1,
    init_std = 0.05,
    clamp_len = -1,
)
xlnet_parameters = xlnet.RunConfig(**kwargs)
xlnet_config = xlnet.XLNetConfig(json_path = 'xlnet-base_config.json')
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/xlnet.py:64: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.
[26]:
# Note: `epoch` is passed as max_steps below, so it counts training steps.
epoch = 10
batch_size = 32
warmup_proportion = 0.1
num_train_steps = 10
# Number of steps spent warming up the learning rate.
num_warmup_steps = int(num_train_steps * warmup_proportion)
learning_rate = 2e-5
training_parameters = dict(
    decay_method = 'poly',
    train_steps = num_train_steps,
    learning_rate = learning_rate,
    warmup_steps = num_warmup_steps,
    min_lr_ratio = 0.0,
    weight_decay = 0.00,
    adam_epsilon = 1e-8,
    num_core_per_host = 1,
    lr_layer_decay_rate = 1,
    use_tpu = False,
    use_bfloat16 = False,
    dropout = 0.0,
    dropatt = 0.0,
    init = 'normal',
    init_range = 0.1,
    init_std = 0.05,
    clip = 1.0,
    clamp_len = -1,
)
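To get a feel for how `warmup_steps`, `train_steps` and `min_lr_ratio` interact, here is a rough plain-Python sketch of a warmup plus polynomial-decay schedule. It assumes linear warmup followed by linear (power 1) decay towards `learning_rate * min_lr_ratio`; that is my reading of the `'poly'` setting and is not taken from the library source, and `lr_at_step` is a hypothetical helper:

```python
def lr_at_step(step, learning_rate=2e-5, train_steps=10, warmup_steps=1, min_lr_ratio=0.0):
    # Assumed behaviour: linear warmup from 0 up to the peak learning rate.
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    # Assumed behaviour: linear decay from the peak to learning_rate * min_lr_ratio.
    end_lr = learning_rate * min_lr_ratio
    decay_steps = max(1, train_steps - warmup_steps)
    progress = min(step - warmup_steps, decay_steps) / decay_steps
    return (learning_rate - end_lr) * (1.0 - progress) + end_lr

num_train_steps = 10
num_warmup_steps = int(num_train_steps * 0.1)

schedule = [
    lr_at_step(s, train_steps=num_train_steps, warmup_steps=num_warmup_steps)
    for s in range(num_train_steps + 1)
]
for step, lr in enumerate(schedule):
    print(step, lr)
```

With only 10 training steps the warmup lasts a single step, so for a real run both `num_train_steps` and the warmup length would normally be much larger.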
[27]:
class Parameter:
    # Simple attribute container for the training settings.
    # Extra keys such as dropout or init are absorbed by **kwargs and ignored.
    def __init__(
        self,
        decay_method,
        warmup_steps,
        weight_decay,
        adam_epsilon,
        num_core_per_host,
        lr_layer_decay_rate,
        use_tpu,
        learning_rate,
        train_steps,
        min_lr_ratio,
        clip,
        **kwargs
    ):
        self.decay_method = decay_method
        self.warmup_steps = warmup_steps
        self.weight_decay = weight_decay
        self.adam_epsilon = adam_epsilon
        self.num_core_per_host = num_core_per_host
        self.lr_layer_decay_rate = lr_layer_decay_rate
        self.use_tpu = use_tpu
        self.learning_rate = learning_rate
        self.train_steps = train_steps
        self.min_lr_ratio = min_lr_ratio
        self.clip = clip

training_parameters = Parameter(**training_parameters)
init_checkpoint = 'xlnet-base/model.ckpt-500000'
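The class above only turns a dictionary into an object with attributes. As a possible alternative, the standard library's `types.SimpleNamespace` does the same in one line; this is a sketch with a subset of the settings, and `settings` and `params` are hypothetical names:

```python
from types import SimpleNamespace

# A subset of the notebook's training settings, for illustration.
settings = dict(
    decay_method='poly',
    train_steps=10,
    learning_rate=2e-5,
    warmup_steps=1,
    clip=1.0,
)

# Attribute-style access without writing a class by hand.
params = SimpleNamespace(**settings)
print(params.learning_rate, params.train_steps)
```

One difference: `SimpleNamespace` keeps every key it is given, while the `Parameter` class drops the keys it does not name.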
[28]:
def model_fn(features, labels, mode, params):
    # The label is carried inside `features`; shape (batch, 1) -> (batch,).
    Y = tf.cast(features['label'][:, 0], tf.int32)
    # XLNet expects time-major inputs, so transpose (batch, seq) -> (seq, batch).
    xlnet_model = xlnet.XLNetModel(
        xlnet_config = xlnet_config,
        run_config = xlnet_parameters,
        input_ids = tf.transpose(features['input_id'], [1, 0]),
        seg_ids = tf.transpose(features['segment_id'], [1, 0]),
        input_mask = tf.transpose(features['input_mask'], [1, 0]),
    )
    output_layer = xlnet_model.get_sequence_output()
    # Back to batch-major: (seq, batch, hidden) -> (batch, seq, hidden).
    output_layer = tf.transpose(output_layer, [1, 0, 2])
    logits_seq = tf.layers.dense(output_layer, 2)
    # Use the logits at the first position for classification.
    logits = logits_seq[:, 0]
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits = logits, labels = Y
        )
    )
    # Named tensors so the logging hook can find them.
    tf.identity(loss, 'train_loss')
    accuracy = tf.metrics.accuracy(
        labels = Y, predictions = tf.argmax(logits, axis = 1)
    )
    tf.identity(accuracy[1], name = 'train_accuracy')
    # Initialize the matching variables from the pretrained checkpoint.
    variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
    assignment_map, initialized_variable_names = utils.get_assignment_map_from_checkpoint(
        variables, init_checkpoint
    )
    tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
    if mode == tf.estimator.ModeKeys.TRAIN:
        train_op, _, _ = model_utils.get_train_op(training_parameters, loss)
        estimator_spec = tf.estimator.EstimatorSpec(
            mode = mode, loss = loss, train_op = train_op
        )
    elif mode == tf.estimator.ModeKeys.EVAL:
        estimator_spec = tf.estimator.EstimatorSpec(
            mode = tf.estimator.ModeKeys.EVAL,
            loss = loss,
            eval_metric_ops = {'accuracy': accuracy},
        )
    return estimator_spec
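The loss and accuracy in the model function can be reproduced by hand on a toy batch. This is a plain-Python sketch of sparse softmax cross-entropy and per-batch accuracy, not the TensorFlow implementation; the logits and labels are made up, and `sparse_softmax_cross_entropy` is a hypothetical helper:

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    # -log(softmax(logits)[label]), computed with a stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

# Made-up logits for a batch of 3 examples and 2 classes.
batch_logits = [[2.0, 0.0], [0.0, 0.0], [-1.0, 1.0]]
batch_labels = [0, 1, 0]

# Mean loss over the batch, like tf.reduce_mean over the per-example losses.
losses = [sparse_softmax_cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
loss = sum(losses) / len(losses)

# argmax over the class axis, then compare with the labels.
predictions = [max(range(len(l)), key=l.__getitem__) for l in batch_logits]
accuracy = sum(p == y for p, y in zip(predictions, batch_labels)) / len(batch_labels)
print(loss, accuracy)
```

Note that this accuracy is computed on a single batch, whereas `tf.metrics.accuracy` keeps a running value accumulated across batches.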
Initiate training session#
[29]:
train_dataset = get_dataset()
[ ]:
train_hooks = [
    tf.train.LoggingTensorHook(
        ['train_accuracy', 'train_loss'], every_n_iter = 1
    )
]
utils.run_training(
    train_fn = train_dataset,
    model_fn = model_fn,
    model_dir = 'finetuned-xlnet-base',
    num_gpus = 1,
    log_step = 1,
    save_checkpoint_step = epoch,
    max_steps = epoch,
    train_hooks = train_hooks,
)
WARNING:tensorflow:From ../utils.py:62: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
WARNING:tensorflow:From ../utils.py:62: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
INFO:tensorflow:Using config: {'_model_dir': 'finetuned-xlnet-base', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 10, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 1, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f31fb236fd0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/xlnet.py:221: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/xlnet.py:221: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/modeling.py:453: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
INFO:tensorflow:memory input None
INFO:tensorflow:Use float type <dtype: 'float32'>
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/modeling.py:460: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/modeling.py:535: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:271: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/modeling.py:67: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
INFO:tensorflow:**** Trainable Variables ****
INFO:tensorflow: name = model/transformer/r_w_bias:0, shape = (12, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/r_r_bias:0, shape = (12, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/r_s_bias:0, shape = (12, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/seg_embed:0, shape = (12, 2, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_0/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_1/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_2/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_3/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_4/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_5/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_6/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_7/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_8/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_9/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_10/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/q/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/k/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/v/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/r/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/o/kernel:0, shape = (768, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_1/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_1/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_2/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_2/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/ff/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = model/transformer/layer_11/ff/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = dense/kernel:0, shape = (768, 2)
INFO:tensorflow: name = dense/bias:0, shape = (2,)
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/model_utils.py:96: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/model_utils.py:108: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/model_utils.py:123: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/ubuntu/.local/lib/python3.6/site-packages/xlnet/model_utils.py:131: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into finetuned-xlnet-base/model.ckpt.
INFO:tensorflow:train_accuracy = 0.5, train_loss = 0.8626036
INFO:tensorflow:loss = 0.8626036, step = 1