Why does the same neural network architecture work in Keras but not in TensorFlow (leaf classification)?
Recently I have been playing with the leaf classification problem on Kaggle. I saw a notebook, Simple Keras 1D CNN + features split, but when I tried to build the same model in plain TensorFlow, it produced very low accuracy and the loss barely changed. Here is my code:
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale,StandardScaler
#preparing data
train=pd.read_csv('E:\\DataAnalysis\\Kaggle\\leaf\\train.csv',sep=',')
test=pd.read_csv('E:\\DataAnalysis\\Kaggle\\leaf\\test.csv',sep=',')
subexp=pd.read_csv('E:/DataAnalysis/Kaggle/leaf/sample_submission.csv')
x_train=np.asarray(train.drop(['species','id'],axis=1),dtype=np.float32)
x_train=scale(x_train).reshape([990,64,3])
ids=list(subexp)[1:]
spec=np.asarray(train['species'])
y_train=np.asarray([[int(x==ids[i]) for i in range(len(ids))] for x in spec],dtype=np.float32)
drop=0.75
batch_size=16
max_epoch=10
iter_per_epoch=int(990/batch_size)
max_iter=int(max_epoch*iter_per_epoch)
features=192
keep_prob=0.75
#inputs, weights, and biases
x=tf.placeholder(tf.float32,[None,64,3])
y=tf.placeholder(tf.float32,[None,99])
w = {
    'w1': tf.Variable(tf.truncated_normal([1, 3, 512], dtype=tf.float32)),
    'wd1': tf.Variable(tf.truncated_normal([64*512, 2048], dtype=tf.float32)),
    'wd2': tf.Variable(tf.truncated_normal([2048, 1024], dtype=tf.float32)),
    'wd3': tf.Variable(tf.truncated_normal([1024, 99], dtype=tf.float32))
}
b = {
    'b1': tf.Variable(tf.truncated_normal([512], dtype=tf.float32)),
    'bd1': tf.Variable(tf.truncated_normal([2048], dtype=tf.float32)),
    'bd2': tf.Variable(tf.truncated_normal([1024], dtype=tf.float32)),
    'bd3': tf.Variable(tf.truncated_normal([99], dtype=tf.float32))
}
#model
def conv(x, we, bi):
    l1a = tf.nn.relu(tf.nn.conv1d(value=x, filters=we['w1'], stride=1, padding='SAME'))
    l1a = tf.reshape(tf.nn.bias_add(l1a, bi['b1']), [-1, 64*512])
    l1 = tf.nn.dropout(l1a, keep_prob=0.4)
    l2a = tf.nn.relu(tf.add(tf.matmul(l1, we['wd1']), bi['bd1']))
    l3a = tf.nn.relu(tf.add(tf.matmul(l2a, we['wd2']), bi['bd2']))
    out = tf.nn.softmax(tf.matmul(l3a, we['wd3']))
    return out
#optimizer and accuracy
out=conv(x,w,b)
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=out,targets=y))
train_op=tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
correct_pred = tf.equal(tf.argmax(out, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
#train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 0
    while step < max_iter:
        d = (step % iter_per_epoch) * batch_size
        batch_x = x_train[d:d+batch_size]
        batch_y = y_train[d:d+batch_size]
        sess.run(train_op, feed_dict={x: batch_x, y: batch_y})
        if step % 10 == 0:
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})
            print("Iter: ", step, " loss:", loss, " accuracy:", acc)
        step += 1
    print('Training finished!')
The result looks something like this:
Iter: 0 loss: 0.69941 accuracy: 0.0
Iter: 10 loss: 0.69941 accuracy: 0.0
Iter: 20 loss: 0.69941 accuracy: 0.0
Iter: 30 loss: 0.69941 accuracy: 0.0
Iter: 40 loss: 0.69941 accuracy: 0.0
Iter: 50 loss: 0.698778 accuracy: 0.0625
Iter: 60 loss: 0.698778 accuracy: 0.0625
Iter: 70 loss: 0.69941 accuracy: 0.0
Iter: 80 loss: 0.69941 accuracy: 0.0
Iter: 90 loss: 0.69941 accuracy: 0.0
Iter: 100 loss: 0.69941 accuracy: 0.0
Iter: 110 loss: 0.69941 accuracy: 0.0
Iter: 120 loss: 0.69941 accuracy: 0.0
Iter: 130 loss: 0.69941 accuracy: 0.0
Iter: 140 loss: 0.69941 accuracy: 0.0
Iter: 150 loss: 0.69941 accuracy: 0.0
Iter: 160 loss: 0.69941 accuracy: 0.0
Iter: 170 loss: 0.698778 accuracy: 0.0625
......
But when the same data and the same model are used in Keras, it does indeed produce very good results. Code:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedShuffleSplit
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Convolution1D, Dropout
from keras.optimizers import SGD
from keras.utils import np_utils
model = Sequential()
model.add(Convolution1D(nb_filter=512, filter_length=1, input_shape=(64, 3)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dropout(0.4))
model.add(Dense(2048, activation='relu'))
model.add(Dense(1024, activation='relu'))
model.add(Dense(99))
model.add(Activation('softmax'))
sgd = SGD(lr=0.01, nesterov=True, decay=1e-6, momentum=0.9)
model.compile(loss='categorical_crossentropy',optimizer=sgd,metrics=['accuracy'])
model.fit(x_train, y_train, nb_epoch=5, batch_size=16)
Result:
Epoch 1/5
990/990 [==============================] - 78s - loss: 4.3229 - acc: 0.1404
Epoch 2/5
990/990 [==============================] - 76s - loss: 1.6020 - acc: 0.6384
Epoch 3/5
990/990 [==============================] - 74s - loss: 0.2723 - acc: 0.9384
Epoch 4/5
990/990 [==============================] - 73s - loss: 0.1061 - acc: 0.9758
By the way, Keras is using the TensorFlow backend. Any suggestions?
The softmax instead of logits is the most likely culprit here. All the other differences may or may not hurt convergence, but this one is pretty much guaranteed to prevent it :)
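A minimal sketch of that fix, applied to the TensorFlow code above (assuming the same placeholders, weight dictionaries, and a TF 1.x-era API; everything else left unchanged): have conv return the raw logits and let the loss op apply the softmax itself, which is what Keras' categorical_crossentropy does internally:

#model now returns raw logits; no softmax at the output
def conv(x, we, bi):
    l1a = tf.nn.relu(tf.nn.conv1d(value=x, filters=we['w1'], stride=1, padding='SAME'))
    l1a = tf.reshape(tf.nn.bias_add(l1a, bi['b1']), [-1, 64*512])
    l1 = tf.nn.dropout(l1a, keep_prob=0.4)
    l2a = tf.nn.relu(tf.add(tf.matmul(l1, we['wd1']), bi['bd1']))
    l3a = tf.nn.relu(tf.add(tf.matmul(l2a, we['wd2']), bi['bd2']))
    return tf.matmul(l3a, we['wd3'])  # raw logits

logits = conv(x, w, b)
#softmax cross-entropy over the 99 classes, computed from logits;
#the op applies the softmax internally in a numerically stable way
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
#argmax over logits gives the same predictions as argmax over probabilities
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))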