Why do softmax and cross-entropy computed separately give different results than softmax_cross_entropy_with_logits?

I built a model to predict handwritten digits from the MNIST dataset with a softmax output, and something strange happened. The cost decreased over time and eventually settled at around 0.0038... (I used softmax_cross_entropy_with_logits() for the cost function). However, the accuracy was quite low, around 33%. So I thought, "well... I don't know what happened there, but maybe computing softmax and cross-entropy separately will give a different result!" And boom! Accuracy went up to 89%. I have no idea why softmax and cross-entropy give such different results. I even read this question: difference between tensorflow tf.nn.softmax and tf.nn.softmax_cross_entropy_with_logits

This is the code where I used softmax_cross_entropy_with_logits() for the cost function (accuracy: 33%):

import tensorflow as tf 
import numpy as np 
from tensorflow.examples.tutorials.mnist import input_data 

mnist = input_data.read_data_sets("MNIST_data", one_hot=True) 

X = tf.placeholder(shape=[None,784],dtype=tf.float32) 
Y = tf.placeholder(shape=[None,10],dtype=tf.float32) 

W1= tf.Variable(tf.random_normal([784,20])) 
b1= tf.Variable(tf.random_normal([20])) 
layer1 = tf.nn.softmax(tf.matmul(X,W1)+b1) 

W2 = tf.Variable(tf.random_normal([20,10])) 
b2 = tf.Variable(tf.random_normal([10])) 

logits = tf.matmul(layer1,W2)+b2 
hypothesis = tf.nn.softmax(logits)  # only used to compute the accuracy

cost_i= tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=Y) 
cost = tf.reduce_mean(cost_i) 
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost) 


batch_size = 100 
train_epoch = 25 
display_step = 1 
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch in range(train_epoch):
        av_cost = 0
        total_batch = int(mnist.train.num_examples / batch_size)
        for batch in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={X: batch_xs, Y: batch_ys})
            # accumulate every batch's cost into the epoch average
            av_cost += sess.run(cost, feed_dict={X: batch_xs, Y: batch_ys}) / total_batch
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(av_cost))
    print("Optimization Finished!")

    correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float32'))
    print("Accuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))

And this is the version where I computed softmax and cross-entropy separately (accuracy: 89%):

import tensorflow as tf  # the 89% accuracy version
import numpy as np 
from tensorflow.examples.tutorials.mnist import input_data 

mnist = input_data.read_data_sets("MNIST_data", one_hot=True) 

X = tf.placeholder(shape=[None,784],dtype=tf.float32) 
Y = tf.placeholder(shape=[None,10],dtype=tf.float32) 

W1= tf.Variable(tf.random_normal([784,20])) 
b1= tf.Variable(tf.random_normal([20])) 
layer1 = tf.nn.softmax(tf.matmul(X,W1)+b1) 

W2 = tf.Variable(tf.random_normal([20,10])) 
b2 = tf.Variable(tf.random_normal([10])) 


#logits = tf.matmul(layer1,W2)+b2 
#cost_i= tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=Y) 

logits = tf.matmul(layer1,W2)+b2 

hypothesis = tf.nn.softmax(logits) 
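# reduce_sum with no axis adds up the loss of every example in the batch,
# so the result is a scalar and the outer reduce_mean changes nothing
cost = tf.reduce_mean(tf.reduce_sum(-Y*tf.log(hypothesis)))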


optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost) 

batch_size = 100 
train_epoch = 25 
display_step = 1 
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch in range(train_epoch):
        av_cost = 0
        total_batch = int(mnist.train.num_examples / batch_size)
        for batch in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={X: batch_xs, Y: batch_ys})
            # accumulate every batch's cost into the epoch average
            av_cost += sess.run(cost, feed_dict={X: batch_xs, Y: batch_ys}) / total_batch
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(av_cost))
    print("Optimization Finished!")

    correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float32'))
    print("Accuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))


2017-05-12 Kanna Kim

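In the lower example, tf.reduce_sum(-Y*tf.log(hypothesis)) is called without an axis, so it adds up the loss of all examples in the batch instead of averaging it; the outer tf.reduce_mean then receives a scalar and changes nothing. The cost, and with it every gradient, is therefore batch_size (here 100) times larger than in the upper example, which has the same effect as a 100 times higher learning rate. If you use tf.reduce_sum() in the upper example the same way you do in the lower one, you should get similar results with both methods:

cost = tf.reduce_mean(tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)))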
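To make the batch-size factor concrete, here is a minimal sketch in plain Python. It does not use TensorFlow; the helpers `log_softmax`, `cross_entropy` and `grad` are hypothetical stand-ins, and the 4-example batch is made up. It compares the averaged loss (like `tf.reduce_mean` over per-example losses) with the batch-summed loss (like `tf.reduce_sum` with no axis) and the corresponding per-example gradients.

```python
import math

# Hypothetical plain-Python stand-ins for the TensorFlow ops.
def log_softmax(z):
    # log-softmax via log-sum-exp (subtract the max before exponentiating)
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]

def cross_entropy(z, y):
    # per-example loss: -sum_k y_k * log softmax(z)_k
    return -sum(yk * lk for yk, lk in zip(y, log_softmax(z)))

def grad(z, y):
    # gradient of the per-example loss w.r.t. its logits: softmax(z) - y
    return [math.exp(lk) - yk for lk, yk in zip(log_softmax(z), y)]

# made-up toy batch: 4 examples, 3 classes, one-hot labels
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [-1.0, 3.0, 0.0], [1.0, 1.0, 1.0]]
labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0]]
N = len(logits)

per_example = [cross_entropy(z, y) for z, y in zip(logits, labels)]
mean_loss = sum(per_example) / N  # like tf.reduce_mean over per-example losses
sum_loss = sum(per_example)       # like tf.reduce_sum with no axis

# the averaged loss divides every example's gradient by N; the summed loss does not
grads_mean = [[g / N for g in grad(z, y)] for z, y in zip(logits, labels)]
grads_sum = [grad(z, y) for z, y in zip(logits, labels)]

print(f"mean loss = {mean_loss:.4f}, summed loss = {sum_loss:.4f}")
print(f"loss ratio = {sum_loss / mean_loss:.1f} (N = {N})")
```

With plain gradient descent the update is learning_rate times the gradient, so in the question (batch_size = 100, learning_rate = 0.01) the summed cost takes steps as large as a learning rate of 1.0 would take on the averaged cost.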

I increased the number of training epochs to 50 and reached accuracies of 93.06% (tf.nn.softmax_cross_entropy_with_logits()) and 93.24% (softmax and cross-entropy separately), so the results are quite similar.


2017-05-12 06:38:40 ml4294

It works like a charm! I thought cost_i = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y) and tf.reduce_sum(-Y*tf.log(hypothesis)) were the same thing.
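For a single example with moderate logits the two formulations do compute the same quantity, which points to the batch reduction rather than the softmax itself as the source of the gap. A minimal plain-Python sketch (no TensorFlow; function names are hypothetical, logits and label are made up) comparing softmax-then-log with the fused log-sum-exp form:

```python
import math

def softmax_then_log_ce(z, y):
    # softmax first, then log: mirrors -Y*tf.log(tf.nn.softmax(logits)) summed over classes
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(yk * math.log(pk) for yk, pk in zip(y, probs))

def ce_from_logits(z, y):
    # fused form: log-softmax taken directly from the logits via log-sum-exp
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return -sum(yk * (v - lse) for yk, v in zip(y, z))

z = [2.0, -1.0, 0.5]  # made-up logits for one example
y = [0, 0, 1]         # one-hot label
separate = softmax_then_log_ce(z, y)
fused = ce_from_logits(z, y)
print(f"separate = {separate:.6f}, fused = {fused:.6f}")
```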

According to the TensorFlow API documentation, the second way, cost = tf.reduce_mean(tf.reduce_sum(-Y*tf.log(hypothesis))), is numerically unstable, and that is why you don't get the same results.

Anyway, on my GitHub you can find an implementation of a numerically stable cross-entropy loss function that gives the same result as the tf.nn.softmax_cross_entropy_with_logits() function.
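The instability shows up when the logits are far apart. A minimal plain-Python sketch (hypothetical helper names, made-up logits, no TensorFlow): even with the maximum subtracted inside the softmax, a small probability can underflow to exactly 0.0 before the log is taken, and `math.log(0.0)` raises `ValueError`. In TensorFlow, `tf.log(0.0)` returns `-inf` instead of raising, so the cost becomes `inf` or `nan`. The log-sum-exp form never takes the log of a probability and stays finite.

```python
import math

def softmax(z):
    # softmax with the max subtracted, so no exponential overflows
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def separate_ce(z, y):
    # softmax first, then log (like -Y*tf.log(hypothesis)); a probability that
    # underflowed to 0.0 makes math.log raise ValueError
    return -sum(yk * math.log(pk) for yk, pk in zip(y, softmax(z)))

def fused_ce(z, y):
    # log-softmax straight from the logits via log-sum-exp; never takes log(0)
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return -sum(yk * (v - lse) for yk, v in zip(y, z))

def try_separate(z, y):
    try:
        return separate_ce(z, y)
    except ValueError:
        return "log(0) error"

z = [400.0, -400.0, 0.0]  # made-up, very confident logits
y = [0, 1, 0]             # label contradicts the confident prediction
print("separate:", try_separate(z, y))
print("fused:   ", fused_ce(z, y))
```

This sketch uses Python's float64; TensorFlow computes in float32 by default, where a probability already underflows once the logit gap reaches roughly 100.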

You can see that tf.nn.softmax_cross_entropy_with_logits() does not compute the softmax normalization for large numbers directly but only approximates it; more details are in the README.
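A common way to avoid forming the softmax normalization from huge exponentials is the log-sum-exp identity. Whether TensorFlow's kernel uses exactly this form is an assumption here, but it shows how the loss for logits z and true class y can be evaluated without ever computing a probability that could underflow to 0:

```latex
-\log \operatorname{softmax}(z)_y
  = -z_y + \log \sum_{j} e^{z_j}
  = -z_y + m + \log \sum_{j} e^{z_j - m},
  \qquad m = \max_j z_j
```

Because every z_j - m is at most 0, each exponential lies in (0, 1] and the term for the maximum is exactly 1, so the sum is at least 1: the logarithm never sees 0 and nothing overflows.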


2017-05-12 07:51:11
