I am testing Google Cloud ML to speed up training of my ML model with TensorFlow. Does Google Cloud ML support GPUs?
Unfortunately, Google Cloud ML seems to be extremely slow. My mainstream-level PC is at least 10x faster than Google Cloud ML.
I doubted that it was using a GPU, so I ran a test: I modified the sample code to force the use of a GPU.
diff --git a/mnist/trainable/trainer/task.py b/mnist/trainable/trainer/task.py
index 9acb349..a64a11d 100644
--- a/mnist/trainable/trainer/task.py
+++ b/mnist/trainable/trainer/task.py
@@ -131,11 +131,12 @@ def run_training():
     images_placeholder, labels_placeholder = placeholder_inputs(
         FLAGS.batch_size)
 
-    # Build a Graph that computes predictions from the inference model.
-    logits = mnist.inference(images_placeholder, FLAGS.hidden1, FLAGS.hidden2)
+    with tf.device("/gpu:0"):
+      # Build a Graph that computes predictions from the inference model.
+      logits = mnist.inference(images_placeholder, FLAGS.hidden1, FLAGS.hidden2)
 
-    # Add to the Graph the Ops for loss calculation.
-    loss = mnist.loss(logits, labels_placeholder)
+      # Add to the Graph the Ops for loss calculation.
+      loss = mnist.loss(logits, labels_placeholder)
 
     # Add to the Graph the Ops that calculate and apply gradients.
     train_op = mnist.training(loss, FLAGS.learning_rate)
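Rather than hard-coding "/gpu:0", one option is to check which devices TensorFlow registers in the process and fall back to the CPU when no GPU is listed. The following is only a minimal sketch: `choose_device` and `local_device_names` are hypothetical helper names that are not part of the sample code, and it assumes that `device_lib.list_local_devices()` from `tensorflow.python.client` is available in the installed TensorFlow version.

```python
import re

# Matches device names ending in "gpu:<index>", e.g. "/gpu:0" or "/device:GPU:0".
_GPU_NAME = re.compile(r"(?:^|[/:])gpu:\d+$", re.IGNORECASE)


def choose_device(device_names):
    """Return "/gpu:0" if any listed device name is a GPU, else "/cpu:0"."""
    for name in device_names:
        if _GPU_NAME.search(name):
            return "/gpu:0"
    return "/cpu:0"


def local_device_names():
    """List the names of the devices TensorFlow registered in this process.

    Assumption: tensorflow is installed and exposes device_lib; the import is
    kept inside the function so choose_device works without TensorFlow.
    """
    from tensorflow.python.client import device_lib
    return [device.name for device in device_lib.list_local_devices()]
```

In the modified `run_training()`, the hard-coded string could then be replaced with `tf.device(choose_device(local_device_names()))`, so the same code would run both locally and on a CPU-only worker.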
This training code works on my PC (gcloud beta ml local train ...), but not in the cloud. It fails with the following error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 239, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 235, in main
    run_training()
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 177, in run_training
    sess.run(init)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
InvalidArgumentError: Cannot assign a device to node 'softmax_linear/biases': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
Colocation Debug Info:
Colocation group had the following types and devices:
ApplyGradientDescent: CPU
Identity: CPU
Assign: CPU
Variable: CPU
[[Node: softmax_linear/biases = Variable[container="", dtype=DT_FLOAT, shape=[10], shared_name="", _device="/device:GPU:0"]()]]
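The key part of the error is the "available devices:" list, which names only a CPU. As a rough sketch of how that list could be pulled out of such a message when scanning job logs, the hypothetical helper `available_devices` below uses only the standard library. It assumes that multiple devices, when present, are separated by commas on the same line, which the single-device message above does not confirm.

```python
import re


def available_devices(error_message):
    """Extract the device names listed after "available devices:" in the message."""
    match = re.search(r"available devices:\s*(.+)", error_message)
    if match is None:
        return []
    # Assumption: several devices would be comma-separated on the same line.
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# Error text copied from the traceback in the question.
message = (
    "Cannot assign a device to node 'softmax_linear/biases': Could not satisfy "
    "explicit device specification '/device:GPU:0' because no devices matching "
    "that specification are registered in this process; available devices: "
    "/job:localhost/replica:0/task:0/cpu:0\n"
    "Colocation Debug Info:"
)

devices = available_devices(message)
has_gpu = any("gpu" in name.lower() for name in devices)
print(devices, "GPU registered:", has_gpu)
```

For the message in the question, the extracted list holds a single CPU device, which is consistent with the worker having no GPU registered.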
Does Google Cloud ML support GPUs?
Did you have any luck getting it to work? When I try to run my job with a GPU specified, the job just sits in the queue... `gcloud beta ml jobs submit training gpu_job_basic_gpu \ --package-path=train \ --staging-bucket="${STAGING_BUCKET}" \ --module-name=train.1-multiply \ --region=us-central1 \ --scale-tier=BASIC_GPU` – eggie5
Try the us-east1 region. –
Wow, that worked. What gave you the intuition to go east when the docs clearly say that central should work? – eggie5