tensorflow distributed training w/ Estimator + Experiment framework

Hello, I have a weird situation when using the Estimator + Experiment classes for distributed training.
Here is an example: https://gist.github.com/protoget/2cf2b530bc300f209473374cf02ad829
This is a simple case that uses:
- DNNClassifier from the official TF tutorial
- the Experiment framework
- 1 worker and 1 ps on the same host, on different ports (the cluster configuration is sketched below).
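For reference, here is roughly what the cluster configuration looks like, inferred from the ports in the logs below (a minimal sketch assuming the standard TF_CONFIG mechanism; the gist has the exact code):

```python
# Sketch of the cluster setup (ports taken from the logs below).
# The Experiment framework reads TF_CONFIG from the environment to
# decide which role (ps or worker) the current process plays.
import json
import os

cluster = {
    "ps": ["localhost:9000"],
    "worker": ["localhost:9001"],
}

os.environ["TF_CONFIG"] = json.dumps({
    "cluster": cluster,
    # "type" is "ps" in the ps process and "worker" in the worker process.
    "task": {"type": "ps", "index": 0},
    "environment": "cloud",
})
```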
What happens is:
1) When I start the ps job, it looks fine:
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:200] Initialize GrpcChannelCache for job ps -> {0 -> localhost:9000}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:200] Initialize GrpcChannelCache for job worker -> {0 -> 127.0.0.1:9001}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:221] Started server with target: grpc://localhost:9000
2) When I start the worker job, it exits silently, leaving no log output at all.
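For completeness, the worker process is launched roughly as below (a sketch assuming the usual learn_runner entry point; input_fn, the model parameters, and the output path are placeholders, the gist has the real code):

```python
# Sketch of the worker launch. TF_CONFIG (with task type "worker") is
# assumed to be set in the environment, as in the sketch above.
import numpy as np
import tensorflow as tf
from tensorflow.contrib.learn import Experiment
from tensorflow.contrib.learn.python.learn import learn_runner

tf.logging.set_verbosity(tf.logging.DEBUG)  # surface as much logging as possible

def input_fn():
    # Tiny synthetic stand-in for the tutorial's input pipeline.
    x = tf.constant(np.random.rand(100, 4), dtype=tf.float32)
    y = tf.constant(np.random.randint(0, 3, size=100))
    return {"x": x}, y

def experiment_fn(output_dir):
    estimator = tf.contrib.learn.DNNClassifier(
        feature_columns=[tf.contrib.layers.real_valued_column("x", dimension=4)],
        hidden_units=[10, 20, 10],
        n_classes=3,
        model_dir=output_dir)
    return Experiment(
        estimator=estimator,
        train_input_fn=input_fn,
        eval_input_fn=input_fn)

learn_runner.run(experiment_fn, output_dir="/tmp/dist_model")
```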
Eagerly seeking help.