2015-08-19

Spark driver disassociated and removed from the master

I set up a cluster consisting of two slaves and one master, and I submit a jar (Scala) to the Spark master (192.168.1.64):

spark-submit --master spark://spark-master:7077 --class tests.elements target/scala-2.10/zzz-project_2.10-1.0.jar 

It runs quite well for a while and then stops abruptly. The last lines on the terminal are:

... 
15/08/19 17:45:24 INFO scheduler.TaskSchedulerImpl: Adding task set 411292.0 with 6 tasks 
15/08/19 17:45:24 WARN scheduler.TaskSetManager: Stage 411292 contains a task of very large size (2762 KB). The maximum recommended task size is 100 KB. 
15/08/19 17:45:24 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 411292.0 (TID 1832, 192.168.1.64, PROCESS_LOCAL, 2828792 bytes) 
15/08/19 17:45:24 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 411292.0 (TID 1833, 192.168.1.62, PROCESS_LOCAL, 2310009 bytes) 
15/08/19 17:45:24 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 411292.0 (TID 1834, 192.168.1.64, PROCESS_LOCAL, 2669188 bytes) 
15/08/19 17:45:24 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 411292.0 (TID 1835, 192.168.1.62, PROCESS_LOCAL, 2295676 bytes) 
15/08/19 17:45:24 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 411292.0 (TID 1836, 192.168.1.64, PROCESS_LOCAL, 2847786 bytes) 
15/08/19 17:45:24 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 411292.0 (TID 1837, 192.168.1.64, PROCESS_LOCAL, 2913528 bytes) 
Killed 

and the error that appears in the master log is the following:

... 
15/08/19 16:09:49 INFO master.Master: Launching executor app-20150819160949-0001/0 on worker worker-20150819160925-192.168.1.64-51640 
15/08/19 16:09:49 INFO master.Master: Launching executor app-20150819160949-0001/1 on worker worker-20150819160938-192.168.1.62-38007 
15/08/19 16:15:44 INFO master.Master: akka.tcp://[email protected]:46823 got disassociated, removing it. 
15/08/19 16:15:44 INFO master.Master: Removing app app-20150819160949-0001 
15/08/19 16:15:44 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:46823] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
15/08/19 16:15:44 WARN master.Master: Application testPageRank is still in progress, it may be terminated abnormally. 
... 

Both workers have something like this in their logs:

... 
15/08/19 16:15:49 INFO worker.Worker: Executor app-20150819160949-0001/0 finished with state EXITED message Command exited with code 1 exitStatus 1 
15/08/19 16:15:50 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:54799] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 

and

... 
15/08/19 16:15:43 INFO worker.Worker: Executor app-20150819160949-0001/1 finished with state EXITED message Command exited with code 1 exitStatus 1 
15/08/19 16:15:43 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:53325] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 

respectively. The work/app files contain something like this:

... 
15/08/19 16:15:41 INFO executor.Executor: Finished task 1.0 in stage 387758.0 (TID 1803). 1911 bytes result sent to driver 
15/08/19 16:15:41 INFO executor.Executor: Finished task 4.0 in stage 387758.0 (TID 1806). 1911 bytes result sent to driver 
15/08/19 16:15:41 INFO storage.BlockManager: Found block rdd_1206_5 locally 
15/08/19 16:15:41 INFO executor.Executor: Finished task 5.0 in stage 387758.0 (TID 1807). 1911 bytes result sent to driver 
15/08/19 16:15:41 INFO storage.BlockManager: Found block rdd_1206_3 locally 
15/08/19 16:15:41 INFO executor.Executor: Finished task 3.0 in stage 387758.0 (TID 1805). 1911 bytes result sent to driver 
15/08/19 16:15:44 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.64:46823 disassociated! Shutting down. 
15/08/19 16:15:44 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:46823] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
15/08/19 16:15:45 INFO storage.DiskBlockManager: Shutdown hook called 
15/08/19 16:15:46 INFO util.Utils: Shutdown hook called 

and

... 
15/08/19 16:15:41 INFO storage.BlockManager: Found block rdd_1206_0 locally 
15/08/19 16:15:41 INFO executor.Executor: Finished task 2.0 in stage 387758.0 (TID 1804). 1911 bytes result sent to driver 
15/08/19 16:15:41 INFO executor.Executor: Finished task 0.0 in stage 387758.0 (TID 1802). 1911 bytes result sent to driver 
15/08/19 16:15:42 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.64:46823 disassociated! Shutting down. 
15/08/19 16:15:42 INFO storage.DiskBlockManager: Shutdown hook called 
15/08/19 16:15:42 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:46823] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
15/08/19 16:15:42 INFO util.Utils: Shutdown hook called 

respectively. There do not seem to be any other errors in HDFS or Spark.

I suspect the error lies in the third line of the master log (15/08/19 16:15:44 INFO master.Master: akka.tcp://[email protected]:46823 got disassociated, removing it.), but I cannot figure out why. I tried changing spark.akka.heartbeat.interval to 100, as suggested in some posts, but no luck. Does anyone know why this happens and how to solve it? Thank you.
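
The post does not show how the heartbeat setting was applied. One common way to pass a Spark property at submit time is the generic `--conf key=value` option of spark-submit. The sketch below reuses the command from the question and wraps it in a hypothetical helper function, `submit_with_heartbeat`, which is not part of Spark; whether this is how the setting was actually changed here is an assumption.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Spark): resubmits the job from the question
# with spark.akka.heartbeat.interval set to the value given as $1.
# Assumes spark-submit is on PATH and the jar has already been built.
submit_with_heartbeat() {
    spark-submit \
        --master spark://spark-master:7077 \
        --class tests.elements \
        --conf "spark.akka.heartbeat.interval=$1" \
        target/scala-2.10/zzz-project_2.10-1.0.jar
}
```

Calling `submit_with_heartbeat 100` would correspond to the value tried in the question.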

I don't think this is the same problem. – sofia

I have the same problem. Did you find a solution? – theShadow89

Not really. Eventually I moved on to other things. But I suspect it may have had to do with my cluster being too small in terms of RAM (2-3 machines with 4-6 GB of RAM each). So I am still interested in an answer. – sofia
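
The bare `Killed` at the end of the spark-submit output is what a shell typically prints when the foreground process was terminated by SIGKILL. That fits the low-memory suspicion, since the Linux out-of-memory killer ends processes this way, but it does not prove it. A minimal sketch for checking it, assuming a Linux host and a Bourne-style shell; the helper names `was_sigkilled` and `oom_lines` are hypothetical, and the exact wording of kernel messages varies between kernel versions:

```shell
#!/bin/sh
# Hypothetical helpers, not part of Spark.

# Succeeds when an exit status means "terminated by SIGKILL" (128 + signal 9).
was_sigkilled() {
    [ "$1" -eq 137 ]
}

# Prints lines that look like OOM-killer activity.
# $1 is a file holding kernel messages, e.g. saved with: dmesg > kernel.log
oom_lines() {
    grep -i -E 'out of memory|killed process' "$1"
}
```

After a failed run, `was_sigkilled "$?"` checks the exit status of spark-submit, and `oom_lines kernel.log` shows whether the kernel logged killing a process (for example the driver's `java` process) around the time of the failure.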
