
Datastax Spark jobs killed for no reason

We are running DSE Spark on a 3-node cluster with 5 jobs. We see SIGTERM commands arriving in /var/log/spark/worker/worker-0/worker.log that stop our jobs. During these periods we see no corresponding memory or CPU pressure, and nobody issued these calls manually.

I have seen a few similar issues that came down to a heap-size problem with YARN or Mesos, but since we are using DSE, those did not seem relevant.

Below is a sample of the log output from one server that was running 2 of the jobs:

ERROR [SIGTERM handler] 2016-03-26 00:43:28,780 SignalLogger.scala:57 - RECEIVED SIGNAL 15: SIGTERM 
ERROR [SIGHUP handler] 2016-03-26 00:43:28,788 SignalLogger.scala:57 - RECEIVED SIGNAL 1: SIGHUP 
INFO [Spark Shutdown Hook] 2016-03-26 00:43:28,795 Logging.scala:59 - Killing process! 
ERROR [File appending thread for /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stderr] 2016-03-26 00:43:28,848 Logging.scala:96 - Error writing stream to file /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stderr 
java.io.IOException: Stream closed 
     at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) ~[na:1.8.0_71] 
     at java.io.BufferedInputStream.read1(BufferedInputStream.java:283) ~[na:1.8.0_71] 
     at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_71] 
     at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[na:1.8.0_71] 
     at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70) ~[spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
ERROR [File appending thread for /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stdout] 2016-03-26 00:43:28,892 Logging.scala:96 - Error writing stream to file /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stdout 
java.io.IOException: Stream closed 
     at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) ~[na:1.8.0_71] 
     at java.io.BufferedInputStream.read1(BufferedInputStream.java:283) ~[na:1.8.0_71] 
     at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_71] 
     at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[na:1.8.0_71] 
     at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70) ~[spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
ERROR [SIGTERM handler] 2016-03-26 00:43:29,070 SignalLogger.scala:57 - RECEIVED SIGNAL 15: SIGTERM 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,079 Logging.scala:59 - Disassociated [akka.tcp://[email protected]:44131] -> [akka.tcp://[email protected]:7077] Disassociated ! 
ERROR [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,080 Logging.scala:75 - Connection to master failed! Waiting for master to reconnect... 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,081 Logging.scala:59 - Connecting to master akka.tcp://[email protected]:7077/user/Master... 
WARN [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,091 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,101 Logging.scala:59 - Disassociated [akka.tcp://[email protected]:44131] -> [akka.tcp://[email protected]:7077] Disassociated ! 
ERROR [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,102 Logging.scala:75 - Connection to master failed! Waiting for master to reconnect... 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,102 Logging.scala:59 - Not spawning another attempt to register with the master, since there is an attempt scheduled already. 
WARN [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,323 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:49943] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,330 Logging.scala:59 - Executor app-20160325132151-0004/0 finished with state EXITED message Command exited with code 129 exitStatus 129 
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,414 Logging.scala:59 - Killing process! 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,415 Logging.scala:59 - Executor app-20160325131848-0001/0 finished with state EXITED message Command exited with code 129 exitStatus 129 
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,417 Logging.scala:59 - Killing process! 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,422 Logging.scala:59 - Unknown Executor app-20160325132151-0004/0 finished with state EXITED message Worker shutting down exitStatus 129 
WARN [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,425 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]7:32874] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
WARN [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,433 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:56212] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,441 Logging.scala:59 - Executor app-20160325131918-0002/1 finished with state EXITED message Command exited with code 129 exitStatus 129 
INFO [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,448 Logging.scala:59 - Unknown Executor app-20160325131918-0002/1 finished with state EXITED message Worker shutting down exitStatus 129 
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,448 Logging.scala:59 - Shutdown hook called 
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,449 Logging.scala:59 - Deleting directory /var/lib/spark/rdd/spark-28fa2f73-d2aa-44c0-ad4e-3ccfd07a95d2 

Answer


The error seems straightforward to me:

Error writing stream to file /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stdout
java.io.IOException: Stream closed
     at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)

Either there is a network problem at play between your data source (Cassandra) and Spark. Keep in mind that Spark running on node1 may in fact pull data from Cassandra on node2, even though it tries to minimize that.
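
If you want to check whether locality is the issue, you can point the connector at the colocated Cassandra node and give the scheduler more time to wait for a node-local slot. A minimal sketch using the DataStax spark-cassandra-connector; the keyspace/table names ("my_ks", "my_table") and the 10s wait are placeholder assumptions, not values from your cluster:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object LocalityCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("locality-check")
      // Prefer the local Cassandra node on each Spark worker
      // (assumes DSE runs Cassandra and Spark on every node).
      .set("spark.cassandra.connection.host", "127.0.0.1")
      // Wait longer for a node-local slot before the scheduler
      // ships the partition across the network.
      .set("spark.locality.wait", "10s")

    val sc = new SparkContext(conf)
    // cassandraTable comes from the connector import above.
    val rdd = sc.cassandraTable("my_ks", "my_table")
    println(s"partitions: ${rdd.partitions.length}, rows: ${rdd.count()}")
    sc.stop()
  }
}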

Or your serialization has a problem. Add this parameter to your Spark configuration to switch to Kryo:

spark.serializer=org.apache.spark.serializer.KryoSerializer
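
The line above is spark-defaults.conf syntax; the same switch can be made programmatically. A short sketch, where the MyRecord case class is only a placeholder for whatever your jobs actually shuffle:

import org.apache.spark.SparkConf

object KryoConfig {
  // Placeholder for the record type your jobs serialize.
  case class MyRecord(id: Long, payload: String)

  val conf = new SparkConf()
    .setAppName("kryo-example")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Optional: pre-registering classes keeps Kryo from writing the
    // full class name into every serialized object.
    .registerKryoClasses(Array(classOf[MyRecord]))
}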