
Running Spark on YARN does not work with Java 8

I have a cluster with 1 master and 6 slaves, running the pre-built versions of Hadoop 2.6.0 and Spark 1.6.2. Hadoop MR and Spark jobs ran without any problems with OpenJDK 7 installed on all nodes. However, after I upgraded OpenJDK 7 to OpenJDK 8 on all nodes, both spark-submit and spark-shell on YARN fail with errors.
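For example, simply starting the PySpark shell against YARN reproduces the failure (this is the invocation behind the traceback below, omitting any extra flags; spark-submit fails the same way):

./bin/pyspark --master yarn

The shell output: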

16/08/17 14:06:22 ERROR client.TransportClient: Failed to send RPC 4688442384427245199 to /xxx.xxx.xxx.xx:42955: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException 
16/08/17 14:06:22 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts 
org.apache.spark.SparkException: Exception thrown in awaitResult 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) 
     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) 
     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply$mcV$sp(YarnSchedulerBackend.scala:271) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(YarnSchedulerBackend.scala:271) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(YarnSchedulerBackend.scala:271) 
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) 
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.io.IOException: Failed to send RPC 4688442384427245199 to /xxx.xxx.xxx.xx:42955: java.nio.channels.ClosedChannelException 
     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) 
     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) 
     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) 
     at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845) 
     at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873) 
     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) 
     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) 
     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
     ... 1 more 
Caused by: java.nio.channels.ClosedChannelException 
16/08/17 14:06:22 ERROR spark.SparkContext: Error initializing SparkContext. 
java.lang.IllegalStateException: Spark context stopped while waiting for backend 
     at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:581) 
     at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162) 
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:549) 
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:236) 
     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 
Traceback (most recent call last): 
    File "/home/hd_spark/spark2/python/pyspark/shell.py", line 49, in <module> 
    spark = SparkSession.builder.getOrCreate() 
    File "/home/hd_spark/spark2/python/pyspark/sql/session.py", line 169, in getOrCreate 
    sc = SparkContext.getOrCreate(sparkConf) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 294, in getOrCreate 
    SparkContext(conf=conf or SparkConf()) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 115, in __init__ 
    conf, jsc, profiler_cls) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 168, in _do_init 
    self._jsc = jsc or self._initialize_context(self._conf._jconf) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 233, in _initialize_context 
    return self._jvm.JavaSparkContext(jconf) 
    File "/home/hd_spark/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 1183, in __call__ 
    File "/home/hd_spark/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. 
: java.lang.IllegalStateException: Spark context stopped while waiting for backend 
     at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:581) 
     at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162) 
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:549) 
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:236) 
     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 

I exported JAVA_HOME in .bashrc and set OpenJDK 8 as the default Java with the following commands:

sudo update-alternatives --config java 
sudo update-alternatives --config javac 
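
For reference, the JAVA_HOME export in my .bashrc (the path below is the standard OpenJDK 8 location on Ubuntu; adjust it to your installation):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH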

I also tried Oracle Java 8, and the same error occurs. The container logs on the slave nodes show the same error, as below:

SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/tmp/hadoop-hd_spark/nm-local-dir/usercache/hd_spark/filecache/17/__spark_libs__8247267244939901627.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 
16/08/17 14:05:11 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: [email protected] 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for TERM 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for HUP 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for INT 
16/08/17 14:05:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing view acls to: hd_spark 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing modify acls to: hd_spark 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing view acls groups to: 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing modify acls groups to: 
16/08/17 14:05:11 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hd_spark); groups with view permissions: Set(); users with modify permissions: Set(hd_spark); groups with modify permissions: Set() 
16/08/17 14:05:12 INFO client.TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xx:37417 after 78 ms (0 ms spent in bootstraps) 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing view acls to: hd_spark 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing modify acls to: hd_spark 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing view acls groups to: 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing modify acls groups to: 
16/08/17 14:05:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hd_spark); groups with view permissions: Set(); users with modify permissions: Set(hd_spark); groups with modify permissions: Set() 
16/08/17 14:05:12 INFO client.TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xx:37417 after 1 ms (0 ms spent in bootstraps) 
16/08/17 14:05:12 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-hd_spark/nm-local-dir/usercache/hd_spark/appcache/application_1471352972661_0005/blockmgr-d9f23a56-1420-4cd4-abfd-ae9e128c688c 
16/08/17 14:05:12 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB 
16/08/17 14:05:12 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@xxx.xxx.xxx.xx:37417 
16/08/17 14:05:13 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM 
16/08/17 14:05:13 INFO storage.DiskBlockManager: Shutdown hook called 
16/08/17 14:05:13 INFO util.ShutdownHookManager: Shutdown hook called 

I have tried the pre-built Spark 1.6.2, the pre-built Spark 2.0, and also Spark 2.0 built from source myself.
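The self-built Spark 2.0 came from the standard distribution script, roughly as follows (the exact Maven profiles are from memory):

./dev/make-distribution.sh --tgz -Phadoop-2.6 -Pyarn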

Hadoop jobs still work after the upgrade to Java 8. When I switch back to Java 7, Spark works fine.

My Scala version is 2.11 and the OS is Ubuntu 14.04.4 LTS.

It would be great if someone could give me an idea of how to solve this problem.

Thanks!

P.S. I have replaced my IP addresses with xxx.xxx.xxx.xx in the logs.


Looks like the workers try to establish a connection to the driver but fail: '16/08/17 14:05:12 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@xxx.xxx.xxx.xx:37417 16/08/17 14:05:13 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM'. What does the driver log say? –


Where do I find the driver log? I found the worker node logs in the hadoop/logs/userlog directory, but I cannot find any logs relating to the driver on the master node. The spark/logs directory only contains the history server logs, and hadoop/logs/userlog on the master node is empty. Thanks! – jmoa


http://spark.apache.org/docs/latest/running-on-yarn.html –

Answer


As of September 12, 2016, this is tracked as a blocker issue: https://issues.apache.org/jira/browse/YARN-4714

You can work around it by setting the following properties in yarn-site.xml, which disables YARN's physical and virtual memory checks for containers:

<property> 
    <name>yarn.nodemanager.pmem-check-enabled</name> 
    <value>false</value> 
</property> 

<property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>false</value> 
</property> 
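
Restart the NodeManagers after changing yarn-site.xml so the settings take effect. If you would rather keep the checks enabled, a less drastic alternative (a sketch, not something I have verified on this cluster) is to raise the virtual-to-physical memory ratio instead, since Java 8 reserves considerably more virtual memory per container:

<property> 
    <name>yarn.nodemanager.vmem-pmem-ratio</name> 
    <value>4</value> 
</property> 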

Thanks for the answer! I recently went back to Java 7, but I will try it and comment if it works. – jmoa


@jmoa Any luck? – simpleJack


This works perfectly for me. –