
What does this error message mean when using IPython with Spark?

I have successfully installed Spark 1.6 and Anaconda2. When I try to use IPython, I run into the following problem:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob. 

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
java.io.IOException: Cannot run program "/root/anaconda2/bin": error=13,Permission denied 
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) 
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:161) 
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:87) 
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:63) 
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:134) 
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101) 
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70) 
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
at org.apache.spark.scheduler.Task.run(Task.scala:89) 
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 

Caused by: java.io.IOException: error=13, Permission denied 
at java.lang.UNIXProcess.forkAndExec(Native Method) 
at java.lang.UNIXProcess.<init>(UNIXProcess.java:186) 
at java.lang.ProcessImpl.start(ProcessImpl.java:130) 
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) 
... 14 more 

Driver stacktrace: 
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) 
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) 
at scala.Option.foreach(Option.scala:236) 
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) 
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640) 
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) 
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588) 
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) 
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) 
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845) 
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858) 
at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:393) 
at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) 
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) 
at py4j.Gateway.invoke(Gateway.java:259) 
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) 
at py4j.commands.CallCommand.execute(CallCommand.java:79) 
at py4j.GatewayConnection.run(GatewayConnection.java:209) 
at java.lang.Thread.run(Thread.java:745) 

Caused by: java.io.IOException: Cannot run program "/root/anaconda2/bin": error=13, Permission denied 
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) 
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:161) 
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:87) 
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:63) 
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:134) 
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101) 
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70) 
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
at org.apache.spark.scheduler.Task.run(Task.scala:89) 
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
... 1 more 
Caused by: java.io.IOException: error=13, Permission denied 
at java.lang.UNIXProcess.forkAndExec(Native Method) 
at java.lang.UNIXProcess.<init>(UNIXProcess.java:186) 
at java.lang.ProcessImpl.start(ProcessImpl.java:130) 
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) 
... 14 more 

The IPython code I am using is shown below; the error message appears when I run the last line.

from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel 

Load and parse the data:

def parsePoint(line): 
    values = [float(x) for x in line.replace(',', ' ').split(' ')] 
    return LabeledPoint(values[0], values[1:]) 

data = sc.textFile("data/mllib/ridge-data/lpsa.data") 
parsedData = data.map(parsePoint) 

Build the model (the error occurs here):

model = LinearRegressionWithSGD.train(parsedData, iterations=100, step=0.00000001) 

Please give us more information. Which system are you using (Windows, Linux, or OSX)? What is the Anaconda installation directory? If Linux, you probably used sudo when installing Anaconda (sudo bash Anaconda.xx.sh), which is why it now requires root permission. – ashwinids


The system I am using is Linux CentOS. I installed Anaconda in the root user's desktop, and I installed it as the root user. I am also running this command as the root user. – Chauncey


How did you run the code above? From a notebook, from the pyspark shell, or via spark-submit? What is your 'PYSPARK_PYTHON'? – ShuaiYuan
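
For context, PYSPARK_PYTHON is the environment variable Spark uses to launch its Python worker processes; the path "/root/anaconda2/bin" in the stack trace above suggests it was set to Anaconda's bin directory rather than to the python executable inside it. A minimal sketch for checking this from the shell the job is launched from:

echo "$PYSPARK_PYTHON"    # should print the full path of a python executable, e.g. .../bin/python
ls -l "$PYSPARK_PYTHON"   # should be an executable file, not a directory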

Answers


I suggest you first remove the Anaconda distribution:

sudo rm -r anaconda_installation_path 

Then install it without sudo:

sh Anaconda.xx.sh 

For more information, see page.
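
After reinstalling without sudo, Spark also has to be pointed at a Python interpreter that the worker processes can execute. A minimal sketch, assuming Anaconda was reinstalled under the current user's home directory as ~/anaconda2 (adjust the path to your installation); note that PYSPARK_PYTHON must name the python binary itself, not the bin/ directory as in the stack trace above:

export PYSPARK_PYTHON=~/anaconda2/bin/python   # interpreter used by Spark's Python workers
export PYSPARK_DRIVER_PYTHON=ipython           # optional: run the driver shell under IPython
pyspark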


Reinstalling Anaconda and putting it under the /opt/anaconda directory solved my problem.
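
This likely works because /opt is world-readable, while a directory under /root is normally not accessible to other users. A quick sketch for verifying that the relocated interpreter is usable before starting pyspark (the /opt/anaconda path is taken from the answer above; adjust it to your own installation):

ls -ld /opt/anaconda /opt/anaconda/bin/python   # directory and binary should be readable/executable by the Spark user
/opt/anaconda/bin/python --version              # confirm the interpreter itself runs
export PYSPARK_PYTHON=/opt/anaconda/bin/python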
