Can't run Spark jobs locally with sbt, but it works in IntelliJ

I have written a few simple Spark jobs and some tests for them. I did everything in IntelliJ and it works great. Now I want to make sure my code also builds with sbt. Compiling is fine, but I get strange errors when running and testing.
I am using Scala version 2.11.8 and sbt version 0.13.8. My build.sbt file looks like this:
name := "test"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "javax.mail" % "javax.mail-api" % "1.5.6"
libraryDependencies += "com.sun.mail" % "javax.mail" % "1.5.6"
libraryDependencies += "commons-cli" % "commons-cli" % "1.3.1"
libraryDependencies += "org.scalatest" % "scalatest_2.11" % "3.0.0" % "test"
libraryDependencies += "com.holdenkarau" % "spark-testing-base_2.11" % "2.0.0_0.4.4" % "test" intransitive()
I am trying to run my code with sbt "run-main com.test.email.processor.bin.Runner". Here is the output of that run:
[info] Loading project definition from /Users/max/workplace/test/project
[info] Set current project to test (in build file:/Users/max/workplace/test/)
[info] Running com.test.email.processor.bin.Runner -j recipientCount -e /Users/max/workplace/data/test/enron_with_categories/*/*.txt
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/23 18:46:55 INFO SparkContext: Running Spark version 2.0.0
16/08/23 18:46:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/23 18:46:55 INFO SecurityManager: Changing view acls to: max
16/08/23 18:46:55 INFO SecurityManager: Changing modify acls to: max
16/08/23 18:46:55 INFO SecurityManager: Changing view acls groups to:
16/08/23 18:46:55 INFO SecurityManager: Changing modify acls groups to:
16/08/23 18:46:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(max); groups with view permissions: Set(); users with modify permissions: Set(max); groups with modify permissions: Set()
16/08/23 18:46:56 INFO Utils: Successfully started service 'sparkDriver' on port 61759.
16/08/23 18:46:56 INFO SparkEnv: Registering MapOutputTracker
16/08/23 18:46:56 INFO SparkEnv: Registering BlockManagerMaster
16/08/23 18:46:56 INFO DiskBlockManager: Created local directory at /private/var/folders/75/4dydy_6110v0gjv7bg265_g40000gn/T/blockmgr-9eb526c0-b7e5-444a-b186-d7f248c5dc62
16/08/23 18:46:56 INFO MemoryStore: MemoryStore started with capacity 408.9 MB
16/08/23 18:46:56 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/23 18:46:56 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/23 18:46:56 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.11:4040
16/08/23 18:46:56 INFO Executor: Starting executor ID driver on host localhost
16/08/23 18:46:57 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61760.
16/08/23 18:46:57 INFO NettyBlockTransferService: Server created on 192.168.1.11:61760
16/08/23 18:46:57 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.11, 61760)
16/08/23 18:46:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:61760 with 408.9 MB RAM, BlockManagerId(driver, 192.168.1.11, 61760)
16/08/23 18:46:57 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.11, 61760)
16/08/23 18:46:57 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 128.0 KB, free 408.8 MB)
16/08/23 18:46:57 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 14.6 KB, free 408.8 MB)
16/08/23 18:46:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.11:61760 (size: 14.6 KB, free: 408.9 MB)
16/08/23 18:46:57 INFO SparkContext: Created broadcast 0 from wholeTextFiles at RecipientCountJob.scala:22
16/08/23 18:46:58 WARN ClosureCleaner: Expected a closure; got com.test.email.processor.util.cleanEmail$
16/08/23 18:46:58 INFO FileInputFormat: Total input paths to process : 1702
16/08/23 18:46:58 INFO FileInputFormat: Total input paths to process : 1702
16/08/23 18:46:58 INFO CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 0
16/08/23 18:46:58 INFO SparkContext: Starting job: take at RecipientCountJob.scala:35
16/08/23 18:46:58 WARN DAGScheduler: Creating new stage failed due to exception - job: 0
java.lang.ClassNotFoundException: scala.Function0
at sbt.classpath.ClasspathFilter.loadClass(ClassLoaders.scala:63)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at com.twitter.chill.KryoBase$$anonfun$1.apply(KryoBase.scala:41)
at com.twitter.chill.KryoBase$$anonfun$1.apply(KryoBase.scala:41)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.immutable.Range.foreach(Range.scala:166)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.twitter.chill.KryoBase.<init>(KryoBase.scala:41)
at com.twitter.chill.EmptyScalaKryoInstantiator.newKryo(ScalaKryoInstantiator.scala:57)
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:86)
at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:274)
at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:259)
at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:175)
at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects$lzycompute(KryoSerializer.scala:182)
at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects(KryoSerializer.scala:178)
at org.apache.spark.shuffle.sort.SortShuffleManager$.canUseSerializedShuffle(SortShuffleManager.scala:187)
at org.apache.spark.shuffle.sort.SortShuffleManager.registerShuffle(SortShuffleManager.scala:99)
at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:90)
at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:91)
at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:235)
at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:233)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.dependencies(RDD.scala:233)
at org.apache.spark.scheduler.DAGScheduler.visit$2(DAGScheduler.scala:418)
at org.apache.spark.scheduler.DAGScheduler.getAncestorShuffleDependencies(DAGScheduler.scala:433)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getShuffleMapStage(DAGScheduler.scala:288)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$visit$1$1.apply(DAGScheduler.scala:394)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$visit$1$1.apply(DAGScheduler.scala:391)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:391)
at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:403)
at org.apache.spark.scheduler.DAGScheduler.getParentStagesAndId(DAGScheduler.scala:304)
at org.apache.spark.scheduler.DAGScheduler.newResultStage(DAGScheduler.scala:339)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:849)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1626)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
16/08/23 18:46:58 INFO DAGScheduler: Job 0 failed: take at RecipientCountJob.scala:35, took 0.076653 s
[error] (run-main-0) java.lang.ClassNotFoundException: scala.Function0
java.lang.ClassNotFoundException: scala.Function0
[trace] Stack trace suppressed: run last compile:runMain for the full output.
16/08/23 18:46:58 ERROR ContextCleaner: Error in cleaning thread
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:175)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1229)
at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:172)
at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:67)
16/08/23 18:46:58 ERROR Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:67)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:66)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:66)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:65)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1229)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:64)
java.lang.RuntimeException: Nonzero exit code: 1
Do you have Scala 2.11 installed? –
I have it installed, but how can I let sbt know where it is? – Max
As long as SCALA_HOME is set, you're good –
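One detail worth noting: the stack trace fails inside sbt.classpath.ClasspathFilter, which suggests the problem is sbt's in-process class loader hiding the Scala library from Spark's Kryo serializer, rather than a missing Scala installation. A common workaround is to fork the run into its own JVM; a minimal sketch of the relevant build.sbt additions (the heap size is an assumption, adjust as needed):

// Run the main class in a separate JVM so Spark does not inherit
// sbt's filtering class loader (which cannot see scala.Function0)
fork in run := true

// Optional: heap size for the forked driver (value is an assumption)
javaOptions in run += "-Xmx2G"

// Forward stdin/stdout so the forked process behaves like an in-process run
connectInput in run := true
outputStrategy := Some(StdoutOutput)

With fork in run := true, sbt "run-main ..." launches the job in a fresh JVM with a plain application class loader, which is much closer to how IntelliJ runs it. –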