
I just built a Spark 2.0 standalone single-node cluster on Ubuntu 14 and am trying to submit a PySpark job. How do I correctly submit Spark jobs to a standalone cluster?

~/spark/spark-2.0.0$ bin/spark-submit --driver-memory 1024m --executor-memory 1024m --executor-cores 1 --master spark://ip-10-180-191-14:7077 examples/src/main/python/pi.py 

Spark gives me this message:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 

Here is the complete output:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
16/07/27 17:45:18 INFO SparkContext: Running Spark version 2.0.0 
16/07/27 17:45:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/07/27 17:45:18 INFO SecurityManager: Changing view acls to: ubuntu 
16/07/27 17:45:18 INFO SecurityManager: Changing modify acls to: ubuntu 
16/07/27 17:45:18 INFO SecurityManager: Changing view acls groups to: 
16/07/27 17:45:18 INFO SecurityManager: Changing modify acls groups to: 
16/07/27 17:45:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set() 
16/07/27 17:45:19 INFO Utils: Successfully started service 'sparkDriver' on port 36842. 
16/07/27 17:45:19 INFO SparkEnv: Registering MapOutputTracker 
16/07/27 17:45:19 INFO SparkEnv: Registering BlockManagerMaster 
16/07/27 17:45:19 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e25f3ae9-be1f-4ea3-8f8b-b3ff3ec7e978 
16/07/27 17:45:19 INFO MemoryStore: MemoryStore started with capacity 366.3 MB 
16/07/27 17:45:19 INFO SparkEnv: Registering OutputCommitCoordinator 
16/07/27 17:45:19 INFO log: Logging initialized @1986ms 
16/07/27 17:45:19 INFO Server: jetty-9.2.16.v20160414 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/jobs,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/jobs/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/jobs/job,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/jobs/job/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/stages,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/stages/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/stages/stage,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/stages/stage/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/stages/pool,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/stages/pool/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/storage,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/storage/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/storage/rdd,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/storage/rdd/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/environment,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/environment/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/executors,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/executors/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/executors/threadDump,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/executors/threadDump/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/static,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/api,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/stages/stage/kill,null,AVAILABLE} 
16/07/27 17:45:19 INFO ServerConnector: Started ServerConnector@...{HTTP/1.1}{0.0.0.0:4040} 
16/07/27 17:45:19 INFO Server: Started @2150ms 
16/07/27 17:45:19 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
16/07/27 17:45:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.180.191.14:4040 
16/07/27 17:45:19 INFO Utils: Copying /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py to /tmp/spark-ee1ceb06-a7c4-4b18-8577-adb02f97f31e/userFiles-565d5e0b-5879-40d3-8077-d9d782156818/pi.py 
16/07/27 17:45:19 INFO SparkContext: Added file file:/home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py at spark://10.180.191.14:36842/files/pi.py with timestamp 1469641519759 
16/07/27 17:45:19 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://ip-10-180-191-14:7077... 
16/07/27 17:45:19 INFO TransportClientFactory: Successfully created connection to ip-10-180-191-14/10.180.191.14:7077 after 25 ms (0 ms spent in bootstraps) 
16/07/27 17:45:20 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20160727174520-0006 
16/07/27 17:45:20 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39047. 
16/07/27 17:45:20 INFO NettyBlockTransferService: Server created on 10.180.191.14:39047 
16/07/27 17:45:20 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.180.191.14, 39047) 
16/07/27 17:45:20 INFO BlockManagerMasterEndpoint: Registering block manager 10.180.191.14:39047 with 366.3 MB RAM, BlockManagerId(driver, 10.180.191.14, 39047) 
16/07/27 17:45:20 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.180.191.14, 39047) 
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/metrics/json,null,AVAILABLE} 
16/07/27 17:45:20 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/SQL,null,AVAILABLE} 
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/SQL/json,null,AVAILABLE} 
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/SQL/execution,null,AVAILABLE} 
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/SQL/execution/json,null,AVAILABLE} 
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@...{/static/sql,null,AVAILABLE} 
16/07/27 17:45:20 INFO SharedState: Warehouse path is 'file:/home/ubuntu/spark/spark-2.0.0/spark-warehouse'. 
16/07/27 17:45:20 INFO SparkContext: Starting job: reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43 
16/07/27 17:45:20 INFO DAGScheduler: Got job 0 (reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43) with 2 output partitions 
16/07/27 17:45:20 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43) 
16/07/27 17:45:20 INFO DAGScheduler: Parents of final stage: List() 
16/07/27 17:45:20 INFO DAGScheduler: Missing parents: List() 
16/07/27 17:45:20 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43), which has no missing parents 
16/07/27 17:45:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.6 KB, free 366.3 MB) 
16/07/27 17:45:21 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.0 KB, free 366.3 MB) 
16/07/27 17:45:21 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.180.191.14:39047 (size: 3.0 KB, free: 366.3 MB) 
16/07/27 17:45:21 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012 
16/07/27 17:45:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43) 
16/07/27 17:45:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 
16/07/27 17:45:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
16/07/27 17:45:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 

I am not running Spark on top of Hadoop or YARN, just standalone. What can I do to get these jobs processed by Spark?

Answers


Try setting the master to local like this, in order to run in local mode:

~/spark/spark-2.0.0$ bin/spark-submit --driver-memory 1024m --executor-memory 1024m --executor-cores 1 --master local[2] examples/src/main/python/pi.py 

You may also need to use the --py-files option to ship additional Python files with your job (see the Spark submit options documentation).
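As a hedged illustration (deps.zip and mymodule are hypothetical placeholders, not part of the original example), shipping extra Python code would look roughly like this:

# package the hypothetical module your job imports
~/spark/spark-2.0.0$ zip -r deps.zip mymodule/
# run in local mode and make the packaged code importable on the executors
~/spark/spark-2.0.0$ bin/spark-submit --master local[2] --py-files deps.zip examples/src/main/python/pi.py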


If you set the master to local as described above, your program runs only in local mode, which is fine for beginners or small workloads on a single machine, but it is not configured to run on a cluster. To run your program on a real cluster (possibly spanning several machines), you need to start a master and slaves using the scripts that ship with Spark:

<spark-install-dir>/sbin/start-master.sh

Your slaves (you need at least one) should be started with:

<spark-install-dir>/sbin/start-slave.sh spark://<master-address>:7077
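On a single machine, a minimal sketch of the whole sequence might look like this (assuming the scripts live under sbin/ in your Spark directory and using ip-10-180-191-14 as the master host, as in your submit command):

# start the standalone master, then one worker that registers with it
~/spark/spark-2.0.0$ sbin/start-master.sh
~/spark/spark-2.0.0$ sbin/start-slave.sh spark://ip-10-180-191-14:7077
# now resubmit the job against the cluster
~/spark/spark-2.0.0$ bin/spark-submit --master spark://ip-10-180-191-14:7077 examples/src/main/python/pi.py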

This way you will be able to run in real cluster mode, and the UI will show you your workers, jobs, and so on. The master UI is served on port 8080 of the master machine. Port 4040 on the machine where the driver runs shows the application UI. Port 8081 shows the worker UI (if you run several slaves on the same machine, the ports are 8081 for the first, 8082 for the second, and so on).
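To confirm that a worker actually registered (which is exactly what the "Initial job has not accepted any resources" warning complains about), open the master UI in a browser, or query its JSON endpoint; the standalone master UI exposes a /json view listing registered workers and their free cores/memory, though you may want to verify this on your Spark version:

# should list your worker(s) and their cores/memory under "workers"
curl http://ip-10-180-191-14:8080/json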

You can run as many slaves as you want, on as many machines as you want, and specify the number of cores and memory for each slave (it is possible to start several slaves on the same machine; just give them an appropriate share of cores/RAM so you don't confuse the scheduler), as sketched below.
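A hedged sketch of that setup: the --cores/--memory options and the SPARK_WORKER_INSTANCES variable are documented for the standalone worker scripts, but check sbin/start-slave.sh and the standalone-mode docs on your version before relying on them.

# two worker instances on this machine, each offering 1 core and 1 GB
~/spark/spark-2.0.0$ SPARK_WORKER_INSTANCES=2 sbin/start-slave.sh spark://ip-10-180-191-14:7077 --cores 1 --memory 1g
# submit against the cluster, capping the app at 2 cores in total
~/spark/spark-2.0.0$ bin/spark-submit --master spark://ip-10-180-191-14:7077 --executor-memory 512m --total-executor-cores 2 examples/src/main/python/pi.py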
