2017-04-25

Hive tables not found in Spark SQL - spark.sql.AnalysisException in Cloudera VM

I am trying to access Hive tables from a Java program, but it looks like my program does not see any tables in the default database. However, I can see the same tables and query them through the spark-shell. I copied hive-site.xml into the Spark conf directory. The only difference: the spark-shell runs Spark version 1.6.0, whereas my Java program runs Spark 2.1.0.

package spark_210_test; 

import java.util.List; 
import org.apache.spark.SparkConf; 
import org.apache.spark.sql.Dataset; 
import org.apache.spark.sql.Row; 
import org.apache.spark.sql.SparkSession; 

public class SparkTest { 

private static SparkConf sparkConf; 
private static SparkSession sparkSession; 

public static void main(String[] args) { 
    String warehouseLocation = "hdfs://quickstart.cloudera/user/hive/warehouse/"; 
    sparkConf = new SparkConf().setAppName("Hive Test").setMaster("local[*]") 
      .set("spark.sql.warehouse.dir", warehouseLocation); 

    sparkSession = SparkSession 
     .builder() 
     .config(sparkConf) 
     .enableHiveSupport() 
     .getOrCreate(); 

    Dataset<Row> df0 = sparkSession.sql("show tables"); 
    List<Row> currentTablesList = df0.collectAsList(); 
    if (currentTablesList.size() > 0) { 
     for (int i=0; i< currentTablesList.size(); i++) { 
      String table = currentTablesList.get(i).getAs("tableName"); // in Spark 2.x the SHOW TABLES result column is "tableName", not "name"
      System.out.printf("%s, ", table); 
     } 
    } 
    else System.out.printf("No Table found for %s.\n", warehouseLocation); 

    Dataset<Row> dfCount = sparkSession.sql("select count(*) from sample_07"); 
    System.out.println(dfCount.collect().toString()); 
} 
} 

The output does not seem to read anything from the Hive warehouse: Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: sample_07; line 1 pos 21

The entire output is given below:
SLF4J: Class path contains multiple SLF4J bindings. 
    SLF4J: Found binding in [jar:file:/home/cloudera/workspace/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
    SLF4J: Found binding in [jar:file:/home/cloudera/workspace/PortalHandlerTest.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
    SLF4J: Found binding in [jar:file:/home/cloudera/workspace/SparkTest.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
    SLF4J: Found binding in [jar:file:/home/cloudera/workspace/JARs/slf4j-log4j12-1.7.22.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
    SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell). 
log4j:WARN Please initialize the log4j system properly. 
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
17/04/25 12:01:51 INFO SparkContext: Running Spark version 2.1.0 
17/04/25 12:01:51 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0 
17/04/25 12:01:51 WARN SparkContext: Support for Scala 2.10 is deprecated as of Spark 2.1.0 
17/04/25 12:01:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
17/04/25 12:01:52 INFO SecurityManager: Changing view acls to: cloudera 
17/04/25 12:01:52 INFO SecurityManager: Changing modify acls to: cloudera 
17/04/25 12:01:52 INFO SecurityManager: Changing view acls groups to: 
17/04/25 12:01:52 INFO SecurityManager: Changing modify acls groups to: 
17/04/25 12:01:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); groups with view permissions: Set(); users with modify permissions: Set(cloudera); groups with modify permissions: Set() 
17/04/25 12:01:53 INFO Utils: Successfully started service 'sparkDriver' on port 50644. 
17/04/25 12:01:53 INFO SparkEnv: Registering MapOutputTracker 
17/04/25 12:01:53 INFO SparkEnv: Registering BlockManagerMaster 
17/04/25 12:01:53 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 
17/04/25 12:01:53 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 
17/04/25 12:01:53 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f44e093c-d9a9-42ad-8f5f-9e21b99f0e45 
17/04/25 12:01:53 INFO MemoryStore: MemoryStore started with capacity 375.7 MB 
17/04/25 12:01:53 INFO SparkEnv: Registering OutputCommitCoordinator 
17/04/25 12:01:54 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 
17/04/25 12:01:54 INFO Utils: Successfully started service 'SparkUI' on port 4041. 
17/04/25 12:01:54 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4041 
17/04/25 12:01:54 INFO Executor: Starting executor ID driver on host localhost 
17/04/25 12:01:54 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43409. 
17/04/25 12:01:54 INFO NettyBlockTransferService: Server created on 10.0.2.15:43409 
17/04/25 12:01:54 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 
17/04/25 12:01:54 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 43409, None) 
17/04/25 12:01:54 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:43409 with 375.7 MB RAM, BlockManagerId(driver, 10.0.2.15, 43409, None) 
17/04/25 12:01:54 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 43409, None) 
17/04/25 12:01:54 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 43409, None) 
17/04/25 12:01:54 INFO SharedState: Warehouse path is 'hdfs://quickstart.cloudera/user/hive/warehouse/'. 
17/04/25 12:01:54 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes. 
17/04/25 12:01:55 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 
17/04/25 12:01:55 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 
17/04/25 12:01:55 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 
17/04/25 12:01:55 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 
17/04/25 12:01:55 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive 
17/04/25 12:01:55 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 
17/04/25 12:01:55 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 
17/04/25 12:01:55 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed 
17/04/25 12:01:57 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 
17/04/25 12:01:57 INFO ObjectStore: ObjectStore, initialize called 
17/04/25 12:01:57 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored 
17/04/25 12:01:57 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 
17/04/25 12:02:01 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 
17/04/25 12:02:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 
17/04/25 12:02:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 
17/04/25 12:02:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 
17/04/25 12:02:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 
17/04/25 12:02:05 INFO Query: Reading in results for query "[email protected]" since the connection used is closing 
17/04/25 12:02:05 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY 
17/04/25 12:02:05 INFO ObjectStore: Initialized ObjectStore 
17/04/25 12:02:05 INFO HiveMetaStore: Added admin role in metastore 
17/04/25 12:02:05 INFO HiveMetaStore: Added public role in metastore 
17/04/25 12:02:05 INFO HiveMetaStore: No user is added in admin role, since config is empty 
17/04/25 12:02:06 INFO HiveMetaStore: 0: get_all_databases 
17/04/25 12:02:06 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_all_databases 
17/04/25 12:02:06 INFO HiveMetaStore: 0: get_functions: db=default pat=* 
17/04/25 12:02:06 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_functions: db=default pat=* 
17/04/25 12:02:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table. 
17/04/25 12:02:07 INFO SessionState: Created local directory: /tmp/135d2e8d-2300-4f62-b445-ec6e8b0461a7_resources 
17/04/25 12:02:07 INFO SessionState: Created HDFS directory: /tmp/hive/cloudera/135d2e8d-2300-4f62-b445-ec6e8b0461a7 
17/04/25 12:02:07 INFO SessionState: Created local directory: /tmp/cloudera/135d2e8d-2300-4f62-b445-ec6e8b0461a7 
17/04/25 12:02:07 INFO SessionState: Created HDFS directory: /tmp/hive/cloudera/135d2e8d-2300-4f62-b445-ec6e8b0461a7/_tmp_space.db 
17/04/25 12:02:07 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is hdfs://quickstart.cloudera/user/hive/warehouse/ 
17/04/25 12:02:07 INFO HiveMetaStore: 0: get_database: default 
17/04/25 12:02:07 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_database: default 
17/04/25 12:02:07 INFO HiveMetaStore: 0: get_database: global_temp 
17/04/25 12:02:07 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_database: global_temp 
17/04/25 12:02:07 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException 
17/04/25 12:02:08 INFO SparkSqlParser: Parsing command: show tables 
17/04/25 12:02:12 INFO HiveMetaStore: 0: get_database: default 
17/04/25 12:02:12 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_database: default 
17/04/25 12:02:12 INFO HiveMetaStore: 0: get_database: default 
17/04/25 12:02:12 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_database: default 
17/04/25 12:02:12 INFO HiveMetaStore: 0: get_tables: db=default pat=* 
17/04/25 12:02:12 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_tables: db=default pat=*  
No Table found for hdfs://quickstart.cloudera/user/hive/warehouse/. 
17/04/25 12:02:13 INFO SparkSqlParser: Parsing command: select count(*) from sample_07 
17/04/25 12:02:13 INFO HiveMetaStore: 0: get_table : db=default tbl=sample_07 
17/04/25 12:02:13 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_table : db=default tbl=sample_07  
17/04/25 12:02:13 INFO HiveMetaStore: 0: get_table : db=default tbl=sample_07 
17/04/25 12:02:13 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_table : db=default tbl=sample_07  
Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: sample_07; line 1 pos 21 
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) 
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:459) 
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:478) 
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:463) 
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61) 
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61) 
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) 
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:60) 
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:58) 
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:58) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329) 
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:58) 
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:463) 
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:453) 
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85) 
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82) 
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111) 
    at scala.collection.immutable.List.foldLeft(List.scala:84) 
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82) 
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74) 
    at scala.collection.immutable.List.foreach(List.scala:318) 
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74) 
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:64) 
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:62) 
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50) 
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63) 
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592) 
    at spark_210_test.SparkTest.main(SparkTest.java:35) 
17/04/25 12:02:13 INFO SparkContext: Invoking stop() from shutdown hook 
17/04/25 12:02:13 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4041 
17/04/25 12:02:13 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 
17/04/25 12:02:13 INFO MemoryStore: MemoryStore cleared 
17/04/25 12:02:13 INFO BlockManager: BlockManager stopped 
17/04/25 12:02:13 INFO BlockManagerMaster: BlockManagerMaster stopped 
17/04/25 12:02:13 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
17/04/25 12:02:13 INFO SparkContext: Successfully stopped SparkContext 
17/04/25 12:02:13 INFO ShutdownHookManager: Shutdown hook called 
17/04/25 12:02:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-7c1cfc73-34b9-463d-b12a-5cbcb832b0f8 

Just in case it helps, my pom.xml is below:

<project xmlns="http://maven.apache.org/POM/4.0.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd"> 
    <modelVersion>4.0.0</modelVersion> 
    <groupId>spark_test_210</groupId> 
    <artifactId>spark_test_210</artifactId> 
    <version>0.0.1-SNAPSHOT</version> 
    <dependencies> 
    <dependency> 
    <groupId>org.apache.spark</groupId> 
    <artifactId>spark-hive_2.10</artifactId> 
    <version>2.1.0</version> 
    </dependency></dependencies> 
    <build> 
    <sourceDirectory>src</sourceDirectory> 
    </build> 
</project> 

Any help would be appreciated.


Please provide your spark-submit command. Are you deploying in cluster mode? –


I build a jar from this project and run it from the command line. The ultimate goal would be to call a similar program from other programs. 'jar -cp "/usr/lib/hadoop/lib/*:/usr/lib/spark/lib/*" spark_210_test.SparkTest' This is on a Cloudera VM. – Joydeep


Do you have hive-site.xml in your Spark conf dir? Otherwise the job cannot know where to get the metastore information from. –

Answer


Several steps were needed.

  1. Use SparkSession.enableHiveSupport() instead of the deprecated SQLContext or HiveContext.
  2. Copy hive-site.xml into the Spark conf directory (/usr/lib/spark/conf).
  3. Add the same directory to the classpath when executing the jar (thanks to Paul and Samson above); a minimal sketch follows this list.
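
For reference, here is a minimal sketch of that session setup. The metastore URI (thrift://quickstart.cloudera:9083) and the class name HiveSmokeTest are assumptions for illustration; adjust the URI and warehouse path to match your hive-site.xml, and omit the hive.metastore.uris setting entirely if hive-site.xml is on the classpath.

import org.apache.spark.sql.SparkSession;

public class HiveSmokeTest {

    public static void main(String[] args) {
        // Hypothetical values: adjust both to match your cluster / hive-site.xml.
        String warehouseLocation = "hdfs://quickstart.cloudera/user/hive/warehouse";
        String metastoreUri = "thrift://quickstart.cloudera:9083";

        SparkSession spark = SparkSession.builder()
            .appName("Hive Smoke Test")
            .master("local[*]")
            .config("spark.sql.warehouse.dir", warehouseLocation)
            // Only needed if hive-site.xml is not on the driver classpath;
            // with hive-site.xml present, the metastore URI is read from there.
            .config("hive.metastore.uris", metastoreUri)
            .enableHiveSupport()
            .getOrCreate();

        // If the metastore connection is correct, the Hive tables show up here.
        spark.sql("show tables").show();

        spark.stop();
    }
}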

Follow the steps above as stated by Joydeep. In addition, specify the database name along with the name of the table or view, e.g. employee.emp.

import java.io.File; 

import org.apache.spark.sql.SparkSession; 

/** 
* @author dinesh.lomte 
* 
*/ 
public class SparkHiveExample { 

    public static void main(String... args) { 
     String warehouseLocation = new File("spark-employee").getAbsolutePath(); 
     SparkSession spark = SparkSession.builder().appName("Java Spark Hive Example").master("local")
       .config("spark.sql.warehouse.dir", warehouseLocation) // the config key is spark.sql.warehouse.dir
       .enableHiveSupport().getOrCreate(); 

     spark.sql("SELECT * FROM employee.emp").show(); 
    } 
} 
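
If qualifying every table name is inconvenient, switching the current database once should work as well; this is just a sketch assuming a database named employee exists:

spark.sql("USE employee");
spark.sql("SELECT * FROM emp").show();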