
I have three tables in MySQL named movie, moviegenre, and genre. When I try to import them into HDFS with the following Sqoop free-form query, the import fails with a duplicate name error:

sqoop import --connect jdbc:mysql://localhost/movielens --username user --password *** 
--query 'select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name 
from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid 
WHERE $CONDITIONS' --split-by m.id --target-dir /user/sqoop/moviegenre 

it throws this error:

Imported Failed: Duplicate Column identifier specified: 'name'.

When I run the same query directly in MySQL, it gives me exactly the output I want:

id  movie_name  year  genre_name
1   Toy Story   1995  Animation
2   Jumanji     1995  Adventure
..  .........   ....  .........
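
For reference, this is how I check it in the mysql client: the same join, with the Sqoop placeholder replaced by a trivially true condition (the 1 = 1 below is only for manual testing; Sqoop substitutes $CONDITIONS itself at run time):

mysql> select m.id as id, m.name as movie_name, m.year as year, g.name as genre_name
    -> from movie m
    -> join moviegenre mg on m.id = mg.movieid
    -> join genre g on g.id = mg.genreid
    -> where 1 = 1;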

I followed this link and did exactly what the answer there says: Imported Failed: Duplicate Column identifier specified (sqoop), but that does not seem to help either.

The fields in the tables are as follows:

movie = id, name, year

genre = id, name

moviegenre = movieid, genreid

Please point out the error in my query.

Answer

There is no error in the Sqoop command. I just created the tables in my Cloudera QuickStart VM and ran the Sqoop import; it worked fine and produced the results. You probably ran the command before adding the aliases. The other difference is that I formatted the command.
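
For what it's worth, my reading of that error: Sqoop generates a Java record class (the QueryResult.java you can see in the log below) with one field per result column, so if the query is run without the aliases, both m.name and g.name come back as a column called name and collide, which is exactly the "Duplicate Column identifier specified: 'name'" message. A minimal sketch of the difference:

-- fails: two result columns are both named "name"
select m.id, m.name, m.year, g.name
from movie m
join moviegenre mg on m.id = mg.movieid
join genre g on g.id = mg.genreid
WHERE $CONDITIONS

-- works: every column has a unique alias, as in your command
select m.id as id, m.name as movie_name, m.year as year, g.name as genre_name
from movie m
join moviegenre mg on m.id = mg.movieid
join genre g on g.id = mg.genreid
WHERE $CONDITIONS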

In case you need it, below is what I ran.

MySQL commands

mysql> use retail_db; 
mysql> create table movie (id integer, name varchar(100), year integer); 
mysql> insert into movie values (1, 'TEST', 2016); 
mysql> create table genre (id integer, name varchar(100)); 
mysql> insert into genre values (1, 'TEST'); 
mysql> create table moviegenre (movieid integer, genreid integer); 
mysql> insert into moviegenre values (1, 1); 

Sqoop command

sqoop import \ 
--connect jdbc:mysql://localhost/retail_db \ 
--username root \ 
--password cloudera \ 
--query 'select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE $CONDITIONS' --split-by m.id --target-dir /user/cloudera/moviegenre 
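
Side note: as the log below also warns, putting --password on the command line is insecure. A variant that prompts for the password interactively (only the -P flag changes, everything else stays the same) would be:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
-P \
--query 'select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE $CONDITIONS' --split-by m.id --target-dir /user/cloudera/moviegenre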

Sqoop standard output

[cloudera@quickstart ~]$ sqoop import \ 
> --connect jdbc:mysql://localhost/retail_db \ 
> --username root \ 
> --password cloudera \ 
> --query 'select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE $CONDITIONS' --split-by m.id --target-dir /user/cloudera/moviegenre 
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail. 
Please set $ACCUMULO_HOME to the root of your Accumulo installation. 
17/11/19 13:08:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.0 
17/11/19 13:08:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 
17/11/19 13:08:01 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 
17/11/19 13:08:01 INFO tool.CodeGenTool: Beginning code generation 
17/11/19 13:08:02 INFO manager.SqlManager: Executing SQL statement: select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE (1 = 0) 
17/11/19 13:08:02 INFO manager.SqlManager: Executing SQL statement: select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE (1 = 0) 
17/11/19 13:08:02 INFO manager.SqlManager: Executing SQL statement: select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE (1 = 0) 
17/11/19 13:08:02 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce 
Note: /tmp/sqoop-cloudera/compile/3b35f51458e53da94c6852dcfc0b904a/QueryResult.java uses or overrides a deprecated API. 
Note: Recompile with -Xlint:deprecation for details. 
17/11/19 13:08:04 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/3b35f51458e53da94c6852dcfc0b904a/QueryResult.jar 
17/11/19 13:08:04 INFO mapreduce.ImportJobBase: Beginning query import. 
17/11/19 13:08:04 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 
17/11/19 13:08:04 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 
17/11/19 13:08:05 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 
17/11/19 13:08:05 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
17/11/19 13:08:08 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException 
    at java.lang.Object.wait(Native Method) 
    at java.lang.Thread.join(Thread.java:1281) 
    at java.lang.Thread.join(Thread.java:1355) 
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:951) 
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:689) 
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:878) 
17/11/19 13:08:08 INFO db.DBInputFormat: Using read commited transaction isolation 
17/11/19 13:08:08 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(t1.id), MAX(t1.id) FROM (select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE (1 = 1)) AS t1 
17/11/19 13:08:08 INFO db.IntegerSplitter: Split size: 0; Num splits: 4 from: 1 to: 1 
17/11/19 13:08:08 INFO mapreduce.JobSubmitter: number of splits:1 
17/11/19 13:08:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510865312807_0011 
17/11/19 13:08:09 INFO impl.YarnClientImpl: Submitted application application_1510865312807_0011 
17/11/19 13:08:09 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1510865312807_0011/ 
17/11/19 13:08:09 INFO mapreduce.Job: Running job: job_1510865312807_0011 
17/11/19 13:08:33 INFO mapreduce.Job: Job job_1510865312807_0011 running in uber mode : false 
17/11/19 13:08:33 INFO mapreduce.Job: map 0% reduce 0% 
17/11/19 13:08:52 INFO mapreduce.Job: map 100% reduce 0% 
17/11/19 13:08:52 INFO mapreduce.Job: Job job_1510865312807_0011 completed successfully 
17/11/19 13:08:52 INFO mapreduce.Job: Counters: 30 
    File System Counters 
     FILE: Number of bytes read=0 
     FILE: Number of bytes written=148008 
     FILE: Number of read operations=0 
     FILE: Number of large read operations=0 
     FILE: Number of write operations=0 
     HDFS: Number of bytes read=99 
     HDFS: Number of bytes written=17 
     HDFS: Number of read operations=4 
     HDFS: Number of large read operations=0 
     HDFS: Number of write operations=2 
    Job Counters 
     Launched map tasks=1 
     Other local map tasks=1 
     Total time spent by all maps in occupied slots (ms)=16886 
     Total time spent by all reduces in occupied slots (ms)=0 
     Total time spent by all map tasks (ms)=16886 
     Total vcore-seconds taken by all map tasks=16886 
     Total megabyte-seconds taken by all map tasks=17291264 
    Map-Reduce Framework 
     Map input records=1 
     Map output records=1 
     Input split bytes=99 
     Spilled Records=0 
     Failed Shuffles=0 
     Merged Map outputs=0 
     GC time elapsed (ms)=702 
     CPU time spent (ms)=2330 
     Physical memory (bytes) snapshot=231743488 
     Virtual memory (bytes) snapshot=1567064064 
     Total committed heap usage (bytes)=221249536 
    File Input Format Counters 
     Bytes Read=0 
    File Output Format Counters 
     Bytes Written=17 
17/11/19 13:08:52 INFO mapreduce.ImportJobBase: Transferred 17 bytes in 46.7631 seconds (0.3635 bytes/sec) 
17/11/19 13:08:52 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
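
To double-check what landed in HDFS, you can print the files under the target directory; the part-m-* pattern below is the usual default name for a map-only import, and with the single test row inserted above the output should be the 17 bytes reported in the counters:

[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/moviegenre/part-m-*
1,TEST,2016,TEST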

After seeing your answer, I tried it on a different VM and it does indeed work. It seems sqoop was not working properly on mine. –