We have a 1 GB CSV file that we are trying to load into Hive tables. The file contains 656 columns. First we move the data into a temporary table, and then we move it from the tmp table to the staging table with the following query. This second step fails with a "GC overhead limit exceeded" error.

use ${hiveconf:database_name};
-- submit to the dev queue
SET mapred.job.queue.name=root.dev;
-- allow up to 500 dynamic partitions per node
SET hive.exec.max.dynamic.partitions.pernode=500;
-- allow deeply nested ${...} variable substitution
SET hive.variable.substitute.depth=100;
-- custom hiveconf variables, presumably referenced (e.g. via regexp_replace)
-- in the column expressions elided below
SET PATTERN='\\^';
SET REPLACEMENT='';
INSERT OVERWRITE TABLE STAGING_TABLE PARTITION (FILE_NAME="${hiveconf:PARTITION_BY}")
             SELECT
             COLUMN1,
             COLUMN2,
             ..
             COLUMN656
             FROM TEMP_TABLE;
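
For context, the first step (loading the CSV into the temp table) is not shown above. A minimal sketch of what that load typically looks like, assuming a plain-text comma-delimited file and a hypothetical input path (/user/etl/input/file.csv), both of which are illustrative, not from the original post:

-- hypothetical DDL; column types and delimiter are assumptions
CREATE TABLE TEMP_TABLE (
    COLUMN1 STRING,
    COLUMN2 STRING,
    -- ... remaining columns up to COLUMN656
    COLUMN656 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- move the file from the cluster filesystem into the table's storage location
LOAD DATA INPATH '/user/etl/input/file.csv' INTO TABLE TEMP_TABLE;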

While running the above script, I get the following error:

Logging initialized using configuration in file:/opt/mapr/hive/hive-0.13/conf/hive-log4j.properties 
OK 
Time taken: 0.486 seconds 
Total jobs = 3 
Launching Job 1 out of 3 
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455125666889_268626, Tracking URL =
Kill Command = /opt/mapr/bin/hadoop job -kill job_1455125666889_268626
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 0
2016-03-29 01:46:37,753 Stage-1 map = 0%, reduce = 0% 
2016-03-29 01:47:11,979 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 599.7 sec 
2016-03-29 01:47:15,076 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 669.22 sec 
2016-03-29 01:47:18,169 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 738.77 sec 
2016-03-29 01:47:19,200 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 753.23 sec 
2016-03-29 01:47:46,028 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 1366.8 sec 
2016-03-29 01:47:47,067 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 1388.92 sec 
2016-03-29 01:47:51,216 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 1429.25 sec 
2016-03-29 01:47:52,245 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1470.08 sec
MapReduce Total cumulative CPU time: 24 minutes 30 seconds 80 msec
Ended Job = job_1455125666889_268626
Stage-4 is filtered out by condition resolver. 
Stage-3 is selected by condition resolver. 
Stage-5 is filtered out by condition resolver. 
Launching Job 3 out of 3 
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455125666889_268633, Tracking URL =
Kill Command = /opt/mapr/bin/hadoop job -kill job_1455125666889_268633
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2016-03-29 01:48:01,025 Stage-3 map = 0%, reduce = 0% 
2016-03-29 01:49:01,552 Stage-3 map = 0%, reduce = 0%, Cumulative CPU 240.81 sec 
2016-03-29 01:49:11,808 Stage-3 map = 16%, reduce = 0%, Cumulative CPU 300.08 sec 
2016-03-29 01:49:17,956 Stage-3 map = 0%, reduce = 0% 
2016-03-29 01:50:18,409 Stage-3 map = 0%, reduce = 0%, Cumulative CPU 243.14 sec 
2016-03-29 01:50:25,577 Stage-3 map = 16%, reduce = 0%, Cumulative CPU 284.99 sec 
2016-03-29 01:50:31,717 Stage-3 map = 0%, reduce = 0% 
2016-03-29 01:51:32,060 Stage-3 map = 0%, reduce = 0%, Cumulative CPU 255.38 sec 
2016-03-29 01:51:41,264 Stage-3 map = 16%, reduce = 0%, Cumulative CPU 302.65 sec 
2016-03-29 01:51:47,396 Stage-3 map = 0%, reduce = 0% 
2016-03-29 01:52:47,713 Stage-3 map = 0%, reduce = 0%, Cumulative CPU 230.81 sec 
2016-03-29 01:53:03,040 Stage-3 map = 100%, reduce = 0%
MapReduce Total cumulative CPU time: 3 minutes 50 seconds 810 msec
Ended Job = job_1455125666889_268633 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1455125666889_268633_m_000000 (and more) from job job_1455125666889_268633 

Task with the most failures(4): 
----- 
Task ID: 
    task_1455125666889_268633_m_000000 

----- 
Diagnostic Messages for this Task: 
Error: GC overhead limit exceeded 

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask 
MapReduce Jobs Launched: 
Job 0: Map: 7 Cumulative CPU: 1478.78 sec MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS 
Job 1: Map: 1 Cumulative CPU: 230.81 sec MAPRFS Read: 0 MAPRFS Write: 0 FAIL 
Total MapReduce CPU Time Spent: 28 minutes 29 seconds 590 msec 


When the file size is under 300 MB, the query above runs without any problems. Once the file size exceeds 300 MB, we hit the GC overhead limit error.
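
"GC overhead limit exceeded" in a map task generally means the mapper JVM is spending nearly all of its time in garbage collection because its heap is too small for the rows it is processing (656 columns per row here). One common mitigation, sketched below with example values rather than values tuned for this job, is to give each mapper a larger container and heap before running the INSERT:

-- request a 4 GB YARN container per map task (example value)
SET mapreduce.map.memory.mb=4096;
-- mapper JVM heap, kept somewhat below the container size
SET mapreduce.map.java.opts=-Xmx3686m;

Since the failing job is the single-mapper Stage-3 selected by the condition resolver, which appears to be Hive's small-file merge stage, another knob sometimes used is SET hive.merge.mapfiles=false;, at the cost of leaving more output files behind.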

When we asked the infra team, we were told to rewrite our query. Can someone please explain what we are doing wrong in the query above?

Thanks in advance.
