2017-05-18

Running a Hadoop job without building a JAR file

I am new to Hadoop and have only worked through a few tutorial projects. I first did a project in Hadoop with Python, where I can specify the mapper and reducer files separately:

hadoop jar /usr/local/hadoop/hadoop-2.8.0/share/hadoop/tools/lib/hadoop-streaming-2.8.0.jar -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input input1 -output joboutput

But I would like to do the same in Java, and the tutorials I have found only show how to build JAR files. I have found no way to debug the Java mapper and reducer code. Are there any ideas or ways to test our code with debug options?

Below I am posting the code and output I am running into trouble with.

Sample input file in csv

Mapper code

package SalesCountry;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String valueString = value.toString();
        String[] singleCountryData = valueString.split(",");
        // column 7 = country, column 2 = price
        output.collect(new Text(singleCountryData[7]), new IntWritable(Integer.parseInt(singleCountryData[2])));
    }
}

Reducer code

package SalesCountry;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesCountryReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text t_key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int salesForCountry = 0;
        while (values.hasNext()) {
            // sum up all sale amounts for this country
            IntWritable value = values.next();
            salesForCountry += value.get();
        }
        output.collect(t_key, new IntWritable(salesForCountry));
    }
}

Terminal output

$HADOOP_HOME/bin/hadoop jar TotalSalePerCountry.jar inputMapReduce mapreduce_output_sales 
17/05/18 12:52:47 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 
17/05/18 12:52:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 
17/05/18 12:52:47 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
17/05/18 12:52:47 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 
17/05/18 12:52:47 INFO mapred.FileInputFormat: Total input files to process : 1 
17/05/18 12:52:47 INFO mapreduce.JobSubmitter: number of splits:1 
17/05/18 12:52:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1862814770_0001 
17/05/18 12:52:47 INFO mapreduce.Job: The url to track the job: http://localhost:8080/ 
17/05/18 12:52:47 INFO mapred.LocalJobRunner: OutputCommitter set in config null 
17/05/18 12:52:47 INFO mapreduce.Job: Running job: job_local1862814770_0001 
17/05/18 12:52:47 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter 
17/05/18 12:52:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1 
17/05/18 12:52:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false 
17/05/18 12:52:47 INFO mapred.LocalJobRunner: Waiting for map tasks 
17/05/18 12:52:47 INFO mapred.LocalJobRunner: Starting task: attempt_local1862814770_0001_m_000000_0 
17/05/18 12:52:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1 
17/05/18 12:52:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false 
17/05/18 12:52:47 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
17/05/18 12:52:47 INFO mapred.MapTask: Processing split: file:/home/deevita/MapReduceTutorial/inputMapReduce/SalesJan2009.csv:0+123638 
17/05/18 12:52:47 INFO mapred.MapTask: numReduceTasks: 1 
17/05/18 12:52:47 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 
17/05/18 12:52:47 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 
17/05/18 12:52:47 INFO mapred.MapTask: soft limit at 83886080 
17/05/18 12:52:47 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 
17/05/18 12:52:47 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 
17/05/18 12:52:47 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
17/05/18 12:52:47 INFO mapred.LocalJobRunner: map task executor complete. 
17/05/18 12:52:47 WARN mapred.LocalJobRunner: job_local1862814770_0001 
java.lang.Exception: java.lang.NumberFormatException: For input string: "Price" 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549) 
Caused by: java.lang.NumberFormatException: For input string: "Price" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
    at java.lang.Integer.parseInt(Integer.java:580) 
    at java.lang.Integer.parseInt(Integer.java:615) 
    at SalesCountry.SalesMapper.map(SalesMapper.java:17) 
    at SalesCountry.SalesMapper.map(SalesMapper.java:10) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
17/05/18 12:52:48 INFO mapreduce.Job: Job job_local1862814770_0001 running in uber mode : false 
17/05/18 12:52:48 INFO mapreduce.Job: map 0% reduce 0% 
17/05/18 12:52:48 INFO mapreduce.Job: Job job_local1862814770_0001 failed with state FAILED due to: NA 
17/05/18 12:52:48 INFO mapreduce.Job: Counters: 0 
java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873) 
    at SalesCountry.SalesCountryDriver.main(SalesCountryDriver.java:38) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148) 
[email protected]:~/MapReduceTutorial$ $HADOOP_HOME/bin/hadoop jar TotalSalePerCountry.jar inputMapReduce mapreduce_output_sales 
17/05/18 16:15:12 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 
17/05/18 16:15:12 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 
17/05/18 16:15:12 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/home/deevita/MapReduceTutorial/mapreduce_output_sales already exists 
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) 
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270) 
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:141) 
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341) 
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) 
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338) 
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) 
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) 
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570) 
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561) 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870) 
    at SalesCountry.SalesCountryDriver.main(SalesCountryDriver.java:38) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148) 

I know it is a NumberFormatException, but to recompile I have to build the JAR file every time. Is there a way to run the MapReduce job without building a JAR each time?

Answer


The NumberFormatException can come from whitespace around the value (it needs to be trimmed first).
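In this case the stack trace shows the failing input string is "Price", i.e. the CSV header row is reaching `Integer.parseInt`. A defensive parse helper can trim the value and skip non-numeric tokens; below is a minimal sketch (the class and method names `SafeParse`/`parseSaleOrNull` are invented here for illustration):

```java
// Hypothetical helper: parse a sale amount defensively.
// Trims whitespace and returns null for non-numeric tokens such as
// the CSV header cell "Price" instead of throwing NumberFormatException.
public class SafeParse {

    static Integer parseSaleOrNull(String raw) {
        String cleaned = raw.trim();          // strip surrounding whitespace
        try {
            return Integer.parseInt(cleaned); // normal data row
        } catch (NumberFormatException e) {
            return null;                      // header row or bad record: caller skips it
        }
    }

    public static void main(String[] args) {
        System.out.println(parseSaleOrNull(" 1200 ")); // a normal value
        System.out.println(parseSaleOrNull("Price"));  // the header cell
    }
}
```

In the mapper, `map` can simply return without calling `output.collect` when the helper yields null, so the header line no longer fails the whole job.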

I would suggest writing unit tests for your jobs, which lets you debug them without the whole jar/deploy cycle.

Here is an example using MRUnit.

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.0.0</version>
    <classifier>hadoop1</classifier>
    <scope>test</scope>
</dependency>

The test:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class HadoopTest {
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        SalesMapper mapper = new SalesMapper();
        mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
        mapDriver.setMapper(mapper);
    }

    @Test
    public void testMapper() throws Exception {
        mapDriver.withInput(new LongWritable(1), new Text("date,product,1200,Visa,carolina,baslidoni,england,UK"));
        mapDriver.withOutput(new Text("UK"), new IntWritable(1200));
        mapDriver.runTest();
    }
}
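The reducer can be covered the same way with MRUnit's `ReduceDriver`. Alternatively, the arithmetic can be pulled out into a plain helper that needs no Hadoop classes at all, so it can be run from a standard `main()` or JUnit test. A hypothetical sketch of that refactoring (`SalesMath` and `sumSales` are names invented here):

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical refactoring: keep the summation out of the Hadoop types
// so it can be exercised without a cluster, a JAR build, or MRUnit.
public class SalesMath {

    // The same loop the reducer runs, but over plain ints.
    static int sumSales(Iterator<Integer> sales) {
        int total = 0;
        while (sales.hasNext()) {
            total += sales.next();
        }
        return total;
    }

    public static void main(String[] args) {
        // two sales from the same country should be summed into one total
        int total = sumSales(Arrays.asList(1200, 800).iterator());
        System.out.println(total); // 2000
    }
}
```

The reducer's `reduce` method would then just delegate the accumulation to this helper and wrap the result in an `IntWritable`, so a failing sum can be debugged entirely outside Hadoop.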