
With reference to this page, I have a similar problem to the one described there: I need to provide map and reduce methods that count word-length (1 to n) frequencies (reference links). I tried the approach from the answer there and ended up with the following implementation.

import java.io.IOException; 
import java.util.StringTokenizer; 

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.LongWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapreduce.Job; 
import org.apache.hadoop.mapreduce.Mapper; 
import org.apache.hadoop.mapreduce.Reducer; 
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; 
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; 

public class WordCount { 

    // Mapper that implements the map() function 
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> { 
    //public static class TokenizerMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> { 

        private final static IntWritable one = new IntWritable(1); 
        private Text word = new Text(); 

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
            StringTokenizer itr = new StringTokenizer(value.toString()); 
            while (itr.hasMoreTokens()) { 
                String wordToCheck = itr.nextToken(); 
                word.set(String.valueOf(wordToCheck.length())); 
                context.write(word, one); 
                // check whether the word starts with a or b 
                //if (wordToCheck.startsWith("a")||wordToCheck.startsWith("b")){ 
                // word.set(wordToCheck); 
                // context.write(word, one); 
                //} 
                // check for word length 
                //if (wordToCheck.length() > 8) { 
                //} 
            } 
        } 
    } 

    // Reducer that implements the reduce() function 
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> { 
        private IntWritable result = new IntWritable(); 

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { 
            int sum = 0; 
            for (IntWritable val : values) { 
                sum += val.get(); 
            } 
            result.set(sum); 
            context.write(key, result); 
        } 
    } 

    // Driver class to specify the Mapper and Reducer 
    public static void main(String[] args) throws Exception { 
        Configuration conf = new Configuration(); 
        Job job = Job.getInstance(conf, "word count"); 
        job.setJarByClass(WordCount.class); 
        job.setMapperClass(TokenizerMapper.class); 
        job.setReducerClass(IntSumReducer.class); 
        job.setOutputKeyClass(Text.class); 
        job.setOutputValueClass(IntWritable.class); 
        job.setMapOutputKeyClass(Text.class); 
        job.setMapOutputValueClass(IntWritable.class); 
        FileInputFormat.addInputPath(job, new Path(args[0])); 
        FileOutputFormat.setOutputPath(job, new Path(args[1])); 
        System.exit(job.waitForCompletion(true) ? 0 : 1); 
    } 
} 

I get the following exception, running Hadoop 2.6.3 on Ubuntu in an LXTerminal:

17/02/25 17:02:34 INFO mapreduce.Job: map 0% reduce 0% 
17/02/25 17:02:36 INFO mapreduce.Job: map 100% reduce 0% 
17/02/25 17:02:36 INFO mapreduce.Job: Task Id : attempt_1488013180963_0001_m_000000_2, Status : FAILED 
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable 
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069) 
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712) 
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) 
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) 
    at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

I developed this class in Eclipse Kepler and run it as a JAR file. What is the problem? I also tried IntWritable as suggested in that answer, but I got similar errors.

Answer


I am not 100% sure, but when you use files as input, the Mapper should have LongWritable as the key type (the byte offset of the line in the file) and Text as the value type (the line itself).

So a possible solution could be to replace

public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> { 

with

public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> { 
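
The stack trace fits this: because the class declares Object as the map input key type, the map(LongWritable key, ...) method in the question never overrides Mapper.map(), so Hadoop runs the default identity mapper and emits the LongWritable file offsets as keys, which is exactly the reported type mismatch. As a minimal sketch (an untested adaptation of the code from the question, not a verified fix), the corrected mapper would look like this; the @Override annotation turns such signature mismatches into compile-time errors:

public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> { 

    private final static IntWritable one = new IntWritable(1); 
    private Text word = new Text(); 

    @Override 
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
        StringTokenizer itr = new StringTokenizer(value.toString()); 
        while (itr.hasMoreTokens()) { 
            // emit the token's length as the key; the reducer sums one 1 per occurrence 
            word.set(String.valueOf(itr.nextToken().length())); 
            context.write(word, one); 
        } 
    } 
} 

With that change, the reducer and driver can stay as posted, and the output should contain one line per word length in the form <length><TAB><count>.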

Well-answered question. Thank you very much.
