How do I get NN and NNS from a text?

I want to extract the NN or NNS tokens from a sample text, as given in the script below. When I use the code below, the output is:

types 
synchronization 
phase 
synchronization 
-RSB- 
synchronization 
-LSB- 
-RSB- 
projection 
synchronization

Why do I get [-RSB-] or [-LSB-] here? Should I use a different pattern to get NN and NNS at the same time?

// Fragment of a larger program: tagger, parser, parserr, rs, docs_terms and
// docs_terms_unq are defined elsewhere and are not shown here.
String atic = "So far, many different types of synchronization have been investigated, such as complete synchronization [8], generalized synchronization [9], phase synchronization [10], lag synchronization [11], projection synchronization [12, 13], and so forth.";

Reader reader = new StringReader(atic);
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
docs_terms_unq.put(rs.getString("u"), new ArrayList<String>());
docs_terms.put(rs.getString("u"), new ArrayList<String>());

for (List<HasWord> sentence : dp) {
    List<TaggedWord> tagged = tagger.tagSentence(sentence);
    GrammaticalStructure gs = parser.predict(tagged);

    Tree x = parserr.parse(sentence);
    System.out.println(x);
    // Match every tree node labeled NN or NNS.
    TregexPattern NPpattern = TregexPattern.compile("@NN|NNS");
    TregexMatcher matcher = NPpattern.matcher(x);

    while (matcher.findNextMatchingNode()) {
        Tree match = matcher.getMatch();
        // The yield of a matched node is the list of words below it.
        ArrayList<Label> hh = match.yield();
        Boolean b = false;

        System.out.println(hh.toString());
    }
}


2016-04-27 mlee_jordan

I'm not sure why those show up. However, you will get more accurate POS tags if you use the part-of-speech tagger. I would suggest just looking directly at the annotation. Here is some sample code.

import edu.stanford.nlp.ling.CoreAnnotations; 
import edu.stanford.nlp.ling.CoreLabel; 
import edu.stanford.nlp.pipeline.Annotation; 
import edu.stanford.nlp.pipeline.StanfordCoreNLP; 
import edu.stanford.nlp.util.CoreMap; 

import java.util.Properties; 

public class NNExample { 

    public static void main(String[] args) { 
     Properties props = new Properties(); 
     props.setProperty("annotators", "tokenize,ssplit,pos"); 
     StanfordCoreNLP pipeline = new StanfordCoreNLP(props); 
     String text = "So far, many different types of synchronization have been investigated, such as complete " + 
       "synchronization [8], generalized synchronization [9], phase synchronization [10], " + 
       "lag synchronization [11], projection synchronization [12, 13], and so forth."; 
     Annotation annotation = new Annotation(text); 
     pipeline.annotate(annotation); 
     for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) { 
      for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) { 
       String partOfSpeechTag = token.get(CoreAnnotations.PartOfSpeechAnnotation.class); 
       if (partOfSpeechTag.equals("NN") || partOfSpeechTag.equals("NNS")) { 
        System.out.println(token.word()); 
       } 
      } 
     } 
    } 
}

And this is the output I get:

types 
synchronization 
synchronization 
synchronization 
phase 
synchronization 
lag 
synchronization 
projection 
synchronization


2016-04-28 01:37:25 StanfordNLPHelp

Thank you very much. I did indeed notice that my earlier approach found fewer nouns!

One small question: should I use the same approach if I want NPs? In the code I posted, I use it like TregexPattern.compile("@NP !<< @NP"). Can I use partOfSpeechTag.equals("@NP !<< @NP")?

Here is an example of getting the NPs from a sentence:

import edu.stanford.nlp.ling.CoreAnnotations; 
import edu.stanford.nlp.ling.Word; 
import edu.stanford.nlp.pipeline.Annotation; 
import edu.stanford.nlp.pipeline.StanfordCoreNLP; 
import edu.stanford.nlp.trees.*; 

import java.io.IOException; 
import java.util.ArrayList; 
import java.util.Properties; 

public class TreeExample { 

    public static void printNounPhrases(Tree inputTree) { 
     if (inputTree.label().value().equals("NP")) { 
      ArrayList<Word> words = new ArrayList<Word>(); 
      for (Tree leaf : inputTree.getLeaves()) { 
       words.addAll(leaf.yieldWords()); 
      } 
      System.out.println(words); 
     } else { 
      for (Tree subTree : inputTree.children()) { 
       printNounPhrases(subTree); 
      } 
     } 
    } 

    public static void main (String[] args) throws IOException { 
     Properties props = new Properties(); 
     props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse"); 
     StanfordCoreNLP pipeline = new StanfordCoreNLP(props); 
     String text = "Susan Thompson is from Florida."; 
     Annotation annotation = new Annotation(text); 
     pipeline.annotate(annotation); 
     Tree sentenceTree = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0).get(
       TreeCoreAnnotations.TreeAnnotation.class); 
     //System.out.println(sentenceTree); 
     printNounPhrases(sentenceTree); 
    } 

}
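
Note that the recursion above stops at the first NP it meets, so it prints only the outermost noun phrases. The Tregex pattern "@NP !<< @NP" from the question asks for something different: NPs that do not dominate another NP. The difference can be sketched without CoreNLP at all. The Node class and the hand-built tree below are hypothetical stand-ins for illustration, not CoreNLP's Tree API, and the tree shape is an assumption rather than real parser output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LowestNounPhrases {

    // Hypothetical minimal tree node for illustration; NOT CoreNLP's Tree class.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Collects the words (leaf labels) below a node, left to right.
    static List<String> words(Node node) {
        List<String> out = new ArrayList<>();
        if (node.isLeaf()) {
            out.add(node.label);
        } else {
            for (Node child : node.children) {
                out.addAll(words(child));
            }
        }
        return out;
    }

    // True if some proper descendant of the node is an NP constituent.
    static boolean dominatesNp(Node node) {
        for (Node child : node.children) {
            if ((!child.isLeaf() && child.label.equals("NP")) || dominatesNp(child)) {
                return true;
            }
        }
        return false;
    }

    // Returns the words of every NP that contains no other NP.
    static List<List<String>> lowestNounPhrases(Node node) {
        List<List<String>> result = new ArrayList<>();
        collect(node, result);
        return result;
    }

    private static void collect(Node node, List<List<String>> result) {
        if (node.isLeaf()) {
            return;
        }
        if (node.label.equals("NP") && !dominatesNp(node)) {
            result.add(words(node));
            return; // nothing below this node can be an NP
        }
        for (Node child : node.children) {
            collect(child, result);
        }
    }

    public static void main(String[] args) {
        // Assumed shape: (NP (NP (JJ complete) (NN synchronization)) (, ,)
        //                    (NP (NN phase) (NN synchronization)))
        Node tree = new Node("NP",
            new Node("NP",
                new Node("JJ", new Node("complete")),
                new Node("NN", new Node("synchronization"))),
            new Node(",", new Node(",")),
            new Node("NP",
                new Node("NN", new Node("phase")),
                new Node("NN", new Node("synchronization"))));
        // prints [[complete, synchronization], [phase, synchronization]]
        System.out.println(lowestNounPhrases(tree));
    }
}
```

Stopping at the outermost NP would return the whole coordinated phrase as a single list, whereas this variant returns each inner phrase separately.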


2016-04-29 01:54:22 StanfordNLPHelp

When I try this example with the sample text from my post, I get: '[complete, synchronization, -LSB-, 8, -RSB-, ,, generalized, synchronization, -LSB-, 9, -RSB-, ,, phase, synchronization, -LSB-, 10, -RSB-, ,, lag, synchronization, -LSB-, 11, -RSB-, ,, projection, synchronization, -LSB-, 12, ,, 13, -RSB-]'
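
In Penn Treebank conventions, -LSB- and -RSB- are the escaped forms of the square brackets [ and ], so these tokens most likely come from the citation markers such as [8] in the input. If they are unwanted, one option is to drop them from the word list after extraction. The following is a minimal sketch in plain Java; the class name BracketTokenFilter and the method withoutBrackets are made up for illustration and are not part of CoreNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BracketTokenFilter {

    // Penn Treebank escape tokens for ( ) [ ] { }.
    private static final Set<String> BRACKET_TOKENS = new HashSet<>(Arrays.asList(
        "-LRB-", "-RRB-", "-LSB-", "-RSB-", "-LCB-", "-RCB-"));

    // Hypothetical helper: returns a copy of the list without bracket tokens.
    static List<String> withoutBrackets(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!BRACKET_TOKENS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> np = Arrays.asList("phase", "synchronization", "-LSB-", "10", "-RSB-");
        // prints [phase, synchronization, 10]
        System.out.println(withoutBrackets(np));
    }
}
```

This only removes the bracket tokens themselves; the citation number between them stays in the list and would need its own filter if it is not wanted.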
