How do I define tokens in spaCy NLP in Python?

I want to use spaCy's NLP functions in my Flask app. I have already looked through various examples on the official websites: (for spaCy) https://spacy.io/docs/usage/tutorials

and (for Flask) https://realpython.com/blog/python/flask-by-example-part-3-text-processing-with-requests-beautifulsoup-nltk/

In my web app, the route below calls parse_news_from, which runs a sequence of NLP analysis steps:

@app.route('/submit', methods=['POST'])
def submit_textarea():
    if parse_news_from(format(request.form["text"])):
        print("The news is parsed successfully!")
    return talk_title

Currently parse_news_from works with the NLTK library, but I want to switch to spaCy. Here is my spaCy code, taken from official sources:

from spacy.en import English 
import _regex 
parser = English() 

# Test Data 
multiSentence = ("There is an art, it says, or rather, a knack to flying. "
                 "The knack lies in learning how to throw yourself at the ground and miss. "
                 "In the beginning the Universe was created. This has made a lot of people "
                 "very angry and been widely regarded as a bad move.")
# all you have to do to parse text is this: 
#note: the first time you run spaCy in a file it takes a little while to load up its modules 
parsedData = parser(multiSentence) 

# Let's look at the tokens 
# All you have to do is iterate through the parsedData 
# Each token is an object with lots of different properties 
# A property with an underscore at the end returns the string representation 
# while a property without the underscore returns an index (int) into spaCy's vocabulary 
# The probability estimate is based on counts from a 3 billion word 
# corpus, smoothed using the Simple Good-Turing method. 
for i, token in enumerate(parsedData): 
    print("original:", token.orth, token.orth_) 
    print("lowercased:", token.lower, token.lower_) 
    print("lemma:", token.lemma, token.lemma_) 
    print("shape:", token.shape, token.shape_) 
    print("prefix:", token.prefix, token.prefix_) 
    print("suffix:", token.suffix, token.suffix_) 
    print("log probability:", token.prob) 
    print("Brown cluster id:", token.cluster) 
    print("----------------------------------------") 
    if i > 1:
        break
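
As an aside on the comments in the code above: the underscore convention (an attribute without the underscore is an integer, the same attribute with a trailing underscore is the string) can be illustrated with a toy string table. This is only a minimal sketch of the idea. ToyStringStore and ToyToken are hypothetical names invented for this illustration, and real spaCy assigns its IDs differently, so none of this is spaCy's actual implementation.

```python
# Illustration only: a toy string table, NOT spaCy's implementation.
class ToyStringStore:
    """Maps each distinct string to an integer ID and back."""

    def __init__(self):
        self._ids = {}      # string -> integer ID
        self._strings = []  # integer ID -> string

    def add(self, text):
        # Reuse the existing ID if this string was seen before.
        if text not in self._ids:
            self._ids[text] = len(self._strings)
            self._strings.append(text)
        return self._ids[text]

    def lookup(self, key):
        return self._strings[key]


class ToyToken:
    """Hypothetical token: `orth`/`lower` are IDs, `orth_`/`lower_` are strings."""

    def __init__(self, text, store):
        self._store = store
        self.orth = store.add(text)
        self.lower = store.add(text.lower())

    @property
    def orth_(self):
        return self._store.lookup(self.orth)

    @property
    def lower_(self):
        return self._store.lookup(self.lower)


store = ToyStringStore()
tokens = [ToyToken(word, store) for word in "The knack lies in the knack".split()]
for token in tokens:
    print(token.orth, token.orth_, token.lower, token.lower_)
```

Storing each string once and passing integers around makes comparisons cheap, which is presumably why the library exposes both forms.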

After running it, I get this error:

File "/home/xxx/anaconda3/lib/python3.6/site-packages/_regex_core.py", line 21, in <module> 
    import _regex 
ImportError: /home/xxx/anaconda3/lib/python3.6/site-packages/_regex.cpython-36m-x86_64-linux-gnu.so: undefined symbol: PySlice_AdjustIndices

Are there any working examples that show how to get started? Where is my mistake? Thanks.

Source

2017-05-21 Vasyl Lyashkevych

I found the cause of the error above, and it was quite unexpected to me. It is described here: How to fix a python spaCy error: "undefined symbol: PySlice_AdjustIndices"?
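
For context, an "undefined symbol" error like this usually means that a compiled extension (here the _regex module) was built against a newer Python than the interpreter that loads it. To my knowledge, PySlice_AdjustIndices was added to the CPython C API in version 3.6.1, so a 3.6.0 interpreter would not provide it. The following is a minimal diagnostic sketch under that assumption. The helper name interpreter_is_at_least is made up for this example, and the suggested remedies are general advice rather than the exact steps from the linked answer.

```python
import sys

# Assumption: PySlice_AdjustIndices first appeared in CPython 3.6.1.
MIN_VERSION = (3, 6, 1)


def interpreter_is_at_least(minimum, version_info=sys.version_info):
    """Return True if the (major, minor, micro) version is >= minimum."""
    return tuple(version_info[:3]) >= tuple(minimum)


print("Running Python", sys.version.split()[0])
if interpreter_is_at_least(MIN_VERSION):
    print("The interpreter should provide PySlice_AdjustIndices; "
          "the cause of the import error is probably elsewhere.")
else:
    print("The interpreter predates 3.6.1; upgrading Python or reinstalling "
          "the regex package for this interpreter may resolve the error.")
```

If the check reports an older interpreter, updating Python within the same environment and then reinstalling the affected package is a common way to bring the two back in sync.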

Source

2017-05-21 10:40:41
