2015-12-09 6 views
5

NLTK data cannot be installed on Ubuntu 14.04 on AWS instance type c4.xlarge

I am testing an installation script on an Ubuntu 14.04 instance on AWS. The instance type is c4.xlarge with a 50 GB EBS volume. For each installation test I start from a fresh instance that I have just created.

The NLTK data installation consistently fails on the panlex_lite package.

Any ideas? (I have attached many lines of the installation log so that you can see exactly what I am seeing. Sorry for the long listings.)

Thanks,

The commands I run before installing the NLTK data are:

sudo apt-get install python3-setuptools -y 
sudo apt-get install python3.4-dev -y 

# Installing Python packages 
sudo easy_install3 pip 
sudo easy_install3 inflect 
sudo easy_install3 elasticsearch 
sudo easy_install3 geopy 
sudo easy_install3 geojson 
sudo easy_install3 simplejson 
sudo easy_install3 python_instagram 
sudo easy_install3 flickrapi 
sudo easy_install3 oauth 
sudo easy_install3 xlrd 
sudo easy_install3 pytz 
sudo easy_install3 tweepy 
sudo easy_install3 BeautifulSoup4 
sudo easy_install3 psutil 
sudo pip3 install -U nltk 
sudo pip3 install -U numpy 
sudo python3 -m nltk.downloader all 

The last line fails. The log, starting from the end of the psutil installation, is as follows:

Finished processing dependencies for psutil 
sudo: unable to resolve host ip-172-30-0-207 
The directory '/home/ubuntu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. 
The directory '/home/ubuntu/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. 
Collecting nltk 
    Downloading nltk-3.1.tar.gz (1.1MB) 
Installing collected packages: nltk 
    Running setup.py install for nltk 
Successfully installed nltk-3.1 
sudo: unable to resolve host ip-172-30-0-207 
The directory '/home/ubuntu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. 
The directory '/home/ubuntu/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. 
Collecting numpy 
    Downloading numpy-1.10.1.tar.gz (4.0MB) 
Installing collected packages: numpy 
    Running setup.py install for numpy 
Successfully installed numpy-1.10.1 
sudo: unable to resolve host ip-172-30-0-207 
[nltk_data] Downloading collection 'all' 
[nltk_data] | 
[nltk_data] | Downloading package abc to /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/abc.zip. 
[nltk_data] | Downloading package alpino to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/alpino.zip. 
[nltk_data] | Downloading package biocreative_ppi to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/biocreative_ppi.zip. 
[nltk_data] | Downloading package brown to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/brown.zip. 
[nltk_data] | Downloading package brown_tei to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/brown_tei.zip. 
[nltk_data] | Downloading package cess_cat to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/cess_cat.zip. 
[nltk_data] | Downloading package cess_esp to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/cess_esp.zip. 
[nltk_data] | Downloading package chat80 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/chat80.zip. 
[nltk_data] | Downloading package city_database to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/city_database.zip. 
[nltk_data] | Downloading package cmudict to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/cmudict.zip. 
[nltk_data] | Downloading package comparative_sentences to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/comparative_sentences.zip. 
[nltk_data] | Downloading package comtrans to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package conll2000 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/conll2000.zip. 
[nltk_data] | Downloading package conll2002 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/conll2002.zip. 
[nltk_data] | Downloading package conll2007 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package crubadan to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/crubadan.zip. 
[nltk_data] | Downloading package dependency_treebank to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/dependency_treebank.zip. 
[nltk_data] | Downloading package europarl_raw to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/europarl_raw.zip. 
[nltk_data] | Downloading package floresta to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/floresta.zip. 
[nltk_data] | Downloading package framenet_v15 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/framenet_v15.zip. 
[nltk_data] | Downloading package gazetteers to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/gazetteers.zip. 
[nltk_data] | Downloading package genesis to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/genesis.zip. 
[nltk_data] | Downloading package gutenberg to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/gutenberg.zip. 
[nltk_data] | Downloading package ieer to /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/ieer.zip. 
[nltk_data] | Downloading package inaugural to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/inaugural.zip. 
[nltk_data] | Downloading package indian to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/indian.zip. 
[nltk_data] | Downloading package jeita to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package kimmo to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/kimmo.zip. 
[nltk_data] | Downloading package knbc to /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package lin_thesaurus to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/lin_thesaurus.zip. 
[nltk_data] | Downloading package mac_morpho to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/mac_morpho.zip. 
[nltk_data] | Downloading package machado to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package masc_tagged to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package moses_sample to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping models/moses_sample.zip. 
[nltk_data] | Downloading package movie_reviews to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/movie_reviews.zip. 
[nltk_data] | Downloading package names to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/names.zip. 
[nltk_data] | Downloading package nombank.1.0 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package nps_chat to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/nps_chat.zip. 
[nltk_data] | Downloading package oanc_masc to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package omw to /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/omw.zip. 
[nltk_data] | Downloading package opinion_lexicon to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/opinion_lexicon.zip. 
[nltk_data] | Downloading package paradigms to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/paradigms.zip. 
[nltk_data] | Downloading package pil to /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/pil.zip. 
[nltk_data] | Downloading package pl196x to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/pl196x.zip. 
[nltk_data] | Downloading package ppattach to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/ppattach.zip. 
[nltk_data] | Downloading package problem_reports to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/problem_reports.zip. 
[nltk_data] | Downloading package propbank to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package ptb to /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/ptb.zip. 
[nltk_data] | Downloading package oanc_masc to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Package oanc_masc is already up-to-date! 
[nltk_data] | Downloading package product_reviews_1 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/product_reviews_1.zip. 
[nltk_data] | Downloading package product_reviews_2 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/product_reviews_2.zip. 
[nltk_data] | Downloading package pros_cons to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/pros_cons.zip. 
[nltk_data] | Downloading package qc to /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/qc.zip. 
[nltk_data] | Downloading package reuters to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package rte to /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/rte.zip. 
[nltk_data] | Downloading package semcor to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package senseval to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/senseval.zip. 
[nltk_data] | Downloading package sentiwordnet to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/sentiwordnet.zip. 
[nltk_data] | Downloading package sentence_polarity to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/sentence_polarity.zip. 
[nltk_data] | Downloading package shakespeare to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/shakespeare.zip. 
[nltk_data] | Downloading package sinica_treebank to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/sinica_treebank.zip. 
[nltk_data] | Downloading package smultron to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/smultron.zip. 
[nltk_data] | Downloading package state_union to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/state_union.zip. 
[nltk_data] | Downloading package stopwords to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/stopwords.zip. 
[nltk_data] | Downloading package subjectivity to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/subjectivity.zip. 
[nltk_data] | Downloading package swadesh to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/swadesh.zip. 
[nltk_data] | Downloading package switchboard to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/switchboard.zip. 
[nltk_data] | Downloading package timit to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/timit.zip. 
[nltk_data] | Downloading package toolbox to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/toolbox.zip. 
[nltk_data] | Downloading package treebank to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/treebank.zip. 
[nltk_data] | Downloading package twitter_samples to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/twitter_samples.zip. 
[nltk_data] | Downloading package udhr to /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/udhr.zip. 
[nltk_data] | Downloading package udhr2 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/udhr2.zip. 
[nltk_data] | Downloading package unicode_samples to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/unicode_samples.zip. 
[nltk_data] | Downloading package universal_treebanks_v20 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package verbnet to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/verbnet.zip. 
[nltk_data] | Downloading package webtext to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/webtext.zip. 
[nltk_data] | Downloading package wordnet to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/wordnet.zip. 
[nltk_data] | Downloading package wordnet_ic to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/wordnet_ic.zip. 
[nltk_data] | Downloading package words to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/words.zip. 
[nltk_data] | Downloading package ycoe to /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/ycoe.zip. 
[nltk_data] | Downloading package rslp to /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping stemmers/rslp.zip. 
[nltk_data] | Downloading package hmm_treebank_pos_tagger to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping taggers/hmm_treebank_pos_tagger.zip. 
[nltk_data] | Downloading package maxent_treebank_pos_tagger to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping taggers/maxent_treebank_pos_tagger.zip. 
[nltk_data] | Downloading package universal_tagset to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping taggers/universal_tagset.zip. 
[nltk_data] | Downloading package maxent_ne_chunker to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping chunkers/maxent_ne_chunker.zip. 
[nltk_data] | Downloading package punkt to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping tokenizers/punkt.zip. 
[nltk_data] | Downloading package book_grammars to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping grammars/book_grammars.zip. 
[nltk_data] | Downloading package sample_grammars to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping grammars/sample_grammars.zip. 
[nltk_data] | Downloading package spanish_grammars to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping grammars/spanish_grammars.zip. 
[nltk_data] | Downloading package basque_grammars to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping grammars/basque_grammars.zip. 
[nltk_data] | Downloading package large_grammars to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping grammars/large_grammars.zip. 
[nltk_data] | Downloading package tagsets to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping help/tagsets.zip. 
[nltk_data] | Downloading package snowball_data to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package bllip_wsj_no_aux to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping models/bllip_wsj_no_aux.zip. 
[nltk_data] | Downloading package word2vec_sample to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping models/word2vec_sample.zip. 
[nltk_data] | Downloading package panlex_swadesh to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Downloading package mte_teip5 to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/mte_teip5.zip. 
[nltk_data] | Downloading package averaged_perceptron_tagger to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping taggers/averaged_perceptron_tagger.zip. 
[nltk_data] | Downloading package panlex_lite to 
[nltk_data] |  /home/ubuntu/nltk_data... 
[nltk_data] | Unzipping corpora/panlex_lite.zip. 

Error installing package. Retry? [n/y/e] 

It is also not a disk-space problem:

Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvda1      51466360 6582776  42687092  14% /
none                   4       0         4   0% /sys/fs/cgroup
udev             3824796       8   3824788   1% /dev
tmpfs             765952     360    765592   1% /run
none                5120       0      5120   0% /run/lock
none             3829752       0   3829752   0% /run/shm
none              102400       0    102400   0% /run/user
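
The same free-space check can be done from inside an installation script before the download starts. This is only a minimal sketch: the helper names are made up for illustration, and the 1.7 GB figure is the archive size reported elsewhere in this thread, not a measured value. The unpacked size is not known here, so the safety factor is a guess.

```python
import shutil

# Assumption: panlex_lite is reported in this thread as roughly 1.7 GB
# before unpacking; the unpacked size is not stated anywhere here.
PANLEX_ARCHIVE_BYTES = int(1.7 * 1024**3)


def free_bytes(path="/"):
    """Return the free space, in bytes, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def has_room(path, required_bytes, safety_factor=2.0):
    """True if `path` has at least `required_bytes * safety_factor` free."""
    return free_bytes(path) >= required_bytes * safety_factor


# Hypothetical usage in a setup script:
#   if not has_room("/home/ubuntu", PANLEX_ARCHIVE_BYTES):
#       raise SystemExit("not enough disk space for the NLTK data")
```

With about 42 GB available, as in the listing above, such a check would pass, which supports the point that disk space is not the cause.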
+0

See https://github.com/nltk/nltk/issues/1283#issuecomment-188251568 – alvas

Answer

9

I ran into the same problem while following an old AWS tutorial on sentiment analysis of tweet data. That tutorial uses a bootstrap script to install NLTK and its data on an EMR cluster with the command:

$ sudo python -m nltk.downloader -d /usr/share/nltk_data all 

When I ran this command, I hit exactly the same problem with the panlex_lite installation. Because this is a bootstrap script, the prompt

Error installing package. Retry? [n/y/e]

causes the bootstrap action to fail and the EMR cluster to terminate. :P

I worked around this by: A) assuming that this package is not essential, and B) changing the command so that an 'n' is passed automatically and the script does not wait indefinitely.

$ yes n | sudo python -m nltk.downloader -d /usr/share/nltk_data all 
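
The same workaround can be sketched in Python instead of a shell pipe. The helper names `download_packages` and `nltk_download` below are hypothetical and not part of NLTK. The sketch assumes that `nltk.download` returns a boolean and that `halt_on_error=True` makes it return instead of prompting, which matches my reading of the downloader but is not verified against every NLTK version.

```python
def download_packages(package_ids, download, skip=("panlex_lite",)):
    """Download each package ID, skipping those listed in `skip`.

    `download` is any callable that takes a package ID and returns True
    on success. Returns three lists: (succeeded, failed, skipped).
    """
    succeeded, failed, skipped = [], [], []
    for package_id in package_ids:
        if package_id in skip:
            skipped.append(package_id)
            continue
        try:
            ok = download(package_id)
        except Exception:
            # Treat any exception from the downloader as a failed package.
            ok = False
        (succeeded if ok else failed).append(package_id)
    return succeeded, failed, skipped


def nltk_download(package_id, download_dir="/usr/share/nltk_data"):
    """Download one package with NLTK (hypothetical wrapper)."""
    import nltk  # imported lazily so download_packages has no dependencies

    # Assumption: with halt_on_error=True the call returns False on an
    # error rather than asking "Retry? [n/y/e]".
    return nltk.download(package_id, download_dir=download_dir,
                         quiet=True, halt_on_error=True)


# Hypothetical usage:
#   ok, failed, skipped = download_packages(
#       ["punkt", "stopwords", "panlex_lite"], nltk_download)
```

Passing the downloader in as a callable keeps the loop testable without network access, and the returned lists show which packages were left out.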

Hope this helps.

Update 25 Jan 2016: The dataset named 'panlex_lite' still causes an error during installation.

+0

Thank you. That helps. I will do that. – user1902346

+0

Update: The EMR cluster's bootstrap action still fails even with '$ yes n | sudo python -m nltk.downloader -d /usr/share/nltk_data all'. I found that the only package I needed from the NLTK data is _punkt_, so I edited the NLTK command to install only that: '$ sudo python -m nltk.downloader -d /usr/share/nltk_data punkt' – gprakhar
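
A bootstrap script written in Python could build the same narrowed command. This is a rough sketch: `downloader_command` and `run_downloader` are hypothetical names, and it assumes the downloader accepts several package IDs after `-d <dir>` (the commands in this thread only ever pass one). Whether the exit status reflects a failed package is also not verified here.

```python
import subprocess
import sys


def downloader_command(packages, download_dir="/usr/share/nltk_data",
                       python=sys.executable):
    """Build the argument list for `python -m nltk.downloader`."""
    if not packages:
        raise ValueError("name at least one package, e.g. 'punkt'")
    return [python, "-m", "nltk.downloader", "-d", download_dir, *packages]


def run_downloader(packages, **kwargs):
    """Run the downloader and return True if it exits with status 0.

    stdin is connected to the null device, so a "Retry? [n/y/e]" prompt
    hits end-of-file immediately instead of blocking the script.
    """
    cmd = downloader_command(packages, **kwargs)
    result = subprocess.run(cmd, stdin=subprocess.DEVNULL)
    return result.returncode == 0


# Hypothetical usage:
#   run_downloader(["punkt"])
```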

+0

Update as of 20 Jan 2016: it still fails. I will write to nltk_data and ask them to look at this post. – user1902346

0

I am not sure, but here is my experience.

Here, a link for corpora/panlex is given, along with a link to the file. I downloaded it. It is a large file, 1.7 GB before unpacking. I put the uncompressed file into C:\nltk_data\corpora. I am not sure how it works. I hope this is helpful.
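
The manual step described above could be scripted roughly as follows. This is a sketch under assumptions: `install_corpus_zip` is a made-up helper, the archive is assumed to be a regular zip file, and it is assumed to unpack into its own top-level folder under `corpora`.

```python
import zipfile
from pathlib import Path


def install_corpus_zip(zip_path, nltk_data_dir):
    """Extract a manually downloaded corpus archive into <nltk_data_dir>/corpora.

    Returns the sorted top-level names found in the archive.
    """
    corpora_dir = Path(nltk_data_dir) / "corpora"
    corpora_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        bad = archive.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in archive: {bad}")
        names = archive.namelist()
        archive.extractall(corpora_dir)
    return sorted({name.split("/")[0] for name in names})


# Hypothetical usage on Windows, matching the path mentioned above:
#   install_corpus_zip(r"C:\Downloads\panlex_lite.zip", r"C:\nltk_data")
```

Checking the archive before extracting makes a truncated 1.7 GB download fail loudly, although it does mean the file is read twice.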

+1

I disagree with you. This is a solution that helps by downloading the required file manually. – xuanyue

+0

Thanks. I am looking for an automated solution for downloading all packages. – user1902346

+0

See https://github.com/nltk/nltk/issues/1283#issuecomment-188251568 – alvas

0

According to this GitHub issue, you may want to upgrade your NLTK installation to the development version.

Also, this specific file is really large; according to this, it is about 1.7 GB. So please be patient.
