Dataflow failed to set up worker

I tested my pipeline on DirectRunner and everything works fine. Now I want to run it on DataflowRunner, and it does not work. It fails even before it reaches my pipeline code, and I am completely overwhelmed by the logs in Stackdriver. I simply don't understand what they mean and have no idea what is wrong.

The execution graph appears to load fine.
The worker pool starts and 1 worker tries to run through the setup process, but it never succeeds.
Some log entries that I guess might provide useful information for debugging:

AttributeError: 'module' object has no attribute 'NativeSource'
/usr/bin/python failed with exit status 1
Back-off 20s restarting failed container=python pod=dataflow-fiona-backlog-clean-test2-06140817-1629-harness-3nxh_default(50a3915d6501a3ec74d6d385f70c8353)
checking backoff for container "python" in pod "dataflow-fiona-backlog-clean-test2-06140817-1629-harness-3nxh"
INFO SSH key is not a complete entry: .....

How should I approach this problem?

Edit: here is my setup.py, in case it helps (copied from [here]; I only modified REQUIRED_PACKAGES and the setuptools.setup section):

from distutils.command.build import build as _build 
import subprocess 

import setuptools 


# This class handles the pip install mechanism. 
class build(_build): # pylint: disable=invalid-name 
    """A build command class that will be invoked during package install. 

    The package built using the current setup.py will be staged and later 
    installed in the worker using `pip install package'. This class will be 
    instantiated during install for this specific scenario and will trigger 
    running the custom commands specified. 
    """ 
    sub_commands = _build.sub_commands + [('CustomCommands', None)] 


# Some custom command to run during setup. The command is not essential for this 
# workflow. It is used here as an example. Each command will spawn a child 
# process. Typically, these commands will include steps to install non-Python 
# packages. For instance, to install a C++-based library libjpeg62 the following 
# two commands will have to be added: 
# 
#  ['apt-get', 'update'], 
#  ['apt-get', '--assume-yes', 'install', 'libjpeg62'],
# 
# First, note that there is no need to use the sudo command because the setup 
# script runs with appropriate access. 
# Second, if apt-get tool is used then the first command needs to be 'apt-get 
# update' so the tool refreshes itself and initializes links to download 
# repositories. Without this initial step the other apt-get install commands 
# will fail with package not found errors. Note also --assume-yes option which 
# shortcuts the interactive confirmation. 
# 
# The output of custom commands (including failures) will be logged in the 
# worker-startup log. 
CUSTOM_COMMANDS = [ 
    ['echo', 'Custom command worked!']] 


class CustomCommands(setuptools.Command):
    """A setuptools Command class able to run arbitrary commands."""

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def RunCustomCommand(self, command_list):
        print('Running command: %s' % command_list)
        p = subprocess.Popen(
            command_list,
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        # Can use communicate(input='y\n'.encode()) if the command run requires
        # some confirmation.
        stdout_data, _ = p.communicate()
        print('Command output: %s' % stdout_data)
        if p.returncode != 0:
            raise RuntimeError(
                'Command %s failed: exit code: %s' % (command_list, p.returncode))

    def run(self):
        for command in CUSTOM_COMMANDS:
            self.RunCustomCommand(command)


# Configure the required packages and scripts to install. 
# Note that the Python Dataflow containers come with numpy already installed 
# so this dependency will not trigger anything to be installed unless a version 
# restriction is specified. 
REQUIRED_PACKAGES = ['apache-beam==2.0.0',
                     'datalab==1.0.1',
                     'google-cloud==0.19.0',
                     'google-cloud-bigquery==0.22.1',
                     'google-cloud-core==0.22.1',
                     'google-cloud-dataflow==0.6.0',
                     'pandas==0.20.2']


setuptools.setup(
    name='geotab-backlog-dataflow', 
    version='0.0.1', 
    install_requires=REQUIRED_PACKAGES, 
    packages=setuptools.find_packages(), 
)

Worker startup log (newest entry first); it ends with the following exception:

I /usr/bin/python failed with exit status 1 
I /usr/bin/python failed with exit status 1 
I AttributeError: 'module' object has no attribute 'NativeSource' 
I  class ConcatSource(iobase.NativeSource): 
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/concat_reader.py", line 26, in <module> 
I  from dataflow_worker import concat_reader 
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/maptask.py", line 31, in <module> 
I  from dataflow_worker import maptask 
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 26, in <module> 
I  from dataflow_worker import executor 
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 63, in <module> 
I  from dataflow_worker import batchworker 
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/start.py", line 26, in <module> 
I  exec code in run_globals 
I File "/usr/lib/python2.7/runpy.py", line 72, in _run_code 
I  "__main__", fname, loader, pkg_name) 
I File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main 
I AttributeError: 'module' object has no attribute 'NativeSource' 
I  class ConcatSource(iobase.NativeSource):
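
The traceback shows the worker code in dataflow_worker/concat_reader.py expecting a NativeSource attribute on a module it refers to as iobase. A minimal sketch of how one might check for that attribute in a local environment follows. The module path apache_beam.io.iobase is an assumption (the traceback does not show the import line), and module_has_attr is a hypothetical helper name. A local check only describes the local install, not what ends up on the Dataflow worker.

```python
import importlib


def module_has_attr(module_name, attr_name):
    """Report whether an importable module exposes a given attribute.

    Returns True or False if the module imports, and None if the module
    cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr_name)


if __name__ == '__main__':
    # Assumed module path; adjust if the SDK in use lays things out differently.
    result = module_has_attr('apache_beam.io.iobase', 'NativeSource')
    print('apache_beam.io.iobase.NativeSource present: %s' % result)
```

If this prints False or None locally with the same pinned packages, that would be consistent with the worker failing on the same attribute lookup, although it does not prove the cause.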


2017-06-14 foxwendy

Can you share a job ID? In Cloud Logging, some of the other logs (e.g. worker, worker_startup) may also contain more detailed information about why the worker did not start.

@BenChambers Job IDs: 2017-06-13_14_00_46-14992846213167080594, 2017-06-14_08_17_20-10061999408051657645. I can see an exception in the worker startup log. I have added it to my post. – foxwendy

You seem to be using incompatible requirements in your REQUIRED_PACKAGES directive: you specify "apache-beam==2.0.0" and "google-cloud-dataflow==0.6.0", which conflict with each other. Can you try removing/uninstalling the "apache-beam" package and installing/including the "google-cloud-dataflow==2.0.0" package instead?
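
As a rough illustration of this suggestion, the sketch below flags a requirements list that names both SDK distributions and builds the adjusted list. The helper names (package_name, find_sdk_conflict) and the constants are hypothetical and not part of the original setup.py. Whether the remaining pins are compatible with google-cloud-dataflow==2.0.0 has not been verified here.

```python
# Hypothetical helper for illustration; not part of the original setup.py.
# Assumption (taken from the answer above): listing both SDK distributions
# with different pins is what leads to mismatched code on the worker.
SDK_PACKAGES = {'apache-beam', 'google-cloud-dataflow'}


def package_name(requirement):
    """Return the lower-cased package name of a 'name==version' string."""
    return requirement.split('==')[0].strip().lower()


def find_sdk_conflict(requirements):
    """Return the SDK package names, sorted, if more than one is listed."""
    found = {package_name(req) for req in requirements} & SDK_PACKAGES
    return sorted(found) if len(found) > 1 else []


# The list from the question.
ORIGINAL = ['apache-beam==2.0.0',
            'datalab==1.0.1',
            'google-cloud==0.19.0',
            'google-cloud-bigquery==0.22.1',
            'google-cloud-core==0.22.1',
            'google-cloud-dataflow==0.6.0',
            'pandas==0.20.2']

# The answer's suggestion: drop apache-beam and pin google-cloud-dataflow
# to 2.0.0, leaving the other entries untouched.
SUGGESTED = [req for req in ORIGINAL
             if package_name(req) not in SDK_PACKAGES]
SUGGESTED.append('google-cloud-dataflow==2.0.0')

if __name__ == '__main__':
    print('Conflict in original list: %s' % find_sdk_conflict(ORIGINAL))
    print('Conflict in suggested list: %s' % find_sdk_conflict(SUGGESTED))
```

The SUGGESTED list could then be passed as install_requires in setuptools.setup in place of the original REQUIRED_PACKAGES.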


2017-06-15 17:12:21

Working with Dataflow is a really frustrating experience. What works on my local runner can fail for reasons specific to the cloud runner. – foxwendy

It is really confusing (not your answer, but the documentation). Every Google document for Dataflow says it is now based on Apache Beam and points me to the Beam website. Also, when I looked for the GitHub project, I saw that the Google Dataflow project is empty and everything now goes to the Apache Beam repo. – foxwendy

Does that mean google-cloud-dataflow is deprecated and I should install apache-beam instead? Also, I tried uninstalling apache-beam and keeping only google-cloud-dataflow, and then DirectRunner fails with: ImportError: No module named options.pipeline_options. It seems that under google-cloud-dataflow the beam package is still on the older version, but I just cannot find any supporting API documentation from Google that explains which version to pick ..... – foxwendy
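
The ImportError in the last comment suggests that the pipeline options module sits at a different path depending on the installed SDK version. Below is a minimal sketch of a fallback import, assuming the older location was apache_beam.utils.pipeline_options (that path is an assumption and is not confirmed anywhere in this thread). import_first_available is a hypothetical helper name. Pinning a single consistent SDK version is likely the cleaner fix; this only shows how to probe which layout is present.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that imports.

    Raises ImportError if none of the candidates can be imported.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError('None of these modules could be imported: %s'
                      % ', '.join(module_names))


# Candidate locations. The first comes from the error message in the comment
# above; the second is an assumed older location.
PIPELINE_OPTIONS_CANDIDATES = [
    'apache_beam.options.pipeline_options',
    'apache_beam.utils.pipeline_options',
]

if __name__ == '__main__':
    try:
        module = import_first_available(PIPELINE_OPTIONS_CANDIDATES)
        print('Using pipeline options from: %s' % module.__name__)
    except ImportError as err:
        print(err)
```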
