1

Apache Beam Google Datastore ReadFromDatastore entity protobuf

I am trying to use Apache Beam's Google Datastore API, ReadFromDatastore:

p = beam.Pipeline(options=options) 
(p 
| 'Read from Datastore' >> ReadFromDatastore(gcloud_options.project, query) 
| 'reformat'   >> beam.Map(reformat) 
| 'Write To Datastore' >> WriteToDatastore(gcloud_options.project)) 

The object that is passed to my reformat function is of type

google.cloud.proto.datastore.v1.entity_pb2.Entity

It is in protobuf format, which is hard to modify or read.

I think I can convert an entity_pb2.Entity to a dict with

entity = dict(google.cloud.datastore.helpers._property_tuples(entity_pb))

But for some reason, trying to import both of the following libraries gives me an error:

import google.cloud.datastore.helpers 
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore 

Error:

Traceback (most recent call last): 
    File "/home/nburn42/MotoGarage/MotoGarage/MotoGarageBackgroundJobs/format_data.py", line 16, in <module> 
    import google.cloud.datastore.helpers 
    File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/__init__.py", line 57, in <module> 
    from google.cloud.datastore.batch import Batch 
    File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/batch.py", line 24, in <module> 
    from google.cloud.datastore import helpers 
    File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/helpers.py", line 29, in <module> 
    from google.cloud.grpc.datastore.v1 import entity_pb2 as _entity_pb2 
    File "/usr/local/lib/python2.7/dist-packages/google/cloud/grpc/datastore/v1/entity_pb2.py", line 28, in <module> 
    dependencies=[google_dot_api_dot_annotations__pb2.DESCRIPTOR,google_dot_protobuf_dot_struct__pb2.DESCRIPTOR,google_dot_protobuf_dot_timestamp__pb2.DESCRIPTOR,google_dot_type_dot_latlng__pb2.DESCRIPTOR,]) 
    File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 824, in __new__ 
    return _message.default_pool.AddSerializedFile(serialized_pb) 
TypeError: Couldn't build proto file into descriptor pool! 
Invalid proto descriptor for file "google/cloud/grpc/datastore/v1/entity.proto": 
    google.datastore.v1.PartitionId.project_id: "google.datastore.v1.PartitionId.project_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.PartitionId.namespace_id: "google.datastore.v1.PartitionId.namespace_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.PartitionId: "google.datastore.v1.PartitionId" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.partition_id: "google.datastore.v1.Key.partition_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.path: "google.datastore.v1.Key.path" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.PathElement.id_type: "google.datastore.v1.Key.PathElement.id_type" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.PathElement.kind: "google.datastore.v1.Key.PathElement.kind" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.PathElement.id: "google.datastore.v1.Key.PathElement.id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.PathElement.name: "google.datastore.v1.Key.PathElement.name" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.PathElement: "google.datastore.v1.Key.PathElement" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key: "google.datastore.v1.Key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.ArrayValue.values: "google.datastore.v1.ArrayValue.values" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.ArrayValue: "google.datastore.v1.ArrayValue" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.value_type: "google.datastore.v1.Value.value_type" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.null_value: "google.datastore.v1.Value.null_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.boolean_value: "google.datastore.v1.Value.boolean_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.integer_value: "google.datastore.v1.Value.integer_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.double_value: "google.datastore.v1.Value.double_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.timestamp_value: "google.datastore.v1.Value.timestamp_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.key_value: "google.datastore.v1.Value.key_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.string_value: "google.datastore.v1.Value.string_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.blob_value: "google.datastore.v1.Value.blob_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.geo_point_value: "google.datastore.v1.Value.geo_point_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.entity_value: "google.datastore.v1.Value.entity_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.array_value: "google.datastore.v1.Value.array_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.meaning: "google.datastore.v1.Value.meaning" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.exclude_from_indexes: "google.datastore.v1.Value.exclude_from_indexes" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value: "google.datastore.v1.Value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity.key: "google.datastore.v1.Entity.key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.properties" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity.PropertiesEntry.key: "google.datastore.v1.Entity.PropertiesEntry.key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity.PropertiesEntry.value: "google.datastore.v1.Entity.PropertiesEntry.value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity.PropertiesEntry: "google.datastore.v1.Entity.PropertiesEntry" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity: "google.datastore.v1.Entity" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.partition_id: "google.datastore.v1.PartitionId" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Key.path: "google.datastore.v1.Key.PathElement" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.ArrayValue.values: "google.datastore.v1.Value" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Value.key_value: "google.datastore.v1.Key" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Value.entity_value: "google.datastore.v1.Entity" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Value.array_value: "google.datastore.v1.ArrayValue" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Entity.PropertiesEntry.value: "google.datastore.v1.Value" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Entity.key: "google.datastore.v1.Key" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.PropertiesEntry" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 

Is there anything I can do to convert an entity_pb2.Entity into something usable?
Is ReadFromDatastore just too new for real use?
Is there another approach I should use?

Thanks,
Nathan

+0

Take a look at the package 'com.google.datastore.v1' and –

Answer

1

You can use the function google.cloud.datastore.helpers.entity_from_protobuf to convert an entity_pb2.Entity to a google.cloud.datastore.entity.Entity.

google.cloud.datastore.entity.Entity is a subclass of dict and gives you the ease of use you need.
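
As a rough illustration of that suggestion, a reformat function might look something like the sketch below. It assumes a google-cloud-datastore version that imports cleanly alongside Beam, which the traceback in the question suggests is not guaranteed. The from_protobuf parameter is a hypothetical hook added here only so the function can be exercised without the library; it is not part of any Beam or Datastore API.

```python
def reformat(entity_pb, from_protobuf=None):
    """Return the properties of a Datastore entity protobuf as a plain dict.

    from_protobuf is a hypothetical override used for testing; by default the
    real helper from google-cloud-datastore is imported when the function runs.
    """
    if from_protobuf is None:
        # Imported lazily, so this module still loads where the package is absent.
        from google.cloud.datastore.helpers import entity_from_protobuf
        from_protobuf = entity_from_protobuf

    # The helper returns a google.cloud.datastore.entity.Entity, a dict subclass.
    entity = from_protobuf(entity_pb)

    # Copy into an ordinary dict so later steps need no Datastore types.
    return dict(entity)
```

Note that the pipeline in the question writes back with WriteToDatastore, which presumably still expects protobuf entities, so a real reformat step would likely need to convert the result back (google.cloud.datastore.helpers.entity_to_protobuf is one candidate). That part is not shown here.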

+0

This solution works great with Apache Beam running on a local machine. However, when the job is submitted to DataflowRunner, it fails because 'datastore' is not found in 'google.cloud'. Is this because Datastore support in the Apache Beam Python SDK is still in beta?

+1

When you deploy a pipeline to Google Dataflow workers with Apache Beam, your Python environment is not replicated on the worker machines. The problem is that the google-cloud-datastore package is not installed on the workers by default, whereas it is in your local environment. Instructions for specifying PyPI dependencies are at https://cloud.google.com/dataflow/pipelines/dependencies-python
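
One way to act on this, sketched under assumptions: list the extra package in a requirements file and point the pipeline at it with the --requirements_file option. The helper names write_requirements and dataflow_args and the project id are made up for illustration, and a real Dataflow job would need further options (staging and temp locations, for example) that are omitted here.

```python
import os


def write_requirements(directory, packages):
    """Write one package name per line to requirements.txt and return its path."""
    path = os.path.join(directory, "requirements.txt")
    with open(path, "w") as handle:
        handle.write("\n".join(packages) + "\n")
    return path


def dataflow_args(project, requirements_path):
    """Build a (non-exhaustive) argument list for a Dataflow run."""
    return [
        "--runner=DataflowRunner",
        "--project=" + project,
        # Asks Dataflow to install the listed PyPI packages on each worker.
        "--requirements_file=" + requirements_path,
    ]
```

The resulting list could then be passed to the pipeline options, for example PipelineOptions(dataflow_args('my-project', path)), where 'my-project' is a placeholder.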