2016-05-18

Answer


You should be able to do that with the Python BigQuery API.

First you need to connect to the BigQuery service. Here is the code I use to do so:

import httplib2
from oauth2client.client import SignedJwtAssertionCredentials
from apiclient.discovery import build


class BigqueryAdapter(object): 
    def __init__(self, **kwargs): 
        self._project_id = kwargs['project_id'] 
        self._key_filename = kwargs['key_filename'] 
        self._account_email = kwargs['account_email'] 
        self._dataset_id = kwargs['dataset_id'] 
        self.connector = None 
        self.start_connection() 

    def start_connection(self): 
        # Read the service account's private key file.
        with open(self._key_filename) as key_file: 
            key = key_file.read() 
        credentials = SignedJwtAssertionCredentials(
            self._account_email, 
            key, 
            'https://www.googleapis.com/auth/bigquery') 
        authorization = credentials.authorize(httplib2.Http()) 
        self.connector = build('bigquery', 'v2', http=authorization) 

After that you can run jobs with self.connector (you can find some examples in this answer).
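As a rough illustration, running a simple synchronous query through the connector might look like the sketch below (the `run_query` helper and the `timeoutMs` value are my own choices, not part of the original answer):

```python
def run_query(connector, project_id, sql):
    """Run a synchronous query job and return the raw result rows."""
    # jobs().query() starts a query job and waits (up to timeoutMs)
    # for it to finish before returning the first page of results.
    response = connector.jobs().query(
        projectId=project_id,
        body={"query": sql, "timeoutMs": 10000},
    ).execute()
    return response.get("rows", [])
```

With the adapter above you would call it as `run_query(adapter.connector, adapter._project_id, "SELECT ...")`.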

To load backups from Google Cloud Storage, you would define the configuration like this:

body = {
    "configuration": {
        "load": {
            "sourceFormat": "CSV",  # Either "CSV", "DATASTORE_BACKUP", "NEWLINE_DELIMITED_JSON" or "AVRO".
            "fieldDelimiter": ",",  # (if it's comma separated)
            "destinationTable": {
                "projectId": "",  # your_project_id
                "tableId": "",  # your_table_to_save_the_data
                "datasetId": "",  # your_dataset_id
            },
            "writeDisposition": "WRITE_TRUNCATE",  # or "WRITE_APPEND"
            "sourceUris": [
                # The path to your backup in Google Cloud Storage. It could be
                # something like "gs://bucket_name/filename*". Notice you can
                # use the '*' wildcard.
            ],
            "schema": {  # [Optional] The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.
                "fields": [  # Describes the fields in a table.
                    {
                        "fields": [  # [Optional] Describes the nested schema fields if the type property is set to RECORD.
                            # Object with schema name: TableFieldSchema
                        ],
                        "type": "A String",  # [Required] The field data type. Possible values include STRING, BYTES, INTEGER, FLOAT, BOOLEAN, TIMESTAMP or RECORD (where RECORD indicates that the field contains a nested schema).
                        "description": "A String",  # [Optional] The field description. The maximum length is 16K characters.
                        "name": "A String",  # [Required] The field name. The name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore. The maximum length is 128 characters.
                        "mode": "A String",  # [Optional] The field mode. Possible values include NULLABLE, REQUIRED and REPEATED. The default value is NULLABLE.
                    },
                ],
            },
        },
    },
}
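If you load backups regularly, you can assemble that body with a small helper instead of writing the dictionary by hand each time. This is only a sketch; the `make_load_body` name is mine, not something from the BigQuery client:

```python
def make_load_body(project_id, dataset_id, table_id, source_uris,
                   source_format="CSV", write_disposition="WRITE_TRUNCATE",
                   schema_fields=None):
    """Build the request body for a bigquery jobs().insert() load job."""
    load = {
        "sourceFormat": source_format,
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "writeDisposition": write_disposition,
        "sourceUris": list(source_uris),
    }
    if source_format == "CSV":
        # fieldDelimiter only applies to CSV sources.
        load["fieldDelimiter"] = ","
    if schema_fields is not None:
        # Optional: omit the schema if the destination table already exists
        # or if you're loading a Datastore backup.
        load["schema"] = {"fields": schema_fields}
    return {"configuration": {"load": load}}
```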

And then run:

self.connector.jobs().insert(projectId=self._project_id, body=body).execute() 
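Note that jobs().insert() returns as soon as the job is created, not when it finishes. A minimal polling sketch (the `wait_for_job` helper is my own assumption, not part of the BigQuery client):

```python
import time


def wait_for_job(jobs, project_id, job_id, poll_interval=1.0):
    """Poll jobs().get() until the job reaches the DONE state."""
    while True:
        job = jobs.get(projectId=project_id, jobId=job_id).execute()
        status = job["status"]
        if status["state"] == "DONE":
            # A DONE job may still have failed; surface the error if so.
            if status.get("errorResult"):
                raise RuntimeError("Job failed: %s" % status["errorResult"])
            return job
        time.sleep(poll_interval)
```

Here `jobs` would be `self.connector.jobs()`, and the job id comes from the insert response, e.g. `response['jobReference']['jobId']`.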

Hopefully that's what you're looking for. Let us know if you run into problems.
