2016-03-23

Join related table of a related table in SQLAlchemy

I have a table, dna_extraction_protocols, that holds data on DNA extraction protocols. Several of its keys are populated by Incubation objects, which are stored in the incubations table. Each incubation has a duration_unit key holding a MeasurementUnit object, which is stored in the measurement_units table.

These tables are created like so:

class DNAExtractionProtocol(Protocol): 
    __tablename__ = 'dna_extraction_protocols' 
    __mapper_args__ = {'polymorphic_identity': 'dna_extraction'} 
    id = Column(Integer, ForeignKey('protocols.id'), primary_key=True) 
    sample_mass = Column(Float) 
    mass_unit_id = Column(String, ForeignKey('measurement_units.id')) 
    mass_unit = relationship("MeasurementUnit", foreign_keys=[mass_unit_id]) 
    digestion_buffer_id = Column(String, ForeignKey("solutions.id")) 
    digestion_buffer = relationship("Solution", foreign_keys=[digestion_buffer_id]) 
    digestion_buffer_volume = Column(Float) 
    digestion_id = Column(Integer, ForeignKey("incubations.id")) 
    digestion = relationship("Incubation", foreign_keys=[digestion_id]) 
    lysis_buffer_id = Column(String, ForeignKey("solutions.id")) 
    lysis_buffer = relationship("Solution", foreign_keys=[lysis_buffer_id]) 
    lysis_buffer_volume = Column(Float) 
    lysis_id = Column(Integer, ForeignKey("incubations.id")) 
    lysis = relationship("Incubation", foreign_keys=[lysis_id]) 
    proteinase_id = Column(String, ForeignKey("solutions.id")) 
    proteinase = relationship("Solution", foreign_keys=[proteinase_id]) 
    proteinase_volume = Column(Float) 
    inactivation_id = Column(Integer, ForeignKey("incubations.id")) 
    inactivation = relationship("Incubation", foreign_keys=[inactivation_id]) 
    cooling_id = Column(Integer, ForeignKey("incubations.id")) 
    cooling = relationship("Incubation", foreign_keys=[cooling_id]) 
    centrifugation_id = Column(Integer, ForeignKey("incubations.id")) 
    centrifugation = relationship("Incubation", foreign_keys=[centrifugation_id]) 

    volume_unit_id = Column(String, ForeignKey('measurement_units.id')) 
    volume_unit = relationship("MeasurementUnit", foreign_keys=[volume_unit_id]) 

class Incubation(Base): 
    __tablename__ = "incubations" 
    id = Column(Integer, primary_key=True) 
    speed = Column(Float) 
    duration = Column(Float) 
    temperature = Column(Float) 
    movement = Column(String) # "centrifuge" or "shake" 

    #speed - usually in RPM - will refer to either centrifugation or shaking (See above) 
    speed_unit_id = Column(String, ForeignKey('measurement_units.id')) 
    speed_unit = relationship("MeasurementUnit", foreign_keys=[speed_unit_id]) 
    duration_unit_id = Column(String, ForeignKey('measurement_units.id')) 
    duration_unit = relationship("MeasurementUnit", foreign_keys=[duration_unit_id]) 
    temperature_unit_id = Column(String, ForeignKey('measurement_units.id')) 
    temperature_unit = relationship("MeasurementUnit", foreign_keys=[temperature_unit_id])

class MeasurementUnit(Base): 
    __tablename__ = "measurement_units" 
    id = Column(Integer, primary_key=True) 
    code = Column(String, unique=True) 
    long_name = Column(String) 
    siunitx = Column(String) 

Now I want to extract a pandas DataFrame containing all attributes of the DNAExtractionProtocol object, of a linked Incubation object, and of a linked MeasurementUnit object.

I have tried a number of approaches, and this one seems to work well for the first relationship:

sql_query = session.query(DNAExtractionProtocol, MeasurementUnit, Incubation) \ 
    .join(MeasurementUnit, MeasurementUnit.id == DNAExtractionProtocol.volume_unit_id) \ 
    .join(Incubation, Incubation.id == DNAExtractionProtocol.lysis_id) \ 
    .filter(tables[table].code == code) 

But what felt to me like a logical extension:

sql_query = session.query(DNAExtractionProtocol, MeasurementUnit, Incubation) \ 
    .join(MeasurementUnit, MeasurementUnit.id == DNAExtractionProtocol.volume_unit_id) \ 
    .join(Incubation, Incubation.id == DNAExtractionProtocol.lysis_id) \ 
    .join(MeasurementUnit, MeasurementUnit.id == Incubation.temperature_unit_id) \ 
    .filter(tables[table].code == code) 

fails with:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) ambiguous column name: measurement_units.id [SQL: u'SELECT protocols.type, dna_extraction_protocols.id, protocols.id, protocols.code, protocols.name, dna_extraction_protocols.sample_mass, dna_extraction_protocols.mass_unit_id, dna_extraction_protocols.digestion_buffer_id, dna_extraction_protocols.digestion_buffer_volume, dna_extraction_protocols.digestion_id, dna_extraction_protocols.lysis_buffer_id, dna_extraction_protocols.lysis_buffer_volume, dna_extraction_protocols.lysis_id, dna_extraction_protocols.proteinase_id, dna_extraction_protocols.proteinase_volume, dna_extraction_protocols.inactivation_id, dna_extraction_protocols.cooling_id, dna_extraction_protocols.centrifugation_id, dna_extraction_protocols.volume_unit_id, measurement_units.id, measurement_units.code, measurement_units.long_name, measurement_units.siunitx, incubations.id, incubations.speed, incubations.duration, incubations.temperature, incubations.movement, incubations.speed_unit_id, incubations.duration_unit_id, incubations.temperature_unit_id \nFROM protocols JOIN dna_extraction_protocols ON protocols.id = dna_extraction_protocols.id JOIN measurement_units ON measurement_units.id = dna_extraction_protocols.volume_unit_id JOIN incubations ON incubations.id = dna_extraction_protocols.lysis_id JOIN measurement_units ON measurement_units.id = incubations.temperature_unit_id \nWHERE protocols.code = ?'] [parameters: ('EPDqEP',)]

Any idea how else I can get what I am looking for?

Answer


The crux of the problem is that you are joining the same table twice. In SQL land, the way to solve this is to alias one of them:

SELECT * FROM protocols 
JOIN dna_extraction_protocols ON ... 
JOIN measurement_units ON ... 
JOIN incubations ON ... 
JOIN measurement_units AS incubation_measurement_units ON incubation_measurement_units.id = incubations.temperature_unit_id 
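
To see the aliasing at the SQL level in isolation, here is a minimal sketch using Python's built-in sqlite3 module. The three tables are stripped-down stand-ins for the ones in the question (only the columns needed for the joins, with made-up sample rows), so treat the schema and data as assumptions rather than the real database.

```python
import sqlite3

# In-memory database with a simplified, assumed version of the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurement_units (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE incubations (
        id INTEGER PRIMARY KEY,
        temperature REAL,
        temperature_unit_id INTEGER REFERENCES measurement_units(id)
    );
    CREATE TABLE dna_extraction_protocols (
        id INTEGER PRIMARY KEY,
        volume_unit_id INTEGER REFERENCES measurement_units(id),
        lysis_id INTEGER REFERENCES incubations(id)
    );
    INSERT INTO measurement_units VALUES (1, 'uL'), (2, 'degC');
    INSERT INTO incubations VALUES (1, 55.0, 2);
    INSERT INTO dna_extraction_protocols VALUES (1, 1, 1);
""")

# Joining measurement_units twice under the same name: SQLite cannot tell
# which copy "measurement_units.<column>" refers to.
ambiguous_sql = """
    SELECT measurement_units.code
    FROM dna_extraction_protocols
    JOIN measurement_units
      ON measurement_units.id = dna_extraction_protocols.volume_unit_id
    JOIN incubations
      ON incubations.id = dna_extraction_protocols.lysis_id
    JOIN measurement_units
      ON measurement_units.id = incubations.temperature_unit_id
"""
try:
    conn.execute(ambiguous_sql)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Giving the second copy its own alias removes the ambiguity.
aliased_sql = """
    SELECT measurement_units.code, incubation_measurement_units.code
    FROM dna_extraction_protocols
    JOIN measurement_units
      ON measurement_units.id = dna_extraction_protocols.volume_unit_id
    JOIN incubations
      ON incubations.id = dna_extraction_protocols.lysis_id
    JOIN measurement_units AS incubation_measurement_units
      ON incubation_measurement_units.id = incubations.temperature_unit_id
"""
rows = conn.execute(aliased_sql).fetchall()

print(error)
print(rows)
```

The first query fails with an "ambiguous column name" error, much like the traceback in the question. The second returns both unit codes, because each copy of measurement_units now has its own name.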

The same applies here:

sql_query = session.query(DNAExtractionProtocol, MeasurementUnit, Incubation) \ 
    .join(MeasurementUnit, ...) \ 
    .join(Incubation, ...) \ 
    .join(MeasurementUnit, ..., aliased=True) \ 
    .filter(tables[table].code == code) 

If you need to return columns from, or filter on, the aliased table, you will run into problems because you cannot disambiguate between the two. In that case, you need to join against an explicit aliased() construct.

from sqlalchemy.orm import aliased
IncubationMeasurementUnit = aliased(MeasurementUnit)
sql_query = session.query(DNAExtractionProtocol, MeasurementUnit, Incubation, IncubationMeasurementUnit) \ 
    .join(MeasurementUnit, ...) \ 
    .join(Incubation, ...) \ 
    .join(IncubationMeasurementUnit, ...) \ 
    .filter(tables[table].code == code) 
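
For a self-contained version of this pattern, the sketch below runs the four-entity query against an in-memory SQLite database. The models are simplified assumptions: Protocol is a hypothetical, flattened stand-in for DNAExtractionProtocol (no inheritance from protocols), only the columns needed for the joins are kept, and the sample rows are invented. It uses the explicit aliased() construct. As far as I am aware, the aliased=True flag on join() is deprecated as of SQLAlchemy 1.4 and gone in 2.0, so this is also the form that keeps working on newer releases.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import aliased, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4 and later
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()


class MeasurementUnit(Base):
    __tablename__ = "measurement_units"
    id = Column(Integer, primary_key=True)
    code = Column(String, unique=True)


class Incubation(Base):
    __tablename__ = "incubations"
    id = Column(Integer, primary_key=True)
    temperature = Column(Float)
    temperature_unit_id = Column(Integer, ForeignKey("measurement_units.id"))


class Protocol(Base):
    # Hypothetical, flattened stand-in for DNAExtractionProtocol.
    __tablename__ = "dna_extraction_protocols"
    id = Column(Integer, primary_key=True)
    code = Column(String)
    volume_unit_id = Column(Integer, ForeignKey("measurement_units.id"))
    lysis_id = Column(Integer, ForeignKey("incubations.id"))


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Invented sample data: one protocol, one incubation, two units.
session.add_all([
    MeasurementUnit(id=1, code="uL"),
    MeasurementUnit(id=2, code="degC"),
    Incubation(id=1, temperature=55.0, temperature_unit_id=2),
    Protocol(id=1, code="EPDqEP", volume_unit_id=1, lysis_id=1),
])
session.commit()

# Second, independently addressable reference to measurement_units.
IncubationMeasurementUnit = aliased(MeasurementUnit)

rows = (
    session.query(Protocol, MeasurementUnit, Incubation, IncubationMeasurementUnit)
    .join(MeasurementUnit, MeasurementUnit.id == Protocol.volume_unit_id)
    .join(Incubation, Incubation.id == Protocol.lysis_id)
    .join(
        IncubationMeasurementUnit,
        IncubationMeasurementUnit.id == Incubation.temperature_unit_id,
    )
    .filter(Protocol.code == "EPDqEP")
    .all()
)

protocol, volume_unit, incubation, temperature_unit = rows[0]
print(volume_unit.code, temperature_unit.code)
```

Each result row unpacks into four objects, so the volume unit and the incubation's temperature unit can be read (or filtered on) separately through MeasurementUnit and IncubationMeasurementUnit.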

I am trying to use this [here](https://github.com/TheChymera/medaba/blob/ef69d57dbafa34a4d83402acc5aaba6f1a043c89/src/dbdata/fmri.py#L41-L46), but it does not fetch the columns from the "measurement_units" entry that is linked through the "incubations" entry. In fact, I get the same number of columns with or without the last join. Any idea what is going on? – TheChymera


@TheChymera You did not copy the example exactly. You are missing "IncubationMeasurementUnit" in the list of queried entities. – univerio
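
A quick way to check this is to compare the SQL that the two variants compile to. The sketch below uses two cut-down models and a named alias, temperature_unit. Both are assumptions for illustration, not the code from the linked repository. The helper select_list is hypothetical as well; it simply splits the compiled query string.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import aliased, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4 and later
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()


class MeasurementUnit(Base):
    __tablename__ = "measurement_units"
    id = Column(Integer, primary_key=True)
    code = Column(String)


class Incubation(Base):
    __tablename__ = "incubations"
    id = Column(Integer, primary_key=True)
    temperature_unit_id = Column(Integer, ForeignKey("measurement_units.id"))


# The queries are only compiled, never executed, so no tables are created.
session = sessionmaker(bind=create_engine("sqlite://"))()

# Named alias so the compiled SQL is easy to read.
TemperatureUnit = aliased(MeasurementUnit, name="temperature_unit")
on_clause = TemperatureUnit.id == Incubation.temperature_unit_id

# The alias is joined but NOT listed in query(): its columns are not selected.
without_entity = session.query(Incubation).join(TemperatureUnit, on_clause)

# The alias is joined AND listed in query(): its columns are selected too.
with_entity = session.query(Incubation, TemperatureUnit).join(
    TemperatureUnit, on_clause
)


def select_list(query):
    """Return the column expressions between SELECT and FROM (hypothetical helper)."""
    head = str(query).split("FROM", 1)[0].replace("SELECT", "", 1)
    return [col.strip() for col in head.split(",")]


cols_without = select_list(without_entity)
cols_with = select_list(with_entity)

print(cols_without)
print(cols_with)
```

Both statements contain the join, but only the second select list includes the temperature_unit columns, which matches the "same number of columns with or without the last join" observation.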