Find differences between two tables, including duplicates

I have a query that finds differences between two tables, but I also want duplicate rows to be reported as a difference. I know that the table actual_orders contains duplicates and that my table expected_orders does not. How can I change my query so that duplicates show up as a difference too, and not only differences in the actual data?

This is my query:

 select 
    expected_orders.mk_file_id,actual_orders.mk_file_id, 
    expected_orders.ind_id, actual_orders.ind_id, 
    expected_orders.mk_cust_id,actual_orders.mk_cust_id, 
    expected_orders.order_sk,actual_orders.order_sk, 
    expected_orders.progen_order_id,actual_orders.progen_order_id, 
    expected_orders.order_chanel_id,actual_orders.order_chanel_id, 
    expected_orders.order_date_str,actual_orders.order_date_str, 
    expected_orders.order_total_usd,actual_orders.order_total_usd, 
    expected_orders.order_ship_usd,actual_orders.order_ship_usd, 
    expected_orders.order_discount_usd,actual_orders.order_discount_usd, 
    expected_orders.order_tax_usd,actual_orders.order_tax_usd, 
    expected_orders.empty_source_code,actual_orders.empty_source_code, 
    expected_orders.method_of_payment_code,actual_orders.method_of_payment_code, 
    expected_orders.feed_id,actual_orders.feed_id, 
    expected_orders.creation_date_str,actual_orders.creation_date_str, 
    expected_orders.update_ts_str,actual_orders.update_ts_str, 
    expected_orders.empty_match_type,actual_orders.empty_match_type, 
    expected_orders.mp_id,actual_orders.mp_id 
    from default.expected_orders 
    FULL OUTER JOIN default.actual_orders 
    ON (
     COALESCE(expected_orders.mk_file_id,-1)=COALESCE(actual_orders.mk_file_id,-1) AND 
     COALESCE(expected_orders.ind_id,-1)=COALESCE(actual_orders.ind_id,-1) AND 
     COALESCE(expected_orders.mk_cust_id,'-1')=COALESCE(actual_orders.mk_cust_id,'-1') AND 
     COALESCE(expected_orders.order_sk,-1)=COALESCE(actual_orders.order_sk,-1) 
    )
    where (
    COALESCE(expected_orders.mk_file_id,-1)<>COALESCE(actual_orders.mk_file_id,-1) OR 
    COALESCE(expected_orders.ind_id,-1)<>COALESCE(actual_orders.ind_id,-1) OR 
    COALESCE(expected_orders.mk_cust_id,'-1')<>COALESCE(actual_orders.mk_cust_id,'-1') OR 
    COALESCE(expected_orders.order_sk,-1)<>COALESCE(actual_orders.order_sk,-1) OR 
    COALESCE(expected_orders.progen_order_id,'-1')<>COALESCE(actual_orders.progen_order_id,'-1') OR 
    COALESCE(expected_orders.order_chanel_id,-1)<>COALESCE(actual_orders.order_chanel_id,-1) OR 
    COALESCE(expected_orders.order_date_str,'-1')<>COALESCE(actual_orders.order_date_str,'-1') OR 
    COALESCE(expected_orders.order_total_usd,0.0)<>COALESCE(actual_orders.order_total_usd,0.0) OR 
    COALESCE(expected_orders.order_ship_usd,0.0)<>COALESCE(actual_orders.order_ship_usd,0.0) OR 
    COALESCE(expected_orders.order_discount_usd,0.0)<>COALESCE(actual_orders.order_discount_usd,0.0) OR 
    COALESCE(expected_orders.order_tax_usd,0.0)<>COALESCE(actual_orders.order_tax_usd,0.0) OR 
    COALESCE(expected_orders.empty_source_code,'-1')<>COALESCE(actual_orders.empty_source_code,'-1') OR 
    COALESCE(expected_orders.method_of_payment_code,'-1')<>COALESCE(actual_orders.method_of_payment_code,'-1') OR 
    COALESCE(expected_orders.feed_id,-1)<>COALESCE(actual_orders.feed_id,-1) OR 
    COALESCE(expected_orders.creation_date_str,'-1')<>COALESCE(actual_orders.creation_date_str,'-1') OR 
    COALESCE(expected_orders.update_ts_str,'-1')<>COALESCE(actual_orders.update_ts_str,'-1') OR 
    COALESCE(expected_orders.empty_match_type,'-1')<>COALESCE(actual_orders.empty_match_type,'-1') OR 
    COALESCE(expected_orders.mp_id,-1)<>COALESCE(actual_orders.mp_id,-1))

I'm using Hive, but I'll also add other tags such as SQL and progress. Any help would be really appreciated.

Source: 2017-06-23, danilo

For equality that also treats NULL as equal to NULL, use '<=>'.
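To illustrate the comment above, here is a minimal sketch of the question's comparison rewritten with Hive's null-safe equality operator instead of COALESCE sentinels. Only one non-key column (order_total_usd) is compared to keep it short, the aliases e and a are made up for the example, and it assumes order_sk is never NULL in a real row, so a one-sided row fails the order_sk check. Whether '<=>' is accepted inside the ON clause may depend on the Hive version; if it is not, keeping COALESCE in ON and using '<=>' only in WHERE should work the same way.

```sql
-- Sketch only: <=> is TRUE when both sides are NULL and FALSE when exactly one is,
-- so no sentinel values such as -1 or '-1' are needed.
SELECT e.mk_file_id, a.mk_file_id,
       e.ind_id,     a.ind_id,
       e.mk_cust_id, a.mk_cust_id,
       e.order_sk,   a.order_sk,
       e.order_total_usd, a.order_total_usd
FROM default.expected_orders e
FULL OUTER JOIN default.actual_orders a
  ON  e.mk_file_id <=> a.mk_file_id
  AND e.ind_id     <=> a.ind_id
  AND e.mk_cust_id <=> a.mk_cust_id
  AND e.order_sk   <=> a.order_sk
WHERE NOT (e.order_sk <=> a.order_sk)                 -- row exists on one side only
   OR NOT (e.order_total_usd <=> a.order_total_usd);  -- row exists on both sides, values differ
```

The remaining columns from the original WHERE clause would be added as further `OR NOT (e.col <=> a.col)` terms. Note that this alone still does not flag duplicates: two identical actual rows both match the single expected row.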

Add a count(*) column to calculate the number of duplicates, and compare that as well. – leftjoin
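One way to read this suggestion, sketched under assumptions: collapse each table to one row per distinct record with its count, then join the two and compare the counts. The CTE names e and a and the column name cnt are invented for the example, and only order_total_usd is shown next to the key columns; a real version would list every column in the SELECT, GROUP BY and ON clauses.

```sql
-- Sketch only: count identical rows per table, then compare the counts.
WITH e AS (
  SELECT mk_file_id, ind_id, mk_cust_id, order_sk, order_total_usd,
         COUNT(*) AS cnt
  FROM default.expected_orders
  GROUP BY mk_file_id, ind_id, mk_cust_id, order_sk, order_total_usd
),
a AS (
  SELECT mk_file_id, ind_id, mk_cust_id, order_sk, order_total_usd,
         COUNT(*) AS cnt
  FROM default.actual_orders
  GROUP BY mk_file_id, ind_id, mk_cust_id, order_sk, order_total_usd
)
SELECT COALESCE(e.mk_file_id, a.mk_file_id)           AS mk_file_id,
       COALESCE(e.ind_id, a.ind_id)                   AS ind_id,
       COALESCE(e.mk_cust_id, a.mk_cust_id)           AS mk_cust_id,
       COALESCE(e.order_sk, a.order_sk)               AS order_sk,
       COALESCE(e.order_total_usd, a.order_total_usd) AS order_total_usd,
       COALESCE(e.cnt, 0)                             AS expected_cnt,
       COALESCE(a.cnt, 0)                             AS actual_cnt
FROM e
FULL OUTER JOIN a
  ON  e.mk_file_id      <=> a.mk_file_id       -- <=> also matches NULL with NULL
  AND e.ind_id          <=> a.ind_id
  AND e.mk_cust_id      <=> a.mk_cust_id
  AND e.order_sk        <=> a.order_sk
  AND e.order_total_usd <=> a.order_total_usd
WHERE COALESCE(e.cnt, 0) <> COALESCE(a.cnt, 0);  -- missing, extra, or duplicated records
```

A record that appears once in expected_orders and twice in actual_orders comes back with expected_cnt = 1 and actual_cnt = 2, while a record whose data differs shows up as two result rows, one with actual_cnt = 0 and one with expected_cnt = 0.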

Start with a high-level summary:

select  total_rows 
      ,expected_rows 
      ,actual_rows 
      ,record_variations 
      ,count (*)    as number_of_keys 

from  (select  count (*)        as total_rows 
         ,count (case when tab = 'E' then 1 end) as expected_rows 
         ,count (case when tab = 'A' then 1 end) as actual_rows 
         ,count (distinct rec)     as record_variations 

      from  (   select 'E' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from expected_orders 
         union all select 'A' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from actual_orders 
         ) t 

      group by mk_file_id 
         ,ind_id  
         ,mk_cust_id 
         ,order_sk 
      ) t 

group by total_rows 
      ,expected_rows 
      ,actual_rows 
      ,record_variations 
;

And then:

select  mk_file_id 
      ,ind_id  
      ,mk_cust_id 
      ,order_sk 

      ,count (*)        as total_rows 
      ,count (case when tab = 'E' then 1 end) as expected_rows 
      ,count (case when tab = 'A' then 1 end) as actual_rows 
      ,count (distinct rec)     as record_variations 


from  (   select 'E' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from expected_orders 
      union all select 'A' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from actual_orders 
      ) t 

group by mk_file_id 
      ,ind_id  
      ,mk_cust_id 
      ,order_sk 

-- having ... 
;
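The answer leaves the having condition open. As one possible way to fill it in (an assumption, not the answerer's own filter), the sketch below keeps only keys where the two tables disagree: either the row counts differ, which covers missing rows and duplicates, or more than one distinct record exists for the key, which covers differing data.

```sql
-- Sketch only: the HAVING condition is a hypothetical example of a drill-down filter.
select  mk_file_id
      ,ind_id
      ,mk_cust_id
      ,order_sk
      ,count (*)                              as total_rows
      ,count (case when tab = 'E' then 1 end) as expected_rows
      ,count (case when tab = 'A' then 1 end) as actual_rows
      ,count (distinct rec)                   as record_variations

from  (   select 'E' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from expected_orders
      union all select 'A' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from actual_orders
      ) t

group by mk_file_id
      ,ind_id
      ,mk_cust_id
      ,order_sk

-- keep only keys whose row counts differ or whose records are not all identical
having count (case when tab = 'E' then 1 end) <> count (case when tab = 'A' then 1 end)
    or count (distinct rec) > 1
;
```

With this filter, a key that has one expected row and two identical actual rows is returned with expected_rows = 1, actual_rows = 2 and record_variations = 1, which is the duplicate case the question asks about.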

Drill down from there.

Source: 2017-06-23 15:03:44

Hi, did you look at this suggestion?
