2017-05-25

Loading nested XML data into Hive using SerDe

I am trying to load nested XML data into Hive. The sample data is as follows:

<CustomerOrders> 
    <Customers> 
    <CustID>ALFKI</CustID> 
    <Orders> 
     <OrderID>10643</OrderID> 
     <CustomerID>ALFKI</CustomerID> 
     <OrderDate>1997-08-25</OrderDate> 
    </Orders> 
    <Orders> 
     <OrderID>10692</OrderID> 
     <CustomerID>ALFKI</CustomerID> 
     <OrderDate>1997-10-03</OrderDate> 
    </Orders> 
    <CompanyName>Alfreds Futterkiste</CompanyName> 
    </Customers> 
    <Customers> 
    <CustID>ANATR</CustID> 
    <Orders> 
     <OrderID>10308</OrderID> 
     <CustomerID>ANATR</CustomerID> 
     <OrderDate>1996-09-18</OrderDate> 
    </Orders> 
    <CompanyName>Ana Trujillo Emparedados y helados</CompanyName> 
    </Customers> 
</CustomerOrders> 

Below is the command I am using:

CREATE TABLE CUSTOMERORDERS(
      CustID STRING, 
      Orders ARRAY<STRUCT<OrderID:STRING,CustomerID:STRING,OrderDate:STRING>>, 
      CompanyName STRING) 
      ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
      WITH SERDEPROPERTIES (
      "column.xpath.CustID"="/Customers/CustID/text()", 
      "column.xpath.Orders"="/Customers/Orders", 
      "column.xpath.OrderID"="/Customers/Orders/OrderID", 
      "column.xpath.CustomerID"="/Customers/Orders/CustomerID", 
      "column.xpath.OrderDate"="/Customers/Orders/OrderDate", 
      "column.xpath.CompanyName"="/Customers/CompanyName/text()") 
      STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
      TBLPROPERTIES ("xmlinput.start"="<Customers>","xmlinput.end"= "</Customers>"); 

The output I am getting is:

hive> select * from customerorders; 
OK 
ALFKI [{"orderid":null,"customerid":null,"orderdate":null},{"orderid":null,"customerid":null,"orderdate":null}]  Alfreds Futterkiste 
ANATR [{"orderid":null,"customerid":null,"orderdate":null}] Ana Trujillo Emparedados y helados 
Time taken: 0.039 seconds, Fetched: 2 row(s) 

I am getting null values for OrderID, CustomerID, and OrderDate. Can anyone help me resolve this issue?

Thanks


I believe I should not configure 'OrderID', 'CustomerID', and 'OrderDate' in 'SERDEPROPERTIES', because they are not table columns, so I removed them. I also tried '/text()' for 'Orders'; in that case I get NULL: 'hive> select * from customerorders; OK ALFKI NULL Alfreds Futterkiste ANATR NULL Ana Trujillo Emparedados y helados Time taken: 0.037 seconds, Fetched: 2 row(s)'

Answer

create external table customerorders 
(
    custid  string 
    ,orders  array<struct<Orders:struct<OrderID:string,CustomerID:string,OrderDate:string>>> 
    ,companyname string 
) 
row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
with serdeproperties 
(
    "column.xpath.CustID"  = "/Customers/CustID/text()" 
    ,"column.xpath.Orders"  = "/Customers/Orders" 
    ,"column.xpath.CompanyName" = "/Customers/CompanyName/text()" 
) 

stored as 
inputformat  'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
outputformat 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
tblproperties 
(
    "xmlinput.start" = "<Customers>" 
    ,"xmlinput.end"  = "</Customers>" 
); 
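
Before the table can be queried, the SerDe classes have to be on Hive's classpath and the XML file has to sit under the table. A minimal sketch of those two steps follows; both paths are hypothetical placeholders and are not taken from the question, so adjust them to your environment:

```sql
-- Register the XML SerDe jar for the current session.
-- The jar path is a hypothetical placeholder.
ADD JAR /path/to/hivexmlserde.jar;

-- Copy the sample XML file from the local file system into the table's directory.
-- The file path is a hypothetical placeholder.
LOAD DATA LOCAL INPATH '/path/to/customerorders.xml' INTO TABLE customerorders;
```

If the file already lives in HDFS, adding a LOCATION clause to the create external table statement is an alternative to LOAD DATA.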

-

select * from customerorders 
; 

-

+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+
| custid | orders                                                                                                                                                      | companyname                        |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+
| ALFKI  | [{"orders":{"orderid":"10643","customerid":"ALFKI","orderdate":"1997-08-25"}},{"orders":{"orderid":"10692","customerid":"ALFKI","orderdate":"1997-10-03"}}] | Alfreds Futterkiste                |
| ANATR  | [{"orders":{"orderid":"10308","customerid":"ANATR","orderdate":"1996-09-18"}}]                                                                              | Ana Trujillo Emparedados y helados |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+
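
Judging from the output above, each matched <Orders> element comes back as a struct with a single field named after the element, which would explain why the original flat struct<OrderID,CustomerID,OrderDate> produced only nulls. If one row per order is needed, the array can be flattened with LATERAL VIEW explode. This is a sketch against the table defined in this answer; the aliases c, t, and o are arbitrary names chosen for illustration:

```sql
-- Flatten the orders array: one output row per <Orders> element.
-- explode() yields one struct<orders:struct<...>> per array entry,
-- so the inner fields are reached through o.orders.
select
    c.custid
    ,o.orders.orderid    as orderid
    ,o.orders.customerid as customerid
    ,o.orders.orderdate  as orderdate
    ,c.companyname
from customerorders c
lateral view explode(c.orders) t as o
;
```

With the sample data this should return three rows, two for ALFKI and one for ANATR. Customers without any <Orders> element would be dropped; LATERAL VIEW OUTER keeps them with null order fields.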

Thank you for the solution.
