2017-11-24 3 views

Pandas: reading nested JSON with NaN entries

I am trying to read a JSON document that contains nested dictionaries by following this pandas tutorial. The problem is that some of my nested lists/dictionaries are NaN, so when I call the normalize function I get a KeyError, because the key only exists for certain elements at the top level of the data.

Here is my data:

q 
Out[235]: 
[{u'Code': u'GE',
  u'datetime': u'2011-11-14T19:30:03-05:00[US/Eastern]'},
 {u'Code': u'PP',
  u'datetime': u'2012-21-14T18:50-05:00[US/Eastern]'},
 {u'Code': u'IO',
  u'Summary': [{u'prod': u'book',
                u'num': 81.04,
                u'devil': 17},
               {u'prod': u'game',
                u'num': 191.5,
                u'devil': 10},
               {u'prod': u'desk',
                u'num': 55.5,
                u'devil': -6},
               {u'angel': u'ipo',
                u'num': 503.0,
                u'devil': 1}],
  u'datetime': u'2013-10-14T16:30-05:00[US/Eastern]'},
 {u'Code': u'BI',
  u'datetime': u'2014-11-14T12:30-05:00[US/Eastern]'},
 {u'Code': u'EZ',
  u'datetime': u'2015-12-14T10:00-05:00[US/Eastern]'},
 {u'Code': u'JC',
  u'datetime': u'2016-10-14T08:30:01-05:00[US/Eastern]'},
 {u'Code': u'WX',
  u'Summary': [{u'angel': u'yut',
                u'num': 0,
                u'prod': u'read',
                u'devil': 0.0},
               {u'angel': u'fgf',
                u'prod': u'fart',
                u'devil': 0.0},
               {u'prod': u'red',
                u'num': 673,
                u'angel': u'deft',
                u'devil': 0},
               {u'devil': 0,
                u'prod': u'dog'},
               {u'angel': u'hut',
                u'devil': 99}],
  u'datetime': u'2017-10-13T05:00:02-05:00[US/Eastern]'}]

I can more or less view it in a DataFrame like this:

pd.DataFrame(q)
Out[229]:
  Code                                     Summary                               datetime
0   GE                                         NaN  2011-11-11T19:30:03-05:00[US/Eastern]
1   PP                                         NaN     2012-12-25T18:50-05:00[US/Eastern]
2   IO        [{u'prod': u'book', u'angel': u'I...     2013-11-04T16:30-05:00[US/Eastern]
3   BI                                         NaN  2014-12-14T08:30:01-05:00[US/Eastern]
4   JC                                         NaN     2016-11-14T04:30-05:00[US/Eastern]
5   WX  [{u'prod': u'orange', u'devil': -2, u's...  2017-10-13T03:30:08-05:00[US/Eastern]

As mentioned, running pd.io.json.json_normalize(q, 'Summary', ['Code', 'datetime']) results in KeyError: 'Summary'.
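
A minimal sketch that reproduces the failure on a cut-down copy of the data. It assumes a current pandas, where the function is available as pd.json_normalize; the names sample and error are only for illustration:

```python
import pandas as pd

# Cut-down copy of the data: only the second record has a 'Summary' list.
sample = [
    {"Code": "GE", "datetime": "2011-11-14T19:30:03-05:00[US/Eastern]"},
    {"Code": "IO",
     "Summary": [{"prod": "book", "num": 81.04, "devil": 17}],
     "datetime": "2013-10-14T16:30-05:00[US/Eastern]"},
]

error = None
try:
    # 'Summary' is the record path; 'Code' and 'datetime' are carried as metadata.
    pd.json_normalize(sample, "Summary", ["Code", "datetime"])
except KeyError as exc:
    # The first record has no 'Summary' key, so the record path cannot be followed.
    error = exc

print(type(error).__name__, error)
```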

How can I get around this? Ideally I would like NaN cell values for the records where 'Summary' does not exist.

@MaxU Sorry, I just noticed the typos and edited the post. Let me know if there are any problems with my sample data. – guy

@MaxU Do you see it now? – guy

Yep, it looks better now ;-) Could you please post your desired data set? – MaxU

Answer

IIUC, you can normalize only the records that contain 'Summary' and then append the remaining ones:

In [94]: (json_normalize([x for x in q if x.get('Summary')],
    ...:                 'Summary',
    ...:                 ['Code', 'datetime'])
    ...:  .append(pd.DataFrame([x for x in q if not x.get('Summary')])))
    ...:
Out[94]:
   Code  angel                               datetime  devil     num  prod
0    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   17.0   81.04  book
1    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   10.0  191.50  game
2    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   -6.0   55.50  desk
3    IO    ipo     2013-10-14T16:30-05:00[US/Eastern]    1.0  503.00   NaN
4    WX    yut  2017-10-13T05:00:02-05:00[US/Eastern]    0.0    0.00  read
5    WX    fgf  2017-10-13T05:00:02-05:00[US/Eastern]    0.0     NaN  fart
6    WX   deft  2017-10-13T05:00:02-05:00[US/Eastern]    0.0  673.00   red
7    WX    NaN  2017-10-13T05:00:02-05:00[US/Eastern]    0.0     NaN   dog
8    WX    hut  2017-10-13T05:00:02-05:00[US/Eastern]   99.0     NaN   NaN
0    GE    NaN  2011-11-14T19:30:03-05:00[US/Eastern]    NaN     NaN   NaN
1    PP    NaN     2012-21-14T18:50-05:00[US/Eastern]    NaN     NaN   NaN
2    BI    NaN     2014-11-14T12:30-05:00[US/Eastern]    NaN     NaN   NaN
3    EZ    NaN     2015-12-14T10:00-05:00[US/Eastern]    NaN     NaN   NaN
4    JC    NaN  2016-10-14T08:30:01-05:00[US/Eastern]    NaN     NaN   NaN

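or with pd.concat() (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this is the variant to prefer on current versions):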

In [95]: pd.concat([json_normalize([x for x in q if x.get('Summary')],
    ...:                           'Summary',
    ...:                           ['Code', 'datetime']),
    ...:            pd.DataFrame([x for x in q if not x.get('Summary')])],
    ...:           ignore_index=True)
    ...:
Out[95]:
    Code  angel                               datetime  devil     num  prod
 0    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   17.0   81.04  book
 1    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   10.0  191.50  game
 2    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   -6.0   55.50  desk
 3    IO    ipo     2013-10-14T16:30-05:00[US/Eastern]    1.0  503.00   NaN
 4    WX    yut  2017-10-13T05:00:02-05:00[US/Eastern]    0.0    0.00  read
 5    WX    fgf  2017-10-13T05:00:02-05:00[US/Eastern]    0.0     NaN  fart
 6    WX   deft  2017-10-13T05:00:02-05:00[US/Eastern]    0.0  673.00   red
 7    WX    NaN  2017-10-13T05:00:02-05:00[US/Eastern]    0.0     NaN   dog
 8    WX    hut  2017-10-13T05:00:02-05:00[US/Eastern]   99.0     NaN   NaN
 9    GE    NaN  2011-11-14T19:30:03-05:00[US/Eastern]    NaN     NaN   NaN
10    PP    NaN     2012-21-14T18:50-05:00[US/Eastern]    NaN     NaN   NaN
11    BI    NaN     2014-11-14T12:30-05:00[US/Eastern]    NaN     NaN   NaN
12    EZ    NaN     2015-12-14T10:00-05:00[US/Eastern]    NaN     NaN   NaN
13    JC    NaN  2016-10-14T08:30:01-05:00[US/Eastern]    NaN     NaN   NaN
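
For readers on a current pandas, the same split-and-recombine idea can be sketched as a small self-contained script. It assumes pd.json_normalize (available at the top level since pandas 1.0) and uses a shortened copy of the question's data; the helper name normalize_keep_missing is hypothetical and not part of pandas:

```python
import pandas as pd

# Shortened copy of the question's data: only 'IO' has a 'Summary' list.
q = [
    {"Code": "GE", "datetime": "2011-11-14T19:30:03-05:00[US/Eastern]"},
    {"Code": "IO",
     "Summary": [{"prod": "book", "num": 81.04, "devil": 17},
                 {"angel": "ipo", "num": 503.0, "devil": 1}],
     "datetime": "2013-10-14T16:30-05:00[US/Eastern]"},
    {"Code": "BI", "datetime": "2014-11-14T12:30-05:00[US/Eastern]"},
]


def normalize_keep_missing(records, record_path, meta):
    """Flatten records[record_path]; keep records lacking it as NaN-filled rows."""
    # Records whose record_path is present and non-empty can be normalized safely.
    with_path = [r for r in records if r.get(record_path)]
    # Everything else (key missing, None, or empty list) is kept as one row each.
    without_path = [r for r in records if not r.get(record_path)]

    flat = pd.json_normalize(with_path, record_path, meta)
    # Restrict to the meta columns so no stray record_path column is created.
    rest = pd.DataFrame(without_path, columns=meta)

    # Columns that exist only in `flat` become NaN for the rows from `rest`.
    return pd.concat([flat, rest], ignore_index=True)


df = normalize_keep_missing(q, "Summary", ["Code", "datetime"])
print(df)
```

As in the output above, rows from records without 'Summary' end up after the flattened ones; sort afterwards if the original order matters.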