2017-03-22 1 views

How can I overcome the following problem when parsing a JSON file?

Hello, I have the following JSON:

j = """[ 
    [ 
     { 
      "created": "2017-02-02T11:57:41+0000", 
      "from": "Bank", 
      "message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks." 
     }, 
     { 
      "created": "2017-02-01T22:19:58+0000" , 
      "from": "Alex ", 
      "message": "Could someone please help me?, I am callig to CC and they don't answer" 
     }, 
     { 
      "created": "2017-02-01T22:19:42+0000", 
      "from": "Alex ", 
      "message": "the sms with the corresponding key and token has not arrived" 
     }, 
     { 
      "created": "2017-02-01T22:19:28+0000", 
      "from": "Alex ", 
      "message": "I have issues to make payments from the app" 
     }, 
     { 
      "created": "2017-02-01T22:19:18+0000", 
      "from": "Alex ", 
      "message": "Good afternoon" 
     } 
    ], 
    [ 
     { 
      "created": "2017-02-01T22:19:12+0000", 
      "from": "Bank", 
      "message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459" 
     }, 
     {    
      "created": "2017-02-01T16:22:30+0000", 
      "from": "Alex", 
      "message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot" 
     } 

    ] 


]""" 

Since I need a specific structure, I tried to parse it as follows:

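import json
import numpy as np
import pandas as pd

js = json.loads(j)
# One DataFrame per conversation, keyed by its position in the outer list
df = pd.concat({i: pd.DataFrame(conv) for i, conv in enumerate(js)})

df['created'] = pd.to_datetime(df['created'])

# Label each row as Answer (from the bank) or Question, then pivot on that label
df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')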

Everything works up to this point, but when I add another entry with a repeated date, I get the following error:

--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-5-5652e92adbdc> in <module>() 
    69 df['from'] = df['from'].str.strip() 
    70 df = df.drop_duplicates() 
---> 71 df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')) .set_index(['created', 'qna']) .unstack() 
    72 
    73 

/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in unstack(self, level, fill_value) 
    4034   """ 
    4035   from pandas.core.reshape import unstack 
-> 4036   return unstack(self, level, fill_value) 
    4037 
    4038  # ---------------------------------------------------------------------- 

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in unstack(obj, level, fill_value) 
    406  if isinstance(obj, DataFrame): 
    407   if isinstance(obj.index, MultiIndex): 
--> 408    return _unstack_frame(obj, level, fill_value=fill_value) 
    409   else: 
    410    return obj.T.stack(dropna=False) 

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in _unstack_frame(obj, level, fill_value) 
    449   unstacker = _Unstacker(obj.values, obj.index, level=level, 
    450        value_columns=obj.columns, 
--> 451        fill_value=fill_value) 
    452   return unstacker.get_result() 
    453 

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in __init__(self, values, index, level, value_columns, fill_value) 
    101 
    102   self._make_sorted_values_labels() 
--> 103   self._make_selectors() 
    104 
    105  def _make_sorted_values_labels(self): 

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in _make_selectors(self) 
    139 
    140   if mask.sum() < len(self.index): 
--> 141    raise ValueError('Index contains duplicate entries, ' 
    142        'cannot reshape') 
    143 

ValueError: Index contains duplicate entries, cannot reshape 
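
For context, this ValueError is what `unstack` raises when two rows share the same combination of index values. Here is a minimal sketch that reproduces it and lists the offending rows. The toy data below (including the sender "Omar" at a repeated timestamp) is made up for illustration and is not taken from the JSON in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two 'Question' rows share the same timestamp.
toy = pd.DataFrame({
    "created": pd.to_datetime([
        "2017-02-01T22:19:18+0000",
        "2017-02-01T22:19:18+0000",
        "2017-02-01T22:19:12+0000",
    ]),
    "from": ["Alex", "Omar", "Bank"],
    "message": ["Good afternoon", "Hello", "The money is available"],
})
toy["qna"] = np.where(toy["from"] == "Bank", "Answer", "Question")

# Rows whose (created, qna) pair occurs more than once block the reshape.
dupes = toy[toy.duplicated(subset=["created", "qna"], keep=False)]
print(dupes)

# Unstacking with a non-unique index raises the ValueError from the traceback.
try:
    toy.set_index(["created", "qna"]).unstack(fill_value="")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Running the `duplicated` check on the real DataFrame before calling `unstack` should show which rows collide, if any.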

I tried with this new JSON, but it fails because of the date, so I would appreciate some support to get past this:

This is the JSON that fails:

j = """[ 
    [ 
     { 
      "created": "2017-02-02T11:57:41+0000", 
      "from": "Bank", 
      "message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks." 
     }, 
     { 
      "created": "2017-02-01T22:19:58+0000" , 
      "from": "Alex ", 
      "message": "Could someone please help me?, I am callig to CC and they don't answer" 
     }, 
     { 
      "created": "2017-02-01T22:19:42+0000", 
      "from": "Alex ", 
      "message": "the sms with the corresponding key and token has not arrived" 
     }, 
     { 
      "created": "2017-02-01T22:19:28+0000", 
      "from": "Alex ", 
      "message": "I have issues to make payments from the app" 
     }, 
     { 
      "created": "2017-02-01T22:19:18+0000", 
      "from": "Alex ", 
      "message": "Good afternoon" 
     } 
    ], 
    [ 
     { 
      "created": "2017-02-01T22:19:12+0000", 
      "from": "Bank", 
      "message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459" 
     }, 
     {    
      "created": "2017-02-01T16:22:30+0000", 
      "from": "Alex", 
      "message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot" 
     } 

    ], 
    [ 
     { 
      "created": "2017-02-01T22:19:13+0000", 
      "from": "Bank", 
      "message": " Hello Adolfo, the money is available." 
     }, 
     {    
      "created": "2017-02-01T16:22:33+0000", 
      "from": "Omar", 
      "message": "hello they have deposited the money into my account." 
     } 

    ] 



]""" 

Just edited my answer. There is no need for append=True. The problem was that the assign statement needed its own line.

Answer


Looks like you need to separate out the assign statement. No need for append=True.

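import json
import numpy as np
import pandas as pd

js = json.loads(j)
df = pd.concat([pd.DataFrame(conv) for conv in js], ignore_index=True)
df['from'] = df['from'].str.strip()  # "Alex " and "Alex" become the same sender
df['created'] = pd.to_datetime(df['created'])
# Build the Answer/Question label as its own column before reshaping
df['qna'] = np.where(df['from'] == 'Bank', 'Answer', 'Question')
df1 = df.set_index(['created', 'qna']).unstack(fill_value='')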

with pd.option_context('display.max_colwidth', 30, 'display.expand_frame_repr', False): 
    print(df1) 

Output

                       from                                 message
qna                  Answer Question                         Answer                       Question
created
2017-02-01 16:22:30             Alex                                 hello they have deposited ...
2017-02-01 22:19:12    Bank            Hello Alexander, the mone...
2017-02-01 22:19:18             Alex                                                Good afternoon
2017-02-01 22:19:28             Alex                                 I have issues to make paym...
2017-02-01 22:19:42             Alex                                 the sms with the correspon...
2017-02-01 22:19:58             Alex                                 Could someone please help ...
2017-02-02 11:57:41    Bank           Hi Alex, if you have not p...
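
If repeated (created, qna) pairs are legitimate and should be kept, one possible workaround (not part of the answer above, just a sketch on made-up toy data) is to add a running counter per pair with `groupby(...).cumcount()`, so that every index entry is unique before unstacking. The column name `n` and the toy rows are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two 'Question' rows share the same timestamp.
toy = pd.DataFrame({
    "created": pd.to_datetime([
        "2017-02-01T22:19:18+0000",
        "2017-02-01T22:19:18+0000",
        "2017-02-01T22:19:12+0000",
    ]),
    "from": ["Alex", "Omar", "Bank"],
    "message": ["Good afternoon", "Hello", "The money is available"],
})
toy["qna"] = np.where(toy["from"] == "Bank", "Answer", "Question")

# Number repeated (created, qna) pairs 0, 1, 2, ... within each group.
toy["n"] = toy.groupby(["created", "qna"]).cumcount()

# With the counter in the index, each (created, n, qna) entry is unique,
# so unstacking the qna level no longer hits the duplicate-entries error.
wide = toy.set_index(["created", "n", "qna"])["message"].unstack(fill_value="")
print(wide)
```

If the repeats are exact copies rather than distinct messages, dropping them first with `drop_duplicates(subset=['created', 'qna'])` may be the simpler route.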

Thanks a lot for the support – neo33
