2016-11-08 8 views
0

Ich habe zwei CSVs, die ich auf einer Spalte vergleichen muss. Und ich muss übereinstimmende Zeilen in einem csv und unübertroffene Zeilen in anderen setzen. Also habe ich Index für diese Spalte in der zweiten CSV erstellt und zuerst durchlaufen.Wie konvertiert man Serien in Datenrahmen in Pandas

df1 = pd.read_csv(file1,nrows=100) 
df2 = pd.read_csv(file2,nrows=100) 
df2.set_index('crc', inplace = True) 
matched_list = [] 
non_matched_list = [] 
    for _, row in df1.iterrows(): 
     try: 
      x = df2.loc[row['crc']]  
      matched_list.append(x) 
     except KeyError: 
      non_matched_list.append(row) 

Die x hier ist eine Reihe in folgendem Format

policyID     448094 
statecode      FL 
county    CLAY COUNTY 
eq_site_limit   1322376.3 
hu_site_limit   1322376.3 
fl_site_limit   1322376.3 
fr_site_limit   1322376.3 
tiv_2011    1322376.3 
tiv_2012    1438163.57 
eq_site_deductible    0 
hu_site_deductible   0.0 
fl_site_deductible    0 
fr_site_deductible    0 
point_latitude   30.063936 
point_longitude  -81.707664 
line     Residential 
construction    Masonry 
point_granularity    3 
Name: 448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,0,0.0, dtype: object 

Meine Ausgabe csv in folgendem Format

policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity 
114455,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1 

Für alle die Serie in der angepaßten und nicht angepaßten sein sollte. Wie mache ich es? Ich kann nicht aus Index in zweiten CSV als Leistung in wichtigen loswerden.

Im Folgenden finden Sie den Inhalt von zwei CSV-Dateien. File1:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity 
114455,FL,CLAY COUNTY,589658,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1 
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3 
206893,FL,CLAY COUNTY,745689.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1 
333743,FL,CLAY COUNTY,0,12563.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3 
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1 
785275,FL,CLAY COUNTY,0,515035.62,0,0,515035.62,884419.17,0,0,0,0,30.063236,-81.707703,Residential,Masonry,3 
995932,FL,CLAY COUNTY,0,19260000,0,0,19260000,20610000,0,0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1 
223488,FL,CLAY COUNTY,328500,328500,328500,328500,328500,348374.25,0,16425,0,0,30.102217,-81.707146,Residential,Wood,1 
433512,FL,CLAY COUNTY,315000,315000,315000,315000,315000,265821.57,0,15750,0,0,30.118774,-81.704613,Residential,Wood,1 
142071,FL,CLAY COUNTY,705600,705600,705600,705600,705600,1010842.56,14112,35280,0,0,30.100628,-81.703751,Residential,Masonry,1 

File2:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity 
119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1 
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3 
206893,FL,CLAY COUNTY,190724.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1 
333743,FL,CLAY COUNTY,0,79520.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3 
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1 
785275,FL,CLAY COUNTY,0,51564.9,0,0,515035.62,884419.17,0,0,0,0,30.063236,-81.707703,Residential,Masonry,3 
995932,FL,CLAY COUNTY,0,457962,0,0,19260000,20610000,0,0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1 
223488,FL,CLAY COUNTY,328500,328500,328500,328500,328500,348374.25,0,16425,0,0,30.102217,-81.707146,Residential,Wood,1 
433512,FL,CLAY COUNTY,315000,315000,315000,315000,315000,265821.57,0,15750,0,0,30.118774,-81.704613,Residential,Wood,1 
142071,FL,CLAY COUNTY,705600,705600,705600,705600,705600,1010842.56,14112,35280,0,0,30.100628,-81.703751,Residential,Masonry,1 
253816,FL,CLAY COUNTY,831498.3,831498.3,831498.3,831498.3,831498.3,1117791.48,0,0,0,0,30.10216,-81.719444,Residential,Masonry,1 
894922,FL,CLAY COUNTY,0,24059.09,0,0,24059.09,33952.19,0,0,0,0,30.095957,-81.695099,Residential,Wood,1 

Edit: Added Probe csv

+0

Sie Dateien manipulieren könnte, so dass es mindestens eine ist " unübertroffene "CRC in ihnen? – MaxU

Antwort

1

Ich denke, man es auf diese Weise tun können:

df1.loc[df1.crc.isin(df2.index)].to_csv('/path/to/matched.csv', index=False) 
df1.loc[~df1.crc.isin(df2.index)].to_csv('/path/to/unmatched.csv', index=False) 

statt Looping ..

Demo:

In [62]: df1.loc[df1.crc.isin(df2.index)].to_csv(r'c:/temp/matched.csv', index=False) 

In [63]: df1.loc[~df1.crc.isin(df2.index)].to_csv(r'c:/temp/unmatched.csv', index=False) 

Ergebnisse:

matched.csv:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity 
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0.0,0,0,30.063935999999998,-81.70766400000001,Residential,Masonry,3 
333743,FL,CLAY COUNTY,0.0,12563.76,0.0,0.0,79520.76,86854.48,0,0.0,0,0,30.063236,-81.70770300000001,Residential,Wood,3 
172534,FL,CLAY COUNTY,0.0,254281.5,0.0,254281.5,254281.5,246144.49,0,0.0,0,0,30.060614,-81.702675,Residential,Wood,1 
785275,FL,CLAY COUNTY,0.0,515035.62,0.0,0.0,515035.62,884419.17,0,0.0,0,0,30.063236,-81.70770300000001,Residential,Masonry,3 
995932,FL,CLAY COUNTY,0.0,19260000.0,0.0,0.0,19260000.0,20610000.0,0,0.0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1 
223488,FL,CLAY COUNTY,328500.0,328500.0,328500.0,328500.0,328500.0,348374.25,0,16425.0,0,0,30.102217,-81.707146,Residential,Wood,1 
433512,FL,CLAY COUNTY,315000.0,315000.0,315000.0,315000.0,315000.0,265821.57,0,15750.0,0,0,30.118774,-81.704613,Residential,Wood,1 
142071,FL,CLAY COUNTY,705600.0,705600.0,705600.0,705600.0,705600.0,1010842.56,14112,35280.0,0,0,30.100628000000004,-81.703751,Residential,Masonry,1 

unmatched.csv:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity 
114455,FL,CLAY COUNTY,589658.0,498960.0,498960.0,498960.0,498960.0,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1 
206893,FL,CLAY COUNTY,745689.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0.0,0,0,30.089578999999997,-81.700455,Residential,Wood,1 
+0

CSV-Dateien Beispiel hinzugefügt. Auch dieser gab folgenden Fehler. AttributeError: 'DataFrame' Objekt hat kein Attribut 'crc' –

+0

Haben Sie crc als Index in df2 gemacht? Weil ich folgenden Fehler bekomme. df1.loc [df1.crc.isin (df2.crc)]. To_csv (Übereinstimmende_Datei, Index = Falsch) Datei "D: \ venv \ lib \ Site-Pakete \ Pandas \ core \ generic.py", Zeile 2744, in __getattr__ return Objekt .__ getAttribute __ (self, name) Attribute: 'Dataframe' Objekt hat kein Attribut 'crc' –

+0

@IshanBhatt, nein, sie automatisch aus der CSV-Dateien analysiert wurde - siehe – MaxU

Verwandte Themen