Ich denke, das ist, was Sie wollen.
Lassen Sie uns Setup bis Beispieldaten:
tt = '''I want to search through the. searchList and check if column text str.contains one or more of each searchWord. If I get a match I want to append the data to masterdf which is easily accomplished as seen below. But I also want to add a new column with searchWord so that I know which text matched with what. This code below fills the column searchWord with the. latest search that matched'''
text_col = tt.split('.')
id_col = range(len(text_col))
jsons_data = pd.DataFrame({'doc_id':id_col,'text':text_col})
searchList = ['code','fills', 'But','also','want']
Das Beispiel jsons_data
ist
doc_id text
0 0 I want to search through the
1 1 searchList and check if column text str
2 2 contains one or more of each searchWord
3 3 If I get a match I want to append the data to...
4 4 But I also want to add a new column with sear...
5 5 This code below fills the column searchWord w...
6 6 latest search that matched
Code ändern mit search['searchWord'] = searchWord
erhalten wir:
masterdf = pd.DataFrame(columns=['doc_id','text','searchWord'])
for searchWord in searchList:
search = jsons_data[jsons_data['text'].str.contains(searchWord)]
if len(search) > 0:
search['searchWord'] = searchWord
masterdf = masterdf.append(search)
Und masterdf
ist
doc_id text searchWord
5 5.0 This code below fills the column searchWord w... code
5 5.0 This code below fills the column searchWord w... fills
4 4.0 But I also want to add a new column with sear... But
4 4.0 But I also want to add a new column with sear... also
0 0.0 I want to search through the want
3 3.0 If I get a match I want to append the data to... want
4 4.0 But I also want to add a new column with sear... want
Dies sieht wirklich nett. Warum ist das besser als Looping? Und gibt es eine "klarere"/korrektere Art, über dieses Problem nachzudenken, als "add column when match ..."? – user3471881
@ user3471881, weil für größere (1000+) Datensätze vektorisierte Lösungen im Vergleich zu "looping" -Lösungen in der Regel um Größenordnungen schneller sind – MaxU