2017-10-27 1 views

Delete rows that do not contain a keyword

I extracted Facebook posts with Rfacebook by time criteria (code below) and would like to delete all results (i.e. rows in the data frame) where one column ("message") does not contain a keyword.

My only "solution", using grep, leaves me with just the content of that column rather than the full rows. Can anyone help?

Code:

# RETRIEVING DATA 
BBCpage <- getPage(page="bbcnews", token=fb_oauth, n=20, since="2017-05-03", feed=FALSE, reactions=TRUE, verbose=TRUE) 

BBCpage$message 
# Now I only want to keep the rows where the field "message" contains one of my keywords "Brexit" or "European Union"

# possibility 1: not working, since I end up with ONLY the content of 'message', not the entire row
pattern <- "Brexit|European Union"
grep(pattern, BBCpage, ignore.case=TRUE, perl = FALSE, value = TRUE, fixed = FALSE, useBytes = FALSE, invert = FALSE)

# possibility 2: not working, no filter applied
matches <- c("Brexit", "European Union")
BBCfiltered <- BBCpage[!(BBCpage$message %in% matches), ]

Can anyone help me figure out how to get the filter applied?

Many thanks in advance,

Ivo

- EDIT: as requested, here is the output of running the following code:

BBCpage <- getPage(page="bbcnews", token=fb_oauth, n=20, since="2017-05-03", feed=FALSE, reactions=TRUE, verbose=TRUE) 

> dput(BBCpage) 
structure(list(id = c("228735667216_10155253874762217", "228735667216_10155253984962217", 
"228735667216_10155254016922217", "228735667216_1510422315708643", 
"228735667216_10155254242117217", "228735667216_10155254357457217", 
"228735667216_10155254531807217", "228735667216_10155254645177217", 
"228735667216_10155254739207217", "228735667216_10155254848077217", 
"228735667216_10155255021777217", "228735667216_10155255187982217", 
"228735667216_10155255303912217", "228735667216_10155255312537217", 
"228735667216_10155255092167217", "228735667216_10155256112042217", 
"228735667216_10155256182962217", "228735667216_10155256278057217", 
"228735667216_1993087934041388", "228735667216_10155256481732217" 
), likes_count = c(24996, 1385, 1280, 8870, 2104, 5906, 5813, 
15842, 9313, 3315, 944, 6485, 1638, 1638, 2045, 4356, 2098, 1305, 
237, 741), from_id = c("228735667216", "228735667216", "228735667216", 
"228735667216", "228735667216", "228735667216", "228735667216", 
"228735667216", "228735667216", "228735667216", "228735667216", 
"228735667216", "228735667216", "228735667216", "228735667216", 
"228735667216", "228735667216", "228735667216", "228735667216", 
"228735667216"), from_name = c("BBC News", "BBC News", "BBC News", 
"BBC News", "BBC News", "BBC News", "BBC News", "BBC News", "BBC News", 
"BBC News", "BBC News", "BBC News", "BBC News", "BBC News", "BBC News", 
"BBC News", "BBC News", "BBC News", "BBC News", "BBC News"), 
    message = c("The Catalan parliament votes to declare independence from Spain - as Madrid looks set to impose direct rule.", 
    "As Halloween approaches, we are revisiting a spooky American classic. Goosebumps books were a scary children's book series that have been around for 25 years. We were #LIVE with Tim Jacobus, the artist behind the creepy cover art.", 
    "Do hotel comparison sites really give you the best deal?", 
    "The first official exhibition about the late pop icon Prince has opened in London - with the help of his little sister. <ed><U+00A0><U+00BC><ed><U+00BE><U+00B8><U+2728> #MyNameisPrince\n\n(via BBC Entertainment News)", 
    "British-born novelist Christina Baker Kline says the ex-president \"squeezed my butt\" as she posed for a photo.", 
    "Ecstatic scenes in Barcelona as Catalonia’s parliament votes to declare independence from Spain - but Madrid has approved direct rule over the region.\n\nbbc.in/2zbEyCn", 
    "Her husband dropped her at a doctor's appointment in 1975 - and that was the last he ever heard of her.", 
    "<ed><U+00A0><U+00BC><ed><U+00BE><U+0083><ed><U+00A0><U+00BD><ed><U+00B0><U+00BE> No tricks, just treats for these animals at Halloween. <ed><U+00A0><U+00BD><ed><U+00B0><U+00BE><ed><U+00A0><U+00BC><ed><U+00BE><U+0083>", 
    "“We are pure. We are strong. We are brave. And we will fight.”\n\nRose McGowan's message to women in her first public remarks since accusing Harvey Weinstein of rape.", 
    "Downing Street said the declaration was based on an illegal vote. But The Scottish Government said it respected Catalonia's position.", 
    "Surely this should have been: \"Eleven things you need to know about Stranger Things\"... <ed><U+00A0><U+00BE><ed><U+00B4><U+00A6><U+200D><U+2640><U+FE0F>", 
    "\"You have no weight problems, that's the good news.\"\n\nPresident Donald J. Trump handed out Halloween treats and the odd trick to journalists' children on their trip to the Oval Office.", 
    "The actresses are the latest women to make allegations against film director James Toback.", 
    "\"Why are you asking me what I wore? It should not happen, no means no.\"", 
    "A pair of US speed climbers have cracked an \"unbeatable\" record for scaling one of the world's best known rock faces - El Capitan.", 
    "Cambridge University say the online repository has \"never seen numbers like this before\".", 
    "Spain's Deputy PM Soraya Saenz de Santamaria is put in charge of Catalonia after its government was dismissed.", 
    "Did you get enough sleep last night?", "\"Sometimes, I think coming into the studio with you John is a bit like going into Harvey Weinstein's bedroom.\"\n\nUK environment secretary Michael Gove apologises for what he says was his \"clumsy attempt at humour\" on a special edition of BBC Radio 4's Today programme. bbc.in/2idoZPk\n\n(Via BBC Politics)", 
    "Rescuers save caimans from a sticky situation in Brazil." 
    ), created_time = c("2017-10-27T13:36:37+0000", "2017-10-27T14:32:50+0000", 
    "2017-10-27T14:34:09+0000", "2017-10-27T15:20:00+0000", "2017-10-27T16:13:54+0000", 
    "2017-10-27T17:04:07+0000", "2017-10-27T17:53:05+0000", "2017-10-27T18:44:23+0000", 
    "2017-10-27T19:29:38+0000", "2017-10-27T20:21:24+0000", "2017-10-27T21:09:17+0000", 
    "2017-10-27T22:11:04+0000", "2017-10-27T22:45:09+0000", "2017-10-27T22:50:13+0000", 
    "2017-10-27T23:44:00+0000", "2017-10-28T07:15:39+0000", "2017-10-28T08:17:01+0000", 
    "2017-10-28T09:18:02+0000", "2017-10-28T10:28:12+0000", "2017-10-28T11:14:21+0000" 
    ), type = c("link", "video", "link", "video", "link", "video", 
    "link", "video", "video", "link", "link", "video", "link", 
    "link", "video", "link", "link", "link", "video", "video" 
    ), link = c("http://bbc.in/2zTuomQ", "https://www.facebook.com/bbcnews/videos/10155253984962217/", 
    "http://bbc.in/2y9oCAc", "https://www.facebook.com/bbcnews/videos/1510422315708643/", 
    "http://bbc.in/2ia2Q4M", "https://www.facebook.com/bbcnews/videos/10155254357457217/", 
    "http://bbc.in/2iaQ3if", "https://www.facebook.com/bbcnews/videos/10155254645177217/", 
    "https://www.facebook.com/bbcnews/videos/10155254739207217/", 
    "http://bbc.in/2zW9sLZ", "http://bbc.in/2z9SHQr", "https://www.facebook.com/bbcnews/videos/10155255187982217/", 
    "http://bbc.in/2zcSkVm", "http://bbc.in/2zUQc1E", "https://www.facebook.com/bbcnews/videos/10155255092167217/", 
    "http://bbc.in/2zelIu3", "http://bbc.in/2zfgXQY", "http://bbc.in/2ybP2S4", 
    "https://www.facebook.com/bbcnews/videos/1993087934041388/", 
    "https://www.facebook.com/bbcnews/videos/10155256481732217/" 
    ), story = c(NA, "BBC News was live.", NA, NA, NA, NA, NA, 
    NA, NA, NA, "BBC News shared BBC Entertainment News's post.", 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), comments_count = c(1982, 
    412, 164, 2778, 1069, 963, 246, 727, 707, 896, 97, 3111, 
    198, 167, 232, 100, 385, 158, 147, 18), shares_count = c(10001, 
    198, 235, 2756, 262, 1677, 567, 4358, 1634, 602, 2, 1850, 
    75, 188, 363, 296, 231, 283, 33, 81), love_count = c(2294, 
    203, 23, 2224, 36, 625, NA, 2744, NA, 249, 83, NA, 55, 49, 
    94, NA, NA, NA, 8, NA), haha_count = c(549, 19, 67, 11, 697, 
    148, NA, 605, NA, 224, 26, NA, 24, 9, 4, NA, NA, NA, 73, 
    NA), wow_count = c(6987, 31, 66, 256, 169, 898, NA, 76, NA, 
    136, 7, NA, 101, 30, 249, NA, NA, NA, 13, NA), sad_count = c(392, 
    2, 1, 26, 85, 134, NA, 5, NA, 83, 1, NA, 218, 183, 1, NA, 
    NA, NA, 3, NA), angry_count = c(398, 17, 10, 6, 305, 183, 
    NA, 2, NA, 865, 0, NA, 32, 248, 2, NA, NA, NA, 61, NA)), .Names = c("id", 
"likes_count", "from_id", "from_name", "message", "created_time", 
"type", "link", "story", "comments_count", "shares_count", "love_count", 
"haha_count", "wow_count", "sad_count", "angry_count"), row.names = c(1L, 
2L, 3L, 19L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 11L, 
15L, 16L, 17L, 20L, 18L), class = "data.frame") 
> 

- EDIT 2: One of the comments worked (see the answer below); thanks, r2evans


'grepl' returns logical values, while 'grep(..., value = F)' returns the indices of the matching elements. Both are useful for indexing rows/columns.
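To illustrate the comment above, here is a minimal sketch on a made-up character vector (`msgs` and its contents are hypothetical, not taken from the BBC data). It shows that `grepl` yields one logical value per element, that `grep` with `value = FALSE` yields the positions of the matches, and that either result can be used as an index.

```r
# Hypothetical example data (not from the question's data frame)
msgs <- c("Brexit talks resume", "Weather update", "European Union summit")
pattern <- "Brexit|European Union"

# grepl: one TRUE/FALSE per element
is_match <- grepl(pattern, msgs, ignore.case = TRUE)

# grep with value = FALSE (the default): integer positions of the matches
match_idx <- grep(pattern, msgs, ignore.case = TRUE, value = FALSE)

# Both select the same elements when used as an index
by_logical <- msgs[is_match]
by_index <- msgs[match_idx]
```

The logical form is usually the safer one for dropping rows: negating it with `!` still works when nothing matches, whereas a negative index built from an empty `grep` result selects nothing at all.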


Can you provide the data so that we don't have to retrieve it from the page ourselves? Copy and paste the output of 'dput(BBCpage)' into your question – useR


'BBCpage[grepl(pattern, BBCpage$message, ...), ]'? – r2evans

Answer


The suggestion from r2evans worked. I modified the code slightly and did this:

  BBC_page_relevant <- BBC_page[grepl(pattern, BBC_page$message, ...),] 

This seems to work: the matching posts, with all their columns, are stored in the data.frame BBC_page_relevant.
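For anyone who wants to try this without a Facebook token, here is a self-contained sketch of the same approach. The data frame `posts` is a hypothetical stand-in (its column names mirror the question, its rows are invented), and `ignore.case = TRUE` is an assumption that fills the role of the elided `...`, taken from the `grep` call in the question. The last line also hints at why the `%in%` attempt filtered nothing: `%in%` compares whole strings rather than searching for substrings.

```r
# Hypothetical stand-in for the Rfacebook result; rows are invented
posts <- data.frame(
  id = c("a1", "a2", "a3", "a4"),
  message = c("Brexit talks resume in Brussels",
              "Did you get enough sleep last night?",
              "European Union leaders meet",
              NA),
  likes_count = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

pattern <- "Brexit|European Union"

# Logical row index: TRUE where message contains a keyword (NA gives FALSE)
keep <- grepl(pattern, posts$message, ignore.case = TRUE)

# Keep whole rows; the empty slot after the comma selects all columns
posts_relevant <- posts[keep, ]

# Why the %in% attempt removed nothing: it tests whole-string equality
exact <- posts$message %in% c("Brexit", "European Union")
```

Here `posts_relevant` keeps rows "a1" and "a3" with all three columns, while `exact` is `FALSE` everywhere because no message consists solely of a keyword.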

Many thanks for the quick and helpful answers. Best, Ivo
