Removing observations based on leverage values

After checking my data, my model seems to have some leverage values that are larger than expected (or hoped). I would like to remove the observations that exceed the leverage threshold.
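df <- df[df$title_year >= 2000, ]
# Use the bare column name; with df$gross on the left, `.` would also pull in gross as a predictor
model <- lm(gross ~ ., data = df)
# Leverage (hat) values of the fitted model
lev <- hatvalues(model)
# Keep the values above the threshold 2 * 8/2546 (presumably the 2 * p / n rule of thumb)
lev_val <- lev[lev > 2 * 8/2546]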
> names(lev_val)
[1] "282" "90" "103" "25" "189" "53" "147" "51" "1526" "1" "133" "34" "273" "1852" "2172" "5" "916" "200" "79" "50"
[21] "29" "632" "266" "149" "28" "26" "107" "105" "202" "144" "203" "8" "414" "54" "279" "394" "4" "155" "10" "1539"
[41] "27" "1318" "153" "109" "1790" "1327" "918" "1275" "1055" "85" "11" "1543" "919" "33" "481" "759" "43" "2404" "30" "920"
[61] "212" "123" "42" "223" "58" "118" "111" "32" "281" "88" "1075" "1061" "421" "1517" "487" "2084" "774" "934" "1069" "86"
[81] "113" "221" "37" "60" "112" "304" "1347" "117" "697" "102" "1472" "225" "773" "2" "219" "121" "151" "1856" "18" "122"
[101] "182" "518" "1084" "2014" "776" "300" "71" "125" "2070" "1854" "97" "1288" "38" "1087" "2004" "227" "1747" "64" "98" "264"
[121] "23" "1729" "150" "65" "1559" "36" "52" "13" "128" "328" "39" "524" "886" "613" "2195" "2441" "2446" "2096" "84" "1522"
[141] "15" "1800" "462" "41" "1100" "1804" "14" "162" "1336" "232" "2193" "24" "529" "2088" "787" "68" "433" "785" "66" "1340"
[161] "1863" "1587" "788" "950" "2450" "1717" "158" "682" "2531" "951" "1110" "535" "539" "1478" "35" "72" "2136" "547" "1887" "21"
[181] "2208" "1873" "1126" "2144" "1805" "2211" "1722" "1874" "1721" "91" "16" "2451" "410" "31" "1153" "1593" "280" "1878" "366" "20"
[201] "2407" "69" "1605" "1606" "694" "1881" "403" "73" "303" "1149" "955" "1144" "893" "1138" "2221" "140" "404" "819" "1365" "2225"
[221] "1374" "1690" "1485" "442" "1150" "1614" "1486" "78" "1894" "1998" "185" "2236" "2229" "1373" "1155" "1369" "1817" "184" "1305" "9"
[241] "2017" "331" "1910" "466" "1621" "566" "1619" "1372" "2239" "570" "1908" "947" "1906" "587" "170" "1629" "17" "308" "2112" "2257"
[261] "400" "575" "1166" "402" "1745" "1733" "1818" "372" "962" "329" "829" "966" "67" "1174" "2248" "703" "1622" "1624" "1623" "574"
[281] "2255" "269" "1013" "1180" "1387" "1627" "40" "1924" "2270" "2260" "1631" "95" "1628" "1184" "2423" "2050" "1481" "1388" "1037" "2060"
[301] "843" "1746" "1202" "975" "977" "1392" "256" "2109" "2541" "333" "1495" "1047" "2426" "2478" "2277" "250" "2282" "981" "92" "848"
[321] "2290" "75" "1995" "1646" "2143" "688" "2123" "2488" "2499" "255" "1941" "1190" "914" "1189" "2487" "1911" "1768" "382" "2305" "1512"
[341] "2113" "2127" "2329" "980" "1203" "1204" "582" "1949" "2185" "600" "588" "1422" "1468" "1963" "2425" "2331" "2494" "2122" "1273" "2415"
[361] "2147" "2335" "2500" "2276" "864" "2397" "2504" "1235" "1269" "2361" "2360" "2119" "1009" "2371" "2362" "709" "1976" "1216" "2039" "594"
[381] "595" "2517" "77" "2124" "2373" "1680" "1682" "998" "259" "1984" "2386" "1785" "1683" "143" "1990" "1179" "2163" "2418" "82"
I am not sure how to remove the observations with excessive leverage from my data.
I tried df2 = df[which(names(df) %nin% remove),], but had no success with this or with minor variations of it. It removes more rows than there are excessive leverage values, probably because it removes by value rather than by the X-th observation.
Does anyone have some basic R code for this? I seem to be lost.
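On the attempt above: names(df) returns the column names of a data frame, so it cannot identify observations, whereas the names attached to the hatvalues() output are row names. A minimal sketch of matching on row names, assuming the built-in mtcars data and a small model as stand-ins for the movie data (both are placeholders, not the original setup):

```r
# Stand-in data and model (assumptions; not the movie data from the question)
df <- mtcars
model <- lm(mpg ~ wt + hp, data = df)

lev <- hatvalues(model)         # named by the row names of the rows used in the fit
p <- model$rank                 # number of estimated coefficients
n <- length(lev)                # number of observations used in the fit
high <- names(lev)[lev > 2 * p / n]

# rownames(df), not names(df), identifies observations
df2 <- df[!(rownames(df) %in% high), ]

# Exactly as many rows are dropped as there are flagged observations
nrow(df) - nrow(df2) == length(high)
```

Matching on row names should also stay correct if lm() dropped rows with missing values, because lev then has fewer entries than df has rows and a positional index would no longer line up.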
This may be very poor statistics. You might want to consider a transformation of your 'y'. Without further details, no debugging can be done. The answer you are asking for is df2 = df[!(lev > 2 * 8/2546), ], which is probably not the answer you really want.
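The one-liner from the comment can be spelled out roughly as follows. This is only a sketch on stand-in data (mtcars and the formula are assumptions), with the threshold computed from the fitted model instead of being hard-coded:

```r
# Stand-in data and model (assumptions; replace with the real data frame and formula)
df <- mtcars
model <- lm(mpg ~ wt + hp, data = df)

lev <- hatvalues(model)
threshold <- 2 * model$rank / length(lev)

# A logical mask only lines up with the rows if lm() used every row of df
stopifnot(length(lev) == nrow(df))

keep <- lev <= threshold
df2 <- df[keep, ]               # the comma matters: select rows, keep all columns

# Refit on the reduced data to see how much the estimates move
model2 <- lm(mpg ~ wt + hp, data = df2)
```

Whether dropping high-leverage points is statistically sensible is a separate question, as the comment points out; the refit only shows the mechanics.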