Difference between as.data.frame and read.csv in R
I would like to do propensity score matching with the R function matchit. When I read the data from a CSV file, everything looks fine and the result is what I want:
> csv <- read.csv("C:/Users/Lenovo/Desktop/ddd.csv", header=TRUE)
> df <- as.data.frame(csv)
> df
   PERSON_ID OUTCOME tnb gxy AGE1
1     166920       1   2   0   61
2     167350       1   2   0   65
3     167757       1   1   0   58
4     167812       1   1   0   63
5     168271       1   2   0   55
6     168426       0   2   0   47
7     168652       0   2   1   57
8     168983       0   1   0   51
9     169083       0   2   0   50
10    169172       0   2   1   53
> fm <- matchit(OUTCOME ~ tnb + AGE1, data = df, method = "nearest")
> result <- summary(fm)
> result
Call:
matchit(formula = OUTCOME ~ tnb + AGE1, data = df, method = "nearest")

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance        0.8334        0.1666     0.2575    0.6667   0.867   0.6667  0.8964
tnb             1.6000        1.8000     0.4472   -0.2000   0.000   0.2000  1.0000
AGE1           60.4000       51.6000     3.7148    8.8000   8.000   8.8000 10.0000

Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance        0.8334        0.1666     0.2575    0.6667   0.867   0.6667  0.8964
tnb             1.6000        1.8000     0.4472   -0.2000   0.000   0.2000  1.0000
AGE1           60.4000       51.6000     3.7148    8.8000   8.000   8.8000 10.0000

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance          0       0        0       0
tnb               0       0        0       0
AGE1              0       0        0       0

Sample sizes:
          Control Treated
All             5       5
Matched         5       5
Unmatched       0       0
Discarded       0       0
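A quick way to see what matchit receives in this case is to look at the column classes. The sketch below is only an illustration: it recreates a few of the rows from inline text rather than from the file on disk, and the names `csv_text` and `df_check`, as well as the comma-separated layout, are assumptions. Because the fields contain only digits, read.csv parses them as integers, so each formula term stays a single numeric covariate.

```r
# Inline stand-in for ddd.csv (assumed layout: comma-separated with a header row)
csv_text <- "PERSON_ID,OUTCOME,tnb,gxy,AGE1
166920,1,2,0,61
167350,1,2,0,65
168426,0,2,0,47
168652,0,2,1,57"

df_check <- read.csv(text = csv_text, header = TRUE)

# Every column comes back as integer, not character or factor
col_classes <- sapply(df_check, class)
print(col_classes)
```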
However, when I use arrays to hold the input data and then convert them to a data.frame, the result matrix has many rows whose row names I did not define:
> OUTCOME<-c("1", "1", "1", "1", "1", "0", "0", "0", "0", "0");
> PERSON_ID<-c("166920", "167350", "167757", "167812", "168271", "168426", "168652", "168983", "169083", "169172");
> tnb<-c("0", "0", "1", "0", "1", "0", "0", "1", "1", "0");
> gxy<-c("0", "0", "1", "0", "0", "1", "0", "0", "1", "0");
> AGE1<-c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53");
> matrix <- cbind(PERSON_ID,OUTCOME,tnb,gxy,AGE1)
> data <- as.data.frame(matrix, stringsAsFactors= TRUE)
> data
   PERSON_ID OUTCOME tnb gxy AGE1
1     166920       1   0   0   61
2     167350       1   0   0   65
3     167757       1   1   1   58
4     167812       1   0   0   63
5     168271       1   1   0   55
6     168426       0   0   1   47
7     168652       0   0   0   57
8     168983       0   1   0   51
9     169083       0   1   1   50
10    169172       0   0   0   53
> fm <- matchit(OUTCOME ~ tnb + gxy + AGE1, data = data, method = "nearest", replace = TRUE, ratio = 1)
> summary(fm)
Call:
matchit(formula = OUTCOME ~ tnb + gxy + AGE1, data = data, method = "nearest",
    replace = TRUE, ratio = 1)

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           1.0           0.0     0.0000       1.0       1      1.0       1
tnb0               0.6           0.6     0.5477       0.0       0      0.0       0
tnb1               0.4           0.4     0.5477       0.0       0      0.0       0
gxy1               0.2           0.4     0.5477      -0.2       0      0.2       1
AGE150             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE151             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE153             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE155             0.2           0.0     0.0000       0.2       0      0.2       1
AGE157             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE158             0.2           0.0     0.0000       0.2       0      0.2       1
AGE161             0.2           0.0     0.0000       0.2       0      0.2       1
AGE163             0.2           0.0     0.0000       0.2       0      0.2       1
AGE165             0.2           0.0     0.0000       0.2       0      0.2       1

Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           1.0           0.0     0.0000       1.0     1.0      1.0       1
tnb0               0.6           0.8     0.5657      -0.2     0.0      0.0       0
tnb1               0.4           0.2     0.5657       0.2     0.0      0.0       0
gxy1               0.2           0.8     0.5657      -0.6     0.0      0.0       0
AGE150             0.0           0.0     0.0000       0.0     0.0      0.0       0
AGE151             0.0           0.2     0.5657      -0.2     0.5      0.5       1
AGE153             0.0           0.0     0.0000       0.0     0.0      0.0       0
AGE155             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE157             0.0           0.0     0.0000       0.0     0.0      0.0       0
AGE158             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE161             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE163             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE165             0.2           0.0     0.0000       0.2     0.5      0.5       1

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance          0       0        0       0
tnb0           -Inf       0        0       0
tnb1           -Inf       0        0       0
gxy1           -200       0      100     100
AGE150          100       0      100     100
AGE151            0    -Inf     -150       0
AGE153          100       0      100     100
AGE155            0    -Inf     -150       0
AGE157          100       0      100     100
AGE158            0    -Inf     -150       0
AGE161            0    -Inf     -150       0
AGE163            0    -Inf     -150       0
AGE165            0    -Inf     -150       0

Sample sizes:
          Control Treated
All             5       5
Matched         2       5
Unmatched       3       0
Discarded       0       0
My question is: read.csv returns a data frame, and as.data.frame(x) also returns a data frame. Why do the results differ in R's matchit output?
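One likely explanation, sketched below on the assumption that the two data frames differ only in their column types: cbind on character vectors builds a character matrix, and as.data.frame(..., stringsAsFactors = TRUE) then turns every column into a factor. A factor in a model formula is expanded into one dummy column per level, which would account for row names such as tnb1 or AGE150. The names `data_fac` and `data_num` are illustrative, and the example uses base R's model.matrix only to show the expansion; it does not call matchit itself.

```r
OUTCOME <- c("1", "1", "0", "0")
tnb     <- c("0", "1", "0", "1")
AGE1    <- c("61", "58", "47", "51")

# cbind on character vectors yields a character matrix
m <- cbind(OUTCOME, tnb, AGE1)
data_fac <- as.data.frame(m, stringsAsFactors = TRUE)
fac_classes <- sapply(data_fac, class)   # every column is a factor

# Convert each column to numeric via character; calling as.numeric()
# directly on a factor would return the internal level codes instead
data_num <- as.data.frame(
  lapply(data_fac, function(col) as.numeric(as.character(col)))
)
num_classes <- sapply(data_num, class)   # every column is numeric

# A factor covariate expands into dummy columns; a numeric one stays one column
fac_terms <- colnames(model.matrix(~ AGE1, data = data_fac))
num_terms <- colnames(model.matrix(~ AGE1, data = data_num))
print(fac_terms)
print(num_terms)
```

Building the data frame from numeric vectors in the first place, for example with data.frame(AGE1 = c(61, 58, 47, 51)), would avoid the conversion step altogether.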
Please format your CSV so that it displays as a table and is easy to read in your question – user93