2017-10-04
0

Histogram in R from a CSV file with four columns

I have a CSV file with four columns: AGE, DIASTOLIC, BMI, EVER.PREGNANT. I want to plot a histogram that compares AGE on the x-axis with DIASTOLIC on the y-axis. How could I do that? The code I wrote is:

Sheet=read.csv("/home/prajnan/Downloads/1739230_1284354330_PIMA.csv - 1739230_1284354330_PIMA.csv.csv",sep=",", header = T)
hist(Sheet[2],Sheet[3]$AGE$DIASTOLIC)

The error I get is:

Error in hist.default(Sheet[2], Sheet[3]$AGE$DIASTOLIC) : 'x' must be numeric

Where is the mistake? Thanks in advance.

Note: The output of dput(head(Sheet, 10)) is:

structure(list(X = c(NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), 
X.1 = structure(c(1L, 53L, 31L, 12L, 13L, 2L, 14L, 11L, 7L, 
34L), .Label = c("", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", 
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", 
"57", "58", "59", "60", "61", "62", "63", "64", "65", "66", 
"67", "68", "69", "70", "81", "AGE"), class = "factor"), 
X.2 = structure(c(1L, 48L, 31L, 28L, 26L, 28L, 13L, 32L, 
17L, 30L), .Label = c("", "100", "102", "104", "106", "108", 
"110", "114", "122", "24", "30", "38", "40", "44", "46", 
"48", "50", "52", "54", "55", "56", "58", "60", "61", "62", 
"64", "65", "66", "68", "70", "72", "74", "75", "76", "78", 
"80", "82", "84", "85", "86", "88", "90", "92", "94", "95", 
"96", "98", "DIASTOLIC"), class = "factor"), X.3 = structure(c(1L, 
248L, 124L, 63L, 31L, 78L, 210L, 54L, 104L, 100L), .Label = c("", 
"18.2", "18.4", "19.1", "19.3", "19.4", "19.5", "19.6", "19.9", 
"20", "20.1", "20.4", "20.8", "21", "21.1", "21.2", "21.7", 
"21.8", "21.9", "22.1", "22.2", "22.3", "22.4", "22.5", "22.6", 
"22.7", "22.9", "23", "23.1", "23.2", "23.3", "23.4", "23.5", 
"23.6", "23.7", "23.8", "23.9", "24", "24.1", "24.2", "24.3", 
"24.4", "24.5", "24.6", "24.7", "24.8", "24.9", "25", "25.1", 
"25.2", "25.3", "25.4", "25.5", "25.6", "25.8", "25.9", "26", 
"26.1", "26.2", "26.3", "26.4", "26.5", "26.6", "26.7", "26.8", 
"26.9", "27", "27.1", "27.2", "27.3", "27.4", "27.5", "27.6", 
"27.7", "27.8", "27.9", "28", "28.1", "28.2", "28.3", "28.4", 
"28.5", "28.6", "28.7", "28.8", "28.9", "29", "29.2", "29.3", 
"29.5", "29.6", "29.7", "29.8", "29.9", "30", "30.1", "30.2", 
"30.3", "30.4", "30.5", "30.7", "30.8", "30.9", "31", "31.1", 
"31.2", "31.3", "31.6", "31.9", "32", "32.1", "32.2", "32.3", 
"32.4", "32.5", "32.6", "32.7", "32.8", "32.9", "33.1", "33.2", 
"33.3", "33.5", "33.6", "33.7", "33.8", "33.9", "34", "34.1", 
"34.2", "34.3", "34.4", "34.5", "34.6", "34.7", "34.8", "34.9", 
"35", "35.1", "35.2", "35.3", "35.4", "35.5", "35.6", "35.7", 
"35.8", "35.9", "36", "36.1", "36.2", "36.3", "36.4", "36.5", 
"36.6", "36.7", "36.8", "36.9", "37", "37.1", "37.2", "37.3", 
"37.4", "37.5", "37.6", "37.7", "37.8", "37.9", "38", "38.1", 
"38.2", "38.3", "38.4", "38.5", "38.6", "38.7", "38.8", "38.9", 
"39", "39.1", "39.2", "39.3", "39.4", "39.5", "39.6", "39.7", 
"39.8", "39.9", "40", "40.1", "40.2", "40.5", "40.6", "40.7", 
"40.8", "40.9", "41", "41.2", "41.3", "41.5", "41.8", "42", 
"42.1", "42.2", "42.3", "42.4", "42.6", "42.7", "42.8", "42.9", 
"43.1", "43.3", "43.4", "43.5", "43.6", "44", "44.1", "44.2", 
"44.5", "44.6", "45", "45.2", "45.3", "45.4", "45.5", "45.6", 
"45.7", "45.8", "46.1", "46.2", "46.3", "46.5", "46.7", "46.8", 
"47.9", "48.3", "48.8", "49.3", "49.6", "49.7", "50", "52.3", 
"52.9", "53.2", "55", "57.3", "59.4", "67.1", "BMI"), class = "factor"), 
X.4 = structure(c(1L, 2L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L), .Label = c("", 
"EVER-PREGNANT", "\"no\"", "\"yes\""), class = "factor")), .Names = c("X", 
"X.1", "X.2", "X.3", "X.4"), row.names = c(NA, 10L), class = "data.frame")

Answer

2

First, a histogram is a plot that shows the frequency of values in a single distribution. You cannot use it to compare two variables against each other. To look at a single distribution within your data set, you can do something like this:

hist(Sheet$AGE)

and likewise:

hist(Sheet$DIASTOLIC)

If you want them plotted together to compare the two distributions, you could do this:

par(mfrow = c(2, 1))
hist(Sheet$AGE)
hist(Sheet$DIASTOLIC)

However, if you want to compare the two variables directly, a histogram is probably not what you want. You could start by making a simple scatter plot as follows:
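
A minimal sketch of such a scatter plot, assuming the columns have been read in as numeric and are named AGE and DIASTOLIC. The small data frame below is made up for illustration and stands in for the real Sheet:

```r
# Made-up values for illustration only; replace with the real data frame.
Sheet <- data.frame(
  AGE = c(50, 31, 32, 21, 33),
  DIASTOLIC = c(72, 66, 64, 66, 40)
)

# Scatter plot: one point per row, AGE on the x-axis, DIASTOLIC on the y-axis.
plot(Sheet$AGE, Sheet$DIASTOLIC,
     xlab = "AGE", ylab = "DIASTOLIC",
     main = "DIASTOLIC vs. AGE")
```

Unlike hist(), plot() takes two vectors of equal length, so each row contributes one point.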

+0

When I run 'hist(Sheet$AGE)', I get the error: 'x' must be numeric. How should I proceed? – vidyarthi

+0

If I had to guess, I would suspect that the column was read in as a factor rather than as numeric. Try 'hist(as.numeric(Sheet$AGE))'. I could say for sure if you add the output of 'dput(Sheet)' to your question. – tbradley

+0

I tried 'hist(as.numeric(Sheet$AGE))', which gives: Error in hist.default(as.numeric(Sheet$AGE)) : invalid number of 'breaks'. The output of 'dput(Sheet)' is very long and would have to be pasted in as code. Should I send a picture instead? – vidyarthi
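
Judging from the dput() output in the question, the columns are named X, X.1, ..., X.4 and the strings "AGE" and "DIASTOLIC" appear as data values. That suggests the real header is preceded by two comma-only lines. In that case Sheet$AGE is NULL, which would explain the 'breaks' error. A minimal sketch of one way to handle this follows; the sample text and skip = 2 are assumptions modelled on that output, not the actual file:

```r
# Made-up sample mimicking the layout implied by the dput() output:
# two comma-only lines, then the real header, with an unnamed index column.
csv_text <- ",,,,
,,,,
,AGE,DIASTOLIC,BMI,EVER-PREGNANT
1,50,72,33.6,yes
2,31,66,26.6,yes
3,32,64,23.3,yes
4,21,66,28.1,yes
5,33,40,43.1,no"

# Reading without skip reproduces the problem: names become X, X.1, ...
bad <- read.csv(text = csv_text, header = TRUE)
names(bad)
is.null(bad$AGE)  # TRUE: there is no AGE column for hist() to use

# skip = 2 is an assumption based on the sample above; adjust for the real file.
Sheet <- read.csv(text = csv_text, header = TRUE, skip = 2)

# If a column still arrives as factor or character, convert via as.character()
# first; as.numeric() applied directly to a factor returns the level codes.
age <- as.numeric(as.character(Sheet$AGE))
hist(age, main = "AGE", xlab = "Age")
```

With the real file, the same skip argument would go into the original read.csv() call; checking str(Sheet) afterwards shows whether AGE and DIASTOLIC came through as numeric.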