2017-12-31 50 views
1

Count frequencies of each letter for multiple columns

I have a data frame like the one below:

> dfnew 

    C1 C2 C3 C4 C5 C6 
1 A A G A G A 
2 A T T T G G 
3 T A G A T A 
4 C A A A A G 
5 C A T T T C 
6 C A A A T A 
7 T C T G A A 
8 G A G C T A 
9 C T A T G A 
10 G A A A G G 
11 G G T T T A 
12 G A C T T A 
13 T T C T T T 
14 A T A G C T 
15 A C A A A A 
16 A A C A A A 
17 T G G A A T 
18 A A A A G T 
19 G T G G <NA> <NA> 

I would like to get the result below with a single line of R code, without looping:

A 6 10 7 9 5 10 
C 4 2 3 1 1 1 
G 5 2 5 3 5 3 
T 4 5 4 6 7 4 
+1

Something like the following: 'structure(list(c("A", "C", "G", "T"), c(6L, 4L, 5L, 4L), c(10L, 2L, 2L, 5L), c(7L, 3L, 5L, 4L), c(9L, 1L, 3L, 6L), c(5L, 1L, 5L, 7L), c(10L, 1L, 3L, 4L)), row.names = c(NA, -4L), class = "data.frame")' – PoGibas

+4

Just 'sapply(dfnew, table)' gives you the desired result. – Jaap

Answers

6

We can use sapply to loop over the columns, convert each column to a factor with the levels specified, and get the frequencies with table:

sapply(dfnew, function(x) table(factor(x, levels = c("A", "C", "G", "T")))) 
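Why the explicit levels matter can be seen on a small example. The sketch below uses a made-up data frame `toy` (not from the question) in which one column lacks some of the letters; the names `toy`, `lv`, `res_plain` and `res_fixed` are illustrative only.

```r
# Hypothetical toy data: column C2 contains no "G" and no "T".
toy <- data.frame(C1 = c("A", "C", "G", "T"),
                  C2 = c("A", "A", "C", NA),
                  stringsAsFactors = FALSE)

# Without fixed levels the per-column tables have different lengths,
# so sapply() cannot simplify them and returns a list.
res_plain <- sapply(toy, table)

# With fixed levels every table has the same four entries,
# so sapply() simplifies the result to a 4 x 2 matrix of counts.
lv <- c("A", "C", "G", "T")
res_fixed <- sapply(toy, function(x) table(factor(x, levels = lv)))

is.list(res_plain)    # TRUE
is.matrix(res_fixed)  # TRUE
res_fixed
```

With the data in the question every column happens to contain all four letters, so plain sapply(dfnew, table) also returns a matrix; fixing the levels just makes that independent of the data.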

Or with tidyverse:

library(dplyr) 
library(tidyr) 
dfnew %>% 
    gather(key, val, na.rm = TRUE) %>% 
    count(key, val) %>% 
    spread(key, n) 
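In newer tidyr releases (1.0.0 and later) gather and spread are superseded by pivot_longer and pivot_wider. A rough equivalent of the pipeline above, shown on a made-up data frame `toy` rather than on dfnew, might look like this; the explicit zero fill is an addition for letters that are missing from a column.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not the data frame from the question.
toy <- data.frame(C1 = c("A", "C", "A"),
                  C2 = c("T", "A", NA),
                  stringsAsFactors = FALSE)

res <- toy %>%
  # long form: one row per (column name, letter); NA values are dropped
  pivot_longer(everything(), names_to = "key", values_to = "val",
               values_drop_na = TRUE) %>%
  # count each letter within each column
  count(key, val) %>%
  # back to wide form; absent combinations become 0 instead of NA
  pivot_wider(names_from = key, values_from = n,
              values_fill = list(n = 0L))

res
```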
3

If you use stack to reshape everything to long form, you can call table on the result:

dfnew <- data.frame(C1 = c("A", "A", "T", "C", "C", "C", "T", "G", "C", "G", "G", "G", "T", "A", "A", "A", "T", "A", "G"), 
        C2 = c("A", "T", "A", "A", "A", "A", "C", "A", "T", "A", "G", "A", "T", "T", "C", "A", "G", "A", "T"), 
        C3 = c("G", "T", "G", "A", "T", "A", "T", "G", "A", "A", "T", "C", "C", "A", "A", "C", "G", "A", "G"), 
        C4 = c("A", "T", "A", "A", "T", "A", "G", "C", "T", "A", "T", "T", "T", "G", "A", "A", "A", "A", "G"), 
        C5 = c("G", "G", "T", "A", "T", "T", "A", "T", "G", "G", "T", "T", "T", "C", "A", "A", "A", "G", NA), 
        C6 = c("A", "G", "A", "G", "C", "A", "A", "A", "A", "G", "A", "A", "T", "T", "A", "A", "T", "T", NA), 
        stringsAsFactors = FALSE) 

table(stack(dfnew)) 
#>       ind
#> values C1 C2 C3 C4 C5 C6
#>      A  6 10  7  9  5 10
#>      C  4  2  3  1  1  1
#>      G  5  2  5  3  5  3
#>      T  4  5  4  6  7  4
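To make the intermediate step visible, here is a small sketch on a made-up data frame `toy` (not dfnew) that shows what stack returns and one possible way to turn the resulting table into an ordinary data frame. The names `long`, `counts` and `wide` are only for illustration.

```r
# Hypothetical toy data, not the data frame from the question.
toy <- data.frame(C1 = c("A", "C", "A"),
                  C2 = c("T", "A", NA),
                  stringsAsFactors = FALSE)

# stack() gives a long data frame with two columns:
#   values: the cell contents, ind: the originating column name
long <- stack(toy)

# table() on that data frame cross-tabulates values by ind;
# rows with NA in values are left out by default
counts <- table(long)

# optional: keep the wide layout but as a plain data frame
wide <- as.data.frame.matrix(counts)

long
counts
```

As far as I can tell, stack only keeps plain vector columns, so character columns (as produced by stringsAsFactors = FALSE) are needed here; factor columns would have to be converted with as.character first.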
1

With data.table and its pipe-like workflow of chaining calls with [:

library(data.table) 
tab <- fread(" 
C1 C2 C3 C4 C5 C6 
A A G A G A 
A T T T G G 
T A G A T A 
C A A A A G 
C A T T T C 
C A A A T A 
T C T G A A 
G A G C T A 
C T A T G A 
G A A A G G 
G G T T T A 
G A C T T A 
T T C T T T 
A T A G C T 
A C A A A A 
A A C A A A 
T G G A A T 
A A A A G T 
G T G G NA NA") 

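# melt to long form (columns: variable, value), dropping NA,
# then cast back to wide form, counting occurrences per letter and column
tab[, melt(.SD, measure.vars = paste0("C", 1:6), na.rm = TRUE)][ 
    , dcast(.SD, value ~ variable, fun.aggregate = length, value.var = "value", drop = TRUE) 
    ] 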
#>    value C1 C2 C3 C4 C5 C6
#> 1:     A  6 10  7  9  5 10
#> 2:     C  4  2  3  1  1  1
#> 3:     G  5  2  5  3  5  3
#> 4:     T  4  5  4  6  7  4