2015-04-21 2 views
10

Ich möchte die PCA umrechnen, die von prcomp berechnet wird, um zu meinen ursprünglichen Daten zurückzukommen.Wie man PCA in prcomp umwandelt, um ursprüngliche Daten zu erhalten

Ich dachte, so etwas wie die folgenden funktionieren würde:

pca$x %*% t(pca$rotation) 

aber es funktioniert nicht.

Der folgende Link zeigt, wie die Originaldaten von PCs zurück zu bekommen, aber es erklärt nur für PCA Eigen auf der Kovarianzmatrix http://www.di.fc.ul.pt/~jpn/r/pca/pca.html

prcomp calcluate nicht-PC, die Art und Weise verwendet wird.

"Die Berechnung erfolgt durch eine singuläre Wertzerlegung der (zentrierten und möglicherweise skalierten) Datenmatrix, nicht durch Verwendung von Eigen in der Kovarianzmatrix." -prcomp

+1

@konvas richtig ist, aber Sie können auch nicht maßstäblich und Zentrum sagen prcomp: 'pca <- prcomp (Daten, RETX = TRUE, Mitte = FALSE, scale = FALSCH)' in diesem Fall oberhalb der Formel tut Arbeit. – Mist

Antwort

11

prcomp werden die Variablen Zentrum, so dass Sie die abgezogen hinzufügen müssen, bedeutet zurück

t(t(pca$x %*% t(pca$rotation)) + pca$center) 

Wenn pca$scale ist TRUE Sie müssen auch neu angelegte

t(t(pca$x %*% t(pca$rotation)) * pca$scale + pca$center) 
+0

Wie glätten Sie Daten mit PCA? Gibt es einen gültigen Ansatz? – Qbik

0

Ich hoffe, dies wird Hilfe auch.

rm(list = ls()) 

# ---- 
# create a dataset feature of class 1, 100 samples 
f1 <- rnorm(n = 100, mean = 5, sd = 1) 

# ---- 
# still in the same feature, create class 2, also 100 samples 
f1 <- c(f1,rnorm(n = 100, mean = 10, sd = 1)) 

# ---- 
# create another feature, of course it has 200 samples 
f2 <- (f1 * 1.25) + rnorm(n = 200, mean = 7, sd = 0.75) 

# ---- 
# put them together in one container i.e dataset 
# feature #1 could better represent the separation of the two class 
# since it spread from about 4 to 11, while feature #2 spread from about 
# 6 to 8 (without addition 1.5 of feature #1) 
mydataset <- cbind(f1,f2) 

# ---- 
# create coloring label 
class.color <- c(rep(2,100),rep(3,100)) 

# ---- 
# plot the dataset 
plot(mydataset, col = class.color, main = 'the original formation') 

# ---- 
# transform it...!!!! 
pca.result <- prcomp(mydataset,scale. = TRUE, center = TRUE, retx = TRUE) 

# ---- 
# plot the samples on their new axis 
# recall that when a line was drawn at the zero value of PC 1, it could separate the red and green class 
# but not when it was drawn at the zero value of PC 2 
# the line at the zero of PC 1 put red on its left and green on its right (or vice versa) 
# the line at the zero of PC 2 put BOTH red AND green on its upper part, and ALSO BOTH red AND green on its 
# lower part... i.e. PC 2 could not separate the red and green class 
plot(pca.result$x, col = class.color, main = 'samples on their new axis') 

# ---- 
# calculate the variance explained by the PCs in percent 
# PC 1 could explain approximately 98% while PC 2 only 2% 
variance.total <- sum(pca.result$sdev^2) 
variance.explained <- pca.result$sdev^2/variance.total * 100 
print(variance.explained) 

# ---- 
# drop PC 2 ---> samples drawn at PC 1's axis ---> this is the desired new representation of dataset 
plot(x = pca.result$x[,1], y = rep(0,200), col = class.color, 
    main = 'over PC 1', ylab = '', xlab = 'PC 1') 

# ---- 
# drop PC 1 ---> samples drawn at PC 2's axis ---> this is the UNdesired new representation of dataset 
plot(x = pca.result$x[,2], y = rep(0,200), col = class.color, 
    main = 'over PC 2', ylab = '', xlab = 'PC 2') 

# ---- 
# now choose only PC 1 and get it back to the original dataset, let's see what it's like 
# take all PC 1 value, put it on first column of the new dataset, and zero pad the second column 
new.dataset <- cbind(
    pca.result$x[,1], 
    rep(0,200) 
) 

# ---- 
# take alook at a glance the new dataset 
# remember, although the choosen one was only PC 1, doesn't mean that there would be only one column 
# the second column (and all column for a larger feature) must also exist 
# but now they are all set to zero 
(new.dataset) 

# ---- 
# transform it back 
new.dataset <- new.dataset %*% solve(pca.result$rotation) 

# ---- 
# plot the new dataset that is constructed with only one PC 
# (a little clumsy though, for we already have a new better axis system, why would we use the old one?) 
plot(new.dataset,col = class.color, 
    main = 'centered and scaled\nnew dataset with only one pc ---> PC 1', xlab = 'f1', ylab = 'f2') 
# ---- 
# remember, the dots are stil in scale and center position 
# must be stretched and dragged first 
scalling.matrix <- matrix(rep(pca.result$scale,200),ncol = 2, byrow = TRUE) 
centering.matrix <- matrix(rep(pca.result$center,200),ncol = 2, byrow = TRUE) 

# ---- 
# obtain original values 
new.dataset <- (new.dataset * scalling.matrix) + centering.matrix 

# ---- 
# compare the result before and after centering 
# all dots reside the same position, but with different values 
plot(new.dataset,col = class.color, 
    main = 'stretched and dragged\nnew dataset with only one pc ---> PC 1', xlab = 'f1', ylab = 'f2') 

# ---- 
# what if all PCs were all used in construction the data? 
# they'll be forming back (but OF COURSE that's not the principal component analysis here on earth for) 
new.dataset <- cbind(
    pca.result$x[,1], 
    pca.result$x[,2] 
) 
new.dataset <- new.dataset %*% solve(pca.result$rotation) 
new.dataset <- (new.dataset * scalling.matrix) + centering.matrix 
plot(new.dataset,col = class.color, 
    main = 'new dataset with\nboth pc included ---> PC 1 & 2 present', xlab = 'f1', ylab = 'f2') 

# ---- 
# compare the inverted dots with those from the original formation, they're all the same 
Verwandte Themen