How to plot the decision boundary of logistic regression in scikit-learn

I am trying to plot the decision boundary of a logistic regression in scikit-learn.

features_train_df : 650 columns, 5250 rows 
features_test_df : 650 columns, 1750 rows 
class_train_df = 1 column (class to be predicted), 5250 rows 
class_test_df = 1 column (class to be predicted), 1750 rows

Classifier code:

tuned_logreg = LogisticRegression(penalty = 'l2', tol = 0.0001,C = 0.1,max_iter = 100,class_weight = "balanced") 
tuned_logreg.fit(x_train[sorted_important_features_list[0:650]].values, y_train['loss'].values) 
y_pred_3 = tuned_logreg.predict(x_test[sorted_important_features_list[0:650]].values)

I get the correct output from the classification code.

I found this plotting code online:

X = features_train_df.values 
# evenly sampled points 
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5 
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5 
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50), 
       np.linspace(y_min, y_max, 50)) 
plt.xlim(xx.min(), xx.max()) 
plt.ylim(yy.min(), yy.max()) 

#plot background colors 
ax = plt.gca() 
Z = tuned_logreg.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1] 
Z = Z.reshape(xx.shape) 
cs = ax.contourf(xx, yy, Z, cmap='RdBu', alpha=.5) 
cs2 = ax.contour(xx, yy, Z, cmap='RdBu', alpha=.5) 
plt.clabel(cs2, fmt = '%2.1f', colors = 'k', fontsize=14) 

# Plot the points 
ax.plot(Xtrain[ytrain == 0, 0], Xtrain[ytrain == 0, 1], 'ro', label='Class 1') 
ax.plot(Xtrain[ytrain == 1, 0], Xtrain[ytrain == 1, 1], 'bo', label='Class 2') 

# make legend 
plt.legend(loc='upper left', scatterpoints=1, numpoints=1)

Error:

ValueError: X has 2 features per sample; expecting 650

Please advise where I am going wrong.


2016-12-08 vikky

Where is the classification code for the logistic regression? I think the problem lies in the classifier's predict method. –

@WasiAhmad I have added the classifier code, but I do not get any error in it. – vikky

Can you explain what you are trying to do with this statement: ax.plot(Xtrain[ytrain == 0, 0], Xtrain[ytrain == 0, 1], 'ro', label='Class 1')? –

I found the problem in your code. Please take a look at the following discussion.

xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50), np.linspace(y_min, y_max, 50)) 
grid = np.c_[xx.ravel(), yy.ravel()] 
Z = tuned_logreg.predict_proba(grid)[:, 1]

Think about the shapes of the variables here:

np.linspace(x_min, x_max, 50) returns an array of 50 values. Applying np.meshgrid then gives xx and yy the shape (50, 50). Finally, np.c_[xx.ravel(), yy.ravel()] gives the variable grid the shape (2500, 2). You are passing 2500 instances with 2 feature values each to the predict_proba function.
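To check these shapes yourself, here is a minimal sketch using only NumPy. The axis limits are made-up values chosen for illustration; they are not taken from your data.

```python
import numpy as np

# Hypothetical axis limits, chosen only for illustration.
x_min, x_max = -1.0, 1.0
y_min, y_max = -2.0, 2.0

xs = np.linspace(x_min, x_max, 50)     # 1-D array with 50 values
ys = np.linspace(y_min, y_max, 50)     # 1-D array with 50 values
xx, yy = np.meshgrid(xs, ys)           # two 2-D coordinate arrays
grid = np.c_[xx.ravel(), yy.ravel()]   # one row per grid point, one column per axis

print(xs.shape, xx.shape, yy.shape, grid.shape)
```

The second dimension of grid is the number of features that predict_proba will see, which is 2 here.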

That is why you get the error: ValueError: X has 2 features per sample; expecting 650. You must pass a structure that contains 650 column values (features).

In the predict call you did it correctly.

y_pred_3 = tuned_logreg.predict(x_test[sorted_important_features_list[0:650]].values)

So make sure that the number of features in the instances passed to the fit(), predict() and predict_proba() methods is the same.
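One possible way to get a plottable grid is to fit a second classifier on just the two columns you want on the axes. The sketch below is only an illustration under assumptions: the data is synthetic (make_classification stands in for your 650-column frame), the choice of columns 0 and 1 is arbitrary, and the names full_model and model_2d are hypothetical. Note that the two-feature model is a different model from the 650-feature one, so its boundary only approximates what the full classifier does.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the 650-column training data (assumption).
X, y = make_classification(n_samples=300, n_features=650,
                           n_informative=10, random_state=0)

# Model trained on all 650 features, as in the question.
full_model = LogisticRegression(C=0.1, class_weight="balanced",
                                max_iter=1000).fit(X, y)

# Separate model trained only on the two columns to be plotted
# (hypothetical choice of columns).
plot_cols = [0, 1]
model_2d = LogisticRegression(C=0.1, class_weight="balanced",
                              max_iter=1000).fit(X[:, plot_cols], y)

# Build the 2-D grid over the range of the two plotted columns.
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50),
                     np.linspace(y_min, y_max, 50))
grid = np.c_[xx.ravel(), yy.ravel()]

# The 650-feature model rejects the 2-column grid ...
try:
    full_model.predict_proba(grid)
    full_model_accepts_grid = True
except ValueError:
    full_model_accepts_grid = False

# ... while the 2-feature model accepts it.
Z = model_2d.predict_proba(grid)[:, 1].reshape(xx.shape)
print(full_model_accepts_grid, Z.shape)
```

Z has the same shape as xx and yy, so it can be passed to ax.contourf(xx, yy, Z, ...) exactly as in the plotting code from the question.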

Explanation of the example from the SO post you referred to:

X, y = make_classification(n_samples=200, n_features=2, n_informative=2, n_redundant=0, weights=[.5, .5], random_state=15) 
clf = LogisticRegression().fit(X[:100], y[:100])

Here the shape of X is (200, 2), but the classifier is trained on X[:100], which means only 100 samples with 2 features each. For prediction, they use:

xx, yy = np.mgrid[-5:5:.01, -5:5:.01] 
grid = np.c_[xx.ravel(), yy.ravel()]

Here the shape of xx is (1000, 1000) and the shape of grid is (1000000, 2). So the number of features used for both training and testing is 2.
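Put together, that example can be sketched as follows. This is my reconstruction from the two snippets above, not a copy of the original post: the make_classification arguments are written as keywords, and the variable probs is a name I introduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 200 samples with 2 features each; keyword arguments are used for clarity.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, weights=[.5, .5], random_state=15)

# Train on the first 100 samples; the model still expects 2 features.
clf = LogisticRegression().fit(X[:100], y[:100])

# Dense grid over the plane; every row of grid is one (x, y) point.
xx, yy = np.mgrid[-5:5:.01, -5:5:.01]
grid = np.c_[xx.ravel(), yy.ravel()]

# Probability of class 1 at every grid point, reshaped back to the grid.
probs = clf.predict_proba(grid)[:, 1].reshape(xx.shape)
print(X[:100].shape, grid.shape, probs.shape)
```

Because grid has 2 columns and clf was fitted on 2 columns, predict_proba works without the feature-count error.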


2016-12-09 00:39:46
