Python/Sklearn - ValueError: could not convert string to float
I am trying to run a kNN classifier over my dataset with 10-fold CV. I have some experience with models in WEKA, but I am struggling to transfer this to sklearn.
Below is my code:
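import pandas
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

filename = 'train4.csv'
names = ['attribute names are here']
df = pandas.read_csv(filename, names=names)
num_folds = 10
# random_state only has an effect together with shuffle=True
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=7)
model = KNeighborsClassifier()
# Features are every column except the target 'mix1_instrument'
results = cross_val_score(model, df.drop('mix1_instrument', axis=1), df['mix1_instrument'], cv=kfold)
print(results.mean())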
I get this error:
ValueError: could not convert string to float: ''
How can I convert this attribute? It contains useful information for classifying my instances, so would converting it affect that?
There are two attributes of dtype 'object' that I believe need converting, called 'class1' and 'class2'.
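One possible way to convert the two 'object' columns is one-hot encoding, which keeps the category information as separate 0/1 columns. The following is only a minimal sketch: the small frame is a stand-in built from a few values of the sample data, not the real train4.csv, and the variable names X and y are assumptions for illustration.

```python
import pandas as pd

# Tiny stand-in frame using values from the sample data (not the real file)
df = pd.DataFrame({
    'MFCC1': ['3.502117', '6.697601', '4.011488', '0.823614'],
    'class1': ['aerophone', 'aerophone', 'chordophone', 'aerophone'],
    'class2': ['aero_single-reed', 'aero_lip-vibrated', 'chrd_simple', 'aero_single-reed'],
    'mix1_instrument': ['Saxophone', 'Trumpet', 'Piano', 'Clarinet'],
})

y = df['mix1_instrument']
X = df.drop('mix1_instrument', axis=1)

# One-hot encode the two categorical columns; each category becomes its own 0/1 column
X = pd.get_dummies(X, columns=['class1', 'class2'], dtype=float)

# The remaining columns were read as strings, so parse them as numbers
X = X.apply(pd.to_numeric)

print(X.dtypes)
```

Since kNN relies on distances, the 0/1 columns sit on a different scale than features such as 'SpecCentroid', so scaling the features before fitting may be worth considering.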
Sample data below ...
{
'temporalCentroid': {
0: 'temporalCentroid',
1: '1.67324',
2: '1.330722',
3: '0.786984',
4: '1.850129'
},
'LogSpecCentroid': {
0: 'LogSpecCentroid',
1: '-1.043802',
2: '-0.82943',
3: '-2.441297',
4: '-0.837145'
},
'LogSpecSpread': {
0: 'LogSpecSpread',
1: '0.747558',
2: '1.378373',
3: '0.667634',
4: '1.238404'
},
'MFCC1': {
0: 'MFCC1',
1: '3.502117',
2: '6.697601',
3: '4.011488',
4: '0.823614'
},
'MFCC2': {
0: 'MFCC2',
1: '-9.208897',
2: '-9.741549',
3: '15.27665',
4: '-15.22256'
},
'MFCC3': {
0: 'MFCC3',
1: '-2.334097',
2: '-9.868089',
3: '0.802509',
4: '-4.978688'
},
'MFCC4': {
0: 'MFCC4',
1: '-9.013086',
2: '0.609091',
3: '2.50685',
4: '-2.489553'
},
'MFCC5': {
0: 'MFCC5',
1: '4.847481',
2: '1.733307',
3: '0.10459',
4: '1.066615'
},
'MFCC6': {
0: 'MFCC6',
1: '-4.770421',
2: '-5.381835',
3: '-0.260118',
4: '-1.020861'
},
'MFCC7': {
0: 'MFCC7',
1: '-3.362488',
2: '-1.261088',
3: '0.593255',
4: '-2.007349'
},
'MFCC8': {
0: 'MFCC8',
1: '-9.527529',
2: '-3.809237',
3: '-0.362287',
4: '-8.938164'
},
'MFCC9': {
0: 'MFCC9',
1: '-9.629579',
2: '1.486923',
3: '-2.957592',
4: '-2.324424'
},
'MFCC10': {
0: 'MFCC10',
1: '1.848685',
2: '-3.938455',
3: '-1.884439',
4: '-2.535579'
},
'MFCC11': {
0: 'MFCC11',
1: '-2.311295',
2: '-2.159865',
3: '-0.827179',
4: '0.638553'
},
'MFCC12': {
0: 'MFCC12',
1: '-7.696675',
2: '-3.138412',
3: '-0.605056',
4: '-1.116259'
},
'MFCC13': {
0: 'MFCC13',
1: '10.35572',
2: '9.095669',
3: '6.426399',
4: '15.04535'
},
'MFCCMin': {
0: 'MFCCMin',
1: '-9.629579',
2: '-9.868089',
3: '-2.957592',
4: '-15.22256'
},
'MFCCMax': {
0: 'MFCCMax',
1: '10.35572',
2: '9.095669',
3: '15.27665',
4: '15.04535'
},
'MFCCSum': {
0: 'MFCCSum',
1: '-37.300064',
2: '-19.675939',
3: '22.82507',
4: '-23.059305'
},
'MFCCAvg': {
0: 'MFCCAvg',
1: '-2.869235692',
2: '-1.513533769',
3: '1.755774615',
4: '-1.773792692'
},
'MFCCStd': {
0: 'MFCCStd',
1: '6.409842944',
2: '5.558499123',
3: '4.756836281',
4: '6.76039911'
},
'Energy': {
0: 'Energy',
1: '-2.96148',
2: '-3.522993',
3: '-3.409359',
4: '-2.235853'
},
'ZeroCrossings': {
0: 'ZeroCrossings',
1: '128',
2: '188',
3: '43',
4: '288'
},
'SpecCentroid': {
0: 'SpecCentroid',
1: '284.0513',
2: '414.8489',
3: '102.2096',
4: '405.1262'
},
'SpecSpread': {
0: 'SpecSpread',
1: '207.5526',
2: '350.7937',
3: '53.52178',
4: '360.0353'
},
'Rolloff': {
0: 'Rolloff',
1: '263.7817',
2: '783.2703',
3: '129.1992',
4: '912.4695'
},
'Flux': {
0: 'Flux',
1: '0',
2: '0',
3: '0',
4: '0'
},
'bandsCoefMin': {
0: 'bandsCoefMin',
1: '-0.224957',
2: '-0.247903',
3: '-0.22283',
4: '-0.232534'
},
'bandsCoefMax': {
0: 'bandsCoefMax',
1: '-0.074945',
2: '-0.113654',
3: '-0.062254',
4: '-0.080883'
},
'bandsCoefSum1': {
0: 'bandsCoefSum1',
1: '-5.575428',
2: '-5.524777',
3: '-5.511125',
4: '-5.532536'
},
'bandsCoefAvg': {
0: 'bandsCoefAvg',
1: '-0.168952364',
2: '-0.167417485',
3: '-0.167003788',
4: '-0.167652606'
},
'bandsCoefStd': {
0: 'bandsCoefStd',
1: '0.042580181',
2: '0.048429973',
3: '0.049881374',
4: '0.0475839'
},
'bandsCoefSum': {
0: 'bandsCoefSum',
1: '382.5963',
2: '360.9232',
3: '384.3541',
4: '368.9903'
},
'prjmin': {
0: 'prjmin',
1: '-0.999362',
2: '-0.999719',
3: '-0.988315',
4: '-0.999421'
},
'prjmax': {
0: 'prjmax',
1: '0.023797',
2: '0.009596',
3: '0.028112',
4: '0.024612'
},
'prjSum': {
0: 'prjSum',
1: '-0.99911',
2: '-1.006792',
3: '-1.084054',
4: '-1.002478'
},
'prjAvg': {
0: 'prjAvg',
1: '-0.030276061',
2: '-0.030508848',
3: '-0.032850121',
4: '-0.030378121'
},
'prjStd': {
0: 'prjStd',
1: '0.174082468',
2: '0.174040569',
3: '0.173600498',
4: '0.174064118'
},
'LogAttackTime': {
0: 'LogAttackTime',
1: '0.365883',
2: '-0.35427',
3: '-0.669283',
4: '-0.026181'
},
'HamoPkMin': {
0: 'HamoPkMin',
1: '0',
2: '0',
3: '0',
4: '0'
},
'HamoPkMax': {
0: 'HamoPkMax',
1: '1.025473',
2: '1.05761',
3: '0.986766',
4: '0.957316'
},
'HamoPkSum': {
0: 'HamoPkSum',
1: '14.391206',
2: '20.306125',
3: '9.727358',
4: '14.772449'
},
'HamoPkAvg': {
0: 'HamoPkAvg',
1: '0.513971643',
2: '0.72521875',
3: '0.347405643',
4: '0.527587464'
},
'HamoPkStd': {
0: 'HamoPkStd',
1: '0.376622124',
2: '0.325929503',
3: '0.388971641',
4: '0.381693476'
},
'class1': {
0: 'class1',
1: 'aerophone',
2: 'aerophone',
3: 'chordophone',
4: 'aerophone'
},
'class2': {
0: 'class2',
1: 'aero_single-reed',
2: 'aero_lip-vibrated',
3: 'chrd_simple',
4: 'aero_single-reed'
},
'mix1_instrument': {
0: 'mix1_instrument',
1: 'Saxophone',
2: 'Trumpet',
3: 'Piano',
4: 'Clarinet'
}
}
Thanks
You should get rid of the first row, because it duplicates the column names ... – MaxU
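A minimal sketch of what that could look like, assuming the CSV file starts with a header line (as row 0 of the sample data suggests). The in-memory CSV and the shortened column list are stand-ins for illustration, not the real file:

```python
import io
import pandas as pd

# Small in-memory stand-in for the CSV file, including a header line
csv_text = (
    "MFCC1,ZeroCrossings,class1,mix1_instrument\n"
    "3.502117,128,aerophone,Saxophone\n"
    "6.697601,188,aerophone,Trumpet\n"
)
names = ['MFCC1', 'ZeroCrossings', 'class1', 'mix1_instrument']

# names= alone treats the header line as a data row, so numeric columns stay strings
bad = pd.read_csv(io.StringIO(csv_text), names=names)

# header=0 marks the first line as a header, which names= then replaces
good = pd.read_csv(io.StringIO(csv_text), names=names, header=0)

print(bad.dtypes)
print(good.dtypes)
```

With the header row out of the data, the numeric columns are parsed as numbers and only the genuinely categorical columns remain to be converted.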