
I am trying to do something very simple, but I can't believe it isn't working... I am probably missing something very obvious. Please help: I get a "cannot resolve xyz given input columns" error when creating a Spark Dataset.

Goal: read the Iris dataset (CSV file, no header) into a Dataset.

Code:

case class Iris(sepalWidth: Double, sepalLength: Double, petalWidth: Double, petalLength: Double, irisClass: String)

val ds = spark.read.format("csv")
  .option("inferSchema", true)
  .load("/home/ec2-user/spark-2.0.1-bin-hadoop2.7/tkdata/iris.data")
  .as[Iris]

Error:

org.apache.spark.sql.AnalysisException: cannot resolve '`sepalWidth`' given input columns: [_c1, _c3, _c0, _c4, _c2]; 
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) 
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77) 
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) 
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:298) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:298) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:350) 
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) 
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) 
    at scala.collection.immutable.List.foreach(List.scala:381) 
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) 
    at scala.collection.immutable.List.map(List.scala:285) 
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:348) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:298) 
    at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:190) 
    at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:200) 
    at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$5.apply(QueryPlan.scala:209) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) 
    at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:209) 
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:74) 
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67) 
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126) 
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67) 
    at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:58) 
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolveAndBind(ExpressionEncoder.scala:245) 
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:210) 
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167) 
    at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59) 
    at org.apache.spark.sql.Dataset.as(Dataset.scala:359) 
    ... 54 elided 

Here is what the data file looks like:

$ head iris.data 
5.1,3.5,1.4,0.2,Iris-setosa 
4.9,3.0,1.4,0.2,Iris-setosa 
4.7,3.2,1.3,0.2,Iris-setosa 
4.6,3.1,1.5,0.2,Iris-setosa 
5.0,3.6,1.4,0.2,Iris-setosa 
5.4,3.9,1.7,0.4,Iris-setosa 
4.6,3.4,1.4,0.3,Iris-setosa 
5.0,3.4,1.5,0.2,Iris-setosa 
4.4,2.9,1.4,0.2,Iris-setosa 
4.9,3.1,1.5,0.1,Iris-setosa 

Answer


Both the names and the types have to match. Because the file has no header and no schema is supplied, Spark assigns the default column names _c0 through _c4, so the encoder for Iris cannot resolve sepalWidth. Rename the columns before calling .as[Iris]:

spark.read.format("csv").option("inferSchema", true).load(...) 
    .toDF("sepalWidth", "sepalLength", "petalWidth", "petalLength", "irisClass") 
    .as[Iris] 
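
If you want to avoid the extra pass over the file that inferSchema costs, an alternative sketch (not part of the original answer, but standard Spark 2.x API) is to declare the schema explicitly, which fixes both the names and the types up front:

import org.apache.spark.sql.types._

// Field names deliberately match the Iris case class, so .as[Iris] resolves.
val irisSchema = StructType(Seq(
  StructField("sepalWidth", DoubleType),
  StructField("sepalLength", DoubleType),
  StructField("petalWidth", DoubleType),
  StructField("petalLength", DoubleType),
  StructField("irisClass", StringType)
))

val ds = spark.read.format("csv")
  .schema(irisSchema)   // explicit schema, no inference pass over the data
  .load("/home/ec2-user/spark-2.0.1-bin-hadoop2.7/tkdata/iris.data")
  .as[Iris]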

Alternatively, you can read the file as plain text, map each line to the case class, and convert the result to a Dataset.

For example:

import spark.implicits._

case class Iris(sepalWidth: Double, sepalLength: Double,
    petalWidth: Double, petalLength: Double, irisClass: String)

val ds = spark.sparkContext
  .textFile("/home/ec2-user/spark-2.0.1-bin-hadoop2.7/tkdata/iris.data")
  .map(_.split(","))  // split each line into its five fields
  .map(t => Iris(t(0).toDouble, t(1).toDouble, t(2).toDouble, t(3).toDouble, t(4)))
  .toDS()             // needs spark.implicits._ in scope
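
Either way, once the Dataset is typed, operations resolve against the case-class fields. A quick sanity check might look like this (assuming ds as defined above):

ds.printSchema()                        // fields should match the Iris case class
ds.groupBy("irisClass").count().show()  // row count per iris class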