6

I'm using Maven with the Scala archetype. I'm getting this error:

“value $ is not a member of StringContext”

I've already tried adding several things to pom.xml, but nothing really worked...

My code:

import org.apache.spark.ml.evaluation.RegressionEvaluator 
import org.apache.spark.ml.regression.LinearRegression 
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit} 
// To see fewer warnings 
import org.apache.log4j._ 
Logger.getLogger("org").setLevel(Level.ERROR) 


// Start a simple Spark Session 
import org.apache.spark.sql.SparkSession 
val spark = SparkSession.builder().getOrCreate() 

// Prepare training and test data. 
val data = spark.read.option("header","true").option("inferSchema","true").format("csv").load("USA_Housing.csv") 

// Check out the Data 
data.printSchema() 

// See an example of what the data looks like 
// by printing out a Row 
val colnames = data.columns 
val firstrow = data.head(1)(0) 
println("\n") 
println("Example Data Row") 
for(ind <- Range(1,colnames.length)){ 
    println(colnames(ind)) 
    println(firstrow(ind)) 
    println("\n") 
} 

//////////////////////////////////////////////////// 
//// Setting Up DataFrame for Machine Learning //// 
////////////////////////////////////////////////// 

// A few things we need to do before Spark can accept the data! 
// It needs to be in the form of two columns 
// ("label","features") 

// This will allow us to join multiple feature columns 
// into a single column of an array of feature values 
import org.apache.spark.ml.feature.VectorAssembler 
import org.apache.spark.ml.linalg.Vectors 

// Rename Price to label column for naming convention. 
// Grab only numerical columns from the data 
val df = data.select(data("Price").as("label"),$"Avg Area Income",$"Avg Area House Age",$"Avg Area Number of Rooms",$"Area Population") 

// An assembler converts the input values to a vector 
// A vector is what the ML algorithm reads to train a model 

// Set the input columns from which we are supposed to read the values 
// Set the name of the column where the vector will be stored 
val assembler = new VectorAssembler().setInputCols(Array("Avg Area Income","Avg Area House Age","Avg Area Number of Rooms","Area Population")).setOutputCol("features") 

// Use the assembler to transform our DataFrame to the two columns 
val output = assembler.transform(df).select($"label",$"features") 


// Create a Linear Regression Model object 
val lr = new LinearRegression() 

// Fit the model to the data 

// Note: Later we will see why we should split 
// the data first, but for now we will fit to all the data. 
val lrModel = lr.fit(output) 

// Print the coefficients and intercept for linear regression 
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}") 

// Summarize the model over the training set and print out some metrics! 
// Explore this in the spark-shell for more methods to call 
val trainingSummary = lrModel.summary 

println(s"numIterations: ${trainingSummary.totalIterations}") 
println(s"objectiveHistory: ${trainingSummary.objectiveHistory.toList}") 

trainingSummary.residuals.show() 

println(s"RMSE: ${trainingSummary.rootMeanSquaredError}") 
println(s"MSE: ${trainingSummary.meanSquaredError}") 
println(s"r2: ${trainingSummary.r2}") 

And this is my pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" 

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> 
    <modelVersion>4.0.0</modelVersion> 
    <groupId>test</groupId> 
    <artifactId>outrotest</artifactId> 
    <version>1.0-SNAPSHOT</version> 
    <name>${project.artifactId}</name> 
    <description>My wonderfull scala app</description> 
    <inceptionYear>2015</inceptionYear> 
    <licenses> 
    <license> 
     <name>My License</name> 
     <url>http://....</url> 
     <distribution>repo</distribution> 
    </license> 
    </licenses> 

    <properties> 
    <maven.compiler.source>1.6</maven.compiler.source> 
    <maven.compiler.target>1.6</maven.compiler.target> 
    <encoding>UTF-8</encoding> 
    <scala.version>2.11.5</scala.version> 
    <scala.compat.version>2.11</scala.compat.version> 
    </properties> 

    <dependencies> 
    <dependency> 
     <groupId>org.scala-lang</groupId> 
     <artifactId>scala-library</artifactId> 
     <version>${scala.version}</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-mllib_2.11</artifactId> 
     <version>2.0.1</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-core_2.11</artifactId> 
     <version>2.0.1</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-sql_2.11</artifactId> 
     <version>2.0.2</version> 
    </dependency> 
    <dependency> 
     <groupId>com.databricks</groupId> 
     <artifactId>spark-csv_2.11</artifactId> 
     <version>1.5.0</version> 
    </dependency> 

    <!-- Test --> 
    <dependency> 
     <groupId>junit</groupId> 
     <artifactId>junit</artifactId> 
     <version>4.11</version> 
     <scope>test</scope> 
    </dependency> 
    <dependency> 
     <groupId>org.specs2</groupId> 
     <artifactId>specs2-junit_${scala.compat.version}</artifactId> 
     <version>2.4.16</version> 
     <scope>test</scope> 
    </dependency> 
    <dependency> 
     <groupId>org.specs2</groupId> 
     <artifactId>specs2-core_${scala.compat.version}</artifactId> 
     <version>2.4.16</version> 
     <scope>test</scope> 
    </dependency> 
    <dependency> 
     <groupId>org.scalatest</groupId> 
     <artifactId>scalatest_${scala.compat.version}</artifactId> 
     <version>2.2.4</version> 
     <scope>test</scope> 
    </dependency> 
    </dependencies> 

    <build> 
    <sourceDirectory>src/main/scala</sourceDirectory> 
    <testSourceDirectory>src/test/scala</testSourceDirectory> 
    <plugins> 
     <plugin> 
     <!-- see http://davidb.github.com/scala-maven-plugin --> 
     <groupId>net.alchim31.maven</groupId> 
     <artifactId>scala-maven-plugin</artifactId> 
     <version>3.2.0</version> 
     <executions> 
      <execution> 
      <goals> 
       <goal>compile</goal> 
       <goal>testCompile</goal> 
      </goals> 
      <configuration> 
       <args> 
       <!--<arg>-make:transitive</arg>--> 
       <arg>-dependencyfile</arg> 
       <arg>${project.build.directory}/.scala_dependencies</arg> 
       </args> 
      </configuration> 
      </execution> 
     </executions> 
     </plugin> 
     <plugin> 
     <groupId>org.apache.maven.plugins</groupId> 
     <artifactId>maven-surefire-plugin</artifactId> 
     <version>2.18.1</version> 
     <configuration> 
      <useFile>false</useFile> 
      <disableXmlReport>true</disableXmlReport> 
      <!-- If you have classpath issue like NoDefClassError,... --> 
      <!-- useManifestOnlyJar>false</useManifestOnlyJar --> 
      <includes> 
      <include>**/*Test.*</include> 
      <include>**/*Suite.*</include> 
      </includes> 
     </configuration> 
     </plugin> 
    </plugins> 
    </build> 
</project> 

I have no idea how to fix it. Does anyone have an idea?

+1

Have you tried adding 'import sqlContext.implicits._'? –

+0

Yes, but it doesn't work. It still fails with the same error: "value $ is not a member of StringContext" – Thaise

+0

You need to remove spark-csv from your pom.xml, since it will cause a runtime error – eliasah

Answers

17

Add this; it works:

val spark = SparkSession.builder().getOrCreate()  
import spark.implicits._ // << add this 
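For context: spark.implicits._ pulls Spark's SQLImplicits into scope, and one of those implicits (StringToColumn) is what adds the $ method to StringContext, so $"colname" compiles as a Column reference. A minimal sketch of the fixed setup, reusing the CSV file and column names from the question:

import org.apache.spark.sql.SparkSession 

val spark = SparkSession.builder().getOrCreate() 
import spark.implicits._ // adds $ to StringContext via StringToColumn 

val data = spark.read.option("header", "true").option("inferSchema", "true").csv("USA_Housing.csv") 
// $"..." now yields a Column, so this select compiles: 
val df = data.select(data("Price").as("label"), $"Avg Area Income", $"Area Population") 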
+0

Hey @Thaise, please mark my answer as the accepted one –

5

You can use the col function instead; just import it like this:

import org.apache.spark.sql.functions.col 

And then change $"column" to col("column").
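Applied to the select from the question, that looks roughly like this (same column names, just without the interpolator):

import org.apache.spark.sql.functions.col 

val df = data.select( 
  data("Price").as("label"), 
  col("Avg Area Income"), 
  col("Avg Area House Age"), 
  col("Avg Area Number of Rooms"), 
  col("Area Population")) 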

Hope it helps.

0

@Apurva's answer worked for me at first, in that the error disappeared in IntelliJ, but it then led to "Could not find implicit value for spark" during the sbt compile phase.

I found a workaround by importing spark.implicits._ from the SparkSession referenced by the DataFrame, instead of the one obtained via getOrCreate:

import df.sparkSession.implicits._ 

where df is a DataFrame.

This might be because my code was placed in a case class that receives an implicit val spark: SparkSession parameter; but I'm not really sure why this fix worked for me.
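For illustration only, the kind of setup where this came up might look like the sketch below; the class and method names are made up, only the import pattern is from this answer:

import org.apache.spark.sql.{DataFrame, SparkSession} 

case class HousingPipeline(df: DataFrame)(implicit spark: SparkSession) { 
  // Import the implicits from the session attached to the DataFrame itself, 
  // rather than relying on the implicit constructor parameter. 
  import df.sparkSession.implicits._ 

  def labels: DataFrame = df.select($"label") 
} 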
