How to convert an array-type column of a Dataset to a string type in Apache Spark Java

2017-08-16

I have an array-type column in my Dataset that I need to convert to a string type. I tried the conventional way, but I think it can be done better. Could you please guide me?

Input Dataset1

    +------------------+-----------+-------------------------------------------------------------------------------------------------+
    |ManufacturerSource|upcSource  |productDescriptionSource                                                                         |
    +------------------+-----------+-------------------------------------------------------------------------------------------------+
    |3M                |51115665883|[c, gdg, whl, t27, 5, x, 1, 4, x, 7, 8, grinding, flap, wheels, 36, grit, 12, 250, rpm]          |
    |3M                |51115665937|[c, gdg, whl, t27, q, c, 6, x, 1, 4, x, 5, 8, 11, grinding, flap, wheels, 36, grit, 10, 200, rpm]|
    |3M                |0          |[3mite, rb, cloth, 3, x, 2, wd]                                                                  |
    |3M                |0          |[trizact, disc, cloth, 237aaa16x5, hole]                                                         |
    +------------------+-----------+-------------------------------------------------------------------------------------------------+

Expected output Dataset

    +------------------+-----------+--------------------------------------------------------------------------+
    |ManufacturerSource|upcSource  |productDescriptionSource                                                  |
    +------------------+-----------+--------------------------------------------------------------------------+
    |3M                |51115665883|c gdg whl t27 5 x 1 4 x 7 8 grinding flap wheels 36 grit 12 250 rpm       |
    |3M                |51115665937|c gdg whl t27 q c 6 x 1 4 x 5 8 11 grinding flap wheels 36 grit 10 200 rpm|
    |3M                |0          |3mite rb cloth 3 x 2 wd                                                   |
    |3M                |0          |trizact disc cloth 237aaa16x5 hole                                        |
    +------------------+-----------+--------------------------------------------------------------------------+

Conventional approach 1

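    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Keep only the array<string> column
    Dataset<Row> afterstopwordsRemoved =
        stopwordsRemoved.select("productDescriptionSource");
    stopwordsRemoved.show();

    // collectAsList() copies every row to the driver
    List<Row> individualRows = afterstopwordsRemoved.collectAsList();

    System.out.println("After flatmap\n");
    for (Row individualRow : individualRows) {
        // getList(0) returns the array column as a java.util.List
        List<String> temp = individualRow.getList(0);
        System.out.println(String.join(" ", temp));
    }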
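Approach 1 runs `String.join` on the driver after `collectAsList()`, so every row has to fit in driver memory. The same join can run on the executors with `Dataset.map` and `Encoders.STRING()`. This is a minimal sketch assuming Spark 2.x or later. The class `JoinOnExecutors`, its helper methods, the two sample rows and the local master are hypothetical and exist only to make the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class JoinOnExecutors {

    // Hypothetical sample data with one array<string> column, like productDescriptionSource
    static Dataset<Row> sampleTokens(SparkSession spark) {
        StructType schema = new StructType().add(
            "productDescriptionSource", DataTypes.createArrayType(DataTypes.StringType));
        List<Row> rows = Arrays.asList(
            RowFactory.create(Arrays.asList("3mite", "rb", "cloth", "3", "x", "2", "wd")),
            RowFactory.create(Arrays.asList("trizact", "disc", "cloth", "237aaa16x5", "hole")));
        return spark.createDataFrame(rows, schema);
    }

    // Joins each token array with single spaces; the map function runs on the executors
    static Dataset<String> joinTokens(Dataset<Row> tokens) {
        return tokens.map(
            (MapFunction<Row, String>) row -> String.join(" ", row.<String>getList(0)),
            Encoders.STRING());
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("JoinOnExecutors")
            .master("local[*]")
            .getOrCreate();
        // show(false) prints the joined strings without truncation
        joinTokens(sampleTokens(spark)).show(false);
        spark.stop();
    }
}
```

Only the final `show` or a write brings results back, so the join itself no longer depends on driver memory.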

Approach 2 (does not produce a result)

Exception: Failed to execute user defined function($anonfun$27: (Array) => String)

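    import java.util.Arrays;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.types.DataTypes;

    // UDF1<String, String[]> declares a String input and a String[] output
    UDF1 untoken = new UDF1<String, String[]>() {
        // Extra overload: not part of the UDF1 interface, so Spark never calls it
        public String call(String[] token) throws Exception {
            //return types.replaceAll("[^a-zA-Z0-9\\s+]", "");
            return Arrays.toString(token);
        }

        // The method Spark invokes: it receives the array column, which cannot be cast to String
        @Override
        public String[] call(String t1) throws Exception {
            // TODO Auto-generated method stub
            return null;
        }
    };

    sqlContext.udf().register("unTokenize", untoken, DataTypes.StringType);

    source.createOrReplaceTempView("DataSetOfTokenize");
    Dataset<Row> newDF = sqlContext.sql("select *, unTokenize(productDescriptionSource) FROM DataSetOfTokenize");
    newDF.show(4000, false);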
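The UDF in approach 2 is declared as `UDF1<String, String[]>`. Spark therefore passes the array column to the `call(String)` override, and that cast fails. Below is a minimal corrected sketch, assuming Spark 2.x or later. In those versions an `array<string>` column reaches a Java UDF as a `scala.collection.Seq` (a `WrappedArray` in Spark 2.x), not as a `String[]`. The class name, the sample data and the `description` alias are hypothetical, and `spark.udf()` stands in for `sqlContext.udf()`.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import scala.collection.Seq;

public class UnTokenizeUdf {

    // UDF1<input, output>: takes the array column as a Scala Seq and returns a String.
    // mkString(" ") joins the elements with single spaces; a null array stays null.
    static final UDF1<Seq<String>, String> UNTOKENIZE =
        tokens -> tokens == null ? null : tokens.mkString(" ");

    // Hypothetical sample data shaped like the question's productDescriptionSource column
    static Dataset<Row> sampleTokens(SparkSession spark) {
        StructType schema = new StructType().add(
            "productDescriptionSource", DataTypes.createArrayType(DataTypes.StringType));
        List<Row> rows = Arrays.asList(
            RowFactory.create(Arrays.asList("3mite", "rb", "cloth", "3", "x", "2", "wd")),
            RowFactory.create(Arrays.asList("trizact", "disc", "cloth", "237aaa16x5", "hole")));
        return spark.createDataFrame(rows, schema);
    }

    // Registers the UDF and applies it through SQL, as approach 2 does
    static Dataset<Row> untokenize(SparkSession spark, Dataset<Row> source) {
        spark.udf().register("unTokenize", UNTOKENIZE, DataTypes.StringType);
        source.createOrReplaceTempView("DataSetOfTokenize");
        return spark.sql(
            "SELECT unTokenize(productDescriptionSource) AS description FROM DataSetOfTokenize");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("UnTokenizeUdf")
            .master("local[*]")
            .getOrCreate();
        untokenize(spark, sampleTokens(spark)).show(4000, false);
        spark.stop();
    }
}
```

A UDF is opaque to Spark's optimizer. When a built-in function already does the job, as `concat_ws` does here, the built-in is usually the better choice.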

Answer

I would use concat_ws:

sqlContext.sql("select *, concat_ws(' ', productDescriptionSource) FROM DataSetOfTokenize"); 

or:

import static org.apache.spark.sql.functions.*; 

df.withColumn("foo", concat_ws(" ", col("productDescriptionSource"))); 
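Here is a minimal self-contained sketch of the `concat_ws` approach, assuming Spark 2.x or later. Passing the existing name `productDescriptionSource` to `withColumn` replaces the array column with a string column instead of adding `foo`, which matches the expected output layout in the question. The class name, the all-string schema (including `upcSource`) and the two sample rows are hypothetical.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.concat_ws;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ConcatWsExample {

    // Hypothetical sample rows; upcSource is modelled as a string here
    static Dataset<Row> sample(SparkSession spark) {
        StructType schema = new StructType()
            .add("ManufacturerSource", DataTypes.StringType)
            .add("upcSource", DataTypes.StringType)
            .add("productDescriptionSource", DataTypes.createArrayType(DataTypes.StringType));
        List<Row> rows = Arrays.asList(
            RowFactory.create("3M", "0", Arrays.asList("3mite", "rb", "cloth", "3", "x", "2", "wd")),
            RowFactory.create("3M", "0",
                Arrays.asList("trizact", "disc", "cloth", "237aaa16x5", "hole")));
        return spark.createDataFrame(rows, schema);
    }

    // concat_ws(" ", arrayCol) joins the array elements with single spaces (null elements are skipped)
    static Dataset<Row> untokenize(Dataset<Row> df) {
        return df.withColumn("productDescriptionSource",
            concat_ws(" ", col("productDescriptionSource")));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("ConcatWsExample")
            .master("local[*]")
            .getOrCreate();
        Dataset<Row> result = untokenize(sample(spark));
        result.printSchema(); // productDescriptionSource is now a string column
        result.show(false);
        spark.stop();
    }
}
```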

Thanks for the reply, it's working...
