Spark 1.6.0 DataFrame self-join problem
I am trying to perform a self-join using the DataFrame Scala API. Here are my code snippets; can you please tell me what is wrong with the first solution?
val df = sqlc.read.json("empMgr.json");
empMgr.json
{"ID": 101, "ename": "Peter", "sal": 24.24, "dept": "11", "country": "US", "doj": "1/12/2017", "mgr": 201}
{"ID": 201, "ename": "John", "sal": 1300, "dept": "232", "country": "IN", "doj": "4/22/2016", "mgr": 111}
{"ID": 301, "ename": "Sam", "dept": "22", "country": "KR", "doj": "5/22/2015", "mgr": 201}
// 1. following is not working
var df_right=df;
df.join(df_right, df("mgr") === df_right("ID")).show()
df.join(df, df("mgr") === df("ID")).show()
/*
* Output:
* +---+-------+----+---+-----+---+---+---+-------+----+---+-----+---+---+
| ID|country|dept|doj|ename|mgr|sal| ID|country|dept|doj|ename|mgr|sal|
+---+-------+----+---+-----+---+---+---+-------+----+---+-----+---+---+
+---+-------+----+---+-----+---+---+---+-------+----+---+-----+---+---+
* */
//2. following works fine
df_right= sqlc.read.json("file:///opt/data/empMgr.json");
df.join(df_right, df("mgr") === df_right("ID")).show()
/*
*Output:
* +---+-------+----+---------+-----+---+-----+---+-------+----+---------+-----+---+------+
| ID|country|dept| doj|ename|mgr| sal| ID|country|dept| doj|ename|mgr| sal|
+---+-------+----+---------+-----+---+-----+---+-------+----+---------+-----+---+------+
|101| US| 11|1/12/2017|Peter|201|24.24|201| IN| 232|4/22/2016| John|111|1300.0|
|301| KR| 22|5/22/2015| Sam|201| null|201| IN| 232|4/22/2016| John|111|1300.0|
+---+-------+----+---------+-----+---+-----+---+-------+----+---------+-----+---+------+
* */
//3. following works fine
df.registerTempTable("empMgr")
sqlc.sql("select b.ename, a.ename as mgr,b.mgr from empMgr a join empMgr b on a.ID=b.mgr").show();
/*
* output
* +-----+----+---+
|ename| mgr|mgr|
+-----+----+---+
|Peter|John|201|
| Sam|John|201|
+-----+----+---+
* */
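For comparison with case 3, where the SQL gives each side of the join its own alias (a and b), here is a minimal sketch of the same idea with the DataFrame API. It assumes Spark 1.6 running in local mode; the object name `SelfJoinSketch`, the helper `employeesWithManagers`, and the aliases `e` and `m` are illustrative names, not part of the original code. The sample records are kept in memory so the sketch does not depend on the file path.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.col

object SelfJoinSketch {
  // Join a DataFrame with itself through two aliases, so that each side
  // of the join condition can be referenced by a qualified column name.
  def employeesWithManagers(df: DataFrame): DataFrame = {
    val e = df.as("e") // employee side (alias name is an assumption)
    val m = df.as("m") // manager side (alias name is an assumption)
    e.join(m, col("e.mgr") === col("m.ID"))
      .select(col("e.ename"), col("m.ename").as("manager"), col("e.mgr"))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SelfJoinSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlc = new SQLContext(sc)

    // The same three records as empMgr.json, one JSON object per string.
    val records = Seq(
      """{"ID": 101, "ename": "Peter", "sal": 24.24, "dept": "11", "country": "US", "doj": "1/12/2017", "mgr": 201}""",
      """{"ID": 201, "ename": "John", "sal": 1300, "dept": "232", "country": "IN", "doj": "4/22/2016", "mgr": 111}""",
      """{"ID": 301, "ename": "Sam", "dept": "22", "country": "KR", "doj": "5/22/2015", "mgr": 201}"""
    )
    val df = sqlc.read.json(sc.parallelize(records))

    employeesWithManagers(df).show()
    sc.stop()
  }
}
```

With aliases, the join condition no longer refers to the same column references on both sides, which is the main difference from case 1, where `df` and `df_right` point to the same DataFrame.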
What is your question? Am I mistaken, or is there an extra line in point 1 that should not be there? Please clarify. – Wilmerton