Spark - Parse a JSON file that contains additional text
My JSON file has many lines, and each line looks like this:
Mon Jan 20 00:00:00 -0800 2014, {"cl":"js","ua":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36","ip":"76.4.253.137","cc":"US","rg":"NV","ct":"North Las Vegas","pc":"89084","mc":839,"bf":"402d6c3bdd18e5b5f6541a98a01ecc47d698420d","vst":"0e1c96ff-1f4a-4279-bfdc-ba3fe51c2a4e","lt":"Sun Jan 19 23:59:59 -0800 2014","hk":["memba","alyson stoner","memba them","member them","member them 80s","missy elliotts","www.tmzmembathem","80s memba then","missy elliott","mini"]},
/extra space added for clarity/
{"v":"1.1","pv":"7963ee21-0d09-4924-b315-ced4adad425f","r":"v3","t":"tmzdtcom","a":[{"i":15,"u":"ll-media.tmz.com/2012/10/03/100312-alyson-stoner-then-480w.jpg","w":523,"h":480,"x":503,"y":651,"lt":"none","af":false}],"rf":"http://www.zergnet.com/news/128786/stars-whove-changed-a-lot-since-you-last-saw-them","p":"www.tmz.com/photos/2007/12/20/740-memba-them/images/2012/10/03/100312-alyson-stoner-then-jpg/","fs":true,"tr":0.7,"ac":{},"vp":{"ii":false,"w":1915,"h":1102},"sc":{"w":1920,"h":1200,"d":1},"pid":239,"vid":1,"ss":"0.5"}
I tried the following:
Method 1:
val value1 = sc.textFile(filename).map(_.substring(32))
val df = sqlContext.read.json(value1)
Here I try to drop the text at the beginning of each line. In this case I get only the first JSON object from each line.
That is:
{"cl":"js","ua":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36","ip":"76.4.253.137","cc":"US","rg":"NV","ct":"North Las Vegas","pc":"89084","mc":839,"bf":"402d6c3bdd18e5b5f6541a98a01ecc47d698420d","vst":"0e1c96ff-1f4a-4279-bfdc-ba3fe51c2a4e","lt":"Sun Jan 19 23:59:59 -0800 2014","hk":["memba","alyson stoner","memba them","member them","member them 80s","missy elliotts","www.tmzmembathem","80s memba then","missy elliott","mini"]}
Method 2:
val df = sqlContext.read.json(sc.wholeTextFiles(filename).values)
In this case the only output I get is a corrupt record.
Can you please tell me what the problem is here and how to parse this kind of file?
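For illustration, here is a minimal sketch of one possible approach. It assumes that every line consists of a timestamp prefix followed by one or more top-level JSON objects separated by commas, as in the sample above. The names `LineSplitter` and `topLevelObjects` are hypothetical and not part of Spark. The helper tracks brace depth and string state, so braces inside string values do not confuse it.

```scala
object LineSplitter {
  /** Returns the top-level JSON objects found in `line`, in order.
    * Anything outside those objects (the timestamp prefix, the commas
    * between objects) is skipped. Assumes the objects are well-formed. */
  def topLevelObjects(line: String): List[String] = {
    val out = scala.collection.mutable.ListBuffer.empty[String]
    var depth = 0          // current nesting level of { }
    var start = -1         // index where the current top-level object began
    var inString = false   // true while inside a JSON string literal
    var escaped = false    // true right after a backslash inside a string
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else if (depth > 0 && c == '"') {
        inString = true
      } else if (c == '{') {
        if (depth == 0) start = i
        depth += 1
      } else if (c == '}' && depth > 0) {
        depth -= 1
        if (depth == 0) out += line.substring(start, i + 1)
      }
      i += 1
    }
    out.toList
  }
}

// Possible use with the code from the question (not run here):
// val objects = sc.textFile(filename).flatMap(LineSplitter.topLevelObjects)
// val df = sqlContext.read.json(objects)
```

Because the two objects on each line have different fields, reading them into a single DataFrame this way would presumably give a merged schema with nulls for the fields a given record lacks. If the objects need to stay paired per line, the helper's result could instead be combined into one wrapper object per line before calling `read.json`.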
May I know what it does?
I recommend trying new things in the console or in your actual code to get a feel for them while you are learning, or at least reading the Scaladoc, but I have updated the answer. – Vidya