Error using Seaborn in Jupyter Notebook (pyspark)
I am trying to visualize data with Seaborn. I created a DataFrame using SQLContext in pyspark. However, when I call lmplot, it raises an error. I am not sure what I am missing. My code is below (I am using Jupyter Notebook):
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.read.load('file:///home/cloudera/Downloads/WA_Sales_Products_2012-14.csv',
                          format='com.databricks.spark.csv',
                          header='true', inferSchema='true')
sns.lmplot(x='Quantity', y='Year', data=df)
Error trace:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-86-2a2b43993475> in <module>()
----> 2 sns.lmplot(x='Quantity', y='Year', data=df)
/home/cloudera/anaconda3/lib/python3.5/site-packages/seaborn/regression.py in lmplot(x, y, data, hue, col, row, palette, col_wrap, size, aspect, markers, sharex, sharey, hue_order, col_order, row_order, legend, legend_out, x_estimator, x_bins, x_ci, scatter, fit_reg, ci, n_boot, units, order, logistic, lowess, robust, logx, x_partial, y_partial, truncate, x_jitter, y_jitter, scatter_kws, line_kws)
557 hue_order=hue_order, size=size, aspect=aspect,
558 col_wrap=col_wrap, sharex=sharex, sharey=sharey,
--> 559 legend_out=legend_out)
560
561 # Add the markers here as FacetGrid has figured out how many levels of the
/home/cloudera/anaconda3/lib/python3.5/site-packages/seaborn/axisgrid.py in __init__(self, data, row, col, hue, col_wrap, sharex, sharey, size, aspect, palette, row_order, col_order, hue_order, hue_kws, dropna, legend_out, despine, margin_titles, xlim, ylim, subplot_kws, gridspec_kws)
255 # Make a boolean mask that is True anywhere there is an NA
256 # value in one of the faceting variables, but only if dropna is True
--> 257 none_na = np.zeros(len(data), np.bool)
258 if dropna:
259 row_na = none_na if row is None else data[row].isnull()
TypeError: object of type 'DataFrame' has no len()
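Reading the trace, the failure seems to come from `len(data)` inside seaborn, so my guess is that seaborn expects something that supports `len()` (such as a pandas DataFrame) and the Spark DataFrame does not. Below is a minimal sketch of that guess. It does not use pyspark at all: the `DataFrame` class and its `to_local_rows` method are hypothetical stand-ins I wrote only to reproduce the same error message.

```python
class DataFrame:
    """Hypothetical stand-in for a Spark DataFrame (NOT the pyspark class).

    Like a distributed DataFrame, it offers count() but defines no __len__.
    """

    def __init__(self, rows):
        self._rows = list(rows)

    def count(self):
        # Row count is available through a method, not through len()
        return len(self._rows)

    def to_local_rows(self):
        # Hypothetical analogue of pulling the data into local memory
        return list(self._rows)


def try_len(obj):
    """Return len(obj), or the TypeError message if len() is unsupported."""
    try:
        return len(obj)
    except TypeError as exc:
        return str(exc)


spark_like = DataFrame([(1, 2012), (3, 2013)])

message = try_len(spark_like)                    # fails: no __len__ defined
local_len = try_len(spark_like.to_local_rows())  # works on a local list

print(message)
print(local_len)
```

If this reading is right, converting the Spark DataFrame to pandas first with `df.toPandas()` and passing the result to `lmplot` may be what I am missing, although I understand that this collects all rows onto the driver, so it would only be suitable for data that fits in memory.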
Any help or pointers are welcome. Thanks in advance :-)