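get_json_object uses off-heap memory. By default, off-heap memory is only max(executorMemory * 0.10, 384 MB), so consider raising it with
--conf "spark.yarn.executor.memoryOverhead=4G" (since Spark 2.3 this setting is named spark.executor.memoryOverhead).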
https://blog.csdn.net/weixin_43267534/article/details/100978755
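To get a feel for how small the default overhead is, here is a minimal sketch of the calculation, assuming the 0.10 factor and the 384 MiB floor that Spark documents as defaults on YARN. The object and function names (MemoryOverheadSketch, defaultOverheadMiB) are hypothetical and not part of Spark, and newer Spark versions may let you change the factor, so treat the numbers as an approximation.

```scala
// Hypothetical helper, not part of Spark: estimates the default executor
// memory overhead from the executor heap size, both in MiB.
object MemoryOverheadSketch {
  // Assumed defaults: 10% of executor memory, with a 384 MiB minimum.
  val OverheadFactor: Double = 0.10
  val MinOverheadMiB: Long = 384L

  def defaultOverheadMiB(executorMemoryMiB: Long): Long =
    math.max((executorMemoryMiB * OverheadFactor).toLong, MinOverheadMiB)

  def main(args: Array[String]): Unit = {
    // Print the estimated default overhead for a few executor sizes.
    Seq(2048L, 4096L, 16384L).foreach { mem =>
      println(s"executor memory = $mem MiB, default overhead = ${defaultOverheadMiB(mem)} MiB")
    }
  }
}
```

Even a 16 GiB executor gets well under 2 GiB of overhead by default under these assumptions, which is why an explicit 4G setting can make a difference for memory-hungry jobs.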
Useful summary: https://www.cnblogs.com/tomato0906/articles/7291178.html
a 1 2 3
b 4 5 6
val df = sc.parallelize(Seq((1, "zhengchu", "tt"), (2, "zc", null), (3, null, null))).toDF("x", "y", "z")