Randomness of hash of string should be disabled via PYTHONHASHSEED

Running a PySpark job under Python 3 on Spark 1.6.2 fails during the shuffle with:

Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/env/spark-client/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/usr/local/env/spark-client/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/local/env/spark-client/python/lib/pyspark.zip/pyspark/serializers.py", line 133, in dump_stream
    for obj in iterator:
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1703, in add_shuffle_key
  File "/usr/local/env/spark-client/python/lib/pyspark.zip/pyspark/rdd.py", line 74, in portable_hash
    raise Exception("Randomness of hash of string should be disabled via PYTHONHASHSEED")
Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:342)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    ... 1 more
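
The root cause: since Python 3.3, string hashing is randomized per interpreter process unless the PYTHONHASHSEED environment variable pins the seed, so the same key can hash to a different partition on each executor. The portable_hash function in pyspark/rdd.py (line 74 in the traceback above) detects this and refuses to shuffle. A minimal demonstration, independent of Spark, assuming PYTHONHASHSEED is not already set in your environment:

import subprocess
import sys

# Each child interpreter draws its own random hash seed, so the same
# string hashes differently in every process -- exactly the inconsistency
# that would scatter equal keys across different Spark partitions.
cmd = [sys.executable, "-c", "print(hash('spark'))"]
print(subprocess.check_output(cmd).decode().strip())
print(subprocess.check_output(cmd).decode().strip())  # almost surely a different number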

Solution (Python 3 + Spark 1.6.2)

Log in to every Spark slave node and run:

echo "export PYTHONHASHSEED=0" >> /root/.bashrc
source /root/.bashrc
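
If editing /root/.bashrc on every node is impractical, the executor side can also be covered per application through Spark's spark.executorEnv.* configuration. A sketch in PySpark (Spark 1.6-era API), assuming you create the context yourself; note that the driver process still needs PYTHONHASHSEED in its own environment before Python starts:

from pyspark import SparkConf, SparkContext

# Export PYTHONHASHSEED=0 into the environment of every executor
# launched for this application ("hashseed-demo" is a placeholder name).
conf = (SparkConf()
        .setAppName("hashseed-demo")
        .set("spark.executorEnv.PYTHONHASHSEED", "0"))
sc = SparkContext(conf=conf)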

Then restart the slaves (run from the master node):

$SPARK_HOME/sbin/stop-slaves.sh   # stop all worker daemons
$SPARK_HOME/sbin/start-slaves.sh  # start them again so they pick up the new environment
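
Once the workers are back up, a quick check is any shuffle over string keys; reduceByKey goes through portable_hash, so it raises the same exception if some worker still lacks the variable:

from pyspark import SparkContext

sc = SparkContext(appName="hashseed-check")  # hypothetical app name
# A shuffle over string keys forces portable_hash on the executors.
pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
print(pairs.reduceByKey(lambda x, y: x + y).collect())  # e.g. [('a', 4), ('b', 2)]
sc.stop()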