PySpark serializer and deserializer testing with a nested and complicated value

Python =(parallelize)=> RDD =(collect)=> Python

It works well.

>>> from pyspark import SparkContext
>>> sc = SparkContext('local', 'test', batchSize=2)
>>> data = [([1, 0], [0.5, 0.499]), ([0, 1], [0.5, 0.499])]
>>> rdd = sc.parallelize(data)
>>> rdd.collect()
[([1, 0], [0.5, 0.499]), ([0, 1], [0.5, 0.499])]
>>> sc.stop()
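
As a baseline, the same nested value can also be round-tripped through PickleSerializer directly, with no RDD involved. This is a minimal sketch of the pickle layer that backs the parallelize/collect path above (with batchSize set, it is wrapped in a BatchedSerializer):

>>> from pyspark.serializers import PickleSerializer
>>> data = [([1, 0], [0.5, 0.499]), ([0, 1], [0.5, 0.499])]
>>> ser = PickleSerializer()
>>> ser.loads(ser.dumps(data)) == data
True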

Python Obj =(_py2java)=> Java Obj =(_java2py)=> Python Obj

It works well.

>>> from pyspark import SparkContext
>>> from pyspark.mllib.common import _py2java, _java2py
>>> sc = SparkContext('local', 'test', batchSize=2)
>>> data = [([1, 0], [0.5, 0.499]), ([0, 1], [0.5, 0.499])]
>>> jobj = _py2java(sc, data)
>>> _java2py(sc, jobj)
[([1, 0], [0.5, 0.499]), ([0, 1], [0.5, 0.499])]
>>> sc.stop()
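
For completeness, _py2java also accepts an RDD (it is converted to a JavaRDD of Java objects through the MLlib SerDe), so the same nested data can be pushed through the JVM and pulled back as a Python RDD. A rough sketch, assuming the private helpers behave as in Spark 2.x; only the element count is asserted here:

>>> from pyspark import SparkContext
>>> from pyspark.mllib.common import _py2java, _java2py
>>> sc = SparkContext('local', 'test', batchSize=2)
>>> rdd = sc.parallelize([([1, 0], [0.5, 0.499]), ([0, 1], [0.5, 0.499])])
>>> jrdd = _py2java(sc, rdd)    # pickled RDD -> JavaRDD of Java objects
>>> _java2py(sc, jrdd).count()  # back to a Python RDD
2
>>> sc.stop()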