DDF vs PySpark DataFrame

Some DDF functions have interfaces similar to those of the PySpark DataFrame, which helps new users who want to migrate to COMPSs.

The following tables show some of these correspondences.

ETL

PySpark DataFrame   DDF
parallelize         parallelize
map                 map
cache               cache
count               count
describe            describe
subtract            subtract
drop                drop
dropna              dropna
dropDuplicates      drop_duplicates
fillna              fillna
filter              filter
groupBy.agg         groupBy.agg
intersect           intersect
intersectAll        intersect_all
join                join
randomSplit         split
replace             replace
sample              sample
select              select
show                show
sort                sort
take                take
toDF                toDF
union               union
unionByName         union_by_name
withColumnRenamed   with_column_renamed
read.text           load_text
write               save
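
When porting a PySpark script, the ETL entries whose spelling changes can be kept in a small lookup. The sketch below is only an illustration: the dictionary `PYSPARK_TO_DDF` and the helper `ddf_name` are hypothetical names, not part of DDF or PySpark, and the mapping covers operation names only. Argument lists may still differ between the two libraries and should be checked individually.

```python
# Hypothetical migration aid: ETL operations from the table above whose
# name differs between PySpark DataFrame and DDF.
PYSPARK_TO_DDF = {
    "dropDuplicates": "drop_duplicates",
    "intersectAll": "intersect_all",
    "randomSplit": "split",
    "unionByName": "union_by_name",
    "withColumnRenamed": "with_column_renamed",
    "read.text": "load_text",
    "write": "save",
}


def ddf_name(pyspark_name: str) -> str:
    """Return the DDF spelling of a PySpark operation listed in the ETL table.

    Names that are not in the dictionary are returned unchanged, which is
    only meaningful for operations the table lists with identical names
    (for example `filter` or `join`).
    """
    return PYSPARK_TO_DDF.get(pyspark_name, pyspark_name)


if __name__ == "__main__":
    for name in ("dropDuplicates", "randomSplit", "filter"):
        print(name, "->", ddf_name(name))
```

A lookup like this is mainly useful for search-and-replace during migration; it says nothing about operations that the table does not list.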

Machine Learning

PySpark DataFrame           DDF
VectorAssembler             VectorAssembler
VectorSlicer                VectorSlicer
NGram                       NGram
TF-IDF                      TF-IDF
CountVectorizer             CountVectorizer
Tokenizer                   Tokenizer
StopWordsRemover            RemoveStopWords
PCA                         PCA
StringIndexer               StringIndexer
IndexToString               IndexToString
StandardScaler              StandardScaler
MaxAbsScaler                MaxAbsScaler
MinMaxScaler                MinMaxScaler
SVMWithSGD                  SVM
LogisticRegressionWithSGD   LogisticRegression
NaiveBayes                  Gaussian Naive Bayes
LinearRegressionWithSGD     LinearRegression
K-means                     K-Means
AssociationRules            AssociationRules