Data Reader
class ddf_library.bases.data_reader.DataReader
Bases: object
static csv(filepath, num_of_parts='*', schema='str', sep=', ', header=True, delimiter=None, na_filter=True, usecols=None, prefix=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, na_values=None, keep_default_na=True, skip_blank_lines=True, parse_dates=False, decimal='.', dayfirst=False, thousands=None, quotechar='"', doublequote=True, escapechar=None, comment=None, encoding='utf-8', error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, float_precision=None)
Reads a CSV file.
Parameters: - filepath –
- num_of_parts –
- schema –
- sep –
- header –
- delimiter –
- na_filter –
- usecols –
- prefix –
- engine –
- converters –
- true_values –
- false_values –
- skipinitialspace –
- na_values –
- keep_default_na –
- skip_blank_lines –
- parse_dates –
- decimal –
- dayfirst –
- thousands –
- quotechar –
- doublequote –
- escapechar –
- comment –
- encoding –
- error_bad_lines –
- warn_bad_lines –
- delim_whitespace –
- float_precision –
Returns:
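The call pattern for this reader can be sketched as follows. This is a minimal sketch, not the library's documented usage: the import path `ddf_library.context`, the helper names `write_sample_csv` and `read_csv_as_ddf`, and the sample data are assumptions made for illustration. The `COMPSsContext().read` entry point is taken from the shapefile example on this page. The read call is wrapped in a function because it needs a running COMPSs runtime.

```python
import csv
import os
import tempfile


def write_sample_csv(path):
    """Write a small comma-separated file with a header row (illustrative data)."""
    rows = [["id", "name", "score"], ["1", "ana", "7.5"], ["2", "bob", "9.0"]]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return rows


def read_csv_as_ddf(path, parts=4):
    """Load `path` through DataReader.csv; requires a running COMPSs runtime."""
    # Assumption: the import path below is not stated in this reference.
    from ddf_library.context import COMPSsContext

    return COMPSsContext().read.csv(
        path,
        num_of_parts=parts,  # the default '*' is described under shapefile
        schema="str",        # default value from the signature
        sep=",",
        header=True,
        encoding="utf-8",
    )


sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
sample_rows = write_sample_csv(sample_path)
# ddf = read_csv_as_ddf(sample_path)  # run inside a COMPSs application
```

Only the keyword arguments that differ from the defaults in the signature need to be passed; the ones shown here are spelled out for readability.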
static json(filepath, num_of_parts='*', schema='str', precise_float=False, encoding='utf-8')
Reads a JSON file.
Parameters: - filepath –
- num_of_parts –
- schema –
- precise_float –
- encoding –
Returns:
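A hedged sketch of calling this reader is shown below. The reference does not say whether the input must hold one JSON object per line or a single array; the sample file written here uses one object per line purely as an assumption. The import path `ddf_library.context` and the helper names `write_sample_json` and `read_json_as_ddf` are also hypothetical.

```python
import json
import os
import tempfile


def write_sample_json(path):
    """Write illustrative records, one JSON object per line (assumed layout)."""
    records = [{"id": 1, "price": 0.1}, {"id": 2, "price": 2.5}]
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return records


def read_json_as_ddf(path, parts=4):
    """Load `path` through DataReader.json; requires a running COMPSs runtime."""
    # Assumption: the import path below is not stated in this reference.
    from ddf_library.context import COMPSsContext

    return COMPSsContext().read.json(
        path,
        num_of_parts=parts,
        schema="str",         # default value from the signature
        precise_float=False,  # default value from the signature
        encoding="utf-8",
    )


json_path = os.path.join(tempfile.mkdtemp(), "sample.json")
json_records = write_sample_json(json_path)
# ddf = read_json_as_ddf(json_path)  # run inside a COMPSs application
```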
static parquet(filepath, num_of_parts='*', columns=None)
Reads a Parquet file.
Parameters: - filepath –
- num_of_parts –
- columns –
Returns:
static shapefile(shp_path, dbf_path, polygon='points', attributes=None, num_of_parts='*', schema='str')
Reads a shapefile from its .shp and .dbf files.
Parameters: - shp_path – Path to the geometry file (.shp);
- dbf_path – Path to the attribute file (.dbf);
- polygon – Name of the new column that stores the polygon coordinates (default: 'points');
- attributes – List of attributes to keep in the DataFrame; leave empty to keep all fields;
- schema – 'infer' to infer the schema; otherwise, provide the dtype;
- num_of_parts – Number of partitions (default: '*', meaning all cores available on the master CPU).
Returns: DDF
Example:
>>> ddf1 = COMPSsContext().read.shapefile(
...     shp_path='hdfs://localhost:9000/shapefile.shp',
...     dbf_path='hdfs://localhost:9000/shapefile.dbf')
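Because the two paths normally differ only in their extension, a small helper can derive one from the other. The sketch below assumes that naming convention holds for your data. `companion_dbf` and `read_shapefile_as_ddf` are hypothetical helpers, not part of ddf_library, and the import path `ddf_library.context` is an assumption.

```python
def companion_dbf(shp_path):
    """Return the .dbf path that sits next to a .shp path (same base name)."""
    if not shp_path.lower().endswith(".shp"):
        raise ValueError("expected a path ending in .shp: %r" % shp_path)
    # Plain string slicing keeps URL schemes such as hdfs:// untouched.
    return shp_path[: -len(".shp")] + ".dbf"


def read_shapefile_as_ddf(shp_path, attributes=None, parts="*"):
    """Load a shapefile through DataReader.shapefile; needs a COMPSs runtime."""
    # Assumption: the import path below is not stated in this reference.
    from ddf_library.context import COMPSsContext

    return COMPSsContext().read.shapefile(
        shp_path=shp_path,
        dbf_path=companion_dbf(shp_path),
        polygon="points",       # name of the coordinates column
        attributes=attributes,  # None keeps the signature default
        num_of_parts=parts,
        schema="infer",         # let the reader infer the dtypes
    )


dbf_path = companion_dbf("hdfs://localhost:9000/shapefile.shp")
# ddf = read_shapefile_as_ddf("hdfs://localhost:9000/shapefile.shp")
```

The helper raises `ValueError` instead of guessing when the input does not end in `.shp`, so a mistyped path fails before any distributed work starts.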
-
static