Data Reader

class ddf_library.bases.data_reader.DataReader

Bases: object

static csv(filepath, num_of_parts='*', schema='str', sep=', ', header=True, delimiter=None, na_filter=True, usecols=None, prefix=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, na_values=None, keep_default_na=True, skip_blank_lines=True, parse_dates=False, decimal='.', dayfirst=False, thousands=None, quotechar='"', doublequote=True, escapechar=None, comment=None, encoding='utf-8', error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, float_precision=None)

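Reads a CSV file. The parsing options from sep onward have the same names as the corresponding pandas.read_csv arguments.

Parameters:
  • filepath – Path to the CSV file
  • num_of_parts – Number of partitions (default, ‘*’, meaning all cores available on the master CPU)
  • schema – ‘infer’ to infer the schema; otherwise, provide the dtype (default, ‘str’)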
  • sep
  • header
  • delimiter
  • na_filter
  • usecols
  • prefix
  • engine
  • converters
  • true_values
  • false_values
  • skipinitialspace
  • na_values
  • keep_default_na
  • skip_blank_lines
  • parse_dates
  • decimal
  • dayfirst
  • thousands
  • quotechar
  • doublequote
  • escapechar
  • comment
  • encoding
  • error_bad_lines
  • warn_bad_lines
  • delim_whitespace
  • float_precision
Returns:

DDF
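The difference between the default schema='str' and schema='infer' can be illustrated with a small standard-library sketch. This is not the ddf_library implementation. read_csv_rows and _convert are hypothetical helpers, and the assumed behavior is that 'str' keeps every field as text while 'infer' converts numeric-looking fields. Note that csv.reader only accepts a single-character delimiter, so the sketch defaults to ','.

```python
import csv
import io


def _convert(value):
    """Try int, then float; otherwise keep the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_csv_rows(text, sep=",", header=True, schema="str"):
    """Hypothetical helper: parse CSV text into a list of dicts.

    Assumed semantics (not the library's): schema='infer' converts
    numeric-looking fields; any other value keeps fields as strings.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if not rows:
        return []
    if header:
        columns, data = rows[0], rows[1:]
    else:
        # Without a header row, number the columns 0, 1, 2, ...
        columns, data = list(range(len(rows[0]))), rows
    result = []
    for row in data:
        if schema == "infer":
            row = [_convert(v) for v in row]
        result.append(dict(zip(columns, row)))
    return result
```

For example, read_csv_rows("a,b\n1,x\n", schema="infer") yields [{'a': 1, 'b': 'x'}], whereas the default schema keeps '1' as a string.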

static json(filepath, num_of_parts='*', schema='str', precise_float=False, encoding='utf-8')

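Reads a JSON file.

Parameters:
  • filepath – Path to the JSON file
  • num_of_parts – Number of partitions (default, ‘*’, meaning all cores available on the master CPU)
  • schema – ‘infer’ to infer the schema; otherwise, provide the dtype (default, ‘str’)
  • precise_float
  • encoding – File encoding (default, ‘utf-8’)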
Returns:

DDF

static parquet(filepath, num_of_parts='*', columns=None)

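Reads a Parquet file.

Parameters:
  • filepath – Path to the Parquet file
  • num_of_parts – Number of partitions (default, ‘*’, meaning all cores available on the master CPU)
  • columns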
Returns:

DDF

static shapefile(shp_path, dbf_path, polygon='points', attributes=None, num_of_parts='*', schema='str')

Reads a shapefile from its .shp (geometry) and .dbf (attribute) files.

Parameters:
  • shp_path – Path to the shapefile geometry file (.shp)
  • dbf_path – Path to the shapefile attribute file (.dbf)
  • polygon – Name of the new column that stores the polygon coordinates (default, ‘points’)
  • attributes – List of attributes to keep in the DataFrame; leave empty to keep all fields
  • schema – ‘infer’ to infer the schema; otherwise, provide the dtype (default, ‘str’)
  • num_of_parts – Number of partitions (default, ‘*’, meaning all cores available on the master CPU)
Returns:

DDF
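A rough sketch of how the polygon and attributes parameters could shape each output row is shown below. It assumes the .dbf records and .shp geometries have already been parsed into plain Python lists, one entry per feature. build_shapefile_rows is a hypothetical helper for illustration and does not describe how ddf_library parses shapefiles.

```python
def build_shapefile_rows(records, shapes, polygon="points", attributes=None):
    """Hypothetical helper: join attribute records with their geometries.

    records    -- list of dicts, one per feature (stand-in for .dbf fields)
    shapes     -- list of coordinate lists, one per feature (stand-in for .shp)
    polygon    -- name of the new column that stores the coordinates
    attributes -- fields to keep; None or an empty list keeps every field
    """
    if len(records) != len(shapes):
        raise ValueError("each record needs exactly one shape")
    rows = []
    for record, coords in zip(records, shapes):
        if attributes:
            # Keep only the requested attribute columns.
            row = {name: record[name] for name in attributes}
        else:
            row = dict(record)
        # Store the coordinates under the column named by `polygon`.
        row[polygon] = list(coords)
        rows.append(row)
    return rows
```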

Example:
>>> ddf1 = COMPSsContext().read.shapefile(shp_path='hdfs://localhost:9000/shapefile.shp',
...                                       dbf_path='hdfs://localhost:9000/shapefile.dbf')
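Every reader accepts num_of_parts, where ‘*’ means one partition per available core. The sketch below shows one plausible way to resolve that value and split rows into contiguous, nearly equal partitions. It assumes os.cpu_count() on the local machine stands in for the "master CPU". resolve_num_of_parts and split_into_partitions are hypothetical helpers and are not part of ddf_library.

```python
import os


def resolve_num_of_parts(num_of_parts="*"):
    """Hypothetical helper: '*' means one partition per available core."""
    if num_of_parts == "*":
        # os.cpu_count() can return None, so fall back to one partition.
        return os.cpu_count() or 1
    n = int(num_of_parts)
    if n < 1:
        raise ValueError("num_of_parts must be >= 1")
    return n


def split_into_partitions(rows, num_of_parts="*"):
    """Split rows into contiguous partitions whose sizes differ by at most one."""
    n = resolve_num_of_parts(num_of_parts)
    size, extra = divmod(len(rows), n)
    parts, start = [], 0
    for i in range(n):
        # The first `extra` partitions each take one additional row.
        end = start + size + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts
```

For instance, split_into_partitions(list(range(10)), 3) gives [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]].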