api2db.ingest.data_feature package

Submodules

api2db.ingest.data_feature.feature module

Contains the Feature class

Summary of Feature Usage:

data = [{"id": 1, "name": "Foo", "nest0": {"nest1": {"x": True}, "y": 14.3}}, ...]
data_features = [
    Feature(key="uuid", lam=lambda x: x["id"], dtype=int),      # Extracts "id" and renames it to "uuid"
    Feature(key="name", lam=lambda x: x["name"], dtype=str),    # Extracts "name", keeping the key as "name"
    Feature(key="x", lam=lambda x: x["nest0"]["nest1"]["x"], dtype=bool),   # Extracts the nested "x"
    Feature(key="y", lam=lambda x: x["nest0"]["y"], dtype=float)            # Extracts the nested "y" (14.3 is a float)
]
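
To make the summary concrete, the following is a minimal sketch in plain Python of what each lam returns for the sample row. It does not import api2db; the specs list of (key, lam, dtype) tuples and the flat dictionary are hypothetical stand-ins for illustration, not the library's internals.

```python
# One sample row shaped like the data in the summary.
row = {"id": 1, "name": "Foo", "nest0": {"nest1": {"x": True}, "y": 14.3}}

# Hypothetical (key, lam, dtype) tuples mirroring the Feature arguments.
specs = [
    ("uuid", lambda x: x["id"], int),
    ("name", lambda x: x["name"], str),
    ("x", lambda x: x["nest0"]["nest1"]["x"], bool),
    ("y", lambda x: x["nest0"]["y"], float),
]

# Apply each lambda to the row and cast the result, giving one flat record.
flat = {key: dtype(lam(row)) for key, lam, dtype in specs}
print(flat)  # {'uuid': 1, 'name': 'Foo', 'x': True, 'y': 14.3}
```
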
class api2db.ingest.data_feature.feature.Feature(key: str, lam: Callable[[dict], Any], dtype: Any, nan_int: Optional[int] = None, nan_float: Optional[float] = None, nan_bool: Optional[bool] = False, nan_str: Optional[str] = None)

Bases: api2db.ingest.base_lam.BaseLam

Used to extract a data-feature from incoming data

__init__(key: str, lam: Callable[[dict], Any], dtype: Any, nan_int: Optional[int] = None, nan_float: Optional[float] = None, nan_bool: Optional[bool] = False, nan_str: Optional[str] = None)

Creates a Feature object

Note

All values default to nulling data that cannot be type-cast to its expected type. In most cases this is the programmer's desired effect. If the data can be cleaned so that it is not nulled, that should be done using the library's pre-processing tools. If the data cannot be cleaned in pre-processing and cannot be type-cast to its expected type, then it is by definition worthless. If it is possible to clean it, it can be cleaned in pre-processing, although it may require the programmer to subclass Pre

Parameters
  • key – The name of the column that will be stored in the storage target

  • lam – A function that takes a dictionary as its parameter and returns the value found at the location where the desired data is expected to be. api2db handles null data and unexpected data types automatically

  • dtype – The python native type of the data feature

  • nan_int – If specified and dtype is int, this value replaces null values and values that fail to be cast to type int

  • nan_float – If specified and dtype is float, this value replaces null values and values that fail to be cast to type float

  • nan_bool – If specified and dtype is bool, this value replaces null values and values that fail to be cast to type bool

  • nan_str – If specified and dtype is str, this value replaces null values and values that fail to be cast to type str
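
The replacement behaviour described for the nan_* parameters can be illustrated with a small sketch. The helper cast_or_default below is hypothetical and is not part of api2db; it only shows the general idea of falling back to a chosen value when the input is null or cannot be cast.

```python
from typing import Any, Optional


def cast_or_default(value: Any, dtype: type, default: Optional[Any] = None) -> Any:
    """Hypothetical helper: cast value to dtype, or fall back to default."""
    if value is None:
        # Null input is replaced by the fallback value.
        return default
    try:
        return dtype(value)
    except (TypeError, ValueError):
        # A failed cast is also replaced by the fallback value.
        return default


print(cast_or_default("42", int, default=-1))     # 42
print(cast_or_default("abc", int, default=-1))    # -1
print(cast_or_default(None, float, default=0.0))  # 0.0
print(cast_or_default("abc", int))                # None
```

With no fallback supplied, the failed cast yields None, which matches the note that uncastable data is nulled by default.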

lam_wrap(data: dict) → Any

Overrides super class method

Extracts a feature from incoming data

Workflow:

  1. Attempt to call lam on data to get data-feature

  2. Attempt to typecast result to dtype

  3. If dtype is str and result.lower() is “none”, “nan”, “null”, or “nil”, replace the result with nan_str

  4. If an exception occurs when attempting any of the above, set the result to None

  5. Return the result

Parameters

data – A dictionary of incoming data representing a single row in a DataFrame

Returns

The extracted data-feature
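
The five workflow steps listed for lam_wrap can be sketched as a standalone function. This is an illustration written from the description only, not the library's source; the names extract_feature and NULL_STRINGS are assumptions introduced for the example.

```python
from typing import Any, Callable, Optional

# Lower-cased strings treated as null, per step 3 of the workflow.
NULL_STRINGS = {"none", "nan", "null", "nil"}


def extract_feature(
    data: dict,
    lam: Callable[[dict], Any],
    dtype: type,
    nan_str: Optional[str] = None,
) -> Any:
    """Hypothetical stand-in for the documented lam_wrap workflow."""
    try:
        result = dtype(lam(data))  # steps 1 and 2: extract, then typecast
        if dtype is str and result.lower() in NULL_STRINGS:
            result = nan_str       # step 3: null-like strings become nan_str
    except Exception:
        result = None              # step 4: any failure nulls the result
    return result                  # step 5


print(extract_feature({"id": "7"}, lambda x: x["id"], int))      # 7
print(extract_feature({}, lambda x: x["id"], int))               # None
print(extract_feature({"name": "NULL"}, lambda x: x["name"], str, nan_str="unknown"))  # unknown
```

Step 3 matters because str(None) produces the string "None" in Python, so a null value cast to str would otherwise be stored as text rather than being treated as missing.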

Module contents

Original Author

Tristen Harr

Creation Date

04/29/2021

Revisions

None