api2db.ingest.data_feature package

Submodules

api2db.ingest.data_feature.feature module

Contains the Feature class

Summary of Feature Usage:

from api2db.ingest.data_feature.feature import Feature

data = [{"id": 1, "name": "Foo", "nest0": {"nest1": {"x": True}, "y": 14.3}}, ...]
data_features = [
    Feature(key="uuid", lam=lambda x: x["id"], dtype=int),      # Extracts "id" and renames it to "uuid"
    Feature(key="name", lam=lambda x: x["name"], dtype=str),    # Extracts "name", keeping the key as "name"
    Feature(key="x", lam=lambda x: x["nest0"]["nest1"]["x"], dtype=bool),   # Extracts the nested boolean "x"
    Feature(key="y", lam=lambda x: x["nest0"]["y"], dtype=float)            # Extracts the nested float "y"
]
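
To make the effect of these features concrete, the following is a minimal sketch that applies the same key, lambda, and dtype triples by hand, without importing api2db. The name feature_specs and the use of plain tuples are assumptions made for illustration; the library itself wraps this logic in Feature objects and adds null handling that is not shown here.

```python
# Stand-in for the Feature list above: (key, lam, dtype) tuples.
# This is an illustrative sketch, not the api2db implementation.
data = [{"id": 1, "name": "Foo", "nest0": {"nest1": {"x": True}, "y": 14.3}}]

feature_specs = [
    ("uuid", lambda x: x["id"], int),                     # "id" is stored under the key "uuid"
    ("name", lambda x: x["name"], str),                   # "name" keeps its key
    ("x", lambda x: x["nest0"]["nest1"]["x"], bool),      # nested boolean
    ("y", lambda x: x["nest0"]["y"], float),              # nested float
]

# Each incoming record is flattened into one row keyed by the feature names.
rows = [
    {key: dtype(lam(record)) for key, lam, dtype in feature_specs}
    for record in data
]

print(rows)  # [{'uuid': 1, 'name': 'Foo', 'x': True, 'y': 14.3}]
```

Each nested record becomes a flat row whose keys are the feature names, which is the shape expected for a single DataFrame row.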

class api2db.ingest.data_feature.feature.Feature(key: str, lam: Callable[[dict], Any], dtype: Any, nan_int: Optional[int] = None, nan_float: Optional[float] = None, nan_bool: Optional[bool] = False, nan_str: Optional[str] = None)

Bases: api2db.ingest.base_lam.BaseLam

Used to extract a data-feature from incoming data

__init__(key: str, lam: Callable[[dict], Any], dtype: Any, nan_int: Optional[int] = None, nan_float: Optional[float] = None, nan_bool: Optional[bool] = False, nan_str: Optional[str] = None)

Creates a Feature object

Note

By default, any value that cannot be cast to its expected type is nulled. In most cases this is the behavior the programmer wants. If the data can be cleaned so that it is not nulled, that should be done using the library's pre-processing tools. If the data cannot be cleaned in pre-processing and cannot be cast to its expected type, then it is by definition worthless. If it is possible to clean it, it can be cleaned in pre-processing, although it may require the programmer to subclass Pre

Parameters

key – The name of the column that will be stored in the storage target
lam – Function that takes a dictionary as its parameter and returns the value found where the desired data is expected to be. api2db handles null data and unexpected data types automatically
dtype – The native Python type of the data feature
nan_int – If specified and dtype is int, this value replaces null values and values that fail to be cast to int
nan_float – If specified and dtype is float, this value replaces null values and values that fail to be cast to float
nan_bool – If specified and dtype is bool, this value replaces null values and values that fail to be cast to bool
nan_str – If specified and dtype is str, this value replaces null values and values that fail to be cast to str
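
The replacement behavior described for the nan_* parameters can be pictured with a small stand-alone helper. This is a sketch under stated assumptions: cast_or_default is a hypothetical name, not part of api2db, and it models only the documented rule that null values and failed casts are replaced by the configured value.

```python
from typing import Any, Callable


def cast_or_default(value: Any, dtype: Callable[[Any], Any], default: Any = None) -> Any:
    """Hypothetical helper: cast value to dtype, falling back to default.

    Mirrors the documented nan_* rule: null values and values that
    cannot be cast are replaced by the configured fallback.
    """
    if value is None:
        return default          # null input is replaced
    try:
        return dtype(value)     # normal case: the cast succeeds
    except (TypeError, ValueError):
        return default          # failed cast is replaced


good = cast_or_default("12", int, default=-1)         # cast succeeds
bad = cast_or_default("abc", int, default=-1)         # cast fails, fallback used
empty = cast_or_default(None, float, default=0.0)     # null input, fallback used
unset = cast_or_default("abc", int)                   # no fallback given, result is None
```

With the real class, the analogous configuration would pass the fallback through the matching parameter from the signature above, for example nan_int=-1 together with dtype=int.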

lam_wrap(data: dict) → Any

Overrides super class method

Extracts a feature from incoming data

Workflow:

Attempt to call lam on data to get data-feature

Attempt to typecast result to dtype

If dtype is str and result.lower() is “none”, “nan”, “null”, or “nil”, replace the result with nan_str

If an exception occurs when attempting any of the above, set the result to None

Return the result

Parameters: data – A dictionary of incoming data representing a single row in a DataFrame
Returns: The extracted data-feature
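
The workflow above can be sketched in plain Python as follows. This is not the library's source: extract_feature_sketch and NULL_STRINGS are hypothetical names, and the sketch follows only the five listed steps, so the nan_int, nan_float, and nan_bool replacements are deliberately left out.

```python
from typing import Any, Callable, Optional

# String spellings treated as null when dtype is str (from the workflow above).
NULL_STRINGS = {"none", "nan", "null", "nil"}


def extract_feature_sketch(
    data: dict,
    lam: Callable[[dict], Any],
    dtype: type,
    nan_str: Optional[str] = None,
) -> Any:
    """Illustrative re-creation of the documented lam_wrap steps."""
    try:
        result = lam(data)                      # 1. call lam on the incoming data
        result = dtype(result)                  # 2. cast the result to dtype
        if dtype is str and result.lower() in NULL_STRINGS:
            result = nan_str                    # 3. null-like strings become nan_str
    except Exception:
        result = None                           # 4. any failure nulls the result
    return result                               # 5. return the result


row = {"id": "7", "name": "NULL", "nest0": {"y": "14.3"}}

uuid = extract_feature_sketch(row, lambda x: x["id"], int)
name = extract_feature_sketch(row, lambda x: x["name"], str, nan_str="unknown")
y = extract_feature_sketch(row, lambda x: x["nest0"]["y"], float)
missing = extract_feature_sketch(row, lambda x: x["absent"], int)
```

Here uuid is cast from the string "7" to the integer 7, name is replaced by "unknown" because "NULL" lowercases to a null-like spelling, and missing is None because the lambda raises a KeyError.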

Module contents

Original Author	Tristen Harr
Creation Date	04/29/2021
Revisions	None