api2db.ingest.data_feature package
Submodules
api2db.ingest.data_feature.feature module
Contains the Feature class
Summary of Feature Usage:
from api2db.ingest.data_feature.feature import Feature

data = [{"id": 1, "name": "Foo", "nest0": {"nest1": {"x": True}, "y": 14.3}}, ...]
data_features = [
    Feature(key="uuid", lam=lambda x: x["id"], dtype=int),  # Extracts "id" and renames it to "uuid"
    Feature(key="name", lam=lambda x: x["name"], dtype=str),  # Extracts "name", keeping the key as "name"
    Feature(key="x", lam=lambda x: x["nest0"]["nest1"]["x"], dtype=bool),  # Extracts the nested "x"
    Feature(key="y", lam=lambda x: x["nest0"]["y"], dtype=float),  # Extracts the nested "y", which is a float
]
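To see what each lam yields, the same lambdas can be applied to a sample record with plain Python. This is only an illustration: it does not import api2db, it skips the type casting and null handling that Feature performs, and the names `record`, `extractors`, and `row` are hypothetical.

```python
# Plain-Python illustration; api2db itself is not used here.
record = {"id": 1, "name": "Foo", "nest0": {"nest1": {"x": True}, "y": 14.3}}

# Map each output column name (the Feature key) to its extraction lambda.
extractors = {
    "uuid": lambda x: x["id"],
    "name": lambda x: x["name"],
    "x": lambda x: x["nest0"]["nest1"]["x"],
    "y": lambda x: x["nest0"]["y"],
}

# Build one flat row: nested values are pulled up to the top level,
# and the value found under "id" is stored under the new key "uuid".
row = {key: lam(record) for key, lam in extractors.items()}
print(row)
```

The key decides the column name and the lam decides where the value comes from, so renaming and flattening are both expressed in the same declaration.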
class api2db.ingest.data_feature.feature.Feature(key: str, lam: Callable[[dict], Any], dtype: Any, nan_int: Optional[int] = None, nan_float: Optional[float] = None, nan_bool: Optional[bool] = False, nan_str: Optional[str] = None)
Bases: api2db.ingest.base_lam.BaseLam
Used to extract a data-feature from incoming data
__init__(key: str, lam: Callable[[dict], Any], dtype: Any, nan_int: Optional[int] = None, nan_float: Optional[float] = None, nan_bool: Optional[bool] = False, nan_str: Optional[str] = None)
Creates a Feature object
Note
By default, any value that cannot be type-cast to its expected type is nulled. In most cases this is what the programmer wants. If the data can be cleaned so that it is not nulled, that should be done with the library's pre-processing tools. Data that cannot be cleaned in pre-processing and cannot be type-cast to its expected type is by definition worthless. Data that can be cleaned can be cleaned in pre-processing, although this may require the programmer to subclass
Pre
- Parameters
key – The name of the column that will be stored in the storage target
lam – Function that takes a dictionary as its parameter and returns the value at the location where the data the programmer wants should be. api2db handles null data and unexpected data types automatically
dtype – The python native type of the data feature
nan_int – If specified and dtype is int, this value will be used to replace null values and values that fail to be cast to type int
nan_float – If specified and dtype is float, this value will be used to replace null values and values that fail to be cast to type float
nan_bool – If specified and dtype is bool, this value will be used to replace null values and values that fail to be cast to type bool
nan_str – If specified and dtype is str, this value will be used to replace null values and values that fail to be cast to type str
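The replacement rule described by the nan_* parameters can be sketched in plain Python. This is one reading of the parameter descriptions above, not api2db's actual code: the helpers `pick_fallback` and `cast_or_fallback` are hypothetical, and only the parameter names and defaults are taken from the signature.

```python
from typing import Any, Optional


def pick_fallback(dtype: Any,
                  nan_int: Optional[int] = None,
                  nan_float: Optional[float] = None,
                  nan_bool: Optional[bool] = False,
                  nan_str: Optional[str] = None) -> Any:
    """Hypothetical helper: return the replacement value that matches dtype."""
    fallbacks = {int: nan_int, float: nan_float, bool: nan_bool, str: nan_str}
    # Any dtype without a nan_* parameter falls back to None.
    return fallbacks.get(dtype)


def cast_or_fallback(value: Any, dtype: Any, **nan_values: Any) -> Any:
    """Hypothetical helper: cast value to dtype, or use the matching fallback."""
    fallback = pick_fallback(dtype, **nan_values)
    if value is None:
        # Null values are replaced directly.
        return fallback
    try:
        return dtype(value)
    except (TypeError, ValueError):
        # Values that fail to cast are replaced as well.
        return fallback


print(cast_or_fallback("12", int))               # casts cleanly
print(cast_or_fallback("abc", int, nan_int=-1))  # cast fails, nan_int is used
```

Only the nan_* value whose type matches dtype has any effect in this sketch, so passing nan_int to a float feature changes nothing.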
lam_wrap(data: dict) → Any
Overrides super class method
Extracts a feature from incoming data
Workflow:
1. Attempt to call lam on data to get the data-feature
2. Attempt to typecast the result to dtype
3. If dtype is str and result.lower() is "none", "nan", "null", or "nil", replace it with nan_str
4. If an exception occurs when attempting any of the above, set the result to None
5. Return the result
- Parameters
data – A dictionary of incoming data representing a single row in a DataFrame
- Returns
The extracted data-feature
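The workflow listed for lam_wrap can be sketched as a standalone function. This is a minimal sketch that follows the listed steps literally, not the library's implementation: `lam_wrap_sketch` and `NULL_STRINGS` are hypothetical names, and lam, dtype, and nan_str are passed as arguments here rather than read from a Feature instance.

```python
from typing import Any, Callable, Optional

# String spellings of "no value" named in the workflow (compared lower-cased).
NULL_STRINGS = {"none", "nan", "null", "nil"}


def lam_wrap_sketch(data: dict,
                    lam: Callable[[dict], Any],
                    dtype: Any,
                    nan_str: Optional[str] = None) -> Any:
    """Hypothetical stand-in that mirrors the documented workflow."""
    try:
        result = lam(data)      # call lam on data to get the data-feature
        result = dtype(result)  # typecast the result to dtype
        if dtype is str and result.lower() in NULL_STRINGS:
            result = nan_str    # replace string spellings of null
    except Exception:
        result = None           # any failure above nulls the result
    return result


print(lam_wrap_sketch({"id": "7"}, lambda x: x["id"], int))
print(lam_wrap_sketch({}, lambda x: x["id"], int))
print(lam_wrap_sketch({"name": "NULL"}, lambda x: x["name"], str, nan_str="unknown"))
```

In this sketch a lam that returns None with dtype str is first cast to the string "None" and then replaced by nan_str, which is one reason the string check exists alongside the exception handling.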
Module contents
Original Author | Tristen Harr
Creation Date | 04/29/2021
Revisions | None