api2db.ingest.post_process package
Submodules
api2db.ingest.post_process.column_add module
Contains the ColumnAdd class
Summary of ColumnAdd Usage:
DataFrame df
Foo | Bar |
---|---|
1 | A |
2 | B |
3 | C |
post = ColumnAdd(key="FooBar", lam=lambda: 5, dtype=int)
DataFrame df
Foo | Bar | FooBar |
---|---|---|
1 | A | 5 |
2 | B | 5 |
3 | C | 5 |
Example Usage of ColumnAdd:
>>> import pandas as pd
>>> from api2db.ingest.post_process.column_add import ColumnAdd
>>> def f():
...     return 5
>>> df = pd.DataFrame({"Foo": [1, 2, 3], "Bar": ["A", "B", "C"]})  # Setup
>>> post = ColumnAdd(key="FooBar", lam=f, dtype=int)
>>> post.lam_wrap(df)
pd.DataFrame({"Foo": [1, 2, 3], "Bar": ["A", "B", "C"], "FooBar": [5, 5, 5]})
-
class api2db.ingest.post_process.column_add.ColumnAdd(key: str, lam: Callable[[], Any], dtype: Any)
Bases: api2db.ingest.post_process.post.Post
Used to add a column holding the same (global) value in every row of a DataFrame, primarily for timestamps and IDs
-
__init__(key: str, lam: Callable[[], Any], dtype: Any)
Creates a ColumnAdd object
- Parameters
key – The name of the column to add to the DataFrame
lam – A zero-argument function whose return value is placed in every row of the key column
dtype – The Python native type of the function's return value
-
ctype
Type of the data processor
- Type
str
-
lam_wrap(lam_arg: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Overrides the superclass method
Workflow:
1. Assign the return value of lam to lam_arg[self.key]
2. Typecast lam_arg[self.key] to dtype
3. Return lam_arg
- Parameters
lam_arg – The DataFrame to add a column to
- Returns
The modified DataFrame
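The lam_wrap workflow above can be sketched in plain pandas. This is a minimal, hypothetical stand-in (column_add_sketch is not part of api2db), and it assumes the typecast step behaves like Series.astype(dtype), whereas the real class goes through Post.typecast.

```python
import time
from typing import Any, Callable

import pandas as pd


def column_add_sketch(
    lam_arg: pd.DataFrame, key: str, lam: Callable[[], Any], dtype: Any
) -> pd.DataFrame:
    """Hypothetical stand-in for ColumnAdd.lam_wrap; not the api2db source."""
    # 1. Call lam once; pandas broadcasts the scalar result to every row.
    lam_arg[key] = lam()
    # 2. Typecast the new column (astype stands in for Post.typecast here).
    lam_arg[key] = lam_arg[key].astype(dtype)
    # 3. Return the same, now-modified DataFrame.
    return lam_arg


df = pd.DataFrame({"Foo": [1, 2, 3], "Bar": ["A", "B", "C"]})
# Typical use case: stamp every row of a batch with one ingestion time.
out = column_add_sketch(df, key="ingested_at", lam=lambda: int(time.time()), dtype=int)
print(out)
```

In this sketch lam is called once per DataFrame rather than once per row, so every row in the batch receives the same value, which is what makes it suitable for batch timestamps or IDs.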
-
api2db.ingest.post_process.column_apply module
Contains the ColumnApply class
Summary of ColumnApply Usage:
DataFrame df
Foo | Bar |
---|---|
1 | A |
2 | B |
3 | C |
post = ColumnApply(key="Foo", lam=lambda x: x + 1, dtype=int)
DataFrame df
Foo | Bar |
---|---|
2 | A |
3 | B |
4 | C |
Example Usage of ColumnApply:
>>> import pandas as pd
>>> from api2db.ingest.post_process.column_apply import ColumnApply
>>> df = pd.DataFrame({"Foo": [1, 2, 3], "Bar": ["A", "B", "C"]})  # Setup
>>> post = ColumnApply(key="Foo", lam=lambda x: x + 1, dtype=int)
>>> post.lam_wrap(df)
pd.DataFrame({"Foo": [2, 3, 4], "Bar": ["A", "B", "C"]})
-
class api2db.ingest.post_process.column_apply.ColumnApply(key: str, lam: Callable[[Any], Any], dtype: Any)
Bases: api2db.ingest.post_process.post.Post
Used to apply a function to each value in a column of a DataFrame
-
__init__(key: str, lam: Callable[[Any], Any], dtype: Any)
Creates a ColumnApply object
- Parameters
key – The name of the column to apply the function to
lam – The function to apply to each value in the column
dtype – The Python native type of the function's output
-
ctype
Type of data processor
- Type
str
-
lam_wrap(lam_arg: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Overrides the superclass method
Workflow:
1. Apply lam to each value in lam_arg[self.key]
2. Cast lam_arg[self.key] to dtype
3. Return lam_arg
- Parameters
lam_arg – The DataFrame to modify
- Returns
The modified DataFrame
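The element-wise workflow above can be sketched in plain pandas. This is a minimal, hypothetical stand-in (column_apply_sketch is not part of api2db); it assumes "apply" means Series.apply and that casting behaves like Series.astype(dtype).

```python
from typing import Any, Callable

import pandas as pd


def column_apply_sketch(
    lam_arg: pd.DataFrame, key: str, lam: Callable[[Any], Any], dtype: Any
) -> pd.DataFrame:
    """Hypothetical stand-in for ColumnApply.lam_wrap; not the api2db source."""
    # 1. Apply lam to each value of the column (Series.apply is element-wise).
    lam_arg[key] = lam_arg[key].apply(lam)
    # 2. Cast the column (astype stands in for Post.typecast here).
    lam_arg[key] = lam_arg[key].astype(dtype)
    # 3. Return the modified DataFrame.
    return lam_arg


df = pd.DataFrame({"Foo": [1, 2, 3], "Bar": ["A", "B", "C"]})
out = column_apply_sketch(df, key="Foo", lam=lambda x: x * 10, dtype=float)
print(out)
```

Only the column named by key is touched; every other column passes through unchanged.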
-
api2db.ingest.post_process.columns_calculate module
Contains the ColumnsCalculate class
Note
ColumnsCalculate can be used to:
- Replace columns in a DataFrame with calculated values
- Add new columns to a DataFrame based on calculations from existing columns
Summary of ColumnsCalculate Usage:
DataFrame df
Foo | Bar |
---|---|
1 | 2 |
2 | 4 |
3 | 8 |
def foobar(df):
df["Foo+Bar"] = df["Foo"] + df["Bar"]
df["Foo*Bar"] = df["Foo"] * df["Bar"]
return df[["Foo+Bar", "Foo*Bar"]]
post = ColumnsCalculate(keys=["Foo+Bar", "Foo*Bar"], lam=lambda x: foobar(x), dtypes=[int, int])
DataFrame df
Foo | Bar | Foo+Bar | Foo*Bar |
---|---|---|---|
1 | 2 | 3 | 2 |
2 | 4 | 6 | 8 |
3 | 8 | 11 | 24 |
Example Usage of ColumnsCalculate:
>>> import pandas as pd
>>> from api2db.ingest.post_process.columns_calculate import ColumnsCalculate
>>> df = pd.DataFrame({"Foo": [1, 2, 3], "Bar": [2, 4, 8]})  # Setup
>>> def foobar(d):
...     d["Foo+Bar"] = d["Foo"] + d["Bar"]
...     d["Foo*Bar"] = d["Foo"] * d["Bar"]
...     return d[["Foo+Bar", "Foo*Bar"]]
>>> post = ColumnsCalculate(keys=["Foo+Bar", "Foo*Bar"], lam=lambda x: foobar(x), dtypes=[int, int])
>>> post.lam_wrap(df)
pd.DataFrame({"Foo": [1, 2, 3], "Bar": [2, 4, 8], "Foo+Bar": [3, 6, 11], "Foo*Bar": [2, 8, 24]})
-
class api2db.ingest.post_process.columns_calculate.ColumnsCalculate(keys: List[str], lam: Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame], dtypes: List[Any])
Bases: api2db.ingest.post_process.post.Post
Used to calculate column values from a DataFrame and add them to it, replacing any existing columns with the same names
-
__init__(keys: List[str], lam: Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame], dtypes: List[Any])
Creates a ColumnsCalculate object
- Parameters
keys – A list of the keys (column names) to add to or replace in the existing DataFrame
lam – A function that takes a DataFrame as its parameter and returns a DataFrame whose column names match keys and whose columns are of, or can be cast to, dtypes
dtypes – A list of Python native types, one for each entry in keys
-
ctype
Type of data processor
- Type
str
-
lam_wrap(lam_arg: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Overrides the superclass method
Workflow:
1. Create a temporary DataFrame tmp_df by applying lam to lam_arg
2. For each key in self.keys, set lam_arg[key] = tmp_df[key]
3. For each key in self.keys, cast lam_arg[key] to the appropriate pandas dtype
4. Return lam_arg
- Parameters
lam_arg – The DataFrame to modify
- Returns
The modified DataFrame
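The four workflow steps above can be sketched in plain pandas. This is a minimal, hypothetical stand-in (columns_calculate_sketch is not part of api2db); pairing keys with dtypes by position via zip, and casting with Series.astype, are assumptions.

```python
from typing import Any, Callable, List

import pandas as pd


def columns_calculate_sketch(
    lam_arg: pd.DataFrame,
    keys: List[str],
    lam: Callable[[pd.DataFrame], pd.DataFrame],
    dtypes: List[Any],
) -> pd.DataFrame:
    """Hypothetical stand-in for ColumnsCalculate.lam_wrap; not the api2db source."""
    # 1. Compute every new column in one call, before anything is overwritten.
    tmp_df = lam(lam_arg)
    # 2. Copy each calculated column into the input (adds new or replaces existing).
    for key in keys:
        lam_arg[key] = tmp_df[key]
    # 3. Cast each column; pairing keys with dtypes by position is an assumption.
    for key, dtype in zip(keys, dtypes):
        lam_arg[key] = lam_arg[key].astype(dtype)
    # 4. Return the modified DataFrame.
    return lam_arg


df = pd.DataFrame({"Foo": [1, 2, 3], "Bar": [2, 4, 8]})
out = columns_calculate_sketch(
    df,
    keys=["Foo", "Half"],
    # Replaces "Foo" with doubled values and adds a new "Half" column.
    lam=lambda d: pd.DataFrame({"Foo": d["Foo"] * 2, "Half": d["Bar"] / 2}),
    dtypes=[int, float],
)
print(out)
```

Because tmp_df is computed before any assignment, lam always sees the original values, so a key can both replace an existing column and feed another calculation in the same call.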
-
api2db.ingest.post_process.date_cast module
Contains the DateCast class
Summary of DateCast Usage:
DataFrame df
Foo | Bar |
---|---|
2021-04-29 01:39:00 | False |
2021-04-29 01:39:00 | False |
Bar! | True |
DataFrame df.dtypes
Foo | Bar |
---|---|
string | bool |
post = DateCast(key="Foo", fmt="%Y-%m-%d %H:%M:%S")
DataFrame df
Foo | Bar |
---|---|
2021-04-29 01:39:00 | False |
2021-04-29 01:39:00 | False |
NaT | True |
DataFrame df.dtypes
Foo | Bar |
---|---|
datetime64[ns] | bool |
-
class api2db.ingest.post_process.date_cast.DateCast(key: str, fmt: str)
Bases: api2db.ingest.post_process.post.Post
Used to cast a column containing dates in string format to pandas datetimes (datetime64[ns])
-
__init__(key: str, fmt: str)
Creates a DateCast object
- Parameters
key – The name of the column containing strings that should be cast to datetimes
fmt – A format string (strftime-style directives such as %Y and %H) that specifies the datetime format of the strings in the key column
-
ctype
Type of data processor
- Type
str
-
lam_wrap(lam_arg: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Overrides the superclass method
Workflow:
1. Attempt to cast lam_arg[self.key] from strings to datetimes; as the summary above shows, values that do not match fmt become NaT
2. Return the modified lam_arg
- Parameters
lam_arg – The DataFrame to modify
- Returns
The modified DataFrame
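The cast above maps naturally onto pandas.to_datetime. In this minimal, hypothetical sketch (date_cast_sketch is not part of api2db), errors="coerce" is an assumption inferred from the summary table, where the unparseable "Bar!" becomes NaT instead of raising.

```python
import pandas as pd


def date_cast_sketch(lam_arg: pd.DataFrame, key: str, fmt: str) -> pd.DataFrame:
    """Hypothetical stand-in for DateCast.lam_wrap; not the api2db source."""
    # errors="coerce" turns strings that do not match fmt into NaT instead of raising.
    lam_arg[key] = pd.to_datetime(lam_arg[key], format=fmt, errors="coerce")
    return lam_arg


df = pd.DataFrame(
    {
        "Foo": ["2021-04-29 01:39:00", "2021-04-29 01:39:00", "Bar!"],
        "Bar": [False, False, True],
    }
)
out = date_cast_sketch(df, key="Foo", fmt="%Y-%m-%d %H:%M:%S")
print(out.dtypes)
```

Coercing rather than raising means one malformed value from an API does not abort the whole batch; the NaT rows can then be removed downstream, for example with a null-dropping step.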
-
api2db.ingest.post_process.drop_na module
Contains the DropNa class
A shortcut class for a common operation: dropping rows that contain null values.
Summary of DropNa Usage:
See the pandas documentation for DataFrame.dropna
-
class api2db.ingest.post_process.drop_na.DropNa(keys: List[str])
Bases: api2db.ingest.post_process.post.Post
Used to drop rows that have null values in any of the specified keys (columns)
-
__init__(keys: List[str])
Creates a DropNa object
- Parameters
keys – The subset of columns to check; a row is dropped if any of these columns is null
-
ctype
Type of data processor
- Type
str
-
lam_wrap(lam_arg: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Overrides the superclass method
Shortcut used to drop null values. Performs
pd.DataFrame.dropna(subset=self.keys)
- Parameters
lam_arg – The DataFrame to modify
- Returns
The modified DataFrame
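The dropna call above can be sketched directly. This is a minimal, hypothetical stand-in (drop_na_sketch and the sample columns id, price, and note are made up for illustration); it relies on the pandas default how="any".

```python
from typing import List

import pandas as pd


def drop_na_sketch(lam_arg: pd.DataFrame, keys: List[str]) -> pd.DataFrame:
    """Hypothetical stand-in for DropNa.lam_wrap; not the api2db source."""
    # Drop a row if any column listed in keys is null; other columns are ignored.
    return lam_arg.dropna(subset=keys)


df = pd.DataFrame(
    {
        "id": [1, 2, None, 4],
        "price": [9.5, None, 3.0, 1.0],
        "note": [None, "x", "y", None],
    }
)
out = drop_na_sketch(df, keys=["id", "price"])
print(out)
```

Rows 1 and 2 are dropped because price or id is null; rows 0 and 3 survive even though note is null, because note is not in keys.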
-
api2db.ingest.post_process.merge_static module
Contains the MergeStatic class
Note
MergeStatic is used to merge data together. A common use case is a data vendor whose API provides the data points “Foo”, “Bar”, and “location_id”, where “location_id” references a different data set.
Data providers commonly also publish that referenced data set as a file that updates infrequently (i.e., is mostly static).
The typical workflow of a MergeStatic instance is as follows:
1. Create a LocalStream with mode set to update or replace and a target like CACHE/my_local_stream.pickle
2. Set the LocalStream to run periodically (6 hours, 24 hours, 10 days, or whatever frequency this data is updated at)
3. Add a MergeStatic object to the post-processors of the frequently updating data, and set its path to the LocalStream storage path
-
class api2db.ingest.post_process.merge_static.MergeStatic(key: str, path: str)
Bases: api2db.ingest.post_process.post.Post
Merges incoming data with a locally stored DataFrame
-
__init__(key: str, path: str)
Creates a MergeStatic object
- Parameters
key – The key that the DataFrames should be merged on
path – The path to the locally stored file containing the pickled DataFrame to merge with
-
ctype
Type of data processor
- Type
str
-
lam_wrap(lam_arg: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Overrides the superclass method
Workflow:
1. Load the DataFrame df from the file specified at self.path
2. Perform a left merge of lam_arg with df on self.key
3. Return the modified DataFrame
- Parameters
lam_arg – The DataFrame to modify
- Returns
The modified DataFrame
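The load-then-left-merge workflow above can be sketched in plain pandas. This is a minimal, hypothetical stand-in: merge_static_sketch is not part of api2db, a temporary pickle file plays the role of the file a LocalStream would maintain, and the location_id/city data is made up.

```python
import os
import tempfile

import pandas as pd


def merge_static_sketch(lam_arg: pd.DataFrame, key: str, path: str) -> pd.DataFrame:
    """Hypothetical stand-in for MergeStatic.lam_wrap; not the api2db source."""
    # 1. Load the locally stored, pickled (mostly static) DataFrame.
    df = pd.read_pickle(path)
    # 2. Left merge: every incoming row is kept, with static columns joined on key.
    return lam_arg.merge(df, how="left", on=key)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for the file a LocalStream would keep up to date.
    path = os.path.join(tmp_dir, "my_local_stream.pickle")
    pd.DataFrame({"location_id": [10, 20], "city": ["Austin", "Boise"]}).to_pickle(path)

    incoming = pd.DataFrame(
        {"Foo": [1, 2, 3], "Bar": ["A", "B", "C"], "location_id": [10, 20, 30]}
    )
    out = merge_static_sketch(incoming, key="location_id", path=path)

print(out)
```

A left merge keeps incoming rows whose key is missing from the static file (location_id 30 here) and fills the joined columns with NaN, so a stale static file never silently discards fresh data.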
-
api2db.ingest.post_process.post module
Contains the Post class
-
class api2db.ingest.post_process.post.Post
Bases: api2db.ingest.base_lam.BaseLam
Used as the base class for all post-processors
-
static typecast(dtype: Any) → str
Returns a string that can be used to typecast to a pandas dtype.
- Parameters
dtype – A python native type
- Returns
A string that can be used in conjunction with a pandas DataFrame/Series for typecasting
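The exact strings typecast returns are not listed here. The sketch below assumes one plausible mapping, loosely guided by the dtype names (string, bool) shown in the DateCast summary; typecast_sketch and the _PANDAS_DTYPES dictionary are hypothetical and may differ from api2db's implementation.

```python
from typing import Any

import pandas as pd

# Hypothetical mapping; the actual strings Post.typecast returns may differ.
_PANDAS_DTYPES = {
    bool: "bool",
    int: "int64",
    float: "float64",
    str: "string",
}


def typecast_sketch(dtype: Any) -> str:
    """Map a Python native type to a pandas dtype string (illustrative only)."""
    # Fall back to the generic object dtype for types not in the mapping.
    return _PANDAS_DTYPES.get(dtype, "object")


s = pd.Series([1.0, 2.0, 3.0])
print(s.astype(typecast_sketch(int)).dtype)
```

Returning a string rather than the Python type lets subclasses pass the result straight to DataFrame/Series.astype, which accepts dtype names such as "int64" and "string".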
-
static
Module contents
Original Author | Tristen Harr |
Creation Date | 04/29/2021 |
Revisions | None |