api2db.store package¶
Submodules¶
api2db.store.store module¶
Contains the Store class¶
class api2db.store.store.Store(name: str, seconds: int, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, chunk_size: int = 0)¶

Bases: api2db.stream.stream.Stream

Used for periodically storing data to a local or external target

__init__(name: str, seconds: int, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, chunk_size: int = 0)¶

Creates a Store object and attempts to build its dtypes.
- Parameters
name – The name of the collector the store is associated with
seconds – The number of seconds between storage cycles
path – The path to the directory that will contain sharded files that should be recomposed for storage
fmt –
The file format of the sharded files
fmt=”parquet” (recommended) stores the DataFrame using parquet format
fmt=”json” stores the DataFrame using JSON format
fmt=”pickle” stores the DataFrame using pickle format
fmt=”csv” stores the DataFrame using csv format
drop_duplicate_exclude –
drop_duplicate_exclude=None
DataFrame.drop_duplicates() performed before storage
drop_duplicate_exclude=[“request_millis”]
.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) performed before storage.
Primarily used for arrival timestamps, i.e. when an API sends the same data on sequential requests. In most applications the programmer will want to timestamp the arrival of the data, which would otherwise produce duplicate rows differing only in their arrival timestamps
move_shards_path –
Documentation and Examples found here
move_composed_path –
Documentation and Examples found here
chunk_size – CURRENTLY NOT SUPPORTED
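The drop_duplicate_exclude behavior described above can be illustrated with plain pandas; the column names below are hypothetical, but the subset logic matches the call shown in the parameter description:

```python
import pandas as pd

# Two rows from sequential API requests: identical payloads, but
# different arrival timestamps ("request_millis").
df = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "MSFT"],
    "price": [150.0, 150.0, 250.0],
    "request_millis": [1000, 2000, 1000],
})

# drop_duplicate_exclude=None -> plain drop_duplicates(): nothing is
# dropped, because request_millis differs between the AAPL rows.
print(len(df.drop_duplicates()))  # 3

# drop_duplicate_exclude=["request_millis"] -> duplicates are detected on
# every column EXCEPT the excluded ones, so the repeated AAPL row is dropped.
exclude = ["request_millis"]
deduped = df.drop_duplicates(subset=df.columns.difference(exclude))
print(len(deduped))  # 2
```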
-
store_str
¶ A string used for logging
- Type
Optional[str]
-
build_dependencies
() → None¶ Builds the dependencies for the storage object, i.e. creates the directories for the move_shards_path and the move_composed_path
- Returns
None
-
store
() → None¶ Composes a DataFrame from the files in the store's path, and stores the data to the storage target.
- Returns
None
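A rough sketch of the compose step that store performs, assuming pickle-format shards; the helper name and file naming scheme are hypothetical, not api2db's internals:

```python
import glob
import os
import tempfile

import pandas as pd

def compose_shards(path: str) -> pd.DataFrame:
    """Recompose sharded pickle files in `path` into a single DataFrame."""
    shards = sorted(glob.glob(os.path.join(path, "*.pickle")))
    frames = [pd.read_pickle(f) for f in shards]
    return pd.concat(frames, ignore_index=True)

# Demo: write two shards, then recompose them.
with tempfile.TemporaryDirectory() as d:
    pd.DataFrame({"x": [1, 2]}).to_pickle(os.path.join(d, "shard_0.pickle"))
    pd.DataFrame({"x": [3]}).to_pickle(os.path.join(d, "shard_1.pickle"))
    df = compose_shards(d)
    print(len(df))  # 3
```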
-
start
()¶ Store objects subclass Stream but do not contain a start method. Stores should NEVER use start
- Raises
AttributeError – ‘Store’ object has no attribute ‘start’
-
stream_start
()¶ Store objects subclass Stream but do not contain a stream_start method. Stores should NEVER use stream_start
- Raises
AttributeError – ‘Store’ object has no attribute ‘stream_start’
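The disabled-method pattern documented for start and stream_start can be sketched as follows; the class bodies here are minimal stand-ins, not the real api2db classes:

```python
class Stream:
    def start(self):
        return "streaming"

class Store(Stream):
    # Override the inherited method so that calling it raises, mirroring
    # the AttributeError documented above.
    def start(self):
        raise AttributeError("'Store' object has no attribute 'start'")

try:
    Store().start()
except AttributeError as e:
    print(e)  # 'Store' object has no attribute 'start'
```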
-
api2db.store.store2bigquery module¶
Contains the Store2Bigquery class¶
class api2db.store.store2bigquery.Store2Bigquery(name: str, seconds: int, auth_path: str, pid: str, did: str, tid: str, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, location: str = 'US', if_exists: str = 'append', chunk_size: int = 0)¶

Bases: api2db.store.store.Store

Used for periodically storing data to BigQuery

__init__(name: str, seconds: int, auth_path: str, pid: str, did: str, tid: str, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, location: str = 'US', if_exists: str = 'append', chunk_size: int = 0)¶

Creates a Store2Bigquery object and attempts to build its dtypes.
- Parameters
name – The name of the collector the store is associated with
seconds – The number of seconds between storage cycles
auth_path – The path to the Google provided authentication file. I.e. AUTH/google_auth_file.json
pid – Google project ID
did – Google dataset ID
tid – Google table ID
path – The path to the directory that will contain sharded files that should be recomposed for storage
fmt –
The file format of the sharded files
fmt=”parquet” (recommended) loads the sharded files using parquet format
fmt=”json” loads the sharded files using JSON format
fmt=”pickle” loads the sharded files using pickle format
fmt=”csv” loads the sharded files using csv format
drop_duplicate_exclude –
drop_duplicate_exclude=None
DataFrame.drop_duplicates() performed before storage
drop_duplicate_exclude=[“request_millis”]
.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) performed before storage.
Primarily used for arrival timestamps, i.e. when an API sends the same data on sequential requests. In most applications the programmer will want to timestamp the arrival of the data, which would otherwise produce duplicate rows differing only in their arrival timestamps
move_shards_path –
Documentation and Examples found here
move_composed_path –
Documentation and Examples found here
location – Location of the Bigquery project
if_exists –
if_exists=”append” Adds the data to the table
if_exists=”replace” Replaces the table with the new data
if_exists=”fail” Fails to upload the new data if the table exists
chunk_size – CURRENTLY NOT SUPPORTED
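The three if_exists values above follow the same convention as pandas' DataFrame.to_sql, which can be demonstrated with an in-memory SQLite database (the table name here is hypothetical):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"id": [1, 2]})

df.to_sql("t", conn, index=False, if_exists="fail")     # table created
df.to_sql("t", conn, index=False, if_exists="append")   # rows added -> 4 total
df.to_sql("t", conn, index=False, if_exists="replace")  # table replaced -> 2 total

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 2

# if_exists="fail" raises now that the table exists.
try:
    df.to_sql("t", conn, index=False, if_exists="fail")
except ValueError:
    print("fail: table already exists")
```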
-
api2db.store.store2omnisci module¶
Contains the Store2Omnisci class¶
class api2db.store.store2omnisci.Store2Omnisci(name: str, seconds: int, db_name: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, protocol: str = 'binary', chunk_size: int = 0)¶

Bases: api2db.store.store.Store

Used for periodically storing data to OmniSci

__init__(name: str, seconds: int, db_name: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, protocol: str = 'binary', chunk_size: int = 0)¶

Creates a Store2Omnisci object and attempts to build its dtypes.
- Parameters
name – The name of the collector the store is associated with
seconds – The number of seconds between storage cycles
db_name – The name of the database to connect to
username – The username to authenticate with the database
password – The password to authenticate with the database
host – The host of the database
auth_path – The path to the authentication credentials.
path – The path to the directory that will contain sharded files that should be recomposed for storage
fmt –
The file format of the sharded files
fmt=”parquet” (recommended) loads the sharded files using parquet format
fmt=”json” loads the sharded files using JSON format
fmt=”pickle” loads the sharded files using pickle format
fmt=”csv” loads the sharded files using csv format
drop_duplicate_exclude –
drop_duplicate_exclude=None
DataFrame.drop_duplicates() performed before storage
drop_duplicate_exclude=[“request_millis”]
.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) performed before storage.
Primarily used for arrival timestamps, i.e. when an API sends the same data on sequential requests. In most applications the programmer will want to timestamp the arrival of the data, which would otherwise produce duplicate rows differing only in their arrival timestamps
move_shards_path –
Documentation and Examples found here
move_composed_path –
Documentation and Examples found here
protocol – The protocol to use when connecting to the database
chunk_size – CURRENTLY NOT SUPPORTED
-
api2db.store.store2sql module¶
Contains the Store2Sql class¶
class api2db.store.store2sql.Store2Sql(name: str, seconds: int, db_name: str, dialect: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, port: str = '', path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, if_exists: str = 'append', chunk_size: int = 0)¶

Bases: api2db.store.store.Store

Used for periodically storing data to an SQL database

__init__(name: str, seconds: int, db_name: str, dialect: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, port: str = '', path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, if_exists: str = 'append', chunk_size: int = 0)¶

Creates a Store2Sql object and attempts to build its dtypes.
- Parameters
name – The name of the collector the store is associated with
seconds – The number of seconds between storage cycles
db_name – The name of the database to connect to
dialect –
dialect=”mysql” -> Use this to connect to a mysql database
dialect=”mariadb” -> Use this to connect to a mariadb database
dialect=”postgresql” -> Use this to connect to a postgresql database
dialect=”amazon_aurora” -> COMING SOON
dialect=”oracle” -> COMING SOON
dialect=”microsoft_sql” -> COMING SOON
dialect=”Something else?” -> Submit a feature request… or even better build it!
username – The username to authenticate with the database
password – The password to authenticate with the database
host – The host of the database
auth_path – The path to the authentication credentials.
port – The port to connect to the database with
path – The path to the directory that will contain sharded files that should be recomposed for storage
fmt –
The file format of the sharded files
fmt=”parquet” (recommended) loads the sharded files using parquet format
fmt=”json” loads the sharded files using JSON format
fmt=”pickle” loads the sharded files using pickle format
fmt=”csv” loads the sharded files using csv format
drop_duplicate_exclude –
drop_duplicate_exclude=None
DataFrame.drop_duplicates() performed before storage
drop_duplicate_exclude=[“request_millis”]
.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) performed before storage.
Primarily used for arrival timestamps, i.e. when an API sends the same data on sequential requests. In most applications the programmer will want to timestamp the arrival of the data, which would otherwise produce duplicate rows differing only in their arrival timestamps
move_shards_path –
Documentation and Examples found here
move_composed_path –
Documentation and Examples found here
chunk_size – CURRENTLY NOT SUPPORTED
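One way to picture the dialect parameter is as a mapping to a SQLAlchemy-style connection URL built from the other connection parameters. The driver choices below are assumptions for illustration; api2db's actual internals may differ:

```python
def connection_url(dialect: str, username: str, password: str,
                   host: str, port: str, db_name: str) -> str:
    """Build a SQLAlchemy-style URL for the supported dialects.

    The driver names here are assumptions, not api2db's actual choices.
    """
    drivers = {
        "mysql": "mysql+pymysql",
        "mariadb": "mariadb+pymysql",
        "postgresql": "postgresql+psycopg2",
    }
    if dialect not in drivers:
        raise ValueError(f"unsupported dialect: {dialect}")
    netloc = f"{username}:{password}@{host}" + (f":{port}" if port else "")
    return f"{drivers[dialect]}://{netloc}/{db_name}"

print(connection_url("postgresql", "user", "pw", "localhost", "5432", "mydb"))
# postgresql+psycopg2://user:pw@localhost:5432/mydb
```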
-
Module contents¶
Original Author | Tristen Harr
Creation Date   | 04/28/2021
Revisions       | None