api2db.store package

Submodules

api2db.store.store module

Contains the Store class

class api2db.store.store.Store(name: str, seconds: int, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, chunk_size: int = 0)

Bases: api2db.stream.stream.Stream

Used for periodically storing data to a local or external target

__init__(name: str, seconds: int, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, chunk_size: int = 0)

Creates a Store object and attempts to build its dtypes.

Parameters
  • name – The name of the collector the store is associated with

  • seconds – The number of seconds between storage cycles

  • path – The path to the directory that will contain sharded files that should be recomposed for storage

  • fmt

    The file format of the sharded files (the compose-and-store sketch under store() below shows how each format is handled)

    • fmt="parquet" (recommended) stores the DataFrame using parquet format

    • fmt="json" stores the DataFrame using JSON format

    • fmt="pickle" stores the DataFrame using pickle format

    • fmt="csv" stores the DataFrame using csv format

  • drop_duplicate_exclude

    • drop_duplicate_exclude=None

      DataFrame.drop_duplicates() is performed before storage

    • drop_duplicate_exclude=["request_millis"]

      df.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) is performed before storage (see the sketch after this parameter list)

      Primarily used for arrival timestamps. An API may send the same data on sequential requests, but in most applications the programmer will want to timestamp the arrival time of the data. Without the exclusion this would produce duplicate rows whose only difference is the arrival timestamp.

  • move_shards_path – See the api2db documentation for details and examples

  • move_composed_path – See the api2db documentation for details and examples

  • chunk_size – CURRENTLY NOT SUPPORTED
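
The deduplication controlled by drop_duplicate_exclude can be sketched directly in pandas. This is a minimal illustration, not api2db internals; the sample columns (including request_millis) are hypothetical:

    import pandas as pd

    # Two rows carry identical data but different arrival timestamps.
    df = pd.DataFrame({
        "symbol": ["AAPL", "AAPL", "MSFT"],
        "price": [150.0, 150.0, 300.0],
        "request_millis": [1000, 2000, 1500],
    })

    drop_duplicate_exclude = ["request_millis"]

    # drop_duplicate_exclude=None -> plain drop_duplicates(); the differing
    # timestamps make every row unique, so nothing is dropped.
    print(len(df.drop_duplicates()))                       # 3

    # drop_duplicate_exclude=["request_millis"] -> ignore the arrival
    # timestamp when checking for duplicates; the repeated AAPL row drops.
    subset = df.columns.difference(drop_duplicate_exclude)
    print(len(df.drop_duplicates(subset=subset)))          # 2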

store_str

A string used for logging

Type

Optional[str]

build_dependencies() → None

Builds the dependencies for the storage object, i.e. creates the directories for move_shards_path and move_composed_path

Returns

None
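
In effect this amounts to creating the configured directories if they do not already exist. A minimal sketch with the standard library; the paths stand in for the constructor arguments and the real implementation may differ:

    import os

    move_shards_path = "STORE/shards"        # hypothetical constructor arguments
    move_composed_path = "STORE/composed"

    for p in (move_shards_path, move_composed_path):
        if p is not None:
            os.makedirs(p, exist_ok=True)    # create the directory tree if missing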

store() → None

Composes a DataFrame from the files in the store's path, and stores the data to the storage target.

Returns

None
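
A rough sketch of what one storage cycle does, assuming each shard file in path is a complete DataFrame serialized in the configured fmt. The reader mapping mirrors the fmt options above; the paths and the final target are illustrative:

    import glob
    import os
    import pandas as pd

    path, fmt = "CACHE/mystore", "parquet"   # hypothetical configuration

    readers = {
        "parquet": pd.read_parquet,
        "json": pd.read_json,
        "pickle": pd.read_pickle,
        "csv": pd.read_csv,
    }

    shards = sorted(glob.glob(os.path.join(path, "*." + fmt)))
    if shards:
        # Recompose the shards into a single DataFrame, deduplicate, and
        # hand the result to the storage target (a local parquet file here).
        df = pd.concat([readers[fmt](f) for f in shards], ignore_index=True)
        df = df.drop_duplicates()
        df.to_parquet("composed.parquet")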

start()

Store objects subclass Stream but do not contain a start method. Stores should NEVER use start

Raises

AttributeError – 'Store' object has no attribute 'start'

stream_start()

Store objects subclass Stream but do not contain a stream_start method. Stores should NEVER use stream_start

Raises

AttributeError – 'Store' object has no attribute 'stream_start'
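
One way a subclass can make inherited methods unavailable is to intercept attribute lookup and raise AttributeError for the blocked names. This is a sketch of the documented behavior, not necessarily how api2db implements it:

    class Stream:
        def start(self):
            print("streaming")

        def stream_start(self):
            print("stream_start")

    class Store(Stream):
        def __getattribute__(self, item):
            # Block the inherited entry points so calling them fails
            # exactly as documented above.
            if item in ("start", "stream_start"):
                raise AttributeError(f"'Store' object has no attribute '{item}'")
            return super().__getattribute__(item)

    Store().start()   # raises AttributeError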

api2db.store.store2bigquery module

Contains the Store2Bigquery class

class api2db.store.store2bigquery.Store2Bigquery(name: str, seconds: int, auth_path: str, pid: str, did: str, tid: str, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, location: str = 'US', if_exists: str = 'append', chunk_size: int = 0)

Bases: api2db.store.store.Store

Used for periodically storing data to Google BigQuery

__init__(name: str, seconds: int, auth_path: str, pid: str, did: str, tid: str, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, location: str = 'US', if_exists: str = 'append', chunk_size: int = 0)

Creates a Store2Bigquery object and attempts to build its dtypes.

Parameters
  • name – The name of the collector the store is associated with

  • seconds – The number of seconds between storage cycles

  • auth_path – The path to the Google-provided authentication file, e.g. AUTH/google_auth_file.json

  • pid – Google project ID

  • did – Google dataset ID

  • tid – Google table ID

  • path – The path to the directory that will contain sharded files that should be recomposed for storage

  • fmt

    The file format of the sharded files

    • fmt="parquet" (recommended) loads the sharded files using parquet format

    • fmt="json" loads the sharded files using JSON format

    • fmt="pickle" loads the sharded files using pickle format

    • fmt="csv" loads the sharded files using csv format

  • drop_duplicate_exclude

    • drop_duplicate_exclude=None

      DataFrame.drop_duplicates() is performed before storage

    • drop_duplicate_exclude=["request_millis"]

      df.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) is performed before storage

      Primarily used for arrival timestamps. An API may send the same data on sequential requests, but in most applications the programmer will want to timestamp the arrival time of the data. Without the exclusion this would produce duplicate rows whose only difference is the arrival timestamp.

  • move_shards_path – See the api2db documentation for details and examples

  • move_composed_path – See the api2db documentation for details and examples

  • location – Location of the Bigquery project

  • if_exists

    • if_exists="append" Adds the data to the table

    • if_exists="replace" Replaces the table with the new data

    • if_exists="fail" Fails to upload the new data if the table exists

  • chunk_size – CURRENTLY NOT SUPPORTED
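
A construction example using only the parameters documented above; the project, dataset, and table IDs and the credential path are placeholders:

    from api2db.store.store2bigquery import Store2Bigquery

    store = Store2Bigquery(
        name="mycollector",                      # collector this store belongs to
        seconds=3600,                            # compose and upload once per hour
        auth_path="AUTH/google_auth_file.json",  # Google-provided credentials
        pid="my-project",
        did="my_dataset",
        tid="my_table",
        fmt="parquet",
        if_exists="append",                      # add rows to the existing table
    )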

api2db.store.store2omnisci module

Contains the Store2Omnisci class

class api2db.store.store2omnisci.Store2Omnisci(name: str, seconds: int, db_name: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, protocol: str = 'binary', chunk_size: int = 0)

Bases: api2db.store.store.Store

Used for periodically storing data to OmniSci

__init__(name: str, seconds: int, db_name: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, protocol: str = 'binary', chunk_size: int = 0)

Creates a Store2Omnisci object and attempts to build its dtypes.

Parameters
  • name – The name of the collector the store is associated with

  • seconds – The number of seconds between storage cycles

  • db_name – The name of the database to connect to

  • username – The username to authenticate with the database

  • password – The password to authenticate with the database

  • host – The host of the database

  • auth_path – The path to the authentication credentials.

  • path – The path to the directory that will contain sharded files that should be recomposed for storage

  • fmt

    The file format of the sharded files

    • fmt="parquet" (recommended) loads the sharded files using parquet format

    • fmt="json" loads the sharded files using JSON format

    • fmt="pickle" loads the sharded files using pickle format

    • fmt="csv" loads the sharded files using csv format

  • drop_duplicate_exclude

    • drop_duplicate_exclude=None

      DataFrame.drop_duplicates() is performed before storage

    • drop_duplicate_exclude=["request_millis"]

      df.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) is performed before storage

      Primarily used for arrival timestamps. An API may send the same data on sequential requests, but in most applications the programmer will want to timestamp the arrival time of the data. Without the exclusion this would produce duplicate rows whose only difference is the arrival timestamp.

  • move_shards_path – See the api2db documentation for details and examples

  • move_composed_path – See the api2db documentation for details and examples

  • protocol – The protocol to use when connecting to the database

  • chunk_size – CURRENTLY NOT SUPPORTED
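
A construction example with placeholder credentials, using only the documented parameters:

    from api2db.store.store2omnisci import Store2Omnisci

    store = Store2Omnisci(
        name="mycollector",      # collector this store belongs to
        seconds=3600,            # compose and store once per hour
        db_name="omnisci",
        username="admin",        # placeholder credentials
        password="password",
        host="localhost",
        protocol="binary",       # default wire protocol
    )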

api2db.store.store2sql module

Contains the Store2Sql class

class api2db.store.store2sql.Store2Sql(name: str, seconds: int, db_name: str, dialect: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, port: str = '', path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, if_exists: str = 'append', chunk_size: int = 0)

Bases: api2db.store.store.Store

Used for periodically storing data to an SQL database

__init__(name: str, seconds: int, db_name: str, dialect: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, port: str = '', path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, if_exists: str = 'append', chunk_size: int = 0)

Creates a Store2Sql object and attempts to build its dtypes.

Parameters
  • name – The name of the collector the store is associated with

  • seconds – The number of seconds between storage cycles

  • db_name – The name of the database to connect to

  • dialect

    • dialect="mysql" -> Use this to connect to a MySQL database

    • dialect="mariadb" -> Use this to connect to a MariaDB database

    • dialect="postgresql" -> Use this to connect to a PostgreSQL database

    • dialect="amazon_aurora" -> COMING SOON

    • dialect="oracle" -> COMING SOON

    • dialect="microsoft_sql" -> COMING SOON

    • dialect="Something else?" -> Submit a feature request… or even better, build it!

  • username – The username to authenticate with the database

  • password – The password to authenticate with the database

  • host – The host of the database

  • auth_path – The path to the authentication credentials.

  • port – The port to connect to the database with

  • path – The path to the directory that will contain sharded files that should be recomposed for storage

  • fmt

    The file format of the sharded files

    • fmt="parquet" (recommended) loads the sharded files using parquet format

    • fmt="json" loads the sharded files using JSON format

    • fmt="pickle" loads the sharded files using pickle format

    • fmt="csv" loads the sharded files using csv format

  • drop_duplicate_exclude

    • drop_duplicate_exclude=None

      DataFrame.drop_duplicates() is performed before storage

    • drop_duplicate_exclude=["request_millis"]

      df.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) is performed before storage

      Primarily used for arrival timestamps. An API may send the same data on sequential requests, but in most applications the programmer will want to timestamp the arrival time of the data. Without the exclusion this would produce duplicate rows whose only difference is the arrival timestamp.

  • move_shards_path – See the api2db documentation for details and examples

  • move_composed_path – See the api2db documentation for details and examples

  • chunk_size – CURRENTLY NOT SUPPORTED
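
A construction example with a MySQL dialect and placeholder credentials, using only the documented parameters:

    from api2db.store.store2sql import Store2Sql

    store = Store2Sql(
        name="mycollector",      # collector this store belongs to
        seconds=3600,            # compose and store once per hour
        db_name="mydb",
        dialect="mysql",         # or "mariadb" / "postgresql"
        username="user",         # placeholder credentials
        password="password",
        host="localhost",
        port="3306",
        if_exists="append",      # add rows to the existing table
    )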

Module contents

Original Author

Tristen Harr

Creation Date

04/28/2021

Revisions

None