api2db.store package

Submodules

api2db.store.store module

Contains the Store class

class api2db.store.store.Store(name: str, seconds: int, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, chunk_size: int = 0)

Bases: api2db.stream.stream.Stream

Used for periodically storing data to a local or external target

__init__(name: str, seconds: int, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, chunk_size: int = 0)

Creates a Store object and attempts to build its dtypes.

Parameters
  • name – The name of the collector the store is associated with

  • seconds – The number of seconds between storage cycles

  • path – The path to the directory that will contain sharded files that should be recomposed for storage

  • fmt

    The file format of the sharded files (the compose-and-store sketch under store() below shows how each format is handled)

    • fmt="parquet" (recommended) stores the DataFrame using parquet format

    • fmt="json" stores the DataFrame using JSON format

    • fmt="pickle" stores the DataFrame using pickle format

    • fmt="csv" stores the DataFrame using csv format

  • drop_duplicate_exclude

    • drop_duplicate_exclude=None

      DataFrame.drop_duplicates() is performed before storage

    • drop_duplicate_exclude=["request_millis"]

      df.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) is performed before storage (see the sketch after this parameter list)

      Primarily used for arrival timestamps. An API may send the same data on sequential requests, but in most applications the programmer will want to timestamp the arrival time of the data. Without the exclusion this would produce duplicate rows whose only difference is the arrival timestamp.

  • move_shards_path – See the api2db documentation for details and examples

  • move_composed_path – See the api2db documentation for details and examples

  • chunk_size – CURRENTLY NOT SUPPORTED
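
The deduplication controlled by drop_duplicate_exclude can be sketched directly in pandas. This is a minimal illustration, not api2db internals; the sample columns (including request_millis) are hypothetical:

    import pandas as pd

    # Two rows carry identical data but different arrival timestamps.
    df = pd.DataFrame({
        "symbol": ["AAPL", "AAPL", "MSFT"],
        "price": [150.0, 150.0, 300.0],
        "request_millis": [1000, 2000, 1500],
    })

    drop_duplicate_exclude = ["request_millis"]

    # drop_duplicate_exclude=None -> plain drop_duplicates(); the differing
    # timestamps make every row unique, so nothing is dropped.
    print(len(df.drop_duplicates()))                       # 3

    # drop_duplicate_exclude=["request_millis"] -> ignore the arrival
    # timestamp when checking for duplicates; the repeated AAPL row drops.
    subset = df.columns.difference(drop_duplicate_exclude)
    print(len(df.drop_duplicates(subset=subset)))          # 2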

store_str

A string used for logging

Type

Optional[str]

build_dependencies() → None

Builds the dependencies for the storage object, i.e. creates the directories for move_shards_path and move_composed_path

Returns

None
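
In effect this amounts to creating the configured directories if they do not already exist. A minimal sketch with the standard library; the paths stand in for the constructor arguments and the real implementation may differ:

    import os

    move_shards_path = "STORE/shards"        # hypothetical constructor arguments
    move_composed_path = "STORE/composed"

    for p in (move_shards_path, move_composed_path):
        if p is not None:
            os.makedirs(p, exist_ok=True)    # create the directory tree if missing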

store() → None

Composes a DataFrame from the files in the store's path, and stores the data to the storage target.

Returns

None
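
A rough sketch of what one storage cycle does, assuming each shard file in path is a complete DataFrame serialized in the configured fmt. The reader mapping mirrors the fmt options above; the paths and the final target are illustrative:

    import glob
    import os
    import pandas as pd

    path, fmt = "CACHE/mystore", "parquet"   # hypothetical configuration

    readers = {
        "parquet": pd.read_parquet,
        "json": pd.read_json,
        "pickle": pd.read_pickle,
        "csv": pd.read_csv,
    }

    shards = sorted(glob.glob(os.path.join(path, "*." + fmt)))
    if shards:
        # Recompose the shards into a single DataFrame, deduplicate, and
        # hand the result to the storage target (a local parquet file here).
        df = pd.concat([readers[fmt](f) for f in shards], ignore_index=True)
        df = df.drop_duplicates()
        df.to_parquet("composed.parquet")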

start()

Store objects subclass Stream but do not contain a start method. Stores should NEVER use start

Raises

AttributeError – 'Store' object has no attribute 'start'

stream_start()

Store objects subclass Stream but do not contain a stream_start method. Stores should NEVER use stream_start

Raises

AttributeError – 'Store' object has no attribute 'stream_start'
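
One way a subclass can make inherited methods unavailable is to intercept attribute lookup and raise AttributeError for the blocked names. This is a sketch of the documented behavior, not necessarily how api2db implements it:

    class Stream:
        def start(self):
            print("streaming")

        def stream_start(self):
            print("stream_start")

    class Store(Stream):
        def __getattribute__(self, item):
            # Block the inherited entry points so calling them fails
            # exactly as documented above.
            if item in ("start", "stream_start"):
                raise AttributeError(f"'Store' object has no attribute '{item}'")
            return super().__getattribute__(item)

    Store().start()   # raises AttributeError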

api2db.store.store2bigquery module

Contains the Store2Bigquery class

class api2db.store.store2bigquery.Store2Bigquery(name: str, seconds: int, auth_path: str, pid: str, did: str, tid: str, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, location: str = 'US', if_exists: str = 'append', chunk_size: int = 0)

Bases: api2db.store.store.Store

Used for periodically storing data to Google BigQuery

__init__(name: str, seconds: int, auth_path: str, pid: str, did: str, tid: str, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, location: str = 'US', if_exists: str = 'append', chunk_size: int = 0)

Creates a Store2Bigquery object and attempts to build its dtypes.

Parameters
  • name – The name of the collector the store is associated with

  • seconds – The number of seconds between storage cycles

  • auth_path – The path to the Google-provided authentication file, e.g. AUTH/google_auth_file.json

  • pid – Google project ID

  • did – Google dataset ID

  • tid – Google table ID

  • path – The path to the directory that will contain sharded files that should be recomposed for storage

  • fmt

    The file format of the sharded files

    • fmt="parquet" (recommended) loads the sharded files using parquet format

    • fmt="json" loads the sharded files using JSON format

    • fmt="pickle" loads the sharded files using pickle format

    • fmt="csv" loads the sharded files using csv format

  • drop_duplicate_exclude

    • drop_duplicate_exclude=None

      DataFrame.drop_duplicates() is performed before storage

    • drop_duplicate_exclude=["request_millis"]

      df.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) is performed before storage

      Primarily used for arrival timestamps. An API may send the same data on sequential requests, but in most applications the programmer will want to timestamp the arrival time of the data. Without the exclusion this would produce duplicate rows whose only difference is the arrival timestamp.

  • move_shards_path – See the api2db documentation for details and examples

  • move_composed_path – See the api2db documentation for details and examples

  • location – Location of the Bigquery project

  • if_exists

    • if_exists="append" Adds the data to the table

    • if_exists="replace" Replaces the table with the new data

    • if_exists="fail" Fails to upload the new data if the table exists

  • chunk_size – CURRENTLY NOT SUPPORTED
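
A construction example using only the parameters documented above; the project, dataset, and table IDs and the credential path are placeholders:

    from api2db.store.store2bigquery import Store2Bigquery

    store = Store2Bigquery(
        name="mycollector",                      # collector this store belongs to
        seconds=3600,                            # compose and upload once per hour
        auth_path="AUTH/google_auth_file.json",  # Google-provided credentials
        pid="my-project",
        did="my_dataset",
        tid="my_table",
        fmt="parquet",
        if_exists="append",                      # add rows to the existing table
    )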

api2db.store.store2omnisci module

Contains the Store2Omnisci class

class api2db.store.store2omnisci.Store2Omnisci(name: str, seconds: int, db_name: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, protocol: str = 'binary', chunk_size: int = 0)

Bases: api2db.store.store.Store

Used for periodically storing data to OmniSci

__init__(name: str, seconds: int, db_name: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, protocol: str = 'binary', chunk_size: int = 0)

Creates a Store2Omnisci object and attempts to build its dtypes.

Parameters
  • name – The name of the collector the store is associated with

  • seconds – The number of seconds between storage cycles

  • db_name – The name of the database to connect to

  • username – The username to authenticate with the database

  • password – The password to authenticate with the database

  • host – The host of the database

  • auth_path – The path to the authentication credentials.

  • path – The path to the directory that will contain sharded files that should be recomposed for storage

  • fmt

    The file format of the sharded files

    • fmt="parquet" (recommended) loads the sharded files using parquet format

    • fmt="json" loads the sharded files using JSON format

    • fmt="pickle" loads the sharded files using pickle format

    • fmt="csv" loads the sharded files using csv format

  • drop_duplicate_exclude

    • drop_duplicate_exclude=None

      DataFrame.drop_duplicates() is performed before storage

    • drop_duplicate_exclude=["request_millis"]

      df.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) is performed before storage

      Primarily used for arrival timestamps. An API may send the same data on sequential requests, but in most applications the programmer will want to timestamp the arrival time of the data. Without the exclusion this would produce duplicate rows whose only difference is the arrival timestamp.

  • move_shards_path – See the api2db documentation for details and examples

  • move_composed_path – See the api2db documentation for details and examples

  • protocol – The protocol to use when connecting to the database

  • chunk_size – CURRENTLY NOT SUPPORTED
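
A construction example with placeholder credentials, using only the documented parameters:

    from api2db.store.store2omnisci import Store2Omnisci

    store = Store2Omnisci(
        name="mycollector",      # collector this store belongs to
        seconds=3600,            # compose and store once per hour
        db_name="omnisci",
        username="admin",        # placeholder credentials
        password="password",
        host="localhost",
        protocol="binary",       # default wire protocol
    )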

api2db.store.store2sql module

Contains the Store2Sql class

class api2db.store.store2sql.Store2Sql(name: str, seconds: int, db_name: str, dialect: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, port: str = '', path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, if_exists: str = 'append', chunk_size: int = 0)

Bases: api2db.store.store.Store

Used for periodically storing data to an SQL database

__init__(name: str, seconds: int, db_name: str, dialect: str, username: Optional[str] = None, password: Optional[str] = None, host: Optional[str] = None, auth_path: Optional[str] = None, port: str = '', path: Optional[str] = None, fmt: str = 'parquet', drop_duplicate_exclude: Optional[List[str]] = None, move_shards_path: Optional[str] = None, move_composed_path: Optional[str] = None, if_exists: str = 'append', chunk_size: int = 0)

Creates a Store2Sql object and attempts to build its dtypes.

Parameters
  • name – The name of the collector the store is associated with

  • seconds – The number of seconds between storage cycles

  • db_name – The name of the database to connect to

  • dialect

    • dialect="mysql" -> Use this to connect to a MySQL database

    • dialect="mariadb" -> Use this to connect to a MariaDB database

    • dialect="postgresql" -> Use this to connect to a PostgreSQL database

    • dialect="amazon_aurora" -> COMING SOON

    • dialect="oracle" -> COMING SOON

    • dialect="microsoft_sql" -> COMING SOON

    • dialect="Something else?" -> Submit a feature request… or even better, build it!

  • username – The username to authenticate with the database

  • password – The password to authenticate with the database

  • host – The host of the database

  • auth_path – The path to the authentication credentials.

  • port – The port to connect to the database with

  • path – The path to the directory that will contain sharded files that should be recomposed for storage

  • fmt

    The file format of the sharded files

    • fmt="parquet" (recommended) loads the sharded files using parquet format

    • fmt="json" loads the sharded files using JSON format

    • fmt="pickle" loads the sharded files using pickle format

    • fmt="csv" loads the sharded files using csv format

  • drop_duplicate_exclude

    • drop_duplicate_exclude=None

      DataFrame.drop_duplicates() is performed before storage

    • drop_duplicate_exclude=["request_millis"]

      df.drop_duplicates(subset=df.columns.difference(drop_duplicate_exclude)) is performed before storage

      Primarily used for arrival timestamps. An API may send the same data on sequential requests, but in most applications the programmer will want to timestamp the arrival time of the data. Without the exclusion this would produce duplicate rows whose only difference is the arrival timestamp.

  • move_shards_path – See the api2db documentation for details and examples

  • move_composed_path – See the api2db documentation for details and examples

  • chunk_size – CURRENTLY NOT SUPPORTED
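
A construction example with a MySQL dialect and placeholder credentials, using only the documented parameters:

    from api2db.store.store2sql import Store2Sql

    store = Store2Sql(
        name="mycollector",      # collector this store belongs to
        seconds=3600,            # compose and store once per hour
        db_name="mydb",
        dialect="mysql",         # or "mariadb" / "postgresql"
        username="user",         # placeholder credentials
        password="password",
        host="localhost",
        port="3306",
        if_exists="append",      # add rows to the existing table
    )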

Module contents

Original Author

Tristen Harr

Creation Date

04/28/2021

Revisions

None