tts_data_utils.invulnerable_data_manager
Submodules
tts_data_utils.invulnerable_data_manager.batch
- class tts_data_utils.invulnerable_data_manager.batch.AllDataBatch
Bases: Batch
A specialized Batch that serves as the global “Master List” for all input data.
Concept: This batch is initialized with tag_data=False because it represents the raw, un-categorized state of the system. It is the “Source of Truth” from which all other Batchers draw their data.
- class tts_data_utils.invulnerable_data_manager.batch.Batch(name, tag_data=True)
Bases: object
A labeled container for a subset of data items.
The Concept: A Batch is a logical “slice” of data. Unlike a standard DataContainer, a Batch can hold multiple types of data containers simultaneously, grouped by a common theme (like a specific time window or a shared asset ID).
Data Tagging: If tag_data is True, every DataItem added to this batch is “stamped” with a reference back to this batch. This allows individual data rows to “know” which logical groups they belong to later in the pipeline.
- Parameters:
name (str) – Human-readable name for the batch.
tag_data (bool) – If True, calls .tag_with_batch() on all ingested items.
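The tagging behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `DataItem`, `SketchBatch`, and the `batches` attribute are hypothetical stand-ins, and a DataContainer is modelled as a plain list of items.

```python
class DataItem:
    """Hypothetical stand-in for a data row that can be tagged."""

    def __init__(self, value):
        self.value = value
        self.batches = []  # batches this item has been stamped with

    def tag_with_batch(self, batch):
        self.batches.append(batch)


class SketchBatch:
    """Illustrative stand-in for Batch (assumed internals)."""

    def __init__(self, name, tag_data=True):
        self.name = name
        self.tag_data = tag_data
        self._data_map = {}

    def set_data_one(self, name, items):
        # Refuse to overwrite a container that is already present.
        if name in self._data_map:
            raise ValueError(f"container {name!r} already present")
        if self.tag_data:
            for item in items:
                item.tag_with_batch(self)
        self._data_map[name] = items


# A tagging batch stamps every item it ingests.
items = [DataItem(1), DataItem(2)]
tagged = SketchBatch("daily_maintenance")
tagged.set_data_one("sensors", items)
tagged_names = [batch.name for batch in items[0].batches]

# With tag_data=False (as AllDataBatch is described), items stay untagged.
raw_items = [DataItem(3)]
master = SketchBatch("all_data", tag_data=False)
master.set_data_one("sensors", raw_items)
```

Keeping the back-reference on the item, rather than only on the batch, is what lets a later stage ask a single row which groups it belongs to.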
- property data_map
The internal dictionary mapping container names to DataContainers.
- get_data(name)
Retrieve a DataContainer by name from the internal map.
- has_data(name)
Check if a specific data container exists in this batch.
- set_data(**data_map)
Bulk-add data containers to the batch via keyword arguments.
- set_data_one(name, data)
Add a single named DataContainer to the batch.
- Raises:
ValueError – If a container with the same name is already present.
- subdivide_on_condition(sub_f)
Creates a new data map by applying a filter function to every container in the batch.
- Parameters:
sub_f (function) – A function that returns a boolean for each DataItem.
- Returns:
A dictionary of filtered DataContainers.
- Return type:
dict
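As a rough sketch of the filtering described above, assuming each DataContainer behaves like a list of items (the real container type may differ), the subdivision could look like this. The function here is a free-standing stand-in, not the actual method.

```python
def subdivide_on_condition_sketch(data_map, sub_f):
    """Return a new map with every container filtered by sub_f."""
    return {
        name: [item for item in items if sub_f(item)]
        for name, items in data_map.items()
    }


# Hypothetical data map: container name -> list of items.
data_map = {"temps": [3, 18, 25], "speeds": [40, 5]}
filtered = subdivide_on_condition_sketch(data_map, lambda item: item >= 10)
```

The original map is left untouched; only the returned dictionary holds the filtered containers.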
- subdivide_on_time(start_time=None, end_time=None)
A specialized subdivision helper for temporal windowing.
Usage Note: This relies on the DataItem.time property being properly implemented in the underlying data objects.
- Parameters:
start_time – The beginning of the window (inclusive).
end_time – The end of the window (inclusive).
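The inclusive window can be illustrated with a small predicate. This sketch assumes that a bound left as None means "unbounded on that side" (the source only documents the defaults, not their meaning), and `TimedItem` is a hypothetical item type exposing a `time` attribute.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimedItem:
    """Hypothetical item exposing the .time attribute the helper relies on."""
    label: str
    time: datetime


def in_window(item, start_time=None, end_time=None):
    """True if item.time lies in [start_time, end_time]; None = no bound (assumed)."""
    if start_time is not None and item.time < start_time:
        return False
    if end_time is not None and item.time > end_time:
        return False
    return True


items = [
    TimedItem("a", datetime(2024, 1, 1)),
    TimedItem("b", datetime(2024, 1, 2)),
    TimedItem("c", datetime(2024, 1, 3)),
]
# Both endpoints are inclusive, so "b" and "c" are kept.
window = [i.label for i in items
          if in_window(i, datetime(2024, 1, 2), datetime(2024, 1, 3))]
# Only an upper bound is given.
open_ended = [i.label for i in items if in_window(i, end_time=datetime(2024, 1, 1))]
```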
- class tts_data_utils.invulnerable_data_manager.batch.Batcher(source_batch)
Bases: ABC
An abstract factory for organizing raw data into logical processing groups.
The Concept: Think of a Batcher as a sorting machine at a post office. It takes a giant unsorted bin of mail (source_batch) and uses specific internal rules to place that mail into smaller, specialized bags (batches) based on destination or priority (e.g., ‘daily_maintenance’ or ‘system_updates’).
How it works:
1. The InvulnerableDataManager provides the source_batch.
2. The Batcher checks REQUIRED_DATA to ensure it has what it needs.
3. The internal _make_batches() method runs the sorting logic.
4. Each resulting Batch is stored in the self.batches list.
- Parameters:
source_batch (AllDataBatch) – The pool of data to be sorted.
- abstract class property NAME
Name of the data batch type, e.g. ‘mobility_backbone’ or ‘daily_maintenance’
- REQUIRED_DATA = []
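A subclass following this pattern might look like the sketch below. `SketchBatcher` is a simplified stand-in for the abstract base, the source batch is modelled as a plain dict, and raising ValueError on missing data is an assumption; only NAME, REQUIRED_DATA, _make_batches() and self.batches come from the description above.

```python
from abc import ABC, abstractmethod


class SketchBatcher(ABC):
    """Simplified stand-in for the abstract Batcher (assumed internals)."""

    NAME = None
    REQUIRED_DATA = []

    def __init__(self, source_batch):
        # Check requirements before running any sorting logic.
        missing = [n for n in self.REQUIRED_DATA if n not in source_batch]
        if missing:
            raise ValueError(f"missing required data: {missing}")
        self.source_batch = source_batch
        self.batches = self._make_batches()

    @abstractmethod
    def _make_batches(self):
        """Return the list of batches produced by the sorting rules."""


class ParityBatcher(SketchBatcher):
    """Hypothetical concrete batcher: splits readings into even and odd."""

    NAME = "parity"
    REQUIRED_DATA = ["readings"]

    def _make_batches(self):
        readings = self.source_batch["readings"]
        return [
            ("even", [r for r in readings if r % 2 == 0]),
            ("odd", [r for r in readings if r % 2 == 1]),
        ]


batcher = ParityBatcher({"readings": [1, 2, 3, 4]})
```

Because _make_batches() is abstract, the base class itself cannot be instantiated; each concrete batcher supplies only its own sorting rule.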
- class tts_data_utils.invulnerable_data_manager.batch.UntaggedBatch(all_data_batch)
Bases: Batch
A “Lost and Found” batch for data that escaped categorization.
Concept: After all Batchers have run, the UntaggedBatch scans the AllDataBatch to find any DataItem that does not have any batch tags associated with it. This is a critical auditing tool to ensure no data “fell through the cracks” of the defined sorting logic.
- Parameters:
all_data_batch (AllDataBatch) – The master data batch to scan for untagged items.
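The audit scan can be pictured as a simple filter over every item in the master batch. In this sketch, `Item` and its `batches` list are hypothetical; how the real DataItem stores its tags is not specified here.

```python
class Item:
    """Hypothetical data item that records the batches it was tagged with."""

    def __init__(self, name):
        self.name = name
        self.batches = []


def find_untagged(all_items):
    """Return the items that no batcher has claimed."""
    return [item for item in all_items if not item.batches]


a, b, c = Item("a"), Item("b"), Item("c")
a.batches.append("daily_maintenance")
c.batches.append("system_updates")

# "b" was never tagged, so it ends up in the lost-and-found list.
untagged_names = [item.name for item in find_untagged([a, b, c])]
```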
tts_data_utils.invulnerable_data_manager.invulnerable_data_manager
- class tts_data_utils.invulnerable_data_manager.invulnerable_data_manager.InvulnerableDataManager
Bases: object
A high-level coordinator for managing data lifecycle in “failure-resistant” pipelines.
The Concept: Think of the InvulnerableDataManager as a mission control center with a built-in containment field. In standard data processing, a single malformed record or missing file can cause the entire system to crash. This class is designed to “swallow” those failures, log them, and keep the rest of the pipeline moving.
How it works: It maintains separate registries for AllDataBatch (Inputs) and processed results (Outputs). By wrapping data initialization in the @invulnerable decorator, it ensures that even if a DataContainer fails to parse its source, the manager remains alive, allowing other valid data sources to continue processing.
Batching & Untagged Data: The manager also tracks “Batchers.” When data is processed into batches, any data that doesn’t fit into a defined logical group is automatically tracked in an UntaggedBatch, ensuring that no data is lost or ignored silently.
- property all_input_data
Access the registry of all successfully initialized input containers.
- property all_output_data
Access the registry of all generated output containers.
- get_batcher(name)
Retrieves a registered batcher by its name.
- get_data_inventory()
Returns a comprehensive map of all Inputs and Outputs currently managed by the system.
- get_input_data(name)
Retrieves a specific input container by name.
- get_input_inventory()
Returns a summary of all loaded input data types and counts.
- get_output_data(name)
Retrieves a specific output container by name.
- get_output_inventory()
Returns a summary of all generated output data types and counts.
- init_batcher(batcher_cls, **kwargs)
Initializes a Batcher to organize input data into logical groups.
How it works: Batchers often require specific input datasets to exist. This method checks for REQUIRED_DATA before attempting to create the batcher. If requirements are met, it uses exec_invulnerable to safely build the batcher.
The Untagged Batch: Every time a batcher is initialized, the manager re-syncs the UntaggedBatch to ensure it captures any data that the new batcher rules might have excluded.
- Parameters:
batcher_cls (Type[DataBatcher]) – The Batcher class to initialize.
kwargs – Additional arguments for the batcher initialization.
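The two-stage flow (requirement check, then protected construction) could be sketched like this. Everything here is illustrative: the function is a free-standing stand-in for the method, input data is a plain dict, the try/except block stands in for exec_invulnerable, and returning None when requirements are not met is an assumption.

```python
import logging

logger = logging.getLogger("manager_sketch")


def init_batcher_sketch(input_data, batcher_cls, **kwargs):
    """Return a batcher instance, or None if it cannot be built."""
    missing = [n for n in batcher_cls.REQUIRED_DATA if n not in input_data]
    if missing:
        logger.warning("%s skipped, missing inputs: %s",
                       batcher_cls.__name__, missing)
        return None
    try:
        # Stand-in for exec_invulnerable: log and swallow construction errors.
        return batcher_cls(input_data, **kwargs)
    except Exception:
        logger.exception("%s failed to initialize", batcher_cls.__name__)
        return None


class GoodBatcher:
    REQUIRED_DATA = ["readings"]

    def __init__(self, input_data, scale=1):
        self.batches = [[r * scale for r in input_data["readings"]]]


class NeedsMore:
    REQUIRED_DATA = ["readings", "schedule"]

    def __init__(self, input_data):
        self.batches = []


class Broken:
    REQUIRED_DATA = ["readings"]

    def __init__(self, input_data):
        raise RuntimeError("bad sorting rule")


inputs = {"readings": [1, 2, 3]}
built = init_batcher_sketch(inputs, GoodBatcher, scale=10)
skipped = init_batcher_sketch(inputs, NeedsMore)    # requirement not met
contained = init_batcher_sketch(inputs, Broken)     # error logged, not raised
```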
- init_data(**kwargs)
tts_data_utils.invulnerable_data_manager.utilities
- class tts_data_utils.invulnerable_data_manager.utilities.cached_property(func)
Bases: object
A property decorator that caches its result after the first access.
The Concept: Some properties are expensive to calculate (e.g., parsing a large file or performing a complex diff). cached_property ensures that the logic runs exactly once; subsequent calls simply return the stored value.
How it works: The first time the property is accessed, it executes the underlying function and saves the result to a private attribute (prefixed with _). On future calls, it checks for this attribute and returns it directly, bypassing the calculation logic.
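The caching mechanism described above can be approximated with a small descriptor. This is a sketch of the technique, not the library's code; the class name `cached_property_sketch` and the `Report` example are made up for illustration.

```python
class cached_property_sketch:
    """Descriptor that stores the computed value on '_<name>' of the instance."""

    def __init__(self, func):
        self.func = func
        self.attr_name = "_" + func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not an instance
        if not hasattr(instance, self.attr_name):
            # First access: run the function and remember the result.
            setattr(instance, self.attr_name, self.func(instance))
        return getattr(instance, self.attr_name)


class Report:
    def __init__(self):
        self.calls = 0

    @cached_property_sketch
    def total(self):
        self.calls += 1  # counts how often the expensive logic runs
        return sum(range(1000))


report = Report()
first = report.total
second = report.total  # served from report._total, no recomputation
```

One consequence of this design is that the cache lives as long as the instance; deleting the private attribute forces a recomputation on the next access.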
- tts_data_utils.invulnerable_data_manager.utilities.clear_global_invulnerable()
Globally disables the ‘invulnerable’ logic, allowing exceptions to raise normally.
- tts_data_utils.invulnerable_data_manager.utilities.exec_invulnerable(func, *args, **kwargs)
Executes a function immediately within an ‘invulnerable’ wrapper.
The Concept: Similar to the @invulnerable decorator, but used for one-off executions where you want to try-catch a call inline without decorating the original definition.
- Parameters:
func (callable) – The function to execute.
args – Positional arguments for the function.
kwargs – Keyword arguments for the function.
- Returns:
The function result or None if an exception was caught.
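A minimal sketch of the inline pattern, assuming the wrapper amounts to a try/except around the call (the logger name and message are illustrative):

```python
import logging

logger = logging.getLogger("exec_sketch")


def exec_invulnerable_sketch(func, *args, **kwargs):
    """Call func; on any exception, log it and return None."""
    try:
        return func(*args, **kwargs)
    except Exception:
        logger.exception("call to %s failed",
                         getattr(func, "__name__", repr(func)))
        return None


ok = exec_invulnerable_sketch(int, "42")
failed = exec_invulnerable_sketch(int, "not a number")
```

Note that a caller cannot tell a swallowed failure apart from a function that legitimately returns None, so this pattern suits calls whose successful result is never None.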
- tts_data_utils.invulnerable_data_manager.utilities.filter_on_attr_regex(all_data, **kwargs)
Filters a list of objects based on attribute values matching regex patterns.
The Concept: Instead of writing multiple loops to find objects with specific properties, this utility allows you to pass keyword arguments where the key is the attribute name and the value is a regex pattern.
How it works: It dynamically compiles each provided pattern into a strict start-to-finish regex (using ^ and $). It then iterates through the dataset and returns only those objects where every specified attribute satisfies its corresponding regex match.
- Parameters:
all_data (list) – A list of objects to be filtered.
kwargs (str) – Attribute names and their corresponding regex strings.
- Returns:
A list of filtered objects.
- Return type:
list
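The matching logic can be sketched as below. Two details are assumptions: each pattern is wrapped in a non-capturing group before anchoring, so alternations such as `a|b` stay fully anchored, and an object lacking the attribute is treated as a non-match.

```python
import re
from types import SimpleNamespace


def filter_on_attr_regex_sketch(all_data, **kwargs):
    """Keep objects whose named attributes each fully match their regex."""
    patterns = {
        attr: re.compile(f"^(?:{pattern})$") for attr, pattern in kwargs.items()
    }

    def matches(obj):
        for attr, compiled in patterns.items():
            if not hasattr(obj, attr):
                return False  # assumed: missing attribute means no match
            if compiled.match(str(getattr(obj, attr))) is None:
                return False
        return True

    return [obj for obj in all_data if matches(obj)]


assets = [
    SimpleNamespace(name="pump_01", site="north"),
    SimpleNamespace(name="pump_02", site="south"),
    SimpleNamespace(name="valve_01", site="north"),
]
north_pumps = filter_on_attr_regex_sketch(assets, name=r"pump_\d+", site="north")
# Anchoring makes the match strict: "pump" alone does not match "pump_01".
partial = filter_on_attr_regex_sketch(assets, name="pump")
```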
- tts_data_utils.invulnerable_data_manager.utilities.invulnerable(func)
Decorator that makes a function or method ‘invulnerable’ to exceptions: errors are logged instead of raised.
The Concept: In automated data pipelines, a single malformed record shouldn’t necessarily kill a long-running process. This decorator acts as a “safety net” or “containment field” around a function.
How it works: If an exception occurs within the decorated function:
1. It checks the GLOBAL_ENABLE_INVULNERABLE flag.
2. If enabled, it captures the function/class name.
3. It logs a descriptive error and the full stack trace.
4. It returns None rather than crashing, allowing the rest of the program to continue.
- Parameters:
func (callable) – The function or method to be protected.
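The steps above can be sketched as a plain decorator plus a module-level flag. The code below is a local re-creation for illustration only: the flag and the two toggle functions mirror the documented names but are defined here, and re-raising when the flag is off is inferred from the description of clear_global_invulnerable.

```python
import functools
import logging

logger = logging.getLogger("invulnerable_sketch")

# Local stand-in for the module-level flag (assumed to default to enabled).
GLOBAL_ENABLE_INVULNERABLE = True


def set_global_invulnerable():
    global GLOBAL_ENABLE_INVULNERABLE
    GLOBAL_ENABLE_INVULNERABLE = True


def clear_global_invulnerable():
    global GLOBAL_ENABLE_INVULNERABLE
    GLOBAL_ENABLE_INVULNERABLE = False


def invulnerable_sketch(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            if not GLOBAL_ENABLE_INVULNERABLE:
                raise  # protection disabled: behave like a normal call
            # Log the qualified name and the full stack trace, then move on.
            logger.exception("%s failed", func.__qualname__)
            return None
    return wrapper


@invulnerable_sketch
def parse_ratio(text):
    numerator, denominator = text.split("/")
    return int(numerator) / int(denominator)


good = parse_ratio("3/4")
swallowed = parse_ratio("1/0")  # ZeroDivisionError is logged, None returned

clear_global_invulnerable()
try:
    parse_ratio("1/0")
    raised = False
except ZeroDivisionError:
    raised = True
set_global_invulnerable()
```

Turning the flag off is useful while debugging, since the original exception then surfaces with its normal traceback instead of being reduced to a log entry.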
- tts_data_utils.invulnerable_data_manager.utilities.log_exception(user_message)
Simple shorthand for logging an exception to the screen with a clarifying message.
- tts_data_utils.invulnerable_data_manager.utilities.set_global_invulnerable()
Globally enables the ‘invulnerable’ error-handling logic.