prefilter package

Submodules

prefilter.prefilter module

class prefilter.prefilter.Prefilter

Bases: object

Main component of the Log Filtering stage that processes and filters batches.

Consumes batches from the Log Collection stage and applies relevance-based filtering using the LoglineHandler. Filters out irrelevant loglines and forwards only relevant data to the next pipeline stage for anomaly detection.

clear_data() → None

Clears all data from the internal data structures.

Resets both the unfiltered_data and filtered_data lists to an empty state, preparing for the next batch processing cycle.

filter_by_error() → None

Applies relevance-based filtering to the unfiltered batch data.

Iterates through all loglines in the unfiltered data and applies the relevance check using the LoglineHandler. Relevant loglines are added to the filtered data, while irrelevant ones are discarded and marked as “filtered_out” in the monitoring system. Updates fill level metrics to track filtering progress.
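The filtering pass described above can be sketched as follows. This is a minimal illustration, not the class's actual code: `filter_relevant`, `example_rule`, and the logline fields are hypothetical names, and a plain callable stands in for the relevance check that the LoglineHandler provides.

```python
from typing import Callable, Dict, List, Tuple


def filter_relevant(
    unfiltered: List[Dict],
    is_relevant: Callable[[Dict], bool],
) -> Tuple[List[Dict], int]:
    """Return the relevant loglines and the count of discarded ones."""
    filtered = []
    filtered_out = 0
    for logline in unfiltered:
        if is_relevant(logline):
            filtered.append(logline)
        else:
            # In the real stage, discarded loglines are reported to the
            # monitoring system as "filtered_out"; here we only count them.
            filtered_out += 1
    return filtered, filtered_out


# Hypothetical relevance rule, for illustration only.
def example_rule(logline: Dict) -> bool:
    return logline.get("status") == "NXDOMAIN"


batch = [
    {"host": "a.example", "status": "NOERROR"},
    {"host": "b.example", "status": "NXDOMAIN"},
    {"host": "c.example", "status": "NXDOMAIN"},
]
kept, dropped = filter_relevant(batch, example_rule)
```

Passing the relevance check in as a callable keeps the loop independent of how relevance is defined, which mirrors the delegation to a separate handler described above.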

get_and_fill_data() → None

Retrieves and processes a new batch from the configured Kafka topic.

Clears any previously stored data and consumes a new batch message. Unpacks the batch data including metadata (batch_id, timestamps, subnet_id) and stores it internally. Logs batch reception information and updates monitoring metrics for tracking purposes.
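A rough sketch of the clear-then-unpack step, assuming the batch arrives as a JSON string. The `BatchState` class, the `fill_from_message` method, and the key names `begin_timestamp`, `end_timestamp`, and `data` are assumptions made for illustration; only `batch_id`, `subnet_id`, and the presence of timestamps come from the description above. Kafka consumption itself is left out.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BatchState:
    """Hypothetical stand-in for the state kept between batches."""

    batch_id: Optional[str] = None
    begin_timestamp: Optional[str] = None  # assumed key name
    end_timestamp: Optional[str] = None    # assumed key name
    subnet_id: Optional[str] = None
    unfiltered_data: List[dict] = field(default_factory=list)
    filtered_data: List[dict] = field(default_factory=list)

    def clear_data(self) -> None:
        # Reset both lists so no loglines leak into the next batch.
        self.unfiltered_data = []
        self.filtered_data = []

    def fill_from_message(self, raw_message: str) -> None:
        # Clear first, then unpack metadata and loglines from the message.
        self.clear_data()
        message = json.loads(raw_message)
        self.batch_id = message["batch_id"]
        self.begin_timestamp = message["begin_timestamp"]
        self.end_timestamp = message["end_timestamp"]
        self.subnet_id = message["subnet_id"]
        self.unfiltered_data = list(message["data"])


raw = json.dumps(
    {
        "batch_id": "b-1",
        "begin_timestamp": "2024-01-01T00:00:00Z",
        "end_timestamp": "2024-01-01T00:01:00Z",
        "subnet_id": "10.0.0.0_24",
        "data": [{"host": "a.example"}, {"host": "b.example"}],
    }
)
state = BatchState()
state.filtered_data = [{"stale": True}]  # leftover from a previous cycle
state.fill_from_message(raw)
```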

send_filtered_data() → None

Sends the filtered batch data to the next pipeline stage via Kafka.

Creates a properly formatted batch message with metadata and sends it to the configured output topic. Updates batch processing status and resets fill level metrics. Logs detailed statistics about the filtering results.

Raises:

ValueError – If no filtered data is available to send.
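The guard behind the ValueError can be illustrated with a small sketch. `build_filtered_batch` is a hypothetical helper, and the message layout and error text are assumptions. The actual Kafka producer call is deliberately omitted.

```python
import json
from typing import List


def build_filtered_batch(
    batch_id: str, subnet_id: str, filtered_data: List[dict]
) -> str:
    """Serialize a filtered batch, refusing to build an empty one."""
    if not filtered_data:
        # Mirrors the documented behavior: nothing to send is an error.
        raise ValueError("no filtered data available to send")
    return json.dumps(
        {"batch_id": batch_id, "subnet_id": subnet_id, "data": filtered_data}
    )


payload = build_filtered_batch("b-1", "10.0.0.0_24", [{"host": "b.example"}])

try:
    build_filtered_batch("b-2", "10.0.0.0_24", [])
    empty_batch_rejected = False
except ValueError:
    empty_batch_rejected = True
```

Callers that treat an empty result as a normal outcome can catch the ValueError and move on to the next batch.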

prefilter.prefilter.main(one_iteration: bool = False) → None

Creates the Prefilter instance and runs the main processing loop.

Continuously processes batches by retrieving data, applying filters, and sending filtered results. The loop handles various exceptions gracefully and supports clean shutdown via KeyboardInterrupt.

Parameters:

one_iteration (bool) – If True, processes only one batch and then exits. Used primarily for testing. Default: False.
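The shape of such a loop, including the one_iteration switch, might look like the sketch below. `run_loop` and its callable parameters are hypothetical stand-ins for the retrieve, filter, and send steps. Which exceptions the real loop handles is not specified here, so treating ValueError as "skip this batch" is an assumption.

```python
from typing import Callable, List


def run_loop(
    fetch: Callable[[], List[str]],
    keep: Callable[[str], bool],
    send: Callable[[List[str]], None],
    one_iteration: bool = False,
) -> int:
    """Process batches until interrupted; return how many were sent."""
    processed = 0
    while True:
        try:
            batch = fetch()
            filtered = [line for line in batch if keep(line)]
            if not filtered:
                raise ValueError("no filtered data available to send")
            send(filtered)
            processed += 1
        except ValueError:
            # Assumed handling: an empty result is skipped, not fatal.
            pass
        except KeyboardInterrupt:
            # Clean shutdown on Ctrl+C.
            break
        if one_iteration:
            break
    return processed


sent: List[List[str]] = []
count = run_loop(
    fetch=lambda: ["error: disk full", "ok"],
    keep=lambda line: line.startswith("error"),
    send=sent.append,
    one_iteration=True,
)
```

With one_iteration=True the loop body runs exactly once, which makes a test deterministic without needing to interrupt the process.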