Lab 09 — Data Processing on Edge Devices

Deadlines:

End of lab session (GitHub checkpoint): commit & push your progress to your team repository.
Before next lab (eClass submission): upload (1) a .zip with your code and (2) a PDF export of labs/lab09/README.md.

Intro to what you need to do

Your PIR sensor produces events like “detected” or “clear”, one at a time. That is useful, but it does not answer the questions a facilities manager actually cares about: “Is this bin being heavily used right now?” or “Will this area be busy tomorrow morning?”

The raw sensor cannot answer these questions. But software that watches the stream of events and reasons about them can. This is what the lecture called a virtual sensor, a software component that consumes raw data and produces higher-level, more meaningful information that the physical sensor alone cannot provide.

In this lab you will build two virtual sensors on top of your motion pipeline:

A rule-based virtual sensor that watches the stream of motion events, counts them in time windows, and publishes a “bin usage intensity” level (idle, low, medium, or high). This is event processing with simple rules, without ML.
An ML-based virtual sensor that looks at historical motion patterns and predicts whether the next hour will be busy or quiet. It uses a trained classifier on the same kind of motion data, but now the logic is learned from data instead of hand-written.

Both virtual sensors subscribe to the same MQTT topic as your consumer. Both publish their results back to MQTT as new topics.

                                    ┌──────────────────────┐
                               ┌───▶│  consumer (JSONL)    │
                               │    └──────────────────────┘
                               │
  PIR ──▶ producer ──▶ MQTT ───┤    ┌──────────────────────┐
                               ├───▶│  virtual sensor       │
                               │    │  (rules: usage level) │──▶ MQTT ──▶ HA
                               │    └──────────────────────┘
                               │
                               │    ┌──────────────────────┐
                               └───▶│  virtual sensor       │
                                    │  (ML: busy predictor) │──▶ MQTT ──▶ HA
                                    └──────────────────────┘

Create the following structure:

/
├── README.md
├── labs/
│   ├── lab01/
│   ├── ...
│   └── lab09/
│       ├── README.md
│       ├── requirements.txt
│       ├── producer.py
│       ├── consumer.py
│       ├── virtual_sensor_rules.py
│       ├── virtual_sensor_ml.py
│       ├── train_model.py
│       ├── models/
│       │   └── (trained model file goes here)
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Copy your producer.py, consumer.py, and pirlib/ from the previous lab. The new files are virtual_sensor_rules.py, virtual_sensor_ml.py, and train_model.py.

Part 1 — Rule-based virtual sensor: bin usage intensity

Your first virtual sensor subscribes to the motion event stream and derives a higher-level metric: how intensely the bin is being used right now. The raw sensor says “motion detected.” The virtual sensor says “this bin is experiencing high activity.”

The logic is straightforward: count motion events in a rolling time window and map the count to a level.

Design the rules

Define thresholds for usage intensity. For example:

Events in last 10 minutes	Usage level
0	idle
1–5	low
6–15	medium
16+	high

These numbers are only an example; tune them as you see fit. The point is that the thresholds are explicit, hand-written rules. You decide what “high” means.
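The table above can be sketched as an ordered list of rules. This is a minimal illustration, not a required design: the names `DEFAULT_THRESHOLDS` and `classify_usage` are hypothetical, and keeping the thresholds in one data structure is just one way to make them configurable.

```python
# Hypothetical rule table: (inclusive upper bound, level), checked in order.
DEFAULT_THRESHOLDS = [(0, "idle"), (5, "low"), (15, "medium")]


def classify_usage(count, thresholds=DEFAULT_THRESHOLDS, top_level="high"):
    """Map an event count to a usage level using explicit, ordered rules."""
    for upper_bound, level in thresholds:
        if count <= upper_bound:
            return level
    # Anything above the last bound falls into the top level.
    return top_level


for n in (0, 3, 6, 15, 16):
    print(n, classify_usage(n))
```

Because the rules are data, changing a boundary or adding a level means editing the list rather than the control flow.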

Implement it

Create virtual_sensor_rules.py. This script:

Connects to the MQTT broker
Subscribes to your motion events topic
Maintains a time-windowed count of events
Every N seconds, evaluates the rules and publishes the usage level to a dedicated output topic (the same topic every time, not a new one per update)
Also publishes the event count and window size as attributes

Here is a pseudocode skeleton. You can follow it or do it your own way; what matters is that the script does all of the above:

START PROGRAM

IMPORT required libraries:
    MQTT client library
    JSON library
    Time library
    Argument parser
    Date and time utilities
    Queue structure
    Lock for thread safety

CREATE an empty queue called event_times
CREATE a lock called event_lock


FUNCTION on_message(client, userdata, message):
    TRY:
        Decode the incoming MQTT message payload
        Convert the payload from JSON text into a data object
        # Remember the structure of the message your producer sends.
        # Is it JSON, as assumed here, or plain text? For plain text, use instead:
        #   SET payload_text to the decoded message payload
        #   SET payload_text to payload_text with whitespace removed from both ends
        #   IF payload_text equals "detected":

        IF the payload field "hasSimpleResult" equals "detected":
            LOCK event_lock
                Add the current UTC time to event_times
            UNLOCK event_lock

    ON ERROR (the payload is not valid JSON OR required data is missing):
        Ignore the message
END FUNCTION


FUNCTION evaluate_usage(window_minutes = 10):
    SET cutoff_time to current UTC time minus window_minutes

    LOCK event_lock
        WHILE event_times is not empty AND the oldest event is older than cutoff_time:
            Remove the oldest event from event_times

        SET count to the number of remaining events in event_times
    UNLOCK event_lock

    IF count equals 0:
        RETURN "idle" and count

    ELSE IF count is less than or equal to 5:
        RETURN "low" and count

    ELSE IF count is less than or equal to 15:
        RETURN "medium" and count

    ELSE:
        RETURN "high" and count
END FUNCTION


FUNCTION main():
    CREATE command-line argument parser

    ADD argument "--broker"
        Default value: "localhost"

    ADD argument "--port"
        Default value: 1883

    ADD argument "--subscribe-topic"
        Default value: "smartbin/bin-01/pir-01/events"

    ADD argument "--publish-topic"
        Default value: "smartbin/bin-01/usage"

    ADD argument "--window"
        Default value: 10
        Description: usage evaluation window in minutes

    ADD argument "--interval"
        Default value: 30
        Description: time between evaluations in seconds

    READ command-line arguments into args

    CREATE MQTT client with ID "virtual-sensor-rules"

    SET the client's message handler to on_message

    CONNECT client to args.broker on args.port

    SUBSCRIBE to args.subscribe_topic with QoS level 1

    START MQTT network loop in the background

    PRINT monitoring status message showing:
        subscribe topic
        window size
        evaluation interval

    TRY:
        LOOP forever:
            CALL evaluate_usage using args.window
            STORE returned usage level and event count

            CREATE JSON payload containing:
                usage level
                event count
                window size in minutes
                current UTC evaluation timestamp

            PUBLISH payload to args.publish_topic
                with QoS level 1
                retain enabled

            PRINT usage status message

            WAIT for args.interval seconds

    ON INTERRUPT (the user presses Ctrl+C):
        STOP the MQTT network loop
        DISCONNECT MQTT client
END FUNCTION


IF this file is being run directly:
    CALL main()

END PROGRAM
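
The message-handling step in the skeleton depends on what your producer actually sends. As a hedged sketch, the helper below (the name `is_detection` is hypothetical) accepts either a JSON object with a `hasSimpleResult` field or a plain-text payload, and ignores everything else. Adapt it to the format your own producer uses.

```python
import json


def is_detection(raw_payload: bytes) -> bool:
    """Return True if the MQTT payload reports a detection.

    Handles a JSON object with a "hasSimpleResult" field or a plain-text
    payload such as b"detected". Malformed input counts as "not a detection".
    """
    try:
        text = raw_payload.decode("utf-8").strip()
    except UnicodeDecodeError:
        return False
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: fall back to a plain-text comparison.
        return text == "detected"
    if isinstance(data, dict):
        return data.get("hasSimpleResult") == "detected"
    return False


print(is_detection(b'{"hasSimpleResult": "detected"}'))
print(is_detection(b"clear"))
```

Inside an `on_message` callback you would call this with the message payload bytes and record a timestamp only when it returns True.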

A few things to notice:

The queue (a deque in Python) acts as a rolling window: old events are dropped when they fall outside the window. This is a simple version of what CEP engines do with time-based windows.
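
A minimal sketch of such a rolling window, assuming events are recorded in time order. The class name `RollingEventCounter` and the optional `when`/`now` parameters are illustrative additions that make the behaviour easy to test without waiting for real time to pass.

```python
import threading
from collections import deque
from datetime import datetime, timedelta, timezone


class RollingEventCounter:
    """Counts events that fall inside a sliding time window."""

    def __init__(self, window_minutes=10):
        self.window = timedelta(minutes=window_minutes)
        self._times = deque()
        self._lock = threading.Lock()  # the MQTT callback runs in another thread

    def record(self, when=None):
        """Store the timestamp of one event (defaults to the current UTC time)."""
        when = when or datetime.now(timezone.utc)
        with self._lock:
            self._times.append(when)

    def count(self, now=None):
        """Drop events older than the window and return how many remain."""
        now = now or datetime.now(timezone.utc)
        cutoff = now - self.window
        with self._lock:
            while self._times and self._times[0] < cutoff:
                self._times.popleft()
            return len(self._times)


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
counter = RollingEventCounter(window_minutes=10)
counter.record(t0)
counter.record(t0 + timedelta(minutes=5))
print(counter.count(t0 + timedelta(minutes=9)))
print(counter.count(t0 + timedelta(minutes=12)))
```

Only the oldest end of the deque is ever inspected, so each evaluation costs time proportional to the number of expired events rather than the size of the window.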

The virtual sensor is a completely separate process from the producer and consumer. It subscribes to the same topic, processes the data differently, and publishes to its own topic. This is the pub/sub pattern at work: you can add new processing without changing anything that already exists.

Add Home Assistant discovery

Add discovery messages so the usage level appears as a Home Assistant entity. Use the same pattern as in Lab 07.
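
The exact Lab 07 pattern is not reproduced here, so the following is only a sketch of what a discovery message for the usage sensor might look like under Home Assistant's MQTT discovery convention. The helper name `build_discovery`, the object id, and the payload field `usage_level` are assumptions; match them to your own payload.

```python
import json


def build_discovery(bin_id="bin-01", state_topic="smartbin/bin-01/usage"):
    """Return (config_topic, config_json) for a Home Assistant MQTT sensor."""
    object_id = f"{bin_id}_usage".replace("-", "_")
    config_topic = f"homeassistant/sensor/{object_id}/config"
    config = {
        "name": "Bin Usage Intensity",
        "unique_id": object_id,
        "state_topic": state_topic,
        # Assumes the state payload is JSON with a "usage_level" field.
        "value_template": "{{ value_json.usage_level }}",
        # Expose the rest of the JSON payload as entity attributes.
        "json_attributes_topic": state_topic,
    }
    return config_topic, json.dumps(config)


topic, payload = build_discovery()
print(topic)
print(payload)
```

The resulting topic and payload would be published once at startup, typically as a retained message so that Home Assistant still finds the entity after a restart.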

Run and test

Start the broker, producer, and your virtual sensor:

# Terminal 1: producer
python producer.py --broker localhost --topic smartbin/bin-01/pir-01/events --pin 18

# Terminal 2: virtual sensor
python virtual_sensor_rules.py --broker localhost --subscribe-topic "smartbin/bin-01/pir-01/events" --publish-topic smartbin/bin-01/usage --window 10 --interval 30

Trigger motion events at different frequencies and watch the usage level change. In Home Assistant you should see a new “Bin Usage Intensity” entity that shows idle, low, medium, or high.

Try subscribing from the terminal too:

mosquitto_sub -h localhost -t "smartbin/bin-01/usage"

You should see periodic JSON updates with the usage level and event count.
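
For reference, one possible shape for those updates is sketched below. The field names (`usage_level`, `event_count`, `window_minutes`, `evaluated_at`) and the helper `build_usage_payload` are assumptions chosen for illustration; any consistent naming works as long as your Home Assistant configuration matches it.

```python
import json
from datetime import datetime, timezone


def build_usage_payload(level, count, window_minutes, now=None):
    """Serialize one evaluation result as a JSON string."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "usage_level": level,
        "event_count": count,
        "window_minutes": window_minutes,
        "evaluated_at": now.isoformat(),  # timezone-aware ISO 8601 timestamp
    })


print(build_usage_payload("medium", 7, 10))
```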

Part 2 — ML-based virtual sensor: busy period predictor

Your second virtual sensor uses machine learning instead of hand-written rules. It looks at historical patterns and predicts whether the upcoming period will be busy or quiet.

The idea: motion events tend to follow patterns. A bin near a cafeteria is busy at lunchtime. A bin in a lecture hall is busy between classes. If you have enough historical data, a classifier can learn these patterns and predict future activity.

Generate training data

You probably don’t have weeks of real sensor data (and of course the bin is not actually in real use). For the sake of this lab you will generate synthetic training data that mimics realistic patterns; you can change these patterns as you see fit. Create train_model.py:

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import joblib
import os


START PROGRAM

FUNCTION generate_training_data(days = 30, seed = 42):
    CREATE a random number generator using seed
    CREATE an empty list called rows

    FOR each day from 0 to days - 1:
        SET day_of_week to day modulo 7
            # 0 = Monday, 6 = Sunday

        FOR each hour from 0 to 23:

            IF day_of_week is Saturday or Sunday:
                SET base_rate to 2
                # Weekend has lower activity

            ELSE IF hour is between 8 and 10 inclusive:
                SET base_rate to 15
                # Morning rush period

            ELSE IF hour is between 11 and 14 inclusive:
                SET base_rate to 25
                # Lunch period

            ELSE IF hour is between 15 and 17 inclusive:
                SET base_rate to 12
                # Afternoon activity

            ELSE IF hour is between 18 and 20 inclusive:
                SET base_rate to 8
                # Evening activity

            ELSE:
                SET base_rate to 1
                # Night or early morning activity

            GENERATE event_count using a normal distribution:
                mean = base_rate
                standard deviation = base_rate * 0.3

            CONVERT event_count to an integer

            IF event_count is less than 0:
                SET event_count to 0

            IF event_count is greater than 10:
                SET label to "busy"
            ELSE:
                SET label to "quiet"

            ADD a row to rows containing:
                day_of_week
                hour
                is_weekend:
                    1 if day_of_week is Saturday or Sunday
                    0 otherwise
                event_count
                label

    CONVERT rows into a table/dataframe

    RETURN the dataframe
END FUNCTION


FUNCTION train_and_save(output_dir = "models"):
    CREATE output_dir if it does not already exist

    CALL generate_training_data()
    STORE the returned dataframe as df

    SELECT input features X from df:
        day_of_week
        hour
        is_weekend

    SELECT target labels y from df:
        label

    SPLIT X and y into training and testing sets:
        80% for training
        20% for testing
        random seed = 42

    CREATE a Random Forest classifier:
        number of trees = 50
        random seed = 42

    TRAIN the classifier using X_train and y_train

    USE the trained classifier to predict labels for X_test
    STORE predictions as y_pred

    PRINT "Model evaluation:"
    PRINT a classification report comparing y_test and y_pred

    SET model_path to the path formed by joining output_dir and "busy_predictor.joblib"

    SAVE the trained model to model_path

    PRINT the location where the model was saved

    RETURN the trained model
END FUNCTION


IF this file is being run directly:
    CALL train_and_save()

END PROGRAM

Run it:

python train_model.py

This generates 30 days of synthetic hourly data with realistic patterns (busy during lunch, quiet at night, less activity on weekends), trains a Random Forest classifier, prints evaluation metrics, and saves the model.
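
The data-generation step can be sketched as follows. To keep the example dependency-free it uses the standard library's `random` module instead of NumPy and returns a list of dictionaries instead of a dataframe; those substitutions, and the names `base_rate`, `generate_rows`, and `BUSY_THRESHOLD`, are choices made for this illustration only.

```python
import random

BUSY_THRESHOLD = 10  # an hour is labelled "busy" when its count exceeds this


def base_rate(day_of_week, hour):
    """Expected events per hour; 0 = Monday, 6 = Sunday."""
    if day_of_week >= 5:
        return 2   # weekend
    if 8 <= hour <= 10:
        return 15  # morning rush
    if 11 <= hour <= 14:
        return 25  # lunch
    if 15 <= hour <= 17:
        return 12  # afternoon
    if 18 <= hour <= 20:
        return 8   # evening
    return 1       # night or early morning


def generate_rows(days=30, seed=42):
    """Generate one labelled row per hour for the given number of days."""
    rng = random.Random(seed)
    rows = []
    for day in range(days):
        day_of_week = day % 7
        for hour in range(24):
            rate = base_rate(day_of_week, hour)
            # Noisy count around the base rate, clamped at zero.
            event_count = max(0, int(rng.gauss(rate, rate * 0.3)))
            rows.append({
                "day_of_week": day_of_week,
                "hour": hour,
                "is_weekend": 1 if day_of_week >= 5 else 0,
                "event_count": event_count,
                "label": "busy" if event_count > BUSY_THRESHOLD else "quiet",
            })
    return rows


rows = generate_rows()
print(len(rows), "rows,", sum(r["label"] == "busy" for r in rows), "labelled busy")
```

A list of dictionaries like this can be passed directly to `pandas.DataFrame(rows)` if you want to continue with the dataframe-based training step.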

Look at the classification report. What is the accuracy? What are the precision and recall for “busy” vs “quiet”? These metrics tell you how well the model distinguishes busy from quiet periods. Think about which features matter most: is it the hour of the day, or the day of the week?
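
To make the numbers in the report concrete, here is a small hand-rolled sketch of how accuracy, precision, and recall are computed for one class. It is only meant to illustrate the definitions; the function names are hypothetical and the toy labels are made up for the example, not taken from any real run.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall(y_true, y_pred, positive="busy"):
    """Precision and recall for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Of everything predicted positive, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Of everything actually positive, how much did we find?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, invented purely for illustration.
y_true = ["busy", "busy", "quiet", "quiet", "busy"]
y_pred = ["busy", "quiet", "quiet", "busy", "busy"]
print(accuracy(y_true, y_pred))
print(precision_recall(y_true, y_pred, positive="busy"))
```

Low precision for “busy” means false alarms; low recall for “busy” means missed busy hours. Which one matters more depends on what the prediction is used for.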

Build the ML virtual sensor

Create virtual_sensor_ml.py. This script loads the trained model and periodically predicts whether the next hour will be busy or quiet:

import paho.mqtt.client as mqtt
import json
import time
import argparse
import joblib
import numpy as np
from datetime import datetime


FUNCTION load_model(path):
    Load the trained classifier from the given file path
    RETURN the loaded model
END FUNCTION


FUNCTION predict_next_hour(model):
    Get the current local date and time

    SET next_hour to the next hour after the current hour
        If the current hour is 23, set next_hour to 0

    SET day_of_week to the weekday number of the day that next_hour falls on
        (if the current hour is 23, that is tomorrow, not today)
        Monday = 0
        Tuesday = 1
        Wednesday = 2
        Thursday = 3
        Friday = 4
        Saturday = 5
        Sunday = 6

    IF day_of_week is Saturday or Sunday:
        SET is_weekend to 1
    ELSE:
        SET is_weekend to 0

    CREATE feature array containing:
        day_of_week
        next_hour
        is_weekend

    USE the trained model to predict whether the next hour will be busy or quiet

    USE the trained model to calculate class probabilities for the same features

    FIND the probability that corresponds to the predicted class

    SET confidence to that probability

    RETURN prediction, confidence, and next_hour
END FUNCTION


FUNCTION main():
    CREATE command-line argument parser

    ADD argument "--broker"
        Default value: "localhost"
        Description: MQTT broker hostname or IP address

    ADD argument "--port"
        Default value: 1883
        Description: MQTT broker port

    ADD argument "--publish-topic"
        Default value: "smartbin/bin-01/prediction"
        Description: MQTT topic where predictions will be published

    ADD argument "--model-path"
        Default value: "models/busy_predictor.joblib"
        Description: file path of the trained machine learning model

    ADD argument "--interval"
        Default value: 60
        Description: prediction interval in seconds

    ADD argument "--bin-id"
        Default value: "bin-01"
        Description: identifier for the smart bin

    READ command-line arguments into args

    LOAD the trained model from args.model_path

    PRINT message confirming the model was loaded

    CREATE MQTT client with ID "virtual-sensor-ml"

    CONNECT client to args.broker on args.port

    START MQTT network loop in the background

    PRINT monitoring status message showing:
        publish topic
        prediction interval

    TRY:
        LOOP forever:
            CALL predict_next_hour using the loaded model
            STORE returned prediction, confidence, and next_hour

            GET the current UTC date and time

            CREATE JSON payload containing:
                prediction result
                confidence rounded to 3 decimal places
                predicted hour
                UTC prediction timestamp
                model name
                feature values used by the model:
                    day of week used for the prediction
                    predicted hour
                    weekend flag used for the prediction

            PUBLISH payload to args.publish_topic
                with QoS level 1
                retain enabled

            PRINT prediction status message showing:
                predicted hour
                prediction
                confidence percentage

            WAIT for args.interval seconds

    ON INTERRUPT (the user presses Ctrl+C):
        STOP the MQTT network loop
        DISCONNECT MQTT client
END FUNCTION


IF this file is being run directly:
    CALL main()

END PROGRAM
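
Two details of the prediction step are easy to get wrong: building the features for the hour being predicted, and picking the probability that belongs to the predicted class. The sketch below takes the weekday from the target hour, so 23:00 on a Sunday maps to Monday 00:00. `StubModel` is a hypothetical stand-in that mimics the `predict`, `predict_proba`, and `classes_` interface of a scikit-learn classifier so that the example runs without a trained model; its numbers are invented.

```python
from datetime import datetime, timedelta


def next_hour_features(now):
    """Return [day_of_week, hour, is_weekend] for the hour after `now`."""
    target = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    day_of_week = target.weekday()  # Monday = 0 ... Sunday = 6
    is_weekend = 1 if day_of_week >= 5 else 0
    return [day_of_week, target.hour, is_weekend]


def predict_with_confidence(model, features):
    """Return (predicted label, probability assigned to that label)."""
    prediction = model.predict([features])[0]
    probabilities = model.predict_proba([features])[0]
    # predict_proba columns follow the order of model.classes_.
    class_index = list(model.classes_).index(prediction)
    return prediction, float(probabilities[class_index])


class StubModel:
    """Hypothetical stand-in for a trained classifier (illustration only)."""
    classes_ = ["busy", "quiet"]

    def _is_busy(self, row):
        return 11 <= row[1] <= 14 and row[2] == 0

    def predict(self, X):
        return ["busy" if self._is_busy(row) else "quiet" for row in X]

    def predict_proba(self, X):
        return [[0.9, 0.1] if self._is_busy(row) else [0.2, 0.8] for row in X]


features = next_hour_features(datetime(2024, 1, 2, 11, 15))  # a Tuesday
print(features, predict_with_confidence(StubModel(), features))
```

With a real model loaded through `joblib.load`, the same two functions apply unchanged. If the model was trained on a dataframe, scikit-learn may warn about missing feature names when given a plain list.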

Run and test

# Train the model first (if you haven't already)
python train_model.py

# Start the ML virtual sensor
python virtual_sensor_ml.py --broker localhost --publish-topic smartbin/bin-01/prediction --interval 60

Check the output. The prediction changes based on the current time and day of the week (based on the training data you gave it). At 11 AM on a Tuesday it should predict “busy” for noon. At 11 PM on a Saturday it should predict “quiet.”

Check from the terminal:

mosquitto_sub -h localhost -t "smartbin/bin-01/prediction"

Part 3 — Compare rules vs ML

Now you have two virtual sensors running side by side on the same raw data. They take different approaches to the same problem: understanding bin usage.

Run all of them together:

# Terminal 1: producer (PIR sensor)
python producer.py --broker localhost --topic smartbin/bin-01/pir-01/events --pin 18

# Terminal 2: consumer (JSONL logger)
python consumer.py --broker localhost --topic "smartbin/bin-01/pir-01/events" --out data/events.jsonl

# Terminal 3: rule-based virtual sensor
python virtual_sensor_rules.py --broker localhost --subscribe-topic "smartbin/bin-01/pir-01/events" --publish-topic smartbin/bin-01/usage

# Terminal 4: ML virtual sensor
python virtual_sensor_ml.py --broker localhost --publish-topic smartbin/bin-01/prediction

# Terminal 5: watch both outputs
mosquitto_sub -h localhost -t "smartbin/bin-01/#"

Trigger motion events at different rates and observe both virtual sensors. Think about:

When do the rules and the ML agree? When do they disagree?
The rule-based sensor reacts to what is happening right now. The ML sensor predicts what will happen next.
The rule-based sensor needs no training data since you decided the thresholds. The ML sensor needs historical data but can discover patterns you might not have thought of.
What happens if motion patterns change (e.g., your bin is now closed every Monday)? Which sensor adapts, and which needs manual adjustment?

Why this lab is structured this way

The lecture introduced virtual sensors as software that bridges the gap between raw data and meaningful information. You have now built two of them, each using a different approach.

The rule-based sensor is simple, transparent, and immediate. You can explain exactly why it said “high” — because there were 18 events in the last 10 minutes. It needs no training data. But the rules are static. If patterns change, someone has to update the thresholds.

The ML sensor is less transparent but more adaptive. It discovered patterns in the training data (lunch is busy, weekends are quiet) without you having to spell them out. If you retrain it on new data, it adapts to new patterns. But you cannot point to a single number and say “this is why it predicted busy” — the logic is distributed across dozens of decision trees.

In a real system, you would use both. The rule-based sensor gives you an immediate, understandable view of the current state. The ML sensor gives you predictions that help with planning (should we schedule a collection run for this afternoon?). They are complementary — different tools for different questions.

The architecture is also worth noting. Both virtual sensors are independent MQTT subscribers. They do not know about each other, and neither the producer nor the consumer knows about them. You added two new capabilities to your system without changing a single line of existing code. That is the power of pub/sub decoupling combined with the virtual sensor pattern.

Report questions

Answer the following in your labs/lab09/README.md after the implementation and experiments are complete.

Rule-based virtual sensor

RQ1: What thresholds did you use for idle/low/medium/high? How did you decide on these values?
RQ2: What window size did you choose and why? What happens if you make it too short (e.g., 1 minute) or too long (e.g., 60 minutes)?
RQ3: How does the rolling window implementation (the deque) relate to what the lecture described as CEP windowed operators?
RQ4: What would you need to change if you wanted to add a new level (e.g., “critical” for bins that might overflow)?

ML virtual sensor

RQ5: What features did you use for the classifier? Why these features?
RQ6: Show the classification report from training. What is the accuracy? Which class (busy/quiet) is harder to predict?
RQ7: Why did we use a Random Forest classifier? Could you use a different model? What would change?
RQ8: The training data is synthetic. What would change if you used real motion data collected over several weeks? What patterns might emerge that the synthetic data misses?
RQ9: The model publishes a confidence score alongside the prediction. Why is this useful? What should a consumer do if confidence is low (e.g., 55%)?

Comparison

RQ10: Give one scenario where the rule-based sensor and the ML sensor disagree. Which one would you trust more in that scenario, and why?
RQ11: The rule-based sensor reacts to the present. The ML sensor predicts the future. Give one use case where each is more useful.
RQ12: If motion patterns changed tomorrow (e.g., the bin was moved to a new location), which sensor would adapt first? What would you need to do for the other?

Architecture

RQ13: You added two new processing components to your system without modifying the producer or consumer. How did the pub/sub architecture make this possible?
RQ14: Both virtual sensors publish to MQTT. Could a third virtual sensor subscribe to their output and combine them? Give an example.
RQ15: Show a screenshot with the raw motion sensor, usage intensity, and activity prediction all visible.

Reflection

RQ16: In the DIKW pyramid, where does the raw motion event sit? Where does the usage level sit? Where does the prediction sit? What moved the data up each level?
RQ17: In your own words, what is a virtual sensor? How does it differ from a physical sensor?
RQ18: If you had access to additional sensors (temperature, fill level, noise), what virtual sensor could you build by combining them? Describe the inputs, the logic, and the output.

Project hint: Smart Wastebin

Virtual sensors are how your project goes from “we can detect motion” to “we understand what is happening.”

Every virtual sensor you add follows the same pattern you built here: subscribe to raw data, process it, publish a higher-level result. The architecture scales because adding a new virtual sensor never requires changing the existing ones.

For the final project, think about which virtual sensors will make your dashboard genuinely useful. A dashboard that shows raw sensor values is a data viewer. A dashboard that shows usage levels, predictions, and alerts is a decision support tool.

What should be finished before you leave the lab

Before the end of the session you should have: implemented the rule-based virtual sensor with time-windowed event counting, trained the ML classifier on synthetic data and inspected the evaluation metrics, implemented the ML virtual sensor that publishes predictions to MQTT, added Home Assistant discovery for both virtual sensors, run all components together (producer, consumer, both virtual sensors), compared the rule-based and ML approaches, updated labs/lab09/README.md with code and report answers, and pushed to GitHub.

Final checklist (Lab 09)

virtual_sensor_rules.py implemented with time-windowed counting
Rule thresholds defined and configurable
Usage level is published to MQTT as a retained message
train_model.py generates synthetic data and trains a classifier
Classification report reviewed and included in report
Trained model saved to models/ directory
virtual_sensor_ml.py loads model and publishes predictions to MQTT
Both virtual sensors appear as Home Assistant entities
All components run together (producer + consumer + rules + ML)
labs/lab09/README.md contains code, metrics, screenshots, and report answers
Commit and push completed

Deliverables and submission

What must exist in the repository (by end of lab)

/
├── README.md
├── labs/
│   ├── lab01/
│   ├── ...
│   └── lab09/
│       ├── README.md
│       ├── requirements.txt
│       ├── producer.py
│       ├── consumer.py
│       ├── virtual_sensor_rules.py
│       ├── virtual_sensor_ml.py
│       ├── train_model.py
│       ├── models/
│       │   └── busy_predictor.joblib
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Do not include:

venv/
__pycache__/
*.pyc
output/ or *.jsonl
large temporary files unless explicitly requested

What `labs/lab09/README.md` must contain

Two clearly separated parts:

Code / runbook — include your rule thresholds, classification report, how to train and run the virtual sensors, and dashboard screenshots
Answers to report questions

Same style as previous labs.

End of lab session — GitHub checkpoint

Before leaving:

commit your progress
push to your team GitHub repository

Minimum expectation:

all deliverables tracked by Git
latest commit pushed
commit message is clear

Before next lab — eClass submission

Submit both:

Code archive (.zip)
PDF export of labs/lab09/README.md

Required PDF filename format:

lab09_REPORT_<team>.pdf

What follows is the Greek version of the same lab, translated into English

Lab 09 — Data Processing on Edge Devices

Deadlines:

End of lab session (GitHub checkpoint): commit & push your progress to your team repository.
Before next lab (eClass submission): upload (1) a .zip with your code and (2) a PDF export of labs/lab09/README.md.

Intro to what you need to do

Your PIR sensor produces events like “detected” or “clear”, one at a time. That is useful, but it does not answer the questions a facilities manager actually cares about: “Is this bin being heavily used right now?” or “Will this area be busy tomorrow morning?”

The raw sensor cannot answer these questions. But software that watches the stream of events and reasons about them can. This is what we called a virtual sensor: a software component that consumes raw data and produces higher-level, more meaningful information than the physical sensor can provide on its own.

In this lab you will build two virtual sensors on top of your motion pipeline:

A rule-based virtual sensor that watches the stream of motion events, counts them in time windows, and publishes a “bin usage intensity” level (idle, low, medium, or high). This is event processing with simple rules, without ML.
An ML-based virtual sensor that looks at historical motion patterns and predicts whether the next hour will be busy or quiet. It uses a trained classifier on the same kind of motion data, but now the logic is learned from data instead of hand-written.

Both virtual sensors subscribe to the same MQTT topic as your consumer. Both publish their results back to MQTT as new topics.

                                    ┌──────────────────────┐
                               ┌───▶│  consumer (JSONL)    │
                               │    └──────────────────────┘
                               │
  PIR ──▶ producer ──▶ MQTT ───┤    ┌──────────────────────┐
                               ├───▶│  virtual sensor       │
                               │    │  (rules: usage level) │──▶ MQTT ──▶ HA
                               │    └──────────────────────┘
                               │
                               │    ┌──────────────────────┐
                               └───▶│  virtual sensor       │
                                    │  (ML: busy predictor) │──▶ MQTT ──▶ HA
                                    └──────────────────────┘

Create the following structure:

/
├── README.md
├── labs/
│   ├── lab01/
│   ├── ...
│   └── lab09/
│       ├── README.md
│       ├── requirements.txt
│       ├── producer.py
│       ├── consumer.py
│       ├── virtual_sensor_rules.py
│       ├── virtual_sensor_ml.py
│       ├── train_model.py
│       ├── models/
│       │   └── (trained model file goes here)
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Copy your producer.py, consumer.py, and pirlib/ from the previous lab. The new files are virtual_sensor_rules.py, virtual_sensor_ml.py, and train_model.py.

Part 1 — Rule-based virtual sensor: bin usage intensity

Your first virtual sensor subscribes to the motion event stream and derives a higher-level metric: how intensely the bin is being used right now. The raw sensor says “motion detected.” The virtual sensor says “this bin is experiencing high activity.”

The logic is straightforward: count motion events in a rolling time window and map the count to a level.

Design the rules

Define thresholds for usage intensity. For example:

Events in last 10 minutes	Usage level
0	idle
1–5	low
6–15	medium
16+	high

These numbers are only an example; tune them as you see fit. The point is that the thresholds are explicit, hand-written rules. You decide what “high” means.

Implement it

Create virtual_sensor_rules.py. This script:

Connects to the MQTT broker
Subscribes to your motion events topic
Maintains a time-windowed count of events
Every N seconds, evaluates the rules and publishes the usage level to a dedicated output topic (the same topic every time, not a new one per update)
Also publishes the event count and window size as attributes

Here is a pseudocode skeleton. You can follow it or do it your own way; what matters is that the script does all of the above:

START PROGRAM

IMPORT required libraries:
    MQTT client library
    JSON library
    Time library
    Argument parser
    Date and time utilities
    Queue structure
    Lock for thread safety

CREATE empty queue event_times
CREATE lock event_lock


FUNCTION on_message(client, userdata, message):
    TRY:
        Decode incoming MQTT message payload
        Convert payload from JSON text into data object
        # Remember the structure of the message your producer sends.
        # Is it JSON, as assumed here, or plain text? For plain text, use instead:
        #   SET payload_text to decoded message payload
        #   SET payload_text to payload_text with whitespace removed from both ends
        #   IF payload_text equals "detected":

        IF payload field "hasSimpleResult" equals "detected":
            LOCK event_lock
                Add current UTC time to event_times
            UNLOCK event_lock

    ON ERROR (payload is not valid JSON OR required data is missing):
        Ignore the message
END FUNCTION


FUNCTION evaluate_usage(window_minutes = 10):
    SET cutoff_time to current UTC time minus window_minutes

    LOCK event_lock
        WHILE event_times is not empty AND oldest event is older than cutoff_time:
            Remove oldest event from event_times

        SET count to number of remaining events in event_times
    UNLOCK event_lock

    IF count equals 0:
        RETURN "idle" and count

    ELSE IF count is less than or equal to 5:
        RETURN "low" and count

    ELSE IF count is less than or equal to 15:
        RETURN "medium" and count

    ELSE:
        RETURN "high" and count
END FUNCTION


FUNCTION main():
    CREATE command-line argument parser

    ADD argument "--broker"
        Default value: "localhost"

    ADD argument "--port"
        Default value: 1883

    ADD argument "--subscribe-topic"
        Default value: "smartbin/bin-01/pir-01/events"

    ADD argument "--publish-topic"
        Default value: "smartbin/bin-01/usage"

    ADD argument "--window"
        Default value: 10
        Description: usage evaluation window in minutes

    ADD argument "--interval"
        Default value: 30
        Description: time between evaluations in seconds

    READ command-line arguments into args

    CREATE MQTT client with ID "virtual-sensor-rules"

    SET client's message handler to on_message

    CONNECT client to args.broker on args.port

    SUBSCRIBE to args.subscribe_topic with QoS level 1

    START MQTT network loop in background

    PRINT monitoring status message showing:
        subscribe topic
        window size
        evaluation interval

    TRY:
        LOOP forever:
            CALL evaluate_usage using args.window
            STORE returned usage level and event count

            CREATE JSON payload containing:
                usage level
                event count
                window size in minutes
                current UTC evaluation timestamp

            PUBLISH payload to args.publish_topic
                with QoS level 1
                retain enabled

            PRINT usage status message

            WAIT for args.interval seconds

    ON user interrupt (Ctrl+C):
        STOP MQTT network loop
        DISCONNECT MQTT client
END FUNCTION


IF this file is being run directly:
    CALL main()

END PROGRAM
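The comment inside on_message asks whether your producer sends JSON or plain text. As a minimal sketch, the helper below accepts both forms, so the handler only has to ask "was this a detection?". The function name is_detection is hypothetical; the field name "hasSimpleResult" and the value "detected" come from the skeleton above, so adjust them if your message format differs.

```python
import json


def is_detection(raw: bytes) -> bool:
    """Return True if an MQTT payload represents a 'detected' motion event."""
    text = raw.decode("utf-8", errors="replace").strip()

    # Plain-text producers send just the word.
    if text == "detected":
        return True

    # Otherwise try JSON; anything malformed is ignored.
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return False

    # Valid JSON that is not an object (e.g. a list or number) is also ignored.
    return isinstance(payload, dict) and payload.get("hasSimpleResult") == "detected"


if __name__ == "__main__":
    print(is_detection(b'{"hasSimpleResult": "detected"}'))
    print(is_detection(b"clear"))
```

Keeping the parsing in a pure function like this makes it easy to test without a broker.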

A few things to notice:

The deque (the queue in the pseudocode) acts as a rolling window: old events are removed once they fall outside the window. This is a simple version of what CEP engines do with time-based windows.
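The rolling window can be sketched in a few lines with collections.deque. This is one possible shape, not the required one: the class name RollingEventWindow and its method names are assumptions, and it assumes events are added in time order (true when you stamp them on arrival). The optional `when`/`now` parameters exist only so the window can be exercised without waiting.

```python
import threading
from collections import deque
from datetime import datetime, timedelta, timezone


class RollingEventWindow:
    """Thread-safe count of events seen in the last `window_minutes`."""

    def __init__(self, window_minutes: float = 10):
        self.window = timedelta(minutes=window_minutes)
        self._times = deque()
        self._lock = threading.Lock()  # on_message runs in the MQTT thread

    def add(self, when: datetime = None) -> None:
        """Record one event (defaults to the current UTC time)."""
        with self._lock:
            self._times.append(when or datetime.now(timezone.utc))

    def count(self, now: datetime = None) -> int:
        """Drop events older than the window and return how many remain."""
        now = now or datetime.now(timezone.utc)
        cutoff = now - self.window
        with self._lock:
            while self._times and self._times[0] < cutoff:
                self._times.popleft()
            return len(self._times)


if __name__ == "__main__":
    w = RollingEventWindow(window_minutes=10)
    w.add()
    print(w.count())
```

Because old timestamps are always at the left end, eviction is a cheap popleft loop instead of a scan of the whole queue.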

The virtual sensor is a completely separate process from the producer and the consumer. It subscribes to the same topic, processes the data differently, and publishes to its own topic. This is the pub/sub pattern in action: you can add new processing without changing anything that already exists.

Add Home Assistant discovery

Add discovery messages so that the usage level appears as a Home Assistant entity. Use the same pattern as in Lab 07.
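As a reminder of the general shape, a discovery message is a retained JSON config published under the discovery prefix. The sketch below only builds the topic and payload; it does not reproduce your Lab 07 code. Everything specific in it is an assumption: the helper name build_discovery, the object id, the default "homeassistant" prefix, and the JSON key usage_level in the value template (use whatever key your usage payload actually contains).

```python
import json


def build_discovery(bin_id: str = "bin-01",
                    state_topic: str = "smartbin/bin-01/usage",
                    discovery_prefix: str = "homeassistant"):
    """Return (topic, payload) for a Home Assistant MQTT sensor config."""
    object_id = f"{bin_id.replace('-', '_')}_usage_intensity"
    topic = f"{discovery_prefix}/sensor/{object_id}/config"
    config = {
        "name": "Bin Usage Intensity",
        "unique_id": object_id,
        "state_topic": state_topic,
        # Assumes the state payload is JSON with a "usage_level" key.
        "value_template": "{{ value_json.usage_level }}",
        # Expose the remaining JSON fields (count, window) as attributes.
        "json_attributes_topic": state_topic,
    }
    return topic, json.dumps(config)


if __name__ == "__main__":
    topic, payload = build_discovery()
    print(topic)
    print(payload)
```

You would publish the returned payload to the returned topic once at startup, with retain enabled, before the first usage update.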

Run and test

Start the broker, the producer, and your virtual sensor:

# Terminal 1: producer
python producer.py --broker localhost --topic smartbin/bin-01/pir-01/events --pin 18

# Terminal 2: virtual sensor
python virtual_sensor_rules.py --broker localhost --subscribe-topic "smartbin/bin-01/pir-01/events" --publish-topic smartbin/bin-01/usage --window 10 --interval 30

Trigger motion events at different rates and watch the usage level change. You should see a new entity, “Bin Usage Intensity”, showing idle, low, medium, or high.

Also try subscribing from the terminal:

mosquitto_sub -h localhost -t "smartbin/bin-01/usage"

You should see periodic JSON updates containing the usage level and the event count.
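For orientation, one possible shape of such an update is sketched below. The key names (usage_level, event_count, window_minutes, evaluated_at) and the helper name build_usage_payload are assumptions; the lab only requires that the level, the count, the window size, and a UTC timestamp are present.

```python
import json
from datetime import datetime, timezone


def build_usage_payload(level: str, count: int, window_minutes: int,
                        now: datetime = None) -> str:
    """Serialize one usage evaluation as a JSON string."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "usage_level": level,
        "event_count": count,
        "window_minutes": window_minutes,
        "evaluated_at": now.isoformat(),  # ISO 8601 with UTC offset
    })


if __name__ == "__main__":
    print(build_usage_payload("medium", 12, 10))
```

Publishing this string with retain enabled means a late subscriber immediately receives the most recent level instead of waiting for the next evaluation.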

Part 2 - ML-based virtual sensor: busy period predictor

Your second virtual sensor uses machine learning instead of hand-written rules. It looks at historical patterns and predicts whether the upcoming period will be busy or quiet.

The idea: motion events tend to follow patterns. A bin near a cafeteria is busy at lunchtime. A bin in a lecture hall is busy between classes. With enough historical data, a classifier can learn these patterns and predict future activity.

Generate training data

You probably do not have weeks of real sensor data (and the bin is not actually in use). For this lab you will generate synthetic training data that mimics realistic patterns (feel free to change these patterns as you see fit). Create train_model.py:

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import joblib
import os


START PROGRAM

FUNCTION generate_training_data(days = 30, seed = 42):
    CREATE random number generator using seed
    CREATE empty list rows

    FOR each day from 0 to days - 1:
        SET day_of_week to day modulo 7
            # 0 = Monday, 6 = Sunday

        FOR each hour from 0 to 23:

            IF day_of_week is Saturday or Sunday:
                SET base_rate to 2
                # Weekends have lower activity

            ELSE IF hour is between 8 and 10 inclusive:
                SET base_rate to 15
                # Morning rush

            ELSE IF hour is between 11 and 14 inclusive:
                SET base_rate to 25
                # Lunch break

            ELSE IF hour is between 15 and 17 inclusive:
                SET base_rate to 12
                # Afternoon activity

            ELSE IF hour is between 18 and 20 inclusive:
                SET base_rate to 8
                # Evening activity

            ELSE:
                SET base_rate to 1
                # Night or early morning

            GENERATE event_count using normal distribution:
                mean = base_rate
                standard deviation = base_rate * 0.3

            CONVERT event_count to integer

            IF event_count is less than 0:
                SET event_count to 0

            IF event_count is greater than 10:
                SET label to "busy"
            ELSE:
                SET label to "quiet"

            ADD row to rows containing:
                day_of_week
                hour
                is_weekend:
                    1 if day_of_week is Saturday or Sunday
                    0 otherwise
                event_count
                label

    CONVERT rows into dataframe

    RETURN dataframe
END FUNCTION


FUNCTION train_and_save(output_dir = "models"):
    CREATE output_dir if it does not already exist

    CALL generate_training_data()
    STORE returned dataframe as df

    SELECT input features X from df:
        day_of_week
        hour
        is_weekend

    SELECT target labels y from df:
        label

    SPLIT X and y into training and testing sets:
        80% for training
        20% for testing
        random seed = 42

    CREATE Random Forest classifier:
        number of trees = 50
        random seed = 42

    TRAIN classifier using X_train and y_train

    USE trained classifier to predict labels for X_test
    STORE predictions as y_pred

    PRINT "Model evaluation:"
    PRINT classification report comparing y_test and y_pred

    SET model_path to output_dir joined with file name "busy_predictor.joblib"
        # Use a path join, not plain string concatenation

    SAVE trained model to model_path

    PRINT location where model was saved

    RETURN trained model
END FUNCTION


IF this file is being run directly:
    CALL train_and_save()

END PROGRAM

Run it:

python train_model.py

This generates 30 days of synthetic hourly data with realistic patterns (busy at lunchtime, quiet at night, less activity on weekends), trains a Random Forest classifier, prints evaluation metrics, and saves the model.
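The core of the generator is the mapping from (day, hour) to a base rate, plus noise and a label. The sketch below shows that step for a single hour. To stay self-contained it uses the standard-library random module instead of NumPy, which is a substitution for illustration; the function names base_rate and sample_hour are likewise assumptions.

```python
import random


def base_rate(day_of_week: int, hour: int) -> int:
    """Expected events per hour; day_of_week uses 0 = Monday, 6 = Sunday."""
    if day_of_week >= 5:       # weekend
        return 2
    if 8 <= hour <= 10:        # morning rush
        return 15
    if 11 <= hour <= 14:       # lunch break
        return 25
    if 15 <= hour <= 17:       # afternoon
        return 12
    if 18 <= hour <= 20:       # evening
        return 8
    return 1                   # night or early morning


def sample_hour(day_of_week: int, hour: int, rng: random.Random) -> dict:
    """Draw one synthetic training row for the given day and hour."""
    rate = base_rate(day_of_week, hour)
    # Normal noise around the base rate, truncated to a non-negative integer.
    count = max(0, int(rng.gauss(rate, rate * 0.3)))
    return {
        "day_of_week": day_of_week,
        "hour": hour,
        "is_weekend": int(day_of_week >= 5),
        "event_count": count,
        "label": "busy" if count > 10 else "quiet",
    }


if __name__ == "__main__":
    rng = random.Random(42)
    print(sample_hour(1, 12, rng))   # a Tuesday at noon
```

Note that the label is derived from event_count, which is why event_count itself is not used as an input feature: the model has to predict the label from time information alone.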

Look at the classification report. What is the accuracy? What are the precision and recall for “busy” versus “quiet”? These metrics tell you how well the model separates busy periods from quiet ones. Think about which features matter most: is it the hour of the day? The day of the week?
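If precision and recall feel abstract, it can help to compute them by hand once. The sketch below does so for a tiny made-up list of labels (toy values for illustration, not results from the lab model); the helper name precision_recall is an assumption. The classification report shows the same quantities per class.

```python
def precision_recall(y_true, y_pred, positive="busy"):
    """Precision and recall for one class, computed from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of everything predicted positive, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy labels: the "model" misses two of the four busy hours.
    y_true = ["busy", "busy", "busy", "busy", "quiet", "quiet"]
    y_pred = ["busy", "busy", "quiet", "quiet", "quiet", "quiet"]
    print(precision_recall(y_true, y_pred, "busy"))
    print(precision_recall(y_true, y_pred, "quiet"))
```

In this toy case every “busy” prediction is correct (high precision) but half of the busy hours are missed (low recall), and the reverse holds for “quiet”. That asymmetry is what to look for in your own report.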

Build the ML virtual sensor

Create virtual_sensor_ml.py. This script loads the trained model and periodically predicts whether the next hour will be busy or quiet:

import paho.mqtt.client as mqtt
import json
import time
import argparse
import joblib
import numpy as np
from datetime import datetime


FUNCTION load_model(path):
    Load trained classifier from given file path
    RETURN loaded model
END FUNCTION


FUNCTION predict_next_hour(model):
    Get current local date and time

    SET target_time to current time plus one hour
    SET next_hour to hour of target_time
        # If the current hour is 23, next_hour is 0 and target_time falls on the next day

    SET day_of_week to weekday number of target_time
        Monday = 0
        Tuesday = 1
        Wednesday = 2
        Thursday = 3
        Friday = 4
        Saturday = 5
        Sunday = 6

    IF day_of_week is Saturday or Sunday:
        SET is_weekend to 1
    ELSE:
        SET is_weekend to 0

    CREATE feature array containing:
        day_of_week
        next_hour
        is_weekend

    USE trained model to predict whether next hour will be busy or quiet

    USE trained model to calculate class probabilities for same features

    FIND probability corresponding to predicted class

    SET confidence to that probability

    RETURN prediction, confidence, and next_hour
END FUNCTION


FUNCTION main():
    CREATE command-line argument parser

    ADD argument "--broker"
        Default value: "localhost"
        Description: MQTT broker hostname or IP

    ADD argument "--port"
        Default value: 1883
        Description: MQTT broker port

    ADD argument "--publish-topic"
        Default value: "smartbin/bin-01/prediction"
        Description: MQTT topic where predictions are published

    ADD argument "--model-path"
        Default value: "models/busy_predictor.joblib"
        Description: file path of the trained machine learning model

    ADD argument "--interval"
        Default value: 60
        Description: prediction interval in seconds

    ADD argument "--bin-id"
        Default value: "bin-01"
        Description: smart bin identifier

    READ command-line arguments into args

    LOAD trained model from args.model_path

    PRINT message confirming model was loaded

    CREATE MQTT client with ID "virtual-sensor-ml"

    CONNECT client to args.broker on args.port

    START MQTT network loop in background

    PRINT monitoring status message showing:
        publish topic
        prediction interval

    TRY:
        LOOP forever:
            CALL predict_next_hour using loaded model
            STORE returned prediction, confidence, and next_hour

            GET current UTC date and time

            CREATE JSON payload containing:
                prediction result
                confidence rounded to 3 decimal places
                predicted hour
                UTC prediction timestamp
                model name
                feature values used by model:
                    day of week used for the prediction
                    predicted hour
                    whether that day is a weekend

            PUBLISH payload to args.publish_topic
                with QoS level 1
                retain enabled

            PRINT prediction status message showing:
                predicted hour
                prediction
                confidence percentage

            WAIT for args.interval seconds

    ON user interrupt (Ctrl+C):
        STOP MQTT network loop
        DISCONNECT MQTT client
END FUNCTION


IF this file is being run directly:
    CALL main()

END PROGRAM

Run and test

# Train the model first (if you have not already)
python train_model.py

# Start the ML virtual sensor
python virtual_sensor_ml.py --broker localhost --publish-topic smartbin/bin-01/prediction --interval 60

Check the output. The prediction changes with the current hour and day of the week, reflecting the training data you gave the model. At 11 a.m. on a Tuesday it should predict “busy” for the noon hour. At 11 p.m. on a Saturday it should predict “quiet”.
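One detail worth checking in your implementation is the feature vector around midnight: at 11 p.m. the "next hour" is hour 0 of the following day, so the weekday should be taken from the target hour rather than from the current time. A minimal sketch of that computation follows; the function name next_hour_features is an assumption, and the feature order (day_of_week, hour, is_weekend) follows the training script.

```python
from datetime import datetime, timedelta


def next_hour_features(now: datetime):
    """Return (day_of_week, hour, is_weekend) for the hour after `now`."""
    target = now + timedelta(hours=1)   # handles the 23 -> 0 rollover
    day_of_week = target.weekday()      # Monday = 0 ... Sunday = 6
    is_weekend = 1 if day_of_week >= 5 else 0
    return day_of_week, target.hour, is_weekend


if __name__ == "__main__":
    # Tuesday 11:00 -> features describe Tuesday 12:00.
    print(next_hour_features(datetime(2024, 1, 2, 11, 0)))
    # Saturday 23:00 -> features describe Sunday 00:00.
    print(next_hour_features(datetime(2024, 1, 6, 23, 0)))
```

Passing `now` in as a parameter, instead of calling datetime.now() inside the function, lets you try arbitrary times without waiting for them.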

Check from the terminal:

mosquitto_sub -h localhost -t "smartbin/bin-01/prediction"

Part 3 - Compare rules vs ML

You now have two virtual sensors running in parallel on the same raw data. They take different approaches to the same problem: understanding bin usage.

Run everything together:

# Terminal 1: producer (PIR sensor)
python producer.py --broker localhost --topic smartbin/bin-01/pir-01/events --pin 18

# Terminal 2: consumer (JSONL logger)
python consumer.py --broker localhost --topic "smartbin/bin-01/pir-01/events" --out data/events.jsonl

# Terminal 3: rule-based virtual sensor
python virtual_sensor_rules.py --broker localhost --subscribe-topic "smartbin/bin-01/pir-01/events" --publish-topic smartbin/bin-01/usage

# Terminal 4: ML virtual sensor
python virtual_sensor_ml.py --broker localhost --publish-topic smartbin/bin-01/prediction

# Terminal 5: watch both outputs
mosquitto_sub -h localhost -t "smartbin/bin-01/#"

Trigger motion events at different rates and observe both virtual sensors. Think about:

When do the rules and the ML model agree? When do they disagree?
The rule-based sensor reacts to what is happening now. The ML sensor predicts what will happen next.
The rule-based sensor needs no training data; you chose the thresholds yourself. The ML sensor needs historical data but can discover patterns you did not think of.
What happens if the motion patterns change (e.g., the bin is now closed every Monday)? Which sensor adapts, and which needs manual adjustment?

Why this lab is structured this way

The lecture introduced virtual sensors as software that bridges the gap between raw data and meaningful information. You have now built two of them, each using a different approach.

The rule-based sensor is simple, transparent, and immediate. You can explain exactly why it said “high”: there were 18 events in the last 10 minutes. It needs no training data. But the rules are static. If the patterns change, someone has to update the thresholds.

The ML sensor is less transparent but more adaptable. It can discover patterns in the training data (lunchtime is busy, weekends are quiet) without you having to define them. If you retrain it on new data, it adapts to new patterns. But you cannot point to a single number and say “this is why it predicted busy”, because the logic is spread across dozens of decision trees.

In a real system you would use both. The rule-based sensor gives you an immediate, understandable picture of the current state. The ML sensor gives you predictions that help with planning (should we schedule a pickup for this afternoon?). They are complementary: different tools for different questions.

The architecture is also worth noting. Both virtual sensors are independent MQTT subscribers. They do not know about each other, and neither the producer nor the consumer knows about them. You added two new capabilities to your system without changing a single line of existing code. That is the power of pub/sub decoupling combined with the virtual sensor pattern.

Report questions

Answer the following in your labs/lab09/README.md once the implementation and experiments are complete.

Rule-based virtual sensor

RQ1: Which thresholds did you use for idle/low/medium/high? How did you decide on these values?
RQ2: Which window size did you choose, and why? What happens if you make it very small (e.g. 1 minute) or very large (e.g. 60 minutes)?
RQ3: How does the rolling window implementation (the deque) relate to what the lecture described as CEP windowed operators?
RQ4: What would you need to change to add a new level (e.g. “critical” for bins that may overflow)?

ML virtual sensor

RQ5: Which features did you use for the classifier? Why these features?
RQ6: Show the classification report from training. What is the accuracy? Which class (busy/quiet) is harder to predict?
RQ7: Why did we use a Random Forest classifier? Could you use a different model? What would change?
RQ8: The training data is synthetic. What would change if you used real motion data collected over several weeks? Which patterns might emerge that the synthetic data misses?
RQ9: The model publishes a confidence score along with the prediction. Why is this useful? What should a consumer do if the confidence is low (e.g. 55%)?

Comparison

RQ10: Give a scenario in which the rule-based sensor and the ML sensor disagree. Which one would you trust more in that scenario, and why?
RQ11: The rule-based sensor reacts to the present. The ML sensor predicts the future. Give one use case where each is the more useful one.
RQ12: If the motion patterns changed tomorrow (e.g., the bin was moved to a new location), which sensor would adapt first? What would you need to do for the other one?

Architecture

RQ13: You added two new processing components to your system without modifying the producer or the consumer. How did the pub/sub architecture make this possible?
RQ14: Both virtual sensors publish to MQTT. Could a third virtual sensor subscribe to their outputs and combine them? Give an example.
RQ15: Show a screenshot with the raw motion sensor, the usage intensity, and the activity prediction all visible.

Reflection

RQ16: In the DIKW pyramid, where does the raw motion event sit? Where does the usage level sit? Where does the prediction sit? What moved the data up at each level?
RQ17: In your own words, what is a virtual sensor? How does it differ from a physical sensor?
RQ18: If you had access to additional sensors (temperature, fill level, noise), which virtual sensor could you build by combining them? Describe the inputs, the logic, and the output.

Project hint: Smart Wastebin

Virtual sensors are how your project goes from “we can detect motion” to “we understand what is happening.”

Each one follows the same pattern you built here: subscribe to raw data, process it, publish a higher-level result. The architecture scales because adding a new virtual sensor never requires changing the existing ones.

For the final project, think about which virtual sensors will make your dashboard genuinely useful. A dashboard that shows raw sensor values is a data viewer. A dashboard that shows usage levels, predictions, and alerts is a decision-support tool.

What should be finished before you leave the lab

Before the end of the session you should have: implemented the rule-based virtual sensor with time-windowed event counting, trained the ML classifier on synthetic data and inspected the evaluation metrics, implemented the ML virtual sensor that publishes predictions to MQTT, added Home Assistant discovery for both virtual sensors, run all components together (producer, consumer, and both virtual sensors), compared the rule-based and ML approaches, updated labs/lab09/README.md with code and report answers, and pushed to GitHub.

Final checklist (Lab 09)

virtual_sensor_rules.py implemented with time-windowed counting
Rule thresholds defined and configurable
Usage level published to MQTT as a retained message
train_model.py generates synthetic data and trains a classifier
Classification report reviewed and included in the report
Trained model saved in the models/ directory
virtual_sensor_ml.py loads the model and publishes predictions to MQTT
Both virtual sensors appear as Home Assistant entities
All components run together (producer + consumer + rules + ML)
labs/lab09/README.md contains code, metrics, screenshots, and report answers
Commit and push completed

Deliverables and submission

What must exist in the repository (by end of lab)

/
├── README.md
├── labs/
│   ├── lab01/
│   ├── ...
│   └── lab09/
│       ├── README.md
│       ├── requirements.txt
│       ├── producer.py
│       ├── consumer.py
│       ├── virtual_sensor_rules.py
│       ├── virtual_sensor_ml.py
│       ├── train_model.py
│       ├── models/
│       │   └── busy_predictor.joblib
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Do not include:

venv/
__pycache__/
*.pyc
output/ or *.jsonl
large temporary files unless explicitly requested

What `labs/lab09/README.md` must contain

Two clearly separated parts:

Code / runbook: include your rule thresholds, the classification report, how to train and run the virtual sensors, and dashboard screenshots
Answers to the report questions

Same style as in previous labs.

End of lab session - GitHub checkpoint

Before you leave:

commit your progress
push to your team's GitHub repo

Minimum expectation:

all deliverables are tracked by Git
the latest commit has been pushed
the commit message is clear

Before next lab - eClass submission

Submit both:

Code archive (.zip)
PDF export of labs/lab09/README.md

Required PDF file name format:

lab09_REPORT_<team>.pdf
