Part 311 of 365

📘 FP Project: Data Processing Pipeline

Master building a functional data processing pipeline in Python with practical examples, best practices, and real-world applications 🚀

💎 Advanced
25 min read

Prerequisites

  • Basic understanding of programming concepts 📝
  • Python installation (3.8+) 🐍
  • VS Code or preferred IDE 💻

What you'll learn

  • Understand the fundamentals of functional data pipelines 🎯
  • Apply functional pipelines in real projects 🏗️
  • Debug common issues 🐛
  • Write clean, Pythonic code ✨

🎯 Introduction

Welcome to this exciting tutorial on building a functional data processing pipeline in Python! 🎉 In this guide, we'll explore how to create powerful, composable data transformation pipelines using functional programming principles.

You'll discover how functional programming can transform your data processing tasks into elegant, maintainable pipelines. Whether you're analyzing logs 📊, transforming datasets 📈, or building ETL systems 🏗️, understanding functional pipelines is essential for writing robust, scalable code.

By the end of this tutorial, you'll feel confident building your own data processing pipelines using functional programming techniques! Let's dive in! 🏊‍♂️

📚 Understanding Functional Data Pipelines

🤔 What is a Functional Data Pipeline?

A functional data pipeline is like a factory assembly line 🏭. Think of it as a series of stations where each station performs one specific transformation on your data, passing the result to the next station.

In Python terms, it's a chain of pure functions that transform data step by step. This means you can:

  • ✨ Build complex transformations from simple functions
  • 🚀 Process data efficiently with lazy evaluation
  • 🛡️ Create predictable, testable data flows

💡 Why Use Functional Pipelines?

Here's why developers love functional pipelines:

  1. Composability 🧩: Build complex operations from simple pieces
  2. Reusability ♻️: Use the same transformations in multiple pipelines
  3. Testability 🧪: Test each function independently
  4. Clarity 🔍: Read pipelines like a recipe

Real-world example: Imagine processing web server logs 📋. With functional pipelines, you can filter errors, extract timestamps, aggregate by hour, and generate reports - all with composable functions!
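
Here's a rough sketch of that log workflow (the log format and field positions are made up for illustration), built from nothing but filter, map, and Counter:

# 📋 Sketch: composable log analysis on a hypothetical log format
from collections import Counter

logs = [
    "2024-01-15 10:02:11 ERROR disk full",
    "2024-01-15 10:17:45 INFO request ok",
    "2024-01-15 11:03:02 ERROR timeout",
]

def is_error(line: str) -> bool:
    return " ERROR " in line

def extract_hour(line: str) -> str:
    return line.split()[1][:2]  # "10", "11", ...

errors_by_hour = Counter(map(extract_hour, filter(is_error, logs)))
print(errors_by_hour)  # Counter({'10': 1, '11': 1})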

🔧 Basic Syntax and Usage

📝 Simple Pipeline Example

Let's start with a friendly example:

# ๐Ÿ‘‹ Hello, Functional Pipeline!
from functools import reduce
from typing import List, Callable, Any

# ๐ŸŽจ Create simple transformation functions
def add_exclamation(text: str) -> str:
    """Add excitement to text! ๐ŸŽ‰"""
    return f"{text}!"

def uppercase(text: str) -> str:
    """Make text LOUD! ๐Ÿ“ข"""
    return text.upper()

def add_emoji(text: str) -> str:
    """Add some personality! ๐Ÿ˜Š"""
    return f"{text} ๐Ÿš€"

# ๐Ÿ”„ Compose functions into a pipeline
def pipeline(*functions: Callable) -> Callable:
    """Create a pipeline from functions"""
    def pipe(data: Any) -> Any:
        return reduce(lambda result, func: func(result), functions, data)
    return pipe

# ๐ŸŽฎ Let's use it!
text_pipeline = pipeline(
    add_exclamation,
    uppercase,
    add_emoji
)

result = text_pipeline("hello world")
print(result)  # HELLO WORLD! ๐Ÿš€

💡 Explanation: Notice how we chain functions together! Each function takes the output of the previous one, creating a smooth data flow.

🎯 Common Pipeline Patterns

Here are patterns you'll use daily:

# ๐Ÿ—๏ธ Pattern 1: Filter-Map-Reduce
from typing import Iterable, TypeVar, Optional

T = TypeVar('T')

class Pipeline:
    """Fluent pipeline builder ๐Ÿ—๏ธ"""
    
    def __init__(self, data: Iterable[T]):
        self.data = data
    
    def filter(self, predicate: Callable[[T], bool]) -> 'Pipeline':
        """Filter items that match condition ๐Ÿ”"""
        self.data = filter(predicate, self.data)
        return self
    
    def map(self, transform: Callable[[T], Any]) -> 'Pipeline':
        """Transform each item ๐ŸŽจ"""
        self.data = map(transform, self.data)
        return self
    
    def reduce(self, reducer: Callable[[Any, T], Any], initial: Any) -> Any:
        """Combine all items ๐Ÿ”„"""
        return reduce(reducer, self.data, initial)
    
    def collect(self) -> List[Any]:
        """Get final results ๐Ÿ“ฆ"""
        return list(self.data)

# ๐ŸŽจ Pattern 2: Lazy evaluation with generators
def read_large_file(filename: str):
    """Read file line by line (memory efficient!) ๐Ÿ’พ"""
    with open(filename, 'r') as f:
        for line in f:
            yield line.strip()

# ๐Ÿ”„ Pattern 3: Function composition
def compose(*functions):
    """Compose functions right to left ๐Ÿ”—"""
    def inner(data):
        return reduce(lambda x, f: f(x), reversed(functions), data)
    return inner
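
These patterns stay lazy until you consume them, so here's a quick usage sketch (assuming the Pipeline class and compose function defined above):

# 🎮 Quick usage sketch for the patterns above
numbers = [1, 2, 3, 4, 5, 6]

# Filter-map-reduce with the fluent Pipeline
total_of_even_squares = (
    Pipeline(numbers)
    .filter(lambda n: n % 2 == 0)   # keep 2, 4, 6
    .map(lambda n: n * n)           # 4, 16, 36
    .reduce(lambda acc, n: acc + n, 0)
)
print(total_of_even_squares)  # 56

# Function composition, applied right to left
shout = compose(str.upper, str.strip)  # strip first, then upper
print(shout("  hello  "))  # HELLO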

💡 Practical Examples

🛒 Example 1: E-commerce Sales Analytics

Let's build a real data processing pipeline:

# ๐Ÿ›๏ธ Define our data structures
from dataclasses import dataclass
from datetime import datetime
from typing import List, Dict
import json

@dataclass
class Sale:
    """Single sale record ๐Ÿ’ฐ"""
    id: str
    product: str
    amount: float
    category: str
    timestamp: datetime
    emoji: str  # Every product needs an emoji!

class SalesAnalysisPipeline:
    """Process sales data functionally ๐Ÿ“Š"""
    
    def __init__(self, sales: List[Sale]):
        self.pipeline = Pipeline(sales)
    
    # ๐Ÿ” Filter functions
    @staticmethod
    def is_high_value(sale: Sale) -> bool:
        """Find big sales! ๐Ÿ’Ž"""
        return sale.amount > 100
    
    @staticmethod
    def is_category(category: str):
        """Filter by category ๐Ÿท๏ธ"""
        return lambda sale: sale.category == category
    
    # ๐ŸŽจ Transform functions
    @staticmethod
    def to_summary(sale: Sale) -> Dict:
        """Create sale summary ๐Ÿ“‹"""
        return {
            'product': f"{sale.emoji} {sale.product}",
            'amount': sale.amount,
            'date': sale.timestamp.strftime('%Y-%m-%d')
        }
    
    @staticmethod
    def add_tax(tax_rate: float):
        """Add tax calculation ๐Ÿ’ธ"""
        return lambda sale: Sale(
            **{**sale.__dict__, 'amount': sale.amount * (1 + tax_rate)}
        )
    
    # ๐Ÿ“Š Aggregation functions
    def analyze_by_category(self) -> Dict[str, float]:
        """Group sales by category ๐Ÿ“ˆ"""
        from itertools import groupby
        from operator import attrgetter
        
        # Sort by category first
        sorted_sales = sorted(self.pipeline.data, key=attrgetter('category'))
        
        result = {}
        for category, group in groupby(sorted_sales, key=attrgetter('category')):
            total = sum(sale.amount for sale in group)
            result[category] = total
            print(f"๐Ÿ“ฆ {category}: ${total:,.2f}")
        
        return result
    
    def top_products(self, n: int = 5) -> List[Dict]:
        """Find best sellers ๐Ÿ†"""
        product_sales = {}
        
        for sale in self.pipeline.data:
            key = f"{sale.emoji} {sale.product}"
            product_sales[key] = product_sales.get(key, 0) + sale.amount
        
        # Sort and get top N
        top = sorted(product_sales.items(), key=lambda x: x[1], reverse=True)[:n]
        
        print("๐Ÿ† Top Products:")
        for i, (product, total) in enumerate(top, 1):
            print(f"  {i}. {product}: ${total:,.2f}")
        
        return [{'product': p, 'total': t} for p, t in top]

# ๐ŸŽฎ Let's use it!
sales_data = [
    Sale("1", "Laptop", 1200, "Electronics", datetime.now(), "๐Ÿ’ป"),
    Sale("2", "Coffee Maker", 89, "Appliances", datetime.now(), "โ˜•"),
    Sale("3", "Smartphone", 899, "Electronics", datetime.now(), "๐Ÿ“ฑ"),
    Sale("4", "Book", 29, "Media", datetime.now(), "๐Ÿ“š"),
    Sale("5", "Headphones", 199, "Electronics", datetime.now(), "๐ŸŽง"),
]

# Create analysis pipeline
analyzer = SalesAnalysisPipeline(sales_data)

# Filter high-value electronics
high_value_electronics = (
    Pipeline(sales_data)
    .filter(SalesAnalysisPipeline.is_high_value)
    .filter(SalesAnalysisPipeline.is_category("Electronics"))
    .map(SalesAnalysisPipeline.to_summary)
    .collect()
)

print("๐Ÿ’Ž High-value electronics:", high_value_electronics)

🎯 Try it yourself: Add a time-based filter to analyze sales by hour or day!
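
One possible starting point (a sketch that reuses the Pipeline class and Sale records from above; the 24-hour cutoff is arbitrary):

# ⏰ Sketch: keep only recent sales, then summarize
from datetime import timedelta

def within_last_hours(hours: int):
    """Build a predicate that keeps recent sales"""
    cutoff = datetime.now() - timedelta(hours=hours)
    return lambda sale: sale.timestamp >= cutoff

recent_summaries = (
    Pipeline(sales_data)
    .filter(within_last_hours(24))              # last 24 hours only
    .map(SalesAnalysisPipeline.to_summary)
    .collect()
)
print("🕐 Recent sales:", recent_summaries)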

🎮 Example 2: Log Processing Pipeline

Let's process server logs functionally:

# ๐Ÿ† Advanced log processing pipeline
import re
from enum import Enum
from collections import Counter

class LogLevel(Enum):
    """Log severity levels ๐Ÿšฆ"""
    DEBUG = "DEBUG"
    INFO = "INFO"
    WARNING = "WARNING"
    ERROR = "ERROR"
    CRITICAL = "CRITICAL"

@dataclass
class LogEntry:
    """Parsed log entry ๐Ÿ“"""
    timestamp: datetime
    level: LogLevel
    message: str
    source: str
    emoji: str

class LogProcessingPipeline:
    """Functional log analysis ๐Ÿ”"""
    
    # ๐ŸŽจ Parser functions
    @staticmethod
    def parse_log_line(line: str) -> Optional[LogEntry]:
        """Parse raw log line ๐Ÿ“‹"""
        pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] \[(\w+)\] (.+)'
        match = re.match(pattern, line)
        
        if not match:
            return None
        
        timestamp_str, level_str, source, message = match.groups()
        
        # Pick emoji based on level
        emoji_map = {
            "DEBUG": "๐Ÿ›",
            "INFO": "โ„น๏ธ",
            "WARNING": "โš ๏ธ",
            "ERROR": "โŒ",
            "CRITICAL": "๐Ÿšจ"
        }
        
        return LogEntry(
            timestamp=datetime.strptime(timestamp_str, '%Y-%m-%d %H:%M:%S'),
            level=LogLevel(level_str),
            message=message,
            source=source,
            emoji=emoji_map.get(level_str, "๐Ÿ“")
        )
    
    # ๐Ÿ” Filter predicates
    @staticmethod
    def is_error_or_above(entry: LogEntry) -> bool:
        """Find problems! ๐Ÿšจ"""
        return entry.level in [LogLevel.ERROR, LogLevel.CRITICAL]
    
    @staticmethod
    def contains_keyword(keyword: str):
        """Search for specific terms ๐Ÿ”Ž"""
        return lambda entry: keyword.lower() in entry.message.lower()
    
    @staticmethod
    def within_timeframe(start: datetime, end: datetime):
        """Filter by time window โฐ"""
        return lambda entry: start <= entry.timestamp <= end
    
    # ๐ŸŽจ Transform functions
    @staticmethod
    def anonymize_ips(entry: LogEntry) -> LogEntry:
        """Remove IP addresses for privacy ๐Ÿ›ก๏ธ"""
        ip_pattern = r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'
        anonymized_message = re.sub(ip_pattern, 'XXX.XXX.XXX.XXX', entry.message)
        
        return LogEntry(
            timestamp=entry.timestamp,
            level=entry.level,
            message=anonymized_message,
            source=entry.source,
            emoji=entry.emoji
        )
    
    @staticmethod
    def extract_metrics(entry: LogEntry) -> Dict:
        """Extract performance metrics ๐Ÿ“Š"""
        # Look for response time patterns
        time_pattern = r'response_time=(\d+)ms'
        match = re.search(time_pattern, entry.message)
        
        return {
            'timestamp': entry.timestamp,
            'source': entry.source,
            'response_time': int(match.group(1)) if match else None,
            'level': entry.level.value,
            'emoji': entry.emoji
        }
    
    # ๐Ÿ“Š Analysis functions
    @staticmethod
    def analyze_error_patterns(entries: List[LogEntry]) -> Dict:
        """Find common error patterns ๐Ÿ”"""
        error_messages = [
            entry.message for entry in entries 
            if entry.level in [LogLevel.ERROR, LogLevel.CRITICAL]
        ]
        
        # Extract error types
        error_types = []
        for msg in error_messages:
            if "timeout" in msg.lower():
                error_types.append("โฑ๏ธ Timeout")
            elif "connection" in msg.lower():
                error_types.append("๐Ÿ”Œ Connection")
            elif "memory" in msg.lower():
                error_types.append("๐Ÿ’พ Memory")
            elif "permission" in msg.lower():
                error_types.append("๐Ÿ”’ Permission")
            else:
                error_types.append("โ“ Other")
        
        return dict(Counter(error_types))
    
    @staticmethod
    def generate_summary_report(entries: List[LogEntry]) -> str:
        """Create readable summary ๐Ÿ“‘"""
        total = len(entries)
        by_level = Counter(entry.level.value for entry in entries)
        
        report = f"""
๐Ÿ“Š Log Analysis Summary
====================
๐Ÿ“ Total Entries: {total}
โฐ Time Range: {entries[0].timestamp} to {entries[-1].timestamp if entries else 'N/A'}

๐Ÿ“ˆ By Level:
"""
        for level, count in by_level.most_common():
            emoji = {"DEBUG": "๐Ÿ›", "INFO": "โ„น๏ธ", "WARNING": "โš ๏ธ", 
                    "ERROR": "โŒ", "CRITICAL": "๐Ÿšจ"}.get(level, "๐Ÿ“")
            percentage = (count / total * 100) if total > 0 else 0
            report += f"  {emoji} {level}: {count} ({percentage:.1f}%)\n"
        
        return report

# ๐ŸŽฎ Example usage
sample_logs = [
    "2024-01-15 10:30:45 [INFO] [API] Request processed response_time=45ms",
    "2024-01-15 10:31:02 [ERROR] [DB] Connection timeout to 192.168.1.100",
    "2024-01-15 10:31:15 [WARNING] [API] High memory usage detected",
    "2024-01-15 10:31:30 [CRITICAL] [AUTH] Multiple failed login attempts from 10.0.0.5",
    "2024-01-15 10:32:00 [INFO] [API] Health check passed",
]

# Process logs through pipeline
parsed_logs = [
    LogProcessingPipeline.parse_log_line(line) 
    for line in sample_logs
]
parsed_logs = [log for log in parsed_logs if log]  # Remove None values

# Create pipeline for error analysis
error_pipeline = (
    Pipeline(parsed_logs)
    .filter(LogProcessingPipeline.is_error_or_above)
    .map(LogProcessingPipeline.anonymize_ips)
    .collect()
)

print("๐Ÿšจ Anonymized Errors:")
for entry in error_pipeline:
    print(f"  {entry.emoji} [{entry.level.value}] {entry.message}")

# Generate summary
summary = LogProcessingPipeline.generate_summary_report(parsed_logs)
print(summary)

🚀 Advanced Concepts

🧙‍♂️ Advanced Topic 1: Lazy Evaluation with Generators

When you're ready to level up, try this advanced pattern:

# ๐ŸŽฏ Advanced generator-based pipeline
from typing import Generator, Iterator
import itertools

class LazyPipeline:
    """Memory-efficient lazy pipeline ๐Ÿ’ซ"""
    
    def __init__(self, source: Iterator):
        self.source = source
    
    def filter(self, predicate: Callable) -> 'LazyPipeline':
        """Lazy filter - no computation yet! ๐Ÿฆฅ"""
        self.source = (item for item in self.source if predicate(item))
        return self
    
    def map(self, transform: Callable) -> 'LazyPipeline':
        """Lazy map - still no computation! ๐ŸŽจ"""
        self.source = (transform(item) for item in self.source)
        return self
    
    def take(self, n: int) -> 'LazyPipeline':
        """Take only first n items โœ‚๏ธ"""
        self.source = itertools.islice(self.source, n)
        return self
    
    def batch(self, size: int) -> 'LazyPipeline':
        """Process in batches ๐Ÿ“ฆ"""
        def make_batches():
            batch = []
            for item in self.source:
                batch.append(item)
                if len(batch) == size:
                    yield batch
                    batch = []
            if batch:  # Don't forget the last batch!
                yield batch
        
        self.source = make_batches()
        return self
    
    def window(self, size: int) -> 'LazyPipeline':
        """Sliding window over data ๐ŸชŸ"""
        def make_windows():
            window = []
            for item in self.source:
                window.append(item)
                if len(window) == size:
                    yield tuple(window)
                    window.pop(0)
        
        self.source = make_windows()
        return self
    
    def execute(self) -> List:
        """Force evaluation - computation happens here! โšก"""
        return list(self.source)
    
    def stream(self) -> Generator:
        """Stream results one by one ๐ŸŒŠ"""
        for item in self.source:
            yield item

# ๐Ÿช„ Using the lazy pipeline
def infinite_numbers():
    """Generate infinite sequence ๐Ÿ”ข"""
    n = 0
    while True:
        yield n
        n += 1

# Process infinite stream lazily!
result = (
    LazyPipeline(infinite_numbers())
    .filter(lambda x: x % 2 == 0)  # Even numbers
    .map(lambda x: x ** 2)  # Square them
    .filter(lambda x: x < 1000)  # Less than 1000
    .take(10)  # Only first 10
    .execute()
)

print("๐ŸŽฏ First 10 even squares < 1000:", result)

๐Ÿ—๏ธ Advanced Topic 2: Functional Error Handling

For the brave developers - monadic error handling:

# ๐Ÿš€ Railway-oriented programming
from typing import Union, Callable, Generic, TypeVar
from dataclasses import dataclass

T = TypeVar('T')
E = TypeVar('E')

@dataclass
class Success(Generic[T]):
    """Successful result ๐ŸŽ‰"""
    value: T

@dataclass
class Failure(Generic[E]):
    """Error result ๐Ÿ˜ข"""
    error: E

Result = Union[Success[T], Failure[E]]

class ResultPipeline:
    """Pipeline with error handling ๐Ÿ›ก๏ธ"""
    
    def __init__(self, result: Result):
        self.result = result
    
    def then(self, func: Callable[[T], Result]) -> 'ResultPipeline':
        """Chain operations that might fail ๐Ÿ”—"""
        if isinstance(self.result, Failure):
            return self  # Skip if already failed
        
        try:
            new_result = func(self.result.value)
            return ResultPipeline(new_result)
        except Exception as e:
            return ResultPipeline(Failure(str(e)))
    
    def map(self, func: Callable[[T], Any]) -> 'ResultPipeline':
        """Transform if successful ๐ŸŽจ"""
        if isinstance(self.result, Failure):
            return self
        
        try:
            new_value = func(self.result.value)
            return ResultPipeline(Success(new_value))
        except Exception as e:
            return ResultPipeline(Failure(str(e)))
    
    def recover(self, handler: Callable[[E], T]) -> 'ResultPipeline':
        """Recover from errors ๐Ÿฅ"""
        if isinstance(self.result, Success):
            return self
        
        try:
            recovered = handler(self.result.error)
            return ResultPipeline(Success(recovered))
        except Exception as e:
            return ResultPipeline(Failure(str(e)))
    
    def unwrap_or(self, default: T) -> T:
        """Get value or default ๐Ÿ“ฆ"""
        if isinstance(self.result, Success):
            return self.result.value
        return default

# ๐ŸŽฎ Example: Safe division pipeline
def safe_divide(a: float, b: float) -> Result:
    """Divide safely โž—"""
    if b == 0:
        return Failure("Division by zero! ๐Ÿšซ")
    return Success(a / b)

def validate_positive(n: float) -> Result:
    """Ensure positive number โœ…"""
    if n < 0:
        return Failure(f"Negative number: {n} ๐Ÿ‘Ž")
    return Success(n)

# Chain operations safely
result = (
    ResultPipeline(Success(100))
    .then(lambda x: safe_divide(x, 2))
    .then(lambda x: validate_positive(x))
    .map(lambda x: f"Result: {x} ๐ŸŽฏ")
    .unwrap_or("Operation failed! ๐Ÿ˜ข")
)

print(result)

โš ๏ธ Common Pitfalls and Solutions

๐Ÿ˜ฑ Pitfall 1: Mutating Data in Pipelines

# โŒ Wrong way - mutating shared state!
data = [1, 2, 3, 4, 5]

def bad_transform(lst):
    lst.append(6)  # ๐Ÿ’ฅ Mutates original!
    return lst

pipeline_result = Pipeline([data]).map(bad_transform).collect()
print(data)  # [1, 2, 3, 4, 5, 6] - Original changed! ๐Ÿ˜ฐ

# โœ… Correct way - create new data!
def good_transform(lst):
    return lst + [6]  # Creates new list! ๐Ÿ›ก๏ธ

data = [1, 2, 3, 4, 5]
pipeline_result = Pipeline([data]).map(good_transform).collect()
print(data)  # [1, 2, 3, 4, 5] - Original unchanged! โœจ

🤯 Pitfall 2: Eager Evaluation Memory Issues

# โŒ Dangerous - loads entire file into memory!
def process_huge_file_bad(filename):
    with open(filename) as f:
        lines = f.readlines()  # ๐Ÿ’ฅ Loads everything!
    
    return Pipeline(lines).filter(lambda x: "ERROR" in x).collect()

# โœ… Safe - processes line by line!
def process_huge_file_good(filename):
    def line_generator():
        with open(filename) as f:
            for line in f:
                yield line.strip()
    
    return LazyPipeline(line_generator()).filter(
        lambda x: "ERROR" in x
    ).execute()

๐Ÿ› ๏ธ Best Practices

  1. ๐ŸŽฏ Keep Functions Pure: No side effects - same input, same output!
  2. ๐Ÿ“ Name Functions Clearly: filter_active_users not f1
  3. ๐Ÿ›ก๏ธ Handle Errors Gracefully: Use Result types or try-except
  4. ๐ŸŽจ Compose Small Functions: Each does one thing well
  5. โœจ Use Type Hints: Makes pipelines self-documenting
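
Putting a few of these together, here's a small sketch (the names and data are illustrative) of pure, typed, clearly named pipeline steps that fail loudly instead of silently:

# ✨ Sketch: pure, typed, clearly named pipeline steps
from typing import Dict, List

def filter_active_users(users: List[Dict]) -> List[Dict]:
    """Keep only active users; never mutates the input"""
    return [user for user in users if user.get("active")]

def average_age(users: List[Dict]) -> float:
    """Pure aggregation with an explicit error for empty input"""
    if not users:
        raise ValueError("average_age called with no users")
    return sum(user["age"] for user in users) / len(users)

users = [
    {"name": "Ada", "age": 36, "active": True},
    {"name": "Bob", "age": 41, "active": False},
]
print(average_age(filter_active_users(users)))  # 36.0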

🧪 Hands-On Exercise

🎯 Challenge: Build a Real-Time Stream Processor

Create a functional pipeline for processing streaming data:

📋 Requirements:

  • ✅ Process streaming sensor data (temperature, humidity, pressure)
  • 🏷️ Filter anomalies (values outside normal range)
  • 👤 Group by sensor location
  • 📅 Calculate rolling averages
  • 🎨 Generate alerts for critical values!

🚀 Bonus Points:

  • Add time-window aggregations
  • Implement backpressure handling
  • Create real-time dashboard data

💡 Solution

๐Ÿ” Click to see solution
# ๐ŸŽฏ Real-time sensor stream processor!
from dataclasses import dataclass
from datetime import datetime, timedelta
from collections import deque
from typing import Deque, Dict, Iterator, List, Optional
import random
import time

@dataclass
class SensorReading:
    """Single sensor measurement ๐ŸŒก๏ธ"""
    sensor_id: str
    location: str
    temperature: float
    humidity: float
    pressure: float
    timestamp: datetime
    emoji: str

class StreamProcessor:
    """Functional stream processing pipeline ๐ŸŒŠ"""
    
    def __init__(self, window_size: int = 10):
        self.windows: Dict[str, Deque[SensorReading]] = {}
        self.window_size = window_size
        self.alerts: List[str] = []
    
    # ๐Ÿ” Filter functions
    @staticmethod
    def is_valid_reading(reading: SensorReading) -> bool:
        """Validate sensor data ๐Ÿ›ก๏ธ"""
        return (
            -50 <= reading.temperature <= 100 and
            0 <= reading.humidity <= 100 and
            900 <= reading.pressure <= 1100
        )
    
    @staticmethod
    def is_anomaly(reading: SensorReading) -> bool:
        """Detect anomalies ๐Ÿšจ"""
        return (
            reading.temperature > 40 or
            reading.temperature < -10 or
            reading.humidity > 90 or
            reading.pressure < 950 or
            reading.pressure > 1050
        )
    
    # ๐ŸŽจ Transform functions
    def add_to_window(self, reading: SensorReading) -> SensorReading:
        """Maintain sliding window ๐ŸชŸ"""
        key = f"{reading.location}_{reading.sensor_id}"
        
        if key not in self.windows:
            self.windows[key] = deque(maxlen=self.window_size)
        
        self.windows[key].append(reading)
        return reading
    
    def calculate_rolling_stats(self, reading: SensorReading) -> Dict:
        """Calculate rolling statistics ๐Ÿ“Š"""
        key = f"{reading.location}_{reading.sensor_id}"
        window = self.windows.get(key, [])
        
        if not window:
            return {}
        
        temps = [r.temperature for r in window]
        humids = [r.humidity for r in window]
        pressures = [r.pressure for r in window]
        
        return {
            'sensor': f"{reading.emoji} {reading.sensor_id}",
            'location': reading.location,
            'current': {
                'temperature': reading.temperature,
                'humidity': reading.humidity,
                'pressure': reading.pressure
            },
            'rolling_avg': {
                'temperature': sum(temps) / len(temps),
                'humidity': sum(humids) / len(humids),
                'pressure': sum(pressures) / len(pressures)
            },
            'rolling_min': {
                'temperature': min(temps),
                'humidity': min(humids),
                'pressure': min(pressures)
            },
            'rolling_max': {
                'temperature': max(temps),
                'humidity': max(humids),
                'pressure': max(pressures)
            }
        }
    
    def generate_alert(self, reading: SensorReading) -> Optional[str]:
        """Create alerts for critical values ๐Ÿšจ"""
        alerts = []
        
        if reading.temperature > 45:
            alerts.append(f"๐Ÿ”ฅ CRITICAL: High temperature {reading.temperature}ยฐC")
        elif reading.temperature < -20:
            alerts.append(f"๐ŸงŠ CRITICAL: Low temperature {reading.temperature}ยฐC")
        
        if reading.humidity > 95:
            alerts.append(f"๐Ÿ’ง CRITICAL: High humidity {reading.humidity}%")
        
        if reading.pressure < 920 or reading.pressure > 1080:
            alerts.append(f"๐ŸŒช๏ธ CRITICAL: Abnormal pressure {reading.pressure} hPa")
        
        if alerts:
            alert_msg = f"{reading.emoji} Sensor {reading.sensor_id} @ {reading.location}:\n"
            alert_msg += "\n".join(f"  {a}" for a in alerts)
            self.alerts.append(alert_msg)
            return alert_msg
        
        return None
    
    def process_stream(self, readings: Iterator[SensorReading]):
        """Main processing pipeline ๐Ÿ—๏ธ"""
        pipeline = (
            LazyPipeline(readings)
            .filter(self.is_valid_reading)
            .map(self.add_to_window)
            .map(lambda r: (r, self.calculate_rolling_stats(r)))
            .map(lambda pair: {
                'reading': pair[0],
                'stats': pair[1],
                'alert': self.generate_alert(pair[0])
            })
        )
        
        return pipeline

# ๐ŸŽฎ Simulate sensor stream
def generate_sensor_stream():
    """Generate realistic sensor data ๐Ÿ“ก"""
    sensors = [
        ("TEMP-001", "Factory Floor", "๐Ÿญ"),
        ("TEMP-002", "Server Room", "๐Ÿ–ฅ๏ธ"),
        ("TEMP-003", "Warehouse", "๐Ÿ“ฆ"),
        ("TEMP-004", "Office", "๐Ÿข")
    ]
    
    while True:
        for sensor_id, location, emoji in sensors:
            # Generate realistic data with occasional anomalies
            base_temp = 20 + random.gauss(0, 5)
            if random.random() < 0.1:  # 10% chance of anomaly
                base_temp += random.choice([-30, 30])
            
            yield SensorReading(
                sensor_id=sensor_id,
                location=location,
                temperature=round(base_temp, 1),
                humidity=round(50 + random.gauss(0, 15), 1),
                pressure=round(1013 + random.gauss(0, 20), 1),
                timestamp=datetime.now(),
                emoji=emoji
            )
        
        time.sleep(0.1)  # Simulate real-time delay

# ๐Ÿš€ Run the processor!
processor = StreamProcessor(window_size=5)
stream = generate_sensor_stream()

# Process first 20 readings
for i, result in enumerate(processor.process_stream(stream).stream()):
    if i >= 20:
        break
    
    reading = result['reading']
    stats = result['stats']
    alert = result['alert']
    
    print(f"\n๐Ÿ“Š Reading {i+1}:")
    print(f"  {reading.emoji} {reading.sensor_id} @ {reading.location}")
    print(f"  ๐ŸŒก๏ธ Temp: {reading.temperature}ยฐC")
    
    if stats and 'rolling_avg' in stats:
        avg_temp = stats['rolling_avg']['temperature']
        print(f"  ๐Ÿ“ˆ 5-reading avg: {avg_temp:.1f}ยฐC")
    
    if alert:
        print(f"  {alert}")

# Show final alerts summary
if processor.alerts:
    print("\n๐Ÿšจ ALERTS SUMMARY:")
    for alert in processor.alerts[-5:]:  # Last 5 alerts
        print(alert)

🎓 Key Takeaways

You've learned so much! Here's what you can now do:

  • ✅ Build functional pipelines with confidence 💪
  • ✅ Process data lazily for memory efficiency 🛡️
  • ✅ Compose complex transformations from simple functions 🎯
  • ✅ Handle errors functionally with Result types 🐛
  • ✅ Create real-time stream processors with Python! 🚀

Remember: Functional pipelines make your code more modular, testable, and maintainable! 🤝

๐Ÿค Next Steps

Congratulations! ๐ŸŽ‰ Youโ€™ve mastered functional data processing pipelines!

Hereโ€™s what to do next:

  1. ๐Ÿ’ป Practice with the streaming exercise above
  2. ๐Ÿ—๏ธ Build a pipeline for your own data processing needs
  3. ๐Ÿ“š Explore libraries like toolz and fn.py for more functional tools
  4. ๐ŸŒŸ Share your functional pipelines with the community!
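
To give you a taste of toolz, here's a minimal sketch of the same pipeline idea using its pipe and compose helpers (install with pip install toolz):

# 📚 Sketch: the pipeline idea with the toolz library
from toolz import compose, pipe

result = pipe(
    "hello world",
    lambda s: f"{s}!",   # add excitement
    str.upper,           # make it LOUD
)
print(result)  # HELLO WORLD!

shout = compose(str.upper, str.strip)  # applies right to left, like our compose()
print(shout("  quiet  "))  # QUIET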

Remember: Every data engineering expert started with their first pipeline. Keep building, keep learning, and most importantly, have fun! 🚀


Happy functional programming! 🎉🚀✨