
💊 Python Module 08 - The Matrix: Data Engineering Foundations

This project is part of the 42 School Common Core and focuses on
real-world Python development practices used in data engineering.

Building on previous modules, this project introduces essential tools for
environment isolation, dependency management, and configuration handling.

The goal of this module is to understand and apply:

  • virtual environments (venv)
  • dependency management (pip vs Poetry)
  • external libraries (pandas, numpy, matplotlib, requests)
  • dynamic imports and dependency checks
  • environment variables and .env configuration
  • secure handling of sensitive data
  • real-world project structure

Each exercise simulates a real-world scenario; together, they form a
basic but complete data pipeline.


📚 Table of Contents

  • Project Structure
  • Exercises Overview
  • Key Learning Points
  • Security & Best Practices
  • Notes


📁 Project Structure

ex0/
└── construct.py

ex1/
├── loading.py
├── requirements.txt
└── pyproject.toml

ex2/
├── oracle.py
├── .env.example
└── .gitignore

📌 Exercises Overview

Exercise 0 - Entering the Matrix

Introduces virtual environments.

  • Detects if running inside a virtual environment
  • Displays Python environment information
  • Shows differences between global and isolated environments
  • Provides instructions to create a venv

Concepts: environment isolation, system paths, Python runtime
👉 Detection is based on the VIRTUAL_ENV environment variable and system paths
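A minimal sketch of this kind of detection, not the project's actual construct.py: it assumes the standard check that `sys.prefix` differs from `sys.base_prefix` inside a venv, and reads `VIRTUAL_ENV` (set by the venv activation scripts) only for display. The environment name `matrix_env` is hypothetical.

```python
from __future__ import annotations

import os
import sys


def in_virtual_env() -> bool:
    """Return True if this interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points to the venv directory, while
    # sys.base_prefix still points to the base Python installation.
    return sys.prefix != sys.base_prefix


def environment_info() -> dict[str, str | None]:
    """Collect the paths that distinguish global and isolated setups."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # Set by the venv "activate" scripts; None when not activated.
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),
    }


if __name__ == "__main__":
    if in_virtual_env():
        print("Status: inside a virtual environment")
    else:
        print("Status: outside the Matrix (global Python)")
        # Hypothetical environment name, for illustration only.
        print("Create one with: python3 -m venv matrix_env")
        print("Activate it with: source matrix_env/bin/activate")
    for key, value in environment_info().items():
        print(f"{key}: {value}")
```

On Windows, the activation script lives at `matrix_env\Scripts\activate` instead.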


Exercise 1 - Loading Programs

Focuses on dependency management and data processing.

  • Checks installed dependencies dynamically
  • Supports both pip and Poetry
  • Uses:
    • numpy → data generation
    • pandas → data manipulation
    • matplotlib → visualization
    • requests (optional)
  • Generates a dataset and saves a graph (matrix_analysis.png)
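The numpy → pandas → matplotlib flow described above could look roughly like this sketch. The column names, random seed, and 50-sample rolling window are illustrative assumptions, not the project's actual data; only the output filename `matrix_analysis.png` comes from the text. The `Agg` backend is chosen so the plot is written to a file without needing a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: write files, no window
import matplotlib.pyplot as plt  # noqa: E402
import numpy as np  # noqa: E402
import pandas as pd  # noqa: E402


def generate_data(n: int = 1000, seed: int = 42) -> pd.DataFrame:
    """numpy: generate a reproducible synthetic dataset."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "signal": rng.normal(loc=0.0, scale=1.0, size=n),
        "noise": rng.uniform(low=-0.5, high=0.5, size=n),
    })


def analyze(df: pd.DataFrame, window: int = 50) -> pd.DataFrame:
    """pandas: derive new columns from the raw data."""
    result = df.copy()
    result["combined"] = result["signal"] + result["noise"]
    # The first (window - 1) rows stay NaN until the window fills up.
    result["rolling_mean"] = result["combined"].rolling(window).mean()
    return result


def plot(df: pd.DataFrame, path: str = "matrix_analysis.png") -> Path:
    """matplotlib: save the analysis as an image file."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(df.index, df["combined"], alpha=0.3, label="combined")
    ax.plot(df.index, df["rolling_mean"], label="rolling mean")
    ax.set_xlabel("sample")
    ax.set_ylabel("value")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)  # free the figure's memory
    return Path(path)


if __name__ == "__main__":
    data = analyze(generate_data())
    print(data.describe())
    print(f"Saved {plot(data)}")
```

Keeping each library in its own function makes it easy to swap or skip a stage when a dependency is missing.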

Concepts: package management, dynamic imports, data pipelines
👉 Dependency checking is implemented with importlib
👉 Requirements are declared in requirements.txt (pip) and pyproject.toml (Poetry)
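A hedged sketch of dynamic dependency checking with `importlib.import_module`; it is not the project's loading.py. The `REQUIRED`/`OPTIONAL` lists and the helper names are hypothetical, and the version lookup assumes only that some packages expose `__version__`.

```python
from __future__ import annotations

import importlib
from types import ModuleType

# Hypothetical lists for illustration; the project itself declares its
# dependencies in requirements.txt (pip) and pyproject.toml (Poetry).
REQUIRED = ["numpy", "pandas", "matplotlib"]
OPTIONAL = ["requests"]


def try_import(name: str) -> ModuleType | None:
    """Import a module by name, or return None if it is unavailable."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


def check_dependencies(names: list[str]) -> tuple[dict[str, str], list[str]]:
    """Return ({name: version} for installed modules, [missing names])."""
    found: dict[str, str] = {}
    missing: list[str] = []
    for name in names:
        module = try_import(name)
        if module is None:
            missing.append(name)
        else:
            # Not every module defines __version__, so fall back politely.
            found[name] = str(getattr(module, "__version__", "unknown"))
    return found, missing


if __name__ == "__main__":
    found, missing = check_dependencies(REQUIRED + OPTIONAL)
    for name, version in found.items():
        print(f"[OK] {name} ({version})")
    for name in missing:
        kind = "optional" if name in OPTIONAL else "REQUIRED"
        print(f"[MISSING] {name} ({kind})")
    if any(name in REQUIRED for name in missing):
        print("Install with: pip install -r requirements.txt")
        print("          or: poetry install")
```

Importing by string name keeps the check at runtime, so the script can report what is missing instead of crashing on a top-level `import`.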


Exercise 2 - Accessing the Mainframe

Introduces environment configuration and security.

  • Loads variables from .env
  • Uses python-dotenv
  • Handles:
    • MATRIX_MODE
    • DATABASE_URL
    • API_KEY
    • LOG_LEVEL
    • ZION_ENDPOINT
  • Demonstrates dev vs production behavior
  • Ensures secrets are not hardcoded

Concepts: environment variables, secure config, system design
👉 Configuration is loaded safely, with fallback values and validation
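A minimal sketch of this loading strategy, assuming python-dotenv's `load_dotenv` (which, with its default `override=False`, leaves already-set environment variables untouched). The default values, the choice of `DATABASE_URL` and `API_KEY` as required, and the helpers `load_config`, `validate`, and `mask` are hypothetical, not taken from oracle.py. If python-dotenv is not installed, the sketch falls back to real environment variables only.

```python
from __future__ import annotations

import os

try:
    from dotenv import load_dotenv  # provided by python-dotenv
except ImportError:  # degrade gracefully: use real env vars only
    load_dotenv = None

# Hypothetical defaults for non-secret settings, for illustration only.
DEFAULTS = {
    "MATRIX_MODE": "development",
    "LOG_LEVEL": "DEBUG",
    "ZION_ENDPOINT": "http://localhost:8000",
}
# Secrets get no default: they must come from the environment or .env.
REQUIRED = ["DATABASE_URL", "API_KEY"]


def load_config(env_file: str = ".env") -> dict[str, str | None]:
    """Merge .env, process environment variables, and defaults."""
    if load_dotenv is not None:
        # override=False: variables already set in the process
        # environment win over values read from the file.
        load_dotenv(dotenv_path=env_file, override=False)
    config: dict[str, str | None] = {
        key: os.environ.get(key, default) for key, default in DEFAULTS.items()
    }
    for key in REQUIRED:
        config[key] = os.environ.get(key)
    return config


def validate(config: dict[str, str | None]) -> list[str]:
    """Return missing required keys; fail hard only in production."""
    missing = [key for key in REQUIRED if not config.get(key)]
    if missing and config["MATRIX_MODE"] == "production":
        raise RuntimeError(f"Missing required variables: {', '.join(missing)}")
    return missing


def mask(secret: str) -> str:
    """Show at most a short prefix of a secret in logs."""
    return secret[:4] + "****" if len(secret) > 8 else "****"


if __name__ == "__main__":
    cfg = load_config()
    for key in validate(cfg):
        print(f"[WARNING] {key} is not set (development mode)")
    print(f"Mode: {cfg['MATRIX_MODE']}, log level: {cfg['LOG_LEVEL']}")
    if cfg["API_KEY"]:
        print(f"API key: {mask(cfg['API_KEY'])}")
```

The design choice here is that development mode only warns about missing secrets, while production mode refuses to start without them.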


βš™οΈ Key Learning Points

  • Virtual environments isolate your projects from global Python
  • Dependency management is critical in real-world applications
  • pip and Poetry solve the same problem differently
  • External libraries must be handled safely (missing deps, versions)
  • .env files prevent exposing sensitive data
  • Variables already set in the environment take precedence over values in a local .env file
  • Clean, centralized configuration makes systems easier to secure and scale

🔐 Security & Best Practices

  • Never commit .env files - use .gitignore (this project includes a .env.example for practice purposes only)
  • Always validate required environment variables
  • Avoid hardcoding secrets in your code
  • Provide fallback behavior for missing dependencies
  • Separate development and production configurations
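One way to provide fallback behavior for a missing dependency, sketched under the assumption that numpy is optional: use it when installed, otherwise fall back to the standard-library `statistics` module so the program still runs. The helper name `summarize` is hypothetical.

```python
from __future__ import annotations

import statistics

try:
    import numpy as np
except ImportError:  # numpy missing: fall back to the standard library
    np = None


def summarize(values: list[float]) -> dict[str, float]:
    """Return mean and population std, with or without numpy."""
    if not values:
        raise ValueError("summarize() needs at least one value")
    if np is not None:
        arr = np.asarray(values, dtype=float)
        # np.std defaults to ddof=0, i.e. the population standard deviation
        return {"mean": float(arr.mean()), "std": float(arr.std())}
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }


if __name__ == "__main__":
    backend = "numpy" if np is not None else "statistics (fallback)"
    print(f"Backend: {backend}")
    print(summarize([1.0, 2.0, 3.0, 4.0]))
```

Both branches compute the same quantities (population statistics), so callers get consistent results regardless of which backend is available.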

✅ Notes

  • Written for Python 3.10+
  • Uses type hints and follows flake8
  • Designed to work with and without dependencies installed
  • Handles errors gracefully (missing packages / config)
  • Focus is on real-world practices, not algorithm complexity
  • Outputs match the subject expectations