Work detail

Solar API — Backend for ingesting, normalizing, and aggregating photovoltaic data

This backend API manages real data from a photovoltaic installation in production.

Its role is to centralize information scattered across multiple external sources, normalize heterogeneous formats, and deliver consolidated energy data ready for frontend consumption.

The system is protected with authentication, roles, and permissions, includes data backup mechanisms, and is designed to operate with large volumes of real information.

The problem

Data from a photovoltaic installation is not centralized:

  • Electric utility (Iberdrola)
  • Grid operator (REE)
  • The installation's own inverter

Each source:

  • provides different CSV files,
  • with structures, fields, and formats that differ,
  • and without a common data model.

Additionally:

  • some CSVs contain readings every 5 minutes, others provide hourly aggregations,
  • and the frontend needs coherent and comparable data regardless of origin.

The solution

I designed an API that acts as a data integration, validation, and normalization layer, isolating the frontend from the complexity and heterogeneity of the sources.

Advanced CSV ingestion

  • Supports 4 different CSV formats from different providers.
  • Automatically identifies file type and structure.
  • Transforms each source into a unified data model.

In real use:

  • More than 1,000 CSV files have been processed
  • From multiple sources and formats
  • With complete and successful database loads
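
Identifying the file type from its header can be sketched as a signature lookup. The column names below are illustrative placeholders, not the providers' actual exports:

```python
import csv
import io

# Hypothetical header signatures per provider; the real column names
# would come from each source's actual CSV export.
SIGNATURES = {
    "iberdrola": {"CUPS", "Fecha", "Hora", "Consumo"},
    "ree": {"datetime", "value", "geo_id"},
    "inverter": {"Timestamp", "PV Power(W)"},
}

def detect_format(raw_text: str) -> str:
    """Return the provider key whose signature matches the CSV header."""
    header = next(csv.reader(io.StringIO(raw_text)))
    columns = {h.strip() for h in header}
    for provider, signature in SIGNATURES.items():
        if signature <= columns:  # signature is a subset of the header
            return provider
    raise ValueError(f"Unknown CSV format: {header}")
```

Once the provider is identified, the matching parser can map its rows onto the unified data model.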

Normalization and aggregation

Data is read, transformed, and persisted as hourly data regardless of the original CSV granularity or structure.

During ingestion:

  • the received file is analyzed,
  • redundant information from the CSV provider is filtered out,
  • and hourly data is consolidated, which is what is persisted in the database.

This transformation logic is implemented in the file upload layer, located in the uploads folder, using Python for file and data processing.
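
The hourly consolidation step can be illustrated with a stdlib-only sketch; the function name and input shape are assumptions for the example, not the project's actual code:

```python
from collections import defaultdict
from datetime import datetime

def consolidate_hourly(readings):
    """Sum energy readings (kWh) into hourly buckets.

    `readings` is an iterable of (iso_timestamp, kwh) pairs at any input
    granularity (5-minute, 15-minute, hourly...).
    """
    buckets = defaultdict(float)
    for ts, kwh in readings:
        # Truncate the timestamp to the hour and accumulate the energy.
        hour = datetime.fromisoformat(ts).replace(
            minute=0, second=0, microsecond=0)
        buckets[hour] += kwh
    return dict(sorted(buckets.items()))
```

Only the consolidated hourly buckets would then be persisted, whatever the granularity of the source file.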

From the hourly data stored, the API offers different views depending on the query:

  • Daily data

    Request year–month–day and the API returns 24 hourly values.

  • Monthly data

    Request year–month and the API returns 28, 29, 30 or 31 daily values, depending on month and year.

  • Yearly data

    Request year and the API returns 12 monthly values.

The API and its SQL views are designed so that the same type of query can return a variable number of results, as with monthly requests, where the length of the response depends on the requested month and year.
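
The variable-length monthly response can be sketched like this; padding missing days with 0.0 is an assumption made for the example, not necessarily how the API handles gaps:

```python
import calendar
from datetime import date

def monthly_series(daily_totals, year, month):
    """Return one value per day of the requested month (28-31 entries).

    `daily_totals` maps a date to its kWh total; days absent from the
    mapping default to 0.0 in this sketch.
    """
    _, days_in_month = calendar.monthrange(year, month)
    return [daily_totals.get(date(year, month, day), 0.0)
            for day in range(1, days_in_month + 1)]
```

The same pattern applies to daily views (24 hourly values) and yearly views (12 monthly values), where the response length is fixed.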

Managed metrics

  • installation consumption
  • photovoltaic production
  • grid balance:
    • energy purchased
    • energy exported

The architecture is also prepared to manage a battery, even though this installation does not include one:

  • battery energy balance:
    • energy stored
    • energy discharged
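
One way to picture the unified hourly record covering these metrics is a dataclass sketch; the field names are assumptions for illustration, not the project's actual schema:

```python
from dataclasses import dataclass

# Illustrative unified hourly record.  Battery fields simply stay at
# 0.0 for installations without storage, so the model needs no change
# if a battery is added later.
@dataclass
class HourlyEnergy:
    ts: str                             # ISO timestamp, hourly
    consumption_kwh: float = 0.0        # installation consumption
    production_kwh: float = 0.0         # photovoltaic production
    grid_import_kwh: float = 0.0        # energy purchased
    grid_export_kwh: float = 0.0        # energy exported
    battery_charge_kwh: float = 0.0     # energy stored in the battery
    battery_discharge_kwh: float = 0.0  # energy drawn from the battery
```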

Performance

  • SQL views are used to speed up frequent aggregations,
  • consolidated data is persisted, avoiding repeated calculations,
  • the database acts as a single source of truth from the hourly data.
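
The pattern of a SQL view over persisted hourly rows can be shown with an in-memory SQLite sketch; the project itself uses MySQL, and the table and column names here are illustrative, not the real schema:

```python
import sqlite3

# Hourly rows are the source of truth; a SQL view pre-aggregates them
# per day so the aggregation is computed by the database, not the app.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hourly_energy (
        ts  TEXT,   -- ISO timestamp truncated to the hour
        kwh REAL
    )
""")
conn.execute("""
    CREATE VIEW daily_energy AS
    SELECT substr(ts, 1, 10) AS day, SUM(kwh) AS kwh
    FROM hourly_energy
    GROUP BY day
""")
conn.executemany(
    "INSERT INTO hourly_energy VALUES (?, ?)",
    [("2024-06-01T10:00", 0.5), ("2024-06-01T11:00", 0.5)],
)
rows = conn.execute("SELECT * FROM daily_energy").fetchall()
# rows == [("2024-06-01", 1.0)]
```

Querying the view gives the frontend daily figures without re-running the aggregation in application code.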

Security, roles and access control

The API is fully protected:

  • Authentication via login and JWT
  • Role and permission management

Only users with the admin role can:

  • manage the database
  • load or restore data
  • run sensitive operations

Security logic is clearly separated from business logic.
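
A minimal stand-in for that separation, independent of the actual Flask-JWT-Extended wiring, shows how a role check can wrap an endpoint before any business logic runs; every name here is hypothetical:

```python
from functools import wraps

class Forbidden(Exception):
    """Raised when the caller lacks the required role."""

def require_role(role):
    """Decorator enforcing a role before the wrapped function runs."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(current_user, *args, **kwargs):
            if role not in current_user.get("roles", ()):
                raise Forbidden(f"'{role}' role required")
            return fn(current_user, *args, **kwargs)
        return wrapper
    return decorator

@require_role("admin")
def restore_database(current_user, dump_path):
    # Sensitive operation: only reachable with the admin role.
    return f"restoring from {dump_path}"
```

The business function never inspects roles itself; authorization lives entirely in the decorator layer.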

Backups and data protection

The system includes explicit data protection mechanisms:

  • Exporting the database to JSON files
  • Loading JSON files for restore or migration
  • These files act as backups of the processed information

This makes it possible to:

  • protect historical data,
  • facilitate migrations,
  • and reduce operational risk.
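
The export/restore cycle can be sketched with the stdlib alone; the function names and row shape are assumptions for the example:

```python
import json
from pathlib import Path

def export_table(rows, path):
    """Dump a table's rows to a JSON backup file (illustrative shape)."""
    Path(path).write_text(json.dumps(rows, indent=2, default=str))

def restore_table(path):
    """Load a backup, ready for a bulk insert during restore or migration."""
    return json.loads(Path(path).read_text())
```

Because the backup is plain JSON, the same files serve both disaster recovery and migration between environments.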

Architecture and tech stack

  • Language: Python
  • Framework: Flask
  • Database: MySQL
  • ORM: SQLAlchemy
  • Migrations: Alembic / Flask-Migrate
  • Authentication: Flask-JWT-Extended
  • Testing: pytest, pytest-mock

Dependencies managed via requirements.txt.

Operations and deployment

  • Deployments on PythonAnywhere
  • deploy.sh script to automate deployments after updating the repository
  • Continuous monitoring of service availability

🔗 Production API

🔗 GitHub repository

What this project proves

This project demonstrates my ability to:

  • Integrate heterogeneous data sources with different formats
  • Design real data validation and normalization processes
  • Build backend systems that scale with growing data volume
  • Operate production services with monitoring and reliability
  • Apply security and role-based access control
  • Maintain a system with traceability, backup, and operational resilience

Project images

API status in production
Repository and backend structure
PythonAnywhere deployment environment