Work detail

Solar API — Backend for ingesting, normalizing, and aggregating photovoltaic data

This backend API manages real data from a photovoltaic installation in production.

Its role is to centralize information scattered across multiple external sources, normalize heterogeneous formats, and deliver consolidated energy data ready for frontend consumption.

The system is protected with authentication, roles, and permissions, includes data backup mechanisms, and is designed to operate with large volumes of real information.

The problem

Data from a photovoltaic installation is not centralized:

  • Electric utility (Iberdrola)
  • Grid operator (REE)
  • The installation's own inverter

Each source:

  • provides different CSV files,
  • with structures, fields, and formats that differ,
  • and without a common data model.

Additionally:

  • some CSVs contain readings every 5 minutes, others provide hourly aggregations,
  • and the frontend needs coherent and comparable data regardless of origin.

The solution

I designed an API that acts as a data integration, validation, and normalization layer, isolating the frontend from the complexity and heterogeneity of the sources.

Advanced CSV ingestion

  • Supports 4 different CSV formats from different providers.
  • Automatically identifies file type and structure.
  • Transforms each source into a unified data model.

In real use:

  • More than 1,000 CSV files have been processed
  • From multiple sources and formats
  • With complete and successful database loads
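
Identifying the file type from its header can be sketched as a signature lookup. The column names below are illustrative placeholders, not the providers' actual exports:

```python
import csv
import io

# Hypothetical header signatures per provider; the real column names
# would come from each source's actual CSV export.
SIGNATURES = {
    "iberdrola": {"CUPS", "Fecha", "Hora", "Consumo"},
    "ree": {"datetime", "value", "geo_id"},
    "inverter": {"Timestamp", "PV Power(W)"},
}

def detect_format(raw_text: str) -> str:
    """Return the provider key whose signature matches the CSV header."""
    header = next(csv.reader(io.StringIO(raw_text)))
    columns = {h.strip() for h in header}
    for provider, signature in SIGNATURES.items():
        if signature <= columns:  # signature is a subset of the header
            return provider
    raise ValueError(f"Unknown CSV format: {header}")
```

Once the provider is identified, the matching parser can map its rows onto the unified data model.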

Normalization and aggregation

Data is read, transformed, and persisted as hourly data regardless of the original CSV granularity or structure.

During ingestion:

  • the received file is analyzed,
  • redundant information from the CSV provider is filtered out,
  • and hourly data is consolidated, which is what is persisted in the database.

This transformation logic is implemented in the file upload layer, located in the uploads folder, using Python for file and data processing.
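
The hourly consolidation step can be illustrated with a stdlib-only sketch; the function name and input shape are assumptions for the example, not the project's actual code:

```python
from collections import defaultdict
from datetime import datetime

def consolidate_hourly(readings):
    """Sum energy readings (kWh) into hourly buckets.

    `readings` is an iterable of (iso_timestamp, kwh) pairs at any input
    granularity (5-minute, 15-minute, hourly...).
    """
    buckets = defaultdict(float)
    for ts, kwh in readings:
        # Truncate the timestamp to the hour and accumulate the energy.
        hour = datetime.fromisoformat(ts).replace(
            minute=0, second=0, microsecond=0)
        buckets[hour] += kwh
    return dict(sorted(buckets.items()))
```

Only the consolidated hourly buckets would then be persisted, whatever the granularity of the source file.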

From the hourly data stored, the API offers different views depending on the query:

  • Daily data

    Request year–month–day and the API returns 24 hourly values.

  • Monthly data

    Request year–month and the API returns 28, 29, 30 or 31 daily values, depending on month and year.

  • Yearly data

    Request year and the API returns 12 monthly values.

The API and its SQL views are designed so that the same type of query can return a variable number of results, as with monthly requests, where the length of the response depends on the requested month and year.
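
The variable-length monthly response can be sketched like this; padding missing days with 0.0 is an assumption made for the example, not necessarily how the API handles gaps:

```python
import calendar
from datetime import date

def monthly_series(daily_totals, year, month):
    """Return one value per day of the requested month (28-31 entries).

    `daily_totals` maps a date to its kWh total; days absent from the
    mapping default to 0.0 in this sketch.
    """
    _, days_in_month = calendar.monthrange(year, month)
    return [daily_totals.get(date(year, month, day), 0.0)
            for day in range(1, days_in_month + 1)]
```

The same pattern applies to daily views (24 hourly values) and yearly views (12 monthly values), where the response length is fixed.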

Managed metrics

  • installation consumption
  • photovoltaic production
  • grid balance:
    • energy purchased
    • energy exported

The architecture is also prepared to manage a battery, even though this installation does not include one:

  • battery energy balance:
    • energy stored
    • energy discharged
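
One way to picture the unified hourly record covering these metrics is a dataclass sketch; the field names are assumptions for illustration, not the project's actual schema:

```python
from dataclasses import dataclass

# Illustrative unified hourly record.  Battery fields simply stay at
# 0.0 for installations without storage, so the model needs no change
# if a battery is added later.
@dataclass
class HourlyEnergy:
    ts: str                             # ISO timestamp, hourly
    consumption_kwh: float = 0.0        # installation consumption
    production_kwh: float = 0.0         # photovoltaic production
    grid_import_kwh: float = 0.0        # energy purchased
    grid_export_kwh: float = 0.0        # energy exported
    battery_charge_kwh: float = 0.0     # energy stored in the battery
    battery_discharge_kwh: float = 0.0  # energy drawn from the battery
```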

Performance

  • SQL views are used to speed up frequent aggregations,
  • consolidated data is persisted, avoiding repeated calculations,
  • the database acts as a single source of truth from the hourly data.
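
The pattern of a SQL view over persisted hourly rows can be shown with an in-memory SQLite sketch; the project itself uses MySQL, and the table and column names here are illustrative, not the real schema:

```python
import sqlite3

# Hourly rows are the source of truth; a SQL view pre-aggregates them
# per day so the aggregation is computed by the database, not the app.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hourly_energy (
        ts  TEXT,   -- ISO timestamp truncated to the hour
        kwh REAL
    )
""")
conn.execute("""
    CREATE VIEW daily_energy AS
    SELECT substr(ts, 1, 10) AS day, SUM(kwh) AS kwh
    FROM hourly_energy
    GROUP BY day
""")
conn.executemany(
    "INSERT INTO hourly_energy VALUES (?, ?)",
    [("2024-06-01T10:00", 0.5), ("2024-06-01T11:00", 0.5)],
)
rows = conn.execute("SELECT * FROM daily_energy").fetchall()
# rows == [("2024-06-01", 1.0)]
```

Querying the view gives the frontend daily figures without re-running the aggregation in application code.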

Security, roles and access control

The API is fully protected:

  • Authentication via login and JWT
  • Role and permission management

Only users with the admin role can:

  • manage the database
  • load or restore data
  • run sensitive operations

Security logic is clearly separated from business logic.
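
A minimal stand-in for that separation, independent of the actual Flask-JWT-Extended wiring, shows how a role check can wrap an endpoint before any business logic runs; every name here is hypothetical:

```python
from functools import wraps

class Forbidden(Exception):
    """Raised when the caller lacks the required role."""

def require_role(role):
    """Decorator enforcing a role before the wrapped function runs."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(current_user, *args, **kwargs):
            if role not in current_user.get("roles", ()):
                raise Forbidden(f"'{role}' role required")
            return fn(current_user, *args, **kwargs)
        return wrapper
    return decorator

@require_role("admin")
def restore_database(current_user, dump_path):
    # Sensitive operation: only reachable with the admin role.
    return f"restoring from {dump_path}"
```

The business function never inspects roles itself; authorization lives entirely in the decorator layer.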

Backups and data protection

The system includes explicit data protection mechanisms:

  • Exporting the database to JSON files
  • Loading JSON files for restore or migration
  • These files act as backups of the processed information

This makes it possible to:

  • protect historical data,
  • facilitate migrations,
  • and reduce operational risk.
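
The export/restore cycle can be sketched with the stdlib alone; the function names and row shape are assumptions for the example:

```python
import json
from pathlib import Path

def export_table(rows, path):
    """Dump a table's rows to a JSON backup file (illustrative shape)."""
    Path(path).write_text(json.dumps(rows, indent=2, default=str))

def restore_table(path):
    """Load a backup, ready for a bulk insert during restore or migration."""
    return json.loads(Path(path).read_text())
```

Because the backup is plain JSON, the same files serve both disaster recovery and migration between environments.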

Architecture and tech stack

  • Language: Python
  • Framework: Flask
  • Database: MySQL
  • ORM: SQLAlchemy
  • Migrations: Alembic / Flask-Migrate
  • Authentication: Flask-JWT-Extended
  • Testing: pytest, pytest-mock

Dependencies managed via requirements.txt.

Operations and deployment

  • Deployments on PythonAnywhere
  • deploy.sh script to automate deployments after updating the repository
  • Continuous monitoring of service availability

🔗 Production API

🔗 GitHub repository

What this project proves

This project demonstrates my ability to:

  • Integrate heterogeneous data sources with different formats
  • Design real data validation and normalization processes
  • Build backend systems that scale with growing data volume
  • Operate production services with monitoring and reliability
  • Apply security and role-based access control
  • Maintain a system with traceability, backup, and operational resilience

Project images

API status in production
Repository and backend structure
PythonAnywhere deployment environment