ADR-003: Excel Parsing Strategy
Status: Proposed
Date: 2026-01-24
Context
CostEngine must parse multi-sheet Excel files with merged cells, specific header locations, and pre-existing formulas.
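To make two of these requirements concrete, here is a minimal sketch of how merged cells and a non-default header row might be handled. The function names `merged_range_values` and `read_table_at` are hypothetical and not part of CostEngine; the sketch assumes pandas and openpyxl are installed.

```python
import pandas as pd
from openpyxl import load_workbook


def merged_range_values(path, sheet_name):
    """Map every (row, col) inside a merged range to that range's top-left value.

    Excel stores a merged range's value only in its top-left cell, so the
    other cells in the range read back as empty unless filled in like this.
    """
    wb = load_workbook(path)
    ws = wb[sheet_name]
    filled = {}
    for rng in ws.merged_cells.ranges:
        anchor = ws.cell(row=rng.min_row, column=rng.min_col).value
        for row in range(rng.min_row, rng.max_row + 1):
            for col in range(rng.min_col, rng.max_col + 1):
                filled[(row, col)] = anchor
    return filled


def read_table_at(path, sheet_name, header_row):
    """Read a sheet whose column headers sit on a known 0-indexed row."""
    return pd.read_excel(
        path, sheet_name=sheet_name, header=header_row, engine="openpyxl"
    )
```

Rows and columns in `merged_range_values` are 1-indexed, following openpyxl, while `header_row` is 0-indexed, following pandas.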
Decision
Primary Ingestion: CSV for high-volume automated data.
Rich Templates: XLSX (using pandas + openpyxl) for manual estimator workflows.
Rationale
- CSV: Standard for interoperability and version control (git-friendly). Used for Material Masters and Machine rates.
- XLSX: Required for multi-sheet coordination and identifying visual cues (e.g., yellow cell highlights for RM rates) that Excel users rely on.
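- pandas: Loads both formats into DataFrames, so downstream code does not depend on the source format. Cell-level metadata such as fill colors and merged ranges is not carried into the DataFrame and has to be read through openpyxl directly.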
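The split described in this rationale can be sketched as follows: pandas loads tabular data from either format, and openpyxl inspects cell fills. The names `load_table` and `highlighted_cells` are hypothetical, and treating "yellow" as a solid fill with RGB `FFFF00` is an assumption; real templates may use theme or indexed colors, which this sketch skips.

```python
import pandas as pd
from openpyxl import load_workbook

YELLOW_RGB = "FFFF00"  # assumed highlight color; adjust to the actual template


def load_table(source, fmt, sheet_name=0):
    """Load a CSV or XLSX source into a DataFrame."""
    if fmt == "csv":
        return pd.read_csv(source)
    if fmt == "xlsx":
        return pd.read_excel(source, sheet_name=sheet_name, engine="openpyxl")
    raise ValueError(f"unsupported format: {fmt!r}")


def highlighted_cells(path, sheet_name, rgb=YELLOW_RGB):
    """Return {coordinate: value} for cells with a solid fill of the given RGB."""
    wb = load_workbook(path)
    ws = wb[sheet_name]
    hits = {}
    for row in ws.iter_rows():
        for cell in row:
            fill = cell.fill
            if fill is None or fill.fill_type != "solid":
                continue
            color = fill.fgColor
            # Only explicit RGB colors are compared; theme and indexed
            # colors carry no usable hex string and are skipped here.
            if color is None or color.type != "rgb" or not isinstance(color.rgb, str):
                continue
            # openpyxl stores ARGB (8 hex digits); compare the RGB part only.
            if color.rgb[-6:].upper() == rgb.upper():
                hits[cell.coordinate] = cell.value
    return hits
```

A caller would typically use `load_table` for the bulk data and `highlighted_cells` only on the sheets where estimators mark RM rates.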
Consequences
- Positive: Handles the complex spreadsheets that MSME users maintain, including multi-sheet layouts, merged cells, and highlight-based conventions.
- Negative: Higher memory footprint for very large files; mitigated by running parsing in async Celery jobs, which moves the load off the request path but does not reduce per-file memory.
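Independently of where the parsing runs, openpyxl's read-only mode can lower per-file memory by streaming rows instead of loading the whole workbook. This is a complementary option, not part of the decision above; the function name `iter_sheet_rows` is hypothetical.

```python
from openpyxl import load_workbook


def iter_sheet_rows(path, sheet_name, data_only=True):
    """Yield each row of a sheet as a tuple of values, streaming from disk.

    With data_only=True, formula cells yield the value cached in the file.
    That cache exists only if the file was last saved by an application
    that computed the formulas; otherwise those cells yield None.
    """
    wb = load_workbook(path, read_only=True, data_only=data_only)
    try:
        ws = wb[sheet_name]
        for row in ws.iter_rows(values_only=True):
            yield row
    finally:
        # Read-only workbooks keep the file handle open until closed.
        wb.close()
```

Read-only mode trades features for memory, so a template that depends on merged ranges or cell fills may still need a full load.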