ADR-003: Excel Parsing Strategy

Status: Proposed
Date: 2026-01-24

Context

CostEngine must parse multi-sheet Excel files that contain merged cells, headers at template-specific locations, and pre-existing formulas.
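As a rough illustration of what these requirements mean at the openpyxl level, the sketch below reads one sheet and handles all three cases. It copies each merged range's top-left value into every cell the range covers. It finds the header row by searching for a known label instead of assuming row 1. It keeps formulas as text by loading with `data_only=False`. The helper names `read_sheet_values` and `build_demo_workbook`, the sheet name `BOM`, and the column labels are hypothetical and are not part of CostEngine.

```python
import io

from openpyxl import Workbook, load_workbook


def read_sheet_values(source, sheet_name, header_label):
    """Return (headers, rows) for one sheet of an .xlsx workbook (hypothetical helper)."""
    # data_only=False keeps formulas as text such as "=C4*10";
    # data_only=True would return the values Excel last cached instead.
    wb = load_workbook(source, data_only=False)
    ws = wb[sheet_name]

    # Copy each merged range's top-left value into every cell it covers.
    # Iterate over a snapshot because unmerging mutates ws.merged_cells.
    for merged in list(ws.merged_cells.ranges):
        min_col, min_row, max_col, max_row = merged.bounds
        top_left = ws.cell(row=min_row, column=min_col).value
        ws.unmerge_cells(merged.coord)
        for r in range(min_row, max_row + 1):
            for c in range(min_col, max_col + 1):
                ws.cell(row=r, column=c).value = top_left

    # Find the header row by its label instead of assuming it is row 1.
    header_row = None
    for idx, values in enumerate(ws.iter_rows(min_row=1, values_only=True), start=1):
        if header_label in values:
            header_row = idx
            break
    if header_row is None:
        raise ValueError(f"header {header_label!r} not found in sheet {sheet_name!r}")

    headers = list(next(ws.iter_rows(min_row=header_row, max_row=header_row, values_only=True)))
    rows = [list(v) for v in ws.iter_rows(min_row=header_row + 1, values_only=True)]
    return headers, rows


def build_demo_workbook():
    """Build an illustrative sheet with a merged title, a merged category
    column, and formula cells; return it as an in-memory .xlsx file."""
    wb = Workbook()
    ws = wb.active
    ws.title = "BOM"
    ws["A1"] = "Estimate"
    ws.merge_cells("A1:C1")
    for col, label in zip("ABCD", ["Category", "Part", "Qty", "Total"]):
        ws[f"{col}3"] = label  # header deliberately on row 3, not row 1
    ws["A4"] = "Sheet metal"
    ws["B4"] = "Bracket"
    ws["C4"] = 4
    ws["D4"] = "=C4*10"
    ws["B5"] = "Plate"
    ws["C5"] = 2
    ws["D5"] = "=C5*10"
    ws.merge_cells("A4:A5")  # one category label spanning two part rows
    buf = io.BytesIO()
    wb.save(buf)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    headers, rows = read_sheet_values(build_demo_workbook(), "BOM", "Category")
    print(headers)
    print(rows)
```

Filling merged ranges downward suits category-style merges. A template whose merges are purely decorative might instead keep only the top-left value.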

Decision

Primary Ingestion: CSV for high-volume automated data.
Rich Templates: XLSX (using pandas + openpyxl) for manual estimator workflows.
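Here is a minimal sketch of the two-path decision, assuming files are routed by extension. CSV goes through `pandas.read_csv` and `.xlsx` through `pandas.read_excel` with the openpyxl engine, so both paths produce a DataFrame. `load_table` and its parameters are illustrative and are not an existing CostEngine API.

```python
from pathlib import Path

import pandas as pd


def load_table(path, sheet_name=0, header=0):
    """Load a CSV or XLSX file into a DataFrame, choosing the reader by extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        # High-volume automated data, e.g. Material Masters and Machine rates.
        return pd.read_csv(path, header=header)
    if suffix == ".xlsx":
        # Estimator templates, read through openpyxl. pandas returns the values
        # cached in the file for formula cells, not the formulas themselves.
        return pd.read_excel(path, sheet_name=sheet_name, header=header, engine="openpyxl")
    raise ValueError(f"unsupported file type: {suffix!r}")
```

Formula cells come back as whatever result was cached in the file. That result may be missing if the workbook was produced by a tool that never calculated it. Templates that need the formulas themselves would have to be read through openpyxl directly.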

Rationale

  • CSV: Standard for interoperability and version control (git-friendly). Used for Material Masters and Machine rates.
  • XLSX: Required for multi-sheet coordination and identifying visual cues (e.g., yellow cell highlights for RM rates) that Excel users rely on.
  • pandas: Loads both formats into DataFrames, so downstream code can treat them interchangeably.
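Visual cues such as the yellow RM-rate highlights do not survive conversion to CSV. They can, however, be read from XLSX with openpyxl, which exposes each cell's fill. The sketch below assumes RM rates are marked with a solid fill in Excel's standard yellow (RGB `FFFF00`). Theme colors, tints, and conditional formatting would not be detected. `highlighted_cells` and `RM_RATE_HIGHLIGHT` are hypothetical names.

```python
from openpyxl import load_workbook

# Assumption: RM rates are marked with a solid fill in standard yellow (RGB FFFF00).
# Theme colors, tints, and conditional formatting are not detected here.
RM_RATE_HIGHLIGHT = "FFFF00"


def highlighted_cells(source, sheet_name, rgb_hex=RM_RATE_HIGHLIGHT):
    """Return {coordinate: value} for cells whose solid fill matches rgb_hex (hypothetical helper)."""
    wb = load_workbook(source)
    ws = wb[sheet_name]
    hits = {}
    for row in ws.iter_rows():
        for cell in row:
            color = cell.fill.fgColor
            # openpyxl stores explicit colors as ARGB strings (e.g. "FFFFFF00" or
            # "00FFFF00"), so compare only the last six RGB characters.
            if (
                cell.fill.fill_type == "solid"
                and color.type == "rgb"
                and color.rgb[-6:].upper() == rgb_hex
            ):
                hits[cell.coordinate] = cell.value
    return hits
```

Matching on the RGB part alone keeps the check independent of the alpha byte, which differs between files saved by Excel and by openpyxl.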

Consequences

  • Positive: Robust parsing that can handle complex MSME spreadsheets (multiple sheets, merged cells, pre-existing formulas).
  • Negative: Higher memory footprint for very large files; mitigated by running the parsing in asynchronous Celery jobs.