# Excel Add-in: Data Cleaning, Analysis, and Transformation
This software is a comprehensive Excel add-in that automates the process of cleaning, analyzing, and transforming data directly in Excel. It’s designed to handle large datasets (even 500,000+ rows) by reading data from a source worksheet, applying complex transformations via dynamically generated Excel formulas, and writing the results into a new, processed worksheet.
## Overview of Main Functions
### Data Ingestion & Validation
- **DataIngestionStream & HeaderDetector**
The tool starts by streaming raw data from an Excel worksheet using a memory-efficient approach. It converts Excel ranges into a standardized format and then validates the headers. This ensures that the structure of the data is correct before any transformations begin.
### Formula Generation & Strategy Application
- **Formula Builders & Strategies**
A core feature of the add-in is its ability to generate Excel formulas dynamically. The software implements several strategies:
- **Clean Strategies**
- The `LetFormulaStrategy` generates complex formulas using Excel’s `LET` function to clean and transform data (e.g., handling unit conversions or partial period adjustments).
- **Revenue Strategies**
- Different strategies exist for calculating revenue figures by month, quarter, or year. These strategies create formulas that sum or adjust values across columns based on the data type and requirements.
- **Utility Strategies**
- Additional strategies are provided for handling specific tasks such as determining the cohort start month, cancel month, or simply placing spacer cells.
- **Bulk Formula Generation**
- Instead of processing each cell individually, the system generates a single formula template for an entire column (or chunk) and then replaces a placeholder (usually `rowPlaceholder`) with the actual row numbers during processing.
- This approach minimizes Excel API calls and maximizes performance.
### Orchestration & Worksheet Building
- **OrchestrateSheetOperation**
A shared orchestration module manages the end-to-end data flow. It:
- Retrieves the source sheet name and verifies that a valid sheet is selected.
- Initiates data ingestion and header validation.
- Calls on the formula factory to create the appropriate formula strategies based on column types.
- Sets up a worksheet builder that handles writing data in chunks, applying formulas, and even batching style updates.
- **Worksheet Builders**
Components like the `DimensionsManager`, `ChunkProcessor`, and `WorksheetBuilderFinalization` collaborate to:
- Manage the dimensions (columns and rows) and track metadata.
- Process data in configurable batches (or “chunks”) to apply formulas and styles efficiently.
- Finalize the worksheet by writing any remaining data and ensuring that all Excel ranges are properly updated.
### Styling & Performance Optimization
- **StyleOperations & XLSXOperations**
- The add-in uses a dedicated style manager to apply formatting rules using `XLSX-js-style`.
- It supports matrix-based styling and LRU caching to optimize performance when dealing with thousands of style objects.
- **Progress & Error Tracking**
- A centralized `ErrorService` logs errors and warnings with detailed context.
- A `ProgressTracker` monitors the operation’s progress—providing feedback on data ingestion, transformation, and worksheet finalization—all while ensuring minimal API calls and efficient processing.
### User Interface & Configuration
- **UI Components & State Management**
- The add-in offers a taskpane and other UI components (such as source sheet selectors and build dimension dropdowns) to let users choose the operation mode (e.g., clean vs. analyze).
- The `StateManager` keeps track of user selections and operational parameters.
- **Multiple Operation Modes**
- There are distinct controllers for:
- **Clean operations** – where data is simply transformed and cleaned.
- **Analyze operations** – which focus on revenue analysis, pivot tables, and waterfall reports.
- Each mode tailors the formula strategies and worksheet-building process accordingly.