#data-architecture
## Overview of Medallion Architecture
- Data design pattern to organise data in a [lakehouse](https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf)
- Mainly focuses on progressively improving data quality as the data flows through each layer
- Layers of the architecture
    - Bronze - **Raw data**, 1:1 correspondence with source system tables, streaming + batch sources. Typically populated using change data capture (CDC)
    - Silver - **Cleansed and conformed data**, basic processing and cleaning, provides an "enterprise view" of key business entities, concepts and transactions. Provides combined datasets which can serve as input for analysis and data science use cases
    - Gold - **Curated business level tables**, consumption-ready, placed into specific use case buckets (projects), highly de-normalized for speed, data quality is strictly enforced, closely resembles data warehouse modelling methodologies

[Source](https://www.advancinganalytics.co.uk/blog/medallion-architecture)
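
To make the three layers concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `bronze_orders` records, the field names and the `to_silver` / `to_gold` helpers are hypothetical, and a real lakehouse would use tables and a processing engine rather than lists of dicts.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as a (hypothetical) source system emitted them,
# including a duplicate row and a malformed amount.
bronze_orders = [
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},       # duplicate
    {"order_id": "2", "customer": "BOB", "amount": "4.00"},
    {"order_id": "3", "customer": "alice", "amount": "not-a-number"},  # bad amount
]


def to_silver(bronze):
    """Cleanse and conform: cast types, normalise names, drop duplicates and bad rows."""
    seen_ids = set()
    silver = []
    for row in bronze:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would more likely quarantine the row
        order_id = int(row["order_id"])
        if order_id in seen_ids:
            continue
        seen_ids.add(order_id)
        silver.append(
            {
                "order_id": order_id,
                "customer": row["customer"].strip().lower(),
                "amount": amount,
            }
        )
    return silver


def to_gold(silver):
    """Curate a consumption-ready aggregate: total revenue per customer."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver_orders = to_silver(bronze_orders)
gold_revenue = to_gold(silver_orders)
```

The point of the sketch is only the direction of travel: bronze is never edited, silver is derived from bronze, and gold is derived from silver.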
## Benefits
### Simplicity and Ease of Implementation
- Easy to understand architecture.
- For an org with a lot of structured and unstructured data, the "dump it all in" strategy of lakehouse architecture seems useful
- With SQL-on-steroids[^1] frameworks like dbt and Dataform, and the proliferation of the MDS ([[Honest Thoughts about the Modern Data Stack]]), a strict data warehousing methodology is no longer the only option to extract value from data
- Since the focus is on "cleaning up" data layer by layer, it is much easier for different members of an organisation (even non-technical ones) to grasp what is really happening to the data that is generated [^2]
### Recreate Tables from Raw Data at Any Time
- All data effectively exists in the bronze layer
- So any downstream table can be rebuilt from the bronze layer at any point in time, for whatever reason
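
A rough sketch of what "recreate from raw" looks like, using an in-memory SQLite database as a stand-in for lakehouse storage. The table names (`bronze_events`, `silver_events`) and the `rebuild_silver` helper are hypothetical; the idea is only that the downstream table is a pure function of bronze, so it can be dropped and rebuilt with the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Bronze: raw rows, untouched, including a duplicate and stray whitespace.
conn.execute("CREATE TABLE bronze_events (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO bronze_events VALUES (?, ?)",
    [(1, " a "), (2, "b"), (2, "b")],
)


def rebuild_silver(connection):
    """Drop the silver table and derive it again from bronze only."""
    connection.execute("DROP TABLE IF EXISTS silver_events")
    connection.execute(
        "CREATE TABLE silver_events AS "
        "SELECT DISTINCT id, TRIM(payload) AS payload FROM bronze_events"
    )


def read_silver(connection):
    return connection.execute(
        "SELECT id, payload FROM silver_events ORDER BY id"
    ).fetchall()


rebuild_silver(conn)
first_build = read_silver(conn)

# Simulate losing (or deliberately discarding) the downstream table.
conn.execute("DROP TABLE silver_events")

rebuild_silver(conn)
second_build = read_silver(conn)
```

Because nothing in `rebuild_silver` reads from anywhere except bronze, changing the cleaning logic later just means editing the query and rebuilding.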
### ACID Transactions
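- Atomicity, Consistency, Isolation and Durability: guarantees that a lakehouse table format such as Delta Lake adds on top of data lake storage, so a failed or concurrent write does not leave readers looking at a half-written table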
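
To illustrate just the atomicity part, here is a small sketch using SQLite from the Python standard library. This is an assumption-heavy stand-in: lakehouse table formats implement transactions very differently, and the `silver_orders` table and the `batch` data are made up. The behaviour shown is the same in spirit, though: a batch either lands completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE silver_orders (order_id INTEGER PRIMARY KEY, amount REAL)"
)
conn.commit()

# Hypothetical batch: the last row reuses order_id 1 and violates the primary key.
batch = [(1, 10.5), (2, 4.0), (1, 99.0)]

try:
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if it raises.
    with conn:
        conn.executemany("INSERT INTO silver_orders VALUES (?, ?)", batch)
except sqlite3.IntegrityError:
    pass  # the whole batch is rejected, including the two valid rows

row_count = conn.execute("SELECT COUNT(*) FROM silver_orders").fetchone()[0]
```

After the failed batch the table is still empty, rather than holding the two rows that happened to be inserted before the error.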
## Possible Issues
### Volume of Data Could Increase Storage Requirements with Little Extra Advantage
Volume is the biggest issue in my opinion. If you are a large org generating a lot of data, the "dump it all in" strategy might backfire. Contrary to the sales pitch that "more data" means "more business value", "more data" actually means "more complicated requirements" for using that data.
Harnessing all the data generated by a large org is not an overnight task and could span several months. If a phased, agile approach is not taken and all the data is simply dumped into the bronze layer, I suspect this could lead to over-utilisation of storage with no tangible benefit.
### Too Much to Do, Too Little Done
An extension of the previous point. I am personally a bit on the fence about data lakehouse architectures, probably because I have not had much practical experience using one. To me, it sounds glorious as an idea.
However, it also runs the risk of asking orgs to just dump in all their data without curation and then work off that. There is definitely a risk of things growing arms and legs, especially given the fundamental truth that not all grand data integration projects finish on time (if they even finish, that is).
Nevertheless, it does sound like the way forward. So I assume that with proper planning and "some" curation, this could work for organisations. Small to mid-size orgs in particular would find it a lot easier to start implementing a Medallion architecture, as it helps them build and deliver an insights pipeline fast!
## References
1. https://www.databricks.com/glossary/medallion-architecture
2. https://dataengineering.wiki/Concepts/Medallion+Architecture
3. https://www.advancinganalytics.co.uk/blog/medallion-architecture
### Footnotes
[^1]: A term I coined in a previous article - https://medium.com/@yadramshankar/how-to-explain-analytics-engineering-to-friends-and-family-3c4c10482d01
[^2]: You would be surprised how important it actually is for members of the modern data team to understand clearly what is happening to their data at each stage and how exactly their numbers are formed