We design data architectures that adapt to evolving business needs by leveraging **open core technologies**, a **modular design** and a **single-tenant architecture**. Our technology stack and governance approach ensure flexibility, performance and control, creating a reliable setup that can stand the test of time.
## Architecture
### Open Core Technology
We prefer working with **open core vendors**, which combine the flexibility of open source with the enterprise-grade support and reliability of commercial offerings. This model provides the following benefits:
1. **Reduced Vendor Lock-in** – Organizations retain flexibility to switch providers or modify the core, preventing dependence on a single vendor while maintaining the freedom to transition to alternative solutions or bring development in-house when necessary.
2. **More Customization** – Businesses can modify and extend the open core to fit their specific needs, ensuring that the technology aligns with unique workflows, integrates seamlessly with existing systems, and evolves alongside changing business requirements.
3. **Enterprise Support** – Premium features and support provide reliability for critical operations, ensuring that businesses receive dedicated assistance, compliance assurances, and performance optimizations tailored to their specific enterprise-grade requirements.
4. **Faster Innovation** – Open-source contributions drive faster development by enabling a global community of developers to collaborate, introduce new features, and rapidly iterate on improvements without being limited by a single organization’s resources.
5. **Transparent Security** – Public code review enhances security and minimizes hidden vulnerabilities by allowing a large community of developers and cybersecurity experts to continuously audit, test, and improve the software for potential risks or weaknesses.
We draw inspiration from [GitLab](https://www.gitlab.com), [Elastic](https://www.elastic.co/) and [Databricks](https://www.databricks.com), which have built great products and services using the open core design and governance model.
### Modular Design
We prioritize a **modular design approach** to building our data architecture, enabling flexibility and adaptability across our systems. This design allows us to switch seamlessly between platforms and to adopt a best-of-breed strategy by integrating the most effective tools for each function. A modular design provides these benefits:
1. **Flexibility in Technology Choices** – A modular approach allows organizations to mix and match different tools, databases, and processing engines, ensuring they always use the best technology for each specific use case without being locked into a single vendor.
2. **Easier System Adjustments** – Since components are independent, individual modules can be upgraded, patched, or replaced without disrupting the entire system, reducing downtime and operational risks.
3. **Faster Experimentation** – New technologies, frameworks, or services can be tested and integrated into specific parts of the architecture without overhauling the entire system, enabling continuous improvement and adaptability.
4. **Scalability on Demand** – Each module can be scaled independently based on demand, optimizing resource usage and ensuring that performance bottlenecks in one area do not impact the entire architecture.
5. **Improved Cost Optimization** – Organizations can allocate resources more efficiently by scaling only the necessary components, avoiding unnecessary infrastructure costs associated with monolithic architectures.
6. **Enhanced Data Governance & Security** – Each module can have its own security policies, access controls, and compliance measures, making it easier to enforce regulations and protect sensitive data.
7. **Simplified Migration & Modernization** – Organizations can transition from legacy systems or adopt new technologies gradually, replacing or upgrading modules one at a time rather than undertaking complex and risky full-system migrations.
By adopting a modular data architecture, organizations can build a highly adaptable, scalable, and resilient data ecosystem that evolves with changing business needs; the sketch below shows what swappable modules look like in practice.
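To make this concrete, here is a minimal Python sketch of the swappable-module idea. The `Warehouse` interface, the adapter classes and the `run_query` method are hypothetical simplifications, not our actual implementation; the point is that business logic never touches a vendor directly.

```python
from typing import Protocol


class Warehouse(Protocol):
    """The minimal interface every computation module must satisfy."""

    def run_query(self, sql: str) -> list[dict]: ...


class BigQueryWarehouse:
    """Adapter wrapping Google BigQuery behind the shared interface."""

    def run_query(self, sql: str) -> list[dict]:
        from google.cloud import bigquery  # pip install google-cloud-bigquery

        client = bigquery.Client()
        return [dict(row) for row in client.query(sql).result()]


class SnowflakeWarehouse:
    """Adapter wrapping Snowflake behind the same interface."""

    def run_query(self, sql: str) -> list[dict]:
        import snowflake.connector  # pip install snowflake-connector-python

        conn = snowflake.connector.connect(
            account="your-account",  # placeholder credentials; in practice
            user="your-user",        # these come from a secrets manager
            password="your-password",
        )
        try:
            cur = conn.cursor(snowflake.connector.DictCursor)
            return cur.execute(sql).fetchall()
        finally:
            conn.close()


def daily_revenue(warehouse: Warehouse) -> list[dict]:
    # Business logic depends only on the interface, never on a vendor,
    # so replacing the computation module is a one-line change.
    return warehouse.run_query(
        "SELECT order_date, SUM(amount) AS revenue FROM orders GROUP BY 1"
    )
```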
### Single Tenant Architecture
A **single-tenant architecture** offers several key benefits compared to a **multi-tenant** approach, particularly in terms of security, performance, customization, and compliance:
1. **Stronger Data Isolation** – With a separate database and infrastructure per tenant, businesses can ensure complete data sovereignty, making it easier to enforce stricter data protection policies and prevent cross-tenant data leakage.
2. **Greater Customization** – Organizations can tailor configurations, integrations, and optimizations specifically to their needs without being constrained by the limitations of a shared infrastructure, providing greater flexibility in system design.
3. **Consistent Performance** – Resources are not shared with other tenants, preventing performance degradation caused by high usage from other organizations and ensuring stable, predictable system behavior at all times.
4. **Seamless Scalability** – Organizations can scale their infrastructure based on their unique demands, increasing or decreasing resources as needed without being affected by the scaling limits or resource contention of a shared environment.
5. **Improved Compliance** – A single-tenant architecture makes it easier to meet strict industry regulations and data governance requirements (such as GDPR, HIPAA, or SOC 2) by offering full control over data residency, security policies, and auditing.
While single-tenant architectures may come with higher infrastructure and maintenance costs, they provide unparalleled control and security, making them a preferred choice for organizations with high-performance needs and stringent data protection requirements. The sketch below shows how per-tenant isolation can be made explicit.
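As a minimal sketch of what per-tenant isolation means in practice, the following Python snippet models one fully separate stack per client. The `TenantStack` and `provision` names are hypothetical; a real setup would drive this through infrastructure-as-code rather than plain objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantStack:
    """One isolated stack per client: nothing is shared across tenants."""
    company: str
    database: str        # dedicated database, never shared
    region: str          # data residency, e.g. for GDPR
    warehouse_size: str  # scaled to this tenant's demand only


def provision(company: str, region: str, warehouse_size: str = "small") -> TenantStack:
    # Hypothetical helper: in practice this would call Terraform or a
    # cloud SDK; here it only shows that every resource is per-tenant.
    return TenantStack(
        company=company,
        database=f"{company}_analytics",
        region=region,
        warehouse_size=warehouse_size,
    )


# Each client gets its own database, region and sizing, independently scalable.
stacks = [
    provision("acme", region="eu-west1"),
    provision("globex", region="us-east1", warehouse_size="large"),
]
```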
## Technology
For each component of a data architecture, we work with one or more vendors, depending on the specific needs of the organization. Below is a list of key technologies we use for each stage of the data workflow.
### Extraction
[[Tooling/Extraction/Airbyte]] – An open-source data integration platform that automates data movement between various sources and destinations with pre-built connectors.
[[Hevo]] – A no-code data pipeline solution that enables real-time data integration from multiple sources into warehouses with minimal setup.
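As a hedged sketch of how extraction can be orchestrated, the snippet below triggers a manual sync for one Airbyte connection through its Configuration API. The URL and connection ID are placeholders, assuming a self-hosted open-source deployment.

```python
import requests  # pip install requests

AIRBYTE_URL = "http://localhost:8000/api/v1"  # assumption: local OSS deployment
CONNECTION_ID = "replace-with-your-connection-id"

# Trigger a manual sync for one connection via Airbyte's Configuration API.
resp = requests.post(
    f"{AIRBYTE_URL}/connections/sync",
    json={"connectionId": CONNECTION_ID},
    timeout=30,
)
resp.raise_for_status()

job = resp.json()["job"]
print(job["id"], job["status"])
```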
### Computation
[[BigQuery]] – Google’s serverless, highly scalable data warehouse that allows for fast SQL analytics on large datasets.
[[Snowflake]] – A cloud-based data platform offering high-performance data storage, processing, and sharing with built-in scalability and security.
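A minimal example of the computation layer at work, assuming application-default Google credentials and a placeholder table: it runs one aggregation in BigQuery and streams the rows back.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

# Assumes application-default credentials are configured (gcloud auth).
client = bigquery.Client()

query = """
    SELECT order_date, SUM(amount) AS revenue
    FROM `your-project.sales.orders`  -- placeholder table
    GROUP BY order_date
    ORDER BY order_date
"""

for row in client.query(query).result():
    print(row["order_date"], row["revenue"])
```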
### Transformation
[[dbt]] – A transformation framework that enables analysts and engineers to write modular SQL-based transformations, ensuring data quality and maintainability.
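A small sketch of how dbt runs can be automated from Python, assuming a dbt project in an `analytics/` folder and a hypothetical `daily` tag on the relevant models; invoking the CLI via `subprocess` keeps the orchestration tool-agnostic.

```python
import subprocess

# Run only the models tagged 'daily', then their tests; the tag and
# project path are placeholders for whatever the repository defines.
for command in (
    ["dbt", "run", "--select", "tag:daily"],
    ["dbt", "test", "--select", "tag:daily"],
):
    subprocess.run(command, cwd="analytics/", check=True)
```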
### Semantics
[[Cube]] – A headless BI platform that provides a semantic layer for consistent metrics across dashboards and applications.
[[PowerBI]] – Microsoft’s business intelligence tool, whose semantic model enables centralized metric definitions, relationships, and aggregations.
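To show what a semantic layer buys, here is a hedged sketch that requests a governed metric from Cube's REST API instead of writing raw SQL. The URL, token and the `orders.total_revenue` measure are placeholders for whatever the data model defines.

```python
import json

import requests  # pip install requests

CUBE_URL = "http://localhost:4000/cubejs-api/v1/load"  # assumption: local Cube instance
TOKEN = "replace-with-api-token"

# Ask the semantic layer for a governed metric, so every dashboard
# shares the same definition of 'total revenue'.
query = {
    "measures": ["orders.total_revenue"],  # hypothetical cube/measure names
    "timeDimensions": [{
        "dimension": "orders.created_at",
        "granularity": "month",
    }],
}

resp = requests.get(
    CUBE_URL,
    params={"query": json.dumps(query)},
    headers={"Authorization": TOKEN},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"])
```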
### Analytics
[[PowerBI]] – A powerful self-service analytics platform for creating data visualizations and reports.
[[Looker Studio]] – Google’s free dashboarding tool that integrates with various data sources for building interactive reports.
[[Preset]] – A cloud-hosted version of Apache Superset that enables teams to explore and visualize data with an intuitive UI.
### Calculators
[[Google Sheets]] – A cloud-based spreadsheet tool that supports collaborative data analysis, automation, and integration with various data sources.
[[Microsoft Excel]] – A widely used spreadsheet application with advanced functions, pivot tables, and built-in analytics capabilities for complex data processing.
## Governance
### Responsibilities
Each data architecture has a dedicated *Architect* who is responsible for understanding its structure, assumptions and logic. Whenever a major change is proposed, this Architect must be consulted. One or more *Engineers* are responsible for building and maintaining the system and take up day-to-day tasks and issues.
The client designates one person as their *Data Owner*, who is the direct contact for the Architect and Engineers. The Data Owner green-lights all major development work and is the main point of contact throughout development and maintenance.
> [!TIP]
> Pick one Data Owner, not two or three. Having more than one owner creates misalignment on the client side and leads to issues around scope of work, priorities and budget allocation.
For all the systems that we deliver, we take responsibility for maintaining their uptime and freshness. We have data engineers on stand-by so that whenever an outage or anomaly is detected, we can resolve it swiftly; a simple freshness check is sketched below.
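A freshness check might look like the following sketch, assuming a BigQuery warehouse, a hypothetical `pipeline_log` table with a `loaded_at` timestamp, and a six-hour SLA. A production version would page the on-call engineer instead of printing.

```python
from datetime import datetime, timedelta, timezone

from google.cloud import bigquery  # any warehouse client would do here

MAX_LAG = timedelta(hours=6)  # assumption: the agreed freshness SLA

client = bigquery.Client()
row = next(iter(
    client.query(
        "SELECT MAX(loaded_at) AS last_load FROM `your-project.ops.pipeline_log`"
    ).result()
))

lag = datetime.now(timezone.utc) - row["last_load"]
if lag > MAX_LAG:
    # In production this would page the on-call engineer (Slack, PagerDuty, ...).
    print(f"ALERT: data is {lag} stale, exceeding the {MAX_LAG} SLA")
```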
### Documentation
Our main approach to documentation is to describe and visualise the logic and structure inside each respective tool. For the transformation and semantics layers of the data pipeline, we prefer to work in the README files of the [[GitHub]] repository, for tools like a custom-forked [[Tooling/Extraction/Airbyte]], [[dbt]] and [[Cube]].
The overall documentation of how the data architecture is structured can be found under *Settings* in [[Internal Analytics]]. The business intelligence software is the application end-users visit most, so placing the documentation there ensures accessibility.
### Security
In our preferred security setup, described under [[Frameworks#Modular Security Protocol]], we maintain a database for each company, ensuring that credentials are strictly isolated per account. This approach enhances security by minimizing cross-account exposure and enforcing strict access controls.
The browser integration allows engineers to log in securely to the necessary platforms without manually handling credentials, reducing the risk of leaks or unauthorized sharing. All credentials are encrypted at rest and in transit.
### Access Control
All systems within the data architecture are deployed in the company’s designated legal and data environment. Each tool is managed through a dedicated data-service account, which the company designates as the (super) admin for overseeing access rights and roles. Architects and engineers are granted admin privileges tied to their personal company user accounts.
Non-admin and non-engineering users receive viewer access to the specific data tools they need. Where necessary, domain-specific read restrictions can be implemented to protect sensitive company information, particularly financial data; a sketch of a scoped viewer grant follows below.
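As an illustration of a domain-specific read restriction, the sketch below grants one user viewer access to a single BigQuery dataset while leaving others (such as finance) untouched. The project, dataset and e-mail address are placeholders.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

# Grant a business user read-only access to one dataset, leaving the
# rest of the project (e.g. the finance dataset) restricted.
client = bigquery.Client()  # runs as the dedicated data-service account
dataset = client.get_dataset("your-project.marketing")  # placeholder dataset

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",                   # viewer access only
        entity_type="userByEmail",
        entity_id="viewer@company.com",  # placeholder user
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```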
%% Owner: Philip %%