At our company we work with three frameworks: the **Standardised Data Architecture**, the **Integrated Performance Framework** and the **Modular Security Protocol**. Each of these frameworks has its own set of **Framework Checks** that validate that the framework is correctly deployed and adhered to.
## Standardised Data Architecture
### Data Pipeline Design
Each data pipeline passes through several layers before it reaches the data consumers.
**Extraction layer**
We connect the client's operational systems to a centralised reporting data warehouse with extractors such as [[Tooling/Extraction/Airbyte]] and [[Hevo]]. For the data warehousing we prefer Google [[BigQuery]] due to its low cost and ultra-fast querying speed.
**Staging layer** (part of transformation)
The staging transformations and all other consecutive data transformations inside the data warehouse are done using [[dbt]], which stores all logic in a repository such as [[GitHub]]. This means that we codify all your business logic, exceptions and definitions.
**Marts layer** (part of transformation)
With a clean and consistent staging layer, we define the business entities, dimensions, measures and metrics that fit the domain model for your industry and business model.
**Historic layer** (part of transformation)
After this step, we create slowly moving dimension tables per entity which contain the slowly moving properties, daily measures and metrics. Depending on the business model these entities are: `accounts`, `projects`, `people`, `articles`, `deals`, `departments`, `cost centers` and `enterprises`.
**External layer** (part of transformation)
When the core components of the data model are done, we prepare them for downstream applications by creating several [OBTs](https://dataengineering.wiki/Concepts/One+Big+Table) (One Big Tables). These tables are storage heavy, but contain dozens of dimensions, which makes downstream analysis significantly more intuitive for end-users. The downstream applications are connected through the external layer that is defined in the data warehouse.
**Semantic layer**
Based on the external layer in the data warehouse, we set up a connection with [[Cube]], which provides the semantic layer that enables downstream data consumers.
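As an illustration, a minimal Cube data model exposing an external-layer OBT could look like the sketch below. The cube, table and column names are hypothetical, and Cube also accepts JavaScript model files; this is a sketch, not our canonical setup:

```yaml
# Hypothetical Cube model over an external-layer table
cubes:
  - name: projects
    sql_table: external.ext_projects
    dimensions:
      - name: project_name
        sql: project_name
        type: string
    measures:
      - name: total_budget
        sql: project_budget
        type: sum
```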
**Consumer layer**
Downstream applications that natively connect with the semantic layer are the headless business intelligence tools like Looker Studio, PowerBI, [Steep](https://steep.app/) and [[Preset]]. The semantic layer integrates with calculators like [[Google Sheets]] and [[Microsoft Excel]] that are used by sales, finance and operations to perform ad-hoc calculations and forecasting. The performance data can be connected to the Open Core [[Metricsrouter]] exchange for serving data to external stakeholders.
<br>
```mermaid
graph TD
%% Raw Source Data Layer
R1[("Exact Online <br> (Accounting)")]:::source
R2[("Hubspot <br> (CRM)")]:::source
R3[("Float <br> (Planning)")]:::source
R4[("GSheets <br> (Custom Input)")]:::source
R1 --> E1
R2 --> E2
R3 ~~~ E1
R3 --> E2
R4 ~~~ E1
R4 --> E2
%% Extraction Layer
E1("Airbyte VM <br> (Extraction)"):::extraction
E2("Airbyte Cloud <br> (Extraction)"):::extraction
E1 --> C1
E2 --> C1
%% Computation Layer
C1[("BigQuery <br> (Computation)")]:::computation
%% Transformation layer
T1("dbt Cloud <br> (Transformation)"):::transformation
T1 --> C1
%% Semantic Layer
S1("Cube <br> (Semantics)"):::semantic
C1 --> S1
%% Presentation Layer
P1["Preset <br> (Analytics)"]:::visuals
P2["GSheets <br> (Reporting)"]:::visuals
P3["Metricsrouter <br> (Exchange)"]:::visuals
S1 --> P1
S1 --> P2
S1 --> P3
classDef source fill:#f2f2f2,stroke:#666666
classDef extraction fill:#b3e0ff,stroke:#0066cc
classDef computation fill:#ffb3b3,stroke:#e67373
classDef transformation fill:#ffcc99,stroke:#ff9933
classDef semantic fill:#d9f2d9,stroke:#339933
classDef visuals fill:#f3e6ff,stroke:#9933ff
```
*Example data architecture with all the layers and data assets of a typical pipeline*
<br>
**Data Architecture Maturity**
The engineering of your data architecture has reached maturity when the SDA score exceeds 95%. We base our decision to expand development of the data architecture and downstream assets on whether the following four performance metric targets have been met over the last three months.
1. The data model uptime without freshness issues is >99.0%.
2. The data architecture integrity tests pass >99.0% during operations.
3. The entry data test score is >99.0%.
4. The issue-resolve duration is below 48 hours for >80% of the issues.
<br>
### SDA Framework Checks
The **Standardised Data Architecture 1** has **37 framework checks** that focus on consistency in formatting, layering, code structure, repository design and integrity testing.
#### Integrate an actual switch
> [!abstract] FC2
> Points: `2`   Framework: `SDA-1`   Video: (absent)
Sizable companies that report their measures and metrics to stakeholders run into the questions "Is this data ready?" and "Is this data reliable?". The actual switch therefore allows the financial controller to give the green light on the input data. Each factual data model has an integrated `is_actual` boolean, allowing alignment with the monthly close.
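As a sketch of how the switch is consumed downstream (the model name `fct_finance_monthly` is hypothetical; only the `is_actual` flag comes from this check):

```SQL
-- Hypothetical downstream usage: only expose periods the controller approved
select *
from {{ ref('fct_finance_monthly') }}
where is_actual = true
```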
#### All models are tested
> [!abstract] FC4
> Points: `3`   Framework: `SDA-1`   Video: [5 min.](https://drive.google.com/file/d/12D0HAuSqUS9eSTgcBClyfPufy_N56Ukr/view)
Check all models on uniqueness and completeness using the [dbt project evaluator package](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/). Several generally accepted data engineering best practices around syntax standardisation have been captured in evaluation packages like this one, and these packages can become part of the process when re-deploying your code.
```shell
dbt build --select package:dbt_project_evaluator
```
You can customise the dbt evaluator in the `dbt_project.yml` file in the root directory to fully test all the layers, with the additional option to deviate from the default test settings:
```yaml
vars:
  dbt_project_evaluator:
    models_fanout_threshold: 10 # Default is 3
    too_many_joins_threshold: 8 # Default is 7
    model_types: ['base', 'staging', 'intermediate', 'marts', 'spines', 'historicals', 'other', 'utilities', 'data_quality', 'external']
    utilities_folder_name: 'utilities'
    utilities_prefixes: ['util_']
    base_folder_name: 'base'
    base_prefixes: ['base_']
    spines_folder_name: 'spines'
    spines_prefixes: ['sp_']
    historicals_folder_name: 'historicals'
    historicals_prefixes: ['hist_']
    data_quality_folder_name: 'data_quality'
    data_quality_prefixes: ['dq_']
    external_folder_name: 'external'
    external_prefixes: ['ext_']
```
#### All models have descriptions
> [!abstract] FC5
> Points: `1`   Framework: `SDA-1`   Video: [3 min.](https://drive.google.com/file/d/1sioGW1GttgM5g98ePNDcVfeRmqbI8gmv/view)
Models require descriptions of their contents and role in the data model; a 10-15 word description will suffice. Use the dbt evaluator to check for any undocumented models in your property YAML files.
```yaml
models:
  - name: stg_hubspot__companies
    description: Hubspot Companies
```
#### Dbt build jobs linked to log
> [!abstract] FC8
> Points: `1`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/1_QInYKO5vkuvD7-bBbFtODbeHLyGUD-q/view)
Every time dbt executes a job, a log entry is sent to a dedicated dbt [[Slack]] channel. This check is not critical, but it gives you as operator an overview of which jobs are operational, a feel for how things are going, and a prompt to keep jobs consistently labeled.
![[Frameworks dbt job log.png]]
#### Cluster staging fields
> [!abstract] FC9
> Points: `1`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/1IGhV3ilARFWq9xlJojWlYOyjJqN9nXQs/view)
Cluster the output fields of each staging model according to standard groups that are either `IDs`, `Timestamps`, `Properties` or `Measures`.
```SQL
SELECT
    -- IDs
    safe_cast(source.id as integer) as entity_id,
    safe_cast(source.uuid as string) as entity_uuid,
    ...
    -- Timestamps
    safe_cast(source.created_at as timestamp) as created_at,
    safe_cast(source.updated_at as timestamp) as updated_at,
    safe_cast(source.extraction_time as timestamp) as extracted_at,
    ...
    -- Properties
    safe_cast(source.attribute_a as integer) as attribute_a,
    safe_cast(source.attribute_b as string) as attribute_b,
    ...
    -- Properties | Subdomain A
    safe_cast(source.attribute_c as integer) as attribute_c,
    safe_cast(source.attribute_d as string) as attribute_d,
    ...
    -- Properties | Subdomain B
    safe_cast(source.attribute_e as string) as attribute_e,
    safe_cast(source.attribute_f as string) as attribute_f,
    ...
    -- Measures
    safe_cast(source.amount_1 as numeric) as measure_1,
    safe_cast(source.amount_2 as numeric) as measure_2,
    ...
FROM source
```
#### Correct filename convention
> [!abstract] FC12
> Points: `3`   Framework: `SDA-1`   Video: (absent)
Use the correct naming convention for YAML and SQL files, and test it using the dbt project evaluator. Convention: source freshness is tested in the `_{{source_name}}_sources.yaml` file, while the integrity tests live in the `_{{source_name}}_properties.yaml` file, where the **not null** and **unique** tests are defined.
```text
dbt_project/
└── models/
    ├── utilities/
    │   └── _util_properties.yaml
    ├── staging/
    │   └── source_name/
    │       ├── _{{source_name}}_sources.yaml
    │       ├── _{{source_name}}_properties.yaml
    │       └── stg_{{source_name}}__{{model_name}}.sql
    ├── intermediate/
    │   ├── _int_properties.yaml
    │   └── int_{{model_name}}.sql
    ├── marts/
    │   ├── _mart_properties.yaml
    │   ├── fct_{{model_name}}.sql
    │   └── dim_{{model_name}}.sql
    ├── spines/
    │   ├── _sp_properties.yaml
    │   └── sp_{{model_name}}.sql
    ├── historicals/
    │   ├── _hist_properties.yaml
    │   └── hist_{{model_name}}.sql
    ├── external/
    │   ├── _ext_properties.yaml
    │   └── ext_{{model_name}}.sql
    └── data_quality/
        ├── _dq_properties.yaml
        └── dq_test_aggregation.sql
```
#### All models are integrity tested
> [!abstract] FC15
> Points: `4`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/156_XjUz6IZb774vQ6UhZ6fJITlsAfnxn/view)
Each table in the staging layer should be tested and modeled to prevent duplicates and persisting deleted records. At minimum, test one or multiple columns for uniqueness and nulls.
```yaml
models:
  - name: stg_hubspot__companies
    description: Hubspot Companies
    columns:
      - name: company_id
        tests:
          - not_null
          - unique
```
#### Dbt build jobs have freshness
> [!abstract] FC17
> Points: `2`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/1N9-TaQ40mREZw8u-yWpMb0ckMmSvuxrJ/view)
dbt also offers a `dbt source freshness` command. With one integrated job for both the build and the freshness checks we get a steady cadence instead of two separate flows.
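A sketch of what such an integrated job could look like, written here as a YAML fragment for illustration (dbt Cloud jobs are configured through its UI/API; the job name and schedule below are hypothetical, while the two commands are standard dbt CLI):

```yaml
# Hypothetical integrated job: freshness check followed by the build
name: daily-freshness-and-build
schedule: '0 6 * * *' # every day at 06:00 UTC
commands:
  - dbt source freshness
  - dbt build
```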
#### Dedicated error log channel
> [!abstract] FC19
> Points: `2`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/1N9-TaQ40mREZw8u-yWpMb0ckMmSvuxrJ/view)
All transformation jobs are linked to a dedicated error log channel. For all setups this is #ops-dbt-error-log.
#### Consistent defined measures
> [!abstract] FC21
> Points: `2`   Framework: `SDA-1`   Video: [5 min.](https://drive.google.com/file/d/1RykSrPIFhYPmBg1jw0RdM7mhnumQBBIL/view)
Each measure is defined in the staging layer, the marts layer or the historicals layer. Over time, these three layers have proven to be the most intuitive and flexible locations to define a measure.
#### Dedicated client contact
> [!abstract] FC23
> Points: `3`   Framework: `SDA-1`   Video: [3 min.](https://drive.google.com/file/d/1nreB-KGwDkJKlR7jF7UkLZzcRH_8FuFu/view)
The client has a dedicated person who owns that account, the account manager role. This role is clearly allocated to a specific person over a specific period of time.
#### Direct source contact
> [!abstract] FC24
> Points: `1`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/190kJNxn2jEoOoKJ5qoZwptdBAQxAZw0Y/view)
An engineer is allowed to directly contact the support department of each source system for issue resolution. Direct lines to the application suppliers are key for speedy, high-quality issue resolution.
#### Duplicate elimination
> [!abstract] FC25
> Points: `4`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/1h-4mvNpumHctEsbMX5WXzJGV_G-05d_d/view)
Use the latest record version available in the raw data. This is a standard first-step cleaning procedure to make sure that the raw input data into the staging model is the latest version. Use our default duplicate elimination macro for this.
```SQL
{% macro filter_id_latest(id, extraction, table) %}
inner join
({{ id_on_latest(id, extraction, table) }}) as latest_extraction
on latest_extraction.latest_extraction_at = {{ table }}.{{ extraction }}
and latest_extraction.unique_id = {{ table }}.{{ id }}
{% endmacro %}
```
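The `id_on_latest` macro referenced above is not shown in this section. A minimal sketch of what it could look like, consistent with the join conditions in `filter_id_latest` (this implementation is an assumption, not the canonical macro):

```SQL
-- Hypothetical sketch: latest extraction timestamp per record id
{% macro id_on_latest(id, extraction, table) %}
select
    {{ id }} as unique_id,
    max({{ extraction }}) as latest_extraction_at
from {{ table }}
group by {{ id }}
{% endmacro %}
```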
#### End-to-end data studio
> [!abstract] FC29
> Points: `4`   Framework: `SDA-1`   Video: [3 min.](https://drive.google.com/file/d/1S64aTcSc8A1sPJFbx3u36NhXoL5ZOPYe/view)
Have an extensive end-to-end data visualisation report with in-depth data drilldowns. A solid data model requires extensive and fast deep-dives into all your models, which means running queries on tables of multiple GB. This setup enables a solid understanding of whether the domain model is working correctly.
#### Apply data entry tests
> [!abstract] FC30
> Points: `4`   Framework: `SDA-1`   Video: [5 min.](https://drive.google.com/file/d/1C9bbNmIF3w3uE_ua_4hKjzsGwYavVCpV/view)
The in-depth data entry tests are an essential ingredient of a solid data model. A clean data set without errors is highly desirable for the engineering team as well as the end-users: it prevents weird anomalies from occurring during the development process, during business processes and in business calculations.
![[Frameworks Data Entry Tests.png]]
#### Failed dbt jobs send email
> [!abstract] FC33
> Points: `2`   Framework: `SDA-1`   Video: [1 min.](https://drive.google.com/file/d/1DeXxRjHweG-IqfkFoqRXtHH9ez_VIpOO/view)
Every time a dbt job fails an email is sent to all involved people. This is equivalent to the Slack notifications, but communication channel agnostic.
#### Full client-side ownership
> [!abstract] FC35
> Points: `2`   Framework: `SDA-1`   Video: [3 min.](https://drive.google.com/file/d/1rk303aM0ZYaWXFnApEPevK-8GD0f25yD/view)
The client has full access and ownership of all data pipeline applications. A good customer-client relationship is built on mutual trust. Lock-ins that give the data engineering agency a sense of control over the client merely make leaving harder: clients that are already turned off stick around because of these constraints, and when they eventually do leave, the negative feedback signal arrives far too late. Full client-side ownership is therefore a preventive maintenance check.
#### Full source access
> [!abstract] FC36
> Points: `4`   Framework: `SDA-1`   Video: [3 min.](https://drive.google.com/file/d/1ZM9aaAwBgGLoWpfZ-HD6P6ZKphYExVGc/view)
The engineers and architects have full-viewer rights access to all source systems. Without this access the entities, dimensions, measures and metrics cannot be validated.
#### Many to one in staging
> [!abstract] FC39
> Points: `1`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/1pa-7uhajEmsrcJ8fY7G4KEWs7wx9Kqv_/view)
From the staging layer to the fact layer, models are related many-to-one only. Many-to-many relations cause a long-term loss of development speed.
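One way to guard this shape (model and column names below are illustrative) is dbt's built-in `relationships` test, which checks that every foreign key in the fact model resolves to a row in the dimension; combined with a `unique` test on the dimension key, this enforces many-to-one:

```yaml
models:
  - name: fct_projects
    columns:
      - name: account_id
        tests:
          - relationships:
              to: ref('dim_accounts')
              field: account_id
```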
#### Model slowly moving dimensions
> [!abstract] FC41
> Points: `3`   Framework: `SDA-1`   Video: [5 min.](https://drive.google.com/file/d/1wFQ1sDEQTOmk63TdouPD_Ltl9qQacrhC/view)
Apply a spine model to all key entities in the data model. Every entity with attributes that can change over the lifetime of that entity requires a spine-based model that allows for time travel, to see how that entity changed over time.
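A minimal sketch of such a spine model, assuming a `util_days` date spine and a `hist_accounts` model with validity ranges (all model and column names here are hypothetical; only the `sp_`/`hist_` layering comes from this framework):

```SQL
-- Hypothetical spine: one row per account per day, enabling time travel
select
    dates.date_day,
    accounts.account_id,
    accounts.account_status
from {{ ref('util_days') }} as dates
left join {{ ref('hist_accounts') }} as accounts
    on dates.date_day between accounts.valid_from and accounts.valid_to
```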
#### No data cleaning in extraction
> [!abstract] FC42
> Points: `1`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/1TIRvXFHrFii9uVViNbx6mYu4S8TPcuOg/view)
Prevent any cleaning or quality checks before the transformation layer, for consistency. All data quality and cleaning steps belong in the staging layer, nowhere else: as an engineer, you want to know exactly where all the make-up is applied.
#### No schema sync with source
> [!abstract] FC42
> Points: `1`   Framework: `SDA-1`   Video: (absent)
Extraction tools do not automatically include new source tables in the pipeline. This applies specifically to Airbyte and Hevo. When this feature is turned on, new tables can be added when the source provider expands the schema, resulting in unexpected cost expansion and running out of credits for a given month, which shuts the pipeline down.
![[Frameworks Airbyte backfill.png]]
*No schema backfills of new or renamed columns*
<br>
![[Frameworks Hevo New tables.png]]
*No new tables added to reporting database whenever the source creates new ones*
#### No synonyms
> [!abstract] FC44
> Points: `1`   Framework: `SDA-1`   Video: [4 min.](https://drive.google.com/file/d/17OCPIwTmgTzndz0aw-i82RSMrOee78d0/view)
Use consistent wording for each dimension, entity, measure and metric. Any synonym is one too many: synonyms cause confusion and ambiguity downstream and across the organisation.
#### Old tables have been cleared
> [!abstract] FC45
> Points: `1`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/130aluUDvYgiMtdOKWxW0aOz6I_xNnCqN/view)
During development, models are built and removed within a few weeks. Stop the clutter: a clean reporting data warehouse containing only datasets, schemas and tables that are actually operational is key for a much-needed sense of control from an operator's perspective.
#### Semantic layer serves data apps
> [!abstract] FC49
> Points: `2`   Framework: `SDA-1`   Video: [5 min.](https://drive.google.com/file/d/1mEI9Es_kLyBLlIwZUx_2_-2MDc4tqVGZ/view)
For a consistent, single source of truth, a semantic layer should be in place. You can use tools with extraction jobs that pull data from your database tables into Google Sheets, Looker reports and so on, but this quickly becomes chaotic and complex to operate. The semantic layer is typically something you build at the end of the SDA dev-cycle.
#### Source agnostic data model
> [!abstract] FC52
> Points: `2`   Framework: `SDA-1`   Video: [3 min.](https://drive.google.com/file/d/1B3PeOAn9ZLr0gCqciNDQ1kVa_qLNFH5H/view)
The data model is domain specific and does not mimic the technical source semantics. As an engineer, you create a single data source of truth, but you also need to create a language of the truth. This means you have to decide what to call a customer: `account_name`, `client_name`, `customer_name`, and so on. Pick one and stick with it, and when you change it, change it everywhere.
#### Full source freshness
> [!abstract] FC53
> Points: `4`   Framework: `SDA-1`   Video: [3 min.](https://drive.google.com/file/d/1nMd4Jl3FCMsodypnT8jR4X6HUTRR23FR/view)
Every source table from each source is checked on freshness with a warn and an error range. Some sources will not provide updated data every day, so a warn range of max 2 days is acceptable. The UTC timestamp should be used where possible.
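In dbt, this check is configured per source in the sources YAML. A sketch with an illustrative source and table name, using the `extracted_at` field from the uniform extraction tracking check and the 2-day warn range mentioned above (the 5-day error range is an assumption):

```yaml
sources:
  - name: hubspot
    loaded_at_field: extracted_at
    freshness:
      warn_after: {count: 2, period: day}
      error_after: {count: 5, period: day}
    tables:
      - name: companies
```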
#### Source name in duplicate fields
> [!abstract] FC55
> Points: `1`   Framework: `SDA-1`   Video: [2 min.](https://drive.google.com/file/d/130aluUDvYgiMtdOKWxW0aOz6I_xNnCqN/view)
When using multiple source systems for one entity, put the source name at the end of the field name. This pays off when the fields of a model are ordered alphabetically (the default) while building visualisations and/or modeling, because it lets you quickly see whether a sister field with common characteristics is present. For example `project_name_hubspot`, `project_name_float` and `project_name_exact_online`.
#### Use staging as definition layer
> [!abstract] FC56
> Points: `3`   Framework: `SDA-1`   Video: [3 min.](https://drive.google.com/file/d/16umq8PrOzp7cahGHqWiSPfWbdmQYj4WP/view)
Define all dimensions and their entities in the staging layer, not later. For rapid development, changes and debugging, it is best practice and most intuitive to define all your fields at the start of your transformation steps, in the staging layer.
#### Staging layer has clean downstream
> [!abstract] FC57
> Points: `2`   Framework: `SDA-1`   Video: [3 min.](https://drive.google.com/file/d/1o39vHq7hV-hErBKhZSzgqkRxoFQ2sgA5/view)
The staging layer models should only contain fields that are used downstream. If you don't enforce this, you might end up using dimensional fields downstream that have not been curated and thought through by the data engineering team.
#### Uniform extraction tracking
> [!abstract] FC61
> Points: `1`   Framework: `SDA-1`   Video: [1 min.](https://drive.google.com/file/d/1nEIBXGG6viwgpZlNA0n5xJVrR9LGFbLl/view)
Apply the same field name for the timestamp of extraction, for example `extracted_at`. This allows you to communicate downstream how old your data is.
#### Upstream table in syntax
> [!abstract] FC62
> Points: `1`   Framework: `SDA-1`   Video: [1 min.](https://drive.google.com/file/d/1q92AbzjTaewdnNON2dutD5maJMOGZX69/view)
The previous model name is prefixed to each field name to track where each field originates. A SQL setup where it is not clear which of 5-10 models a field comes from is tedious and requires a lot of work to figure out, especially when the system has been humming for a few months and the domain model is no longer top-of-mind for the respective developer.
```SQL
SELECT
    -- Properties | Accounting
    projects.project_name,
    projects.project_code,
    projects.project_owner_name,
    ...
    -- Properties | Resource Planning
    projects_{{source_name}}.project_name_{{source_name}},
    projects_{{source_name}}.project_code_{{source_name}},
    projects_{{source_name}}.project_owner_name_{{source_name}},
    ...
    -- Measures | Accounting
    projects.project_budget,
    -- Measures | Resource Planning
    projects_{{source_name}}.project_budget_{{source_name}}
    ...
FROM dim_projects as projects
```
## Integrated Performance Framework
When the data architecture is fully operational we can start building the Integrated Performance Framework on top of it. We visualise the data inside the warehouse through tools like [PowerBI](https://www.microsoft.com/en-us/power-platform/products/power-bi) or [Looker Studio](https://cloud.google.com/looker-studio), to validate that all code we put into place generates clean output data. As part of this step we embed data quality checks based on custom business logic requirements to make sure that there are no data quality errors for critical input fields.
### Internal Analytics Template
The majority of clarification and documentation around the data model and its components goes through the [[Internal Analytics]]. In this environment all relevant components are visualised, such as the `entities`, `properties`, `measures` and `metrics`. This is done in one of the three visualisation tools we work with: [[Looker Studio]], [[PowerBI]] or [[Preset]].
The Internal Analytics setup contains the following components:
1. **Analytics** is where we answer specific questions that are core to a client's specific business model.
2. **Metrics** enable deep-dives per date-range and dimensional property to make sure no anomalies take place on any aggregation level.
3. **Sources** shows dedicated pages per source table pulled from your source systems, including multiple charts, aggregations and scorecards.
4. **Data Quality** to check on client specific data input sanity checks and business logic deviations.
5. **Settings** for providing a visualised overview of the entire architecture and data model catalog.
### IPF Framework Checks
The **Integrated Performance Framework 1** has **14 framework checks** that focus on completeness of metrics, solid depth of filtering and sufficient context to all components of a performance tracking setup.
<br>
#### All entities are visualised
> [!abstract] FC3
> Points: `2`   Framework: `IPF-1`   Video: [2 min.](https://drive.google.com/file/d/1rbs42umcSnWhlfedqQlgu981JGXVIQdN/view)
Each relevant entity that is part of the [[Semantic Nexus]] requires a dedicated analytics page. On this page you should be able to get a proper overview of all the properties of the entity and the aggregations that describe the underlying demography. The most common entities across our data models are `accounts`, `projects`, `deals`, `persons` and `legal_entities`.
#### Consistent Design
> [!abstract] FC11
> Points: `4`   Framework: `IPF-1`   Video: [5 min.](https://drive.google.com/file/d/10m7ouy7zHFfZ5V3uYfW3yenBdCft0FhV/view)
Each page in the Internal Analytics report should be consistent. When you ignore this check, users will not get the impression they are working with a well-thought-through framework, but rather with a patchwork of bolt-ons and ad-hoc requests. When each page, each header and each positioning is consistent, it really brings home the feeling that you understand what you are looking at.
#### Data freshness on all sources
> [!abstract] FC14
> Points: `3`   Framework: `IPF-1`   Video: [3 min.](https://drive.google.com/file/d/125cSxjHfMUE_Kqj9ZImraJjgDVHdA0Y-/view)
Build a flow-chart in the visualisation tool with freshness per pipeline-stage. Whenever a client can see that their data is fresh and up to date, this strongly reinforces their peace of mind and trust in the data architecture and downstream reliability.
![[Frameworks Data Freshness.png]]
*Data Freshness overview with tool, pipeline step, operator, deeplink and freshness in hours.*
#### Visualise data pipeline
> [!abstract] FC16
> Points: `2`   Framework: `IPF-1`   Video: [5 min.](https://drive.google.com/file/d/1rdQ2RY5u_Bk-NTiKxk5wB8QVGz4USit1/view)
Show a flow chart of all software applications involved in the pipeline. An end-to-end data architecture can become rather complex to comprehend as the years pass by, add-ons are requested and changes are made. A proper overview of all the operational components of the system is therefore key for executives, architects and engineers.
![[Frameworks Data Pipeline.png]]
*Example data pipeline visualised by layer and data asset*
#### Dedicated metric pages
> [!abstract] FC20
> Points: `2`   Framework: `IPF-1`   Video: [3 min.](https://drive.google.com/file/d/1phidrASfEF-vKy1VxKIMKhFDixIqaYG_/view)
For people to trust metrics, they need to be able to slice and dice each specific metric, understanding it across differing dimensions and differing use cases. With this setup in place, you lay the foundation for effective team collaboration around trusted key team metrics. Therefore, each operational metric has a dedicated page that provides the context needed for understanding and troubleshooting that metric.
#### Embed definitions
> [!abstract] FC22
> Points: `1`   Framework: `IPF-1`   Video: [3 min.](https://drive.google.com/file/d/1I4V9Ze2iBi-GfSyUG6Hs17gSZnKT5gAf/view)
Have a breakdown of all measures and counters, their definitions and deeplinks. A single overview of all measures in the data model gives the user a grasp of all moving parts and its properties.
![[Frameworks Definitions.png]]
*Definitions overview with properties, source, domain, description and deeplinks*
#### Each metric has an owner
> [!abstract] FC26
> Points: `1`   Framework: `IPF-1`   Video: [2 min.](https://drive.google.com/file/d/19xSbu7x6gfb_XJ3k9iN6eOQIfI2jANgI/view)
Ownership in any organisation is key: a dedicated owner per metric makes it clear for each department who is in the lead. Without an owner, metrics become orphaned and neglected.
#### Exhaustive source pages
> [!abstract] FC31
> Points: `2`   Framework: `IPF-1`   Video: [3 min.](https://drive.google.com/file/d/1RuQVOzOvwc1tkp4Wn6Z-fc341j_tc1JE/view)
All sources that are used for Metrics and Analysis pages should be present. For the Internal Analytics stack to be complete, show each individual source table so that for every measure, counter or metric it can be challenged and fully understood where its underlying data comes from and whether it is correct.
#### Full data application ownership
> [!abstract] FC34
> Points: `2`   Framework: `IPF-1`   Video: [2 min.](https://drive.google.com/file/d/1J11ZE4paDr4PX7uqrFHcZyWLiilh4gHS/view)
The client has full access to and ownership of all data applications used in the Internal Analytics setup, mirroring the full client-side ownership check in the SDA framework.
#### Add legal footer to all pages
> [!abstract] FC38
> Points: `1`   Framework: `IPF-1`   Video: [5 min.](https://drive.google.com/file/d/1VWeOO05Jg7ASSUAX1iXWMthWvSASYTA9/view)
Each analytics visualisation page requires a legal footer. This footer contains the company name, the company slogan and 'Confidential Information'.
#### Embed metric catalog
> [!abstract] FC40
> Points: `1`   Framework: `IPF-1`   Video: [3 min.](https://drive.google.com/file/d/1hCEzfn_nRaW0_GlSW4WVhxG7CC2vTLcI/view)
Have a list of all metrics, their critical values, targets, deeplinks, and their owners. The metric catalog page provides a complete overview of all metrics that are provided by the data model. In this way the end-user gets a solid understanding of which metrics are inside their data model.
![[Frameworks Metric Catalog.png]]
*Metric Catalog with ownership, domains, deeplinks and bandwidths*
#### Use semantics in a calculator
> [!abstract] FC50
> Points: `2`   Framework: `IPF-1`   Video: [5 min.](https://drive.google.com/file/d/1zSyN13LCIZxucULI4yjUruU8hwhAGbJa/view)
The semantic model should be actively used in at least one calculator sheet. The semantic model is the single source of truth for in-company performance data. It should be used in the Analytics environment, but its operational value becomes significantly clearer when the semantics are used in calculators.
#### Show trailing measures
> [!abstract] FC51
> Points: `2`   Framework: `IPF-1`   Video: [5 min.](https://drive.google.com/file/d/1qTQW-qKbAoZtguI5f6WGdihMOB2-f28r/view)
The analytics pages should depict all relevant metrics with LTM (last-twelve-months) charts. When you want a quick understanding of the basic principles of a company, a few high-level measures and ratios give you a rough feel for how the company, or a filtered subset of it, is doing.
![[Frameworks LTM measures.png]]
*Monthly (left) and trailing-12-month (right) revenue churn*
#### The chat channel has bookmarks
> [!abstract] FC58
> Points: `2`   Framework: `IPF-1`   Video: [2 min.](https://drive.google.com/file/d/1SX0YNrztvaEGi7Fh5W-6kTEd_FctLkt0/view)
Bookmarks to the resources used in the architecture enable more control. Clients are often not that interested in the inner workings of the data-architecture team, especially when things are going smoothly. Having these bookmarks in the one place where issues and discussions occur gives you a single location where practically everything happens.
#### Targets should be time dependent
> [!abstract] FC60
> Points: `2`   Framework: `IPF-1`   Video: [3 min.](https://drive.google.com/file/d/1-njzm7CTJ6SZeEhmOx4vyWUfWvvmys6V/view)
Effective performance tracking requires granular and contextual target information, therefore the most important metrics and measure targets should differ between time periods. Usually targets are set per quarter, year, or are dynamic in nature which allows them to align with company OKRs and technology roadmaps.
<br>
## Modular Security Protocol
All our engineering projects are aimed at designing, building and maintaining client-side applications. Our vendor-agnostic modular approach to data architecture design usually results in a larger set of applications and vendors than a more single-vendor monolithic design architecture.
### Client-side vaulted credentials
Some clients have their own vaulted application-access protocol, which requires the engineering company's own vault to contain the access credentials for this client-side vault. Getting access to a client-side application through this setup is more time-consuming on a day-to-day basis, but it is also simpler to understand, which makes it a sensible setup from an access-governance perspective.
```mermaid
graph TD
%% Personal Accounts
ACC["Vault Access Credentials"] --> V1["Client Vault Secret <br> Engineering-side"]
V1 --> V2["Client Application Secrets <br> Client-side"]
V2 --> A1["Personal Account 1 <br> Client-specific"]
V2 --> AX["Infrastructure Service Account Client-specific"]
%% Client Applications
A1 --> APP1["Client App 1"]
A1 --> APP2["Client App 2"]
%% Styling
classDef access fill:#f2f2f2,stroke:#666666
classDef account fill:#b3e0ff,stroke:#0066cc
classDef iam fill:#ffcc99,stroke:#ff9933
classDef vault fill:#f3e6ff,stroke:#9933ff
classDef app fill:#d9f2d9,stroke:#339933
class ACC access
class A1,AX account
class V1,V2 vault
class APP1,APP2 app
```
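The lookup chain in the diagram can also be expressed as code. This is a hedged sketch only: the dicts stand in for real vault tooling, and all names and secret values are made up:

```python
# Engineering-side vault: holds the secret that opens the client-side vault.
engineering_vault = {
    "client-vault-secret": "opens-client-vault",
}

# Client-side vault: holds the per-application secrets; only readable
# once you hold the secret stored engineering-side.
client_vault = {
    "app-1-credentials": "user:pass-for-app-1",
}

def resolve_credentials(app_secret_name: str) -> str:
    """Walk the chain: engineering vault -> client vault -> application secret."""
    client_vault_secret = engineering_vault["client-vault-secret"]
    if client_vault_secret != "opens-client-vault":
        raise PermissionError("cannot open the client-side vault")
    return client_vault[app_secret_name]
```

The extra hop is exactly what makes day-to-day access slower but the governance story simple: revoking one engineering-side secret cuts off the entire chain.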
<br>
### Client-side IAM
The client has set up an IAM architecture that controls the access of individual personal accounts to all client applications. With this setup, all logins are centrally managed by the client's IT department, which makes access management more scalable and more secure. For some applications a direct access setup is still required, since not all security access can be structured through a centralised IAM.
```mermaid
graph TD
%% Personal Accounts
A2["Personal Account 2 <br> Client-specific"] --> I["Identity and Access Management <br> Client-side"]
A3["Personal Account 3 <br> Client-specific"] --> I
%% Client Applications
I --> APP3["Client App 3"]
I --> APP4["Client App 4"]
I --> APP5["Client App 5"]
%% Styling
classDef access fill:#f2f2f2,stroke:#666666
classDef account fill:#b3e0ff,stroke:#0066cc
classDef iam fill:#ffcc99,stroke:#ff9933
classDef vault fill:#f3e6ff,stroke:#9933ff
classDef app fill:#d9f2d9,stroke:#339933
class A2,A3 account
class APP3,APP4,APP5 app
class I iam
```
### Our access protocol
Our application access protocol across all our clients is built on the following principles:
1. All personal account access should be centrally managed through a client-side IAM architecture.
2. All personal accounts that are not part of the IAM architecture should be part of the double-vaulted application secret setup.
3. All service account access secrets are stored in client-side vaults, with engineering-side vault access.
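The three principles reduce to a simple routing rule per account. An illustrative sketch; the type labels are our own shorthand, not a real API:

```python
def access_path(account_type: str, iam_managed: bool) -> str:
    """Route an account to the access setup prescribed by the protocol.

    account_type: 'personal' or 'service' (hypothetical shorthand labels).
    iam_managed:  whether the account is covered by the client-side IAM.
    """
    if account_type == "service":
        # Principle 3: service account secrets live in the client-side
        # vault, reached via the engineering-side vault.
        return "client-side vault via engineering-side vault"
    if iam_managed:
        # Principle 1: IAM-managed personal accounts go through the IAM.
        return "client-side IAM"
    # Principle 2: everything else uses the double-vaulted secret setup.
    return "double-vaulted application secret"
```

Note that every account lands on exactly one path, which is what makes the protocol auditable.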
### MSP Framework checks
The above security protocols are put in place and audited based on the following checks:
#### Security file for each client
> [!abstract] FC47
> Points: `3`   Framework: `MSP-1`   Video: [-] (absent)
An encrypted access-credentials database should be deployed for each client. These encryption vaults allow the creation of a file that contains all keys the data & analytics team members need to access the relevant systems.
<br>
#### No passwords on desktops
> [!abstract] FC64
> Points: `1`   Framework: `MSP-1`   Video: [-] (absent)
There should be no passwords saved in any file or document on a personal desktop. If a desktop is ever compromised, no passwords should be lingering on the desktop or in the documents or downloads folders.
<br>
#### All credentials in one vault
> [!abstract] FC65
> Points: `2`   Framework: `MSP-1`   Video: [-] (absent)
There should be no passwords stored in a personal or secondary vault. All credentials should be centralised per architecture, so that the entire authorisation process is fast and seamless.
<br>