# The way of working (part 2) in the era of Microsoft Fabric - The location of the semantic model
#Security #Architecture #Governance #WayOfWorking #MicrosoftFabric #OneLake #SemanticModel #Shortcuts #Lakehouse
Series information: This is part two of a series of articles that focus on the way of working with Power BI and Microsoft Fabric:
+ Part one: [[The separation of data and content - why this is the way to go (part 1)]]
+ Part two: this article
Since the introduction of the new workspace experience (workspace V2) in April 2019, I have recommended separating content and data, meaning using two pbix files and two workspaces (read the reasoning behind this approach here: https://publish.obsidian.md/minceddata/Blog/Way+of+working/The+separation+of+data+and+content+-+why+this+is+the+way+to+go+(part+1) and here: https://learn.microsoft.com/en-us/power-bi/guidance/powerbi-implementation-planning-introduction). I know this pbip thingy is out there, but I want to keep things simple here.
When Microsoft Fabric went GA in November 2023, our Power BI world became more complex. This article is about the semantic model's location, by which I mean the workspace where it is placed.
In this article, I will delve deeper into the location of a semantic model and how this location affects the "mechanics" that make the model accessible to Power BI app users and report creators when the model utilizes Microsoft Fabric's new connection mode: direct lake.
**In short:** the Direct lake connection mode allows querying delta tables in a lakehouse without importing the data while providing query performance and DAX compatibility similar to that of import mode models.
This article discusses organizing items (of type: semantic model) inside our beloved Fabric tenant (fka Power BI tenant) to create a solution.
# What is not inside this article
This article does not discuss the performance implications of choosing the Direct lake connection mode over the other two connection modes, import and Direct query. I also do not explain the importance of relationships, proper DAX statements, or everything else that makes a model great.
# Inside the box
For whatever reason, this article became somewhat lengthy (once again), so this chapter provides an overview of what you will encounter when you read on:
+ [[#The application of Row Level Security (RLS) and Object Level Security (OLS)]]
+ Though not tied to the Direct lake connection mode, I want to answer the question "Do Direct lake semantic models support Row Level and Object Level Security?" I encounter this question quite often, either at community.powerbi.com or when people reach out to me on LinkedIn or X.
+ [[#OneLake Data Access Roles]]
+ OneLake Data Access Roles will help us a lot when we are sharing the lakehouse with teams further downstream.
+ [[#The data source of semantic models utilizing the Direct lake connection mode]]
+ The data source of a semantic model in Direct lake connection mode is special, at least from my perspective and for this reason the data source has a dedicated chapter.
+ [[#Owning the semantic model]]
+ Even though I focus on the location (the workspace), I want to introduce the term ownership (I probably will stress this term in subsequent articles in this mini series of blog posts) and how it will affect the "mechanics" of how we (the owner of the semantic model) make the model accessible.
+ [[#Summary]]
+ Of course, a little summary; it contains my reasoning why I consider it a good idea to use three workspaces when creating a solution.
+ [[#References]]
+ All the sources I consider useful
# The application of Row Level Security (RLS) and Object Level Security (OLS)
The question: "Do Direct lake semantic models support RLS and OLS?"
The simple answer: Yes
It's (almost) identical to the support of Row Level and Object Level Security in Direct query models. Security (Row Level and Object Level Security) is implemented in the semantic model. Creating roles in Power BI Desktop or the web editing experience and adding groups (use groups instead of individuals) to these roles is not affected by the connection mode of the semantic model.
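Conceptually, an RLS role is nothing more than a filter predicate attached to one or more security groups. The following Python sketch is purely illustrative (it is NOT a Power BI API, and the role, group, and table names are made up); it only models the idea that a user's visible rows are the union of the rows permitted by all roles they belong to:

```python
# Illustrative model of RLS: roles bind security groups to filter
# predicates; a user sees the union of rows across matching roles.
# This is NOT a Power BI API; all names are hypothetical.

rows = [
    {"Region": "EMEA", "Sales": 100},
    {"Region": "AMER", "Sales": 200},
    {"Region": "APAC", "Sales": 50},
]

# Hypothetical role definitions: role name -> (member groups, filter predicate).
roles = {
    "EmeaReaders": ({"grp_emea_analysts"}, lambda r: r["Region"] == "EMEA"),
}

def visible_rows(user_groups, rows, roles):
    """Return the rows a user may see: union over all roles the user is in."""
    out = []
    for members, predicate in roles.values():
        if user_groups & members:  # user belongs to at least one member group
            out.extend(r for r in rows if predicate(r))
    return out

# A member of grp_emea_analysts only sees EMEA rows; a non-member sees nothing.
print(visible_rows({"grp_emea_analysts"}, rows, roles))
```

The important property this models: RLS is evaluated per user against role membership, regardless of how the model connects to its data, which is exactly why the connection mode does not change the role-authoring experience.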
From my experience, semantic models without RLS are still common (even sometimes, I think it's security by obscurity 😉). Sometimes, these models are accessible only by an app with no build permissions granted on them. Sometimes, multiple audiences are defined using different reports based on the same model. Of course, this works and keeps your data safe. However, when the number of users grows, they will likely want to build their own reports. This is when we realize that a one-workspace architecture is insufficient.
For this reason, I always recommend separating the semantic model (the data) and the data visualization (the content) into two workspaces: one that holds the model and one that has the reports, which create our insightful data visualizations. I call these two workspaces the data workspace and content workspace. The reports with the data visualizations are now called thin reports; they do not connect to the underlying data source but to the semantic model in the data workspace.
This architecture ensures that RLS and OLS are applied when users start building their reports; of course, these users **must** not be part of the data workspace and need build permission on the semantic model. The following image shows this architecture.
![[wow - model owner - data engineering team owns the semantic model.png]]
The image above shows three different workspaces, the data workspace, and two content workspaces. Two different content workspaces illustrate the concept of data and content separation.
A final word about implementing RLS and OLS: With the advent of Fabric, there is a second place where you can implement RLS and OLS: the SQL analytics endpoint. You can do this by using GRANT and DENY, which is basically everything you know from SQL Server.
**Be aware:** When you implement RLS/OLS at the SQL analytics endpoint, the Direct lake connection automatically falls back to a Direct query connection.
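To make the SQL-side option concrete, the small helper below renders the kind of T-SQL `GRANT`/`DENY SELECT` statements you would run against the SQL analytics endpoint. The principal and table names are invented for illustration; only the T-SQL statement shape (`GRANT SELECT ON OBJECT::schema.table TO [principal];`) follows standard SQL Server syntax:

```python
def render_grant(principal: str, schema: str, table: str, permit: bool = True) -> str:
    """Render a T-SQL GRANT or DENY SELECT statement for a table,
    as you would run it against the SQL analytics endpoint."""
    verb = "GRANT" if permit else "DENY"
    return f"{verb} SELECT ON OBJECT::{schema}.{table} TO [{principal}];"

# Hypothetical principals and tables, for illustration only:
print(render_grant("contoso_reportingdata", "dbo", "FactSales"))
print(render_grant("contoso_interns", "dbo", "DimCustomer", permit=False))
```

Remember the warning above: securing data this way is exactly the situation in which the Direct lake connection falls back to Direct query.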
# OneLake Data Access Roles
OneLake Data Access Roles will help us share data (delta tables) with teams further downstream, turn data into insights, and make fact-based decisions. The section on [[#Granting access to the data engineering lakehouse to the reporting data team]] describes how a data access role will be implemented.
However, Data Access Roles need to be enabled per lakehouse. When enabled, a default role, DefaultReader, is created. This role provides access to all the data of the lakehouse.
My recommendation is that as soon as you enable Data Access Roles, delete this role; otherwise, there might be situations where teams will see more than expected.
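To make this recommendation concrete, the sketch below (illustrative only, not the real OneLake API; all group and table names are made up) models why a leftover DefaultReader role is dangerous: effective access is the union of what all matching roles permit, so a permit-everything role silently overrides any carefully scoped role:

```python
# Illustrative model of OneLake data access roles (NOT the real API):
# a role maps member groups to the tables it permits; effective access
# is the UNION over all roles the user is a member of.

ALL_TABLES = {"FactSales", "DimCustomer", "DimEmployee"}

def effective_tables(user_groups, roles):
    """Union of permitted tables across all roles the user belongs to."""
    allowed = set()
    for members, tables in roles.values():
        if user_groups & members:
            allowed |= tables
    return allowed

# A carefully scoped role: the reporting group may only read FactSales.
scoped_roles = {
    "roleReportingSales": ({"contoso_reportingdata"}, {"FactSales"}),
}
print(effective_tables({"contoso_reportingdata"}, scoped_roles))

# If DefaultReader (everything, for everyone with lakehouse access) is
# left in place, the careful scoping is silently overridden.
with_default = dict(scoped_roles)
with_default["DefaultReader"] = ({"contoso_reportingdata"}, ALL_TABLES)
print(effective_tables({"contoso_reportingdata"}, with_default))
```

This is why deleting DefaultReader right after enabling data access roles keeps later, narrowly scoped roles meaningful.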
# The data source of semantic models utilizing the Direct lake connection mode
The Direct lake connection mode uses the lakehouse's delta tables as its underlying data. This is similar to semantic models that leverage the Direct query connection mode.
In the following sections, I will delve deeper into this data source and discuss how we can ensure that app users and report creators can access its data, the delta tables in the lakehouse.
When inspecting the default semantic model (the settings), one thing stands out: there is no "Gateway and cloud connection" section. When looking at the custom semantic model, we see that the data source type is SQLServer and that it maps to "Default: Single-Sign-On (Entra ID)."
## The SQLServer data source
The following image shows the content of the workspace ContosoDataEngineering:
![[wow - model owner - dataengineering content of the workspace.png]]
Whenever a lakehouse is created inside a workspace, it has a default semantic model (https://learn.microsoft.com/en-us/fabric/data-warehouse/semantic-models) and an SQL analytics endpoint. **In short**: the SQL analytics endpoint is
> *a SQL-based experience for lakehouse Delta tables. This SQL-based experience is called the SQL analytics endpoint. You can analyze data in Delta tables using T-SQL language, save functions, generate views, and apply SQL security.*
> (https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-sql-analytics-endpoint)
Sample delta tables were created when I created my first lakehouse, and my first semantic model was built in Direct lake mode. I cannot remember what I was expecting regarding the data source type of the model in Direct lake mode. Now, after quite some time, I love the fact that I can use T-SQL, even though I usually use Python (PySpark) when interacting with the lakehouse.
The workspace also contains a custom semantic model called "SimpleSales_OwnedByDataEngineering." This semantic model is not exceptional, except that it is used by the reports forming the app inside the ContosoReportingOnly workspace.
A best practice has emerged not to use the default but a custom model. Here is just one link that recommends a custom semantic model over the default semantic model: https://radacad.com/power-bi-default-semantic-model-or-custom-a-guide-for-using-in-fabric-environment
I have to make this custom semantic model (connected to the source data via the Direct lake connection mode), or at least its data, accessible to app consumers; otherwise, they will not have access to the data.
## The cloud connection that maps the semantic model to the lakehouse
I view a connection as a tunnel that provides access to the source data, fueling the reports with data.
When we consider what "Single-Sign-On (Entra ID)" means, we realize that we (the Data Engineering team) need to provide access to all the delta tables for all the app users and, of course, for all report creators.
The following image shows a different connection, not the default one.
![[wow - model owner - datasource of a semantic model in direct lake mode.png]]
## How a "Fixed identity" makes life more simple
Instead of granting all app users and report creators access to the underlying data source (the delta tables), there is a very simple way: using a fixed identity. Using a fixed identity is also a recommendation from Microsoft: https://learn.microsoft.com/en-us/fabric/get-started/direct-lake-fixed-identity
A fixed identity is a service principal used to log in on behalf of the app users and the report creators. The following image shows the connection details of the above connection:
![[All the images/Way of working - workspace in the era of direct lake/wow - model owner - connection details of the fixed identiy.png]]
A connection is a tunnel, whether a data gateway connection (utilizing an on-premises data gateway) or, like in this example, a cloud connection. It is a safe passageway from the semantic model to the underlying data source. Selecting the connection in the "Maps to" dropdown of the semantic model's connection marks one entrance to this tunnel.
The second entrance (or an exit?) needs to be defined.
Because we are the data engineers, we own everything (at least in this scenario). We can add the service principal as a viewer to our workspace, and everything works like a charm. The following images show the service principal as a viewer and the access rights to the SQL analytics endpoint inherited from the viewer role:
The service principal as a viewer of the DataEngineering workspace:
![[wow - model owner - Service Principal as App Viewer.png]]
The inherited access permissions to the lakehouse's SQL analytics endpoint:
![[wow - model owner - inherited permissions to the lakehouse of the viewer role.png]]
## The owner of the tunnel called "connection"
I (the Admin of the DataEngineering workspace) am accountable (maybe not) and responsible (absolutely) for making data accessible to all parties who have a justifiable interest in the data. I can ensure this because I:
+ Own the service principal
+ Create the connection (leveraging the service principal)
+ Assign the service principal to the viewer role of the workspace
+ Map the connection to the semantic model
+ Assign the build permission to a security group that comprises all the report creators
+ Assign a security group that comprises all the app users to an audience of the app
With all the above, it's simple to create a solution that is accessible to all app users and provides all report creators with access to the lakehouse's delta tables. Unfortunately, sometimes (maybe most often), the Fabric world is not that simple. In the following chapters, I will introduce an architecture that we will probably see more often in the not-so-distant future.
# Owning the semantic model
With all of Fabric's capabilities, I predict the following:
+ there will be more data teams inside an organization, especially when the organization is adhering to the concept of the Data Mesh (another different story).
+ one data team creates delta tables that fuel multiple semantic models, which will most likely be owned by different "reporting teams."
I think there is a clear distinction between the architectures (regarding the ownership of the semantic model) we will encounter or build ourselves. The first of these architectures is explained in the preceding chapters. This architecture can be characterized by one fact: the data engineering team owns the semantic model(s), not only the lakehouse.
The other architectural type is simple: the semantic model is not owned by the lakehouse team (the team that creates the delta tables); instead, it is owned by a different team.
In the following chapters, I will address the requirements of the 2nd architecture type and summarize the architecture type "data engineering owns semantic model."
## The data engineering team owns the semantic model
This approach is closest to many existing solutions, following the approach of separating the data from the content.
## The reporting team owns the semantic model
This approach will become the new standard, whether existing Power BI teams embark on their Fabric journey or organizations join the Fabric train from scratch. This approach honors existing skills across various teams and mitigates the risk of accidentally harming an item. The data engineering team will be able to utilize all the powerful data engineering components of Microsoft Fabric, like Data Pipelines and, of course, notebooks. This approach allows data engineering to serve independent reporting teams.
In contrast, nothing will change for the reporting team (at least not much). The following image shows the involved workspaces:
![[wow - model owner - the reporting team owns the semantic model.png]]
How does medallion architecture fit into the above picture? This article is not about medallion architecture. Here, I explain how the semantic model's owner can connect it to the data in the data engineering team's lakehouse. For this reason, the lakehouse of the data engineering team must be assumed to be the gold layer 😉
**Note:** The pink(ish) arrow makes the lakehouse data accessible to app consumers and report creators.
The following image shows a slight extension of the above picture and the pink(ish) arrow is replaced by a shortcut. The picture shows that the data engineering team creates more delta tables than used by the reporting team and that a lakehouse is added to the workspace "ContosoReportingData."
![[wow - model owner - the reporting team owns the semantic model - shortcut.png]]
The reporting team, when familiar with Power BI, is confronted with a new item, the lakehouse, inside their workspace. The lakehouse in the ContosoReportingData workspace becomes the data source of the semantic model, and it may be one of the many data sources. This is not that new because there have been data sources before, and there will be data sources in the future.
Of course, the reporting data team can utilize the lakehouse as their skills grow over time and not only use it to gain access to tables from the lakehouse owned by the data engineering team.
## Creating a shortcut pointing to the lakehouse of the data engineering team
One of Fabric's most epic features is the shortcut feature. Shortcuts allow us to use a single copy of data for many workloads or use cases. We can also reuse data that exists outside of our Fabric tenant, e.g., Amazon S3 or Google's GCS (you can read more about shortcuts at https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts). Data at the target of a shortcut (some might still argue it's the source, including me) must contain delta tables; if this is the case, the magic begins.
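As a sketch of what creating such a shortcut boils down to, the helper below only builds the JSON body of a OneLake-to-OneLake shortcut request: a name, where the shortcut appears in the consuming lakehouse, and the target workspace/item/path. The exact request shape is my assumption modeled after the shortcuts REST API, and all IDs are placeholders; verify against the official documentation before use:

```python
def onelake_shortcut_body(name, target_workspace_id, target_item_id,
                          target_path="Tables"):
    """Build an illustrative JSON body for a OneLake-to-OneLake shortcut.

    Assumption: the shortcut API accepts a path/name/target.oneLake
    structure. All IDs below are placeholders, not real items.
    """
    return {
        "path": "Tables",          # where the shortcut appears in this lakehouse
        "name": name,              # shortcut name shown in the lakehouse
        "target": {
            "oneLake": {
                "workspaceId": target_workspace_id,  # data engineering workspace
                "itemId": target_item_id,            # data engineering lakehouse
                "path": target_path,                 # table/folder being referenced
            }
        },
    }

body = onelake_shortcut_body(
    "SimpleSales",
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002",
    "Tables/SimpleSales",
)
print(body["target"]["oneLake"]["path"])
```

Note that no copy of the data is made: the body only records a pointer from the consuming lakehouse to the producing one.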
When someone from the ContosoReportingData team tries to create a shortcut pointing to the data engineering team's lakehouse, they will realize it cannot be selected as a data source. Rethinking this approach, we recognize that this is expected because, until now, the data engineering team has not granted permissions to the reporting data team.
### Granting access to the data engineering lakehouse to the reporting data team
These are the steps necessary to give the reporting team the required access to the data engineering team's lakehouse:
1. The **reporting team** creates a Microsoft Entra ID security group that contains all the users of the ContosoReportingData workspace, meaning all the Admins, Members, Contributors, and Viewers.
2. The **data engineering** team grants this group read permission to access their lakehouse. This step needs some attention, as it is required to enable the option "Read all SQL endpoint data."
3. The **data engineering** team limits access to a subset of the tables, leveraging OneLake data access roles.
I do not describe step one here. This step is described in this article: https://learn.microsoft.com/en-us/entra/fundamentals/how-to-manage-groups
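For reference, step one boils down to POSTing a small payload to the Microsoft Graph `/groups` endpoint. The sketch below only builds that payload (the group and nickname are examples taken from this article's scenario, not real objects); `mailEnabled: false` plus `securityEnabled: true` is what makes it a plain security group:

```python
def security_group_payload(display_name: str, mail_nickname: str) -> dict:
    """Build the Microsoft Graph request body for a plain security group
    (POST https://graph.microsoft.com/v1.0/groups). Names are examples."""
    return {
        "displayName": display_name,
        "mailEnabled": False,       # plain security group, not mail-enabled
        "mailNickname": mail_nickname,
        "securityEnabled": True,
    }

payload = security_group_payload("contoso_reportingdata", "contosoreportingdata")
print(payload)
```

In practice you would send this payload with an authenticated Graph client; the linked Microsoft Entra article covers the portal-based way to do the same thing.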
#### Step 2a: Granting access to the lakehouse
The following image shows how the **data engineering** team grants read permission to the lakehouse:
![[wow - model owner - granting access to lakehouse.png]]
#### Step 2b: Providing access to the SQL analytics endpoint
The following image shows that the security group "contoso_reportingdata" is selected and that, in addition, the option "Read all SQL endpoint data" is enabled.
![[wow - model owner - granting access to the lakehouse and the SQL analytics endpoint.png]]
Granting access to the SQL analytics endpoint is required as this will be the data source of the semantic model owned by the reporting data team. Referring to the tunnel analogy from above: as with a connection, I also imagine a shortcut as a tunnel or safe passageway. The SQL analytics endpoint will be one part of the tunnel called shortcut. This tunnel needs to be created to enable app consumers and report creators who are not part of the ReportingDataOnly workspace to gain access to data in the underlying lakehouse (owned by the data engineering team).
Looking closely, read access is granted to all the data in the SQL analytics endpoint. This must not remain the case.
We (the data engineering team) must ensure that the "ContosoReportingData" team only has access to the required data. This access can be negotiated and granted based on a data contract between the DataEngineering and ReportingData teams (a data contract is worth another article).
#### Limiting access to the reporting data team
The DataEngineering team can limit access to delta tables by creating a OneLake role and configuring the delta tables that this role can access.
The creation and configuration of a OneLake data access role is described in this article: https://learn.microsoft.com/en-us/fabric/onelake/security/get-started-data-access-roles
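As a sketch of what such a role boils down to, the helper below builds an illustrative role definition: a role name, the permitted table paths, and the Entra security group that is a member. The JSON shape is my assumption modeled after the data access roles API (permit-style decision rules plus Entra members); the object ID is a placeholder, and you should verify the exact format against the linked article before using it:

```python
def data_access_role(name, table_paths, entra_group_object_id):
    """Build an illustrative OneLake data access role definition.

    Assumption: roles consist of permit-style decision rules (which
    paths, which actions) plus a member list of Entra groups. The
    exact JSON shape may differ from the real API; verify before use.
    """
    return {
        "name": name,
        "decisionRules": [
            {
                "effect": "Permit",
                "permission": [
                    # Only the listed table paths are readable by this role.
                    {"attributeName": "Path", "attributeValueIncludedIn": table_paths},
                    {"attributeName": "Action", "attributeValueIncludedIn": ["Read"]},
                ],
            }
        ],
        "members": {
            "microsoftEntraMembers": [{"objectId": entra_group_object_id}],
        },
    }

role = data_access_role(
    "roleContosoReportingDataSalesNotThatSimple",
    ["Tables/SimpleSales"],
    "00000000-0000-0000-0000-000000000003",  # placeholder group object ID
)
print(role["decisionRules"][0]["effect"])
```

The point of the structure: the scoping lives entirely on the data engineering side, so the reporting team's shortcut can only ever surface the tables the role permits.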
The following image shows the creation of the role "roleContosoReportingDataSalesNotThatSimple":
![[wow - model owner - OneLake access role.png]]
The following image shows how the security group "contoso_reportingdata" is assigned to the role:
![[wow - model owner - assigning a security group to the OneLake data access role.png]]
# Summary
It's essential to remember that when a shortcut is traversed, the shortcut owner's identity is used for authentication against the target, not the caller's (in this example, the app user's).
The following image contains the tasks of the two teams, the DataEngineering and ReportingDataOnly teams ([[#The reporting team owns the semantic model]]):
![[wow - model owner - all tasks.png]]
Whenever a team creates a semantic model utilizing Fabric's Direct lake connection mode, another level of separation should be used by separating the lakehouse (storing the data) from the semantic model. The reasons are:
+ Mitigating the risk of accidentally harming an item due to the growing number of users weaving a Fabric solution
+ Making a clear distinction of tasks (and for this reason of accountability and responsibility)
+ Avoiding adverse effects like throttling by placing the workspaces in different Fabric capacities
# References
I listed external and internal links below, pointing to additional and more detailed documents. I use Obsidian as a 2nd brain but also to publish "articles" (like this one), so this article can contain references to notes I already published, like part 1, or notes inside my Obsidian vault that I have not published and probably never will. If you hit one of these "private" internal links inside this article, you will receive the message that the document is unavailable. These "private" references are marked in the section (Internal references).
## External references
+ The default semantic model: https://learn.microsoft.com/en-us/fabric/data-warehouse/semantic-models
+ Model data in the default semantic model https://learn.microsoft.com/en-us/fabric/data-warehouse/default-power-bi-semantic-model
+ Reza Rad recommends the custom semantic model over the default semantic model: https://radacad.com/power-bi-default-semantic-model-or-custom-a-guide-for-using-in-fabric-environment
+ The SQL analytics endpoint of the lakehouse: https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-sql-analytics-endpoint
+ Fixed Identities for semantic models with RLS/OLS (blog announcement 2023-11-15): https://blog.fabric.microsoft.com/en-us/blog/microsoft-fabric-november-2023-update/#post-25061-_Toc150157956
+ Microsoft Learn on fixed identities: https://learn.microsoft.com/en-us/fabric/get-started/direct-lake-fixed-identity
+ Cloud connections API: https://blog.fabric.microsoft.com/en-gb/blog/announcing-the-availability-of-rest-apis-for-connections-and-gateways-in-microsoft-fabric?ft=All
+ Introduction to Microsoft Fabric's feature called shortcut: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts
+ Create Microsoft Entra ID security groups: https://learn.microsoft.com/en-us/entra/fundamentals/how-to-manage-groups
+ Role Based Access Control to a subset of the delta tables in the lakehouse
+ Get Started with Data Access Roles in OneLake https://learn.microsoft.com/en-us/fabric/onelake/security/get-started-data-access-roles
+ Best Practices for OneLake Security https://learn.microsoft.com/en-us/fabric/onelake/security/best-practices-secure-data-in-onelake
+ Data Access Roles APIs https://blog.fabric.microsoft.com/en-us/blog/onelake-data-access-roles-apis-announcement?ft=All
## Internal references
+ Part 1 of this series about the Way of working using Fabric [[The separation of data and content - why this is the way to go (part 1)]]
+ (private) The Excalidraw drawing [[Microsoft Fabric - the Direct Lake connection mode and the workspace]]