# Learning Deneb (or is it Vega or Vega-Lite?)
I have been using Deneb (or, being precise, Vega-Lite and Vega) for data visualization tasks since its release. This document is the result of what I have learned. I will use it as a reference for creating simple (and maybe more complex) and more advanced data visualizations inside Power BI. If you are not familiar with Deneb, you should start here: https://deneb-viz.github.io/
Using Vega and Vega-Lite means there will be some JavaScript-ish syntax. This can present a unique challenge, as it's a departure from the programming languages I'm more accustomed to. As a result, even simple tasks can become complex due to the need to effectively 'phrase' my thoughts in this new language; being precise, it's a declarative language that tells Vega-Lite and Vega what to do.
>**Don't be afraid**
We do not program our visuals. We describe how they should look 😀
In one of my latest projects, I was creating a scatter plot. One of the iterations is shown in the following image:
![[learning Deneb - a more complex scatter plot (Vega).png]]
Deneb is a very powerful custom visual. It does not provide a solution for one particular data visualization task; instead, it provides a framework that allows the creation of almost any data visualization needed to communicate insights to the users of Power BI reports. During the last few years, I encountered various requirements for specific data visualizations that the default visuals have not met - *yet*. It's possible to use #pbicorevisuals to search for Power BI's default visuals on different social media platforms, e.g., linkedin.com, and see how the core visuals have changed over time and, more importantly, what's coming. The core visuals have changed since I started learning Deneb.
Sometimes, these requirements could be described as minor additions to how information is presented; sometimes, they could only be met by creating R or Python script visuals. Sometimes, I considered creating a new custom visual. All of these custom approaches have their downsides. No matter how powerful packages like ggplot2 or seaborn are, data visualizations using a script visual (R- or Python-based visualizations) lack the interactivity (cross-filtering and the tooltip feature) the users of Power BI are used to. Creating tailor-made custom visuals requires additional tools like Visual Studio Code for developing the visual. The incredible freedom of creating a dedicated custom visual that fits the needs exactly comes with a price tag: maintenance and the integration of additional functionality, meaning new features. Besides that, I only know a few business users who have the required development skills.
Deneb is different because it provides a framework where I, the report author, can focus on the data visualization task without needing additional tools. When I say Deneb provides a framework where almost any data visualization can be created, this also implies that certain tasks cannot be solved. This is true for two reasons. The first reason, of course, is simple: my lack of knowledge - but this is what this repo is all about: sharing my learning experience. The second reason is a more complex one.
Deneb packages two JavaScript libraries used for data visualization: Vega-Lite and Vega. Specific data visualization tasks, such as mapping data to spatial aesthetics, cannot be solved with these libraries alone. Often, spatial data mapping requires communication with external services, e.g., mapping services that provide base maps. From an enterprise perspective, it is more important that Deneb is a certified visual than that it can communicate with third-party services. Besides that, it is of course possible to create spatial visualizations as long as the map is passed as a GeoJSON object.
Deneb allows specifying how the data should be visualized using JSON notation. This specification is then used to render a data visual that supports cross-filtering and tooltip capabilities out of the box, thanks to Deneb's author. While this may seem straightforward, the Vega-Lite and Vega libraries offer numerous ways to manipulate and create new data using transforms and expressions. However, expressing these manipulations in a syntax I'm still familiarizing myself with is sometimes challenging. I find the words of Alberto Ferrari and Marco Russo about DAX also apply to Vega-Lite and Vega: Using these libraries is simple but not easy.
My experience with Deneb has convinced me of its potential. I am committed to mastering Deneb as a valuable tool for creating custom visuals. The possibilities that Deneb opens up are exciting, and it will be a beneficial addition to my data visualization toolkit.
I will use this article and the git repo to constantly share my findings because, every now and then, I'm working on data visualizations that I cannot create using the Power BI core visuals.
The next chapters will describe the specification's topology, including sections like mark, encoding, and layers, as well as scales (a topological element of Vega that is not present in Vega-Lite). I'm not writing this article to explain how to use Deneb within Power BI; Deneb's official website ([introduction | Deneb (deneb-viz.github.io)](https://deneb-viz.github.io/)) is doing a great job at that. I'm writing this because I will use it as a reference for future data visualization tasks, hoping to develop/compose future visualizations faster. Figuring out things while tackling a current challenge is one thing, but tackling a future challenge is a different story. The reason: I do not remember the syntax of writing a conditional check to decide on the color of a text mark, or why sometimes it's sufficient (or better) to pass a value directly to a key and why sometimes it's necessary to "wrap" this value into this:
```
{"value": 42}
```
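The difference shows up, for example, in an encoding's ```color``` channel: a data-driven color uses ```"field"```, a constant color must be wrapped as ```"value"```, and a conditional check wraps a test around a value. A minimal sketch (the field name ```Sales``` is just an assumed example):

```
{
  "mark": {"type": "text"},
  "encoding": {
    "text": {"field": "Sales", "type": "quantitative"},
    "color": {
      "condition": {"test": "datum['Sales'] < 0", "value": "red"},
      "value": "black"
    },
    "size": {"value": 42}
  }
}
```

Here ```{"value": ...}``` sets a literal visual property, while ```{"field": ...}``` maps data to the channel.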
The coming chapters are:
+ How to use this document and what to expect in the next weeks, maybe months
+ The sample file ([[#The sample file]]) and how to get started
+ Vega or Vega-Lite ([[#Vega or Vega-Lite]])
+ The specification (it's a simple JSON document) ([[#The specification, or a simple JSON document]])
+ What is a view, and why it's important. This is about one of the building blocks for creating insightful visualizations ([[#What is a view - and why it's important]])
+ Axis, Channels, and Scales (Scales are more relevant to Vega) [[#Axes, Channels, and Scales]]
+ Parameters and Signals, from simple constant values to user interactions ([[#Parameters and Signals, from simple constant values to user interactions]])
+ Data, there is more to know than passing columns and measures to Deneb ([[#Data]])
+ Transform operations ([[#Transform]])
+ Marks, mapping data to geometric objects [[#The mark]]
+ Encodings, mapping data to visual perceivable channels, like color, size, and position, the things that are called aesthetics ([[#The encoding]])
+ Expressions ([[#Expressions, pluck, and all the other powerful functions]])
+ More complex examples ([[#More complex examples]])
+ Templates ([[#The templates]])
+ Simple things that are only simple if you know them ([[#Simple things that are only simple if you know them]])
+ Quick and dirty, the copy-and-paste section [[#Quick and dirty - the copy and paste chapter]]
+ Resources, where I look for more complex Vega and Vega-Lite implementations and get inspired
# How to use this document
You do not need to read this document from start to end. It's a reference. This means I come here and do some reading whenever I struggle with an aspect while creating/composing/developing a Deneb (Vega/Vega-Lite) visual. If I tackle a challenge and consider the underlying problem a more general one, I will add it to one of the available chapters or create a new one.
It makes sense to start with the sample file. The report pages are ordered from left to right, from very simple to more complex. Almost every report page has a textbox explaining its purpose.
# The sample file and how to get started
You will find the sample file "learning Deneb - Contoso 1M - git" in my git repository: [GitHub - tomatminceddata/learningdeneb](https://github.com/tomatminceddata/learningdeneb)
Initially, I downloaded the pbix from here: [Release Ready to use data · sql-bi/Contoso-Data-Generator-V2-Data · GitHub](https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data)
I used the "pbix-1M.7z" file and renamed it. I only added a very simple table, "simpleData."
When you read this document, I expect you have Deneb installed and know how to get to the specification (or spec) if you want to take a closer look. If not, go here: [introduction | Deneb (deneb-viz.github.io)](https://deneb-viz.github.io/)
I recommend reading these articles from the Deneb introduction:
+ Getting Started
+ Visual Editor
# Vega or Vega-Lite
I've been using Deneb, meaning Vega or Vega-Lite, for quite some time, but I still cannot answer the question, "Where should I start? Should I learn Vega-Lite or Vega?" A Vega-Lite specification is compiled/transpiled to a Vega specification. This means that concepts/features that are available in Vega-Lite do not necessarily exist in Vega. One example is Vega-Lite's ability to aggregate data inside the encoding block. The Vega-Lite spec below makes this clear:
```
{
"data": {
"name": "dataset"
},
"layer": [
{ "description": "the bars",
"mark": {
"type": "bar"
},
"encoding": {
"y": {
"field": "brand", "type": "nominal"
},
"x": {
"field": "Sales", "type": "quantitative",
"aggregate": "sum"
}
}
},
{ "description": "the points",
"mark": {
"type": "circle",
"color": "aliceblue"
},
"encoding": {
"y": {
"field": "brand", "type": "nominal"
},
"x": {
"field": "Sales", "type": "quantitative"
},
"size": {"field": "Sales", "type": "quantitative"}
}
}
]
}
```
The table dataset contains 6 rows. The mark that defines the bars uses an aggregate function, e.g., ```sum```, in its encoding block, while the points use all the rows, e.g., to show the detailed values.
The above spec creates the visualization below (you will find this on the report page "learn - a bar chart with points - a layered view (Vega-Lite)"):
![[learning Deneb - Vega or Vega-Lite - the layered view (vega-Lite).png]]
The spec of the above visual is based on a layered view: two marks (bars and points) are layered on top of each other. The next chapter [[#View composition - one of the building blocks of vega/vega-lite]] delves deeper into the composition of views, meaning a data visualization.
Of course, we can create the same visual using Vega (see the report page "l"), but our thinking has to be different.
![[learning Deneb - Vega or Vega-Lite - the layered view Vega.png]]
If you look closely, you will see some minor differences, though. These differences are:
+ title of the axes
Vega-Lite generates titles for the axes based on the encoding; even the aggregate is considered and capitalized. Vega does not generate these titles; in Vega, the axis titles are defined in the axes definition.
+ gridlines
Vega-Lite creates grid lines automatically; Vega does not.
+ the opacity of the circles
Vega-Lite adds opacity to the points; Vega does not.
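To close the first two gaps in Vega, the titles and gridlines are set explicitly in the spec's ```axes``` block. A sketch, assuming the spec contains scales named ```x``` and ```y```:

```
"axes": [
  {"orient": "bottom", "scale": "x", "title": "Sum of Sales", "grid": true},
  {"orient": "left", "scale": "y", "title": "brand"}
]
```

Setting ```"grid": true``` on an axis draws the gridlines that Vega-Lite would otherwise create automatically.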
You can make both visuals look the same, but additional "coding" is required.
Currently, I lean towards Vega because I want to gain a deeper understanding of ```scales```. Besides that, Vega-Lite offers only a subset of the transform operations that are available in Vega.
No matter which one we use, we can create unique data visualizations. We can also create great data visualizations using Vega-Lite, e.g., the visualization on the report page "working - when geom objects convey meaning" (at least, I consider it "great").
# The specification, or a simple JSON document
A JSON document appeared in this document for the first time in the previous chapter, and I can tell you: there will be more JSON.
The JavaScript libraries Vega and Vega-Lite do the heavy lifting of visualizing my data; I only provide a JSON document to these libraries. Strictly speaking, it's not even me who passes the JSON document to one of the libraries, and I also do not have to take care that the libraries are properly registered within Power BI; all of this is done by Deneb, which also adds features (commands) like "pbiFormat." Even though Deneb allows me to focus on the data visualization task, knowing about JSON documents is helpful.
The JSON document I create is called the specification; it decides whether my data visualization looks weird or renders the data and presents it in a meaningful and impactful way. The magic of creating an insightful data visualization is that the libraries understand this JSON document. For this reason, Vega and Vega-Lite can be considered declarative libraries; I declare how the visualization should look.
This chapter is about the general structure of a JSON document. You will find more information about JSON here: [What is JSON (w3schools.com)](https://www.w3schools.com/whatis/whatis_json.asp)
This document therefore focuses on what I consider necessary to create a working spec. A spec is based on records and arrays. A record is defined by one or more key:value pairs. Records of the same type form an array. Square brackets define an array, and curly braces define a record. The code below shows a spec that creates a simple chart using circle marks.
```
{
"data": {
"name": "dataset"
},
"mark": {
"type": "circle"
},
"encoding": {
"y": {
"field": "brand", "type": "nominal"
},
"x": {
"field": "Sales", "type": "quantitative"
},
"size": {"value": 200}
}
}
```
There are 3 keys at the top level of the document, and each key holds a record. These records are named:
+ data
+ mark
+ encoding
The record ```data``` references the named table dataset. By default, all the columns and measures we pass to Deneb form a single table named dataset.
The most fun are the ```mark``` and the ```encoding``` blocks. Looking at the ```encoding``` block, we will realize that this block is a record containing more records. The first record is called "y." This record contains two key:value pairs, the first pair is ```"field": "brand"```.
I will leave it here with this short introduction to JSON documents; basically, a spec is a document formed by curly braces and key:value pairs. Of course, records can contain records, which contain records, and so on.
# What is a view - and why it's important
A view is a window that allows me to see data being visualized.
> **Note**
A quote from the official Vega-Lite documentation:
"... a [single view](https://vega.github.io/vega-lite/docs/spec.html#single), which describes a view that uses a single [mark type](https://vega.github.io/vega-lite/docs/mark.html) to visualize the data."
taken from here: [Vega-Lite View Specification | Vega-Lite](https://vega.github.io/vega-lite/docs/spec.html) on 2024-09-04 6:45 PM (CEST)
This chapter is essential because composing views is at the heart of Vega-Lite and Vega, and thinking in views helps me a lot before I start implementing a data visualization.
Reading about view composition at this early stage might be confusing without knowing about Deneb, Vega, and Vega-Lite.
I will omit as many specifics as possible. Composing the view helps me to create a picture of what I want to do without knowing exactly how to do it. Of course, the more I know, the better my compositions, but this early spot is a good place for this chapter.
When I learn something new, I always try to understand the building blocks at the foundation of the "new" thing. Like a subscription, a resource, or a service principal in Azure. Like the evaluation context, extended tables, or filter propagation in DAX. Like the concept of a lakehouse in modern data warehousing. You might disagree with what I consider building blocks; if so, consider the things in this section simple notes that I stare at when I have to understand why something is not working as expected.
On my journey learning Vega / Vega-lite, I consider a view one of these building blocks. The following sections describe how I understand a view and how I created the tiny table shown in the following image. Do not be afraid; this document is not about creating an ordinary table (it's not that ordinary, though). I will use the fancy stacked bar chart most of the time (to be honest, one of my favorite data visualization types).

The image above was inspired by this question on stackoverflow: https://stackoverflow.com/questions/75975968/table-visual-with-customized-grid-layout-in-deneb-vega-lite/76277789?noredirect=1#comment134512446_76277789
Here you can have a look at the [vega-lite specification](https://vega.github.io/editor/#/gist/db3a824af2ef0b6e6204d1a3ee56d6f4/TableAndVerticalConcat.json)
When looking at a data visualization, I'm often one of two personas: either a reader who wants to learn from the visualization and decide about the next actions, or I'm a creator, meaning I create a data viz that others will read. When I'm a creator and looking at a visualization that I did not make myself, I often wonder how the visualization was created. Sometimes I consider this a training or a mental exercise - decomposing a data visualization into the ingredients I know, meaning Vega-Lite or Vega.
>**Note**
The most straightforward view is a single-view data visualization.
The official documentation about view composition: [Composing Layered & Multi-view Plots | Vega-Lite](https://vega.github.io/vega-lite/docs/composition.html)
## The single-view data visualization
Thinking Vega/Vega-Lite, the specification of a single view contains only one mark without any view-composing functions like ```facet```, ```repeat```, or ```hconcat```. If you have yet to learn what these functions are good for, bear with me. I will explain these functions (at least two of them) in one of the following sections.
The following gist provides some simple data and a spec. The data and the spec become more "complex" as you read along. Please be aware that the specs do not represent best practice advice in data visualization. The following sections cover only the technical aspects of creating data visualizations or composing a view.
[The gist points you to a specification of a single view stacked bar chart:](https://vega.github.io/editor/#/url/vega-lite/N4IgJghgLhIFygG4QDYFcCmBneBtUATgPYDuA5sWgA7wgCSIANCFhmQLYYB2UtAwkxDIU8AEwAGSVIC+jQqQpFqtBs1YduvOCABCg4fADsUmXJDFylGtrqqWbTj1oBBfangAWE5NnzLS63o7dUctXTcROABGb3FfcwUrFWCHTVoAEQixWOkAXVkQdggCAGt4UCgATyoMWgAjYpAC7gBjIjAASy4ycpBK3oAzDowUMFoLRWVmKpraLiJ2LvdmCAAPDpwEECgOqBRauC40FBRpAtXB4dHaA2nqg5AARzQIHl3oDsRalfXNit39vAjiczsw2igiARLiMxtoQmk7rNtPNFlx3GdpEA)
And, of course, the image:
<img width="184" alt="view composition - single view" src="https://github.com/tomatminceddata/learningdeneb/assets/29025119/01f5f438-cd2e-4944-b31d-9d43316c0c69">
If you have a closer look at the specification, you will see that there is nothing special about it; it's simply this:
```
{
"data": { ... },
"mark": { ... },
"encoding": { ... }
}
```
When you use an analytical tool with built-in data visualization capabilities (I use Power BI), it seems very easy to show data labels for the segments and the total value of the entire stack. I consider I and II stacks, and A, B, C, and D segments.
This is not that easy when using Vega/Vega-Lite. The reason is also simple: there is no UI where we can turn on the setting "data labels." Instead, we have to start thinking in compositions of marks forming a data visualization. For this reason, we must understand what a mark is. Here is my simple definition of a mark:
>**Note**
A mark is a visual representation of data as a geometric object placed on a 2-dimensional plane. It's important to note that a data label is a mark of type text.
The most commonly used marks are rectangles and points. Libraries like Vega/Vega-lite make visualizing bar charts like the one above easy, as we do not have to consider how to draw a rectangle, and even more importantly, we do not have to consider where exactly we want the rectangle drawn. Creating a bar chart using Vega is more complex, as we use the primitive mark ```rect``` to visualize our data as a bar chart.
However, we will also realize that we must create the data labels ourselves, as no UI presents us with myriads of properties we can use. Instead, we must know that a data label is a second mark of type text that uses the same dataset. A second mark comes with its own encodings but is "stacked" on the mark that defines the bars. Encodings are the mapping of data to visual properties of a mark. The mark bar, for example, has visual properties like width, height, color, and some more that are less obvious than the former ones. It is beneficial to understand the available marks and what data visualizations can be created with each of them, as very often, data visualizations that are aesthetically very appealing are a composition of multiple marks.
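To make this concrete, here is a minimal sketch of a bar mark with a second text mark acting as data labels, reusing the ```brand```/```Sales``` columns from the earlier specs (label formatting and positioning fine-tuning are omitted):

```
{
  "data": {"name": "dataset"},
  "layer": [
    {
      "mark": {"type": "bar"},
      "encoding": {
        "y": {"field": "brand", "type": "nominal"},
        "x": {"field": "Sales", "type": "quantitative", "aggregate": "sum"}
      }
    },
    {
      "mark": {"type": "text", "align": "left", "dx": 3},
      "encoding": {
        "y": {"field": "brand", "type": "nominal"},
        "x": {"field": "Sales", "type": "quantitative", "aggregate": "sum"},
        "text": {"field": "Sales", "type": "quantitative", "aggregate": "sum"}
      }
    }
  ]
}
```

The text mark repeats the positional encodings of the bar mark and adds the ```text``` channel; ```dx``` nudges the label away from the end of the bar.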
## Multi-layer views
Adding data labels to a "simple" bar chart transforms the single-view data visualization into a multi-layered data visualization.

The above image shows how multiple layers (to be precise, three) are stacked on top of each other. These layers are:
1. the bar chart (mark - type: bar)
2. the data labels of the segments (mark - type: text)
3. the data labels of the stack (mark - type: text)
Creating multi-layer data visualizations is simple: wrap the ```layer``` function around multiple groups of marks and encodings. Each of these mark/encoding combinations represents a single view, like so:
```
{
"data": { ... },
"layer: [
{
"mark": { ... },
"encoding": { ... }
},
{
... layer 2 ...
},
{
... layer n ...
}
]
}
```
Creating the above data visualization requires more effort, but it is easy once understood. Here is [the gist to the stacked bar chart with data labels](https://vega.github.io/editor/#/url/vega-lite/N4IgJghgLhIFygG4QDYFcCmBneBtUATgPYDuA5sWgA7wgCSIANCFhmQLYYB2UtAwkxDIU8AEwAGSVIC+jQqQpFqtBs1YduvOCABCg4fADsUmXJDFylGtrqqWbTj1oBBfangAWE5NnzLS63o7dUctXTcROABGb3FfcwUrFWCHTVoAEQixWOkAXVkQKAIILiwAMyICdjxQEgBLLjBSGpAiQIsAfS40dgAjDAJBCBxtCzpGjAAPHQBPAHErZ0aAZVSnPOZFal6ZvAT-ZTU13nyzACsiBogyCjZoDBayuowUMFoDZjbaLB6hkfsNDxlr8NiAtlQdnsLODBCE0qdQBcrjcCHcoA84PgQE8Xm9tB9WoEftVmMNaODgdVQeDIZj9jCEdi6ih0YNtJAoD1cAByMYTabzRYrY7c3IAXglURA8XqjWamNAX1GpC6PX6g1J-z5YCmswWAWluU2VlpuHpSSNLEqWixONe32OgkqOrZIGGAGNuGAGmRDTKGk0SC0lSxfsw7XiAaFKX9aAQ0FwuD7KbNVoDeNSTbs6dCLWprY9nvblSRxjqBfrDq0CC7aB6vT6-WZ3ah3WgUPdaByubyE0muGQUzM06FRQACAC0Y+77B5cKBPVFse0kxHmh0wweoI7MwGLXYEAIAGt4KAoDMqBiQL1D9LmNx3URvQPTyBJq+I+93Mxz5faABHNASigOoYBAxAHlJSY6hGM9QJQDFuhQFBpAKbNQE-EsYR-C8ry4Ih2CuEQoJg18QKgBD4CQlCCkfFBKg-ItI3nXgcL-bR8MIrhvxABCyC9Mj4MQ9sUE+AhnicbReiIKAoAI6VUPiA9jzI3DaHRSZWJAOiGKkjt3RPUkUDqMguFoBCyi071XQICjBDAd84FEAoHyfRsEDfRjcVoVdjg3VhBF-K9AOA0DoDqCC7xAdCmW8rCkjYvCCKIqKNK0DCmK-YjCjU7QQp4MLwMg7FKgPMIADpxBwRSzGUk8PKC9Spi01ATLM7QLKsxyAFYXK4R9n19DzHIyuKwSsGNEoAoCCrAiKt2YGLMPNA0po45LuJEAo0q84txoCSacvYkB8vI8LIvDUroFoSrqtQy1USwIgUEijysBbSiPJ010AwwS8JnWVCgA).
The function ```layer``` stacks multiple single views on top of each other.
> **Vega**
> There is no "layer"-section in Vega; instead, different marks are specified as records in an array. The marks inside an array are automatically layered on top of each other.
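A Vega spec therefore sketches the same layering like this (the ```encode``` blocks are elided, and the data name ```dataset``` follows Deneb's default):

```
"marks": [
  {"type": "rect", "from": {"data": "dataset"}, "encode": { ... }},
  {"type": "text", "from": {"data": "dataset"}, "encode": { ... }}
]
```

The order of the records in the ```marks``` array determines the drawing order: later marks are drawn on top of earlier ones.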
You may wonder why so many data transformations are needed to create data labels for a stacked bar chart. The reason: we need to "calculate" the coordinates of the text marks (the data labels) to position them correctly. For this, I like the ```joinaggregate``` operator more than the ```stack``` operator (see the [[#Transform]] chapter for more details) because ```joinaggregate``` is more versatile. Nevertheless, it is simpler to use the ```stack``` transform (see the report page "learn - the text mark (Vega-Lite)"). There is more about label placement in the chapter [[#Some text, it's not just colored rectangles, it's also about text and numbers]].
If you are also wondering why I do not use the stack transform operation inside the encoding: I use transforms because I want the encodings to be as simple as possible.
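As a sketch, a ```joinaggregate``` transform can compute a stack total that a text mark then uses for the total labels (the grouping column ```month``` is just an assumed example):

```
"transform": [
  {
    "joinaggregate": [{"op": "sum", "field": "Sales", "as": "stackTotal"}],
    "groupby": ["month"]
  }
]
```

Unlike ```aggregate```, ```joinaggregate``` keeps all the detail rows and appends the aggregated value (```stackTotal```) to each of them, which is exactly what a label mark layered on detail data needs.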
Please be aware that this is just a straightforward example of multi-layered data visualizations. If you combine bars with ticks, you can easily create a bullet chart; see [this gist](https://vega.github.io/editor/#/url/vega-lite/N4IgJghgLhIFygG4QDYFcCmBneBtUA5gE4D2aADvCAJIgA0IscIAwvSAEbwBMADPwIZQe3AKwD+AXzqFSFKrQZNW7LnADsE-kJHiJ02WUrNFjKmwZqALFt4643MVoMhiRhe2UARVfACMtvaOegIubvImnlQ+ljyBIMIOTvoyrnLGNFHMMZzwAGzxicHOqeEZ1KbKAKK+cAAchf51IVKl6QqVVDWxcDZa9n7NJYYRNJ3M3bkajXCDLbySALrSCUQQAHZYAGYkRAC2eKAA7gCW62AkR4cgJMa4IKRHAPrraHscGEQgi0o4cPePajnDAADwAQgBPADi6QAgucAMoYAh7DDrYTLH5pIwcCF4bHyJTfFwAKxIZwgBGIyOgGGuWxOGBQYHxXCxt3xWDe31+nORqPRCO5mIYZVx+Ngovay1SZIpVKINKgdP+oAZTJZ-1y7LuIC5ByxED+9zKQoNSylOLxWrKxNSDJQyq+zEgUDeuAA5IDgeDoXDEfy0VAPYsALzhvwgFync6Xa4crWPF5vD5fQ3Gh6XIFgUGQmHuEUE8jim3ShhYXaJfAgdXMqiSm5EHPOxhYADGaLAZwIUZlxzOFyuqpuGX17Frmr1gcF3N5zCIaHW627ZshSJRQd7looJZNZb1lfpjLr86zPrz7QYu2b9fbne7vZcbdQbbQKFpVFd7q9i+X6wIq4QuuArBosAAEAC0YFfnsnpYNOUBmiGnh-CAILAUGYJGnSFogG2JDrAyPYIHhBHPokoBYOQEBtg+cALCsAAW+HrOR1xQCcUAoCqCSMScOAMHsEBEAA1vAoBQBC5A8cqILCAwYAgv4CwMGi+Fdv+4kgNaarHpOtpCFJPHrCQewUignggvxWkcVxKqvCgKCSCsskUTWelUKas41rsQmJCAAB03A4M5LjvhCnzsZx3FUFAjHQOwQmiTZRlUBwwnsIxGAnAQjFuel5zwLwAUAJzOaprEkBpxGgDp7kap5l4JKlzAmWZ6yoJZ1lwA5TkMEpJETlQ8EbjOByGdJVAAI5oBstnQCciB0koVl-BJ0X2W+KAMDs+wJcwQUhSs+EoLsWlDcwDbcQQnY2Rt8C9VeRCMuiaUkFAUCmVGoWpElYkkZJk3MK57AnWdzBHHxyrsF2LaOl8SgoDl6xUNxWzyeAA3cCsalVXRtXnR5zAGc1QMgG15lRv1hMNcw6EIVh8HsIDPEzXNnELUtVMJKCbkXVOo2Id5u1+VQh3fS4f0pWTIMMGDLYcO+bZiQpJxw1ACOMEjBAo8waMY4pPA45V1VaXV-MkyzVAUx1FkrANum00WZrMy1IBs+iHMcVzLm8zTJ7O8Lvn7YFwUS79wn-RJbsccroMkKdCtKyrnAbGAWAnAAXiqADMdi8ScyvrNgfzYxV6n49p-v6U1VutaZlP29XsWu2THvzd7OHOX2CT3cwFaonF3bQScWxbJ8m6CZH0syX7CkDQEcsJ+DnDJ9zuOmyR5tE0WrfGQ3ttdX8vW+3JWnIOgPEBdfEtYoqFYoFzJHXbdJHy1QA4YNJwKvaFQA).
Another example of a multilayered composition is the creation of a lollipop visualization, which is often used to visualize differences. This type of data visualization combines the rule and point mark. You will see a lollipop visualization on the report page "working - when geom objects convey meaning."
## Multi-view compositions
Functions like ```hconcat``` or ```vconcat``` combine multiple views (no matter if it's a single-layer or a multi-layer view) into one data visualization, either horizontally (hconcat) or vertically (vconcat) aligned. Functions like facet or repeat create a trellis of data visualizations. If you are more familiar with Power BI: these two functions do what small multiples do. The difference between facet and repeat is this:
+ facet - facet filters the dataset by the specified definition
+ repeat - repeat provides the entire dataset for each visual
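A minimal ```repeat``` sketch that renders one bar chart per listed field (the column ```Quantity``` is just an assumed example next to ```Sales```):

```
{
  "data": {"name": "dataset"},
  "repeat": {"column": ["Sales", "Quantity"]},
  "spec": {
    "mark": {"type": "bar"},
    "encoding": {
      "y": {"field": "brand", "type": "nominal"},
      "x": {"field": {"repeat": "column"}, "type": "quantitative", "aggregate": "sum"}
    }
  }
}
```

Each repeated chart receives the entire dataset; only the field referenced via ```{"repeat": "column"}``` changes, whereas ```facet``` would instead split the rows of the dataset across the charts.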
The next image shows an example of a multi-view composition:
<img width="343" alt="view composition - multi-view" src="https://github.com/tomatminceddata/learningdeneb/assets/29025119/905339c7-90c3-4c7a-80ef-0c9a72d92f27">
The [link to the gist](https://vega.github.io/editor/#/url/vega-lite/N4IgJghgLhIFygG4QDYFcCmBneBtUAZhAMYZQDmATgPZoAO8IAglgJawA0INA7lbQzggAkiC5YM5ALYYAdlEYBhMSGQp4AJgAMO3VzV1KGRPADMu3QF8OhEmX71GLdit4PBIlROlyFQgEIqavAA7BY6+qiGxvAAHOFa1rakFDSOQs6c3NR8aR7CouKSMvJOQajwACwJkSjRJnDm4UkgRCnuTmxZbnmMBV7FvoyBtfAAjDWqUUYNE802rXapAp0uXD0rQv1FPqVCACLl6nDa4bX1mgmWALrWIFCUELJYBNSUUnigPKyyYDmfIGoHl4AH1ZGgpAAjDCUFQQHBCXjCX4YAAe-gAngBxPJMX4AZUGpRuXHckIxeEW7V66xyHR2JQUtwWACtqD8IOQqJJoBgAQRWBgUGBGMEuEDGFgIXCESBvIz8dKSSAyRS4LgqfYadlcisGUNmaA2RyuUZyLz+YLhaKKuKPFKPlx4Yx3IqPsrVZS2lq9TqOoaQMbZJzueaoHz1YQrSKhAYZioJUIHTKXXk3QAFeMevLkr1Lel+3oBgUocOwoSQKAQ3AAciRKPR2NxBKJUBr1wAvF2xiAWt9fv9I4DgTkwRDobCnbL62A0ZicStswJc+rNctHLTdY5ruI3n58K1o5LWwnKLPyyB4aRfj9yL2A-2-jwAYm5dKuAKhTG5a23SnEWgsiyLebqYoSuwKEu9Arhq3rrgwm7+rulD7lGX6MDOc5Nr6bznowV5yGAt73i0xCoMQaAoBaFbQNWdaAcBsjkKBGLgYy7YAAQALQcZWdHyr4brtv+IComxvj+PCfKkeRlHUeAtFSLWroQpxPF8UpNYqVImbGMJU6MMQAAWTzkBgukmMqRnENQshkahIBURiMIAlArBQCgEYgIoaCUEY8gcc5ECTiARkYKw5BGX4ACsWhcFIwUANbwKAUAYnQXmQsFvZcHINlEUxKWiUVn7WrGtr3OlXkAI5oE8bkwG5iB8k6qKsAiqXuZ58DgigKCWHcapoWVha+mlGWMLI1BSBy6ite1RUNd1cC9f1dw2SgbwlUeSYnlw41eVNM3BnNjmSIRi1dRGq3ipQgp7CAkLUFAUDTb2A0tAllDJQglUTUI4aogoXAbVtQiQlRxDJU6KARbIjCeQQwPgKwF4oadYCopodx5dQBV3r9WO-aV35ia2kkSCoB2MLV9XudArDNTlIBDYe6GInS2rU0IR2zczgN+MN35in9NV1fI9NNS1rRvAlfggAAdFoOAfQsX0-alVWMALcKw+Q8NCIjyOY-A0U43ZePEYT23syqabvqLNPiw1DNM4NNsjRsG6Ozz0183cOvEztdsCH++1a0ItMS41jPS687zQIwSsqwNAZOS5Q5hRFUWm3FIDfGAUBGeMET3FdSgmUxGAcdQzWUBx9SsLQWCBRg2XxUli0R9wlHSblFv40VrMkxhnNjd3vMnXCbUIqtdxE0LhmV2ZFlU93Ucu1L08Lb9S3XZRa0g9Qm3lqANk3m5tmLdg8sabWxmmeZMycQAPBxWhHJgGEYCKdxqF-4NIbJVVqAdWXd-ogDoOyUo4gTIQOIKjYg3UPysD6j-eADxMDm3ylbUAw9g5ewQj7EAk8Kjzw9t+B+VdV7hwgRvSWsdmag1PiAc+RFL4G1SjfRgd8axUJXs-a4HE34f1qAA7g6C-6oHERDEgwDU43EsEAA).
The specification slowly becomes more complex :-)
```
{
"data": { "values": [...]},
"transform": [{...}, {...}]
"hconcat": [
{"title": {"text": "current year", "anchor": "middle", "align": "center"},
"facet": {"row": {"field": "facetgroup",
"header": {"labelAlign": "left", "labelAngle": 0, "labelPadding": 10}
}
},
"spec": {
"layer": [
{
"mark": {"type": "bar"},
"encoding": {...}
},
{
"mark": {"type": "text", "color": "black", ...},
"encoding": { ... }
},
{
"mark": {"type": "text", "align": "left", "dx": 5},
"encoding": { ... }
}
]
}
},
{"title": "change over the previous year",
"facet": {"row": {"field": "facetgroup", "header": null}},
"spec": {
"layer": [
{
"mark": {"type": "rule"},
"encoding": { ... }
},
{
"mark": {"type": "point", "filled": true},
"encoding": { ... }
}
]
}
}
]
}
```
And, of course, the "final" iteration adds the facet operator to the specification. The image:
<img width="421" alt="view composition - multi-view faceted" src="https://github.com/tomatminceddata/learningdeneb/assets/29025119/0835d89e-a40f-45e8-bd38-d4b961ee1dd6">
[The gist](https://vega.github.io/editor/#/url/vega-lite/N4IgJghgLhIFygG4QDYFcCmBneBtUAZhAMYZQDmATgPZoAO8IAglgJawA0INA7lbQzggAkiC5YM5ALYYAdlEYBhMSGQp4AJgAMO3VzV1KGRPADMu3QF8OhEmX71GLdit4PBIlROlyFQgEIqavAA7BY6+qiGxvAAHOFa1rakFDSOQs6c3NR8aR7CouKSMvJOQajwACwJkSjRJnDm4UkgRCnuTmxZbnmMBV7FvoyBtfAAjDWqUUYNE802rXapAp0uXD0rQv1FPqVCACLl6nDa4bX1mgktbfa9QgCqAMquOR1bA7t+IMqjjZMGMzMVwWN2W6RATxeuU2nh2JS+IymxzmZyRFzg8XmyVuMMh61edxEhRA3nhZV+1VRAJifyxi3ahLx2Wh4O2JMGexAiOCJ3+0xpKKsIKWbwhz3xLPyxNJQwOR0uVP5DVOFksAF1rCAoJQILIsARqJQpHhQDxWLIwDkTSBqB5eAB9WRoKQAIwwlBUEBwQl4wgtGAAHv4AJ4AcTyTAtjw5CnVXHcLuDeHpOMcEo6cKGGoWACtqOaIOQqJJoBhrQRWBgUGBGMEuLbGFhnZ7vezPo9m3GQAmk3BcCmwQx071M6Vs6A8wWi0ZyKXy5Xq7WKvWPE3jVwvYx3B3jV2e8nQRnmR1xyBJ7JC8XZ1Ay33CAua0JqSYV43mxvW9vnQAFQF7vKJgeIojsevSnhWKA3h6QiQFAzq4AA5L6-pBmGEZRjGCFqgAvLhYwgC0ZoWlad42naOSOs6boeh+jDIWAgYhuGKz-gIgF9gOR4bI4ariIafj4K0D6NjGKiGgx0EgF6pAWua5AEaeRGWjw1oNkIa4qBWVaPm28I7i2dFoLIshyTuIbRp8CnxgBvb9oeIHcQwvEkvx87aXROR+gxqHMWmNqUBJjDSXIYByQpLTEKgxBoCgc4wdA8FIUZJmyOQZnBhZ8JYQABAAtNlsGJTK8g7lhBlCAGmW+P4XplhFUUxXF4AJVIiFflIOX5YVrUIe1v7GGVtFCMQAAWurkBg-UmF2I3ENQsiRQJoBQKwUAoLey2Bl80WUEY8jZcGGAQDRUkLSNhqMFIrBgGA62eigrDkLIjAyVBBFcKC8CgLwX1Ce5Qj2SsXAjUdgUICAsVuigTAPU9jDrQQChcJDVaRuQd1wFoyMQFD34QDdYVwBMlgk+IdAYMQv2xYd0GCVIx0ANa-VAwbk4wLrHe9IByHNoWpb9Aa-Vpi5PsuWqs7eIAAI5oLqK0wCtiBlhuAasN6y2rRjTooCgH2GvTXwAHQaDgpMgL297-aBQPi2zQiyNQV0XuoKtq8zmu3trKBm3NKAXeDws6cVSO25LDtO2L60TRa7trZ7MW6-5laci61BQFAjsESTLT05QTPgyzdtaltKi+-7IAurFxBMxusPPUICMh6FkmUGtKhgILJyajz1B8-J4Od5bIsgJVMY1RIKiF5LMty6t0CsErXMW39w+OZPEuMOHBbqJqN4Bn4Q86XWoeMDP8hz4ryutPr0CMIbWim6ToC5-ny0b0Ie8h6gj31xDGCI+3TuABWbuC1e6E1AIPFeOl2rryLmfeW89F6amXoHDyko4Fh0dtvLmn8hbCSELArgU9T6y3PgrBeV8DRGlvkIY2j91RPy1B7F6Y1UoYGytQJWlBspQBBtleorBaBYAOkdE6n1wY-QDgQziIEQb43dPAL2ZssDk0puDamijSIv2Zu-bgMU6pcB7n3X6qCZFr1dt6ZRXAoFoOGmwiaU1MGkNnhQxeljY5awTnrGhRsTZZy4GXaCoA5qyRWvNZm2AvjdUQqNcak0Zg5QADzZS0EcTAdEMA1k1GoDJQhK4kCZtnBYOiC56LoPmUoH1WA6yyfAbUmBQG8wgebfBVsLFSVVlYhOmpbEyLiewpxxC9EIIvpQz0XTPHxx1j7agftgkgFCaFcJ9dNpYGiS1WJ
DiEkDTVNlFJaTah5O4HUnJqBjkFOrlnRhjCgA).
## View compositions inside Deneb and Power BI
When using a single or multi-layer view with Vega-Lite or Vega, we are spoiled because the visual adapts when it's resized on the Power BI canvas. Vega and Vega-Lite do not currently support auto-resizing for multi-view data visualizations. This is not an issue, as the width/height can easily be adapted to the available space. The snippet below shows how to set the width for two horizontally aligned views:
```
{
  "data": { ... },
  "transform": [ ... ],
  "hconcat": [
    {
      "width": 100,
      ...
    },
    {
      "width": 200,
      ...
    }
  ]
}
```
## How do I create multi-view data visualizations
### Ideation of the result
When I create Deneb visuals, it often starts with a hand-drawn sketch from one of my colleagues; when it becomes more complex, I tend to make a concept layout using PowerPoint. The following image shows the conceptual design of the multi-view composition of two horizontally aligned facets of multi-layer views:
<img width="350" alt="view composition - conceptual drawing" src="https://github.com/tomatminceddata/learningdeneb/assets/29025119/5ea69296-d271-4721-8bdc-bc6bdd14463f">
These conceptual drawings help me a lot to stay focused, but I also use these conceptual drawings to communicate with my colleagues.
### Ideation of a complex data visualization
I need a more detailed visualization of the view composition when the expected result is more complex. Of course, this can also be used for documentation purposes.
If you look at the spec of the Deneb visual on the report page "working - when geom objects convey meaning" of the sample file "learning Deneb - Contoso 1M.pbix", you will find a complex view composition that consists of 3 columns (this is specified explicitly) and three rows. The rows are created automatically, though: a visualization is simply placed in a new row as soon as the 3rd column is "filled."
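Structurally, such a composition can be sketched like this (a hedged sketch; the bodies of the individual views are elided, the ```description``` texts are made up):
```
{
  "data": {"name": "dataset"},
  "columns": 3,
  "concat": [
    {"description": "row 1, column 1", ...},
    {"description": "row 1, column 2", ...},
    {"description": "row 1, column 3", ...},
    {"description": "row 2, column 1", ...},
    ...
  ]
}
```
Only the number of columns is specified; the wrapping into rows happens automatically.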
This is the Excel sheet that I used to stay focused:
![[learning Deneb - view composition - IBCS performance.png]]
# The general structure of a Vega-Lite specification
A single-view specification for Vega-Lite looks like this:
```
{
  // Properties for top-level specifications (e.g., standalone single view specifications)
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "background": ...,
  "padding": ...,
  "autosize": ...,
  "config": ...,
  "usermeta": ...,

  // Properties for any specifications
  "title": ...,
  "name": ...,
  "description": ...,
  "data": {"name": "dataset"},
  "transform": ...,

  // Properties for any single view specifications
  "width": ...,
  "height": ...,
  "mark": ...,
  "encoding": {
    "x": {
      "field": ...,
      "type": ...,
      ...
    },
    "y": ...,
    "color": ...,
    ...
  }
}
```
I took the above "snippet" from here: [Vega-Lite View Specification | Vega-Lite](https://vega.github.io/vega-lite/docs/spec.html#single)
I only adapted the data part: because I'm using Deneb inside Power BI, there is a table called dataset available right from the beginning.
It looks as if even a simple spec is a lot, but we do not need that much; the above can be reduced to this:
```
{
  "data": {"name": "dataset"},
  "layer": [
    {
      "description": "the bars",
      "title": "the first Deneb visual",
      "mark": {
        "description": "bar, definition",
        "type": "bar"
      },
      "encoding": {
        "y": {"field": "brand", "type": "nominal"},
        "x": {"field": "Sales", "type": "quantitative"},
        "color": {
          "field": "gender", "type": "nominal",
          "scale": {"scheme": "category10"}
        }
      }
    }
  ]
}
```
And because the above spec is a "simple" one, I can remove even more characters: I do not need ```description```, and I do not need the ```layer``` object because there are no layers. The "simple" spec reflects how I started the development of the data visualization:
1. I have data
2. I decide what geometrical object I want to use. In the above spec, I decided to use a mark of type bar: ```"type": "bar"```.
3. I map data to the channels in the ```encoding``` block
But then, the simple rectangles (left) evolve to a more complex visualization (right):
![[learning Deneb - view composition - the evolution of a visualization.png]]
I'm done with the ```data``` and the ```transform``` objects quite early. I spend most of the time "fine-tuning" the encodings and the marks. Even though Deneb allows me to collapse the different blocks ("data" and "transform"), scrolling to the block I want to work on is a hassle. For this reason, it helps to remember that the spec is not a program that runs from top to bottom. Instead, it's a document where the order of the blocks does not matter; it only needs to be complete.
The spec of the right visual does not follow the order of the "official" link. Instead, it looks like this (an image):
![[learning Deneb - view composition - a complex spec.png]]
The ```layer``` block contains all the marks and their specific encodings. I moved all other blocks to the bottom. This is more than convenient, as it reduces errors caused by editing "code" in the wrong block.
When I share specs with colleagues who are not familiar with Deneb, I put the ```params``` block at the top; then they do not have to scroll to the end of the spec to find the values they can change ;-)
# Axes, Channels, and Scales
Besides view composition, axes, channels, and scales are among the founding concepts of Vega and Vega-Lite. This chapter touches on these concepts. Nevertheless, when you start creating visuals using Deneb, you may not need to know anything about them because Vega-Lite takes care of them. There is another reason these three aspects stay unnoticed for quite some time: a more profound understanding only becomes necessary when we create multi-view visualizations, whether layered or horizontally arranged using one of the view composition operators. As soon as we start making a layered view or multi-view composition, these aspects will surface, and sometimes the result does not meet our expectations 😀
Your first visual might look like this:
![[learning Deneb - Axes, Scales, Channels - first visual.png]]
The visual comes from the report page "**learn - axis, channels, and scales (Vega-Lite)**"
## Axes
If you like the visual above, that's perfect; we're done.
But on a second look, you will realize that the sorting of the items on the y-axis is a little odd: the brands appear to be sorted in descending order (the items on the y-axis are brands), with the s-brand at the top of the y-axis.
This is because the origin is at the top of the view. Once you notice this, the ordering makes sense again.
Still, after all this time, I consider this odd. Nevertheless, Power BI does it the same way:
![[learning Deneb - Axes, Scales, Channels - Default ordering.png]]
Vega-Lite, Vega, and Power BI treat nominal and numerical values differently; at least, this is how it appears 😉 I read a coordinate system from bottom to top and from left to right; I expect smaller values at the bottom and the left. I'm used to the Cartesian system, this x/y axes thingy ([Cartesian coordinate system - Wikipedia](https://en.wikipedia.org/wiki/Cartesian_coordinate_system)). By the way, Excel and all other spreadsheets do it the same way: "A" is at the top and "Z" is at the bottom, at least with the cultural settings of my machine.
Moving the s-brand to the bottom and "this" to the top is simple. Adding the snippet below to the y-axis encoding will *fix* this.
```
"y": {
...,
"sort": "-y"
}
```
Most likely, the items on the y-axis will be sorted by a numeric value like "Sales," no matter if it's a stacked bar chart or a simple one. The following snippet does this:
```
"y": {"field": "brand", "type": "nominal",
"sort": {"field": "Sales", "order": "descending", "op": "sum"}
}
```
We are not done yet, because we want to add a little triangle to the visual that marks the 50% point of the complete stack. The following image shows what I'm talking about:
![[learning Deneb - Axes, Scales, Channels - where we want to go.png]]
Of course, sorting the segments by value makes sense; it is helpful to have the values ordered descending when looking for the 50% mark. The triangle starts to unfold its power when there are more than two segments.
But here, it's not about the visual power of stacked bars and triangles; it's about layering two single visuals on top of each other.
### Layered visuals and shared axes
When implementing the triangle marker, the spec might look like the image below (it's an image because I want to highlight the structure):
![[learning Deneb - view composition - layered view spec structure.png]]
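In text form, the structure might be sketched like this (a hedged sketch: the x-position encoding of the triangle at the 50% point is omitted, and the field names are taken from the earlier sample data):
```
{
  "data": {"name": "dataset"},
  "layer": [
    {
      "mark": {"type": "bar"},
      "encoding": {
        "y": {
          "field": "brand", "type": "nominal",
          "sort": {"field": "Sales", "order": "descending", "op": "sum"}
        },
        "x": {"field": "Sales", "type": "quantitative", "aggregate": "sum"},
        "color": {"field": "gender", "type": "nominal"}
      }
    },
    {
      "mark": {"type": "point", "shape": "triangle-down"},
      "encoding": {
        "y": {"field": "brand", "type": "nominal"},
        "x": { ... }
      }
    }
  ]
}
```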
But when adding the below encoding to the point layer, it looks like the sorting of the bars got lost:
![[learning Deneb - Axes, Scales, Channels - almost non-visible triangle.png]]
The reason is simple: by default, **common properties of marks (or common encoding channels) will be shared between layers**.
> **Note**
Encoding channels (axes are considered encoding channels) will be shared between layers by default. This sharing happens between layers of a multi-layered visualization and multiple single/multi-layer visualizations in a complex visual composition, e.g., when concat is used.
Officially, this sharing is called "union."
Generally speaking, unioned axes, channels, and scales are a good thing, but sometimes they catch me **off guard**.
It's possible to interfere with the union of encoding channels: if channels are not shared (unioned), they are independent. This is achieved using the ```resolve``` object; see the official documentation: [Scale and Guide Resolution | Vega-Lite](https://vega.github.io/vega-lite/docs/resolve.html)
Before getting the lost sorting back, it's essential to "visualize" the multiple y-axes. This can be done by adding the below snippet to the code:
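As text, the resolve block that makes the y-axis independent looks like this (a sketch following the resolve style used later in this document):
```
"resolve": {
  "axis": {
    "y": "independent"
  }
}
```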
![[learning Deneb - Axes, Scales, Channels - independent y axis.png]]
Now, a second y-axis appears. No matter how many layers there are, only two axes are visualized, even if three or more layers form the visualization. The ```resolve``` object is placed at the same level as (as a sibling of) the ```layer``` object. Depending on the complexity of the view composition, there can be more than one ```resolve``` object in the spec; remember: records are nested in records that are nested in records ...
Fixing the "sort" issue is simple. Adding the same sorting of the y-axis of the bar mark to the encoding of the point mark immediately fixes the "issue." Nevertheless, creating a general encoding section (outside the individual mark blocks) is the better solution.
The image below shows how using a general encoding block brings back the sorting of the y-axis. The spec still contains the resolve object at the end of the spec, but now the y-axis is set to shared (otherwise, there would be two y-axis again). The spec also includes a minor adjustment that makes triangles more visible by reducing the height of the bars (see the mark definition of the bars):
![[learning Deneb - Axes, Scales, Channels - triangles with default coliring.png]]
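The structure just described (a general encoding block outside the individual mark blocks, a reduced bar height, and a shared y-axis) can be sketched like this; it is a sketch using the field names of the earlier sample data:
```
{
  "data": {"name": "dataset"},
  "encoding": {
    "y": {
      "field": "brand", "type": "nominal",
      "sort": {"field": "Sales", "order": "descending", "op": "sum"}
    }
  },
  "layer": [
    {
      "mark": {"type": "bar", "height": {"band": 0.5}},
      "encoding": {
        "x": {"field": "Sales", "type": "quantitative", "aggregate": "sum"},
        "color": {"field": "gender", "type": "nominal"}
      }
    },
    {
      "mark": {"type": "point", "shape": "triangle-down"},
      "encoding": {
        "x": { ... }
      }
    }
  ],
  "resolve": {"axis": {"y": "shared"}}
}
```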
## Channels
Looking at the above visual, it has to be noted that the triangle is not readable "enough" when it sits on a blue segment. This can be changed by adding the following snippet to the encoding block of the ```point``` mark:
```
"color": {"datum": "lightgray"}
```
But then the visual is freakin' out:
![[learning Deneb - Axes, Scales, Channels - shared color channel.png]]
The bars' coloring seems lost, and the light gray triangle has completely freaked out and become orange.
Neither is the coloring of the bars lost, nor have the triangles gone mad. Instead, everything works as designed (no, it's not a bad design; it's a lack of understanding on my side). What happens with the axes (shared between layers) occurs with all the encoding channels (see here for more: [[#The encoding]]). Color is one of these encoding channels. A color scheme (to be precise: category10) is used to color the bars.
Using ```datum``` adds a "constant" value to the datastream, which will be colored. The value "lightgray" is treated precisely the same as the values "female" and "male"; for this reason, there is one legend with three items and, consequently, three colors.
Once again, there are multiple solutions to get everything in order again.
### Independent color channels
Using the ```resolve``` object to define the color channel as independent brings back the color scheme of the bars and creates a 2nd legend:
```
"resolve": {
"axis": {
"y": "shared"
},
"scale": {
"color": "independent"
}
}
```
Setting the ```color``` ```scale``` to independent will "resolve" to two legends; with the adjusted snippet below, the legend also gets a title:
```
"color": {"datum": "lightgray" , "title": "fancy triangle(s)"}
```
And the visual:
![[learning Deneb - Axes, Scales, Channels - independent color channelnnel.png]]
Looking closely, we notice that the triangle's color is something, but not "lightgray." Here, though, it's about shared or independent legends of encoding channels; more about coloring in the chapter: [[#Color]]
### Using a constant value
Instead of using ```datum```, the color can be defined inside the encoding block using ```value```:
```
"color": {"value": "lightgray"}
```
Using ```"value":``` will not add the data point to the datastream, meaning there will be no additional item in the legend, but the triangle will be "lightgray."
### Defining the color in the mark block
If only one triangle exists, the color does not need to be defined in the encoding block. Instead, it can be defined inside the definition of the mark:
```
"color": "lightgray"
```
I prefer using definitions as simple as possible for this reason:
> **Remember this: keep it as simple as possible**
> Keep your definitions as simple as possible. Use mark properties if you do not need legends for encoding channels. Use values in combination with expressions if the value becomes data-driven, but you do not need a legend. Use a transform operation in all other cases. Do not tinker with datum if it's not necessary.
### A custom legend
There is another solution that allows the creation of a "custom" legend with "custom" items and "custom" colors (remember, this is about the color channel). Everything is custom, more or less: [[#Independent legends, parameters, and data label placement]]
## Scales
This chapter will be filled at a later point in time 😎
# Parameters and Signals, from simple constant values to user interactions
Parameters are used to define constant values or bind a given set of values to a widget like a slider or dropdown. It's also possible to track user input with a click and make the data visualization reactive. The most common user interaction is presenting a tooltip triggered by hovering (mouseover) over a data point.
For whatever reason, this capability is called:
+ parameter in Vega-Lite and
+ signal in Vega
## Defining values and binding data to widgets
Please keep in mind that Vega and Vega-Lite are data visualization libraries, meaning interaction focuses on data visualization tasks.
Signals are processed/evaluated in the sequence of their definition, even though Vega provides an update property that triggers the reevaluation of downstream parameters/signals. This will be covered in more detail later.
### Defining a constant value
```
...
"params": [
{"name": "...", "value": 666}
]
...
```
### Binding values to a slider
```
...
"params": [
  {
    "name": "...", "value": 20,
    "bind": {"input": "range", "min": 1, "max": 100, "step": 1}
  }
]
...
```
### Binding values to a dropdown
```
...
"params": [
{"name": "this or that",
"value": "this",
"bind": {
"input": "select",
"options": [
"this",
"that"
]
}
}
]
...
```
### Binding values to a checkbox, of course there is only true and false
```
...
"params": [
{"name": "cluster", "value": false, "bind": {"input": "checkbox"}}
]
...
```
## Capturing user interactions like hover, brush, click, etc
In Power BI, it's expected that a tooltip appears when a data point triggers a mouseover event. User interactions include hovering (a tooltip appears) and selecting an item (cross-filtering is performed). These interactions and many more can be implemented in Vega-Lite and Vega. The sample file contains many visuals where this is implemented. I'm still reading this article from the Deneb documentation now and then: [Interactivity Features - An Overview | Deneb (deneb-viz.github.io)](https://deneb-viz.github.io/interactivity-overview)
Next to the interactions mentioned above, more interactions can be implemented using Vega-Lite and Vega, but a more detailed explanation of these events will follow at a later point in time. Until then: [Dynamic Behaviors with Parameters | Vega-Lite](https://vega.github.io/vega-lite/docs/parameter.html)
## Universal laws of parameters and signals
Parameters can only be used in the filter transform (filtering rows) or in expressions, but they cannot be used to filter different columns.
## Referencing parameters
This chapter shows how parameters and signals are used throughout the spec.
### Referencing parameters inside a transform
It's essential to notice that parameters (or signals) can only be used in certain transform operations: not in those that change the structure (the available columns), like ```calculate```, ```aggregate```, and many other transforms. One of the transforms where parameters can be used is ```filter```. The snippet below shows how a parameter is used:
```
"params":[
...
{"name": "labelPlacement",
"expr": "labelAtTheBaseOrTheTop === 'place the label at the base (left)' ? 'val_start' : 'val_end'"
}
],
...
"transform": [
{
"filter": "datum.key === labelPlacement"
}
]
```
### Referencing parameters inside a mark or encoding
The below snippet
```
"align": {"expr": "align"},
```
shows how the value of a parameter called "align" is assigned to the property align. It's necessary to understand that inside a mark or an encoding, the parameter's evaluation must be triggered explicitly. This contrasts with referencing the parameter in the filter transform.
The thinking is
+ We must create a key/value pair: ```"align": ...```
+ Because it's not possible to pass the parameter directly, it's necessary to pass an object (an expression object); another key/value pair needs to be created: ```"align": {...}```
+ The function ```expr``` evaluates a formula. The formula is passed to ```expr``` as a string: ```"align": {"expr": "align"}```. The formula in this example is concise: it only refers to an object (namely the parameter/the signal) called "align," which is exactly how the parameter is named.
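Putting the pieces together, a hedged sketch (assuming a text mark and a parameter named "align" bound to a dropdown; the option values are made up):
```
"params": [
  {
    "name": "align",
    "value": "left",
    "bind": {"input": "select", "options": ["left", "center", "right"]}
  }
],
"mark": {
  "type": "text",
  "align": {"expr": "align"}
}
```
Changing the dropdown selection immediately re-evaluates the expression and moves the text.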
# Event stream
Event streams capture users' interactions with data visualizations and can lead to a signal update that is input for the visualization itself; event streams are the foundation of interactive data visualizations.
## Event types
## Scope of events
# Data
A data visualization needs data. Using Deneb inside Power BI makes this simple. All the fields the data visualization needs are added to the values bucket of the Deneb visual. These fields/measures will form a single table called dataset:
![[learning Deneb - beginning - Values bucket.png]]
It does not matter if it's a native column, a calculated column, or a measure. Also, the order of the fields in the bucket is unimportant.
## The default table called "dataset"
Deneb creates a table called dataset from the fields and measures. This table is then used throughout the entire spec. I tend to pass only the bare minimum of data to Deneb, which can sometimes make the spec more complex.
When creating a column chart with the months' names and the Sales Amount, the months (items) will be ordered by name (the default), the same as Power BI does it. The ordering of the months' names is already solved in the semantic model's calendar table: Date. So, passing the column "Month Number" to Deneb will simplify the spec (see here: [[#Sort, start with the most important thing]]). Passing "additional" values (columns and measures) is powerful because these values help keep the spec simple. Nevertheless, the powerful transform operations of Vega and Vega-Lite allow calculations and data shaping inside the spec. As always, it depends on whether all required values are created in the semantic model or whether a mix with transform operations is more beneficial.
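A minimal sketch of this sorting, assuming columns named "Month" and "Month Number" and the measure "Sales Amount" in the Values bucket:
```
{
  "data": {"name": "dataset"},
  "mark": {"type": "bar"},
  "encoding": {
    "x": {
      "field": "Month", "type": "nominal",
      "sort": {"field": "Month Number", "op": "min"}
    },
    "y": {"field": "Sales Amount", "type": "quantitative", "aggregate": "sum"}
  }
}
```
The months are sorted by the numeric column instead of alphabetically by name.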
## Multiple tables
While Vega-Lite allows only a single table, Vega allows multiple tables.
However, it's possible to use transforms in each layer in Vega-Lite or each visual when using view operators like ```hconcat``` or ```concat```.
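A hedged Vega-Lite sketch of this, assuming a column named "Sales" (the ```description``` texts and the output column ```totalSales``` are made up; the view bodies are elided):
```
{
  "data": {"name": "dataset"},
  "hconcat": [
    {
      "description": "the detail view, uses the dataset as-is",
      "mark": {"type": "bar"},
      "encoding": { ... }
    },
    {
      "description": "the grand total view, uses its own transform",
      "transform": [
        {
          "aggregate": [{"op": "sum", "field": "Sales", "as": "totalSales"}],
          "groupby": []
        }
      ],
      "mark": {"type": "bar"},
      "encoding": { ... }
    }
  ]
}
```
Each concatenated view can shape the one shared table differently.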
Even if the rendered result contains more than one chart, a single visual is created. Using multiple tables may seem weird, especially when the columns and measures put into the Values bucket originate from numerous tables of the semantic model. However, the snippet below shows how an additional table is defined in a **Vega** spec:
```
"data": [
{
"name": "thedata",
"values": [
{"rownumber": 1 , "id": "house1", "no of rooms": 2},
{"rownumber": 2 , "id": "house1", "no of rooms": 3},
{"rownumber": 3 , "id": "house2", "no of rooms": 4},
{"rownumber": 4 , "id": "house3", "no of rooms": 8}
],
"transform": [
{"type": "formula",
"expr": "datum['no of rooms'] <= 5 ? 'small' : 'large'",
"as": "myType"
}
]
},
{
"name": "thedataBars",
"source": "thedata",
"transform": [
{"type": "aggregate",
"groupby": ["id"],
"ops": ["sum"],
"fields": ["no of rooms"],
"as": ["no of rooms"]
}
]
}
],
```
In the above snippet, a second table is defined; this table is named "thedataBars" and is derived from the table called "thedata."
The above limitation of Vega-Lite regarding multiple tables still holds for complex data visualizations, but it can be worked around when the view composition becomes more complex. This is demonstrated on the report page "working - when geom objects convey meaning." There is still only one table at the beginning of the spec, but each visualization can have its own transform block. This is used to aggregate the values for the "Total Line."
## Referencing table values
Sometimes, especially when using expressions or the ```calculate``` or ```formula``` transform operations, it's necessary to reference a table or a given column. Often, column or measure names contain a space to separate words. By default, Vega and Vega-Lite do not expect these spaces. For this reason, a special syntax is used to honor spaces in names.
### Referencing columns that contain spaces
Assuming the table contains the columns "Color" and "by new customers," it's possible to reference the column "Color" like so:
```
datum.Color
```
```datum``` provides the context of the current table. This can be leveraged when defining an expression in the encoding block or inside a transform operation like ```calculate```.
But the column "by new customers" must be referenced like so:
```
datum['by new customers']
```
Because of the spaces in the name, the bracket syntax is required.
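Both notation styles can be combined in a single ```calculate``` transform; a hedged sketch (the output column name "label" is made up):
```
{
  "calculate": "datum.Color + ': ' + datum['by new customers']",
  "as": "label"
}
```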
### Referencing a single value from a given column
Vega or Vega-Lite do not provide a SQL-like interface. Nevertheless, sometimes it's necessary to reference a single value, either to give an explanatory text to the reader of the visual or for downstream calculation. This can be done using the snippet below:
```
pluck(data('noOfValues'), 'noOfValues')[0]
```
Using ```data(...)``` creates a reference to the table "noOfValues." The result is an array object that contains the entire table, meaning all the rows and all the columns. Using ```pluck``` returns an array that now only contains the column "noOfValues." Using ```[0]``` retrieves the value from the first row. This is used for the Vega visual on the report page "learn - aggregate of detail."
The Vega-Lite visual uses a different approach to determine the number of rows: here, the length of the table is "counted" using the ```length(<tableinstance>)``` function.
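A hedged sketch of this approach (the output column name "noOfRows" is made up; Deneb's default table is named dataset):
```
{
  "calculate": "length(data('dataset'))",
  "as": "noOfRows"
}
```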
### The difference between datum and value
+ "datum": 42
```datum``` creates a constant value, which is added to the datastream and, for this reason, will become part of downstream processing, e.g., creating legends.
+ "value": 42
creates a constant value that is **not** added to the datastream and, therefore, is not considered in downstream processing, e.g., it will not be considered in legends automatically.
# Transform
Transforms can be applied to the data to shape it, create new data points, or even apply algorithms, like density or network analysis algorithms. This chapter describes the transform operations I use.
A Vega or Vega-Lite spec might or might not contain an array of data transformations. Below, the specification of a single-view data visualization (here, a bar chart) is depicted:
```
{
"data": ... ,
"transform": [
{ ... }
],
"mark": {"type": "bar", ... },
"encoding": { ... }
}
```
To better understand what this chapter is about, the following data
```
"data": {
  "values": [
    {"brand": "this", "gender": "male", "Sales": 1000, "PY": 900, "Plan": 950},
    {"brand": "this", "gender": "female", "Sales": 200, "PY": 50, "Plan": 50},
    {"brand": "that", "gender": "male", "Sales": 500, "PY": 400, "Plan": 450},
    {"brand": "that", "gender": "female", "Sales": 1000, "PY": 800, "Plan": 700},
    {"brand": "something", "gender": "male", "Sales": 500, "PY": 100, "Plan": 400},
    {"brand": "something", "gender": "female", "Sales": 20, "PY": 20, "Plan": 40}
  ]
}
```
will be transformed and visualized. The next image shows a basic data visualization from the above sample data:
![[learning Deneb - start - a simple bar chart.png]]
A quick count of the data points reveals that the data viz only visualizes three data points, whereas the table contains six rows. Looking closer, it can be noticed that each bar represents the sum of the sales values per brand.
The table's data has to be transformed in one way or another. However, no transform operation is required to create the above visual; it can be achieved by using an aggregate operation in the x-encoding block (Vega-Lite):
```
...
"encoding": {
"y": {
"field": "brand", "type": "nominal"
},
"x": {
"field": "Sales", "type": "quantitative",
"aggregate": "sum"
}
}
...
```
The following sections describe specific transformations. I will update this section whenever I use a transformation for the first or second time. However, not all available transform operations are included in this chapter.
## General thoughts about data transformations
Data transformations are required to shape the available data so that a given body of data (basically the dataset) can be used to create a data visualization that helps to tell the story of the data and convey insights efficiently.
> **Think**
> Do not create DAX-based objects or Power Query objects that are only used in a single Deneb visual. If an object will be used more than once, or if it's simpler to create it in the semantic model than with Vega or Vega-Lite, then okay, it belongs in the semantic model.
> **Note**
> My experience dictates using the Vega or Vega-Lite transformations, though.
Transformations are fast and do not add measures to a data model that are only used in combination with specific data visualizations. It is often simpler to adapt a transform inside a Deneb visual than to adapt a complex measure to the visual.
No matter what, the combination of DAX with Vega and Vega-Lite is compelling.
Often, transform operations are specified at the root of a spec, as one of the main blocks of a specification. Still, they can also be defined inside an individual view of a multi-view composition, whether it is a multi-layered view using ```"layer": ...``` or a multi-view composition, e.g., using ```"hconcat": ...```
Some things to know about data transformations:
+ The transform array can contain multiple transformations.
+ Transformations are executed in the order of definition.
+ The result of a data transformation is available in subsequent data transformations.
Nevertheless, using transforms at the root level keeps the encodings or the single layers simple. Defining all transforms in a single place reduces the overall complexity of the specification.
## A list of transforms
The following list provides a quick overview of the transform operations that I'm using. This overview marks whether the transform is available in Vega-Lite or Vega, whether it adds one or more columns to the current table, whether it filters rows, whether the structure of the current table is changed, and whether the transform creates objects that can be used in subsequent transforms. Knowing whether a transform filters rows or alters the current table's structure is important because these transforms can not be used inside Vega's mark definitions. There are more places where a transform can be defined 😎, not only at the root level of the spec; this is even more true when using Vega.
Sometimes, transform operations execute the same operation but have different names, so the value in the *Transform* column must be read like this: Vega-Lite-name/Vega-name. The list below only contains transform operations I used in past or current data visualization tasks. This list is likely to grow in the future.
| Transform | Vega | Vega-Lite | adds column(s) | filters rows | changes the structure | creates objects |
| --------------------- | ---- | --------- | -------------- | ------------ | --------------------- | --------------- |
| aggregate | x | x | yes | yes | yes | no |
| calculate/<br>formula | x | x | yes | no | yes | no |
| extent | x | x | no | no | no | yes |
| filter | x | x | no | yes | no | no |
| flatten | x | x | yes | no | yes | no |
| fold | x | x | yes | no | yes | no |
| joinaggregate | x | x | yes | no | yes | no |
| stack | x | x | yes | no | yes | no |
| window | x | x | yes | no | yes | no |
## Transform operations
### aggregate transform
The Vega-Lite documentation: [Aggregation | Vega-Lite](https://vega.github.io/vega-lite/docs/aggregate.html) and the Vega documentation: [Aggregate Transform | Vega](https://vega.github.io/vega/docs/transforms/aggregate/)
No matter the power of the aggregate transform, this must be remembered.
> **Note**
> Aggregate changes the structure of the table. Only the columns used in the ```groupby``` and the newly calculated columns remain.
The following snippet shows how three numeric columns are aggregated using an empty ```groupby``` in Vega-Lite. The table will only contain these three columns after this operation:
```
...
{
"aggregate": [
{"op": "sum", "field": "Sales Amount", "as": "actSum"},
{"op": "sum", "field": "Sales Amount PY", "as": "pySum"},
{"op": "sum", "field": "Sales Amount Plan", "as": "planSum"}
],
"groupby": []
}
...
```
The ```groupby``` property can be used empty, as in the above snippet, or omitted.
### calculate transform
The Vega-Lite documentation: [Calculate Transform | Vega-Lite](https://vega.github.io/vega-lite/docs/calculate.html) and the Vega documentation: [Formula Transform | Vega](https://vega.github.io/vega/docs/transforms/formula/)
In the following Vega-Lite snippet, two columns are referenced, and a new column is created:
```
{
"calculate": "datum['runningSumBySegment'] - datum['segmentSum']",
"as": "xSegmentBase"
}
```
The formula to calculate the desired value is provided between quotation marks. The name of the new column is defined using the keyword "as."
```datum``` refers to a column inside a data object. Two notation styles can be used: ```datum.columnName``` or ```datum['column name']```.
### extent transform
The Vega-Lite documentation: [Extent | Vega-Lite](https://vega.github.io/vega-lite/docs/extent.html) and the Vega documentation: [Extent Transform | Vega](https://vega.github.io/vega/docs/transforms/extent/)
This transform does not transform the existing table in any form. Instead, it creates an object. This object is a two-valued array formed by a column's minimum and maximum values. The following Vega snippet creates an extent named "extentSales":
```
{
"data": [
{
"name": "dataset",
"transform": [
{
"type": "extent",
"field": "Sales",
"signal": "extentSales"
},
{
"type": "formula",
"expr": "extentSales[0]",
"as": "minSales"
}
]
}
],
"scales": [
{"name": "scaleX", "type": "linear",
"domain": {"signal": "extentSales"},
"nice": true
}
],
"marks": []
}
```
The above snippet shows how the extent signal "extentSales" is created and how it can be used throughout the spec, e.g., in subsequent transform operations or when defining a scale.
### filter transform
The Vega-Lite documentation: [Filter Transform | Vega-Lite](https://vega.github.io/vega-lite/docs/filter.html) and the Vega documentation: [Filter Transform | Vega](https://vega.github.io/vega/docs/transforms/filter/)
The ```filter``` transform filters the data object, and the next Vega-Lite code snippet shows a straightforward filter:
```
{
"filter": "datum['rowIndexByGroupAndSegment'] === 1"
}
```
It's recommended to use three equal signs for the equality check ([[#The equality operator, or why should I use === instead of ==]]).
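Alternatively, Vega-Lite offers predicate objects that sidestep the operator question entirely. This sketch keeps the same rows using an ```equal``` field predicate:
```
{
  "filter": {"field": "rowIndexByGroupAndSegment", "equal": 1}
}
```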
### flatten transform
The Vega-Lite documentation: [Flatten | Vega-Lite](https://vega.github.io/vega-lite/docs/flatten.html) and the Vega documentation: [Flatten Transform | Vega](https://vega.github.io/vega/docs/transforms/flatten/)
This transform unpivots an array of values into rows, so the number of rows grows. The number of items inside the array can differ per row.
Assuming a column named "aggregated" contains this array [true, false], then this snippet
```
{"flatten": ["aggregated"]}
```
transforms this table
![[learning Deneb - transform - flatten start.png]]
into this:
![[learning Deneb - transform - flatten result.png]]
This is used on the report page "learn - aggregate of detail."
This is often the starting point for filtering the dataset based on user interactions. On the above-mentioned report page, this technique, in combination with subsequent transform operations like the Vega-Lite one below:
```
{
"calculate": "datum['aggregated'] === true ? datum['theDetails'] : datum['Color']",
"as": "Detail"
},
```
allows the data shown to follow the user's interactions.
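The mechanics can be sketched as a self-contained Vega-Lite spec with made-up inline data; after the ```flatten```, the single input row becomes two rows, one per array item:
```
{
  "data": {
    "values": [
      {"Color": "red", "aggregated": [true, false]}
    ]
  },
  "transform": [
    {"flatten": ["aggregated"]}
  ],
  "mark": "text",
  "encoding": {
    "text": {"field": "aggregated", "type": "nominal"}
  }
}
```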
### fold transform
The original Vega-Lite documentation: [Fold | Vega-Lite](https://vega.github.io/vega-lite/docs/fold.html) and the Vega documentation: [Fold Transform | Vega](https://vega.github.io/vega/docs/transforms/fold/)
This transform unpivots fields (from columns to rows); this is useful if measures have to be stacked. By default, the former column name goes into a new column called "key" and the value into a column called "value."
This Vega-Lite snippet
```
{
"fold": ["Sales Amount", "Margin"]
}
```
transforms this table
![[learning Deneb - transform - fold start.png]]
into this (the following image shows an excerpt of the table):
![[learning Deneb - transform - fold result.png]]
This is used on the report page "working - stacked bullet charts with measures (Vega-Lite)."
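If the default output names "key" and "value" are not wanted, the fold transform accepts an optional ```as``` property with two new names (the names below are made up):
```
{
  "fold": ["Sales Amount", "Margin"],
  "as": ["Measure", "Value"]
}
```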
### joinaggregate transform
The Vega-Lite documentation: [Join Aggregate | Vega-Lite](https://vega.github.io/vega-lite/docs/joinaggregate.html) and the Vega documentation: [JoinAggregate Transform | Vega](https://vega.github.io/vega/docs/transforms/joinaggregate/)
JoinAggregate is a powerful transform operation that creates new values based on aggregations across multiple rows while preserving all rows. This transform is very useful if detailValue / groupValue calculations are needed for the data visualization.
> **Remember**
> Preserving rows, of course, means repeating values.
The following two Vega-Lite code snippets show how the columns segmentSum and groupSum are calculated:
```
{
"joinaggregate": [
{
"field": "b",
"op": "sum",
"as": "segmentSum"
}
],
"groupby": ["group", "a"]
}
```
```
{
"joinaggregate": [
{
"field": "b",
"op": "sum",
"as": "groupSum"
}
],
"groupby": ["group"]
}
```
Looking at the syntax, note that the ```joinaggregate``` property takes an array. This array can contain multiple aggregation definitions, where each definition is an object with ```op```, ```field```, and ```as```.
The following image shows how the transform works to create the column "segmentSum":
<img width="343" alt="transform - joinaggregate" src="https://user-images.githubusercontent.com/29025119/236118505-3e4fed28-dd7e-4d1d-a11d-32ad0c0f2592.png">
Note that the order of the columns passed to the "groupby" operator does not influence the result.
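A typical detailValue / groupValue pattern can be sketched by combining ```joinaggregate``` with a subsequent ```calculate```; the new column name "shareOfGroup" is made up:
```
{
  "joinaggregate": [
    {"op": "sum", "field": "b", "as": "groupSum"}
  ],
  "groupby": ["group"]
},
{
  "calculate": "datum['b'] / datum['groupSum']",
  "as": "shareOfGroup"
}
```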
### stack transform
The Vega-Lite documentation: [Stack | Vega-Lite](https://vega.github.io/vega-lite/docs/stack.html) and the Vega documentation: [Stack Transform | Vega](https://vega.github.io/vega/docs/transforms/stack/)
At first glance, the Vega-Lite snippet below might appear a little spooky because no new name(s) are defined. If no names are given, new columns are created automatically. The column containing the values to be stacked is called "val"; without explicit name(s), the columns "val_start" and "val_end" will be created.
```
{
"stack": "val",
"groupby": ["rowgroup"],
"sort": [{"field": "segment", "order": "ascending"}]
},
```
See the report page "learn - the text mark (Vega-Lite)."
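If explicit names are preferred over the automatic ones, the ```as``` property of the stack transform accepts two output field names (the names below are made up):
```
{
  "stack": "val",
  "groupby": ["rowgroup"],
  "sort": [{"field": "segment", "order": "ascending"}],
  "as": ["valStart", "valEnd"]
}
```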
### window transform
The Vega-Lite documentation: [Window | Vega-Lite](https://vega.github.io/vega-lite/docs/window.html) and the Vega documentation: [Window Transform | Vega](https://vega.github.io/vega/docs/transforms/window/)
The ```window``` transform adds new columns based on calculations for a sorted partition of a data object (a sorted group of rows). This is similar to the windowing functions from SQL and can also be compared to the newer DAX windowing functions. The next two Vega-Lite code snippets show how the columns "rowIndexByGroupAndSegment" and "runningSumBySegment" are calculated:
```
{
"window": [
{
"op": "row_number",
"as": "rowIndexByGroupAndSegment"
}
],
"groupby": ["group", "a"]
}
```
```
{
"window": [{
"op": "sum",
"field": "segmentSum",
"as": "runningSumBySegment"
}],
"groupby": ["group"],
"sort": [{"field": "rowIndexByGroup", "order": "ascending"}]
}
```
Depending on the operator ("op" determines the window function), the "field" can be omitted; row_number, for example, does not require a field.
The transform that creates the column "runningSumBySegment" is special in several ways. First, it demonstrates that columns created in preceding transforms can be referenced; it also uses the full power of the window transform. Here, the data object is partitioned by the field "group" (the columns in the "groupby" operator determine the partitions); then, the rows inside each partition are sorted by the field "rowIndexByGroup." The real "magic" of the window transform happens when the aggregation function (defined by the "op" operator) is applied to each row of the ordered partition: the function does not only consider the current row, it is applied to the current row and all preceding rows, as determined by the "sort" operator. The application of a window transform can be imagined as an iterator that loops across all rows inside a partition and applies the aggregation function(s) to a stack of values. This stack can be a single value when the aggregation is simple, like count or sum, or a real stack of many values when the aggregation function is more complex, like rank.
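The window transform also accepts a ```frame``` property that restricts the rows the aggregation sees. The sketch below computes a three-row moving average over the current row and the two preceding rows (the field names "Sales" and "Month" are assumed):
```
{
  "window": [
    {"op": "mean", "field": "Sales", "as": "movingAvgSales"}
  ],
  "frame": [-2, 0],
  "sort": [{"field": "Month", "order": "ascending"}]
}
```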
## Data engineering using Deneb and Vega or Vega-Lite
Vega and Vega-Lite offer compelling transform operations that can help shape the dataset in a way that fits the visualization task. Combined with the power of DAX measures, they can take over data engineering tasks and help overcome limitations, like the inability to use parameters/signals in some of the transform operations. This can also help to accelerate the rendering performance of Power BI visuals.
>**Note**
>No matter the power of the transform operations, they are part of data visualization libraries. Keep the transforms to a minimum and solve the data engineering tasks upstream.
The following chapters will describe some data engineering tasks I use more frequently.
### Expand and Collapse data, or - to aggregate data or not
Sometimes, users ask for the ability to toggle between the visualization of aggregated data and detailed data using a checkbox widget.
See the report page "learn - aggregate of detail:"
![[learning Deneb - transform - expand and collapse.png]]
The solution must rely on filtering alone because changing the columns used in a data visualization based on a parameter or signal is impossible. This means the table has to be prepared before the filtering happens. The steps below describe the transformations from the Vega visual; however, the transformations of the Vega-Lite visual are similar.
+ using ```formula```, a new column "aggregated" is created, containing the two values true and false. These values are "formatted" as an array.
+ using ```flatten```, the array is "unpacked"; this "duplicates" the rows of the dataset
+ using ```formula```, a new column called "Detail" is created
+ using ```filter```, the rows that contain either the detailed data or an "aggregated value" in the column "Detail" are kept. The filtering considers the signal "aggregated."
+ using ```aggregate```, the numeric column is aggregated; the column "Detail," also used in the ```groupby```, either contains unique values or a value that repeats for each group
Done😎
The transform block also contains two transform operations at the beginning. These transforms extract the detail values per group. This must be done in two steps: the first extracts a data object per group using ```joinaggregate``` and ```values``` ([Aggregation | Vega-Lite](https://vega.github.io/vega-lite/docs/aggregate.html#ops)), and the second extracts the column containing the detail data as an array.
# Expressions, pluck, and all the other powerful functions
This chapter explains only some of the available functions, focusing on how they contribute to tackling a visualization task.
Functions are used from inside an expression. An expression is defined as a string:
```
"nameofparameter * 0.8"
```
Sometimes a property demands a value; then the expression looks like this:
```
"PropertyExpectingAValue": {"expr": "nameofparameter*0.8"}
```
Expressions can be used almost everywhere in the spec. Sometimes, the same property, e.g., color, can be defined in both the mark and encoding blocks. There is a significant difference, though:
+ properties in the mark block are value-based, which means they are defined like so:
```"color": "red"```
while
+ properties in the ```encoding``` block are data-driven and are defined like so (most often):
```"color": {"field": "gender", "type": "nominal"}```
However, when used with Vega-Lite, expressions will be evaluated for each data point no matter if the expression is defined in the mark block or the encoding block, but the syntax is different:
+ expression in the mark block: ```"mark property": {"expr": "..."}```
+ expression in the encoding block: ```"encoding-channel": {"value": {"expr": "..."}}```
```value``` can/must be used when an expression calculates a value.
Both versions are transpiled to the same Vega spec, so the choice is probably a matter of personal style. I do the following:
+ when I want to use a constant value (even if it's based on a parameter), I use the mark block
+ when the value is data-driven, I use the encoding block
## List of functions
The following list only contains a tiny fraction of the available functions; you will find all of them here: [Expressions | Vega](https://vega.github.io/vega/docs/expressions/)
The list of functions is not ordered because sometimes the functions are nested:
```
peek(slice([array],0, indexof([array], value)+1))
```
| Expression | snippet | comment |
| ------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| indexof(<br>[array],<br>value)<br><br>returns a numeric value<br> | indexof(['female', 'male'], datum['Gender']) | indexof(...) helps to "translate" a string into a numeric value, e.g., in case the strings represent an ordinal sequence.<br><br>If "female" is found, 0 will be returned. |
| slice(<br>array,<br>start,<br>end<br>)<br><br>returns an<br>array | slice(['female', 'male'], 0, indexof(['female', 'male'], datum['Gender'])+1) | This returns an array, from the start until the position of the "lookup" element.<br>If "female" is found, ['female'] will be returned.<br>If "male" is found, ['female', 'male'] will be returned |
| peek(array)<br><br>returns an element | peek(<br>slice(['female', 'male'], 0, indexof(['female', 'male'], datum['Gender'])+1)<br>)<br><br>the above can be replaced with this<br>slice(['female', 'male'], <br>indexof(['female', 'male'], datum['Gender']), <br>indexof(['female', 'male'], datum['Gender'])+1)<br><br>I prefer the peek() approach because it needs fewer characters<br> | This returns the last element of an array.<br><br>If "female" is found, 'female' will be returned.<br>If "male" is found, 'male' will be returned |
| pluck(arrayobject, 'nameofcolumn') | pluck(data('nameoftable'), 'nameofcolumn') | Returns all the values of the column 'nameofcolumn' from the table 'nameoftable' as an array |
## Expressions outside of the transform block
The following table contains expressions used regularly outside of the transform block:
| block | expression | comment |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| mark block | "title": {<br> "text": "using stack for label placement: Sales (beautified)",<br> "fontSize": 10,<br> "dy": {"expr": "title_yOffset"}<br>}, | The expression is <br>referencing a parameter <br>called "title_yOffset" |
| mark<br>block | "mark": {<br> "type": "text",<br> ...<br> "color": {"expr": "datum['Sales'] < 300 ? 'transparent' : 'black'"}<br>}, | This expression returns a <br>color. This can be used <br>to "hide" data labels when <br>they do not fit into a <br>segment, e.g. when creating<br>stacked bar charts |
| encoding<br>block | "color": {<br>"value": {"expr": "datum['Sales'] <= 500 ? 'red' : 'green'"}<br>} | By default the color-channel <br>is defined by a field <br>assignment like so:<br><br>"color": {<br>"field": "Species", <br>"type": "nominal"}<br><br>But it's also possible to <br>define the color based on a<br>condition, but then the "complete"<br>formula has to be defined as in the snippet on the left |
# The mark
In his book "ggplot2," Hadley Wickham describes a data visualization like so
>a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars).
Wickham, Hadley. ggplot2: Elegant Graphics for Data Analysis (Use R!) . Apress - A. Kindle Edition.
Marks do exactly this: map data to geometric objects. The following chapters are about using marks to visualize data. I do not cover all the marks available with Vega and Vega-Lite, but I cover the ones I use most often.
The Financial Times team did a fantastic job in classifying data visualization tasks and created a great poster that helps a lot in narrowing down what type of data visualization to create; the poster can be found here: [chart-doctor/visual-vocabulary at main · Financial-Times/chart-doctor (github.com)](https://github.com/Financial-Times/chart-doctor/tree/main/visual-vocabulary)
At the moment, only two mark types are covered in this document; this will change in the future. These mark types are:
+ the bar (or is it a smarter rectangle)
+ the text
## The bar - a simple rectangle
A bar chart (the Vega-Lite documentation: [Bar | Vega-Lite](https://vega.github.io/vega-lite/docs/bar.html)) is the most frequently used mark in data visualization. This chapter does not differentiate between a column and a bar chart. A bar compares one numerical value across categorical values, like the Sales Amount spread across a product brand (see the report page: "learn—a bar chart with points—a layered view ").
A bar can do more than visualize a simple comparison, it can also visualize a time range inside a Gantt chart, here are two great Vega implementations of a Gantt chart:
+ [Deneb-Showcase/Gantt Chart at main · PBI-David/Deneb-Showcase (github.com)](https://github.com/PBI-David/Deneb-Showcase/tree/main/Gantt%20Chart)
+ [Vega-Visuals/20240724-hierarchical-gantt-chart at main · Giammaria/Vega-Visuals (github.com)](https://github.com/Giammaria/Vega-Visuals/tree/main/20240724-hierarchical-gantt-chart)
Moreover, a bar chart is not limited to visualizing one numeric value. If you look at the visual on the report page "working - when geom objects convey meaning," the left-most visual visualizes three numeric values for one categorical variable (product:Brand):
+ the actual value (Act)
+ the previous year value (PY) and
+ the Plan value (Plan)
It's up to us what we do using a bar chart, or, more precisely, the ```rect```, one of the primitive geometric objects used in data visualization (the Vega documentation of the ```rect``` mark: [Rect Mark | Vega](https://vega.github.io/vega/docs/marks/rect/)).
When using the mark type bar, only a numerical value needs to be passed to Deneb, and a rectangle will be drawn from 0 up to the value. There is no need to care whether the value is positive or negative; Vega and Vega-Lite do all the rest.
Of course, a categorical variable can be used as well 😉. Drawing a bar chart (horizontal rectangles) can be as simple as this:
```
{
"data": {
"name": "dataset"
},
"layer": [
{ "description": "the bars",
"title": "a layered chart using Vega-Lite",
"mark": {
"type": "bar"
},
"encoding": {
"y": {
"field": "brand", "type": "nominal"
},
"x": {
"field": "Sales", "type": "quantitative"
}
}
}
]
}
```
>**A recommendation**
> I put all my single-view non-layered visualizations into a layer block so I can add a short description. This helps me remember what I did when looking at the viz on a future day.
The numerical field "Sales" represents the width of the bar (a rectangle), starting at 0. If x and y are switched, the width becomes the height, and the bar chart turns into a column chart.
## Text, it's not just colored rectangles, it's also about text and numbers
Assuming the following visual has to be created (see the report page: "learn - the text mark (Vega-Lite)"):
![[learning Deneb - mark text - data label placement.png]]
But because this chapter is about the mark of type "text," the focus is on the data labels. All the other text elements will either be removed (most probably the fate of the x-axis label) or formatted differently to enhance the visual's readability and support the overall process of conveying meaning.
When looking closer at the above visual, the following can be noticed:
+ there are two types of data labels: inside a rectangle (segment data labels) and the data label representing the total brand value.
+ The segment labels are placed at the top or end of a segment, not at its base or center. Placing text as a data label for stacked bar charts is not simple: stacking is required. There is a transform that creates the needed coordinates with ease: [[#stack transform]]
The most apparent properties of the placement of text are
+ the y-coordinate,
+ the x-coordinate
+ the text (the text to be visualized)
One data point is not labeled: brand:something/gender:male.
To get a better idea of why placing data labels is not that simple, the next visual will be the starting **point** for the following:
![[learning Deneb - mark text - data label placement alignment.png]]
The Vega-Lite documentation of the mark type ```text```: [Text | Vega-Lite](https://vega.github.io/vega-lite/docs/text.html)
### Alignment (horizontal)
Text is horizontally aligned relative to the data point. Besides the two values used below, there is also ```center```:
+ ```"align": "right"``` means the data point is at the right side of the text (the end of the text), the text will be written to the left
+ ```"align": "left"``` means the data point is at the left side of the text (the beginning of the text), the text will be written to the right
### Alignment (vertical)
Text is vertically aligned relative to the data point. In addition to the two values used below, there are more, e.g., ```middle```.
+ ```"baseline": "top"``` means the data point is above the text, the text will be written below the data point
+ ```"baseline": "bottom"``` means the data point is below the text, the text will be written above the data point
### Horizontal and vertical offset: dx and dy
Placing data labels at the end or top of a segment can interfere with the "total" data label: the ```dx``` (horizontal) and ```dy``` (vertical) properties allow the text to be moved some pixels away from the data point.
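A minimal sketch combining the properties above; the label starts 4 pixels to the right of the data point (the field name "Sales" is assumed):
```
{
  "mark": {
    "type": "text",
    "align": "left",
    "baseline": "middle",
    "dx": 4
  },
  "encoding": {
    "text": {"field": "Sales", "type": "quantitative"}
  }
}
```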
# The encoding
The encoding (Vega calls it ```encode```) is the first part of Wickham's quote: mapping data to aesthetic attributes, such as color.
Vega and Vega-Lite offer many encoding options called "channels." At the moment, this document only covers the following encoding channels
+ Color
+ Size
+ Order
+ Axis
## Color
Often, it's simple to make a boring blue bar chart more colorful:
![[learning Deneb - coloring - start.png]]
Sometimes color is added to a data visualization only to get a chart more "colorful," but the color does not convey additional information, as in the chart below:
![[learning Deneb - coloring - colorful but meaningless.png]]
In the above image, color is used "only" to differentiate the brands. This information is already available via the y-axis. The reader's attention now has to be divided between the brand's position, the width of the rectangle, and the color. Because the number of data points is tiny, this intellectual strain will likely not influence the reader's decision-making process.
Nevertheless, the designer of a data visualization must provide as much information as required while honoring the reader's cognitive resources.
>**Recommendation**
Do not use color to make data "fancy" or "more colorful."
Consider the above an introduction to more meaningful coloring. All the following screenshots are from the report page "**learn - simple bars colored (Vega-Lite)**."
There are four types of coloring:
| Coloring ... | example (how it looks) | how to do this | Comment |
| ------------------------------------ | ------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| by a <br>nominal field | ![[learning Deneb - coloring - colored by nominal field.png]] | "encoding": {<br> ...,<br>"color": <br>{"field": <br>"gender",<br>"type": "nominal"},<br> ...<br>} | Now color adds additional <br>meaning |
| by a <br>numerical <br>field | ![[learning Deneb - coloring - by a numeric value.png]] | "encoding": {<br> ...<br>"color": {<br>"field": "Sales", <br>"type": <br>"quantitative",<br>"scale": <br>{"scheme": "blueorange",<br>"reverse": true<br>}<br>}<br>} | Adding color by<br>an additional <br>numeric field can<br>add tremendous <br>value, but also<br>requires additional<br>intellectual effort.<br>In the example, the reader has to<br>"look out" for the<br>x-axis value and<br>the color. The "reverse" switch<br>allows adapting the<br>direction of the<br>gradient. |
| by a given <br>set of defined colors | ![[learning Deneb - coloring - by defined set of colors.png]] | "encoding": {<br> ...,<br>"color":{<br> "field": "brand", <br>"type": "nominal",<br>"scale": {<br>"domain": <br>["this", <br>"that", <br>"something"],<br>"range": [<br>"orange", <br>"gray", <br>"black"]<br>}<br>}<br>} | This can be very helpful if this<br>type of coloring<br>is used in<br>combination with<br>stacked bars or ordinal values like<br>"low",<br>"middle,"<br>"high" |
| by a<br>condition | ![[learning Deneb - coloring - by a condiition.png]] | "encoding": {<br> ...,<br>"color": {<br>"value": {"expr": <br>"datum['sales'] <= <br>500 ? <br>'green' : <br>'lightblue'"<br>}<br>}<br>} | This can be very helpful to<br>highlight <br>value when a<br>condition is met.<br>But from the example data, it's<br>necessary to<br>remove the<br>aggregate operator<br>from the x-channel. This is<br>why the title of the x-axis changed. |
Looking closer at the visualization of the last example, "Coloring by a condition," two things have to be noticed:
+ there is no legend
+ the title of the x-axis changed
The reason is simple: the aggregate operation from the x-channel encoding was removed; otherwise, everything would have been green.
>**Note**
A legend can only be created when data is available.
### The legend, based on a transform operation
Using a transform operation that implements the condition:
```
...
"transform": [
{
"calculate": "datum['Sales'] <= 500 ? 'bad' : 'good'",
"as": "theCondition"
}
]
...
```
A legend will be created automatically when the new "theCondition" field is used in the color-channel of the encoding block:
![[learning Deneb - coloring - condition and legend.png]]
### One legend based on two layers
Assuming the following data visualization has to be created:
![[learning Deneb - coloring - using layers.png]]
The above visualization is the result of superimposing two layers of type mark. The first layer visualizes the numeric column "Plan" (colored lightgray), while the second layer references the column "Sales" (colored black).
Creating a legend requires data. This data can be made using a little "trick": each layer uses this simple line in the encoding block (this is for the Plan-layer):
```
"color": {"datum": "Plan"}
```
Instead of referencing a field from the table, the command ```datum``` is used. This command creates a constant value and adds it to the data stream (not the table). This means the constant value "Plan" (or "Sales," respectively) is added to the color legend.
Of course, the appropriate colors must be defined, but this is something that is already described in the above example "**Coloring** by a given set of defined colors."
Now the spec of the visual will look like this:
```
...
"encoding": {
"y":{"field": "brand", "type": "nominal"},
"x": {"aggregate": "sum"},
"color": {
"type": "nominal",
"scale": {
"domain": ["Plan", "Sales"],
"range": ["lightgray", "black"]
}
}
},
"layer": [
{ "description": "Plan",
"title": {"text": "two columns, using layers"},
"mark": {"type": "bar",
...
},
"encoding": {
"x": {"field": "Plan", "type": "quantitative"},
"color": {"datum": "Plan"}
}
},
{ "description": "Sales",
"mark": {"type": "bar",
...
},
"encoding": {
"x": {"field": "Sales", "type": "quantitative"},
"color": { "datum": "Sales"}
}
}
]
```
The above spec shows three encoding blocks:
1. the first encoding block defines all encodings shared with the subsequent layers, including mapping the value "Plan" to the color "lightgray."
2. the encodings special to the "Plan"-mark
3. the encodings special to the "Sales"-mark
Leveraging a general encoding block reduces lines of code, and that is good! But it also requires some focus.
### One legend based on a fold transform
This image:
![[learning Deneb - coloring - using fold transform.png]]
looks almost identical to the previous one, but here only one layer is used. This is possible by using the ```fold``` transform like so:
```
"transform": [
{"fold": ["Plan", "Sales"]}
]
```
This part defines the color:
```
"color": {"field": "key", "type": "nominal",
"scale": {
"domain": ["Plan", "Sales"],
"range": ["lightgray", "black"]
}
}
```
By default, using the ```color``` channel leads to a stacked bar chart, but because a clustered chart is required, the following line has to be added to the y-encoding:
```
"yOffset": {"field": "key"}
```
or ```xOffset``` when vertical columns need to be drawn; this, of course, needs to be added to the x-encoding.
Unfortunately, this creates clustered bars/columns equally distributed across the bandwidth (the width used for a categorical item), but here the bars must overlap.
Instead of ```yOffset```, this line is used to anchor the bars to the zero point:
```
"x2": {"datum": 0},
```
### Comparing the layer approach with the fold approach
I prefer to use transform operations whenever possible. Each layer adds some lines of code, which increases the chance of erroneous code.
Nevertheless, both approaches require some clean-up.
+ the title of the x-axis needs to be fixed in both visuals if a title is required at all
+ the title of the legend from the fold approach needs to be fixed
+ this can be done by renaming the column "key," using a "calculate" transform after the fold. This is my preferred solution
+ provide a custom title to the legend definition in the color block
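The last option, a custom legend title, can be sketched like this in the color channel of the fold-based visual (the title text "Measure" is made up):
```
"color": {
  "field": "key",
  "type": "nominal",
  "legend": {"title": "Measure"},
  "scale": {
    "domain": ["Plan", "Sales"],
    "range": ["lightgray", "black"]
  }
}
```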
## Sort, start with the most important thing
There is more than one way to sort the data in visuals. The following image shows the starting point for sorting operations:
![[learning Deneb - sorting - the starting point.png]]
### Sorting, basics
The screenshots are taken from the report page: "learn - simple bars sorted (Vega-Lite)"):
| Sorted ... | example (how it looks) | how to do this | comment |
| --------------------------------------------------- | ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| by a given order | ![[learning Deneb - sorting - by a given order.png]] | "y": {<br>...<br>"sort": ["this", "something", "that"]<br>...<br>} | Sorting by a given order can be used to implement a specific business logic or "visualize" an ordinal sequence |
| by a<br>numerical field (cost) | ![[learning Deneb - sorting - stacks sorted by sales.png]] | "y": {<br>...<br>"sort": {<br>"field": <br>"Sales", <br>"order": "descending"}<br>...<br>} | This is probably<br>the most often<br>used sort "algorithm." |
| by value, the bars and the segments inside the bars | ![[learning Deneb - sorting - segments sorted by sales.png]] | "encoding": {<br> "y": {<br> "field": "brand", "type": "nominal",<br> "sort": {"field": "Sales", "order": "descending"}<br> }, <br> "x": {<br> ...<br> },<br> ...<br> "order": {"field": "Sales", "type": "quantitative", "sort": "ascending"}<br>}<br><br><br><br><br><br> | Using sort inside the y-encoding and the separate order-channel helps to identify the importance of the different segments per brand more easily. |
The Vega-Lite documentation: [Sorting | Vega-Lite](https://vega.github.io/vega-lite/docs/sort.html)
# More complex examples
This chapter shows how a combination of features, such as parameters and transforms, can work together. The examples might not create the most compelling data visualizations. Instead, it's about showcasing how features can work together.
For one reason or another, all report pages prefixed with "working ..." can be considered more complex; at the moment, only the first "working ..." page is described in more detail.
## Independent legends, parameters, and data label placement
The report page "working - bars with legends and parameters" shows the data visualization below:
![[Pasted image 20240902074954.png]]
This example is about creating independent color legends; in addition, transform operations are used to sort the segments inside the bars, either by name or by a value.
### Independent legends
Getting two legends based on the same encoding channel, namely color, requires the ```resolve``` object, setting the color ```scale``` to "independent":
```
"resolve": {
"scale": {
"color":"independent"
}
}
```
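In context, ```resolve``` sits at the top level of the layered spec. A minimal, hypothetical sketch (the marks and fields are illustrative, not the full spec of the report page):
```
{
  "layer": [
    {
      "mark": "bar",
      "encoding": {"color": {"field": "brand", "type": "nominal"}}
    },
    {
      "mark": {"type": "point", "shape": "triangle-down"},
      "encoding": {"color": {"datum": "the75Percpoint"}}
    }
  ],
  "resolve": {
    "scale": {
      "color": "independent"
    }
  }
}
```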
Each triangle has its own distinct color; the next snippet shows the definition for the 25% triangle:
```
"value": "#C5CAE9"
```
The condition only controls whether the mark appears; this depends on the parameters "show25percentMark", "show50percentMark", and "show75percentMark."
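The parameters themselves can be defined as bound checkboxes; a sketch, assuming the names above (the exact bindings in the report page may differ):
```
"params": [
  {"name": "show25percentMark", "value": true, "bind": {"input": "checkbox"}},
  {"name": "show50percentMark", "value": true, "bind": {"input": "checkbox"}},
  {"name": "show75percentMark", "value": true, "bind": {"input": "checkbox"}}
]
```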
Because the color of the triangles is defined using ```value```, no data is added to the data stream, meaning no legend will be created. This requires creating the legend manually, which happens in the encoding of the 75% mark:
```
"color": {
"datum": "the75Percpoint",
"condition": {
"test": "show75percentMark === false || datum['segmentCount'] == 1",
"value": "transparent"
},
"scale": {
"domain": ["the25Percpoint", "the50Percpoint", "the75Percpoint"],
"range": ["#C5CAE9", "#7986CB", "#5C6BC0"]
},
"title": "the marks"
},
```
Using ```"datum": "the75Percpoint"``` adds this datapoint to the data stream: a legend will be created with the title "the marks."
```"condition"``` controls whether the 75% triangle appears or not.
```"scale"``` finally controls the items in the "the marks" legend.
The ```"domain"``` array defines all items, including the 25% and 50% marks.
The ```"range"``` array defines the colors of the three items; of course, these colors have to match the colors of the "25%", "50%", and "75%" marks.
The shapes in the legend are automatically derived from the mark's shape: triangle-down. The link to all the available shapes of the ```point``` mark: [Point | Vega-Lite](https://vega.github.io/vega-lite/docs/point.html)
Of course, all three marks use the same shape in the mark definition.
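For reference, a mark definition using this shape might look like the following sketch (```size``` and ```filled``` are illustrative values, not taken from the report page):
```
"mark": {
  "type": "point",
  "shape": "triangle-down",
  "filled": true,
  "size": 80
}
```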
### Parameters that allow interactive control
The above image's parameters "sortby" and "orderby" control how the bar segments are sorted. This snippet does the sorting in the general encoding section:
```
"order": {
"field": "xSegmentSort", "type": "ordinal"
}
```
The above snippet looks simple, and it is, but creating the field "xSegmentSort" is not. This is simply because having dynamic fields in the ```order``` object is impossible.
For this reason, transform operations "calculate" the ordering position because ordering is dependent on two things:
+ if the ordering is dependent on the name ("female" and "male") or
+ the value ("Sales").
I'm doing this using the ```window``` transform. I use ```window``` instead of ```joinaggregate``` because ```window``` also allows sorting.
A ```calculate``` transform (```"calculate": "...", "as": "xSegmentSort"```) then utilizes the output of the ```window``` transforms to define the sort order.
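The original expressions are elided above; a simplified sketch of the idea might look like the following (the segment field name "gender", the rank logic, and the way the "sortby" parameter is tested are all assumptions):
```
"transform": [
  {
    "window": [{"op": "row_number", "as": "rankByValue"}],
    "sort": [{"field": "Sales", "order": "descending"}],
    "groupby": ["brand"]
  },
  {
    "calculate": "sortby === 'name' ? datum['gender'] : datum['rankByValue']",
    "as": "xSegmentSort"
  }
]
```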
### Calculating the position of the segment label
In the above data visualization, the labels of the segments "female" and "male" also utilize the same ```window``` transforms. But now a second ```calculate``` computes the position for the label placement of the segments:
```"calculate": "...", "as": "xSegmentBase"```
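The exact expression is elided above; a typical pattern for centering a label inside its segment is the running sum minus half the segment value. A sketch, with assumed field names:
```
"transform": [
  {
    "window": [{"op": "sum", "field": "Sales", "as": "cumSales"}],
    "sort": [{"field": "xSegmentSort", "order": "ascending"}],
    "groupby": ["brand"]
  },
  {
    "calculate": "datum['cumSales'] - datum['Sales'] / 2",
    "as": "xSegmentBase"
  }
]
```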
The little snippet below shows how this field is used in the encoding of the ```text```mark that is writing the segment data labels:
```
"x": {
"field": "xSegmentBase", "type": "quantitative"
},
```
The same x-position is used to "write" the value data label and the "percentage" data label.
## Bar chart with triangle slider
+ (((((xslider) --> xslider_zerocheck && triangle) --> sanitycheck_max) --> sanitycheck_max_prev) --> sanitycheck_max_prev_zerocheck) --> x1
+ ((xslider) --> xslider_zerocheck && triangle) --> sanitycheck_max --> x2
# The templates
## The rectangular pie chart
Using triangle-down markers allows easy discovery of the segments inside a single stack that contribute to a given cumulative percentage like 25%, 50%, and 75%.
==Describe where to find the template and how it can be used==
# Simple things that are only simple if you know them
## The equality operator, or why should I use === instead of ==
If we do not have a JavaScript background, we might stumble over specific syntax (I did, and still do). One of these things was using three equal signs instead of two. I decided on three equal signs because two equal signs perform a type conversion (coercion) that I do not want. This means three equal signs might return false where two equal signs return true.
This returns **false**:
```
['this'] === 'this'
```
This returns **true**:
```
['this'] == 'this'
```
The first snippet compares a single-valued array with a string. Because the two operands are not of the same type, the strict equality comparison immediately returns false, even if the array's value equals the string on the right-hand side of the operator.
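These rules can be verified directly in any JavaScript console; the same distinction between the two operators also applies inside Vega expressions:
```javascript
// == applies type coercion before comparing; === compares type first, then value.
console.log(['this'] === 'this'); // false: array vs. string, different types
console.log(['this'] == 'this');  // true: the array is coerced to the string 'this'
console.log(1 === '1');           // false: number vs. string
console.log(1 == '1');            // true: the string '1' is coerced to the number 1
```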
My favorite explanation is given by "Bill the Lizard" here: [Which equals operator (== vs ===) should be used in JavaScript comparisons? - Stack Overflow](https://stackoverflow.com/questions/359494/which-equals-operator-vs-should-be-used-in-javascript-comparisons)
## The exclamation mark (!), the NOT operator in JavaScript
# Quick and dirty - the copy and paste chapter
For now, all the snippets are unordered; this might change if the number of C&P "hacks" grows 😎
## Create a "clean" data visualization by getting rid of all this x- and y-axis stuff
This is on the report page: "learn - simple bars now beautified (Vega-Lite)."
The chart can be beautified by adding the following snippet to the x and y encoding:
```
...
"axis": {
"title": null,
"domain": false, //the x- or y-axis
"ticks": false,
"grid": false,
"labels": true
}
...
```
Because there are negative and positive values, showing a vertical line at x = zero is often required. Two approaches can achieve this. The first one shows a grid line only at x = 0.
In the encoding for the x-axis this line
```
"grid": false,
```
has to be replaced with these two lines:
```
"gridWidth": 1,
"gridColor": {"expr": "datum['value'] === 0 ? 'darkgrey' : 'transparent'"},
```
But there is another approach. Read this: [[#Create a constant line]]
## Color based on a condition
This can be used to add coloring based on a condition.
This is added to the encoding block; hence ```value``` is required:
```
"color": {"value": {"expr": "datum['Sales'] <= 500 ? 'red' : 'green'"}}
```
## Create a color legend based on conditions and given items
```
"color": {"datum": "the75Percpoint",
"condition": {
"test": "show75percentMark === false || datum['segmentCount'] == 1",
"value": "transparent"
},
"value": {"expr": "colorTriangle"},
"scale": {
"domain": ["the25Percpoint" , "the50Percpoint", "the75Percpoint"],
"range": [{"expr": "colorTriangle"}, {"expr": "colorTriangle"}, {"expr": "colorTriangle"}]
},
"title": "fancy triangle(s)"
},
```
A color legend will be created because of ```datum```.
Because of ```condition```, the color will either be "transparent" (the item will not be visible) or the color will be read from a parameter/signal called "colorTriangle".
Because of ```scale```, the legend has three items, all with the color read from the parameter/signal.
Because of ```title```, the legend has the title "fancy triangle(s)."
## Create a constant line
My favorite approach to adding a y-axis is using a mark of type ```rule```. From my experience, this approach offers more "styling" options than the gridline approach. A vertical line can be added using the following snippet (adding a mark requires ```layer[...]```):
```
{ "description": "the vertical line called y-axis",
"mark": { "type": "rule", "color": "darkgray", "size": 2 },
"encoding": {"x": {"datum": 0}}
}
```
## Overlapping bars (horizontal bars)
To create overlapping bars, it's required to change the yOffset of the mark; the offset is measured in pixels. Because the snippet below sits inside the mark block, the expression is **not** wrapped in a ```"value": { ... }``` object; the expression alone is sufficient.
```
"yOffset": {"expr": "datum['key'] === 'Sales' ? 3 : -3"}
```
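In context, a mark definition for two slightly overlapping bar series might look like this sketch (the field "key" and the offsets come from the snippet above; ```height``` and ```opacity``` are illustrative assumptions):
```
"mark": {
  "type": "bar",
  "height": 12,
  "opacity": 0.8,
  "yOffset": {"expr": "datum['key'] === 'Sales' ? 3 : -3"}
}
```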
# Resources
## Deneb repositories
These are my favorite Vega and Vega-Lite repositories:
+ [Giammaria/Vega-Visuals: This is a growing compilation of my visuals written in Vega. Many of these visuals also have a Power BI file where the visual has been migrated into Deneb. My hope is that others can learn, critique, and gain inspiration from visiting this collection. (github.com)](https://github.com/Giammaria/Vega-Visuals)
+ [PBI-David/Deneb-Showcase: A collection of advanced dataviz examples using Vega, Vega-Lite, Deneb and Power BI. (github.com)](https://github.com/PBI-David/Deneb-Showcase)