This article is derived from [[2023_Dec_08]]'s original work. It explains the proposed eight stages in a typical [[DevOps]] iterative process. The thesis is that all stages compose along a unifying timeline, making up the **[[Continuation|continuous]] integration and [[Continuation|continuous]] delivery** data processing pipeline. Recent advances in [[Multi-modal Large Language Model]]s also enable faster cycle times for semantic judgments, thereby reducing the effort of data content alignment within the overall pipeline. An outline of [[The Eight Stages]] can be visualized as follows:
![[HolyTrinity_DevOps.png]]
It is also well known that the [[DevOps]] cycle is related to the [[Kuhn Cycle]].
# Naming the Eight Stages
0. [[No Attention, Yet!]]: At this level of data quality, anything goes, mostly without the data owner's awareness. It may also be considered the [[exploratory data collection]] stage.
1. [[Coaching]] ([[Subjective Interpretation]]): This is where a project or piece of data is first claimed through human or organizational judgment. This judgment can be very primitive, and it may use the definition of [[judgment|judgment in logic]]. The important fact is that when this subjective interpretation is registered, it automatically receives a timestamp, which defines its [[context]]. It can be any note or card written in haste.
2. [[Lean Operations]] ([[Logical Test Preparation]]): The [[Logic Model]] enters at this stage, when a [[Hub/Theory/Category Theory/Logic/Hoare triple|Hoare triple]] is defined for the note or card in [[PKC]]. A set of default Logic Model content should be automatically associated with the note, much like the basic metadata associated with a [[File]] in a typical file system.
3. [[Software Delivery]] ([[Labeled Test Event]]): A set of formulated tests must be associated with this data point. From this stage onward, tests must be conducted automatically, and test events should be generated to exercise the Logic Model.
4. [[BDD ready]] ([[Labeled Interpretation]]): Some [[Accountability]] should be associated with these tests. This is basically a form of [[RLHF]].
5. [[Effective Change]] ([[Documented Computation]]): This is where the human players in the [[RLHF]] loop should learn and contribute comments. The written notes and interpretations are compiled into the knowledge system as part of the Documented Computation artifacts.
6. [[DevOps Qualified]] ([[Computed Interpretation]]): After enough human-provisioned inputs have been given to the [[PKC]] for the previously refined data set, a multi-signature quality control process follows to compile and compare the entire data set before submitting it for public consumption. The multi-signature process can be emulated by multiple [[LLM]]-powered agents, programmed to judge whether the content quality is good enough. If they all pass the tests, the final result is submitted to the next stage (see the sketch after this list).
7. [[DNA Stage]] ([[Publicly Interpreted Data]]): At this stage, data is released to the public, where it is likely to attract reviews and feedback. Comments and online feedback can be captured with a web-based system, completing one cycle of the content deployment process.
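The staged progression and the multi-signature gate of stage 6 can be sketched in a few lines of Python. All names here (`Stage`, `Judge`, `multi_sig_gate`) are hypothetical illustrations of the idea, not part of [[PKC]] or any existing library:

```python
# A minimal sketch of the eight stages as an ordered type, plus the
# stage-6 multi-signature quality gate. Hypothetical names throughout.
from enum import IntEnum
from typing import Callable

class Stage(IntEnum):
    NO_ATTENTION = 0               # exploratory data collection
    SUBJECTIVE_INTERPRETATION = 1  # Coaching
    LOGICAL_TEST_PREPARATION = 2   # Lean Operations
    LABELED_TEST_EVENT = 3         # Software Delivery
    LABELED_INTERPRETATION = 4     # BDD ready
    DOCUMENTED_COMPUTATION = 5     # Effective Change
    COMPUTED_INTERPRETATION = 6    # DevOps Qualified
    PUBLICLY_INTERPRETED_DATA = 7  # DNA Stage

# Each judge emulates one LLM-powered agent returning pass/fail.
Judge = Callable[[str], bool]

def multi_sig_gate(content: str, judges: list[Judge]) -> bool:
    """Stage 6: content advances only if every agent signs off."""
    return all(judge(content) for judge in judges)

# Usage: three stand-in judges; in practice each would wrap an LLM call.
judges = [lambda c: len(c) > 0,
          lambda c: "TODO" not in c,
          lambda c: c.strip() == c]
if multi_sig_gate("A reviewed, documented note.", judges):
    stage = Stage.PUBLICLY_INTERPRETED_DATA
```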
The eight stages are laid out in a sequential diagram, with well-known [[DevOps]] tools associated with each:
![[RefiningCorrectness_In_DevOps.png|800px]]
## Correctness as Composition of Arrows
It is very important to know that this eight-stage process is a composition of eight individual [[Hoare triple]]s. See the following diagram:
![[HoareTripleDiagram.png|800px]]
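This compositionality can be made concrete. Below is a minimal sketch, with hypothetical names, of a Hoare triple as a runnable arrow: when the postcondition of one triple matches the precondition of the next, the two commands compose into a single verified arrow:

```python
# A minimal sketch of Hoare triples as composable arrows.
# {P} f {Q} followed by {Q} g {R} yields {P} f;g {R}.
from dataclasses import dataclass
from typing import Any, Callable

Predicate = Callable[[Any], bool]

@dataclass
class HoareTriple:
    pre: Predicate                 # {P}: precondition
    command: Callable[[Any], Any]  # the stage's transformation
    post: Predicate                # {Q}: postcondition

    def run(self, state: Any) -> Any:
        assert self.pre(state), "precondition violated"
        result = self.command(state)
        assert self.post(result), "postcondition violated"
        return result

def compose(f: "HoareTriple", g: "HoareTriple") -> "HoareTriple":
    """Sequential composition: g after f, keeping f's pre and g's post."""
    return HoareTriple(pre=f.pre,
                       command=lambda s: g.run(f.run(s)),
                       post=g.post)
```

Composing the eight stage-level triples in sequence yields a single arrow whose precondition is stage 0's and whose postcondition is stage 7's, which is exactly what the diagram depicts.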
### Funneling control to one data type: Arrow
In system design, we strategically choose a **unifying [[type]]** of control or data flow. This concept of funneling processes through a single structure, often visualized as a **funnel** or hourglass, is a well-established practice in systems engineering. This recurring theme appears in [[DevOps]] lifecycles, causal state space analysis, and networked data governance. Each function (or [[arrow]], [[Hoare triple]]) can be conceptualized as an hourglass, with its behavior serving as the control mechanism ([[control plane]]) regulating the interaction between the [[data plane]] and [[application plane]]. This idea is illustrated in the following diagram:
![[HourglassModel_DevOps_Cone_Bowtie.png|800px]]
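The narrow waist of the hourglass can be expressed as a single function signature through which every stage flows. The sketch below (hypothetical names) shows the control plane as one loop over that unifying type:

```python
# A minimal sketch of funneling all control flow through one Arrow type.
from typing import Callable

State = dict                          # the data plane: a bag of fields
Arrow = Callable[[State], State]      # the narrow waist of the hourglass

def pipeline(stages: list[Arrow]) -> Arrow:
    """The control plane: one loop that regulates every stage arrow."""
    def run(state: State) -> State:
        for stage in stages:
            state = stage(state)
        return state
    return run

# Usage: heterogeneous stages, all reduced to the same Arrow type.
annotate: Arrow = lambda s: {**s, "note": "subjective interpretation"}
register: Arrow = lambda s: {**s, "timestamped": True}
result = pipeline([annotate, register])({"content": "a card written in haste"})
```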
### Representing Correctness in a concrete form
Before we explore the [[Logic Model]] as a data submission form, let's delve into the functional aspects of a workflow's controlling engine. The following diagram depicts a layered platform with three core elements: Input, Process, and Output. Vertically, these layers can be mapped to [[Context]], [[Goal]], and [[Success Criteria]]. This abstract interface, akin to an execution engine, carries out the intentions of participating agents. It embodies three layers of correctness:
- **Input Correctness:** This ensures the information entering the platform ([[Logic Model#Inputs|Inputs]]) is accurate and aligns with the intended use case defined in the [[Context]].
- **Process Correctness:** This guarantees the processing steps ([[Logic Model#Process|Process]]) adhere to the designed logic and effectively transform the Input data to achieve the desired outcome.
- **Output Correctness:** This verifies the final results ([[Logic Model#Outputs|Outputs]]) match the predefined [[Success Criteria]], fulfilling the purpose of the workflow.
By ensuring correctness at each layer, the platform guarantees the overall functionality aligns with the intended goals.
![[Three_Levels_Of_Correctness.png|800px]]
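As a concrete illustration, here is a minimal sketch (hypothetical predicates and field names) of the three layers of correctness checked independently around a single processing step:

```python
# A minimal sketch of the three layers of correctness wrapped around a
# single processing step. Predicates and field names are hypothetical.
def run_with_correctness(state: dict) -> dict:
    # Input correctness: the entering information matches the Context.
    assert "content" in state, "Input correctness violated"

    # Process correctness: the step adheres to the designed logic
    # (here: it only adds fields, never drops them).
    result = {**state, "word_count": len(state["content"].split())}
    assert result.keys() >= state.keys(), "Process correctness violated"

    # Output correctness: the result meets the Success Criteria.
    assert result["word_count"] > 0, "Output correctness violated"
    return result

run_with_correctness({"content": "a note written in haste"})
```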
## Encode Hoare Triple in Logic Model
![[LogicModel_SampleForm.png|800px]]
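The sample form suggests a direct encoding: [[Context]] and Inputs frame the precondition, Process names the command, and [[Success Criteria]] with Outputs check the postcondition. The record below is a hedged sketch; the schema is an illustrative assumption, not [[PKC]]'s actual format:

```python
# A hypothetical Logic Model record encoding a Hoare triple
# {P} command {Q}. Field names mirror the sample form; the schema is
# an illustrative assumption, not PKC's actual format.
logic_model = {
    "context":          "raw notes from exploratory data collection",
    "inputs":           ["note.md"],                       # frames {P}
    "process":          "attach default metadata and a timestamp",
    "outputs":          ["note.md with metadata"],
    "success_criteria": "every output carries a registration timestamp",  # {Q}
}

# Reusing the HoareTriple sketch from above, the form compiles to one arrow:
triple = HoareTriple(
    pre=lambda s: set(logic_model["inputs"]) <= set(s["files"]),  # Input correctness
    command=lambda s: {**s, "timestamped": True},                 # Process correctness
    post=lambda s: s["timestamped"],                              # Output correctness
)
```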
## Illustrations of the Eight Stages
The eight categories of data can be shown in the following diagram: (Go to the diagram: [[EightStagesOfMLOps.excalidraw|Link]])
![[EightStagesOfMLOps.excalidraw.svg|800px]]
**Visualizing Decision-Making Strategies with a Venn Diagram**
The three circles in the included Venn diagram represent key stages in a decision-making process: [[Subjective Interpretation]], [[Logical Test Preparation]], and [[Labeled Test Event]]. The dashed line indicates a potential flow through the process; returning to the "drawing board" after data release signifies a restart when needed. See [[@SeeingOrganizationalPatterns1995|Seeing organizational patterns: a new theory and language of organizational design]].
**Numbered Sequences within Overlaps**
The diagram employs a clockwise numbering scheme, starting with the areas of minimal overlap (1) and progressing toward the central area (3) of maximal overlap. This highlights the sequence in which decision points might be encountered, starting with [[Subjective Interpretation]] (1), followed by [[Logical Test Preparation]] (2), and culminating in the [[Labeled Test Event]] (3).
**Naming the Sequences: Preference Orders and Strategies**
While the numbers simply reflect the order, different names could be assigned to each sequence. Each named sequence would represent a specific decision-making preference order (e.g., "Intuition First" for Subjective Interpretation followed by testing). This allows users to identify a strategy that suits their needs based on their preferred order of addressing these decision points.
**Mapping the Path Dependencies**
Naming each decision sequence highlights its underlying path dependencies, making each a unique option within a 'namespace' of alternative approaches. This allows us to directly compare different decision-making strategies, as each named sequence maps to a specific, defined method. With three nameable areas in the diagram (and '[[Top]]' and '[[Bottom]]' fixed at the ends), there are $3! = 6$ possible sequences, representing six distinct approaches that could be named and analyzed (see [[Counting]]).
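A quick check of this count (plain Python, illustrative names):

```python
# With Bottom and Top fixed at the ends, only the three middle areas
# permute, giving 3! = 6 candidate decision-making sequences.
from itertools import permutations

areas = ["Subjective Interpretation", "Logical Test Preparation",
         "Labeled Test Event"]
sequences = [("Bottom", *p, "Top") for p in permutations(areas)]
assert len(sequences) == 6
for seq in sequences:
    print(" -> ".join(seq))
```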
### A Concurrent Way of Classifying Decision Processes
![[TriVennDiagramAsLattice.excalidraw.svg|800px]]
Go to the [[TriVennDiagramAsLattice.excalidraw|diagram]]
Interestingly, this diagram's structure mirrors that of a multi-layered [[neural network]]. Both employ a series of staged filters to process and refine information. In a neural network, each layer applies computations to the data, leading to a progressively more refined representation. Similarly, the Eight Stages method uses **layered sequences** (sequences that allow for [[concurrency]]) to transform and clean data, leading to higher-quality results. This shared filtering logic makes the neural network analogy particularly apt.
This diagram also invites us to think about how these **ordered entries of tasks** are defined. You might notice the [[Lattice]] pattern and the [[LatticeOfErrorsAndSigns.png|lattice graphics]] in the upper left corner of the diagram, reproducing figure 15 of [[@CousotCalculationalDesigna|Cousot's paper]].
![[LatticeOfErrorsAndSigns.png|center]]
This diagram underscores the importance of rigorous data analysis and verification, highlighting a stepwise progression of computable judgments from '[[Bottom|BOT]]' to '[[Top|TOP]]'. Modern tools like [[LangChain]]/[[LangGraph]] and [[LlamaIndex]], integrated with frameworks like [[Fabric]], can significantly enhance the efficiency of this process. By labeling each of the eight stages with human-readable terms and employing a structured framework for concurrent data verification, we gain a deeper understanding of data processing [[correctness]], a clarity made possible through the lens of [[abstract interpretation]]. This unified concurrent model allows users to execute all eight stages of data content verification in an efficient and systematic manner.
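The lattice of signs itself is small enough to execute. The sketch below models each lattice element by the set of concrete signs it covers, so the join (least upper bound) is just set union; element names follow the table in the next section:

```python
# A minimal sketch of the lattice of signs from Cousot's figure above,
# with each abstract element modelled by its set of concrete signs.
CONCRETE = {
    "BOT":   frozenset(),      # no values: the bottom element
    "NEG":   frozenset("-"),
    "ZERO":  frozenset("0"),
    "POS":   frozenset("+"),
    "NEGZ":  frozenset("-0"),  # less than or equal to zero
    "NZERO": frozenset("-+"),  # nonzero
    "POSZ":  frozenset("0+"),  # greater than or equal to zero
    "TOP":   frozenset("-0+"), # any sign: the top element
}
ABSTRACT = {v: k for k, v in CONCRETE.items()}

def join(a: str, b: str) -> str:
    """Least upper bound: the smallest element covering both arguments."""
    return ABSTRACT[CONCRETE[a] | CONCRETE[b]]

assert join("NEG", "ZERO") == "NEGZ"
assert join("NEG", "POS") == "NZERO"
assert join("NEGZ", "POS") == "TOP"
```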
## Another Tri-folded Diagram
Look at the following Venn diagram:
![[dna-small.png]]
### Another very interesting picture
![[Blockchain_AI_Human_3Ciricles.png]]
I got this image from this link: [Using DuckDB for Embeddings and Vector Search](https://blog.brunk.io/posts/similarity-search-with-duckdb/)
## The Calculational Design Model of $AI^2$
After studying the seminal paper "[[@CousotCalculationalDesigna|The Calculational Design of a Generic Abstract Interpreter]]" by [[Patrick Cousot]], the alignment between the [[bridgelet]] and the design principles outlined in this work became even more apparent. The eight stages of tasks represent the interpretive actions within a continuously updated data refinement program. A visually revealing diagram would illustrate how each error type can be determined within a [[Lattice]] structure. (also see [[@baezPhysicsTopologyLogic2009|Logic and Computation: A Rosetta Stone]] by [[John Baez]])
The lattice diagram shown above can be compared with [[BDD]]'s eight regions and the Eight Stages in tabulated form:
|     | The Eight Stages                | BDD               | Abstract Interpretation |
|:----|:--------------------------------|:------------------|:------------------------|
| 0   | No Attention, Yet!              | NULL              | BOT, ARE                |
| 1   | Subjective Interpretation       | Coaching          | POS                     |
| 2   | Logical Test Preparation        | Lean Operations   | NEG                     |
| 3   | Labeled Test Event (passive)    | Software Delivery | ZERO                    |
| 4   | Labeled Interpretation          | BDD ready         | NEGZ                    |
| 5   | Documented Computation          | Effective Change  | NZERO                   |
| 6   | Computed Interpretation         | DevOps Qualified  | POSZ                    |
| 7   | Publicly Interpreted Data       | DNA Stage         | INI, TOP                |
## Leveraging a Unified Semantic Model
Please see [[Semantically Consistent Workflow with BDD]], or the summary of this approach transcluded here:
![[Behavior-driven development#Apply BDD in the Context of Generative AI]].
# Conclusion
In conclusion, the eight stages of the [[Data Life Cycle]], viewed through the lens of data cleansing, are essentially about proactively detecting and rectifying errors before they reach production (the [[Top]]). This same interpretive framework can be harnessed to automate the reduction of known errors within the realm of [[MLOps]]. The core challenge lies in organizing and managing well-defined [[Logic Model]]s that capture the different error types encountered across diverse datasets. It is crucial that domain experts incrementally define these error types, and that the corresponding user interfaces are designed with simplicity in mind. This ensures users can consistently understand the prompts they are given and provide less biased [[judgments]]. By revisiting the [[#A Concurrent Way of Classifying Decision Processes|lattice diagram]], we can see why [[Abstract Interpretation]] is relevant to this process. The power of the [[Curry-Howard-Lambek isomorphism]] lies in its ability to organize and leverage logical inclusion and causal/sequential dependencies in various ways, making it a valuable tool for reasoning about error reduction in MLOps.
# References
```dataview
Table title as Title, authors as Authors
where (contains(subject, "Eight Stages") or contains(subject, "Lattice") or contains(subject, "Triadic") or contains(subject, "Concurrency") or contains(subject, "Three Circles")) and !contains(subject, "易中天")
```