JETLS environment isolation

- On-disk UUID rewrite (easiest)
  - Which packages should be rewritten? Base shouldn't be rewritten
- On-memory UUID rewrite
  - Infeasible at this moment
- Or, implement an additional mechanism in Julia's code loading for unshared dependency loading
- Use `uuid5` for this (instead of `uuid4`)
  - A master key + old UUID => a separate UUID (deterministic)
- Share precompilation files?
- Maybe we should talk with Kristofer
  - Could be useful in general?

---

# Problem statement

When developing a Julia language server (e.g., JETLS.jl), a critical issue arises when the language server's dependencies conflict with those of the packages being analyzed. For example:

- JETLS depends on JuliaInterpreter v0.10
- The package being analyzed (e.g., Revise) depends on JuliaInterpreter v0.9

Since Julia's code loading mechanism uses `PkgId` (UUID + name) as the identity of a package, only one version of a given package can be loaded in a process. This causes conflicts because:

1. The language server needs its specific version for its analysis tools
2. The analyzed package needs its version for correct runtime behavior
3. Both must coexist in the same process (the language server needs to introspect the analyzed package directly using Julia's compiler infrastructure)

The goal is to isolate the language server's dependencies completely, making them independent from the target package's dependencies, while keeping both in the same Julia process.

# On-memory UUID rewriting approach

## Core concept

Rewrite package UUIDs in memory after loading, rather than modifying on-disk files. This allows the same package to be loaded twice under different UUIDs.

## Key data structures

1. `PkgId` (base/pkgid.jl:3-9):
   - Struct containing `uuid::Union{UUID,Nothing}` and `name::String`
   - Packages are identified by the combination of UUID + name
   - Equality and hashing are based on both fields
2. `loaded_modules` (base/loading.jl:2557):
   - `Dict{PkgId,Module}` mapping package identifiers to loaded modules
   - Primary registry for loaded packages
3. `loaded_precompiles` (base/loading.jl:2558):
   - `Dict{PkgId,Vector{Module}}` for precompiled modules
   - Must be cleaned up when rewriting UUIDs
4. `loaded_modules_order` (base/loading.jl:2559):
   - `Vector{Module}` for GC rooting
   - Does not need to be cleaned up

## Module UUID storage

Modules store their UUID in the C struct `jl_module_t` (src/julia.h:833-856):

```c
typedef struct _jl_module_t {
    ...
    jl_uuid_t uuid;
    ...
} jl_module_t;
```

Two C functions are available for UUID manipulation:

- `jl_module_uuid`: get a module's UUID
- `jl_set_module_uuid`: set a module's UUID

## Implementation

```julia
function isolate_package!(original_pkgid::Base.PkgId)
    @lock Base.require_lock begin
        mod = Base.maybe_root_module(original_pkgid)
        mod === nothing && return

        # Generate new UUID (deterministic)
        new_uuid = Base.uuid5(original_pkgid.uuid, "JETLS-isolated")

        # Rewrite module's UUID
        ccall(:jl_set_module_uuid, Cvoid, (Any, NTuple{2, UInt64}),
              mod, convert(NTuple{2, UInt64}, new_uuid))

        # Remove from loaded_modules
        delete!(Base.loaded_modules, original_pkgid)

        # Remove from loaded_precompiles (important!); delete! is a no-op if the key is absent
        delete!(Base.loaded_precompiles, original_pkgid)

        # Note: loaded_modules_order does not need cleanup.
        # It serves as a GC root and will contain both modules.

        # Register with new UUID
        new_pkgid = Base.PkgId(new_uuid, original_pkgid.name)
        Base.loaded_modules[new_pkgid] = mod
    end
end
```

Integration with JETLS:

```julia
const JETLS_DEPENDENCIES = Set([
    "JuliaInterpreter",
    "LoweredCodeUtils",
    # ... other dependencies
])

function __init__()
    # Isolate all JETLS dependencies after loading.
    # `collect` snapshots the keys, since isolate_package! mutates loaded_modules.
    for pkgid in collect(keys(Base.loaded_modules))
        if pkgid.name in JETLS_DEPENDENCIES
            isolate_package!(pkgid)
        end
    end
end
```

## How it works

1. JETLS and its dependencies are loaded normally with their original UUIDs
2. In `__init__()`, all dependency UUIDs are rewritten in memory
3. When the user loads a target package that depends on the same packages:
   - `identify_package_env` reads the original UUID from Manifest.toml
   - `maybe_root_module` looks up the original UUID in `loaded_modules`
   - The original UUID is not found (it was rewritten)
   - The package is loaded again with its original UUID
4. Result: two independent copies of the same package exist in memory

## Precompiled cache limitation

Goal: load both module copies (JETLS's dependency and the target package's dependency) from precompiled caches.

What actually happens:

- First copy (JETLS dependency): loads from the precompiled cache successfully, then its UUID is rewritten
- Second copy (target package dependency, original UUID): cannot use the precompiled cache and loads from source

Why the second copy cannot use the precompiled cache: precompiled caches are created in isolation, without knowledge of the runtime state. When the system attempts to load the second copy from the cache:

1. The cache was created when no module with that name existed
2. At runtime, a module with the same name already exists (with a different UUID)
3. Julia's C-level module loading detects this name conflict
4. Error: "Error reading package image file"
5. The system falls back to loading from source

This limitation is inherent in how precompiled caches work: they cannot account for the runtime scenario in which a package with the same name is already loaded under a different UUID.

Result: a warning appears, but functionality is correct. The second copy loads from source instead of from the cache.
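The on-memory scheme rests on two properties noted earlier: `PkgId` equality and hashing cover both the UUID and the name, and `uuid5` maps an old UUID plus a fixed key to the same new UUID every time. The standalone check below illustrates both without loading any package. It is a sketch under stated assumptions: it calls `uuid5` from the `UUIDs` standard library in place of `Base.uuid5`, reuses the `"JETLS-isolated"` key from `isolate_package!`, and stores placeholder strings in a plain `Dict` instead of touching `Base.loaded_modules`.

```julia
using UUIDs

# JuliaInterpreter's UUID, as quoted in the vendoring section of this note
const ORIGINAL_ID = Base.PkgId(UUID("aa1ae85d-cabe-5617-a682-6adf51b2e16a"), "JuliaInterpreter")

# Mirrors the derivation in isolate_package! (old UUID + fixed key => new UUID),
# but uses UUIDs.uuid5 rather than Base.uuid5
isolated_id(pkgid::Base.PkgId) = Base.PkgId(uuid5(pkgid.uuid, "JETLS-isolated"), pkgid.name)

const ISOLATED_ID = isolated_id(ORIGINAL_ID)

# Stand-in for Base.loaded_modules: keyed by PkgId, with strings instead of modules
registry = Dict{Base.PkgId,String}()
registry[ISOLATED_ID] = "JETLS's copy (rewritten UUID)"
registry[ORIGINAL_ID] = "target package's copy (original UUID)"

println(length(registry))                         # 2: same name, different UUIDs
println(ISOLATED_ID.name == ORIGINAL_ID.name)     # true
println(isolated_id(ORIGINAL_ID) == ISOLATED_ID)  # true: the derivation is deterministic
```

Both entries coexist because only the UUID differs, which is the same situation as the rewritten JETLS copy and the freshly loaded target-package copy sharing `Base.loaded_modules`. The sketch does not model the precompiled-cache name clash, which happens in Julia's C-level loader.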
## Performance implications

The cache limitation affects not only the target package being analyzed but also its dependency tree:

- Any package that depends on an isolated package (e.g., JuliaInterpreter) cannot use its own precompiled cache
- The package must be loaded from source instead
- This applies transitively: if package A depends on JuliaInterpreter, and package B depends on A, both A and B may need to be loaded from source

Impact on the language server workflow:

- JETLS may not use the precompiled cache for the target package itself (depending on the implementation)
- However, the target package's dependencies are loaded via the normal code loading mechanism
- These dependencies typically benefit from precompiled caches for faster loading
- UUID rewriting breaks cache compatibility for any dependency that conflicts with JETLS dependencies
- This results in slower loading times for the target package and its dependency tree

Trade-off consideration:

- Benefit: complete isolation prevents dependency version conflicts
- Cost: slower loading due to source compilation of conflicting dependencies
- The severity depends on how many dependencies in the target package's tree conflict with JETLS dependencies

## Advantages

1. No on-disk file modifications required
2. No complication to the release/build process
3. Works entirely at runtime
4. Complete isolation of language server dependencies
5. Simple implementation (< 20 lines of code)

## Limitations

1. Precompiled caches cannot be reused for reloaded packages (they are loaded from source)
2. Memory usage increases (multiple copies of the same package)
3. Must be applied consistently to all transitive dependencies
4. Performance impact on loading target packages and their dependency trees

# On-disk UUID rewriting approach (vendoring)

## Core concept

Instead of modifying UUIDs at runtime, rewrite the UUIDs in the on-disk files before loading. This creates truly independent packages from Julia's perspective.
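The core concept can be sketched for a single Project.toml. The helper here is hypothetical (not part of JETLS or Pkg): it parses the file with the `TOML` standard library, replaces the package's own `uuid` and every `[deps]` entry whose name is in the vendored set with `uuid5(original_uuid, MASTER_KEY)`, and leaves other dependencies untouched, following the "master key + old UUID" idea from the notes at the top. `MASTER_KEY`, `vendored_uuid`, `rewrite_project`, `SomeUnvendoredDep`, and the all-zero placeholder UUIDs are assumptions; only JuliaInterpreter's UUID comes from this note. Copying sources into `vendor/` is left out.

```julia
using TOML, UUIDs

# Hypothetical master key; it must stay fixed across releases so vendored UUIDs are stable
const MASTER_KEY = "JETLS-vendored"

# Deterministic mapping: old UUID (as namespace) + master key => vendored UUID
vendored_uuid(original::UUID) = uuid5(original, MASTER_KEY)

"""
    rewrite_project(toml_text, vendored) -> String

Return `toml_text` with the package's own `uuid` and every `[deps]` entry named in
`vendored` replaced by its vendored UUID. Non-vendored dependencies keep their UUIDs.
"""
function rewrite_project(toml_text::AbstractString, vendored::Set{String})
    project = TOML.parse(toml_text)
    if get(project, "name", nothing) in vendored
        project["uuid"] = string(vendored_uuid(UUID(project["uuid"])))
    end
    deps = get(project, "deps", Dict{String,Any}())
    for name in collect(keys(deps))  # snapshot keys before updating values
        if name in vendored
            deps[name] = string(vendored_uuid(UUID(deps[name])))
        end
    end
    io = IOBuffer()
    TOML.print(io, project)
    return String(take!(io))
end

# Example input: placeholder UUIDs, except JuliaInterpreter's
example = """
name = "LoweredCodeUtils"
uuid = "00000000-0000-0000-0000-000000000001"
version = "0.0.0"

[deps]
JuliaInterpreter = "aa1ae85d-cabe-5617-a682-6adf51b2e16a"
SomeUnvendoredDep = "00000000-0000-0000-0000-000000000002"
"""

rewritten = rewrite_project(example, Set(["LoweredCodeUtils", "JuliaInterpreter"]))
print(rewritten)
```

Because `uuid5` is deterministic, rerunning the script on the same inputs reproduces the same UUIDs, which is one way to meet the "maintain consistent UUIDs across releases" requirement, provided `MASTER_KEY` never changes.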
## Directory structure

```
JETLS.jl/
├── src/
│   └── JETLS.jl
├── vendor/
│   ├── JuliaInterpreter/
│   │   ├── Project.toml   # UUID rewritten
│   │   └── src/           # Source code (symlink or copy)
│   ├── LoweredCodeUtils/
│   │   └── ...
│   └── CodeTracking/
│       └── ...
├── Project.toml           # Depends on vendored versions
└── Manifest.toml
```

## Implementation

UUID rewriting in vendor/JuliaInterpreter/Project.toml:

```toml
name = "JuliaInterpreter"
uuid = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # New UUID (not the original)
version = "0.10.0"

[deps]
# Original dependencies (or vendored versions if they are also vendored)
```

JETLS Project.toml:

```toml
[deps]
# Vendored dependencies with new UUIDs
JuliaInterpreter = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
LoweredCodeUtils = "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
# Other dependencies...
```

## How it works

From Julia's code loading perspective:

```julia
# Original package
PkgId(UUID("aa1ae85d-cabe-5617-a682-6adf51b2e16a"), "JuliaInterpreter")

# Vendored version
PkgId(UUID("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"), "JuliaInterpreter")
```

These are completely different packages, so both can coexist in `loaded_modules`. When loading:

```julia
using Revise  # Loads original JuliaInterpreter with original UUID
using JETLS   # Loads vendored JuliaInterpreter with new UUID
```

There is no conflict because the `PkgId`s differ.

## Automation requirements

A vendoring script must handle the following:

1. Dependency resolution
   - Identify all packages to be vendored
   - Resolve dependencies between vendored packages
   - Determine which dependencies should use vendored vs. original versions
2. UUID generation
   - Generate new UUIDs deterministically (e.g., using `uuid5`)
   - Maintain consistent UUIDs across releases
3. File operations
   - Copy or symlink source files to the vendor directory
   - Rewrite UUIDs in Project.toml files
   - Update dependency references in Project.toml to use new UUIDs where appropriate
4. Dependency graph handling
   - If vendored package A depends on package B, and B is also vendored:
     - Update A's Project.toml to reference vendored B's new UUID
   - If vendored package A depends on package C, and C is NOT vendored:
     - Keep A's dependency on the original C unchanged
5. JETLS Project.toml update
   - Reference all vendored packages with their new UUIDs
   - ~~Ensure Manifest.toml is regenerated correctly~~

## Advantages

1. Complete isolation - JETLS dependencies and analyzed package dependencies never conflict
2. Uses the standard code loading mechanism, so the [[#Precompiled cache limitation]] does not apply
3. Deterministic builds - vendored dependencies are version-locked
4. Simplified distribution - makes [[Bundle JETLS.jl into jetls-client]] easy: the entire package (including vendor/) can be distributed as-is. Users simply install JETLS, and all vendored dependencies are automatically included with their rewritten UUIDs.

## Limitations

1. Recursive vendoring - if vendored packages share dependencies with the analyzed package, those shared dependencies may also need vendoring
2. Disk space - vendored packages duplicate source code (though usually small, < 1MB per package)
3. Updates - when updating dependencies, re-run the vendoring script (this should be part of the release process)
4. Complexity in release process - requires running the vendoring script before each release

# Alternatives considered

## Separate Julia process

Run the language server in a separate Julia process with IPC communication.

This approach does not work: the language server needs to use Julia's compiler infrastructure (type inference, JuliaInterpreter, etc.) to analyze user code. This requires the language server's dependencies and the user's packages to coexist in the same Julia process, so that the analysis tools can operate on the loaded user code.

## Custom `DEPOT_PATH`

Use a separate depot directory for the language server.

This approach does not work: the language server needs to access the user's packages to analyze them.
A separate `DEPOT_PATH` would completely isolate the language server from the user's environment, making it unable to see or load the target packages.

# Conclusion

## Recommended strategy

For language server development, use different approaches at different stages.

Development stage:

- No isolation - develop and test with shared dependencies
- Simpler workflow, faster iteration
- Dependency conflicts are acceptable during development
- Easy to debug and modify dependencies

Release stage:

- Apply on-disk UUID rewriting (vendoring)
- Complete isolation for production use
- Maintains precompiled cache performance for end users
- Deterministic builds for reliability

## Why on-disk vendoring for releases

1. Performance - target packages load normally from precompiled caches
2. Reliability - users get the exact dependency versions JETLS was tested with
3. Simplicity for users - no runtime overhead or warnings
4. For distribution - vendored dependencies can naturally be [[Bundle JETLS.jl into jetls-client|bundled]]

## Why not the on-memory approach

The on-memory approach was investigated, but it has fundamental limitations:

- The second copy of shared dependencies cannot use precompiled caches
- It must load from source, impacting performance
- Warning messages appear on every load (though functionality works correctly)
- The performance penalty is proportional to dependency overlap

The investigation demonstrated that on-memory UUID rewriting is technically feasible and functionally correct, but the precompiled cache limitation makes it unsuitable for production use, where loading performance matters.