In-Process DotNet DLL Workers (default)

The WorkManager can run pre-compiled .NET DLL workers directly in-process through Virtufin.WorkManager.Engine.DotNetDll.DotNetDllEngine. As of LIBRARY_VERSION 0.0.59, this is the default engine for the application/x-dotnet-dll MIME type.

Architecture

┌────────────────────────────────────────────────────────┐
│  WorkManager (AOT-compiled native binary)              │
│                                                         │
│  ┌──────────────────┐  ┌────────────────────────────┐  │
│  │ DotNetDllEngine  │  │  CoreCLR (embedded via     │  │
│  │ (AOT code)       │  │  libhostfxr on first use)  │  │
│  │                  │  │                            │  │
│  │  LoadCodeAsync   │──▶  JIT-compile worker.dll    │  │
│  │  ProcessAsync    │──▶  Direct in-process call    │  │
│  │  (no socket)     │  │  (sub-microsecond)         │  │
│  └──────────────────┘  └────────────────────────────┘  │
│           │                                              │
│           ▼                                              │
│  ┌──────────────────────────────────────────────────┐  │
│  │  WorkerLoadContext (per worker, isCollectible)   │  │
│  │  Holds the worker assembly + its dependencies    │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘

What you get

  • Sub-microsecond per-call latency. A direct in-process method dispatch, vs. ~100-1000 µs over a Unix socket + JSON serialization + subprocess scheduling for the out-of-process engine.
  • Shared runtime. All in-process workers share one CoreCLR instance. A WM with 10 in-process managed workers uses ~50-80 MB of working set, vs. ~300-500 MB for 10 separate subprocesses (one per worker).
  • No subprocess lifecycle. No health monitor, no socket PING/PONG, no restart on subprocess exit. The engine's LoadCodeAsync is the lifecycle boundary.
  • Lazy CoreCLR init. The .NET runtime is loaded into the WM process on the first LoadCodeAsync, not at WM startup. The cold-start cost (~200-500 ms one-time) is paid then.

What you give up

  • Fault isolation. A managed worker that overflows the stack (a StackOverflowException cannot be caught in .NET), runs the process out of memory (OutOfMemoryException), or P/Invokes into native code that corrupts the process can take down the entire WorkManager. AssemblyLoadContext provides assembly isolation, not process isolation. Workers that need process-level isolation should opt back into the out-of-process engine (see below).
  • AOT purity, in practice. The engine code is AOT-compiled, but the WM process now hosts a CoreCLR. Reverse engineers will see a .NET runtime in the WM's memory; the on-disk WM binary is still AOT, but the running process is mixed.

Setup

The WM looks for libhostfxr (hostfxr.dll, libhostfxr.so, or libhostfxr.dylib depending on OS) on the host. The runtime can be located via any of:

  • System install. A .NET runtime installed in the standard location (e.g. C:\Program Files\dotnet on Windows, /usr/share/dotnet on Linux, /usr/local/share/dotnet on macOS).
  • DOTNET_ROOT environment variable. Set this to the directory containing the .NET runtime.
  • Side-by-side. Place a host/fxr/<version>/ directory next to the WM binary.
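As a pre-flight check before deploying, the three locations above can be probed with a small helper. The following is a minimal sketch, not the WM's actual resolution logic: HostFxrProbe, its method names, and the order in which roots are tried are all assumptions made for illustration. It relies only on the standard <root>/host/fxr/<version>/ layout of a .NET install.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical diagnostic helper; not part of the WorkManager.
static class HostFxrProbe
{
    // Platform-specific file name of the hostfxr library.
    public static string LibraryFileName() =>
        RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "hostfxr.dll"
        : RuntimeInformation.IsOSPlatform(OSPlatform.OSX) ? "libhostfxr.dylib"
        : "libhostfxr.so";

    // Returns every hostfxr library found under <root>/host/fxr/<version>/.
    public static IReadOnlyList<string> FindUnder(string root)
    {
        string fxrDir = Path.Combine(root, "host", "fxr");
        if (!Directory.Exists(fxrDir)) return Array.Empty<string>();

        return Directory.EnumerateDirectories(fxrDir)
            .Select(versionDir => Path.Combine(versionDir, LibraryFileName()))
            .Where(File.Exists)
            .ToList();
    }

    // Candidate roots. The order here is an assumption, not the WM's documented precedence.
    public static IEnumerable<string> CandidateRoots()
    {
        string? dotnetRoot = Environment.GetEnvironmentVariable("DOTNET_ROOT");
        if (!string.IsNullOrEmpty(dotnetRoot)) yield return dotnetRoot;

        // Side-by-side: a host/fxr/<version>/ directory next to the binary.
        yield return AppContext.BaseDirectory;

        // Default system install locations.
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            yield return Path.Combine(
                Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles), "dotnet");
        else if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
            yield return "/usr/local/share/dotnet";
        else
            yield return "/usr/share/dotnet";
    }
}
```

Calling HostFxrProbe.CandidateRoots().SelectMany(HostFxrProbe.FindUnder) lists every hostfxr library the helper can see. An empty result suggests the first LoadCodeAsync call is likely to fail on that host.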

The generated Virtufin.WorkManager.Engine.DotNetDll.runtimeconfig.json declares "rollForward": "Major", so a .NET 11+ runtime on the host can satisfy this .NET 10 build without rebuilding the WM.

If the runtime is not findable, the first LoadCodeAsync call fails with a clear error message. The WM itself starts fine without a runtime — the cost is paid only when a managed worker is first loaded.

Worker contract

In-process workers implement the same Virtufin.Worker.DevKit.IWorker interface used by all other engines. No changes are needed for existing workers.

Worker packaging

The same .nupkg layout used by the out-of-process engine:

worker.nupkg/
├── <id>.nuspec                       # declares id, deps, virtufin* extensions
└── lib/
    └── <tfm>/
        ├── <id>.dll                  # the worker assembly
        └── <dep>.dll                 # sibling dependencies (e.g. DevKit)

The <virtufinLibrary> extension element in the nuspec must match the worker DLL's basename (without extension). See dotnet-dll-workers.md for the full nupkg format.

Canonical-assembly invariant

The in-process engine relies on a load-order contract for type identity. For every assembly the worker DLL references:

  1. The WorkManager host process loads its own copy of the assembly into the default AssemblyLoadContext (because the WM's own code references it, e.g. CloudNative.CloudEvents in WorkerBase.BuildResponse, Google.Protobuf in gRPC stubs).
  2. The worker ALC's Load(AssemblyName) is invoked for the same assembly reference.
  3. If the simple name matches an assembly the host already loaded, the host's instance is returned. This guarantees that the worker's CloudEvent parameter (worker ALC) and the CommandWorker<T>.HandleCommandAsync abstract method's CloudEvent parameter (default ALC, via DevKit aliasing) are the same Type instance. The override binds correctly.

If the host doesn't have an assembly of the given name, the worker ALC falls back to the nupkg's sibling DLLs (e.g. a worker-private helper assembly). This path produces a separate Type instance in the worker ALC — fine for worker-local types, but the worker must not depend on these being identical to anything in the host.
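The resolution order described above can be sketched as a custom AssemblyLoadContext. This is an illustration under stated assumptions, not the engine's actual WorkerLoadContext: the class name SketchWorkerLoadContext and the in-memory dictionary of sibling DLL bytes are hypothetical. Only the AssemblyLoadContext members used here (the Load override, Default.Assemblies, LoadFromStream) are real .NET APIs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical sketch; the real WorkerLoadContext may differ.
sealed class SketchWorkerLoadContext : AssemblyLoadContext
{
    // Sibling DLL bytes from the nupkg's lib/<tfm>/ directory, keyed by simple name (assumed shape).
    private readonly IReadOnlyDictionary<string, byte[]> _siblings;

    public SketchWorkerLoadContext(string name, IReadOnlyDictionary<string, byte[]> siblings)
        : base(name, isCollectible: true)
    {
        _siblings = siblings;
    }

    protected override Assembly? Load(AssemblyName assemblyName)
    {
        string? simpleName = assemblyName.Name;
        if (simpleName is null) return null;

        // Alias to the host's copy when the default ALC already holds an
        // assembly with the same simple name, so Type identity is shared.
        Assembly? hostCopy = AssemblyLoadContext.Default.Assemblies
            .FirstOrDefault(a => string.Equals(
                a.GetName().Name, simpleName, StringComparison.OrdinalIgnoreCase));
        if (hostCopy is not null) return hostCopy;

        // Fallback: a worker-private assembly loaded from the sibling bytes.
        // Its types exist only in this context.
        if (_siblings.TryGetValue(simpleName, out byte[]? bytes))
        {
            using var stream = new MemoryStream(bytes);
            return LoadFromStream(stream);
        }

        // Returning null hands resolution back to the runtime's default probing.
        return null;
    }
}
```

Because the host's instance is returned by reference, a type such as CloudEvent resolves to the same Type object in both contexts, which is what lets the worker's override bind to the abstract method declared in the default context.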

Practical implication for worker authors: package every assembly your worker needs in the nupkg. The host will alias the ones it already has; the ones it doesn't have will load from the sibling bytes. Don't assume the host has an assembly you didn't bundle: if it doesn't, the worker ALC fails to load that assembly and you'll see a FileNotFoundException in the loader diagnostics.

Loader-error surfacing

When a worker assembly can't be loaded, the engine throws Virtufin.WorkManager.Engine.DotNetDll.InvalidWorkerException whose message lists every loader error from ReflectionTypeLoadException.LoaderExceptions. The InnerException is the original ReflectionTypeLoadException for programmatic inspection. Common failures you'll see:

  • Could not load file or assembly 'X' — a referenced assembly is missing from both the host and the nupkg siblings. Add the assembly to the nupkg's lib/<tfm>/ directory.
  • Method 'HandleAsync' ... does not have an implementation — pre-0.0.60 failure mode. Fixed in 0.0.60 by the canonical-assembly invariant. If you see this on 0.0.60+, your worker is referencing a Type from a sibling-only assembly in an override signature; bundle that assembly in the host too (or restructure the override to use BCL types only).
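The wrapping behavior described above can be sketched as follows. SketchInvalidWorkerException and LoaderDiagnostics are hypothetical stand-ins for illustration; the real InvalidWorkerException's constructor and message format are not shown in this document and may differ. ReflectionTypeLoadException.LoaderExceptions is the real .NET property.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for the engine's exception type.
sealed class SketchInvalidWorkerException : Exception
{
    public SketchInvalidWorkerException(string message, Exception inner)
        : base(message, inner) { }
}

// Hypothetical helper showing one way to surface loader errors.
static class LoaderDiagnostics
{
    // Enumerates the assembly's types, converting a type-load failure
    // into a single exception whose message lists every loader error.
    public static Type[] GetTypesOrThrow(Assembly workerAssembly)
    {
        try
        {
            return workerAssembly.GetTypes();
        }
        catch (ReflectionTypeLoadException ex)
        {
            throw Wrap(workerAssembly.GetName().Name ?? "<unknown>", ex);
        }
    }

    public static SketchInvalidWorkerException Wrap(
        string workerName, ReflectionTypeLoadException ex)
    {
        // LoaderExceptions can contain null entries; OfType skips them.
        var lines = ex.LoaderExceptions
            .OfType<Exception>()
            .Select(e => "  - " + e.Message);

        string message =
            $"Worker '{workerName}' failed to load:{Environment.NewLine}" +
            string.Join(Environment.NewLine, lines);

        // Keep the original exception for programmatic inspection.
        return new SketchInvalidWorkerException(message, ex);
    }
}
```

A caller that needs more than the message text can cast InnerException back to ReflectionTypeLoadException and walk LoaderExceptions directly, for example to pick out the FileNotFoundException entries and report the missing assembly names.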

Caveats and known issues

  • Thread-static state: AssemblyLoadContext reload does not preserve [ThreadStatic] state across LoadCodeAsync calls. Workers that depend on per-thread state should reset it on each ProcessAsync call.
  • First-load latency: the first LoadCodeAsync for a managed worker pays a one-time ~200-500 ms CoreCLR init cost. Plan accordingly for cold-start-sensitive deployments.
  • No multi-runtime: the WM process can host only one CoreCLR instance. Loading workers that target different .NET major versions in the same WM process is not yet supported.
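For the thread-static caveat above, one way to avoid depending on leftover per-thread state is to reset it at the top of every call. The sketch below is illustrative only: SketchWorker is a hypothetical class, and its ProcessAsync signature is an assumption that does not reproduce the real IWorker contract.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;

// Hypothetical worker-like class; the real IWorker signature is not shown here.
sealed class SketchWorker
{
    // Per-thread scratch buffer. A reloaded assembly gets a fresh Type,
    // so this field starts out null again after each reload.
    [ThreadStatic] private static StringBuilder? t_scratch;

    public Task<string> ProcessAsync(string input)
    {
        // Create the buffer on first use for this thread, then clear it so
        // nothing from a previous call on the same thread leaks through.
        StringBuilder scratch = t_scratch ??= new StringBuilder();
        scratch.Clear();

        scratch.Append(input).Append('!');
        return Task.FromResult(scratch.ToString());
    }
}
```

The reset happens before any await, so it runs on the thread that entered the call. Code that awaits and then resumes on a different thread would see that other thread's copy of the field, which is another reason to treat thread-static data as scratch space rather than durable state.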