
Navigating through uncertainty in open source

date
Dec 22, 2025
slug
open-source-uncertainity
author
status
Public
tags
Blog
Open Source
summary
type
Post
thumbnail
category
đŸ€– Computer Science
updatedAt
Jan 15, 2026 03:36 PM
Working on open-source systems often means making decisions without clear guidance. There is no single “right” way written down anywhere. You learn by reading code, trying things out, and fixing what breaks.
This becomes especially clear when working close to the runtime and compiler. Below are two examples where I encountered important trade-offs while developing high-performance software.

Designing a Parallel Framework

Adding parallelism is not just about making code faster. You first have to decide how work should run in parallel.
Different approaches behave differently. We primarily investigated two paradigms: thread pools and work-stealing schedulers.

How work is split across threads

  • Thread pools use a fixed number of worker threads and a shared task queue. Threads pull tasks from the queue and, once finished, wait for more work.
  • Work-stealing schedulers give each worker its own queue. A worker executes its local tasks first and steals work from other workers when it becomes idle. (This is a simplification; real implementations are more involved.)
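The thread-pool side of this comparison can be sketched in a few dozen lines: a shared task queue guarded by a mutex, with workers blocking on a condition variable until work arrives. This is an illustrative sketch only — the names (`create`, `submit`, `shutdown`) are hypothetical and not Raven's actual API — and it assumes OCaml 5 for the `Domain` module.

```ocaml
(* A minimal fixed-size thread pool: one shared queue, n workers.
   Workers sleep on a condition variable when the queue is empty. *)
type pool = {
  queue : (unit -> unit) Queue.t;
  mutex : Mutex.t;
  nonempty : Condition.t;
  mutable stop : bool;
  mutable workers : unit Domain.t list;
}

let rec worker p =
  Mutex.lock p.mutex;
  while Queue.is_empty p.queue && not p.stop do
    Condition.wait p.nonempty p.mutex
  done;
  if Queue.is_empty p.queue then
    (* [stop] was set and the queue is drained: exit. *)
    Mutex.unlock p.mutex
  else begin
    let task = Queue.pop p.queue in
    Mutex.unlock p.mutex;
    task ();            (* run the task outside the lock *)
    worker p
  end

let create n =
  let p = { queue = Queue.create (); mutex = Mutex.create ();
            nonempty = Condition.create (); stop = false; workers = [] } in
  p.workers <- List.init n (fun _ -> Domain.spawn (fun () -> worker p));
  p

let submit p task =
  Mutex.lock p.mutex;
  Queue.push task p.queue;
  Condition.signal p.nonempty;
  Mutex.unlock p.mutex

let shutdown p =
  Mutex.lock p.mutex;
  p.stop <- true;
  Condition.broadcast p.nonempty;
  Mutex.unlock p.mutex;
  List.iter Domain.join p.workers
```

Note how much of the state is shared: the single queue is exactly what a work-stealing scheduler replaces with per-worker deques.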

How well load is balanced

  • Thread pools rely on an even initial distribution of work. If tasks vary in runtime, some threads may sit idle while others do most of the work.
  • Work-stealing schedulers handle uneven workloads better, since idle workers can steal tasks from busy ones.
  • Thread pools also have other drawbacks: if not designed carefully, they can deadlock, and they do not support nested parallelism well because tasks are usually assigned only once.
  • Despite these limitations, we chose thread pools in Raven due to their simplicity and because our main use case—matrix operations—rarely has highly unbalanced workloads.
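To see why balanced workloads make static splitting acceptable, consider a data-parallel reduction where every chunk costs roughly the same. The sketch below (assuming OCaml 5 domains; it is not Raven code) splits an array into equal chunks up front, one per worker — exactly the kind of even initial distribution that thread pools rely on, and that matrix operations usually provide.

```ocaml
(* Static even split of a balanced workload across a fixed number of
   domains. No rebalancing is needed because chunks cost the same. *)
let parallel_sum ~domains a =
  let n = Array.length a in
  let chunk = (n + domains - 1) / domains in
  let partial lo hi =
    let s = ref 0.0 in
    for i = lo to hi - 1 do s := !s +. a.(i) done;
    !s
  in
  List.init domains (fun d ->
      let lo = min n (d * chunk) in
      let hi = min n (lo + chunk) in
      Domain.spawn (fun () -> partial lo hi))
  |> List.map Domain.join
  |> List.fold_left ( +. ) 0.0
```

If chunk costs varied wildly, this static split would leave some domains idle while others finish their expensive chunks — the situation where work stealing pays off.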

Debugging behavior

Ideally, a parallel library should prevent users from shooting themselves in the foot. When things do go wrong, however, failures can be hard to debug. Bugs may fail silently or surface in unexpected places.
In practice, debuggability often comes down to how simple the implementation is. Thread pools usually win here because their behavior is easier to reason about.
Work-stealing schedulers are more complex. They often rely on internal bookkeeping—such as tracking how long tasks run and deciding when to migrate them—which makes failures harder to understand and reproduce.

Using Unboxed Types

Unboxed types improve performance by avoiding extra heap allocations. A boxed value lives behind a pointer, so every access in a tight loop pays a dereference, which can make reads 5–10× slower. Keeping values unboxed makes loops faster and gives the garbage collector less to track.
These gains come with trade-offs. Unboxed values do not fit OCaml's uniform value representation, so they break polymorphism, and with it many abstractions: common data structures and pattern matching often stop working or require special handling.
As a result, unboxed types are most useful in hot code paths, where small performance gains add up and justify the loss in flexibility.
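Standard OCaml already exposes a limited form of this idea: a single-field record annotated with `[@@unboxed]` is represented directly by its contents, with no extra heap cell. The sketch below uses that portable attribute to show the boxed/unboxed contrast; OxCaml's unboxed types (such as `float#`) go further, and it is those richer layouts that break polymorphism as described above. The module and function names here are illustrative, not from any real library.

```ocaml
(* Boxed: each [Boxed.t] is a pointer to a one-field heap block. *)
module Boxed = struct
  type t = { celsius : float }
  let make c = { celsius = c }
  let to_fahrenheit t = t.celsius *. 9. /. 5. +. 32.
end

(* Unboxed: [Unboxed.t] has the same runtime representation as a bare
   [float] — no wrapper allocation, same static type safety. *)
module Unboxed = struct
  type t = { celsius : float } [@@unboxed]
  let make c = { celsius = c }
  let to_fahrenheit t = t.celsius *. 9. /. 5. +. 32.
end
```

In a hot loop, the unboxed version avoids one allocation and one pointer chase per value, which is precisely where the gains described above accumulate.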

Working with the OxCaml Extension

OxCaml provides more low-level control than standard OCaml, but it also introduces uncertainty.
Documentation is limited, examples are sparse, and error messages are not always helpful. To understand how something really works, you often have to read compiler code or tests directly.
This slows development at first, but it also forces a deeper understanding of the system. Over time, this makes it easier to work confidently in less well-defined parts of the codebase.