The API for atomics should be the same regardless of the platform. Macros may be used.

If C11 atomics are available, they should be used. If a platform does not provide atomics directly, they should be simulated by whatever means are available, even at the cost of performance.

The atomics API should have the same memory orderings as C11, except for consume. (However, while consume may not be exposed in the API, it could be used internally should conditions allow it.)

Unlike C11, the API should not require passing the memory ordering as an argument. Instead, it should have separate functions for each memory ordering.
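A minimal sketch of what "one function per memory ordering" could look like when C11 atomics are available. The type y_atomic_u32, the y_ prefix, and the macro names are hypothetical and chosen only for illustration; a platform without C11 atomics would need a different body behind the same names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical wrapper type; the name is not from the requirements. */
typedef struct { _Atomic uint32_t v; } y_atomic_u32;

/* Each macro stamps out one function for one memory ordering. */
#define Y_DEFINE_LOAD(name, order)                                           \
    static inline uint32_t y_load_##name(y_atomic_u32 *a) {                  \
        return atomic_load_explicit(&a->v, order);                           \
    }
#define Y_DEFINE_STORE(name, order)                                          \
    static inline void y_store_##name(y_atomic_u32 *a, uint32_t x) {         \
        atomic_store_explicit(&a->v, x, order);                              \
    }
#define Y_DEFINE_FETCH_ADD(name, order)                                      \
    static inline uint32_t y_fetch_add_##name(y_atomic_u32 *a, uint32_t x) { \
        return atomic_fetch_add_explicit(&a->v, x, order);                   \
    }

/* Loads: C11 does not allow release or acq_rel here. */
Y_DEFINE_LOAD(relaxed, memory_order_relaxed)
Y_DEFINE_LOAD(acquire, memory_order_acquire)
Y_DEFINE_LOAD(seq_cst, memory_order_seq_cst)

/* Stores: C11 does not allow acquire or acq_rel here. */
Y_DEFINE_STORE(relaxed, memory_order_relaxed)
Y_DEFINE_STORE(release, memory_order_release)
Y_DEFINE_STORE(seq_cst, memory_order_seq_cst)

/* Read-modify-write: every ordering is valid; consume is left out on purpose. */
Y_DEFINE_FETCH_ADD(relaxed, memory_order_relaxed)
Y_DEFINE_FETCH_ADD(acquire, memory_order_acquire)
Y_DEFINE_FETCH_ADD(release, memory_order_release)
Y_DEFINE_FETCH_ADD(acq_rel, memory_order_acq_rel)
Y_DEFINE_FETCH_ADD(seq_cst, memory_order_seq_cst)
```

One side effect of this shape is that invalid combinations, such as a load with release ordering, simply have no function to call.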

Structured Concurrency

  • The “nursery” should be called threadset.

  • Implement a “helper stack”:

    • This is only meant for things that should be owned by the stack but need to be heap-allocated and do not need to grow. (For example, the actual array for vectors should not be allocated from this area.) This should cover most things that are not owned by heap-allocated objects.

    • Should be a linked list of allocations.

    • Each allocation should be 16 KiB (4 pages).

    • Allocations should be listed as 16-bit sizes in their own stack.

      • Size will only need 12 bits.

      • The other bits should be used as flags.

    • Any allocation equal to or greater than 4 KiB should be handed off to the general allocator, with flags used to make that clear.

      • This will ensure that at most 1/4 of each 16 KiB allocation is lost to fragmentation.

      • It will also mean that a lot of the bigger allocations will use whole pages, reducing (hopefully) the fragmentation.

    • When an allocation would overflow the current 16 KiB, the stack should ask for another 16 KiB and add it to the list.

      • When possible, mmap() (or equivalent) should be asked for the 4 pages directly after the current allocation.

      • If they are given, then the stack should use the leftover space in the existing allocation and let the new allocation overflow into the new pages.

    • Should provide routines for automatically marking places on the stack, for when users enter separate scopes that they may want to free once exited.

    • Should provide a routine for automatically free()’ing everything in the function, to be called on return.

      • This should also be capable of calling destructors in general, even if no stack memory is freed, such as closing an open file. In that case, store 0 in the size stack.

      • Also, if I let stack allocators move stuff to the general allocator, I would need a way of marking “don’t do anything; just decrement the stack pointer”.

    • When a function is first called, it should be allowed to allocate things that it expects to return to its parent.

      • For things returned to the parent, they should be marked.

      • They should still be deallocated on error, or specifically marked to not be so.

      • Once an allocation is made that will not be returned to the parent, marking later allocations as returned should not be allowed until a new function is entered.

      • This system provides no real way to not deallocate stuff on error, but I am okay with that.
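The helper stack described above can be sketched roughly as follows. This is only an illustration under several assumptions: all names (helper_stack, hs_alloc, HS_FLAG_HEAP, and so on) are hypothetical, blocks come from malloc() rather than mmap(), the block header sits outside the 16 KiB payload, the size stack has a fixed capacity, and a large allocation is tracked by storing its pointer in a small slot inside the block. Destructors and the "returned to parent" marking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum {
    HS_BLOCK_SIZE  = 16 * 1024,              /* payload of one block: 16 KiB */
    HS_LARGE       = 4 * 1024,               /* this size and up goes to malloc() */
    HS_ALIGN       = _Alignof(max_align_t),  /* assumed alignment of every allocation */
    HS_SIZE_MASK   = 0x0FFF,                 /* low 12 bits: bytes used in a block */
    HS_FLAG_HEAP   = 0x1000,                 /* hypothetical flag: slot holds a malloc() pointer */
    HS_MAX_ENTRIES = 1024                    /* fixed size stack, to keep the sketch short */
};
static_assert(HS_ALIGN >= sizeof(void *), "a slot must be able to hold a pointer");

typedef struct hs_block {
    struct hs_block *prev;                   /* linked list of blocks, newest first */
    size_t used;                             /* bytes handed out from data[] */
    _Alignas(max_align_t) unsigned char data[HS_BLOCK_SIZE];
} hs_block;

typedef struct {
    hs_block *top;
    uint16_t sizes[HS_MAX_ENTRIES];          /* one 16-bit entry per allocation */
    size_t count;
} helper_stack;

/* Take n bytes from the newest block, adding a block if it would overflow. */
static void *hs_bump(helper_stack *hs, size_t n) {
    if (hs->top == NULL || hs->top->used + n > HS_BLOCK_SIZE) {
        hs_block *b = malloc(sizeof *b);
        if (b == NULL) return NULL;
        b->prev = hs->top;
        b->used = 0;
        hs->top = b;
    }
    void *p = hs->top->data + hs->top->used;
    hs->top->used += n;
    return p;
}

static void *hs_alloc(helper_stack *hs, size_t n) {
    if (hs->count == HS_MAX_ENTRIES) return NULL;
    n = (n + HS_ALIGN - 1) & ~(size_t)(HS_ALIGN - 1);   /* round up to HS_ALIGN */
    if (n >= HS_LARGE) {
        /* Large request: the block only stores the pointer to free later. */
        void *big = malloc(n);
        void **slot = big ? hs_bump(hs, HS_ALIGN) : NULL;
        if (slot == NULL) { free(big); return NULL; }
        *slot = big;
        hs->sizes[hs->count++] = (uint16_t)(HS_ALIGN | HS_FLAG_HEAP);
        return big;
    }
    void *p = hs_bump(hs, n);
    if (p != NULL) hs->sizes[hs->count++] = (uint16_t)n;
    return p;
}

/* Undo the newest allocation. */
static void hs_pop(helper_stack *hs) {
    assert(hs->count > 0);
    uint16_t e = hs->sizes[--hs->count];
    size_t n = e & HS_SIZE_MASK;
    if (n == 0) return;                      /* size 0: nothing lives in a block */
    if (hs->top->used == 0) {                /* entry belongs to the previous block */
        hs_block *empty = hs->top;
        hs->top = empty->prev;
        free(empty);
    }
    hs->top->used -= n;
    if (e & HS_FLAG_HEAP)
        free(*(void **)(hs->top->data + hs->top->used));
}

/* A mark is just the height of the size stack when a scope is entered. */
static size_t hs_mark(const helper_stack *hs) { return hs->count; }

/* Pop everything allocated since the mark, e.g. on scope exit or return. */
static void hs_release(helper_stack *hs, size_t mark) {
    while (hs->count > mark) hs_pop(hs);
}

/* Pop everything, then free whatever blocks are left. */
static void hs_destroy(helper_stack *hs) {
    hs_release(hs, 0);
    while (hs->top != NULL) {
        hs_block *b = hs->top;
        hs->top = b->prev;
        free(b);
    }
}
```

Because sizes are rounded up to the alignment before the 4 KiB check, every in-block size fits in the 12 size bits, which leaves the top 4 bits of each entry free for flags.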

  • Threads should have their own “helper stacks”.

  • Threads should have their own error handler stacks (see the error requirements), but they should be able to call their parent’s and ancestors’ handlers.

  • Applications should be able to establish context stacks.

    • Context stacks should include things like the current general allocator stack, the current stack allocator stack, and the error handler stack.

    • Users should be able to specify their own stacks.

    • Each thread should have its own.

      • If a context stack is created when multiple threads already exist, it will probably have to create the stack in all threads. It will have to lock them all (individually) to do it.

      • Threads should be able to search parent stacks, like error handling.

    • Popping something off the context stack should be a destructor, so that it can be done automatically by the helper stack.
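As a rough illustration of the context stack items above, the sketch below shows one per-thread stack that falls back to the parent thread's stack on lookup, with a pop routine shaped like a destructor. The names ctx_stack, ctx_push, ctx_pop, and ctx_current are hypothetical, the depth is fixed for brevity, and the locking needed to read another thread's stack safely is omitted.

```c
#include <stddef.h>

enum { CTX_MAX = 32 };   /* assumed fixed depth, to keep the sketch short */

typedef struct ctx_stack {
    const struct ctx_stack *parent;  /* same stack in the parent thread, or NULL */
    size_t count;
    void *items[CTX_MAX];            /* e.g. allocators or error handlers */
} ctx_stack;

/* Push a new current value; returns -1 if the stack is full. */
static int ctx_push(ctx_stack *s, void *value) {
    if (s->count == CTX_MAX) return -1;
    s->items[s->count++] = value;
    return 0;
}

/* Shaped like a destructor, void (*)(void *), so a cleanup stack can call it. */
static void ctx_pop(void *stack) {
    ctx_stack *s = stack;
    if (s->count > 0) s->count--;
}

/* Newest entry of this thread, falling back to the parent and ancestors. */
static void *ctx_current(const ctx_stack *s) {
    for (; s != NULL; s = s->parent)
        if (s->count > 0) return s->items[s->count - 1];
    return NULL;
}
```

A child thread with an empty stack sees whatever its parent pushed, and pushing in the child shadows the parent's value only for that child.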

  • Add a way to guard things by a mutex (reader/writer?).

    • The mutex should “own” the item (allocate enough space for it on the stack, for example).

    • Need to ensure alignment for the item (go to 2*sizeof(long) for C99?).

    • Need two types:

      • One for borrowers. The destructor should unlock the mutex.

      • One for the owner. The destructor should destruct the subitem.

        • Should not be able to lock the mutex.

        • If the owner wants to lock the mutex, it should convert it to a locking version.

        • This version should not be passed to borrowers.
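One possible reading of the owner and borrower split, sketched with POSIX mutexes. Everything named here (guard_owner, guard_borrow, GUARD_ITEM_MAX, demo_scrub) is hypothetical, the item lives in fixed inline storage aligned for max_align_t, and a plain mutex stands in for the reader/writer lock the notes leave open. The owner type has no direct access to the item; converting it with guard_owner_lock() yields the locking version.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

enum { GUARD_ITEM_MAX = 64 };   /* assumed capacity of the inline item storage */

/* Owner: holds the mutex and the item's storage; its destructor destroys the item. */
typedef struct {
    pthread_mutex_t mutex;
    void (*dtor)(void *item);
    _Alignas(max_align_t) unsigned char item[GUARD_ITEM_MAX];
} guard_owner;

/* Borrower: exists only while the mutex is held; its destructor unlocks. */
typedef struct {
    guard_owner *src;
    void *item;                 /* NULL if the lock was not acquired */
} guard_borrow;

static int guard_owner_init(guard_owner *g, void (*dtor)(void *item)) {
    g->dtor = dtor;
    return pthread_mutex_init(&g->mutex, NULL);
}

/* Convert to the locking version: the only way to reach the item. */
static guard_borrow guard_owner_lock(guard_owner *g) {
    guard_borrow b = { g, NULL };
    if (pthread_mutex_lock(&g->mutex) == 0) b.item = g->item;
    return b;
}

/* Borrower destructor: release the mutex and invalidate the handle. */
static void guard_borrow_end(guard_borrow *b) {
    assert(b->src != NULL);
    if (b->item != NULL) pthread_mutex_unlock(&b->src->mutex);
    b->item = NULL;
}

/* Owner destructor: destruct the item, then the mutex. */
static void guard_owner_end(guard_owner *g) {
    if (g->dtor != NULL) g->dtor(g->item);
    pthread_mutex_destroy(&g->mutex);
}

/* Example item destructor for the sketch: wipe the storage. */
static void demo_scrub(void *item) {
    memset(item, 0, GUARD_ITEM_MAX);
}
```

Both destructors take a single pointer, so either could be registered with a cleanup stack and run automatically on scope exit.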

  • Accept functions that take two arguments:

    • The first is a pointer to closure data (how Yao closures will be implemented).

    • The second is a pointer to the data passed to the thread creation function.

  • Threads should be implemented with a thread function.

    • The thread function should wait until it is given something to execute.

      • Use semaphores.

      • The function to execute should be part of thread-local data, set by the calling thread.
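The last two items can be sketched with POSIX threads and semaphores. This is an assumption-laden illustration: the names worker, task_fn, worker_create, worker_run, worker_join, and demo_add are hypothetical, a per-thread struct stands in for real thread-local storage, and each thread runs a single task.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Two-argument task: closure data first, then the data given at thread creation. */
typedef void (*task_fn)(void *closure, void *arg);

/* Hypothetical per-thread slot; the calling thread fills it in. */
typedef struct {
    pthread_t thread;
    sem_t start;        /* posted once fn, closure, and arg are set */
    task_fn fn;
    void *closure;
    void *arg;
} worker;

/* The one thread function every thread runs: wait, then execute. */
static void *worker_main(void *p) {
    worker *w = p;
    while (sem_wait(&w->start) == -1 && errno == EINTR) { /* retry */ }
    if (w->fn != NULL) w->fn(w->closure, w->arg);
    return NULL;
}

/* Start a thread that blocks until worker_run() is called. */
static int worker_create(worker *w) {
    w->fn = NULL;
    w->closure = NULL;
    w->arg = NULL;
    if (sem_init(&w->start, 0, 0) != 0) return -1;
    if (pthread_create(&w->thread, NULL, worker_main, w) != 0) {
        sem_destroy(&w->start);
        return -1;
    }
    return 0;
}

/* Called by the creating thread: set the function, then wake the worker. */
static void worker_run(worker *w, task_fn fn, void *closure, void *arg) {
    w->fn = fn;
    w->closure = closure;
    w->arg = arg;
    sem_post(&w->start);
}

/* Wait for the worker to finish and release its semaphore. */
static void worker_join(worker *w) {
    pthread_join(w->thread, NULL);
    sem_destroy(&w->start);
}

/* Example task for the sketch: add the closure's int to the argument's int. */
static void demo_add(void *closure, void *arg) {
    *(int *)arg += *(int *)closure;
}
```

The semaphore post and wait pair is what makes the slot's fields visible to the worker before it reads them, so the slot needs no extra locking in this single-task form.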