Design

Rationale

I need to write a rationale for Yao, like Ada’s.

Goals

Yao’s biggest goals are:

  1. Correctness

  2. Convenience

  3. Performance

in that order.

To achieve safety, Yao is intended to eliminate any unnecessary dependency on unsafe-by-default languages, especially C/C++. This is an important point that will influence design decisions later.

File Extension

The file extension used for Yao code will be .y. Yes, I know that that extension is used by Yacc and Bison, and I don’t care. They are not going to be used with Yao code, since they generate C/C++, and they can use the .yy extension if absolutely necessary.

However, Yao scripts will have the .yao extension.

Requirements

Security Through Capabilities

Yes, this is so important that it’s first, before even the actual functionality of Yao. It’s because no language in the world has the ability to reduce the blast radius of malicious code in a satisfactory way.

So Yao needs to have capabilities, which are items to reduce the permissions of specific pieces of code.

However, at this point, I do not have a way to enforce runtime restrictions in compiled code. This creates a quandary because I still want users to be able to reduce the blast radius of compiled code.

So I decided to split the capabilities into two types: static and dynamic.

Static capabilities can be enforced by the compiler at compile time. These include restricting:

  • Use of keywords.

  • Use of functions.

  • Use of types.

  • Accessing the private members of items. (Yes, this should be possible.)

  • Use of plain Yvm code. (This is equivalent to arbitrary code execution and should be considered the top capability to have.)

  • Use of custom compiler passes. (This is equivalent to arbitrary code execution and should be equivalent to use of plain Yvm code. Of course, if those passes are in Yao, the passes themselves might be restricted, and as such, this wouldn’t be as bad as plain Yvm code.)

  • Casting integers to pointers. (This can break protection around private data, which leads to arbitrary code execution.)

  • Casting pointers to integers. (This can break protection around private data, which leads to arbitrary code execution.)

These are not all of the examples; more will be added as I can think of them.

Because first-class functions (function pointers) could get around these restrictions by turning them into runtime issues, Yao also needs to prevent any code that is not allowed to use specific functions from using first-class functions.

Alternatively, if possible, the compiler could require that code passing a function to another function hold a “delegation capability”, which lets it pass the capability needed by the passed-in function on to the function(s) receiving it. (This is probably the better design.)

Dynamic capabilities cannot be enforced either by the compiler or by an internal runtime at runtime. (Or at least, I can’t think of a way to do it yet.) These include restricting:

  • Exactly which files can be read/written (whitelist or blacklist).

  • Exactly which external commands can be run (whitelist or blacklist).

  • Basically anything that touches the outside world.

  • Exactly which integers can be cast to pointers.

  • Exactly which pointers can be cast to integers.

  • Basically anything that can get around protections, including bounds checks.

These can only be enforced by a sandboxing interpreter that can intercept any syscalls that might do anything outside of the given capabilities for that code.

Dynamic capabilities are subservient to static capabilities: if some code does not have permission to call the functions to open a file, then at runtime, the code does not have permission to open any file. Obviously, this is enforced at compile time by causing a compile error should that function be called in the source code.

Now, how should capabilities be handled? How should they be given? How should they be created?

First, should capabilities be given out per-function, per-package, or what?

Capabilities should be given per-package. The reason is that one package is the responsibility of one entity and should be one cohesive unit of code for particular tasks. If anything in a package needs a capability, more items will need that capability, with high probability.

In addition, a package needs the same capabilities as all packages it imports.

Second, should capabilities be created per-function, per-package, or what?

They should be created per-item. Items include: types, functions, keywords, etc. And this should be recursive: if a type has public internal functions, then other code needs to have a capability for the type to use the type and for the functions inside in order to call those functions on items of that type.

(Obviously, the default should be that if a capability is given for a type, then a capability is given for each internal function unless otherwise marked.)

Besides the default above, the default should be that capabilities are not given by default; they must be explicitly given. Well, rather, that should be the default with the yao interpreter command. There can be another interpreter command (yao-permissive/yaop?) that defaults to everything on. These two are separate because Yao can be used both by downloaded code (which needs the restricted version) and by personal scripts (which could probably use the permissive command).

Also, these capabilities should include basic stuff like if statements, loops, and other keywords like that, as well as lexer modes, including the command parsing mode.

To register a static capability for a function, keyword, type, etc., there should be a keyword (permission?) used much like the pure or mut keywords on functions, which takes an identifier name. The name is the name by which the capability is known. For a package to be given that capability, there should be a GAML file named the same as the fully-qualified reverse domain name of the package with a GAML array named permissions, and the capability name should be in the array.
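
As a rough illustration of the static check described above, here is a minimal Python sketch (Yao itself cannot be run yet). The capability names, the REQUIRED table, and the function name are all hypothetical, and the permissions array is modeled as a plain list rather than parsed from a GAML file.

```python
# Hypothetical table: the capability name each restricted item registered.
REQUIRED = {"open": "fs.open", "exec": "proc.exec"}

def missing_capabilities(permissions, used_items, required=REQUIRED):
    """Return the capability names a package uses but was not granted."""
    # Items without a registered capability (like "len") need nothing.
    needed = {required[item] for item in used_items if item in required}
    return sorted(needed - set(permissions))

# The package was granted fs.open only, but its source also calls exec.
print(missing_capabilities(["fs.open"], ["open", "exec", "len"]))
```

In this model, a non-empty result would correspond to a compile error.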

To register a dynamic capability (these are generally per-function), there should be a keyword like the one mentioned above. It should take a name.

When restricting code, a function is pushed onto the context stack for that capability. That function should take the same number, order, and type of arguments as the target function and should return a boolean. The function should return true if the clients should be given the capability, and false otherwise.

To use dynamic capabilities, code should use context stacks. When the interpreter is asked to run a function that has registered a dynamic capability, it first runs every one of the functions in the context stack for that capability, using the arguments that would be passed to the function that registered the capability. If any of the functions return false, the capability is denied. If there is nothing in the context stack, the capability is denied. Otherwise, if there is something in the context stack, and all functions in the stack return true, then the capability is granted, and the function is run normally.
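
The rules above can be sketched in Python as follows. This is only a model of the decision logic, not of a sandboxing interpreter; the function names and the "file.read" capability are hypothetical.

```python
from collections import defaultdict

# Hypothetical model: one stack of checker functions per capability name.
context_stacks = defaultdict(list)

def push_check(capability, check):
    """Restrict code by pushing a checker onto the capability's stack."""
    context_stacks[capability].append(check)

def pop_check(capability):
    """Remove the most recently pushed checker."""
    context_stacks[capability].pop()

def is_granted(capability, *args):
    """Grant only if the stack is non-empty and every checker returns True."""
    stack = context_stacks[capability]
    if not stack:
        return False  # nothing in the context stack: denied
    # Run every checker with the arguments meant for the target function.
    results = [check(*args) for check in stack]
    return all(results)

def guarded_call(capability, target, *args):
    """Run target normally only if the capability is granted."""
    if not is_granted(capability, *args):
        raise PermissionError(f"capability {capability!r} denied")
    return target(*args)

# Hypothetical function that registered the "file.read" capability.
def read_file(path):
    return f"contents of {path}"

push_check("file.read", lambda path: path.startswith("/tmp/"))
print(guarded_call("file.read", read_file, "/tmp/a"))
pop_check("file.read")
```

Each checker takes the same arguments as the target function, so a whitelist of paths is just a predicate on the path argument.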

C Interoperability

Unfortunately, because of the way Windows, MacOSX, and even OpenGL on Linux work, C interoperability is still a must. Therefore, Yvm assembly will have the ability to declare external functions that will be assumed to have the C calling convention.

Testing

There must be a way for functions/classes/structs, etc., to specify an API for testing against their failure modes, e.g., malloc() returning NULL or I/O functions returning failure. See the SQLite testing document for details.

In fact, it is a design flaw in Yao unless adding such automated testing support for everything (including things that are considered inherently not automatable) is not only possible, but easy.

Make Yao support integration (system) tests, regression tests, and unit tests, but make sure unit tests are only run when a unit is changed.

Documentation

On July 24, 2019, I was at a BBQ with some friends and got into a conversation with one of their other friends. He was also a programmer, and he said that a good programming language is “a language where the documentation is helpful, but not critical, to understand the language.”

He had a point. I need to design the language such that the documentation is unnecessary for an intuitive understanding of the language. This means that I should not be too radical with the design.

That said, Yao and its tools must be well-documented, including a tutorial for beginning and experienced programmers.

Compiler Requirements

There need to be two classes of language rules. First, rules that cannot be broken without breaking the program, like syntax. Second, rules that can be broken, but that introduce bugs when they are, like failing to use (or explicitly ignore) a value returned by a function.

Functional Requirements

  • Must be able to parse Yao.

  • Must be able to generate Yvm from Yao.

  • The bootstrapped compiler must be able to generate debug info in Yvm.

  • The bootstrapped compiler must have a Language Server Protocol mode.

  • Supported environment variables:

    • NO_COLOR.
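
For the NO_COLOR requirement, a minimal Python sketch of the check might look like this. It assumes the usual reading of the NO_COLOR convention (color is disabled when the variable is present and not empty); the function name is hypothetical.

```python
import os

def color_enabled(environ=None):
    """Return False when NO_COLOR is set to a non-empty value."""
    env = os.environ if environ is None else environ
    # A missing or empty NO_COLOR leaves color output enabled.
    return not env.get("NO_COLOR")

print(color_enabled({"NO_COLOR": "1"}))
```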

Language Server Protocol

The compiler must be implemented from the start to enable incremental builds and a language server.

Have an error sentinel to propagate and allow continuing.

Start with an on-demand style.

Make the compiler an API such that it could be called to do its thing in bits as the user is editing.

Use functional data structures to incrementally rebuild.

Use an error stack for detailed error messages.

Keep precise spans (location information) for whitespace and comments, use file contents to rebuild.

Also, this could be implemented with conditions and restarts. The restart should get the parser to a semicolon or right brace, then restart, and then the erroring function should just return with that error still (to allow the rest of the code to pick up there).
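
The recovery step (get to a semicolon or right brace, then carry on) can be sketched without conditions and restarts as ordinary panic-mode recovery. The Python below is only that simpler stand-in: the toy grammar (`name = number ;`) and all names are assumptions for illustration.

```python
SYNC_TOKENS = {";", "}"}

def recover(tokens, pos):
    """Skip to just past the next ';' or '}' and return the new position."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return min(pos + 1, len(tokens))

def parse_statements(tokens):
    """Toy parser for `name = number ;` that records errors and keeps going."""
    pos, statements, errors = 0, [], []
    while pos < len(tokens):
        stmt = tokens[pos:pos + 4]
        ok = (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
              and stmt[2].isdigit() and stmt[3] == ";")
        if ok:
            statements.append((stmt[0], int(stmt[2])))
            pos += 4
        else:
            errors.append(pos)          # token index where the error started
            pos = recover(tokens, pos)  # resume after the next ';' or '}'
    return statements, errors

print(parse_statements("a = 1 ; b = = 2 ; c = 3 ;".split()))
```

The second statement is reported as an error, but the third is still parsed, which is the behavior a language server needs.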

It may be necessary to split the server into multiple threads. When the server gets a document change notification, it might be useful to reallocate the buffer and, before the new data is copied in or the old data is deleted, let another thread (the parsing thread) start parsing. The parsing thread would stop on a lock if it reaches the point at which data is being added or removed, and start again once the data is in. This should be able to be done multiple times for multiple edits.

The above would allow the parser to get a leg up on response time, by about half the file on average.

Feature Flags

Compiler and Yao itself must have feature flags to enable or disable certain dialects and extensions.

Must have -fbetter-c, which should disable everything that hides function calls and such.

Must have -fno-extensions for telling compilers to disable all of their own extensions. This flag would enforce compilers only using Yao proper, kind of like the .POSIX target in Makefiles. Should be default.

Feature flags can either be -f<feature> or -fno-<feature>.
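
A minimal Python sketch of parsing these flags follows. Letting a later flag override an earlier one is an assumption, as is the function name; only the -f<feature> and -fno-<feature> shapes come from the text above.

```python
def parse_feature_flags(args):
    """Map -f<feature> to True and -fno-<feature> to False; later flags win."""
    features = {}
    for arg in args:
        if not arg.startswith("-f") or len(arg) == 2:
            continue  # not a feature flag
        name = arg[2:]
        if name.startswith("no-") and len(name) > 3:
            features[name[3:]] = False
        else:
            features[name] = True
    return features

print(parse_feature_flags(["-fbetter-c", "-fno-extensions", "-O2"]))
```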

Certificates

Compiler should have an option to provide a certificate proving that the code was generated correctly.

This certificate should basically be a table of which source line was translated into what Yvm instructions.

Output

Compiler should allow the developer to output stack traces (and other output) in a structured format like XML, JSON, etc. See https://news.ycombinator.com/item?id=28017999 .

Error Messages

Compiler should just say “line XXX, syntax error”, like PCC. See https://news.ycombinator.com/item?id=28018667 . User should be able to turn on verbose mode after the fact without rerunning the compiler, so keep a log.

Also, there should be an option for the user to reverse the error messages, so first is output last, etc. This is so the first error is right in front of the user at the command-line.

Algorithmic Requirements

  • Algorithms:

    • Parsing Yao.

    • Generating Yvm.

  • Parsing Algorithmic Requirements:

    • Time: O(n) average case, O(n^2) worst case (only if absolutely necessary), where n is the size of the input file.

    • Space: O(n), where n is the size of the input file.

  • Generating Yvm Algorithmic Requirements:

    • Time: O(n), where n is the size of the input file.

    • Space: O(n), where n is the size of the input file.

Performance Requirements

  • Bootstrap compiler:

    • Scale: bootstrapped compiler.

  • Bootstrapped compiler:

    • Scale:

      • Tens of millions of files.

      • Each billions of lines long.

      • With lines megabytes long.

    • Time: 1M LoC/s, non-optimized, but with debug info.

    • Space: 3x size of file.

    • Response to each LSP request under 5ms. (Users notice anything above 10ms, and the editor needs enough time to do its thing too.)

Security Requirements

  • Bootstrap compiler:

    • No threat model, no requirements.

    • It’s going to be internal use only, so I don’t care.

Safety Requirements

  • Bootstrap compiler:

    • Must not crash on compiling bootstrapped compiler.

    • Must not miscompile on compiling bootstrapped compiler.

Issues

  • How to Implement New Control Structures

  • How to Implement New Type Constructors (structs, SoA, etc)

  • How to Implement New Type Extensions

  • How to Add New Operators

  • How to Implement Operators

  • Conditions and restarts

  • How to create conditions (Yao exceptions).

  • How to create restarts.

  • Whether to allow a condition to specify a default restart.

Syntax

Yao’s syntax must always be parseable without lookahead. However, since Yao’s parser and lexer operate on the pull model (the semantic analysis pulls from the parser, and the parser pulls from the lexer), and because there are different lexer and parser modes, this means that the semantic analysis can use its context to switch modes in the lexer and parser. This means that parsing C++-style templates does not need lookahead, for example.

Unicode

Yao should support Unicode in identifiers, using C universal characters.

Semantics

Unsafe Code

The first, and most important, design aspect is how Yao will be able to handle everything that a low-level language must be able to, for Yao must be usable for writing all kinds of software, in order to possibly replace C/C++. That means that Yao will have unsafe code, like Rust’s unsafe. However, unlike Rust, Yao’s way of doing unsafe code will be harder to use, on purpose, because such code should only be used when absolutely necessary and making it harder to use than Rust’s unsafe would help to discourage any unnecessary use.

Therefore, since Yao will use its own compiler library (Yvm), which like LLVM, would have a type-safe assembly language, I can use that type-safe assembly as Yao’s equivalent of unsafe, especially since it will have the capability of including platform- and ISA-specific code.

This will have more than just the advantage of making it harder to use; it will also ensure that the semantics of Yvm are maintained, along with its safety guarantees. It would also force its use into its own files, separate from normal Yao code.

Therefore, all Yao unsafe code will be in .yvm files.

Expressions

Every “statement” is actually an expression and returns a value.

Blocks return the value of the last expression in the block. Loops return a list of items, each one the result of the loop body. if statements return the result of whichever body ran; if there is no else and the if statement is not taken, they return the zero value of the type that the body returns.

Constructors

Every type can have constructors, which can be defined outside of the type’s code, for extension.

The constructor is not special, except that it has the same name as the type. All constructors return a value of the type they construct.

Constructors do not have access to the fields of the type. Instead, they have a special syntax for “atomically” constructing a value of the type:

construct uint64(bool a) {
	if a
	{
		return uint64 { 1 };
	}
	else
	{
		return uint64 { 0 };
	}
}

This means that there is never a point at which constructors can access a partially initialized object.

Zero Values

Every type must either have a handwritten or generated zero argument constructor, which defines the “zero value” for the type.

Casting

Casting will always be explicit, and it will be done through constructors.

Memory Management

Yao’s memory management will be like C++’s RAII: types will have destructors, which will be called when the type is destructed, which will be at the end of their scope or when their parent is destructed.

On top of that, Yao will use “extended” structured concurrency, and containers to prevent cycles. For more information, see Extended Structured Concurrency.

However, all of the info to free all of the resources for a thread will be contained in the thread’s data. Each resource must be put onto the resource stack, so if the thread gets cancelled, it will automatically free all of its resources when it is next safe to do so (not in the middle of a syscall, for example). This means that to cancel a thread, it is not necessary to unwind its stack, which makes it so Yao avoids having exceptions.
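
To make the resource stack idea concrete, here is a minimal Python sketch. Releasing in reverse order of acquisition is an assumption (it is what a stack suggests), and the class and method names are hypothetical.

```python
class ResourceStack:
    """Per-thread stack of (name, release_function) pairs."""

    def __init__(self):
        self._entries = []

    def push(self, name, release):
        """Record a resource and the function that frees it."""
        self._entries.append((name, release))

    def release_all(self):
        """Free everything, newest first, without unwinding any call stack."""
        released = []
        while self._entries:
            name, release = self._entries.pop()
            release()
            released.append(name)
        return released

log = []
resources = ResourceStack()
resources.push("file", lambda: log.append("close file"))
resources.push("lock", lambda: log.append("unlock"))
print(resources.release_all(), log)
```

Cancelling a thread then amounts to calling release_all() at the next safe point and exiting.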

Children should be able to get their parent allocation (such as the array they are contained in, or the parent struct). This should remove the need to store a pointer to the parent. (Or maybe make it possible to store references?)

Also, to implement things like linked lists, the list itself should be a parent container. That way, the parent owns the items in the list, and no item in the list needs references to anything in the list, removing the need for cycles.

with

Have with as a keyword for immediate use and release of objects/resources, like Python.

Extended Structured Concurrency

TODO

Threads

Yao needs to calculate the max stack size for each possible thread, if it can be calculated, and set them. This means that the stack size for the main thread needs to be set on startup to the calculated max (and maybe some guard memory).

Thread-Local Data

There must be some way for code to define globals/static fields that are thread local.

Hard Cancellation

Yao will provide hard cancellation, which will be per-thread.

This will be done by releasing all resources in the heap stack and exiting the thread.

However, graceful shutdown will be per program.

Time

Time must be a first-class concept in Yvm, and that will probably be done with measuring time in “machine units” as the HAL/S language does. This (more or less) corresponds to cycles.

The number of cycles for every instruction will be necessary for something even more important: implementing constant-time cryptography.

Having time as a first-class construct will allow Yao to build everything, up to circuits. (See the Esterel programming language.)

Packages

Yao will have packages. A package is defined by one or more files of code.

The main package file will have the name <pkg>.y, where <pkg> is the name of the package. If there is a folder with the same name as the main package file, except without the .y extension, that folder contains the subpackages.

The main package file will be like a script: code inside the file will be executed, as though it’s a script, on import of the package.

There is also the build file, which will execute any package-specific build code (like adding lexer or parser stuff, special keywords, etc.). It also adds more package files to the package, if necessary. The script code in other package files will be executed in the order they are added to the package, and the build file will be able to add them to the package before the main package file.

Of those two main files, only the package file is required.

Execution

In a program, a package is made the “main” package, and the program starts by executing the package script file as though it’s a script.

When a package is import’ed, its package code in its script file is run in the same way, unless it has already been run.

Build

If the package needs special build code, like if it adds keywords, then the build code should be in a file named the same as the main package file, but with the .build.yao extension instead of the .y extension.

It is also possible to use multiple files for a package in the same folder, but they should all have a name like <pkg>.<something>.y. This is possible because package names cannot have a period in them, since periods separate items in the package hierarchy.

The build file will include code to add other files in the same folder to the package. It can add some or all. This is how platform-specific code should be implemented: by having a Windows-only, Linux-only, etc. file alongside the main package file that is included in the package if the platform is the same.

In fact, by default (without a build file), Linux-only files should be named <pkg>.linux.y, Windows-only files should be named <pkg>.windows.y, Mac OSX-only files should be named <pkg>.apple.y, etc.
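
A small Python sketch of that default naming rule: given the files in a folder, pick the main package file plus the file for the current platform. The function name and the example package name net are hypothetical, and the sketch ignores build files entirely.

```python
def default_package_files(pkg, platform, files):
    """Pick <pkg>.y plus <pkg>.<platform>.y from the files in a folder."""
    wanted = {f"{pkg}.y", f"{pkg}.{platform}.y"}
    return [name for name in files if name in wanted]

folder = ["net.y", "net.linux.y", "net.windows.y", "net.apple.y"]
print(default_package_files("net", "linux", folder))
```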

The build files are where compile-time code can be executed, such as defining keywords, setting constants, etc. It is code that is run by the build system, at compile time.

The build files are where custom optimizations can also be defined. See The Death of Optimizing Compilers by Daniel J. Bernstein.

This means that the optimizer will, in fact, need detailed information about the machine, like cache sizes and speeds, number of cycles for every kind of instruction, etc.

Overloading

Yao will have overloading for functions and for operators. However, this must be done in a way that allows C code to easily call Yao code and vice versa.

The way this is done is that every package, function, and type will be able to define their own suffixes. Prefixes will be the name of the item’s containing scope (for example, the package where a function is contained), including any suffixes. The full name of items will be their prefix plus a triple underscore plus the source name of the item plus a double underscore plus the suffix for the item.

This means that a function foo in the package baz will have a full name of baz___foo, but if that same function foo defines its suffix as random, its full name would be baz___foo__random. If baz set its suffix to bar, the full name of foo with the suffix of random would be baz__bar___foo__random.
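
The naming rule can be written as a short Python function. The representation of a scope path as (name, suffix) pairs is an assumption made for the sketch; the separators are the ones given above.

```python
def full_name(path):
    """Build a full name from (source_name, suffix_or_None) pairs,
    listed from the outermost scope to the item itself."""
    parts = []
    for name, suffix in path:
        # A suffix is joined to its own item with a double underscore.
        parts.append(name if suffix is None else f"{name}__{suffix}")
    # Scopes are joined to their contents with a triple underscore.
    return "___".join(parts)

print(full_name([("baz", "bar"), ("foo", "random")]))
```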

Function Overloading

For function overloading, overloaded functions must specify a suffix.

To call a specific overloaded function, or a function from C, put its name in symbol form.

Operator Overloading

Operator overloading will work by defining operators and their attributes. One of those attributes will be the name of the function that must be defined for a type to implement the operator.

The reason that functions with names will be used is that, if a type has to define an add() function to use the + operator, then programmers will hesitate to define that function to mean something else like “concatenation.”

Reflection

Yao must have full type reflection. That information will include the size, the alignment, the destructor, etc.

In later iterations, the full type reflection should be able to be used to generate types.

Operators

TODO

Logical Operators

Yao will have non-short-circuit operators on top of the default short-circuit operators.

Arithmetic Operators

Yao should have trap, wrap, and saturate arithmetic operators on fixed-size types.
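
As an illustration of the three behaviors, here is a Python sketch of unsigned 8-bit addition. The function name and the mode strings are hypothetical; Yao would presumably spell these as distinct operators.

```python
def add_u8(a, b, mode):
    """Add two values in 0..255 using the chosen overflow behavior."""
    total = a + b
    if total <= 0xFF:
        return total          # no overflow: all modes agree
    if mode == "wrap":
        return total & 0xFF   # keep the low 8 bits
    if mode == "saturate":
        return 0xFF           # clamp to the maximum value
    if mode == "trap":
        raise OverflowError(f"{a} + {b} does not fit in 8 bits")
    raise ValueError(f"unknown mode: {mode}")

print(add_u8(200, 100, "wrap"), add_u8(200, 100, "saturate"))
```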

Numbers

Numbers, by default, should be arbitrary precision (rational).

Strings

Strings should be delimited by double quotes. Raw strings will use three or more double quotes (must have the same number of double quotes on both sides).
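
A Python sketch of lexing such a raw string: count the opening quotes, then look for the first run of the same length. The function name is hypothetical, and the sketch leaves open questions unanswered (an empty raw string, or contents that end in a double quote, would need a rule the text above does not give).

```python
def lex_raw_string(source, start=0):
    """Lex a raw string opening at source[start].

    Returns (contents, index just past the closing quotes).
    """
    count = 0
    while start + count < len(source) and source[start + count] == '"':
        count += 1
    if count < 3:
        raise ValueError("a raw string needs at least three opening quotes")
    delimiter = '"' * count
    body_start = start + count
    close = source.find(delimiter, body_start)
    if close == -1:
        raise ValueError("unterminated raw string")
    return source[body_start:close], close + count

text = '""""say """hi""" now"""" rest'
contents, end = lex_raw_string(text)
print(contents, "|", text[end:])
```

Using four quotes as the delimiter lets the contents hold runs of three quotes untouched.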

Interfaces

Instead of classes, Yao will use interfaces.

Interfaces can have default implementations of methods, and they can also extend other interfaces.

Interfaces must be implemented explicitly.

The methods that implement an interface do not need to have the same names; instead, during the implementation block, the real method can be associated with the method in the interface.

Function Annotations

There will be annotations that should be able to be applied to functions:

  • pure: The function only depends on its arguments, not on any globals.

  • const: The function only depends on its arguments and what they point to. A weaker form of pure.

  • total: The function is pure and always returns a value for every possible argument, with the exception of running out of memory or things like that.

Extendibility

Extendibility will be done with several things:

  • The Yao lexer should have an interface for specifying special sequences, or sequences of characters that need special handling. Examples include the start sequence of a block comment, periods (maybe?), and (for Yao specifically) the characters that start strings and/or a process fork (like a shell).

  • Custom parsing for the above.

  • A way to add keywords and custom parsing for them.

All of the above must have some way to return partial parses because of the requirement for LSP.

There should also be a way for users to create new types of types in such a way that the standard library functions and types can be used for them. This is how structs of arrays, reference counted pointers, etc., will be implemented.

Optimization

Everything stack based is alloca, everything immutable is Yvm register, unless allocated.

Aliasing

Yao will use strict aliasing, like C99.

Beyond that, it will either not have a keyword like restrict, or it will.

If it doesn’t, then strict aliasing as the default means that the compiler has to assume aliasing, except when it can prove otherwise. For profile-guided optimization, this might mean that the compiler does not optimize anything but the hottest loops, and for those, it generates code to check for aliasing, and if there is aliasing, runs a slow version, while running a fast version otherwise.

If it does, then I think the compiler should check for aliasing, at least in any debug mode, and maybe even in non-debug modes.

Pointers to different types are assumed to not alias, even when they are based on the same type. However, it might be that there is code to check for aliasing there as well, running the faster version if none, and the slower version if it exists.

Bitcasting

If a pointer to a type is cast to char*, that is fine, but for the other way around, there must be a way to specify alignment, like Zig.

Contexts

There should be a thread-local map of contexts, which can be defined by libraries.

These contexts will use stacks, and when a context is pushed onto the stack, it will automatically be popped off at the end of the scope.
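
In Python terms, the scoped push and automatic pop resemble a context manager over thread-local stacks. The sketch below is only an analogy under that assumption; pushed, current, and the "allocator" context name are hypothetical.

```python
import threading
from contextlib import contextmanager

_local = threading.local()  # each thread sees its own map of stacks

def _stack(name):
    """Return this thread's stack for the named context, creating it if needed."""
    stacks = getattr(_local, "stacks", None)
    if stacks is None:
        stacks = _local.stacks = {}
    return stacks.setdefault(name, [])

@contextmanager
def pushed(name, value):
    """Push value for the duration of a with-block, then pop it."""
    stack = _stack(name)
    stack.append(value)
    try:
        yield value
    finally:
        stack.pop()  # popped at the end of the scope, even on error

def current(name, default=None):
    """Return the top of the named context stack, or default if it is empty."""
    stack = _stack(name)
    return stack[-1] if stack else default

with pushed("allocator", "arena"):
    with pushed("allocator", "pool"):
        print(current("allocator"))
    print(current("allocator"))
```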

These contexts will include:

  • Allocators.

  • Error handlers.

Error Handling

Error handling will use:

  • Panics.

  • Conditions and restarts.

  • try like Rust.

  • Resets.

Panics

These are for buggy conditions, like Midori’s abandonment (see The Error Model).

Panic does immediate abort() of whole process, maybe with a stack trace. However, this behavior should be overridable, for places that need other behavior.

Conditions and Restarts

Use keyword pass to use the next handler. That pass keyword causes a panic if it is run inside a non-handler function or there are no more handlers left to try.

Also, to wrap an error, pass could be used like return: pass the value to the next one in the chain or pass the wrapped value.

When creating a condition type, it must include what return type the condition needs to restart.

Use error or raise keyword to start looking for a handler and to return the value from the chosen handler, and if there is no handler that accepts, panic.

Conditions can provide restarts, which are functions that can handle the condition. Handlers can then handle the condition in their own way, or call one of the restarts.

try and Friends

Use the try keyword to call functions that can return Result. If the result is an error, automatically return it. The keyword try_none turns an error into None from Optional/Maybe, and try_assert does an assert. Also, the keyword try_err will raise a condition, to allow use of restarts.

Use the keyword guard (like Swift) like try except that it also takes a code block with the error. Also guard_none to be applicable to Optional/Maybe.

Also have ?? operator (like Swift) to return a default value if Result is error or Optional/Maybe is None.

It might be useful to have interfaces and functions like the Go 2 error proposal. This includes Unwrap(), Is(), As(), ErrFormatf(), etc.

try keywords should use an interface that has an iferror() function and iserror() function. (From Go 2 Proposal Docs.) This would allow other types to be used as error types, but still allow use of the keywords.

Resets

These are like Midori’s aborts. (See The Error Model.)

Resets are basically like using setjmp()/longjmp() in bc. I can only do this if I can allocate all local variables on the heap part of the stack because of the requirement that local variables cannot be mutated, so any function with a reset catch should have all of its non-immutable locals allocated in the heap stack.

However, on top of the jmp_buf, there should be a piece of data marking what resources should be popped off the heap stack and freed.

These are unlike exceptions because:

  1. They will require an explicit throw (or something similar).

  2. There must be an explicit catch above it in the stack.

Design by Contract

Have Design by Contract with requires, ensures and more.

ensures would take a name (or a tuple of names for multiple return values) so that return results could be named however the developers wanted.

invariant could be used to ensure invariants for data types. Eiffel docs say invariants have to be true at all times, except when a method is executing, so it can be said that invariants can be AND’ed with methods’ preconditions (requires) and must also be satisfied on method return like postconditions (ensures). The only methods that don’t apply invariants as preconditions are constructors.

However, invariants would also be public or private. Public invariants have to apply before and after public methods, while private invariants have to apply to all methods.

I will also have loop invariants and variants.

For better conditions, Yao will have an “implies” (=>) operator, and an “if and only if” (<=>) operator, as well as “for all” and “there exists” operators.

Put pre and post conditions before the function. This will allow programmers to apply them to arbitrary expressions and blocks, including loops and if statements. Then, the only special thing I would need beside old and result would be invariants for each iteration of a loop.

Type System

Dependent Types

TODO

Dynamic Types

Yao should have dynamic types with named (symbol) fields (like Clojure maps, see Effective Programs by Rich Hickey).

Scope Resolution

The period (.) operator is for scope resolution only, and items can only access items in their top level scope.

This means, for example, that methods can only access fields of the type because the type is the outer scope. Functions can access global variables in the package they are in.

If something imports a package, including a method, it can access items in that package. If a type imports a package, then methods can access that package.

Other than that, the only globals accessible by methods are static fields.

Generics

Yao will have generics, which will be done by having functions return types. In other words, generics will use reflection.

Generic functions and types must be able to restrict what types/functions are used by requiring interfaces.

Mutexes and Such

Deadlocks will be prevented by an API that groups and orders locks.

When taking multiple locks, the lock group needs to have memory barriers to ensure all locks are taken properly.
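
One common way to group and order locks is to give every lock in a group a fixed rank and always acquire in rank order. The Python sketch below assumes that approach; LockGroup and RankedLock are hypothetical names, and memory barriers are outside what Python exposes, so they are not modeled.

```python
import threading
from contextlib import contextmanager

class RankedLock:
    """A lock plus its fixed position in the group's ordering."""
    def __init__(self, rank):
        self.rank = rank
        self.lock = threading.Lock()

class LockGroup:
    """Hands out locks and always acquires them in rank order."""
    def __init__(self):
        self._next_rank = 0

    def new_lock(self):
        ranked = RankedLock(self._next_rank)
        self._next_rank += 1
        return ranked

    @contextmanager
    def acquire(self, *locks):
        # Sorting by rank means two threads can never take the same
        # pair of locks in opposite orders.
        ordered = sorted(set(locks), key=lambda ranked: ranked.rank)
        for ranked in ordered:
            ranked.lock.acquire()
        try:
            yield [ranked.rank for ranked in ordered]
        finally:
            for ranked in reversed(ordered):
                ranked.lock.release()

group = LockGroup()
a, b = group.new_lock(), group.new_lock()
with group.acquire(b, a) as order:
    print(order)
```

The caller may name the locks in any order; the group decides the acquisition order.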

Code Tiles

Support code tiles. For more info, see http://250bpm.com/blog:147 and http://250bpm.com/blog:149.

Rig

Yao scripts should be able to import the rig package and get parallel execution easily. This will require Rig to have a “one-shot” mode.

Future

These are items to do in the future.

c2yao

Create a C to Yao translator.

This, and all translators, must be flexible and able to be customized per function and per line of code to prevent bad translations like those produced by c2rust.

Persistence of Execution

From Houyhnhnm Computing and François René Rideau: persistence of execution.

This needs to extend to the level of recovering execution state.

This can be done by requiring that all parameters are passed on the stack, and when about to enter a new function, store all of the stack state in a memory-mapped area.

Then, when entering the function, store the current instruction pointer and parameters.

Store the state again when exiting a function.

This means that the state of a program can be completely recovered at every call. It should be default, but able to be turned off.
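
A very small model of the idea: record a journal entry on every function entry and exit, then reconstruct the pending call stack from the journal. In this sketch a Python list of JSON strings stands in for the memory-mapped area, and the function name stands in for the instruction pointer; `Journal`, `persistent`, and `pending_calls` are hypothetical names.

```python
import functools
import json


class Journal:
    def __init__(self):
        self.records = []  # stand-in for the memory-mapped area

    def persistent(self, func):
        """Decorator: journal the parameters on entry and the result on exit."""
        @functools.wraps(func)
        def wrapper(*args):
            self.records.append(json.dumps(
                {"event": "enter", "func": func.__name__, "args": list(args)}))
            result = func(*args)
            self.records.append(json.dumps(
                {"event": "exit", "func": func.__name__, "result": result}))
            return result
        return wrapper

    def pending_calls(self):
        """Calls that were entered but never exited, outermost first."""
        stack = []
        for line in self.records:
            record = json.loads(line)
            if record["event"] == "enter":
                stack.append((record["func"], record["args"]))
            else:
                stack.pop()
        return stack


journal = Journal()


@journal.persistent
def inner(x):
    if x < 0:
        raise RuntimeError("simulated crash")
    return x * 2


@journal.persistent
def outer(x):
    return inner(x) + 1
```

After a simulated crash in `inner`, `journal.pending_calls()` lists `outer` and `inner` with their arguments, which is the information a recovery step or an omniscient debugger would start from.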

This will enable omniscient debugging, so I will build a custom debugger using the interpreter. This debugger must be able to record and replay, preferably being able to do both in the same execution.

First-Class Implementations

From Houyhnhnm Computing and François René Rideau: first-class implementations.

First-class implementations mean that the interpreter (whether actual interpreter, compiler, or hardware) of the program can be changed dynamically.

This can be done in Yvm because semantics are going to be precisely defined and the required safe points will be once each Yvm instruction is complete before the next instruction starts.

Implementation

Compiler Library

In order to compile Yao, a compiler library will be used, allowing other languages that use that library to interoperate with Yao.

However, the only real existing option is LLVM, and LLVM has a problem that makes it unusable: it allows undefined behavior.

Now, undefined behavior lets compilers make assumptions that open up more optimizations, but that is not acceptable in a language whose first goal is safety, because the existence of undefined behavior removes safety. All of Yao must be safe with one exception: explicitly marked unsafe code, which I will talk about shortly.

So Yao will use a compiler library, but it can’t use any existing ones. That means that I must write my own, called Yvm. It will have its own type-safe assembly language. The text form of those assembly files will have the file extension .yvm.

Libraries

Graphics/UI

  • Yao will have a library to abstract away OpenGL, Vulkan, Metal, and DirectX. It might just use Vulkan (since there is a real-time translation library from Vulkan to Metal) if Vulkan is supported by every platform besides macOS.

  • Yao will also have a GLFW replacement in the standard library.

  • Yao will also have a Wima replacement in the standard library.

  • One of the above two, probably the GLFW replacement, will be able to generate events for automated testing. This can be done through stdin maybe?

  • Any other graphics library must also be able to generate events for automated testing.

    • This can be done by allowing a sequence of events to be fed in through stdin or some other file.

    • It should also be possible to record a sequence of events, i.e., have a user do it, then be able to play back the events. That way reproducers can be automatically generated.

      • Also, the software could automatically record all events and write them out to disk constantly. (It should also have a signal handler for writing out the last events in case of a crash.) The resulting file can then be sent to devs in case of a bug.

    • All of this should be available to users for another reason: allowing them to automate processes. See https://matduggan.com/why-we-need-better/.

      • They can even start by recording, then the initial program should be generated from that.

    • In fact, generating events should be woven throughout the design of the library such that apps find it easy to support.

      • Also add an easy way to program it? Such as adding something like Blender’s node editor in the library, so that apps don’t have to add it?

  • In fact, it is a design flaw in Yao if adding such automated testing support for everything (including things that are considered inherently not automatable) is not both possible and easy.

    • When doing the above, the code should be able to run in headless mode (no GUI), yet still know where everything is drawn and so still respond to “events”.

    • Recording events to be played back should also record the size/shape of the windows for exact reproduction.

  • Accessibility (a11y) should be builtin from day 1. See this blog post.

  • Localization (l10n) and Internationalization (i18n) should be builtin from day 1. See this blog post.

  • Animations should be builtin from day 1. See this blog post.

  • Handle multi-touch and pressure on trackpads and mice from day 1. See this blog post.

  • Handle Input Method Editors stuff from day 1. See this blog post.

  • Seriously, just see this blog post.
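
The record-and-replay bullets above could look roughly like this, assuming events are plain dictionaries written one per line as JSON. `EventRecorder`, `replay`, and the event fields are hypothetical; a real library would define its own event types and log format.

```python
import io
import json


class EventRecorder:
    """Wraps an event handler and logs every event as one JSON line."""
    def __init__(self, handler, log):
        self._handler = handler
        self._log = log

    def dispatch(self, event: dict):
        self._log.write(json.dumps(event) + "\n")
        self._log.flush()  # keep the log current in case of a crash
        self._handler(event)


def replay(log, handler):
    """Feed recorded events back into a handler; no GUI is needed."""
    for line in log:
        if line.strip():
            handler(json.loads(line))


def make_counter():
    """Tiny 'app': tracks window size and the number of clicks."""
    state = {"size": None, "clicks": 0}

    def handle(event):
        if event["type"] == "resize":
            state["size"] = (event["w"], event["h"])
        elif event["type"] == "click":
            state["clicks"] += 1
    return state, handle
```

A session recorded through `EventRecorder` into a file (or `io.StringIO` in tests) can be handed to `replay` with a fresh handler, which is also the shape a stdin-driven test harness would take. Recording the resize event first covers the requirement that window size be reproduced.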

Debugging

  • Yao needs to provide libraries to record execution and play it back.

  • There also needs to be an easy way to dump environment information to a file on crash, so a signal handler might help here.

POSIX Signal Handling

  • Signals in Yao must be more capable. To do this, Yao must install signal handlers that will just set variables or send data on pipes, like Slurm. Then the compiler will automatically generate checks for those variables/pipes, and when a check detects something, it will call the user signal handler, which will then have free rein to do what it needs to do, including throwing an “exception”.

  • Here is how signal handling and async I/O will work:

    • Main thread will handle both with a poll() loop.

    • One of the fd’s will be a pipe hooked to the actual signal handler that will just write the number of the signal to the pipe and return.

    • One of the fd’s will be for async I/O requests.

    • The main thread will just loop and run the user signal handlers when needed and do async I/O when needed.

    • This comes from Laurent Bercot telling me about djb’s self-pipes. Laurent’s implementation is here.

    • When handling signals, the code must get a lock that the code will take every time it is about to do something that must not be interrupted.

  • The above is not going to be how it works. Instead, what will be implemented is something like bc’s signal handling: signals are locked if necessary. If signals are locked, the Yao signal handler sets a flag or pushes the signal onto a queue and exits. When code unlocks signals, it checks for pending signals and calls the user signal handler for them. On the other hand, if signals are not locked, the Yao signal handler calls the user signal handler directly. This is possible because the POSIX standard says that undefined behavior happens with longjmp() only if a non-async-signal-safe function is interrupted. Yao just needs to make sure that such functions are not interrupted.
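
The lock-and-defer scheme in the last bullet can be modeled in a few lines. This is only a model of the control flow: `SignalGate` and its methods are hypothetical names, the lock is a simple nesting counter, and Python already defers its own signal handlers, so none of the async-signal-safety concerns of a real implementation show up here.

```python
from collections import deque


class SignalGate:
    def __init__(self, user_handler):
        self._user_handler = user_handler
        self._lock_depth = 0      # > 0 while signals are locked
        self._pending = deque()   # signals that arrived while locked

    def on_signal(self, signum, frame=None):
        """The handler Yao itself would install for every signal."""
        if self._lock_depth > 0:
            self._pending.append(signum)   # locked: queue it and return
        else:
            self._user_handler(signum)     # unlocked: run the user handler now

    def lock(self):
        self._lock_depth += 1

    def unlock(self):
        self._lock_depth -= 1
        # Only the outermost unlock delivers what was queued.
        while self._lock_depth == 0 and self._pending:
            self._user_handler(self._pending.popleft())
```

To try it with real signals on a POSIX system, `gate.on_signal` has the right signature to pass to Python's `signal.signal`. Code that must not be interrupted would sit between `gate.lock()` and `gate.unlock()`.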

Files

From #s6-offtopic, skarnet said this about the POSIX close() function:

<skarnet> the syscall and the close() function are basically the same, the C function is a basic syscall wrapper
<skarnet> so handling the syscall directly won't change anything
<skarnet> the only thing you can do in Yao is make sure close() doesn't return a value
<skarnet> just as fd_close() does
<skarnet> for that, you need to decide what you're going to do if the syscall fails
<skarnet> my advice would be:
<skarnet> - on EBADF, ignore - it's just a programming error
<skarnet> you may abort instead if you have a debug mode and are in debug mode
<skarnet> - on EINTR, it's complicated: on Linux, do nothing special and return success because the fd was still closed. On other OSes, check with the OS whether the fd has been closed on EINTR; return success if it has, try the syscall again if it has not
<skarnet> - on any other error: either ignore the error (there's nothing the programmer can do about it) or abort the program
<skarnet> prj and crest seem to think it's better to abort the program
<skarnet> they may be right, I don't have a strong opinion about this
<skarnet> but whatever you do, do not return a value to the user, because the user can't do *** with it anyway
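
A sketch of that advice, written in Python for illustration. `fd_close` mirrors the name skarnet mentions, but this body is an assumption, not skalibs code; the `close` and `abort` parameters exist only so the error paths can be exercised, and the EINTR branch hard-codes the Linux behavior described above. CPython's own `os.close` already hides EINTR, so that branch matters mainly for the injected case.

```python
import errno
import os


def fd_close(fd: int, *, debug: bool = False,
             close=os.close, abort=os.abort) -> None:
    """Close fd and never report a result to the caller."""
    try:
        close(fd)
    except OSError as err:
        if err.errno == errno.EBADF:
            if debug:
                abort()  # programming error: fail loudly in debug mode only
        elif err.errno == errno.EINTR:
            pass         # Linux behavior: the fd is closed regardless
        else:
            abort()      # the quoted advice allows ignoring instead
```

The important property is the return type: callers get `None` in every case, so there is no error value for them to mishandle.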

Concurrency

  • Figure out a way to have lock-free code and not deal with concurrent interference?

  • No. Lock-free is too hard to reason about.

  • However, lock-free constructs will be provided as long as they can be made to implement the semantics of Yvm loads and stores.

JIT and Interpreter

  • Make runtime code specialization easy.

  • Make interactive programming (hot swapping code during execution), from Handmade Hero Episodes 25-29, available in Yao.

    • Probably do this through Yvm.

Shell

Implement a shell in Yao.

This will use $ as the character to start a process fork (start a subprocess), something that will be available in Yao itself, though in the shell, it won’t require a semicolon to end the command, while in Yao itself, it will.

In Yao, if the command appears alone, then the subprocess’s stdin, stdout, and stderr should be attached to the program, meaning that the parent program will print everything the child program prints.

However, the command can be assigned to an item or used in an expression. If that’s the case, then the item is a subprocess object that contains the output in stdout and stderr, enough information to weave the two, the exit code, and any other relevant info.

This means that the parent (or rather, the parent’s thread where the command is run) will always pause and wait for the subprocess to finish.
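
For comparison, here is roughly what the "assigned to an item" case amounts to in Python. The `Subprocess` class and `run` function are hypothetical stand-ins for the Yao subprocess object; the information needed to weave stdout and stderr back together is left out of this sketch.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Subprocess:
    stdout: str
    stderr: str
    exit_code: int


def run(argv):
    """Run a command, wait for it to finish, and capture its output."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return Subprocess(done.stdout, done.stderr, done.returncode)


# The child is the current Python interpreter, so the example is portable.
child_code = "import sys; print('out'); print('err', file=sys.stderr); sys.exit(3)"
result = run([sys.executable, "-c", child_code])
```

As in the description above, `run` blocks until the child exits, and the caller then inspects `result.stdout`, `result.stderr`, and `result.exit_code`.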

The shell should implement history for all commands run under that shell.

Calculator

Calculations are done on printing: either do operations to infinite precision, or store the operations in order and evaluate them only when printing.

Allow users to specify precision, number of significant digits, or both on printing.
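
A minimal sketch of the first option (exact arithmetic, with rounding only when printing), using Python's `fractions.Fraction` as the exact representation. `format_fixed` is a hypothetical helper and handles only a number of decimal places, not significant digits. Note that exact rationals cover only the four basic operations; something like a square root would need the second option.

```python
from fractions import Fraction


def format_fixed(value: Fraction, places: int) -> str:
    """Render an exact rational with `places` digits after the decimal point."""
    # The only rounding happens here, at print time (round-half-to-even).
    scaled = round(value * 10**places)
    sign = "-" if scaled < 0 else ""
    digits = str(abs(scaled)).rjust(places + 1, "0")
    if places == 0:
        return sign + digits
    return f"{sign}{digits[:-places]}.{digits[-places:]}"


third = Fraction(1, 3)
total = third + third + third          # exactly 1, no accumulated error
tenths = Fraction(1, 10) + Fraction(2, 10)
```

Here `format_fixed(third, 5)` gives `0.33333`, `format_fixed(total, 2)` gives `1.00`, and `format_fixed(tenths, 1)` gives `0.3`, because no rounding happened before printing.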

Testing

Standard Library

In the standard library (in debug mode only), for every function that returns a resource, increment a counter for that resource, and when the resource is released, decrement the counter. The library can also maintain a list of calls, like Valgrind does to show what functions allocated memory. This will tell developers when they lack a drop (destructor) implementation that they should have.
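
A sketch of such a tracker, assuming each resource has a hashable handle (a file descriptor number, for example). `ResourceTracker` and its method names are hypothetical, and the debug-mode switch is left out for brevity.

```python
import traceback
from collections import Counter


class ResourceTracker:
    def __init__(self):
        self.live = {}  # handle -> (kind, call site that acquired it)

    def acquired(self, kind, handle):
        """Call from every library function that hands out a resource."""
        caller = traceback.extract_stack(limit=2)[0]  # frame that called us
        self.live[handle] = (kind, f"{caller.filename}:{caller.lineno}")
        return handle

    def released(self, handle):
        """Call when the resource is given back."""
        self.live.pop(handle, None)

    def counts(self):
        """Number of still-live resources per kind."""
        return Counter(kind for kind, _site in self.live.values())

    def report(self):
        """One line per leaked resource, with where it was acquired."""
        return [f"{kind} {handle!r} acquired at {site}"
                for handle, (kind, site) in self.live.items()]
```

At program exit, a non-empty `counts()` means something was never released, and `report()` points at the acquiring call sites, much like a Valgrind leak summary.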

Formal Methods

Yao and Yc will include tools for formal methods.

Property-Based Testing

There should be a YaoCheck (QuickCheck-like) tool for doing property-based testing.

See the Yvm design documents for more details.
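
To show the general shape of a QuickCheck-like tool, here is a deliberately tiny version in Python. `check` and `int_lists` are hypothetical names; a real YaoCheck would also shrink counterexamples, which this sketch does not attempt.

```python
import random


def check(prop, gen, runs=200, seed=0):
    """Run `prop` on `runs` generated cases; return a failing case or None."""
    rng = random.Random(seed)  # seeded so failures are reproducible
    for _ in range(runs):
        case = gen(rng)
        if not prop(case):
            return case
    return None


def int_lists(rng):
    """Generator: lists of 0 to 10 small integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]


def reverse_twice_is_identity(xs):
    return list(reversed(list(reversed(xs)))) == xs


def length_is_negative(xs):
    return len(xs) < 0  # deliberately false, to show a counterexample
```

`check(reverse_twice_is_identity, int_lists)` returns `None` because the property holds for every generated list, while `check(length_is_negative, int_lists)` returns the first generated list as a counterexample.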

Bootstrap

TODO: Update this for storing C code in each version.

Yao is bootstrapped. This means that it is written in itself, but can be compiled from scratch with any system that can compile C11 programs with POSIX 2008 APIs or Win32.

The bootstrap process will be supported from the beginning, and it will remain supported as long as Yao exists; it will NOT go away. Thus, any system (OS, Linux distro, or other computing system) will be able to build Yao binaries from source, if desired.

Process

Dependencies

The only dependencies that the build process should have on POSIX systems are the following:

  1. C11 compiler.

  2. Linker.

  3. Equivalent of binutils.

  4. POSIX shell.

On Windows systems, the only dependencies should be:

  1. MSBuild or the ability to run batch scripts.

  2. MSVC.

  3. Win32.

Initial Development

The initial development bootstrap process is the following:

  1. A compiler written in C (stage0) is built using a C11 compiler and a POSIX shell script that has been hand-written to perform the job of make.

  2. stage0 then compiles the first version of the standard library, the Yao compiler, Yvm, and Rig (stage1).

  3. stage1 then compiles a new version of the standard library and Yvm (stage2) by generating C code and invoking a C11 compiler with a POSIX shell script to take the place of make.

  4. stage2 then recompiles itself, and the bootstrap process is complete.

stage0

stage0 will include a bare bones compiler that has only the features required to compile stage1. It will require the dependencies listed above.

stage1

This stage will include a more feature-complete compiler, along with an interpreter for implementing the build system.

Releases

The release bootstrap process will be different. For every release, the compiler will be run through itself, using the C backend, to generate C source for that version. Then, that C source will be put into the release tarball.

Thus, the release bootstrap process will look like this:

  1. The C source (yaoc) will be built using make and a C11 compiler.

  2. yaoc will then compile the Yao compiler written in Yao (yao1).

  3. If desired, yao1 will compile itself again to make yao2.

When this was suggested to me by rofl0r, I knew it would be used. It will make it so every version needs to compile itself, instead of needing the previous version to do so. It will also keep the bootstrap chain to a minimum. As a bonus, if I do not follow the initial development process above, it will not matter; users will still have an easy bootstrap.

Trusting Trust

The bootstrap process will ship with the ability to perform a diverse double compile on the compiler.