Rig is a build system and a package manager.
Rig’s basic goals are:
Ease of use.
There are four things that must be done right in order to accomplish those goals:
Rig’s implementation must be designed and executed correctly, and in this case, “correctly” means that Rig must have none of the following things:
Failing to utilize available resources.
In order to have both reproducible builds and correctness, Rig must be free of non-determinism, or rather, any non-determinism that exists must NOT affect the final result.
This includes cases where parallel execution is used. The inherent non-determinism in parallel execution must never cause results to differ from run to run, and it must never cause the build to be incorrect.
There are two kinds of bloat that apply to Rig:
Code bloat is when there is code in Rig that does not serve a purpose useful enough to justify its existence.
Execution bloat is when Rig’s execution wastes work for any reason.
Rig must be free of both kinds of bloat. This will help achieve performance.
No Failing to Utilize Available Resources
Rig must be designed to properly utilize the resources it is given. If it is given 16 CPU cores, it must use those CPU cores as well as possible. If it is given 256 GB RAM, it must use that RAM as well as possible.
Note that this does not mean that Rig should always use 256 GB RAM if it has it. It should not. In fact, Rig itself should use as little as possible so that tasks can use the rest, and it should try to launch tasks that utilize as much of it as possible while not starving other tasks.
The same goes for CPU resources.
In other words, for both time and space resources (see the Glossary), Rig should strive to launch tasks to use as much of the resources as possible while not starving other tasks.
This will also help performance.
Rig must be capable of restricting a build from doing certain things on a fine-grained level. This will be done using capabilities in Yao.
This will do two things. First, it will allow users to get started with very little knowledge, and second, it will ensure that builds cannot compromise users’ machines.
To help users get started with very little knowledge, capabilities will be used to restrict the subset of Yao that Rig will accept. (Yao will have this same capability.) This will allow users to learn highly restricted versions of Yao, flattening the learning curve until a steeper one is desired. This will help improve the user experience by improving ease of use.
To ensure that builds cannot compromise users’ machines, Rig will use capabilities to restrict builds from performing actions that are unnecessary.
Some good examples of such unnecessary actions are protestware in
straight-up malware in
npm that deletes files that have nothing to do with the
build. This is the first and most necessary feature for security.
Rig’s user experience should not be ad-hoc. It should be carefully designed for ease of use and ease of learning.
Ease of learning is already covered above in Permissions/Capabilities, but ease of use is harder.
Ease of use requires that the user interfaces to Rig be simple and discoverable.
There are three user interfaces to Rig:
Yao (build scripts).
Command-line (executing Rig).
Configuration user interface (configuration).
Designing Yao for ease of use is outside of the scope of this document (see the Yao design document).
The command-line for Rig must be familiar to those who use other command-line build tools, and it must be built in such a way as to use the same discoverability tricks that command-line users are used to. In addition, it must be discoverable through extensive, yet concise and easily searchable, documentation.
The configuration user interface will vary by platform. On Windows, it will probably be a GUI, and on other platforms, it will probably be a curses-like interface. In either case, the UI must be like CMake’s curses interface and like nano: the user interface tells you exactly what keys and actions will do what.
Build systems are almost universally hated. I believe that is because everyone has to use them, yet they are not ideal for every situation they are put into, and I believe the reason for that is that they are not flexible.
Every build system in existence has some sort of restriction on the user. For make, it’s that you can only check the mtime for file freshness, as well as the lack of a Turing-complete build setup language. For ninja, it is the same as the limitations of make. For meson, it’s the lack of extensibility. And as for CMake, well, there are too many to list.
Rig must not have any limitations that reduce its flexibility. Limitations are still necessary, of course, but they should be carefully enumerated and reduced such that they do not stop users from using Rig for any situation in which a build system might be used.
Not restricting file stamping.
Not restricting build configurations.
Not having only one build type.
Not restricting the language in both setup and execution of targets.
A build that has incorrect results with correct use IS AN INSTANT BUG.
Correct use means:
Specifying all dependencies correctly.
Not introducing non-determinism into the build.
Not modifying any files or data during the build.
Not breaking sandbox.
A build that has non-deterministic behavior with correct use IS AN INSTANT BUG.
From https://news.ycombinator.com/item?id=2106552, there are three competing properties that build systems must have (from most to least important):
Performance regressions are regressions.
Performance must be tested on every release.
Path handling on all platforms.
And file handling too. Implement buffers myself?
Allocators, both stack and heap.
Including heap stacks.
All resources allocated using the heap stack as the root.
stderr with the outputs properly weaved and not weaved.
Catch errors from left side of pipes.
Lock groups to handle multiple locks and to prevent deadlock.
Struct wrappers around fd’s, handles, etc.
Locks that destroy item reference when unlocked.
To prevent referring to item without a lock.
Locks should also return the item reference when locked.
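The lock behavior described above (locking returns the item reference, unlocking destroys it) can be sketched as follows. This is only an illustration in Python, not Rig's implementation; the names `Guarded` and `Handle` are hypothetical.

```python
import threading
from contextlib import contextmanager


class Handle:
    """Reference handed out while the lock is held; dead once it is released."""

    def __init__(self, item):
        self._item = item
        self._alive = True

    def get(self):
        if not self._alive:
            raise RuntimeError("item accessed without holding the lock")
        return self._item

    def _invalidate(self):
        self._alive = False
        self._item = None


class Guarded:
    """Owns an item and only exposes it through its lock."""

    def __init__(self, item):
        self._lock = threading.Lock()
        self._item = item

    @contextmanager
    def locked(self):
        with self._lock:
            handle = Handle(self._item)
            try:
                yield handle          # locking returns the item reference
            finally:
                handle._invalidate()  # unlocking destroys the reference
```

With this shape, `with guarded.locked() as h:` is the only way to reach the item, and any use of `h.get()` after the block raises instead of silently touching unlocked data.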
Signals to threads for cancellation.
longjmp() code should free all resources in the heap stack.
In Windows, suspend thread, free its heap stack, and terminate it.
In all cases, use volatile locks to stop the longjmp() or suspend if unsafe.
Implement as a destructor, if possible.
Terminal UI from
Use multi-call binary.
For dev build, use
For curses to set options, use
For helping to add build options and presets,
For building for a release to a user, use
For installing into the “store”, use
rigkits.io, which I already own).
For creating a project, use
For running executables, use
For querying the build database, use
rig-gui, which will do everything through a GUI.
Why use separate commands and not subcommands to
rig needs to take target names, which conflict with subcommands.
Provide aliases and completions for various shells.
Supported environment variables:
Required command-line arguments:
Location of source directory.
-d is provided, assume in source directory and location of build directory is required command-line argument.
Turn off automatic detection of compiler and flags, etc. This is to make rig useful for embedded development. (See the “Other” section at the end of this document.)
Preset. If no preset provided, use default.
The end user should be able to run just rigr to build a release.
Developer can set defaults for both dev and release.
He will get dev defaults unless he runs rigr, then he’ll get release defaults.
Required command-line arguments: same as
Command-line options: same as rig, except to add:
-c: Run config UI, as though running
Required command-line arguments:
Target to build, though there should be a default target for when no target is given.
Target names cannot begin with a dash, but I think that’s a small price to pay.
Alternatively, I could just require that they be escaped with a double dash then space before them, like how GNU does arg escaping.
Targets given on the command-line have either no colon prefix (all) for non-file targets, or a colon prefix (:src/file) for file targets.
This is because there must be some way of distinguishing between the two on the command-line, and a colon was the suggestion of DarkUranium on IRC.
Since non-file targets are the default, dash-prefixed command-line arguments are still possible: non-file targets cannot begin with a dash because Yao symbols cannot.
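The command-line conventions above could be sketched like this. The function name `classify_args` is hypothetical, and treating a bare `--` as "everything after this is a target" is an assumption based on the GNU-style escaping mentioned earlier, not a settled decision.

```python
def classify_args(argv):
    """Split command-line words into options, file targets, and non-file targets."""
    options, file_targets, targets = [], [], []
    only_targets = False  # set once a bare "--" is seen (assumed escape)
    for word in argv:
        if not only_targets and word == "--":
            only_targets = True
        elif not only_targets and word.startswith("-"):
            options.append(word)            # dash prefix: command-line option
        elif word.startswith(":"):
            file_targets.append(word[1:])   # colon prefix: file target
        else:
            targets.append(word)            # no prefix: non-file target
    return options, file_targets, targets
```

For example, `classify_args(["-c", "all", ":src/file"])` separates the option `-c`, the non-file target `all`, and the file target `src/file`.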
Another suggestion from IRC was that I could do -x as command-line arguments and +x as parameters to build scripts.
I’m not sure I like parameters to build scripts because if the parameters change, it’s an entirely different build.
Create and fill the build database.
Sort the most recently failed task first (subject to dependencies)? This would allow for faster iterating.
Schedule linking tasks for as soon as possible.
Default targets (should be overridable):
clean: Clean build files.
distclean: Clean build and config files.
list: List the available targets.
help: List the available targets with descriptions (if any).
-m or something like that, to act as a metabuild system for producing data for dumb build systems.
Requires Rig to be in
-e: Execute Yao code before starting the build.
-o: Output all of the build options, with defaults and descriptions, and exit. This is for discoverability by end users.
This is to help people avoid writing the
Needs to be a curses-like GUI.
Information users should be able to query:
What targets have been built and in what build (how many builds ago).
I think it’s okay if the build database only stores the last time a target was built.
It could be good to have an easy way to query what targets were built in the last build.
How much boilerplate is okay? Very little. See top-level requirement of clarity.
Thus, there needs to be a way to generate the boilerplate.
riginit should be able to generate scripts for top-level directories and subdirectories.
Generating a top-level script should also generate a top-level config and options file.
MUST INCLUDE EXPLANATION COMMENTS.
Generate a Rig minimum version and project name.
Require each project to set a project name, including a full Java-like namespace.
This is to make it easier to import other projects.
Be able to generate from Makefile?
If the directory is empty, I could have riginit fill out the directory with default directories (tests, etc.) and default
Run a named executable.
Requires ensuring that shared libraries are in the executable’s search path.
This should have elements to let the user do everything, from setting config to building, including what
This is the biggest reason all of the executables should be in the same binary; they will have to be for Windows.
In the future, allowing people to install packages would be great too.
Rig should support one-shot builds, with a command-line flag, perhaps.
These will not write to the build database, nor even require one to be present.
This is for using Rig for easy parallel execution, as a task runner.
Rig should support both in-source and out-of-source builds.
If in-source, the user gets no special protection.
If out-of-source, the build should be hermetic, if possible.
All input files from the source tree should be hard-linked into the build directory (or symlinked if hard-linking fails).
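A minimal sketch of the hard-link-then-symlink fallback, using Python's standard `os.link` and `os.symlink`. The helper name `link_input` and its return value are hypothetical and only serve the illustration.

```python
import os


def link_input(src, dst):
    """Hard-link src to dst, falling back to a symlink. Returns the kind used."""
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        # Hard-linking can fail, for example across filesystems.
        os.symlink(os.path.abspath(src), dst)
        return "symlink"
```

Either way, the build directory ends up with a path that reads the same content as the source file, without copying it.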
Make it so there can be multiple build directories at once, each one with different configurations. (From
There must be a way to specify which build configuration on the command-line.
There must be a default configuration.
This should only be used when building from the source directory. If building from the build directory, just build.
Build directories should not be allowed to contain the source directory.
Yao is the build language, with Rig-specific extensions.
Code should be split in two:
The top-level code is the build script code and runs sequentially when a build is requested.
target blocks are target code and only get executed when a target is executed.
Imports execute the code of the imported build script right away.
Targets can depend on targets that are not defined yet.
However, if those targets are never defined before the target must be run, it is an error because Rig will be in a deadlock otherwise.
Yao should be run in a sandboxed interpreter.
The only ways to touch the outside are to write to files in the build tree and to run commands.
Must allow multiple inputs and outputs per target.
However, unlike dependencies (see below), inputs and outputs must be static.
Otherwise, there would be too much dynamism in the build.
Use the build model and algorithm from “A Sound and Optimal Incremental Build System with Dynamic Dependencies”.
Necessary because many build requirements outside of building software need them.
Also, technically, C and C++ need dynamic dependencies for having correct dependencies on headers.
There should also be a way to declare static dependencies.
If a target only has static dependencies, that would allow it to be treated differently for efficiency.
Will not allow dependency cycles, unlike the paper.
This is because targets can execute arbitrary Yao code, so they can use loops to loop until the inter-dependent outputs are stable.
If the stamp of an output file remains unchanged (usually from hash being the same) after target is built, don’t bother rebuilding its dependents.
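The rule about unchanged stamps is often called early cutoff. A minimal sketch, assuming content hashes as stamps and a precomputed dependency order; the function names and the shape of the arguments are hypothetical, not Rig's API.

```python
import hashlib


def content_stamp(data):
    """Stamp an output by hashing its bytes."""
    return hashlib.sha256(data).hexdigest()


def rebuild(order, deps, build, old_stamps, dirty):
    """Rebuild targets in dependency order with early cutoff.

    order: targets listed so that dependencies come first.
    deps: target -> list of targets it depends on.
    build: target -> function returning the target's new output bytes.
    old_stamps: target -> stamp recorded by the previous build.
    dirty: targets whose own inputs changed.
    Returns (targets that ran, new stamps).
    """
    stamps = dict(old_stamps)
    changed = set()
    ran = []
    for target in order:
        needs_run = target in dirty or any(d in changed for d in deps.get(target, []))
        if not needs_run:
            continue
        ran.append(target)
        stamps[target] = content_stamp(build[target]())
        if stamps[target] != old_stamps.get(target):
            changed.add(target)  # only a changed stamp propagates to dependents
    return ran, stamps
```

If a regenerated file comes out byte-for-byte identical, its dependents are never added to the run list, even though the file itself was rebuilt.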
Use the filename
Be able to split the build script into files, one in each directory that the user wants.
Procedures from parent directories should not be called directly. Instead, import directories somehow.
Importing a child directory should be able to be done more than once?
This is in case the same targets need to be built more than once with different configs.
Make it so users can add prefixes to targets on import, to separate different imports.
Need a keyword for dynamic dependencies.
Source files should be able to be added by globs and recursive globs.
Must there be a way to query the dependents of a target?
No, that would be impossible, unless the querying target itself were a dependent of that target.
Also, such a query could not be in build script code.
Can also import other build files in subdirectories by referring to the subdirectory name.
Also be able to import modules that might define new target types, etc.
Provide build groups, like waf.
This will be a convenience API.
Underneath, they will be all made dependencies of a PHONY target.
The API will return that target, which can then be used as a dependency of other targets.
They will also be all given the same tag, so that scheduler constraints can be used.
Targets should not be closures.
If info needs to be passed into targets, the build database must be used.
Have functions to set environments; don’t allow users to set it with shell commands.
The default environment should be empty, however, like Nix does with a bad
target: for defining a target.
tag: for applying a tag to a target (only in a target block).
need: for dynamic dependencies (only in a target block).
before: for setting reverse dependencies, that a target must be run before another. This cannot be used on a target inside its target block; it must be outside while still registering the target. Also, it is an error if the target that must come after has already been started (even if suspended).
priority: for setting the target priority (only in a target block).
offer: for exporting config data from targets (only in a target block).
use: for importing other Rig build files. Must also accept a block where build options for those files can be set. (See “Exports” section.)
finish: for finishing a target early. Still can be stamped to see if it changed.
changed: for finishing a target early and force marking it as changed.
unchanged: for finishing a target early and force marking it as unchanged.
platform: for returning a struct of information about the current platform.
rule: like ninja’s rules. Use functions instead for defining rules.
after: like OpenRC’s after. Instead, implement that directly in
Make it possible to run a target’s stamper and use that to dynamically declare a dependency.
There are three kinds of files used by Rig:
GAML files, for data.
Build scripts, for defining targets and things like that.
Master script, for defining how a build should be done.
This script defines how the build will go.
Built-in master scripts (probably defined in C) will exist for different kinds of builds, such as:
Normal build (what everyone expects).
Hermetic build (like Bazel). Might need separate ones for each platform.
Quick build (for executing when saving files in an editor).
Nix-like build (for acting as a package manager).
Supervision build (for using in ur and init systems).
DevOps build (for using as a Kubernetes, Chef, Ansible, etc. replacement).
The master script can also do things like “lock” the build (by using a lockfile, for example) so that other builds cannot interfere.
It can also make it so the user can launch file scanning in parallel with the actual build.
These scripts mean that there will need to be an API for starting the build and starting the file scan.
Be able to parse and run GNU Makefiles.
Be able to parse and run ninja build files.
Be able to parse and run clang compilation databases.
Be able to parse and run CMake.
Be able to read CMake find package files.
Be able to parse configs from ldconfig, pkg-config, gtk-config, llvm’s config, etc.
Be able to import subprojects and build them.
Have a way of telling the build that some subproject has to have one of its build options set to a specific value. This would allow for projects to properly set things without needing recursive calls to rig.
Like the above, Yao should provide some way for users to say that it is okay if an option is missing, or if it is an error if an option is missing.
Likewise, for above, it must be possible for targets and configuration to be exported.
Waf and Bazel let a target say what include directories that dependents need to include. See https://anteru.net/blog/2017/build-systems-waf/.
This should be extended to any config at all.
However, dependents should be given a choice as to whether they import the targets/config or not.
When projects import subprojects, they can only allow their targets to depend on targets in the subprojects.
Subprojects must be able to set a default target for projects to depend on.
For example, a library project could set the default to the library file.
This will allow projects to simply “depend” on just the subproject.
Be able to automatically produce pkg-config
Be able to automatically produce CMake
Using the same code that lets maintainers create build scripts with certain configs to be run by end users, generate:
ninja build file.
Windows batch file.
To do the above should require running Rig in a special mode where it imitates the behavior of a metabuild system.
Requires Rig to be in conditional:modified:static mode or less powerful.
Each subproject should have its own separate environment.
The parent should be able to set them on import.
However, no other project or parent/ancestor should be able to change the environment after that.
This is to prevent malicious subprojects from modifying the build of others.
Have output like
Reuse the same line (unless compiler warnings/errors happen).
Gather compiler warnings/errors from a step and output all at once, not a bit at a time. (Serialize the logs.)
I should also do this for recursive invocations, so that a submodule’s build warnings/errors are also grouped together. See https://apenwarr.ca/log/20181106.
I might even order the logs like the above post.
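One way to serialize logs is to buffer each step's output and write it in one piece under a lock. The sketch below uses Python threads as stand-ins for build steps; `run_step` and `run_parallel` are hypothetical names, and the output lines are supplied directly rather than read from a real compiler.

```python
import io
import threading

_output_lock = threading.Lock()


def run_step(name, lines, sink):
    """Collect all of a step's output, then write it to sink in one piece."""
    buffer = io.StringIO()
    for line in lines:  # stand-in for reading a compiler's warnings/errors
        buffer.write(f"[{name}] {line}\n")
    with _output_lock:
        sink.write(buffer.getvalue())  # one write per step, never interleaved


def run_parallel(steps, sink):
    """Run every step on its own thread and wait for all of them."""
    threads = [
        threading.Thread(target=run_step, args=(name, lines, sink))
        for name, lines in steps.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

The order in which steps appear can still vary from run to run, but each step's warnings and errors always stay together.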
Rig should provide minimal output, like good Unix tools. The comments at https://news.ycombinator.com/item?id=28007593 talk about output fatigue.
Should not provide any output except command output if not hooked to a
Estimated completion time, like Shake? Probably.
Clear environment when executing commands.
TMPDIR set to temp directory in build directory.
Run all commands under a user that only has write access to build directory and nothing else.
All programs used and their paths
All input files found by scanning all command arguments for input and output files.
Each element of the command should be checked to see if it is a file (or directory).
Any ones found should be marked.
Any found ones that are not explicitly listed should have warnings?
Make sure there are no setuid/setgid executables in the
Make sure the linker for all executables is there.
Make sure all dynamically linked libraries for executables are there.
Allow user to set defaults with the environment (from DarkUranium on IRC). DarkUranium suggests some sort of config files, not environment variables, since env vars can only be strings.
These could be dot files in the user’s home directory as well as repo-specific ones.
Automatically detect platform and compiler.
Have a database of what each platform provides wrt POSIX and extra API’s.
This includes what types are available.
CFLAGS et al. automatically based on the platform and compiler.
Have a database of compilers for command-line options for standard use flags like:
Output object file only.
Sanitizers and static analysis.
Output header dependencies.
This should be used for dynamic header dependencies.
If not available from the compiler, depend on all headers or use some heuristics?
Default to searching for
include), but have a setting for not searching for that (for speed).
Label all as dependencies, even if it’s too conservative. I want a fast and easily built detection rather than a more involved one that will skip unnecessary building. Most #include’s will be used anyway.
Output file name.
Define or undefine macros.
Output static libraries.
Setting build tuple.
That database should also provide information about what standards are available and what options to use to access them.
If a user selects any of the above options, Rig should figure out the CFLAGS for them, but allow them to be changed.
CFLAGS must NOT be messed with unless they explicitly choose options like that.
Make it possible (and fairly easy) for users to change automatically added CFLAGS and other such things. This is to prevent things like https://stackoverflow.com/questions/34575066/how-to-prevent-cmake-from-issuing-implib.
This will also take care of the embedded world.
When linking static libraries to dynamic libraries, the static libraries need to be compiled with
Have a single file for defining what the build options are.
There needs to be a curses GUI to set the options.
The file should be called
If it only has build options, no.
But if it has other Rig configs, like what language, stamper, and dependencies modes and minimum version of Rig required, yes.
Defaults should be required for every option.
This is so a rig invocation will do the right thing.
Should encourage the default to be whatever users will want to build.
This includes release mode with sane defaults for project-specific stuff.
Cross-compilation should be supported from day 1.
This is actually necessary, since rig will need a host compiler, so it must know the difference between target and host.
DarkUranium thinks that what my bc does (host compiler, host cflags) is sufficient, but I may also want to add host link flags and libraries.
To cross-compile, I could have an option to specify the cross-compile target.
Zig has made it so users expect a built-in cross compiler, so Rig should do the same thing.
Yes, this means that Rig needs to package a binary clang with sysroot stuff, like Zig.
Also, try to compress the information as much as possible, like Zig.
Support building for multiple arches/platforms at once (like for OSX universal binaries).
User needs to define the compilers for each arch.
This also means a file needs to be associated with multiple targets, i.e., the targets become aliases for the file and certain build configs.
This should be done much the same way that the Xcode build system does it.
Separate computed variable values from parameters set by the user. See https://news.ycombinator.com/item?id=24203503
Rig should have a build database.
Users should be able to store data in the build database and query it through Yao.
The build database should contain the following things:
Filenames, stamper results, etc.
Compiler/command invocations for every target.
All files that have been created by builds that have not been cleaned.
And the clean targets should use that information.
The build database should keep info about the last two builds.
This is for two reasons:
To debug broken builds. Users should be able to provide build database when bugs are found. Having the previous two means that even though the build started overwriting data, the build previous to it is still there.
It will be easier to just have two copies of the database and completely overwrite one when a build happens.
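A sketch of the two-copy idea: each build overwrites the slot holding the older record, so the most recent complete record always survives. The class name, the file names, and the `generation` counter are assumptions made for the illustration, and JSON stands in for whatever format the database actually uses.

```python
import json
import os


class TwoSlotDatabase:
    """Keeps the records of the last two builds in two alternating files."""

    def __init__(self, directory):
        # Hypothetical file names, chosen only for this sketch.
        self.paths = [os.path.join(directory, f"db-slot{i}.json") for i in (0, 1)]

    def _load(self, path):
        try:
            with open(path) as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return None  # missing or half-written slot

    def builds(self):
        """Return the stored builds, newest first."""
        found = [b for b in map(self._load, self.paths) if b is not None]
        return sorted(found, key=lambda b: b["generation"], reverse=True)

    def record(self, data):
        """Overwrite the slot holding the older build with a new record."""
        slots = [self._load(p) for p in self.paths]
        generations = [s["generation"] if s else -1 for s in slots]
        target = generations.index(min(generations))
        with open(self.paths[target], "w") as f:
            json.dump({"generation": max(generations) + 1, "data": data}, f)
```

If a build dies while writing its slot, that slot fails to parse and is treated as empty, while the other slot still holds the previous build for debugging.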
Because of dynamic target generation, the build database must store which targets have been done.
This includes completely disjoint targets. For example, two builds might have been done to deploy two separate machines. The build database should know about them both, even if Rig is not told to keep more than one previous build.
Rig should have different build types.
Built-in ones should include:
Normal build (incremental build).
Quick build (useful when saving files in a text editor; targets listed on the command-line should be files that are changed).
Full build (always do a full build).
Bazel build (imitate Bazel).
Nix build (imitate Nix).
Users should also be able to supply their own build types.
Build types should be able to change the stampers for file targets.
This is necessary for implementing things like Nix and Bazel modes.
If a file target needs something special, it can just implement its own logic internally by using unchanged and dynamic dependencies.
Instead of a set way of timestamping files, use configurable “file stampers”.
These will be code that decides whether a target is out of date or not.
The default file stamper will use all file attributes from https://apenwarr.ca/log/20181113 to detect changes to a file.
Only one attribute needs to be different, even
There will be hashing stamps (hash the contents of a file).
Users can define others as well.
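As a rough illustration of configurable stampers, the sketch below treats a stamper as a function from a path to a comparable value. The attribute set in `attribute_stamp` is an assumed subset of `stat` fields chosen for the example, not the definitive list from the linked post, and all function names are hypothetical.

```python
import hashlib
import os


def attribute_stamp(path):
    """Stamp a file by its metadata; a difference in any one field counts."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size, st.st_ino, st.st_mode, st.st_uid, st.st_gid)


def hash_stamp(path):
    """Stamp a file by hashing its contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def is_out_of_date(path, recorded_stamp, stamper):
    """A target is out of date when its current stamp differs from the record."""
    return stamper(path) != recorded_stamp
```

Because the stamper is just a parameter, a build type can swap `attribute_stamp` for `hash_stamp` (or a user-defined function) without touching the rest of the logic.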
redo stamper (default). Use whatever attributes on Windows make sense.
For supervision system:
Readiness notification by fd.
Readiness notification by scanning log for a particular line.
ssh check on the target machine.
Check that latest commit matches latest built and tested.
Check that remote cache does not have build.
Non-file targets should be able to define their own stampers.
This is to implement things like what
Default stamper is one that says the target is unchanged if all of its dependencies are unchanged, or changed otherwise.
Have a way of defining target types, kind of like ninja’s rules.
Use Yao functions instead of specific syntax.
Predefine some types:
C/C++ file to object file.
Should include dynamic dependencies on header files.
Could also include some sort of cache check, like
Requires deterministic builds.
List of object files to executable/library/static library.
Generating a config header.
Generating any sort of file.
Using a file as data, as in gen/. Also, allow some way to make changes to the data.
For supervision system:
Daemon start/stop/restart, etc.
Machine boot and deploy.
A target to pull, build, and test.
Build a package.
For remote caching of the above targets, a target type to check a remote server and download if possible, or build if not.
For canonical headers? See https://anteru.net/blog/2012/canonical-include-files-for-frameworks-libraries/.
Support PHONY targets.
They can have stampers as well, but their default stamper should just be if any of the hashes of all of their dependencies were changed?
Presets (sets of defaults) could be used to make it easy to build different versions.
DarkUranium on IRC doesn’t want presets to be mixed with build types (Release, Debug, RelWithDebInfo, MinSizeRel). He thinks this will lead to a combinatoric explosion of presets. I agree.
One thing we talked about was that a preset doesn’t have to have settings for all options. For the ones that it doesn’t have a setting for, it just uses the default.
Instead of splitting the build type presets from the regular presets, I could instead have a system where a user can specify more than one preset (split by colons on the command-line?). The first preset would set settings first, the next one would set them, overriding the first where they don’t agree, etc.
This will allow the build type presets to be separate but still use the same code while allowing arbitrary nesting of presets, if necessary.
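The layering of presets could look like the sketch below: start from the option defaults, then apply each named preset in order so that later ones override earlier ones. The colon-separated syntax is only the possibility floated above, and `resolve_options` and the option names in the usage note are hypothetical.

```python
def resolve_options(defaults, presets, spec):
    """Apply colon-separated presets, in order, on top of the option defaults.

    defaults: option name -> default value (every option has one).
    presets: preset name -> dict of the options that preset sets.
    spec: string such as "release:lto"; later presets override earlier ones.
    """
    options = dict(defaults)
    for name in filter(None, spec.split(":")):
        unknown = set(presets[name]) - set(defaults)
        if unknown:
            raise KeyError(f"preset {name!r} sets unknown options: {sorted(unknown)}")
        options.update(presets[name])  # options a preset omits keep their value
    return options
```

With a `release` preset that sets only optimization and a `lto` preset that sets only link-time optimization, `"release:lto"` yields both settings while every other option stays at its default.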
Tasks can set their priority.
A task with higher user-set priority always wins, subject to scheduler constraints.
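A sketch of how ready tasks might be ordered: user-set priority first, then (as suggested earlier in this document) tasks that failed in the last build. Scheduler constraints are left out, and the tuple layout and function name are assumptions for the example.

```python
import heapq


def schedule_order(ready):
    """Order ready tasks for execution.

    ready: list of (name, priority, failed_last_build) tuples.
    Higher priority always wins; among equal priorities, tasks that failed
    in the last build go first; remaining ties keep their original order.
    """
    heap = [
        (-priority, not failed, index, name)
        for index, (name, priority, failed) in enumerate(ready)
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[3] for _ in range(len(heap))]
```

For example, a task with priority 5 runs before two priority-0 tasks, and of those two, the one that failed last time runs first.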
Handle negative dependencies, which are files that must be missing, such as headers in certain header search paths, if one has been found and is being used.
After reconsidering, it turns out that handling negative dependencies is not necessary if you have a build database combined with dynamic dependencies and a way to mark a target as unchanged as quickly as possible (the
Just one change is needed: any direct or indirect dependency of a target that can have dynamic dependencies must be run.
The reason this works is that, for example, if a header is added, then the dynamic dependency call for headers with gcc -MM is going to pick up a different list, and since that list will be different, the target will be updated.
It’s necessary to run any direct or indirect dependencies that can have dynamic dependencies because those dynamic dependencies need to be checked before declaring the dependency up-to-date.
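To make the argument concrete, the sketch below simulates header lookup instead of calling a compiler: `resolve_includes` is a hypothetical stand-in for the dependency list a tool like `gcc -MM` would report. Adding a header earlier in the search path changes the discovered list, which is enough to trigger a rebuild without tracking "must be missing" files.

```python
def resolve_includes(includes, search_path, existing_files):
    """Return the header files a compile would pick up, in include order."""
    found = []
    for name in includes:
        for directory in search_path:
            candidate = f"{directory}/{name}"
            if candidate in existing_files:
                found.append(candidate)
                break  # first match wins, as with a compiler's search path
    return found


def needs_rebuild(recorded_deps, current_deps):
    """The target is out of date if the discovered dependency list changed."""
    return recorded_deps != current_deps
```

This only works if the discovery step itself is rerun on every build, which is why dependencies that can have dynamic dependencies must always be run.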
It should be possible to have dependencies on data, not just files.
In fact, file dependency is just a special case of data dependency; the data is the content of the file.
Based on https://apenwarr.ca/log/20181113, I could also make it so data is written to files and dependencies on those files are automatically created. This is probably the better solution since it will make all of the data explicit in the build directory.
However, should the interface to the user not worry about files?
There is only one type of dependency, despite my earlier design.
However, the build script(s) should be an implicit dependency of every target that each one contains.
Have an option for a “fully correct” mode:
Hash files, to ensure that any changes redo the build/configure.
Use of a C preprocessor to ensure that header dependencies are complete and accurate.
Ensure that all implicit dependencies are found.
Compiler and command invocations. (Should be checked by default.)
Binaries of all tools used (compiler, linker, ar, user commands, etc).
Environment variables. (Should be checked by default.)
Anytime a target file is renamed or deleted from a build script, delete the old file in the build directory.
Allow source files to list their dependencies, like
No. zv on IRC said, “I think [the] build system should not be part of source code.”
He elaborated: “refactoring code between files should not require me to update the build specification if everything is just being compiled into objects and linked later.”
Targets have a dependency on the Rig file they are defined in.
Everything has a dependency on the build config file (the file output by rigc in the build directory).
The project-internal hash, like Go module checksums, should be calculated by the git hash, the hash of all build files, and the hash of the build config file.
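One possible way to combine the three inputs into a single project hash is sketched below. The choice of SHA-256, the labels, and the length-prefixed framing are all assumptions for the illustration; the document does not specify them.

```python
import hashlib


def project_hash(git_hash, build_files, config_bytes):
    """Combine the git hash, all build files, and the build config into one hash.

    git_hash: the commit hash as a string.
    build_files: path -> file contents (bytes) for every build file.
    config_bytes: contents of the build config file.
    """
    h = hashlib.sha256()

    def feed(label, data):
        # Length-prefix each field so adjacent fields cannot run together.
        for part in (label.encode(), data):
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)

    feed("git", git_hash.encode())
    for path in sorted(build_files):  # sorted so the result is order-independent
        feed("file:" + path, hashlib.sha256(build_files[path]).digest())
    feed("config", hashlib.sha256(config_bytes).digest())
    return h.hexdigest()
```

Changing any build file, the config, or the commit changes the result, while the order in which build files are discovered does not.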
The implicit dependencies I can think of are:
* Build scripts.
* Each target should have an implicit dependency on its build script and
any others higher in the filesystem hierarchy.
* Compiler and other tools.
* Command invocations/config.
* Environment variables. (Should be cleared by default in hermetic mode.)
* The target that generated the target, if that happened.
* The function and arguments used to generate the target, especially if by
-e command line option.
Profiling of build time?
Yes, per target. This will happen automatically, unless turned off. Then it will be used in the job scheduler.
Make it possible for users to time builds and profile them.
When profiling, include the time that rig takes to run.
Produce either HTML or terminal text showing the charts.
rigwhy command (like waf’s why) for people to be able to see why things needed to be built.
Other debug information to output can include:
Why something needs to be rebuilt.
The state of a task. (Whether it has run, has missing outputs, failed, skipped, or succeeded.)
The status of a task. (Not ready, skip, or ready.)
Be able to generate graphs of dependencies. (Will need an external tool for this, like dot, unless I can generate ASCII graphs in the terminal. Probably do both.)
Be able to provide a list of targets or files or data on the command-line that should be considered to be out-of-date, even if they are not.
This is for easy testing.
It’s also for debugging when I get a bug report so I don’t have to do garbage like modifying a bunch of files uselessly.
Should have an option to consider a file/target/data and its dependents out-of-date, so that Rig’s optimality can be ignored.
Should be able to read targets/files/data from file.
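A minimal sketch of how a forced out-of-date list could be expanded to include dependents. The function name and the shape of the `dependents` mapping are assumptions made for illustration.

```python
from collections import deque


def forced_out_of_date(forced, dependents, include_dependents=True):
    """Return the set of targets to treat as out-of-date.

    `dependents` maps a target to the targets that depend on it directly.
    With `include_dependents=False`, only the listed targets are forced.
    """
    result = set(forced)
    if not include_dependents:
        return result
    queue = deque(forced)
    while queue:
        target = queue.popleft()
        for parent in dependents.get(target, ()):
            if parent not in result:
                result.add(parent)
                queue.append(parent)
    return result
```

The list passed as `forced` could come from the command line or be read from a file, one target per line.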
As the sort-of opposite of the above, there needs to be a way to say that only certain targets are expected to be out-of-date.
This could be tied to the save function in an editor.
Rig should use the build cache and database to start running the needed targets right away.
However, it should also scan all targets at the same time on other threads and error if any are actually out-of-date!
Have a mode to generate an event stream.
The event stream should be a list of events like:
They should be in the order they happened.
This is why starting and finishing things are separate events; if they weren’t, there would still be confusion since tasks and stampers don’t take an even amount of time.
This should not be enabled by default, but I can have users enable it for a build.
Then, there should be an option to play back a build using an event stream.
This would also require automatically forcing targets to be rebuilt based on which targets needed building for the build with the event stream.
Add an option for how many builds should be kept in the build database.
There should also be a command-line option to force keeping all builds, including the new one.
This is so users can keep the history and send it to me for debugging.
This will be especially useful for when users need to redo builds with event streams.
Be able to trace execution of build scripts.
Stack traces in Yao interpreter.
Add the ability for users to request debug info on the code.
This will be file, line, and column data for each bytecode instruction.
This will make it possible to have stack traces, which should be an option as well.
This is also necessary for breakpoints.
Use the debug info to stop at the first instance of reaching a line.
Use the pointer to the parent data, as well as the information to reach the data from the parent.
MUST INCLUDE EXAMPLES!
Such examples must be well thought-out and well commented.
They should also accomplish actual tasks.
Include comparisons that people can give to their managers to convince them to switch.
What NOT to do:
“And with that, they document the things Gradle does, not how to accomplish basic tasks using Gradle.” - From https://news.ycombinator.com/item?id=25807471
Should have API docs (what rig does) as well as functional docs (how to accomplish specific tasks).
Write a tutorial like _why the Lucky Stiff’s Poignant Guide to Ruby.
Instead of forest animals, use something I know better. Maybe military stuff? Build complexity could be a bad guy or something.
This is so people can connect to it better and so it becomes more approachable. See https://lobste.rs/s/ff54p1/how_nix_nixos_get_so_close_perfect#c_rvkko4 and https://lobste.rs/s/ff54p1/how_nix_nixos_get_so_close_perfect#c_no0rzg.
Suggest using aliases for shorthands:
This is to save typing since a build system is one of the most-used commands.
Parallel processing by default, and a --jobs flag. Maybe even
What is the ideal number of jobs? Equal to cores? 2x cores to prevent waiting on I/O?
File scanning should be as fast as possible. It should be heavily optimized.
Use file change notifications on platforms where I can.
FSEvents (Mac OSX).
Otherwise, a rig daemon that watches (polls) for changes?
Because file change notification may not be available, the file scanning subsystem should be self-contained and should be switchable, with the fallback being a full scan.
In a fresh build, don’t worry about scanning everything, but build a list of files and what order they are scanned in and what target caused them to be scanned.
In a non-fresh build, start
N is the ideal number of jobs, and have them all scan files as fast as possible, using the list built during the fresh build.
This is to ensure that Rig is blocked on I/O as little as possible.
The build script thread should have higher priority than build threads.
The scheduler thread should have higher priority than the build script thread.
The threads that do file scanning should have higher priority than the scheduler thread.
Have a stamper for C and C++ files that takes the preprocessed output and hashes it while ignoring the preprocessor #line junk? I think that could be a good idea to prevent a lot of spurious compilation.
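A minimal sketch of such a stamper, assuming the preprocessed text has already been produced (for example by the compiler's preprocess-only mode) and is passed in as a string. The name `stamp_preprocessed` and the choice of SHA-256 are assumptions.

```python
import hashlib
import re

# Matches GCC-style line markers (# 12 "file.c") and #line directives
# (#line 12 "file.c"). Other directives, such as #pragma, are kept.
LINE_MARKER = re.compile(r'^\s*#\s*(?:line\s+)?\d+(?:\s|$)')


def stamp_preprocessed(text):
    """Hash preprocessed C/C++ source while skipping line markers."""
    digest = hashlib.sha256()
    for line in text.splitlines():
        if LINE_MARKER.match(line):
            continue
        digest.update(line.encode() + b"\n")
    return digest.hexdigest()
```

With this, two preprocessed outputs that differ only in their line markers get the same stamp, so the target would not be rebuilt.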
It should be possible to set the priority of tasks.
If possible, use that to also set the thread and process priority.
Higher priority tasks always get scheduled first.
If build profiling is on (and it should be by default), tasks that were longer in previous builds should have higher priority.
Scheduler constraints: use target tags.
Users can limit how many targets with a certain tag can run at one time.
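A minimal sketch of priority-first selection with per-tag limits. The function name and data shapes are assumptions; the priority value could be set by the user or seeded from the task's duration in a previous build.

```python
def pick_next(ready, running_tags, tag_limits):
    """Pick the highest-priority ready task that violates no tag limit.

    `ready` is a list of (priority, name, tags) tuples, `running_tags`
    counts the running tasks per tag, and `tag_limits` maps a tag to the
    maximum number of tasks with that tag that may run at once.
    Returns the task name, or None if every ready task is blocked.
    """
    for priority, name, tags in sorted(ready, key=lambda t: -t[0]):
        if all(running_tags.get(tag, 0) < tag_limits.get(tag, float("inf"))
               for tag in tags):
            return name
    return None
```

For example, with a limit of one running task tagged `link`, a second link task is skipped in favor of the next-highest-priority task until the first link finishes.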
Security-wise, Rig must use Yao’s capability design.
This includes the sandboxing interpreter.
Each Rig keyword must have a capability attached.
Each build setup (for separate projects) must be considered separate packages (for purposes of enforcing restrictions).
Restricting Yao and Rig keywords and constructs should be how various features are restricted.
This includes dynamic dependencies, which need the
Automatically refuse capabilities to open files outside of the build directory.
Have different modes for limiting the power of the language.
These modes should just be quick ways of enabling certain Yao keywords.
Those modes should be (in order from least to most powerful):
declarative to do what POSIX make and ninja do.
functional to add functions, but no recursion.
iterative to add loops.
recursive to add recursion and everything else.
conditional should be the default because it’s probably going to be used the most.
However, the tutorials should explicitly set declarative until more is needed.
declarative will only allow registering targets and running commands in the targets.
recursive is later. This is to make it so rules can be implemented with functions.
These should be enforced because people talk about how they like build systems that have restrictions on power because of incompetent colleagues.
It should still be possible to have Turing-complete scripts because sometimes it’s needed, and if it’s missing, there is literally nothing that can be done except change build systems.
Have two different modes for limiting the power of dependencies:
Again, this will just enable certain Rig keywords.
static to make dynamic dependencies and tags impossible.
dynamic to add dynamic dependencies and tags and anything else that requires suspending tasks.
Have different modes for limiting the expressivity of stampers.
Again, these are just for quickly giving capabilities to call certain Yao functions to the stampers.
These modes should be (in order from least to most expressive):
modified for accessing modified time of files.
statable for accessing all (platform-relevant) attributes.
local for allowing stamping to access anything in the local build directory, but not run commands.
remote for allowing stamping to access the Internet.
executable for running commands. Since this would give access to ssh, which allows running commands remotely, this is more expressive than
The full mode Rig is in should be written in the form
Full mode should be project-wide.
Not including subprojects. Each subproject should be considered its own project, recursively.
Check that user was using Rig correctly. (See top-level “Requirements” section above.)
Have users provide build database.
Ask for build directory tree.
Ask for repo, and what state it was in during the previous build and during the buggy build.
Provide all of the following for users to easily bootstrap Rig:
Bootstrap should do the following:
Build Rig using one of the above.
Rebuild Rig with the newly-built version.
Run the build again (should be no-op).
Clean and build again with the newest build.
Run the build again (should be a no-op).
Test the bootstrap as part of release.
First and foremost: be able to have Nix-like packages and be able to install software and packages from anywhere.
This means that I still have to make it Nix-like, which means making packages that can be installed in something like the Nix Store.
However, it also means having the capability of running environments in containers/jails, etc.
Packages must be installable from version control, and the version must be per-package. This is like Nix flakes.
Checking the version should be the stamper.
Can do this with a lock file that references commits/tags per each package.
Package source must also be from version control. The packager does not upload it, they merely provide a link to the repo and the branch/tag name.
Follow Filesystem Hierarchy Standard.
Package store must be in
Database must be in
Config must be in
Must be able to tell what config version is installed and update it.
Follow XDG Base Directory Spec.
User configs must be
Produce pkg-config, CMake find_package(), and other files for packages in the store.
This will allow users to use the package store even if they don’t use my build system.
Maybe this will help them gravitate to Rig and Kit.
Use module importation to bridge the gap between building in the small and building in the large.
Allow the maintainer to create build scripts that can just be run by end users (or packagers). As in, the scripts would effectively be installers for a certain configuration.
Such scripts must NOT call Rig, if at all possible.
In essence, Rig should generate a compilation database and massage it into the right form.
This will require restricting Yao, stampers, and dependencies.
Must still be able to handle global data/config/binaries/libs.
Both install and rollback.
This means installing a global set of programs, etc. Then, if rollback is needed, reinstalling the previous set.
This will require a database. Use SQLite.
Use both content-addressed and input-addressed packages.
Content-addressed is how packages refer to other packages.
Input-addressed is how packages refer to themselves.
Use hard links.
Store a table that allows translation between the two in the database.
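A minimal sketch of the translation table using Python's `sqlite3` module. The table name, column names, and helper functions are assumptions made for illustration.

```python
import sqlite3


def open_store_db(path=":memory:"):
    """Open the database and create the (hypothetical) translation table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS address_map ("
        " input_hash TEXT PRIMARY KEY,"
        " content_hash TEXT NOT NULL)"
    )
    db.execute(
        "CREATE INDEX IF NOT EXISTS by_content ON address_map (content_hash)"
    )
    return db


def record(db, input_hash, content_hash):
    """Record that the build identified by input_hash produced content_hash."""
    db.execute(
        "INSERT OR REPLACE INTO address_map VALUES (?, ?)",
        (input_hash, content_hash),
    )
    db.commit()


def to_content(db, input_hash):
    """Translate an input address to a content address, or None."""
    row = db.execute(
        "SELECT content_hash FROM address_map WHERE input_hash = ?",
        (input_hash,),
    ).fetchone()
    return row[0] if row else None


def to_inputs(db, content_hash):
    """Return every input address known to produce this content."""
    rows = db.execute(
        "SELECT input_hash FROM address_map"
        " WHERE content_hash = ? ORDER BY input_hash",
        (content_hash,),
    ).fetchall()
    return [r[0] for r in rows]
```

Several input addresses can map to the same content address, which is the case where hard links between store entries would save space.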
Isolate package builds.
Run package builds with server process to prevent users from writing to store.
Run package builds under separate UID’s, to prevent builds from writing to each other’s stuff.
Ensure no commands under those UID’s are left after the build.
Change output directory to global kit user.
Remove write permission and setuid, setgid.
Use sticky directory bit to prevent deleting or renaming directories that don’t belong to build UID’s, or
Require server process to make, rename, or delete directories.
Important sections in Nix thesis:
6.2: Local Sharing.
6.4.3: Building Store Derivations.
7.1.3: Ensuring Purity.
It must be possible for a package to have multiple output directories.
For example, the glibc package could have glibc without headers, while glibc-dev contains the headers.
Since packages will just be targets, use multiple target outputs and run the directory algorithm on all of them.
Nix-style builds should be treated differently from installing system software.
Installing system software should be detectable by outside processes, probably using a start target that activates a socket, and an ending target that deactivates the socket.
Packages must follow group-id, artifact-id, and version system from Maven.
group-id: vendor name, like
artifact-id: actual package name.
In other words, packages need to use reverse fully-qualified domain names (RFQDN), like Java. The package names need to come from the project names, which also need to be RFQDN.
Package and project names must be case-INsensitive. This is to prevent problems with the stupidity of Windows and Mac OSX.
I could also use “bang casing,” which is to put an exclamation mark before every uppercase letter, and then make the uppercase letter lowercase.
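A minimal sketch of bang casing, restricted to ASCII letters for simplicity; the function names are hypothetical. Go's module cache uses the same escaping idea for module paths.

```python
def bang_encode(name):
    """Replace each ASCII uppercase letter with '!' plus its lowercase form."""
    out = []
    for ch in name:
        if ch == "!":
            raise ValueError("'!' is reserved as the escape character")
        if "A" <= ch <= "Z":
            out.append("!" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def bang_decode(encoded):
    """Invert bang_encode."""
    out = []
    chars = iter(encoded)
    for ch in chars:
        if ch == "!":
            nxt = next(chars, None)
            if nxt is None or not ("a" <= nxt <= "z"):
                raise ValueError("dangling or invalid '!' escape")
            out.append(nxt.upper())
        else:
            out.append(ch)
    return "".join(out)
```

The encoded form contains no uppercase letters, so names that differ only by case stay distinct even on a case-insensitive filesystem.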
Vendor names must have a key associated with them; there must be a human.
They should also have to answer a challenge, usually with the domain associated with them, or commit access to the GitHub/git repo.
Packages must be signed.
This makes it so if the domain name ever changes hands, it is known that it still comes from the original packager.
Because multiple versions of libraries can be installed, versions can be pinned, like Maven. Do it transitively.
That should be the default, but also allow latest patch version, latest minor version, and latest version.
Or not? Good arguments against that are in https://platformers.dev/log/2022-03-02-latest-literally-kills-puppies/ and https://news.ycombinator.com/item?id=30576443.
Allow private repos of packages.
Prefer them by default.
If they exist, don’t even go to the central server, by default.
Configurable, can be changed.
Maven also has something where child configs can inherit from parent ones.
Someone on Reddit said that that helps centralize package management for several packages, which means it’s a good thing.
Do it with rules.
Be able to provide a Software Bill of Materials with the package manager.
This will make it appealing to enterprise customers, hopefully.
Users should be able to exclude certain packages.
If an excluded package is needed for an install, either as direct or indirect dependency, then error.
This is to help with Software Bill of Materials stuff.
It is also to allow users to lock out downloading packages with known or possible vulnerabilities.
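A minimal sketch of the exclusion check, assuming a hypothetical `deps` mapping from each package to its direct dependencies. It reports the dependency path so the user can see why the excluded package was needed.

```python
def check_exclusions(root, deps, excluded):
    """Raise ValueError if installing `root` would pull in an excluded package.

    Returns the set of packages that would be installed otherwise.
    """
    stack = [(root, [root])]
    seen = set()
    while stack:
        pkg, path = stack.pop()
        if pkg in excluded:
            raise ValueError("excluded package required: " + " -> ".join(path))
        if pkg in seen:
            continue
        seen.add(pkg)
        for dep in deps.get(pkg, ()):
            stack.append((dep, path + [dep]))
    return seen
```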
Rig should have a format for distributing libraries and executables.
The format should be a bundle of:
And one or more Rig scripts to build them and integrate them into builds.
These bundles should also be able to include code for optimization passes in Yvm, as well as LLVM if possible.
These could be frontend-specific, backend-specific, or user-specific.
The build script(s) included with a bundle should be as capable as Zig’s
This means that they should be able to generate code for clients.
They should also be able to specialize certain code.
They should be able to lower code to Yvm or LLVM.
They should be able to define the code for the optimization stage.
This includes the order and number of passes.
It also includes being able to run a pass again if another made a change.
It also includes being able to apply permanent optimizations, i.e., optimizations that actually change the source.
It also includes being able to define how long optimization should be done, whether none, or overnight, etc.
Basically, anything Turing-complete code can do.
This is what will allow Rig to accomplish Daniel J. Bernstein’s vision in “The Death of Optimizing Compilers”.
Implement server for package management, caching, and builds.
To be used for
Allow people to define their own overlays/channels with version-controlled package repos.
Server should detect, and be able to be told, when an upgrade to a minor version causes trouble for clients.
Give warnings for such things.
Show users data about the security of packages:
Whether the maintainer uses 2FA.
Whether the maintainer uses a Yubikey or equivalent.
Do not display this if the maintainer says no.
Show users data about the maintenance status (see this comment):
Maintenance statuses could be:
This is a toy I am sharing because I thought other people might find it amusing.
I don’t intend to support this project in the future, but I’d be happy if someone else took it over.
I don’t intend to support this project in the future, but I will help get someone else up to speed if they want to take it over.
I intend to support this project as a hobby.
I intend to support this project if someone is willing to pay me to do so.
My employer has promised to pay for my time supporting this project.
This project is supported by a team, so it is not vulnerable to a single programmer leaving.
This project is considered critical infrastructure by a major supporter who has promised to maintain support.
This project is part of the language’s core infrastructure and will not be dropped from the standard distribution without deprecation notices and a proposed replacement.
The status should be renewed on every release, or else it will drop one level by default.
Show users data about stability guarantees (see this comment):
API stability guarantees could be:
No stability guarantee.
API stable only for major versions.
API stable with deprecation notices.
API stable with no deprecations.
ABI stability guarantees could be:
No stability guarantee.
ABI stable only for major versions.
ABI stable with deprecation notices.
ABI stable with no deprecations.
The status should be renewed on every release, or else it will drop one level by default.
Implement three branches:
Rolling release, for bleeding edge while being as stable as possible.
Testing, for stabilizing the next release.
Release, for rock solid stable.
Release in May or June and November or December to take advantage of the release cycles of other software.
Users should be able to pick and choose software from all three, even the same software, though only one version of a given piece of software should be globally installed.
Use Mate or Cinnamon?
Make it easy for non-technical users to use, and easy for power users to take control.
Have separate branches:
master, which is only merged once per day, like Gentoo’s.
dev, which is continuous.
The split is so that people can set their packages to master on a particular day, like https://news.ycombinator.com/item?id=30581593.
These are user profiles.
Profiles must be installable from version control.
Checking the version should be the stamper.
Define a new target type:
List of software to install, at what versions.
List of config files and where to deploy them.
How to handle deployment and software installation.
A way to check which version of the user profile is installed.
Machines must be installable from version control.
Checking the version should be the stamper.
Define a new target type:
How to specify a machine, including:
What is installed.
Services that will be brought up on boot.
How to send that info to an actual machine.
How to check the machine and its services are up (ssh checks, service checks, etc.).
A way to check which version of the machine is installed.
Define a new target type:
A CI build target, which just takes a list of targets to run for:
Stamper should be if the current commit is the same as the previous tested commit.
An important comment from https://ofekshilon.com/2016/08/30/cmake-rants/:
cmake is a hilarious cludge (today) in which most of the industry is using it because there is nothing better. It won out. I want to $!^& every single cmake zealot who tells me “once you get used to it, it’s great.” If people adopt this mindset (which majority do), then progress on a (new) build system will never be made. We should strive for making new build systems that take away the pain for NEWCOMERS, not learning to live with our existing GARBAGE. Yes, cmake IS garbage. FACT. It did what it was supposed to 20 years ago and people came along the way, saw issues developing and tried to make new build systems. Unfortunately, they never won, because by that point people were already complacent with cmake.
There exist a few build systems with more modern ideas; however, most turn out to be toys or don’t scale well or have that one fatal flaw that keeps the idea from flying. Tup is a good example. Monitoring the file system in a server like scheme is a cool concept and allows for ultra fast incremental builds. However, the fatal flaw is that it relies on OS specific hooks to accomplish that functionality, which could fail to work in the next version of the OS. Additionally, when tup was conceived, mainstream blazing fast solid state drives and multi core CPUs were not wholly the norm yet. Granted, I think it was around 2015, but come on… Even the majority of medium to largish size projects running on main stream developer CPUs and solid state drives will take only a second or two to scan a directory structure of thousands of files. Computing speeds will continue to increase, but the number of files in projects will not increase at a matching rate. Basically, cool concept, but not really valid for today’s computing speeds of scanning a massive directory structure, therefore why bother with the OS specific hacks to make it work? And trust me, they are hacks. I work on windows exclusively and the features I needed were broken due to the fact tup is using dll injection on windows.
When did a build system stop being about actually building the code in a sane manner and become all this dependency management garbo? When projects got more complicated? How about this idea for size. How about we make a simple program, A TOOL, that can just compile actual code in an incremental and parallel fashion using standard markup format? Why does the build tool have to do dependency management AND compile C code? I think the reason we get these crappy DSLs that are hard to work with for compiling code is because we try to mash together actual compilation and dependency management which needs all these edge cases and special scripts to make work. Split the problem, the entire build process, into smaller modular tools, instead of trying to do every thing and the kitchen sink in one tool. Each individual tool can handle one part of the overall build process much better than the whole. You then string all the tools together using a top level script.
I work in embedded and the unfortunate truth is that there are no dedicated build tools made for embedded. Embedded build tools are an afterthought, so people just try to make build systems for embedded by hacking cmake to make it work with those projects. We are stuck with cmake, when its complete overkill for 99% of the projects. There are no dependencies. LIbraries and headers are manually copied from project to project. I literally just need the build system to COMPILE code. You know the most important part of any C build system? COMPILING code. Not dependency management. Not checking if my compiler works on my dev platform. NOT producing *** code for windows, mac, or linux machines. CROSS COMPILATION ONLY. I just want a tool that doesn’t make ANY assumptions about compiler flags or compilers or how my project directory structure is laid out. I can do the leg work of inputting all the exact parameters and build configurations into the tool. I just need the tool to incrementally compile my code in a parallel fashion without HACKS and makefiles and being a “build system generator”. Believe it or not, incremental compilation is not a hard problem to solve. cmake would have you think it is, and that is part of the fracking reason its a build system generator and can’t do the fracking work of actually compiling code itself.
TLDR: Working on a “compilation driver” tool that just compiles code in an incremental and parallel fashion without “rules” and without hacks. Just pass the tool a bunch of configuration files that add source files to the build and it will incrementally compile the project. Source file dependencies on header files? EVERY modern compiler provides a header dependencies option. Funny how cmake still tries to do its own parsing of header files. Here is an idea. How about you just tell my tool what the dependency command line option is and the format of the output file it generates. Then my tool can parse that file and store the information in a database. Novel idea that has been around for what? 15 years? FUNNY how cmake still drags around its legacy header dependency management, because its a piece of GARBAGE.
This is important because I want Rig to be useful in the embedded world. This means that, while I would like to add dependency stuff, it needs to be easy to do cross-compilation and to set extremely specific build options, especially compiler flags.
Another comment, from Reddit:
CMake isn’t terrible. I think a lot of other comments on here have captured the syntax. But I also find a some of the generators pretty subpar.
I work with Xcode so I need to have CMake generate Xcode projects, the what it generates is usually either kind of kludgy, or totally unworkable. I’ve had only a few projects generate mostly clean Xcode output. Usually if object libraries are involved the output is nearly unusable because object libraries are not something Xcode understands. I’ve talked with the CMake maintainers about this, and proposed changes like just mapping object libraries to static libraries. But the maintainers seem set on supporting some object file specific functions even if object libraries as a whole in CMake’s Xcode generator are kind of a train wreck.
Xcode is also inherently a multi-Apple-platform build environment, and a multiple-architectures-per-build environment and CMake does not like that, especially idiomatic CMake. We spend a lot of time patching with toolchain files but CMake developers generally work with the understand they know the destination ABI and architecture up front instead of that being a variable in the generated output.
We’ve given up integrating CMake directly into our build systems because of these issues. Which is a shame, and feels like part of the promise of CMake. CMake gets turned into makefiles which get turned into binaries which we then ingest into our Xcode projects.
So it’s not awful. If you’re doing traditional, simple, single target, single architecture development it’s workable. Maybe even better than the alternatives. But it doesn’t live up to the dream, and it definitely grinds my gears a bit when CMake developers bring up the generators as the solution to everyone’s problems and build systems.
This is why Rig builds need to be multi-platform, if possible.
Use blocking I/O.
Each file is scanned by one thread.
Use file change notifications per-platform.
Make the subsystem switchable.
Implement dumb lexer and parser to bytecode.
Use the Python bootstrap compiler for help.
Implement line and character numbers (later).
This is for stack traces and debugging.
Implement targets as functions.
Implement interpreter as a stack- and register-based VM.
When suspending execution of a target, save the stack and execution stack.
Every time a dependency is added to a target (either statically or dynamically), check for a dependency cycle.
This should be done with integers for “levels” in the dependency tree.
The default target should be the max signed integer (but still use unsigned), and everything that it depends on should be less.
If a target is depended on by something with a smaller index, its index should be decreased to one less than the smallest index of its dependents.
If doing so would bring it below its dependencies, mark it and still do so, and recursively apply it.
If coming around to a marked one, there is a cycle. Error.
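A minimal sketch of the level-based cycle check described above. The class and method names are hypothetical, and as a simplification every target (not only the default one) starts at the maximum level until something depends on it.

```python
class CycleError(Exception):
    pass


class LevelGraph:
    TOP = 2**31 - 1  # max signed 32-bit value, the level of the default target

    def __init__(self):
        self.level = {}  # target -> level
        self.deps = {}   # target -> set of direct dependencies

    def add_target(self, name):
        self.level.setdefault(name, self.TOP)
        self.deps.setdefault(name, set())

    def add_dependency(self, target, dep):
        """Record that `target` depends on `dep`, or raise CycleError."""
        self.add_target(target)
        self.add_target(dep)
        saved = dict(self.level)
        try:
            # `target` starts out marked: reaching it again means a cycle.
            self._lower(dep, self.level[target] - 1, {target})
        except CycleError:
            self.level = saved  # undo the partial level changes
            raise
        self.deps[target].add(dep)

    def _lower(self, node, limit, marked):
        if node in marked:
            raise CycleError(f"dependency cycle through {node!r}")
        if self.level[node] <= limit:
            return  # already below all of its dependents
        marked.add(node)
        self.level[node] = limit
        for child in self.deps[node]:
            self._lower(child, limit - 1, marked)
        marked.discard(node)
```

In the common case the new dependency is already at a lower level, so the check is a single comparison; the recursive walk only happens when levels actually need to move.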
Use a vectorized linked-list for the scheduler.
It should be an append-only vector (using the same allocation code as Yc’s maps) with two subvectors. The first should be the tasks, and the second should be a vector of two indices per task, one pointing to the previous task, and one pointing to the next task.
Use 32-bit numbers as indices; 4 billion should be enough. Document the limit.
Have three locks: reader/writer, change, append, which must always be taken in that order.
If a file scanning thread finds an outdated file, it should append it to the vector, along with the targets that (recursively) depend on it.
The appending thread should take the append lock, but not the read lock. However, it should update the “next” pointer of the previous element atomically.
Readers, of course, take the reader/writer lock.
If the scheduler wants to change the order of tasks, due to dependencies, priorities, etc., it needs to take the reader/writer lock as a writer, then the change lock. If it needs to change the last element, it should also take the append lock.
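A minimal single-threaded sketch of the vectorized linked list: tasks live in an append-only vector, and a parallel vector holds a previous/next index pair per task. The three locks are deliberately left out, and all names are assumptions.

```python
NIL = 0xFFFFFFFF  # sentinel index; real indices must fit in 32 bits


class TaskList:
    def __init__(self):
        self.tasks = []  # first subvector: the tasks, append-only
        self.links = []  # second subvector: [prev, next] indices per task
        self.head = NIL
        self.tail = NIL

    def append(self, task):
        index = len(self.tasks)
        self.tasks.append(task)
        self.links.append([self.tail, NIL])
        if self.tail == NIL:
            self.head = index
        else:
            # In the threaded design, this is the atomic "next" update.
            self.links[self.tail][1] = index
        self.tail = index
        return index

    def move_before(self, index, other):
        """Relink `index` so that it comes directly before `other`."""
        if index == other:
            return
        self._unlink(index)
        prev = self.links[other][0]
        self.links[index] = [prev, other]
        self.links[other][0] = index
        if prev == NIL:
            self.head = index
        else:
            self.links[prev][1] = index

    def _unlink(self, index):
        prev, nxt = self.links[index]
        if prev == NIL:
            self.head = nxt
        else:
            self.links[prev][1] = nxt
        if nxt == NIL:
            self.tail = prev
        else:
            self.links[nxt][0] = prev

    def order(self):
        """Return the tasks in linked order."""
        out, i = [], self.head
        while i != NIL:
            out.append(self.tasks[i])
            i = self.links[i][1]
        return out
```

Reordering only touches index pairs, so the task storage itself never moves and indices handed out by `append` stay valid.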
Keep track of the boundary between finished tasks and non-finished tasks.
Static dependencies should be sorted before their dependents because they should be considered as needing to be satisfied before dependents start.
Dynamic dependencies should be sorted after their dependents because they should not be considered as needing to be satisfied before dependents start.
In fact, their dependents should start before running them.
Task dispatch should be based on an extra “head”, the task that is in front of all ready tasks.
Finished tasks should all be before the first ready task.
To dispatch, take the task, check that it does not violate constraints.
If it does not, dispatch it.
If it does, go to the next task and repeat the checks and try again.
When tasks are finished, they should be moved ahead of the first ready task.
When tasks are made “not ready” (by asking for dynamic dependencies, etc), their dynamic dependencies should be directly dispatched.
Do sorting when there are fewer than N tasks running (including file stamping), where N is the number of cores.
Output the sort order to the build database after the build.
For incremental builds, have several vectors of bits.
They should be in the sorted order of tasks from the database.
One vector should be if the task is ready.
Another should be if the task is unfinished.
If any of the sort is invalidated during the build, the bit vectors should be destroyed and the normal task dispatch should be used.
Otherwise, to find the next task to dispatch, find the first set bit of the bitwise & of the vectors, looping through the indices until finding an appropriate task.
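A minimal sketch of the bit-vector lookup, using Python integers as bit vectors where bit i describes task i in the saved sort order. The function name and the `allowed` callback (standing in for scheduler constraint checks) are assumptions.

```python
def next_task(ready_bits, unfinished_bits, allowed=lambda index: True):
    """Return the index of the first ready, unfinished, allowed task.

    Returns None when no task qualifies.
    """
    candidates = ready_bits & unfinished_bits
    while candidates:
        lowest = candidates & -candidates  # isolate the lowest set bit
        index = lowest.bit_length() - 1
        if allowed(index):
            return index
        candidates ^= lowest               # clear that bit and keep looking
    return None
```

In a lower-level language the same loop would run over machine words with a count-trailing-zeros instruction instead of `bit_length`.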
Manages the build database.
Because there are two builds in the database, it handles which of them it goes to.
There should be two files that can be the build database.
Which one is used should be swapped on each build.
Both should use an auto-incrementing build ID. The one with the higher ID is the newer one.
Put the ID first in the file.
Then make sure GAML can look for a specific item and return it early.
This is so that it is not necessary to scan the entirety of both build databases.
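A minimal sketch of choosing between the two database files. It assumes, purely for illustration, that the build ID is a decimal number on the first line of each file; the real format would be GAML, which is not modeled here.

```python
def read_build_id(path):
    """Return the build ID stored on the first line, or -1 if unusable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return int(f.readline().strip())
    except (OSError, ValueError):
        return -1


def choose_databases(path_a, path_b):
    """Return (newest, target, next_id).

    `newest` is the file holding the most recent build (or None),
    `target` is the file the current build should overwrite, and
    `next_id` is the build ID to write first in `target`.
    """
    id_a, id_b = read_build_id(path_a), read_build_id(path_b)
    if id_a < 0 and id_b < 0:
        return None, path_a, 1
    if id_a >= id_b:
        return path_a, path_b, id_a + 1
    return path_b, path_a, id_b + 1
```

Only the first line of each file is read, which matches the goal of not scanning both databases in full.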
The definitions of these terms apply only to this document.
A right to execute certain items. See the Object-Capability Model.
An allowance to something to accomplish a certain task.
An item that can be run.
Execution of a target.
Any resource that cannot be used concurrently with other instances of the resource. (See this blog post.)
Any limited resource where instances of the resource can be used concurrently with other instances of the same resource. (See this blog post.)