an optimizing compiler to a binary turing machine.
the syntax is fairly simplistic and we are stealing the perl lsp. this is still in early devlopment so any bugs caught would be very apreshated.
currently for x64_linux and a small native Arm (currently bugged). however nothing is inherently linux based in the compiler itself. the build system just assumes bash instead of powershell. it should be easy to fix if needed. if there is demand I would open a windows brench
The arm branch has slightly diffrent performance charcturistics and is tested less thoroughly. cross compilation testing shows that it does fail in some cases and thus needs to be fixed. currently we are using a crude method which should usually work (although i am not sure about the performance implications)
syntax exmples can be found in code_tests/tasks and in tests/code_samples we use perl highlighting and a relativly free form of syntax.
as of now there are 3 major tools:
- tape_tool: used to mainpulate .tape files
- run_turing: runs a turing machine described in .t on a .tape file and writes the output to anoter .tape file
- tmc0/tmc1/treemc/tmc2: would compile a .t file into a binary that acts like run_turing
this languge is really not ment to be written by hand for long periods of time. the main intent is very big autogenerated machines. in code_tests/code_gen.py you can see some examples of this.
currently using export PATH=$(pwd)/bin:$PATH does not work with tmc because the path to io.o is derived programatically. this can be fixed in the future by directly basking io.o's binary into compiler.o as a string. or by using some sort of install system
we depend on nasm and ld and libc. nothing more
runtime+ gnumake gcc and bash.
build+ python (no packages)
the idea is to wrap all of the syscalls into io.o and have that be a stable abi on all platforms. then we generate intel asembly files that would need to be assembeled with nasm (I debated using opcodes but having it this way lets users read the assembly) linking is done with ld
this means there is no gcc dependency!!! or any other compiler for that matter.
we also want to have code comments on the generated assembly for clarity. the idea is to make it as human readble as possible. names from the source file can be found in the assembly output.
make bench
would run the benchmarks.
for selecting what you run go to code_tests/time.py
and define the timers
dictionay.
for the tmc0 test we had
timers={'run_turing_no_stop':time_run_turing_no_stop,'run_turing':time_run_turing,'tmc0':time_tmc0}
the core measurment is a simple null hypothesis test. is A faster then B in most cases? this lets us measure if a preformance improvment is meaningful or not. without assuming a distribution for the noise. it also means we dont need to depend on numpy.
even the O0 code is consistently faster than the interpeters. this is very promicing since there is a lot wrong with tmc0 in terms of preformance. it is made to be as simple as possible not to be fast
from the tests I ran on the interpeter it apears that not keeping track of the counter has a small yet semi consistent advantage. the advantage could very well be caused by the need to parse another argument or something else that is as silly. for most purposes it would probably not matter. the reason I am not using it is mainly because it simplifies some of the logic for the compiler.
from playing around with code vs tape sizes its become pretty clear that the IO part can mess with the benchmark. because of this it is recommanded to use smaller tapes for measuring.
length does seem to matter a lot. I am thinking this is related to branch prediction, or the instruction cache. essentially the compiled code has a major advantage because it is stored in Li and not Ld. the order of the unrolled loops seems to not be relevent.
tmc1 is slightly faster than tmc0. on the hand made code the diffrence is not statistically significant.
from what I seen the code_proning only really matters for when state order is paticularly bad. having it reordered so that the first states apear first has a significant speed advantage. removing dead code has basically no effect...
On the other hand some of the improvments Of jump eliminations can have a positive effect in some cases. The aligment of the extending functions is also super critical,
tmc2 is faster than tmc1 and tmc0 on the handmade code. the diffrence is barely there but its measurble and statistically significant (on the first working test...)
tmc2 is also faster on the larger pool and it makes significantly shorter code EVEN THO it inlines all of the comperisons.
this compiler was used as an intermidate step for devloping tmc2. its faster than tmc0 but slower than tmc1. It does do unique things compared to tmc1 and it can be faster in some cases. just not in most cases
the make test/bench system deletes the compiled files. since everything compiles under a second its not an issue. if this is not desirble test.py and other python tests would work by themselves.
in code_tests all the generated code testing is handeled. tests is where unit/integration tests can be found.
it is recommended to add a small calling script into test.py for every test added.
if you notice our register use does not care to restore regiters at all. this is because we are using _start and not main. so there is actually no one to return to and thus no point preserving registers.
also note that rsp is not treated as a special case in the C++ classes... this means that we allow using it as a temporery register. probably not the most ideal setup but it does make sense if for some reason we ever want to move to a no stack setup.
this is mostly a C project but as a learning exprince I am adding C++ for O2 we will see how it goes.
linking the arm tool chain properly is extremly complicated... we are kind of forced into using gcc as the linker with main instead of start. now this may be a result of virtualization but i am suspecting that its deeper
- add a way to benchmark old+new version of the compiler to asses changes
- unify the cli tools
- add better error handeling with a dedicated struct.
maybe_inline causes UB... need to look into it.
our Exit nodes cause ub when freed after they have moved away from the original state...