Lesson 6: Static Single Assignment #351

sampsyo · 2023-08-21T20:20:34Z

Maintainer

⚠️ Warning: Implementing the into SSA and out of SSA transformations can be trickier than it looks!

bennyrubin · 2023-09-25T20:37:19Z

Summary

For this week I implemented a Bril translation to SSA form and a corresponding translation from SSA form back to normal form. This was significantly more difficult and time-consuming than the past assignments, partly because of extra complexity in the implementation, but also because of the extra effort of debugging by manually reading through Bril instructions. I had to cache (in my mind) not only the instructions themselves but also what the dominator tree and the dominance frontier of each node look like. I wish there were an easier way to debug these low-level optimizations on machine instructions...
Turning the pseudocode from lecture into a working implementation was fairly straightforward, with a few hiccups. Notably, remember the part of the assignment about some paths not initializing certain variables being something difficult we have to handle? It took me a lot of extra time to figure out what to do in that situation, and why it was even happening in the first place.
The pseudocode also did not mention renaming the destinations of the phi instructions themselves, so I had to realize I needed that and add it in.

Testing

The support in brili for executing programs with phi instructions made testing significantly easier, allowing me to test my implementations of to_ssa and from_ssa independently. I can imagine the mess of trying to test only the composition of the two. My general approach was to get it working on a few small examples, where I could clearly see mistakes in my logic, and then move on to brench for the core benchmarks directory. I had one pipeline for just to_ssa and a separate one for to_ssa followed by from_ssa. Surprisingly, by the time I moved on to brench, my solution worked for almost all of them. I dug into the examples that didn't work, found some small edge cases and errors I had missed, and fixed them.

Difficulties

There were plenty of those this week. The biggest difficulty I faced was a bug in my dominator tree code that I found while testing SSA. I suppose that's only fitting: I heavily tested my dominators function but didn't come up with a good way to test my dominator tree function (I assumed any test would just reuse the same logic I used to build the tree). For the sake of time, I decided to use the correct implementation of dominators from the Bril repo instead of redoing last week's assignment. This is definitely a good lesson for the future on sufficiently testing my code, especially functions I will rely on in later implementations.
As mentioned earlier, I had a lot of trouble with the case where a variable is not defined along some path, which made the translation out of SSA fail with an undefined variable. I fixed this by inserting an arbitrary initialization into the block the variable was supposed to come from in the phi instruction. This also required adding an uninitialized marker to my phi instructions for nodes that are in the dominance frontier but don't define the variable. After doing this, all the brench tests passed.
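The fix described above can be sketched roughly as follows. This is a minimal Python sketch, not the post's code: it assumes blocks are a dict from label to Bril JSON instruction dicts, a hypothetical `__undefined` string as the uninitialized marker in phi arguments, and arbitrary per-type constants (pointer types are not handled). Each phi becomes a copy at the end of the matching predecessor, just before its terminator, and marked arguments become a constant instead of a copy.

```python
UNDEF = "__undefined"  # hypothetical marker: no definition reaches along this edge

# Arbitrary placeholder constants per Bril type; pointer types are not handled.
DEFAULTS = {"int": 0, "bool": False, "float": 0.0}
TERMINATORS = ("jmp", "br", "ret")


def insert_before_terminator(block, instr):
    """Append instr to block, keeping a trailing jmp/br/ret as the last instruction."""
    if block and block[-1].get("op") in TERMINATORS:
        block.insert(len(block) - 1, instr)
    else:
        block.append(instr)


def out_of_ssa(blocks):
    """Replace every phi with a copy at the end of each predecessor.

    blocks maps label -> list of Bril instruction dicts; it is modified in place.
    """
    for label in list(blocks):
        for phi in [i for i in blocks[label] if i.get("op") == "phi"]:
            for arg, pred in zip(phi["args"], phi["labels"]):
                if arg == UNDEF:
                    # No definition reaches along this edge, so any constant
                    # of the right type can stand in for the missing value.
                    copy = {"op": "const", "dest": phi["dest"], "type": phi["type"],
                            "value": DEFAULTS[phi["type"]]}
                else:
                    copy = {"op": "id", "dest": phi["dest"], "type": phi["type"],
                            "args": [arg]}
                insert_before_terminator(blocks[pred], copy)
        blocks[label] = [i for i in blocks[label] if i.get("op") != "phi"]
    return blocks


blocks = {
    "entry": [{"op": "br", "args": ["c"], "labels": ["left", "right"]}],
    "left": [{"op": "const", "dest": "x.1", "type": "int", "value": 1},
             {"op": "jmp", "labels": ["join"]}],
    "right": [{"op": "jmp", "labels": ["join"]}],
    "join": [{"op": "phi", "dest": "x.2", "type": "int",
              "args": ["x.1", UNDEF], "labels": ["left", "right"]},
             {"op": "print", "args": ["x.2"]}],
}
out_of_ssa(blocks)
```

Like the basic lecture algorithm, this puts copies directly into predecessors; when a predecessor ends in a `br` to several blocks, the copy runs on every outgoing edge, which splitting that edge would avoid.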

keikun555 · 2023-09-26T20:28:14Z

Summarize what you did.

Lesson 6 Task
bril2ssa Tool: This tool converts Bril code from stdin to SSA-form.
ssa2bril Tool: This tool converts Bril code with Phi operations to standard Bril.
Turnt Testing Suite: Test directory derived from Bril programs in the Bril repo.

Explain how you know your implementation works—how did you test it? Which test inputs did you use? Do you have any quantitative results to report?

This turnt.toml file does differential analysis. It evaluates the original Bril program and makes sure its output is equal to the evaluations of its SSA form and the Bril program derived from the SSA form.
The output of the turnt file is in this turnt.out file and it was generated with

❯ turnt -j $(find . -iname "*.bril") | tee turnt.out

The test file also runs the is_ssa script in the ssa-verify environment. Its output is piped through grep to look for the string yes, so if grep fails, turnt reports the test as not ok because the return code is nonzero.
The turnt test suite passes for all benchmark programs:

❯ rg "^not ok.*benchmark" turnt.out
❯

It also succeeds on other test files in the repo, except a few because of well-formedness, overridden CMDs and RETURNs, and/or unsupported syntax (such as linking in the linking directory and type-less value operations in the type-infer directory).

What was the hardest part of the task? How did you solve this problem?

There were three main pain points in this task.

Handling function arguments: Function arguments are "hidden" variables in that their definitions might not show up in the instruction list. At first I handled them differently from value instructions and did not associate labels with them. However, when I later determined phi node labels, I found that there might be no label referring to a function argument in the very first block of the function! To circumvent this problem, I created a new entry block with the function arguments and inserted it at the top of the function's list of basic blocks using this function.
Determining Phi instruction arguments: The part that took me a long time to get right was the arguments corresponding to the predecessor labels in the Phi instructions. I read through the CMU slides and found that we need to

use the closest def such that the def is above the use in the D-tree

so I used a tree traversal helper function to find which argument to use in the Phi instruction.

Determining values for undefined paths: The last part that was difficult was part of coming out of SSA form. I found that the problem of "dealing with variables that are undefined along some paths" surfaced here. In particular, removing phi instructions made the destination variable undefined for some paths. To counteract this, if some phi instruction block predecessors don't have variables defined, I defined them in the predecessor with default values during the code insertion step.
I found all three pain points above using my test suite and extensive scrutiny of SSA-form Bril code produced by my algorithms to find infinite loops and possibly undefined variables.
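The first two pain points above fit together: once a fresh entry block exists, it can be recorded as the block that defines every argument, and the phi argument for a given predecessor is the closest definition found by walking up the dominator tree from that predecessor. Below is a minimal Python sketch of both steps, not keikun555's code. It assumes blocks are an insertion-ordered dict from label to Bril JSON instruction dicts; `add_entry_block`, `closest_def`, and the `last_def` table are hypothetical names.

```python
def add_entry_block(blocks, preds):
    """Prepend a fresh entry block that jumps to the old first block.

    blocks: label -> instruction list; the first key is the current entry.
    preds:  label -> list of predecessor labels (updated in place).
    Returns (new_blocks, entry_label).
    """
    old_entry = next(iter(blocks))
    entry, n = "entry", 0
    while entry in blocks:  # avoid clashing with an existing label
        n += 1
        entry = f"entry.{n}"
    preds[entry] = []
    preds[old_entry] = [entry] + preds.get(old_entry, [])
    return {entry: [{"op": "jmp", "labels": [old_entry]}], **blocks}, entry


def closest_def(var, block, idom, last_def):
    """SSA name of var that reaches the end of `block`, or None if undefined.

    last_def maps (block, original var) -> SSA name of the last definition of
    var in that block, phi destinations included. Walks up the dominator tree
    from `block` until some block defines var.
    """
    while block is not None:
        if (block, var) in last_def:
            return last_def[(block, var)]
        block = idom[block]
    return None


# A loop whose first block is its own predecessor, in a function with parameter n.
blocks = {
    "loop": [{"op": "sub", "dest": "n", "type": "int", "args": ["n", "one"]},
             {"op": "br", "args": ["c"], "labels": ["loop", "done"]}],
    "done": [{"op": "ret"}],
}
preds = {"loop": ["loop"], "done": ["loop"]}
blocks, entry = add_entry_block(blocks, preds)
idom = {entry: None, "loop": entry, "done": "loop"}
# Arguments count as defined (unrenamed) in the new entry block.
last_def = {(entry, "n"): "n", ("loop", "n"): "n.1"}
print(closest_def("n", entry, idom, last_def))   # n    (phi arg for the entry edge)
print(closest_def("n", "loop", idom, last_def))  # n.1  (phi arg for the back edge)
```

A phi for `n` at the top of `loop` would then read `n` along the edge from the new entry block and `n.1` along the back edge, which is exactly the label the original first block was missing.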

bcarlet · 2023-09-27T23:48:11Z

Summary

SSA Conversion

Details

I implemented the into-SSA and out-of-SSA conversions using the algorithms from lecture. In particular, I used the basic out-of-SSA algorithm without extra copy propagation or other optimizations.

Testing

I first used the is_ssa.py script to ensure that the conversion produced valid SSA. I then used brench with brili to test both the SSA form and the round-trip form for correctness. These tests were run on all the benchmarks in the bril repository.

brench also gave me the dynamic instruction counts for each form. I observed a fairly high overhead for both the SSA and round-trip forms. Average overheads for the various benchmark suites were as follows:

suite	baseline	roundtrip	ssa
core	0%	43.27%	36.20%
float	0%	50.36%	43.71%
mem	0%	45.70%	37.48%
mixed	0%	52.02%	45.76%
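For reference, per-suite averages like these can be computed from per-benchmark dynamic instruction counts (such as those brench collects) along the following lines. This is a hypothetical helper, not the post's script; the post does not say whether it used an arithmetic or a geometric mean, and this sketch assumes arithmetic. The counts in the example are toy numbers for illustration, not measurements.

```python
def percent_overhead(baseline, transformed):
    """Percent increase in dynamic instruction count relative to the baseline."""
    return 100.0 * (transformed - baseline) / baseline


def average_overhead(counts, run):
    """Arithmetic mean of the per-benchmark overheads for `run`.

    counts maps benchmark -> {run name -> dynamic instruction count}.
    """
    overheads = [percent_overhead(c["baseline"], c[run]) for c in counts.values()]
    return sum(overheads) / len(overheads)


# Toy counts for illustration only; these are not measurements.
counts = {
    "bench_a": {"baseline": 100, "ssa": 150, "roundtrip": 160},
    "bench_b": {"baseline": 200, "ssa": 220, "roundtrip": 240},
}
print(average_overhead(counts, "ssa"))        # 30.0
print(average_overhead(counts, "roundtrip"))  # 40.0
```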

Difficulties

There were a fair number of edge cases to deal with, and, as others have noted, debugging was more difficult for this assignment, since it is harder to look at a buggy SSA program and immediately see what's wrong. Testing with the small examples in the examples/test directory did help identify most of the subtleties, though.

matth2k · 2023-09-28T18:41:03Z

Summary
- To From SSA
  - Source Files
    - to_ssay.py and from_ssa.py The main driver code for transforming a program to and from SSA form
    - butils Directory for my utility code
      - butils/cfg.py An API to build control flow graphs from programs. Has a Block api as well.
      - butils/ssa.py Takes in a CFG and returns a CFG in SSA form.
      - butils/dominance.py The generic dominance info framework
Implementation Details
- Overall, my SSA implementation follows the pseudo-code fairly closely. I construct the control-flow graph, gather dominance info, then kick off the SSA renaming.
- I used GitHub Copilot for some boilerplate code, and I used the baseline implementation of SSA in the Bril repo to help debug some bugs when going roundtrip.
- To manage the phi nodes, I subclassed my Block class into an SSABlock
  - def update_phi_dest(self, var: str, type: Any, ssa_name: str = None) creates or updates a phi node for variable var with the assignment ssa_name
  - def update_phi_arg(self, var: str, label: str, ssa_name: str, insert=True) adds or updates an argument to a phi node for block label with the name ssa_name.
  - def insert_explicit_jmp(self, successors) inserts a jmp operation at the block when the control-flow falls through the block.
  - def pop_phi_nodes(self) deletes the phi nodes and gives an iterator to the deleted nodes
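The phi-management methods listed above can be sketched roughly as follows. This is a hypothetical Python reconstruction from the method names and one-line descriptions in the post, not matth2k's implementation: it keeps phis in a side table keyed by the original variable name (which also keeps that name available after the destination is renamed) and leaves out `insert_explicit_jmp`. The parameter is spelled `type_` to avoid shadowing Python's built-in.

```python
class SSABlock:
    """Hypothetical sketch of per-block phi bookkeeping."""

    def __init__(self, label, instrs):
        self.label = label
        self.instrs = instrs
        self.phis = {}  # original variable name -> phi instruction dict

    def update_phi_dest(self, var, type_, ssa_name=None):
        """Create the phi for var if needed, optionally renaming its destination."""
        phi = self.phis.setdefault(
            var, {"op": "phi", "dest": var, "type": type_, "args": [], "labels": []})
        if ssa_name is not None:
            phi["dest"] = ssa_name
        return phi

    def update_phi_arg(self, var, label, ssa_name, insert=True):
        """Set the argument arriving from predecessor `label`."""
        phi = self.phis[var]
        if label in phi["labels"]:
            phi["args"][phi["labels"].index(label)] = ssa_name
        elif insert:
            phi["labels"].append(label)
            phi["args"].append(ssa_name)

    def pop_phi_nodes(self):
        """Remove all phis from the block and return an iterator over them."""
        removed, self.phis = self.phis, {}
        return iter(removed.values())


block = SSABlock("join", [{"op": "print", "args": ["x"]}])
block.update_phi_dest("x", "int")
block.update_phi_arg("x", "left", "x.1")
block.update_phi_arg("x", "right", "x.0")
block.update_phi_dest("x", "int", "x.2")
for phi in block.pop_phi_nodes():
    print(phi)
```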
Evaluation
- I evaluated my to and from SSA routine by running every benchmark on it. First, I shot for functional testing, then I also tested that it is indeed in SSA form with the is_ssa.py script.
  - In the "to" direction, I can handle every single bril program in the repo, both in functionality and actually meeting SSA form.
    - In the "from" direction, I can handle all but a handful of the 120 programs. However, there is no program that I fail on that the baseline implementation provided in examples/from_ssa.py doesn't also fail on. In fact, I'm happy that my solution fails on fewer programs than the baseline.
    - I experimented with forcing variables to be initialized with some success, but I don't really know what implications this has for correctness. So I dropped it. What's the right thing to do here? Maybe having an undef?
Anything Hard or Interesting?
- This was the hardest task so far, with many pitfalls:
  - My stack was held as a dict[list[Block]], but I was not popping it correctly because all the dictionaries had the same references to the underlying lists. I needed to deepcopy the variable or else I would still have the unpopped stack.
  - Some control-flow graphs have unreachable blocks if you parse them naively, like having unreachable jumps or return instructions, and this interfered with my algorithm. I modified my CFG class to sniff out these unreachable blocks and just drop them.
  - My SSA code failed on cases where the entry block contains a phi node, because the start of the program has no previous block to switch on. In this case, if the entry block has a back edge, I reanchor the control-flow graph with a new dummy entry block.
  - When I first implemented control-flow graphs I bundled consecutive labels into one basic block. I didn't like having so many blocks that have zero instructions in them. However, this decision hurt me when implementing SSA. I ended up with two alternative workarounds for this:
    - Just change the block formation step to no longer bundle consecutive labels together
    - Insert jump instructions between every consecutive label, where the jump takes you to the "real" block. Like so:
      .endif.139:
        jmp .endif.31;
      .endif.132:
        jmp .endif.31;
      .endif.125:
        jmp .endif.31;
      .endif.118:
        jmp .endif.31;
      .endif.111:
        jmp .endif.31;
      .endif.104:
        jmp .endif.31;
      .endif.97:
        jmp .endif.31;
      .endif.31:
  - Finally, it took me the longest time to realize that I needed a varCounter dictionary separate from the stack. Otherwise, we won't pull a fresh variable name when branching in the dominator tree. To me, this was the detail most missing from the pseudo-code.
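The varCounter point can be shown in a few lines. This is a minimal Python sketch with hypothetical names: if the suffix were derived from the stack's current depth, both sibling subtrees of the dominator tree would be handed x.1, because popping after the first subtree shrinks the stack again. A per-variable counter that only ever grows avoids that.

```python
from collections import defaultdict


class Namer:
    """Per-variable counters for fresh names, kept apart from the name stacks."""

    def __init__(self):
        self.counter = defaultdict(int)  # next unused suffix; never decreases
        self.stack = defaultdict(list)   # SSA names currently in scope

    def push_fresh(self, var):
        name = f"{var}.{self.counter[var]}"
        self.counter[var] += 1
        self.stack[var].append(name)
        return name

    def pop(self, var):
        self.stack[var].pop()


namer = Namer()
namer.push_fresh("x")           # x.0: defined in the parent block
first = namer.push_fresh("x")   # x.1: defined in one child subtree
namer.pop("x")                  # leave that subtree
second = namer.push_fresh("x")  # sibling subtree still gets a fresh name
print(first, second)            # x.1 x.2
```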

Enochen · 2023-09-28T22:58:13Z

Summary

Into SSA
Out of SSA
The Glue

I wrote logic that turns a CFG into SSA as well as back out of SSA. My out of SSA implementation can handle undefined values in phi nodes.

Implementation

One problem I faced and solved was keeping track of the original variable name of a phi node. My initial attempt at going into SSA involved literally inserting the phi instruction into the CFG and then performing the rename step. However, this didn't work because I would end up renaming the phi instruction destinations before I get to the part where I update their args based on their original name using a stack lookup; I've lost their original names at that point because of the rename.
I also wasn't able to keep a simple mapping from phi instruction to the original string because I had the "float" feature on for the brilrs crate, meaning that Instructions weren't hashable 😭.
I ended up keeping track of the phi instructions on the side using a custom struct (allowing me to remember the original names) and then inserting these instructions at the end (after rename), as opposed to what the pseudocode suggested.

On the out-of-SSA side, I implemented the pseudocode without many problems (except for Rust borrowing issues: the pseudocode wanted to mutate by deleting phi nodes and inserting "normal" instructions, so I just ended up with two passes, which is suboptimal but 🤷).
However, the pseudocode couldn't support undefined values in phi nodes, so I had to do my own thing.
First, I tried to simply not define variables whose values were undefined. This worked for some cases, but in others it led to some variable being used before it was defined.
The first counterexample I saw was a variable being assigned to itself. A bandaid fix I did was to extend the "undefined" logic to also when a variable would get assigned to itself, but that didn't fix all the problems.
The root problem here was that despite a variable never "technically" being used (given that its value is undefined), it would still be referenced within other phi instructions and thus lead to a phantom reference.
I solved this naively by creating an instruction that assigns a dummy value (i.e., 0 for int) to make the variable a real, usable value. Most of these artifacts ended up being cleaned up by my DCE, so it wasn't so bad after all. And this made everything work!

Also I had to introduce an empty entry block in the CFG to address loops. And to make things easier to implement I inserted jmp instructions even for blocks that simply fall through.

Testing

I used is_ssa.py in addition to inspection to make sure I was actually producing things in SSA form.

To test correctness, I ran both modes (into-only and into+outof) on the bril repo benchmark suite as well as custom tests for various edge cases I discovered. Since the interpreter was able to evaluate the program in SSA form, I was able to verify correctness just by comparing outputs.

I evaluated the instruction counts against the benchmark suite:

run	% inst count increase
baseline	0%
ssa-into	44.00%
ssa-full	46.81%
ssa-into-dce	21.63%
ssa-full-dce	23.83%

It seems like quite a lot of overhead is introduced when converting into SSA with the given algorithms, and a bit more is tacked on when converting back out.

Difficulties

The biggest challenge when implementing this task (especially with into SSA) was Rust's borrowing shenanigans. When following the pseudocode, there were many parts at which I was supposed to add/remove/modify an instruction but Rust was like "we don't do that here".
To fix this I had to do some borrow checker gymnastics with splitting vectors into separate mutable refs and putting things into RefCells to do dynamic borrow checking.

obhalerao · 2023-09-28T23:33:20Z

I worked with @SanjitBasker on this project.

Summary

Repo link

Implementation Details

In this project, we implemented a transformation in C++ for converting an arbitrary Bril program into SSA form with phi nodes enabled, and an extension to that transformation (also in C++) that removes phi nodes to take the program out of SSA.

The first method we implemented was a robust way to generate fresh variable names: we find a short string that is not a prefix of any variable name in a given function. This proved useful for variable renaming. We also used this functionality to generate variables with placeholder values at the top of each function, which we use when converting out of SSA form to handle variables of each type that are undefined along some paths. For bools, we used a placeholder value of false; for ints and floats, we used 0; and for each pointer type, we allocated memory of size 1, immediately freed it, and used that as the placeholder pointer. We made sure to only generate placeholder values for types that actually exist in the program.
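The fresh-prefix idea can be sketched as follows. This is a minimal Python version under the assumption that "short" means shortest over lowercase letters, which the post does not specify; `fresh_prefix` is a hypothetical name. Because the returned string is not a prefix of any existing name (and so is not equal to one either), every name that starts with it is guaranteed to be new.

```python
from itertools import count, product
from string import ascii_lowercase


def fresh_prefix(names):
    """Shortest lowercase string that is not a prefix of any name in `names`."""
    taken = {name[:i] for name in names for i in range(1, len(name) + 1)}
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            candidate = "".join(letters)
            if candidate not in taken:
                return candidate


print(fresh_prefix(["a", "b.1", "sum"]))  # c
```

Placeholder variables could then be named, for example, `p + "_undef_int"` and `p + "_undef_bool"` for the returned prefix `p` (a hypothetical naming scheme).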

After those preliminaries, we first added phi nodes for every variable to the blocks on the dominance frontier of wherever that variable was defined. We also made sure to include the function arguments as being defined in the entry block. Funnily enough, through this process we uncovered some bugs in our implementations of both the dominator tree and the dominance frontier, which were surprisingly subtle to spot on our first pass. We are now, though, fully confident that our implementation is correct. (As an aside, here's a useful fact we discovered: a block's immediate parent in the dominator tree is the block's strict dominator that itself has the most dominators.)
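The aside about immediate dominators works because a block's strict dominators form a chain, so the nearest one is the one with the largest dominator set. Here is a minimal Python sketch, assuming `dom` maps each reachable block to its set of dominators (including itself). It also includes a hypothetical cross-check aimed at the kind of dominator-tree bug mentioned here and earlier in the thread: in a correct tree, each block's ancestors (itself included) are exactly its dominators, so the tree can be tested against dominator sets that were already validated.

```python
def immediate_dominators(dom):
    """Map each block to its immediate dominator (None for the entry block)."""
    idom = {}
    for block, doms in dom.items():
        strict = doms - {block}
        # The closest strict dominator is the one with the most dominators itself.
        idom[block] = max(strict, key=lambda d: len(dom[d])) if strict else None
    return idom


def tree_ancestors(idom, block):
    """The block plus all of its ancestors in the dominator tree."""
    seen = set()
    while block is not None:
        if block in seen:  # a cycle means the tree is malformed
            raise ValueError(f"cycle in dominator tree at {block}")
        seen.add(block)
        block = idom[block]
    return seen


def check_dom_tree(dom, idom):
    """Blocks whose dominator-tree ancestors disagree with their dominator set."""
    return [b for b in dom if tree_ancestors(idom, b) != dom[b]]


# Diamond CFG: entry -> {left, right} -> join.
dom = {
    "entry": {"entry"},
    "left": {"entry", "left"},
    "right": {"entry", "right"},
    "join": {"entry", "join"},
}
idom = immediate_dominators(dom)                      # join's idom is entry
print(check_dom_tree(dom, idom))                      # []
print(check_dom_tree(dom, {**idom, "join": "left"}))  # ['join']
```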

Next, we implemented variable renaming. This is where our addition of global undefined variables for each type came in: wherever we encountered a variable that was undefined on a path into a block containing a phi node for that variable, we made that phi node read from the undefined placeholder variable for its type. This ended up solving a lot of the issues we were having down the line with variables showing up as undefined during the out-of-SSA conversion. Aside from that, we followed the pseudocode very closely, with one exception. When initializing the stacks of variable names, we put the placeholder undefined variables at the bottom of the stack (to deal with the case where the variable is not defined). Then, we added the function argument variable names to the stack (again to deal with that special case). Then, we proceeded as normal. To pop all the names we pushed onto the stack after renaming a given block, we kept track of how many times each variable's stack was pushed onto, and then popped that many elements.

Lastly, we implemented the out-of-SSA transformation. Though it took us a while to reason through how to add the relevant instructions efficiently, we eventually found a way. For each variable name in a given phi node, collect all the predecessor blocks for it listed in the phi node. Then, run a multi-source BFS from all those blocks to the block where the phi node was initially defined, adding renaming instructions for that variable in all blocks seen along this BFS. This ensures that no variable in a given node is renamed more than once, thus avoiding duplicate computation.

Testing

For testing, we used brench heavily to verify that both the to-ssa and out-of-ssa transformations worked as intended. All the brench tests pass (aside from the ones that time out or error on my machine anyway). In addition, we wrote a simple bash script that checks that every output of our program is indeed in SSA form by using the is_ssa.py script provided to us.

Results detailing the slowdowns of our SSA and roundtrip transformations (which we labeled as into_ssa and out_of_ssa respectively) can be seen below.

Clearly, there is significant overhead when converting to SSA and even more when performing the roundtrip pass. However, we did not implement any additional optimizations in our SSA generator. We believe that implementing certain optimizations, such as copy propagation and dead code elimination, has the potential to greatly decrease this overhead.

Difficulties

The primary challenge for this project was debugging. As others have mentioned before, it is less clear where exactly an error is in a program turned into SSA form than with other optimizations, since nearly every variable in the program is modified. In addition, coming across bugs in our implementations of the dominator tree and the dominance frontier was unfortunate, but in hindsight not unexpected, since we did not perform adequate testing on those particular utilities in the previous assignment. We also ran into difficulties figuring out what the exact configuration of the variable stacks should be in the phi node renaming process, though we were eventually able to do so successfully.

ryanwmao · 2023-09-29T01:41:06Z

@xalbt and I worked together on lesson 6.

Summary

core files
core/ssa and core/leave_ssa

Implementation

We used C++ for this lesson's tasks, using the infrastructure we built for the previous tasks.
We implemented a pass to convert to SSA form and another pass to leave SSA
Our phi nodes are incorporated as part of the data within each basic block, i.e. each basic block has a list of phi variables
And we also did an extra step where we remove dead phi nodes
Our implementation is O(n^2), where n is the number of basic blocks. Ideally, we could see this sped up to something like O(n log n).
We compute only the live phis by propagating livein information
- when processing a bb that is livein, we look for its nearest dominating def; if it is a phi, it is live, and if we haven't already marked it live, we process its predecessors and mark them as livein
Overall, there were many different parts to this task and many different details to pay attention to. We found the implementation for this task to be pretty intensive.
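The pass above finds live phis by propagating live-in information. A simpler technique aimed at the same result, swapped in here for illustration rather than taken from the post, marks a phi live only if its result is read, directly or through other live phis, by some non-phi instruction. A cycle of phis that only feed each other (like `lcm.1` and `lcm.3` in the unoptimized orders output) is then dropped. Minimal Python sketch over Bril JSON instruction dicts grouped by label; the example is a fragment whose entry block is omitted.

```python
def prune_dead_phis(blocks):
    """Delete phis whose results never (transitively) reach a non-phi use.

    blocks maps label -> list of Bril instruction dicts; modified in place.
    """
    phis = {i["dest"]: i for instrs in blocks.values()
            for i in instrs if i.get("op") == "phi"}
    # Seed: phi results read by ordinary instructions are live.
    worklist = [a for instrs in blocks.values() for i in instrs
                if i.get("op") != "phi" for a in i.get("args", []) if a in phis]
    live = set()
    while worklist:
        name = worklist.pop()
        if name in live:
            continue
        live.add(name)
        # A live phi keeps alive any phi results it reads.
        worklist.extend(a for a in phis[name]["args"] if a in phis)
    for label, instrs in blocks.items():
        blocks[label] = [i for i in instrs
                         if i.get("op") != "phi" or i["dest"] in live]
    return blocks


blocks = {
    "cond": [
        {"op": "phi", "dest": "u.2", "type": "int",
         "args": ["u.1", "u.3"], "labels": ["entry", "body"]},
        {"op": "phi", "dest": "lcm.1", "type": "int",
         "args": ["lcm.0", "lcm.1"], "labels": ["entry", "body"]},
        {"op": "lt", "dest": "t", "type": "bool", "args": ["u.2", "n"]},
        {"op": "br", "args": ["t"], "labels": ["body", "done"]},
    ],
    "body": [
        {"op": "add", "dest": "u.3", "type": "int", "args": ["u.2", "one"]},
        {"op": "jmp", "labels": ["cond"]},
    ],
}
prune_dead_phis(blocks)
print([i["dest"] for i in blocks["cond"] if i.get("op") == "phi"])  # ['u.2']
```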

Difficulties

We initially had a really aggressive phi node insertion scheme, where we inserted phi nodes for every live variable on crossing the dominance frontier. On further analysis, we realized this wasn't necessary and we reduced the number of phi nodes significantly.
We also had some trouble with leaving ssa form. Our logic for inserting instructions to replace phi nodes was initially pretty flawed and we had to run through several different implementations before arriving at one that passed our testing.

Testing

The extra functions and tools provided were very helpful in debugging. We made extensive use of both interpreting our SSA form program in addition to the is_ssa.py script to actually verify that we were correctly in SSA form. We ran our compiler on the core test cases in both SSA form and in the form after leaving SSA and continued debugging until we passed the benchmarks in both forms.

orders benchmark to/from ssa with no optimization:

@orders(u.1: int, n.1: int, use_lcm.1: bool) {
._bb.0:
.for.cond:
  u.2: int = phi u.1 u.3 ._bb.0 .for.body.print;
  is_term.1: bool = phi is_term is_term.2 ._bb.0 .for.body.print;
  lcm.1: int = phi lcm lcm.3 ._bb.0 .for.body.print;
  ordu.1: int = phi ordu ordu.4 ._bb.0 .for.body.print;
  gcdun.1: int = phi gcdun gcdun.3 ._bb.0 .for.body.print;
  one.1: int = phi one one.2 ._bb.0 .for.body.print;
  is_term.2: bool = eq u.2 n.1;
  br is_term.2 .for.finish .for.body;
.for.body:
  br use_lcm.1 .lcm .gcd;
.lcm:
  lcm.2: int = call @lcm u.2 n.1;
  ordu.2: int = div lcm.2 u.2;
  jmp .for.body.print;
.gcd:
  gcdun.2: int = call @gcd u.2 n.1;
  ordu.3: int = div n.1 gcdun.2;
.for.body.print:
  lcm.3: int = phi lcm.2 lcm.1 .lcm .gcd;
  ordu.4: int = phi ordu.2 ordu.3 .lcm .gcd;
  gcdun.3: int = phi gcdun.1 gcdun.2 .lcm .gcd;
  print u.2 ordu.4;
  one.2: int = const 1;
  u.3: int = add u.2 one.2;
  jmp .for.cond;
.for.finish:
  ret;
}

The orders benchmark with the optimization, which removes the phi nodes whose definitions never reach a use:

@orders(u.1: int, n.1: int, use_lcm.1: bool) {
._bb.0:
.for.cond:
  u.2: int = phi u.1 u.3 ._bb.0 .for.body.print;
  is_term.1: bool = eq u.2 n.1;
  br is_term.1 .for.finish .for.body;
.for.body:
  br use_lcm.1 .lcm .gcd;
.lcm:
  lcm.1: int = call @lcm u.2 n.1;
  ordu.1: int = div lcm.1 u.2;
  jmp .for.body.print;
.gcd:
  gcdun.1: int = call @gcd u.2 n.1;
  ordu.2: int = div n.1 gcdun.1;
.for.body.print:
  ordu.3: int = phi ordu.1 ordu.2 .lcm .gcd;
  print u.2 ordu.3;
  one.1: int = const 1;
  u.3: int = add u.2 one.1;
  jmp .for.cond;
.for.finish:
  ret;
}

chart (sorry very ugly)

MelindaFang-code · 2023-09-29T03:44:53Z

summary

SSA implementation branch
For this task I mainly implemented the to_SSA function and from_SSA function.

To implement to_SSA, I first iterate through all the code blocks and insert phi nodes when needed. Then I use a stack to rename variables using the newly given names.

from_SSA has simpler logic: I insert code into the immediate predecessors of each phi-containing block, and then remove all the phi instructions.

hardest part of implementation

Most of the logic is covered in lecture, and pseudocode is given. However, it is still quite hard to implement: logically, the first step is to insert phi nodes, but it is impossible to gather all the information needed for a new phi node in one pass over the blocks. So in my design I first populate a map from block name to a phi-node map {var: [[label list], [arg list]]}, and then fill in the rest of the phi-node information while renaming variables.

testing

For testing, I first ran the translated (SSA) code through the is_ssa.py script to make sure the output of the “to SSA” pass is actually in SSA form.
To check that programs do the same thing when converted to SSA form and back again, I ran the Bril programs before and after the SSA transformations and checked that the outputs match:
bril % bril2json < ../examples/test/to_ssa/if.bril | python3 SSA.py | brilirs true

20ashah · 2023-09-29T03:50:01Z

Summary

@JohnDRubio and I worked on writing python scripts to convert a bril program to SSA form, convert from SSA form back to a runnable program by removing phi nodes, and lastly calculating the overhead of these operations by comparing the dynamic instruction count of the original program, and after converting to and from SSA form.

Implementation

The first function that we implemented was one to add Phi nodes where necessary in our blocks, but with just generic variable names (ex. a = phi a a .label1 .label2). As shown in the algorithm in class, we used the dominance frontier to calculate where to place the phi nodes. Once we used this to figure out where to place a phi node, the next thing was to figure out which of the predecessors for that block were actually reachable from that specific variable definition. To accomplish this, we used our helper function that we used in the previous Lesson tasks called getPaths. This function took a specific start node and destination node and returned a list of paths from that start node to the destination node (Again, the source for where we got this algorithm is here). Then, for the particular block in which we want to add the phi node for the variable x, we loop through all of the paths from the entry block to this block, and for each path which never defines a variable x, we do not count that predecessor as a valid argument in our phi instruction.

Once we implemented this, we had all the phi nodes in the correct spots in every block, and the next step was implementing the rename function to actually generate unique names for each definition in our CFG. To do this, we implemented the recursive rename algorithm described in class, starting by renaming the entry block. Most of the algorithm was fairly straightforward based on the pseudo-code, but there were a few areas where we added logic that wasn't explicitly mentioned and that is worth noting:

In the first for loop, which goes through the instructions, replaces each argument, and pushes a fresh destination name onto the stack, we specifically did not rename the arguments of phi instructions, since those are taken care of in the second for loop over the block's successors. Instead, we only updated the destinations of phi nodes in this first loop, because the second loop renames only the arguments of the successors' phi nodes (not their destinations).
The second thing was related to how we updated the arguments of the phi nodes in the second for loop over the block's successors. The pseudo-code simply says that for each variable v in the phi node, make it read from the stack of v. When we implemented it this way, we noticed (as expected) that all of a phi node's arguments ended up with the same name, which doesn't make sense. Instead, when renaming a phi node, we only rename the argument that corresponds to the label we just came from. Since the phi nodes are visited multiple times in this algorithm (once per incoming label) and we update the argument for each label independently, the whole phi instruction is renamed correctly by the end.
Lastly, another thing we noticed was that when recursively calling rename on the blocks that were immediately dominated by the current block, the order in which we process these blocks changed the orderings, but not in a way that changed the logic of the algorithm, which was cool for us to see.
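The two adjustments above can be combined into one stack-based renaming pass. This is a minimal Python sketch, not the post's code, assuming: phi placeholders are already inserted with every argument slot holding the original variable name, unreachable blocks have been removed, names of the form `x.N` are not already in use, and `succs`/`children` give CFG successors and dominator-tree children. `__undefined` is a hypothetical marker for edges along which the variable has no definition.

```python
from collections import defaultdict

UNDEF = "__undefined"  # hypothetical marker: no definition along this edge


def rename(blocks, succs, children, entry, params):
    """Rename blocks (label -> Bril instruction dicts) into SSA in place."""
    stack = defaultdict(list)   # original name -> SSA names in scope
    counter = defaultdict(int)  # original name -> next unused suffix
    for p in params:
        stack[p].append(p)      # parameters keep their original names

    def fresh(var):
        name = f"{var}.{counter[var]}"
        counter[var] += 1
        stack[var].append(name)
        return name

    def visit(label):
        pushed = []
        for instr in blocks[label]:
            # Phi arguments are filled in by predecessors (below), not here.
            if instr.get("op") != "phi" and "args" in instr:
                instr["args"] = [stack[a][-1] if stack[a] else a for a in instr["args"]]
            # Every destination, phi destinations included, gets a fresh name.
            if "dest" in instr:
                pushed.append(instr["dest"])
                instr["dest"] = fresh(instr["dest"])
        for succ in succs[label]:
            for instr in blocks[succ]:
                if instr.get("op") == "phi":
                    # Fill only the slot for the edge we came along; it still
                    # holds the original variable name.
                    i = instr["labels"].index(label)
                    var = instr["args"][i]
                    instr["args"][i] = stack[var][-1] if stack[var] else UNDEF
        for child in children[label]:
            visit(child)
        for var in pushed:
            stack[var].pop()

    visit(entry)
    return blocks


blocks = {
    "entry": [{"op": "const", "dest": "x", "type": "int", "value": 0},
              {"op": "br", "args": ["c"], "labels": ["left", "join"]}],
    "left": [{"op": "const", "dest": "x", "type": "int", "value": 1},
             {"op": "jmp", "labels": ["join"]}],
    "join": [{"op": "phi", "dest": "x", "type": "int",
              "args": ["x", "x"], "labels": ["left", "entry"]},
             {"op": "print", "args": ["x"]}],
}
succs = {"entry": ["left", "join"], "left": ["join"], "join": []}
children = {"entry": ["left", "join"], "left": [], "join": []}
rename(blocks, succs, children, "entry", ["c"])
```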

At this point, we had a function that converts a Bril program into SSA form. The next function we implemented converts a function already in SSA form back into one that can be executed, by removing the phi nodes and adding extra blocks. For this, we simply followed the algorithm presented in class, which was much simpler than converting to SSA. We looped through every phi node in our program. For each one, we created two new blocks before the block containing the phi node. Each new block contained an id instruction copying the corresponding phi argument into the phi's destination, followed by a jump to the block containing the phi instruction. Next, we looked at the corresponding immediate predecessor of the block with the phi instruction and changed its control-flow instructions to target the newly created block instead of the current block. (We didn't have to worry about the predecessor falling through to the current block, because it now falls through to the newly created block.) Finally, we removed the phi instruction.
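A variant of this block-insertion approach creates one new block per incoming edge, shared by all phis in the block, rather than blocks per phi node, so each copy runs only on its own edge even when the predecessor ends in a `br` to several places. Minimal Python sketch, not the post's code, assuming every block ends in an explicit `jmp` or `br` (no fall-through) and a hypothetical `pred.to.block` labelling scheme that does not clash with existing labels.

```python
def split_edges_out_of_ssa(blocks):
    """Remove phis by routing each incoming edge through a new copy block.

    Assumes every block ends in an explicit jmp, br, or ret (no fall-through),
    so the new blocks can simply be appended. Returns a new dict; the input
    blocks and instruction dicts are left untouched.
    """
    out = {label: list(instrs) for label, instrs in blocks.items()}
    for label, instrs in blocks.items():
        phis = [i for i in instrs if i.get("op") == "phi"]
        incoming = sorted({pred for phi in phis for pred in phi["labels"]})
        for pred in incoming:
            edge = f"{pred}.to.{label}"  # hypothetical label scheme; assumed unused
            copies = [{"op": "id", "dest": p["dest"], "type": p["type"],
                       "args": [p["args"][p["labels"].index(pred)]]}
                      for p in phis if pred in p["labels"]]
            out[edge] = copies + [{"op": "jmp", "labels": [label]}]
            # Point the predecessor's terminator at the new block instead.
            term = out[pred][-1]
            out[pred][-1] = {**term, "labels": [edge if l == label else l
                                                for l in term["labels"]]}
        out[label] = [i for i in out[label] if i.get("op") != "phi"]
    return out


blocks = {
    "entry": [{"op": "const", "dest": "x.0", "type": "int", "value": 0},
              {"op": "br", "args": ["c"], "labels": ["left", "join"]}],
    "left": [{"op": "const", "dest": "x.1", "type": "int", "value": 1},
             {"op": "jmp", "labels": ["join"]}],
    "join": [{"op": "phi", "dest": "x.2", "type": "int",
              "args": ["x.1", "x.0"], "labels": ["left", "entry"]},
             {"op": "print", "args": ["x.2"]}],
}
out = split_edges_out_of_ssa(blocks)
```

Like the basic algorithm, this sketch does not handle phis whose arguments read another phi's destination in the same block (the swap problem), which unoptimized SSA from the lecture construction does not produce.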

Testing

The first simple tests that we ran were running our programs on the simple examples shown in the video for lesson 6. Once we verified that our code worked with these examples, we started doing more complicated tests involving the benchmarks. A main part of our testing strategy was using the provided is_ssa.py file to make sure that our programs after running the toSSA function on them were actually in SSA form.

Overhead

For looking at the overhead of our algorithm, we first looked at the dynamic instruction count of the original program for each benchmark in the core directory, and then after running toSSA and fromSSA, looked at the dynamic instruction count again to compare them. We reported these results in a spreadsheet. This contains the number of dynamic instructions for each test in the core benchmarks before and after our transformation along with the percent increase.

Difficulties

The area that caused us the biggest difficulty was the rename function. With the other functions (fromSSA, inserting phi nodes), the pseudo-code was pretty straightforward. The fact that the rename function is recursive made it somewhat difficult to reason about in itself, in addition to the pseudo-code being a little less complete and there being many more edge cases to think about. Looking back, I think these difficulties were good, because resolving them helped us gain a better understanding of the algorithm.

We are still ironing out a few bugs involving function arguments. Besides this, our toSSA function seems to work correctly right now, since is_ssa.py accepts its output for all the benchmarks. However, our fromSSA is producing different results from the original program, which we are trying to work through.

rcplane · 2023-09-29T03:58:41Z

I worked with @zachary-kent on this project and entered the wonderful world of Haskell.

Summary

repo link

SSA transformation was a complex but rewarding program transformation to implement. We successfully transformed many benchmarks and recorded the resulting overheads.

Implementation

SSA.hs

A new pass option for the SSA program transformation was added to the existing Haskell toolchain.

Various program representation and manipulation routines required modification to support phi instructions and the SSA transformation, which works by recursively traversing the dominator tree. Building on the existing control flow graph and dominator tree computations, inserted phi nodes and renamed instructions are accumulated in a modified CFG, which is then converted back into a transformed Bril program.
Phi node insertion was fairly straightforward, following from the pseudocode. For every block B in the dominance frontier of some block defining variable x, we insert a phi node x <- phi x .l_1 ... x .l_n where every l_i is the label of a predecessor of B.
We also declare that the artificial entry block defines all of the arguments to a function; that is, for every argument x, Def[x] is initialized to be { entry }.
One other tricky aspect of the pseudocode is the fact that you're not really "iterating" over the blocks defining a variable, as these are being updated during the loop itself. We implemented this more akin to a worklist.
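The worklist formulation can be sketched in Python as follows. This is only an illustration, not the authors' Haskell. It assumes `defs` maps each variable to the blocks that assign it (with function arguments seeded to the entry block, as described above) and `df` maps each block to its dominance frontier; `insert_phi_sites` is a hypothetical name. The key point is that a newly placed phi is itself a definition, so its block is pushed back onto the worklist instead of mutating the set being iterated.

```python
from typing import Dict, List, Set


def insert_phi_sites(defs: Dict[str, Set[str]],
                     df: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, for each variable, the set of blocks that need a phi for it."""
    phis: Dict[str, Set[str]] = {}
    for var, def_blocks in defs.items():
        placed: Set[str] = set()
        seen: Set[str] = set(def_blocks)
        work: List[str] = list(def_blocks)
        while work:
            b = work.pop()
            for d in df.get(b, set()):
                if d in placed:
                    continue
                placed.add(d)
                # The new phi defines var in d, so d's frontier needs visiting too.
                if d not in seen:
                    seen.add(d)
                    work.append(d)
        phis[var] = placed
    return phis
```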
Implementing renaming in a functional language was actually significantly easier to get right, as it eliminated the need for stacks of variable names. When renaming a block B, the stack of variable names is only needed so that the renames occurring in one child of B in the dominator tree don't affect the renames of another. With immutable maps, this is simply not a concern, so we could implement renaming with a simple map from old names to new names instead of a map from old names to stacks of new names.
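The same idea can be approximated in Python, where copying a dict at the start of each recursive call stands in for Haskell's immutable maps. This is a hedged sketch, not the code from this post. The representation is an assumption: inserted phis carry the original variable in a hypothetical `"var"` key and start with empty `"args"`/`"labels"`, and the successor and dominator-tree maps are given. When a predecessor has no mapping for the variable, it simply adds no argument, which matches the undefined-variable handling described next.

```python
from typing import Dict, List


def rename(block: str,
           names: Dict[str, str],
           blocks: Dict[str, List[dict]],
           succs: Dict[str, List[str]],
           dom_children: Dict[str, List[str]],
           counter: Dict[str, int]) -> None:
    """Rename variables in `block` and its dominator subtree, in place.

    `names` maps each original variable to its current SSA name.
    """
    names = dict(names)  # private copy: updates here never leak to siblings
    for instr in blocks[block]:
        if "args" in instr and instr.get("op") != "phi":
            instr["args"] = [names.get(a, a) for a in instr["args"]]
        if "dest" in instr:
            var = instr.get("var", instr["dest"])  # phis remember their variable
            n = counter.get(var, 0)  # shared counter keeps fresh names unique
            counter[var] = n + 1
            names[var] = f"{var}.{n}"
            instr["dest"] = names[var]
    for s in succs.get(block, []):
        for instr in blocks[s]:
            # No mapping means var is undefined on this edge: add no argument.
            if instr.get("op") == "phi" and instr["var"] in names:
                instr["args"].append(names[instr["var"]])
                instr["labels"].append(block)
    for child in dom_children.get(block, []):
        rename(child, names, blocks, succs, dom_children, counter)
```

Calling it as `rename(entry, {a: a for a in args}, ...)` also covers the point that all parameters must be in the initial mapping. Only the fresh-name counter is shared mutable state.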
We also had to deal with cases where some paths into a block define a variable, while others do not. Consider the step of renaming block B with label l_j, where you rewrite the arguments to some phi instruction x <- phi x .l_1 ... x .l_j ... x .l_n in a successor of B. If no definition of x reaches the end of B, then there will be no entry for x in the rename mapping. In this case, we rewrite the phi instruction to x <- phi x .l_1 ... x .l_n, deleting the argument for l_j. This is also supported by the implementation of brili: if brili executes this instruction after coming from l_j, it leaves x undefined, as desired.
One other notable point is that all parameters must be defined in the original rename mapping.
Out of SSA was trickier than we expected. We still followed the general algorithm defined in class, but had to insert default initializations for undefined variables. Specifically, if we are processing node x <- phi x .l_1 ... x .l_n at basic block B with a predecessor not referenced in the phi instruction, we insert a default initialization x <- 0 in the predecessor.
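A small Python sketch of that default-initialization step, under stated assumptions: blocks are a dict from label to Bril JSON-style instruction lists, `preds` gives each block's CFG predecessors, every predecessor ends in a terminator, and phi destinations are ints (matching the `x <- 0` above). The name `add_default_inits` is hypothetical, and this would run before the usual `id` copies are placed.

```python
from typing import Dict, List


def add_default_inits(blocks: Dict[str, List[dict]],
                      preds: Dict[str, List[str]]) -> None:
    """Give each phi's destination a default value in every CFG predecessor
    that the phi does not list, i.e. where the variable is undefined."""
    for name, instrs in blocks.items():
        phis = [i for i in instrs if i.get("op") == "phi"]
        for phi in phis:
            for p in preds.get(name, []):
                if p in phi["labels"]:
                    continue  # a normal `id` copy goes in this predecessor instead
                init = {"op": "const", "dest": phi["dest"],
                        "type": phi["type"], "value": 0}  # assumes an int phi
                body = blocks[p]
                body.insert(max(len(body) - 1, 0), init)  # just before the terminator
```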
We cleaned up the code produced from out of SSA using the DCE pass Zak developed for the L4 tasks, which had a very substantial impact.

VSCode with the Haskell plugin was our IDE of choice. Opening the bril-hs directory itself is important.
Generative AI usage: in the latest VSCode Insiders, GitHub Copilot release v1.119.448 was used for code suggestions, but on the whole it failed to generate useful Haskell code. It did provide some good explanations of what various functions and symbols do in context. Also, ChatGPT4 (September 25, 2023 version) was used to generate a short Python script that parses the CSV from brench ssa.toml and generates the markdown for the overhead table below.
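For readers who want to reproduce a table like the one below, here is a hedged sketch of such a script (not the ChatGPT-generated one). It assumes brench's CSV output with `benchmark,run,result` columns and that the two runs in the TOML are named `baseline` and `ssa`; those run names are assumptions, so adjust them to your config. Non-numeric results such as `incorrect` or `missing` are skipped.

```python
import csv
import io
from typing import Dict


def overheads(csv_text: str, base: str = "baseline",
              new: str = "ssa") -> Dict[str, float]:
    """Percent change in dynamic instruction count per benchmark."""
    counts: Dict[str, Dict[str, int]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["result"].isdigit():  # skip failed runs
            counts.setdefault(row["benchmark"], {})[row["run"]] = int(row["result"])
    return {
        bench: 100.0 * (runs[new] - runs[base]) / runs[base]
        for bench, runs in counts.items()
        if base in runs and new in runs and runs[base] > 0
    }


def to_markdown(pcts: Dict[str, float]) -> str:
    rows = ["| Benchmark | % Overhead |", "| --- | --- |"]
    rows += [f"| {b} | {p:.2f}% |" for b, p in pcts.items()]
    return "\n".join(rows)
```

Usage would be something like `brench ssa.toml > results.csv`, then `print(to_markdown(overheads(open("results.csv").read())))`.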

Testing

ssa test running bril programs
is_ssa.toml

cd bril-hs
brench test/ssa.toml
brench test/is_ssa.toml
brench test/no_dce_ssa.toml

As usual, we tested our transformations exhaustively on all benchmarks using brench.
After implementing into SSA conversion, we used brench to transform every benchmark into SSA and then interpret it, ensuring the output agreed with the original program.
We started with handcrafted examples from the lecture notes, targeting notable special cases first, such as branching control flow, assignment in only one path of a diamond CFG, and mutation of function parameters.
We also used brench to run is_ssa.py on every SSA'ified benchmark, ensuring that they actually are SSA. With is_ssa.toml, we checked that SSA-transformed programs produce the same output as the original programs and recorded the dynamic instruction count for the overhead calculation.
After implementing out of SSA conversion, we then implemented a "round trip" transformation into and out of SSA and ensured running it produced the same output as the original program.
We continued our regular regression testing using stack test in the bril-hs directory.
We were able to achieve 100% correctness over all Bril benchmarks for both into and out of SSA transformations!

Results

SSA with DCE percentage overhead

Bril Benchmark Name	% Overhead	Percentage Overhead Bar Chart
quadratic	5.35%	0+++++
primes-between	13.98%	0+++++++++++++
birthday	9.30%	0+++++++++
orders	19.47%	0+++++++++++++++++++
sum-check	39.90%	0+++++++++++++++++++++++++++++++++++++++
palindrome	39.26%	0+++++++++++++++++++++++++++++++++++++++
totient	36.76%	0++++++++++++++++++++++++++++++++++++
relative-primes	7.38%	0+++++++
hanoi	0.00%	0
is-decreasing	11.02%	0+++++++++++
check-primes	4.31%	0++++
sum-sq-diff	13.23%	0+++++++++++++
fitsinside	0.00%	0
fact	-0.44%	0
loopfact	14.66%	0++++++++++++++
recfact	-0.96%	0
factors	22.22%	0++++++++++++++++++++++
perfect	30.60%	0++++++++++++++++++++++++++++++
bitshift	2.40%	0++
digital-root	13.36%	0+++++++++++++
up-arrow	36.51%	0++++++++++++++++++++++++++++++++++++
sum-divisors	37.74%	0+++++++++++++++++++++++++++++++++++++
ackermann	0.00%	0
pythagorean_triple	12.80%	0++++++++++++
dot-product	13.64%	0+++++++++++++
euclid	11.01%	0+++++++++++
binary-fmt	0.00%	0
lcm	5.59%	0+++++
gcd	32.61%	0++++++++++++++++++++++++++++++++
catalan	14.93%	0++++++++++++++
armstrong	21.80%	0+++++++++++++++++++++
pascals-row	3.42%	0+++
collatz	10.06%	0++++++++++
sum-bits	19.18%	0+++++++++++++++++++
rectangles-area-difference	14.29%	0++++++++++++++
mod_inv	10.04%	0++++++++++
reverse	26.09%	0++++++++++++++++++++++++++
fizz-buzz	0.03%	0
bitwise-ops	26.92%	0++++++++++++++++++++++++++
cholesky	12.23%	0++++++++++++
mat-inv	4.02%	0++++
function_call	bas&ssa problem	no bar
ray-sphere-intersection	0.00%	0
conjugate-gradient	15.86%	0+++++++++++++++
n_root	33.02%	0+++++++++++++++++++++++++++++++++
newton	11.98%	0+++++++++++
euler	1.94%	0+
riemann	18.12%	0++++++++++++++++++
mandelbrot	1.02%	0+
norm	26.53%	0++++++++++++++++++++++++++
cordic	19.92%	0+++++++++++++++++++
pow	5.56%	0+++++
sqrt	11.49%	0+++++++++++
quickselect	19.71%	0+++++++++++++++++++
sieve	14.04%	0++++++++++++++
bubblesort	11.86%	0+++++++++++
primitive-root	4.92%	0++++
adler32	47.53%	0+++++++++++++++++++++++++++++++++++++++++++++++
adj2csr	16.12%	0++++++++++++++++
csrmv	30.01%	0++++++++++++++++++++++++++++++
major-elm	19.15%	0+++++++++++++++++++
max-subarray	17.10%	0+++++++++++++++++
mat-mul	13.70%	0+++++++++++++
fib	22.31%	0++++++++++++++++++++++
vsmul	4.76%	0++++
quicksort-hoare	16.78%	0++++++++++++++++
quicksort	14.77%	0++++++++++++++
two-sum	-8.16%	--------0
eight-queens	5.50%	0+++++
binary-search	-3.85%	---0

Difficulties

Reaching an intermediate point where we could test the SSA translation was difficult, notably because we now had to account for phi instructions in our program representation, parsing, and JSON conversion. This was mildly exacerbated by the difficulty of debugging in Haskell, since lazy evaluation prunes computations that are never demanded. We mitigated this with partial implementations that could return after incomplete or no-op transformations, and by piping into bril2txt for printing before attempting more ambitious brili execution.
As noted by other groups, several kinds of program structure place non-obvious demands on state tracking and phi node manipulation. We approached these by manually reasoning about CFG possibilities in whiteboard design sessions, and by gradually broadening our testing from simple to notable structures and later to more complex programs.

CFG blocks with self loops were debugged with dot-product.bril.
digital-root has a larger loop back edge; we varied the test to use a fixed input, which reduced the loop to a smaller, readable size.
After adjusting our code to check for a block in its own dominance frontier, these difficulties were resolved.
Undefined variables were quite annoying to deal with. We thought we had dealt with them completely during the into SSA transformation, but this was not the case. Eventually, we were able to confirm that this was indeed the issue by feeding the failing benchmarks into the Bril web playground and examining the SSA produced for them -- every single one had special case handling for undefined variables.

AliceSzzze · 2023-09-29T04:00:05Z

SSA folder in repo

Summary

I implemented a basic version of the “into SSA” and “out of SSA” transformations on bril functions in this week's assignment.

Difficulties

There were many difficulties and edge cases in this assignment. I started off by closely following the pseudocode, but realized that there were lots of small details that I needed to get right for the transformations to work.
I realized that I had been computing dominance frontiers wrong! I totally missed the word "strictly" in the definition "A’s dominance frontier contains B iff A does not strictly dominate B, but A does dominate some predecessor of B." Thanks to Collin and Kei for discussing this on Zulip.
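The definition translates almost directly into code. A minimal Python sketch, assuming `dom[b]` is the set of blocks that dominate `b` (including `b` itself) and `preds` lists each block's CFG predecessors; the function name is hypothetical. The `a != b` check is the "strictly" part: it is what lets a loop header end up in its own dominance frontier.

```python
from typing import Dict, List, Set


def dominance_frontier(dom: Dict[str, Set[str]],
                       preds: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    df: Dict[str, Set[str]] = {b: set() for b in dom}
    for b in dom:
        for p in preds.get(b, []):
            for a in dom[p]:  # a dominates some predecessor of b ...
                strictly_dominates_b = a in dom[b] and a != b
                if not strictly_dominates_b:  # ... but does not strictly dominate b
                    df[a].add(b)
    return df
```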

Testing

For the from-SSA and to-SSA transformations, I used brench to run the different forms of the program on all the benchmarks in the Bril repository.
I also used the is_ssa.py script to verify that the code is in SSA form.

Overhead

I'm still trying to work through bugs in some of the cases, but here are overhead stats for the benchmarks that ran correctly.

benchmark	SSA (dynamic instructions)	baseline (dynamic instructions)	SSA/baseline
ackermann	1464231.0	1464231.0	1.000000
adler32	11862.0	6851.0	1.731426
armstrong	189.0	133.0	1.421053
binary-fmt	127.0	100.0	1.270000
binary-search	78.0	78.0	1.000000
birthday	921.0	484.0	1.902893
bitshift	187.0	167.0	1.119760
bubblesort	330.0	253.0	1.304348
catalan	1003827.0	659378.0	1.522385
check-primes	15916.0	8468.0	1.879547
collatz	254.0	169.0	1.502959
conjugate-gradient	3811.0	1999.0	1.906453
csrmv	182358.0	121202.0	1.504579
digital-root	340.0	247.0	1.376518
dot-product	128.0	88.0	1.454545
euclid	878.0	563.0	1.559503
euler	2174.0	1908.0	1.139413
fact	229.0	229.0	1.000000
factors	128.0	72.0	1.777778
fib	202.0	121.0	1.669421
fitsinside	10.0	10.0	1.000000
fizz-buzz	10261.0	3652.0	2.809693
gcd	76.0	46.0	1.652174
hanoi	129.0	99.0	1.303030
is-decreasing	190.0	127.0	1.496063
lcm	3359.0	2326.0	1.444110
loopfact	215.0	116.0	1.853448
mat-inv	1264.0	1044.0	1.210728
mat-mul	3359295.0	1990407.0	1.687743
max-subarray	270.0	193.0	1.398964
mod_inv	1032.0	558.0	1.849462
n_root	1201.0	733.0	1.638472
newton	269.0	217.0	1.239631
norm	829.0	505.0	1.641584
palindrome	498.0	298.0	1.671141
pascals-row	278.0	146.0	1.904110
pow	62.0	36.0	1.722222
primes-between	877571.0	574100.0	1.528603
primitive-root	11824.0	11029.0	1.072083
pythagorean_triple	108148.0	61518.0	1.757990
quadratic	1401.0	785.0	1.784713
quicksort	408.0	264.0	1.545455
quicksort-hoare	38707.0	27333.0	1.416127
ray-sphere-intersection	142.0	142.0	1.000000
recfact	104.0	104.0	1.000000
rectangles-area-difference	16.0	14.0	1.142857
relative-primes	2668.0	1923.0	1.387415
reverse	82.0	46.0	1.782609
riemann	469.0	298.0	1.573826
sieve	5402.0	3482.0	1.551407
sum-bits	101.0	73.0	1.383562
sum-check	8021.0	5018.0	1.598446
sum-divisors	258.0	159.0	1.622642
sum-sq-diff	5664.0	3038.0	1.864384
two-sum	148.0	98.0	1.510204
vsmul	102426.0	86036.0	1.190502

NgaiJustin · 2023-09-29T04:01:01Z

Summary

Repo (link)
SSA Implementation (link)

basic-phi	basic-ssa

loop-phi	loop-ssa

Details

SSA

I began my implementation by strictly adhering to the pseudocode provided in the lecture notes, but I ran into so many issues along the way. I initially extended my Node class to contain an optional PhiNode, which held a mapping from pre_ssa_vars to (node_source, new_var) pairs (stored as a dictionary). With this implementation, I extended the suite of visualizations I had previously worked on to also display the PhiNode information (as seen in the post-phi-injection images above).
However, there was one big issue with this Node-based implementation. When reconstructing the Bril output and injecting phi instructions, I needed to recover the label of the basic block that assigned to the variable, but my node-level granularity meant that each block contained only one instruction (with labels being Nodes themselves). Updating the phi nodes of the successor was also challenging: since I chose to split up the CFG at a per-instruction level, a node that does not assign to a variable could have an “immediate successor with a phi_node” that reads from this variable. Instead of overhauling everything to operate at the block level, I used an additional set to keep track of the path (set of node IDs) in the recursive stack of renames, which allowed me to pinpoint which path the rename was coming from and update the phi_node accordingly.
After integrating this funky traversal logic, I got it working for all acyclic cases, but in the presence of specific loops (e.g., when a variable defined outside the loop is overwritten inside the loop, followed by a jump out), I observed missing sources in the phi nodes at the entry of the loop (specifically the sources that dominate the loop entry, i.e., those defined before even entering the loop).
The final straw was when I realized that translating back to Bril with the addition of phi instructions would be non-trivial, since the instruction takes labels and not instruction locations. I considered a few methods (adding a label in front of each instruction, opportunistically injecting a label right before where the phi node would want to jump to, adding a post-processing pass to replace all instruction locations with the closest label, etc.), but all of them would either add far too much overhead or be dicey to implement.
I was pretty deep into the implementation by this point, but I eventually decided to switch to a block-based (multiple instructions per block) CFG, which had the awesome invariant of every block being tagged with a label (when a block doesn't start with a label, I simply assign it a unique one). This not only made the phi-node insertion pass much, much simpler, but also made the translation back to Bril much easier, since the (node_source, new_var) pairs that I store become (label, new_var) pairs that I can add directly to the phi instruction. However, this also meant redoing all the CFG construction and dominator analysis, and updating my SSA logic to use the new CFG structure.
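A block-based CFG with the "every block has a label" invariant can be built with a short pass like the one below. It is a sketch under assumptions, not the code from this repo: instructions are Bril JSON dicts, labels are `{"label": ...}` entries, and the fresh names `b0`, `b1`, ... are a hypothetical scheme that assumes no user label collides with them. Label instructions are dropped from block bodies because the dict key carries the label, and dict order preserves the original layout so fallthrough still works when the blocks are printed back out.

```python
from typing import Dict, List, Optional

TERMINATORS = {"jmp", "br", "ret"}


def form_blocks(instrs: List[dict]) -> Dict[str, List[dict]]:
    """Split a Bril function body into labeled basic blocks, in program order."""
    blocks: Dict[str, List[dict]] = {}
    cur: List[dict] = []
    name: Optional[str] = None
    fresh = 0

    def finish() -> None:
        nonlocal cur, name, fresh
        if cur or name is not None:  # keep empty blocks that have a label
            if name is None:
                name = f"b{fresh}"  # hypothetical fresh label
                fresh += 1
            blocks[name] = cur
        cur, name = [], None

    for instr in instrs:
        if "label" in instr:
            finish()
            name = instr["label"]
        else:
            cur.append(instr)
            if instr.get("op") in TERMINATORS:
                finish()
    finish()
    return blocks
```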

Testing/Evaluation

Debugging the phi node and renaming bugs was very annoying. To make the process a bit smoother, I wrote some disjoint examples, such as lesson_tasks/l6/basic.bril and lesson_tasks/l6/repeat.bril (phi node injection without rename, rename without any phi nodes), and debugged exclusively on those before merging the sections together and testing on the larger suite of tests.
My dominating_frontier visualizations from the previous lesson tasks greatly helped with debugging.
I ran my ssa pass on all .bril files in core: bril2json < xyz.bril | python3 ssa.py -to | brili

Difficulties

There were many edge cases, and due to previous architecture choices (as mentioned above) I confused myself further. Notably, handling missing definitions was challenging, not because of the case itself but because the appearance of these cases led me to doubt the correctness of the other steps of the algorithm rather than handle the case directly.
Despite the warning, I started significantly later than I did for the previous assignments (a large oversight on my part).

yxd97 · 2023-09-29T04:07:46Z

Summary

Transform a program into SSA form to_ssa.py
Transform a program out of SSA form from_ssa.py
Both passes work on the tests from task4 (dataflow), task3 (local analysis), and programs in the bril core benchmark except two.
- For the two failed tests, the to_ssa pass outputs a program in SSA form, but the output program never finishes.
I did not use any AI tools

Implementation

The to_ssa pass follows the pseudocode discussed in lecture.
- When adding the phi nodes, I put all labels of the predecessors of the block, regardless of whether the variable is defined along each of them.
- In the renaming process, the variable name stack can be seen as a call stack for the renaming function, because we pop all names added to the stack before returning. Therefore, all names in the stack are local to that block and its dominators. As a result, if some argument of a phi instruction is not defined, the corresponding stack will be empty.
- When that happens, I mark that argument as {arg}.NOT_DEFINED. This special name is used in the from_ssa pass.
The general idea of the from_ssa pass is to insert an id instruction in the predecessors of a phi node to change the name of the variables so that we can safely remove the phi nodes. However, there are several issues to note:
- If a phi argument is {arg}.NOT_DEFINED, we skip the corresponding predecessor block because adding an id there causes a reference to undefined variable.
- In some corner cases, there are still id instructions referring to undefined variable (e.g., pythagorean_triple.bril, where two nested loops share the same variable for the exit condition). I wrote another pass to eliminate these ill-formed ids. The pass will check every id in the program, and only keep it if one of the following three conditions are met:
  - there is a definition in one of its dominators
  - there is a definition in all predecessors of one of its dominators
  - it is a function argument
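The copy-insertion part of from_ssa, including the skip for marked arguments, might look roughly like this Python sketch. It is not the code from this repo: it assumes blocks are a dict from label to Bril JSON-style instruction lists, that the marker is the literal suffix `.NOT_DEFINED` appended to the argument name, and that copies go just before a trailing `jmp`/`br` (or at the end of a fallthrough block). Like the basic scheme from class, it does not split critical edges. The extra pass that filters ill-formed ids is not shown.

```python
from typing import Dict, List

UNDEF_SUFFIX = ".NOT_DEFINED"


def phi_copies(blocks: Dict[str, List[dict]]) -> None:
    """Lower phis to `id` copies at the end of each predecessor, in place."""
    for name in list(blocks):
        phis = [i for i in blocks[name] if i.get("op") == "phi"]
        for phi in phis:
            for arg, pred in zip(phi["args"], phi["labels"]):
                if arg.endswith(UNDEF_SUFFIX):
                    continue  # a copy here would read an undefined variable
                copy = {"op": "id", "dest": phi["dest"],
                        "type": phi["type"], "args": [arg]}
                body = blocks[pred]
                ends_in_jump = bool(body) and body[-1].get("op") in ("jmp", "br")
                body.insert(len(body) - 1 if ends_in_jump else len(body), copy)
        blocks[name] = [i for i in blocks[name] if i.get("op") != "phi"]
```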

Testing

I tested the passes on three sets of test cases:

Test programs of the local analysis in task3. These are simple tests that helped me quickly build the program.
Test programs of the dataflow analysis in task4. These programs are small but contain various CFG topologies, which makes them perfect for debugging the passes.
Programs from the bril core benchmark. These programs are diverse enough to trigger corner cases.

Challenging part

As the course website says, dealing with missing definitions is a huge headache; there are simply too many corner cases. Also, since the two passes are connected, I have to investigate which one (or both) has problems before debugging. I would say testing is the key to this challenge. The benchmark programs contributed lots of interesting CFGs, like nested loops, self loops, and empty blocks, which are really useful for finding potential bugs.

However, there are still two programs that fall into an infinite loop after converting to SSA. I could not figure out the problem before the deadline.

stephenverderame · 2023-09-29T04:22:41Z

Summary

Repo
SSA conversion
Out of SSA conversion with semi-aggressive coalescing
is-ssa tool that panics if a program is not in SSA when it's supposed to or if it contains phi nodes when it shouldn't

Implementation

To implement SSA, I first started by inserting phi nodes as we discussed in class. I realized I actually didn't get a chance to test my dominance frontier function from last time (although I tested the rest of the dominator tree) and discovered, as one would expect, that there were bugs. I ended up adding phi nodes in an iterative fashion. Since adding phi nodes creates new definitions that may expand the dominance frontier, I kept iterating, adding, or updating the phi nodes until convergence.

To handle when certain variables didn't exist along a given path, I decided to simply drop that path from the phi node. So for example, if A, B, C merge into D, but only A and C define v, then in D I would construct a phi node like

v = phi v v .A .C

For renaming variables, I followed the approach in class, but took an immutable approach. So instead of having a map of stacks, I used the call stack itself.

For going out of SSA, I first started with the basic approach of inserting copies at the end of predecessor blocks of phi nodes, and then removing those phi nodes. The out-of-SSA transformation actually revealed a few issues with my into ssa pass.

Once that was working, I took a look at the paper Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency. I didn't follow the paper exactly, but I took a few ideas from it. On top of the basic out-of-ssa transformation, I applied move coalescing exactly as performed in Chaitin style register allocation. One little extra is that I decided to handle the extra case where we have something like:

v = id a;
u = id a;

Since I stubbornly didn't feel like first transforming to CSSA and changing my original out-of-ssa pass, I decided to handle this by first performing an available copies analysis. Then, two values with overlapping live ranges do not interfere if they are copies as determined by the available copies analysis. The extra hitch is that on one path, this extra condition might find two values to be non-interfering, but on another, they may interfere. I handled this by basically ensuring that if values interfere on any path, they will interfere in the interference graph. This ends up being a bit more complicated than the paper because of the fact that when I perform the coalescing, the program is in almost, but not quite ssa form. In particular, it is SSA except for the case where multiple blocks merge into one block where a phi node used to be. In that case, the old phi definition variable can have multiple definitions, one at the end of each predecessor. I believe this extra condition should be able to correctly determine when more aggressive coalescing can be performed.

Here is an example using the checkPrimes function of check_primes.bril:

CFG: (Original):

CFG: (In SSA):

CFG: (After out-of-ssa):

Interference Graph:
Dashed lines denote move-related nodes

Coalesced Interference Graph:

Final Result:

Testing

Testing was done with Turnt on every benchmark I have, as usual. As I usually do, I also implemented everything in stages, checking correctness at each stage. For example, into-SSA was checked before I started out of SSA, out-of-SSA was checked before I started coalescing, etc. I also performed some manual checks with the available copies analysis and interference graph by displaying each in graphs.

In terms of performance, I collected the following metrics compared to a dummy-pass, that is, these take into account how I reorder my basic blocks:

Metric (vs reorder)	SSA+Coalesce	SSA
Average Speedup	1.231x	0.837x
Median Speedup	1x	0.838x
Avg Instr % increase	-12.825%	24.291%

Both SSA passes measured were round-trip. I was somewhat surprised to see how coalescing improved performance, however, this makes sense as many benchmarks are generated from frontends which
make a lot of copies, and coalescing is essentially a more aggressive copy propagation pass.

This definitely shows that SSA has quite some overhead though, and I was pretty happy with how coalescing basically dealt with this completely.

Difficulties

There were quite a few in this assignment, some involving the actual algorithms, and others because I made silly decisions. Besides the challenges I mentioned in the implementation section, another big problem was that in past assignments, I didn't output a label if nothing jumped to the particular block. This caused a problem for phi nodes in cases where one of the predecessors was a fallthrough and wasn't given a label.

Another issue was dealing with function arguments. My solution to this was to just insert a new start block that copies each function argument into itself. I.e., if the function has arguments a and b, this new block would contain the instructions:

a = id a;
b = id b;

Then, performing SSA as I already implemented would work because this would get renumbered to:

a.0 = id a;
b.0 = id b;
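On Bril JSON, this trick is only a few lines. A hedged sketch (not the Rust code from this repo), assuming the standard function shape `{"name", "args": [{"name", "type"}], "instrs"}`; the function name and the `ssa.entry` label are hypothetical and would need to avoid colliding with existing labels.

```python
def add_arg_entry(func: dict, label: str = "ssa.entry") -> dict:
    """Return a copy of a Bril JSON function with a new labeled entry block
    that copies every argument into itself (`a = id a`)."""
    copies = [{"op": "id", "dest": a["name"], "type": a["type"], "args": [a["name"]]}
              for a in func.get("args", [])]
    new = dict(func)  # shallow copy; the original instrs list is not modified
    new["instrs"] = [{"label": label}] + copies + list(func["instrs"])
    return new
```

After renaming, the right-hand sides still refer to the original parameter names, which is exactly what the `a.0 = id a;` form above relies on.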

A particularly dumb decision I made was when writing the available copies dataflow analysis for more aggressive coalescing. For some reason, I thought to myself: "Available copies, I can do this from memory, in one try, without testing first, what could possibly go wrong?" Needless to say, it had two bugs (discovered so far): when a new write occurred, I forgot to remove all copies of the overwritten variable, and I made a mistake in meet regarding handling how I was representing the top element of the lattice.
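For reference, here is a hedged Python sketch of an available copies analysis that avoids both of those pitfalls; it is not the Rust implementation from this repo. A copy `(d, s)` means `d = id s` still holds. The transfer function kills every copy that mentions the overwritten variable on either side, and TOP ("all copies") is represented explicitly as `None`, the identity for the intersection meet, rather than being confused with the empty set. Blocks, `preds`, and the function names are assumptions.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Copy = Tuple[str, str]             # (dest, src): "dest = id src" still holds
Facts = Optional[FrozenSet[Copy]]  # None stands for TOP, the set of all copies


def transfer(facts: FrozenSet[Copy], instrs: List[dict]) -> FrozenSet[Copy]:
    out = set(facts)
    for instr in instrs:
        d = instr.get("dest")
        if d is None:
            continue
        # A write to d kills every copy that mentions d on either side.
        out = {(x, y) for (x, y) in out if x != d and y != d}
        if instr.get("op") == "id" and instr["args"][0] != d:
            out.add((d, instr["args"][0]))
    return frozenset(out)


def meet(a: Facts, b: Facts) -> Facts:
    # TOP is the identity for intersection, so it must not act like the empty set.
    if a is None:
        return b
    if b is None:
        return a
    return a & b


def available_copies(blocks: Dict[str, List[dict]],
                     preds: Dict[str, List[str]],
                     entry: str) -> Dict[str, FrozenSet[Copy]]:
    """Forward must-analysis: the copies available at the start of each block."""
    succs: Dict[str, List[str]] = {b: [] for b in blocks}
    for b, ps in preds.items():
        for p in ps:
            succs[p].append(b)
    out: Dict[str, Facts] = {b: None for b in blocks}  # optimistic start at TOP
    ins: Dict[str, FrozenSet[Copy]] = {}
    work = list(blocks)
    while work:
        b = work.pop(0)
        facts: Facts = frozenset()
        if b != entry:
            facts = None
            for p in preds.get(b, []):
                facts = meet(facts, out[p])
        if facts is None:
            continue  # all predecessors still TOP; revisit when one changes
        ins[b] = facts
        new = transfer(facts, blocks[b])
        if new != out[b]:
            out[b] = new
            work.extend([s for s in succs[b] if s not in work])
    return ins
```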

vivianyyd · 2023-09-29T04:33:05Z

SSA Code

Summary

We (Will and I) did our implementation in Kotlin (built on top of the code from previous lessons).
We implemented methods to translate CFGs into and out of SSA form.
This was the hardest one yet! The testing is not fully complete, but we will try to get 100% correctness on all the
benchmarks.

Implementation details

The overall algorithm for getting into SSA was similar to that shown in class. The code for getting out of SSA was nearly
identical to pseudocode from class.
Undefined variables: If the stack for a variable is empty when renaming the RHS of phi nodes of a node's successors,
then the variable is undefined on that path, so we don't add any names to the phi node.
We place phi instructions at the beginning of each block or directly below the label if there exists one.
Brili throws errors when it is possible to reach a phi node before passing two labels, so we worked around this by
adding a dummy labeled entry node. This is fine because when converting out of SSA, all copy instructions are placed in
blocks from which each definition "originated".

How did you test it? Which test inputs did you use? Do you have any quantitative results to report?

Testing was straightforward using our brench setup from last time. Our into-ssa form had around 90+% correctness on
benchmarks using the phi extension for brili. Out of ssa had slightly less correctness; we plan to nail down correctness
for into-ssa before getting correctness all the way for out of ssa.

What was the hardest part of the task? How did you solve this problem?

A very tricky part of this assignment was how to handle when variables are not defined on particular paths. We noticed
this happens when the stack is empty during renaming, so we decided to "prune" out the assignments for which this
happened.
We also noticed lots of small caveats that weren't in the pseudocode. For example, we had to rename the destination of
the phi node.
Finally, testing was also pretty tricky, since we had several bugs that only popped up when CFGs were complicated
(which made them harder to test by hand).

1 reply

willwng Oct 20, 2023

Accidentally deleted this comment instead of editing, adding it again
Just a quick update: we passed all benchmarks for both into SSA and out of SSA. We were still having issues with undefined variables, so our hacky fix (based on some suggestions here, thanks!) was to create a default value for each type and just assign that value to the variable. We're still wondering if there's a better solution, but this seemed to patch up our bugs.
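A per-type default could be built along these lines. This is a hedged Python sketch, not the Kotlin fix from this post: the defaults for `int`, `bool`, and `float` are ordinary Bril `const` values, pointer types (which appear as dicts like `{"ptr": "int"}`) have no constant form and are rejected, and the helper names are hypothetical. Where to insert the defaults is also an assumption here; prepending them to the function before into-SSA runs is one option, since every path then has some definition.

```python
from typing import List, Tuple, Union

BrilType = Union[str, dict]  # e.g. "int", or {"ptr": "int"}
DEFAULTS = {"int": 0, "bool": False, "float": 0.0}


def default_init(var: str, typ: BrilType) -> dict:
    """Build a Bril `const` giving `var` a placeholder value of type `typ`."""
    if not isinstance(typ, str) or typ not in DEFAULTS:
        raise ValueError(f"no constant default for type {typ!r}")
    return {"op": "const", "dest": var, "type": typ, "value": DEFAULTS[typ]}


def prepend_defaults(instrs: List[dict],
                     maybe_undef: List[Tuple[str, BrilType]]) -> List[dict]:
    """Put default initializations at the very start of a function body."""
    return [default_init(v, t) for v, t in maybe_undef] + instrs
```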

A quick brench run demonstrates that our average percent of additional instructions was around 60%, which seems pretty daunting but we're confident that when we combine dead code elimination with our live variable dataflow analysis, this number will be much lower.

evanmwilliams · 2023-10-01T03:16:50Z

Summary

For this task, @emwangs and @he-andy and I implemented SSA translation:

Into SSA: You can find the code here
Back from SSA: You can find the code here under the Lesson 6 folder

Implementation Details

We used a mix of C++ and Rust on this assignment: C++ for the out-of-SSA pass and Rust for the into-SSA pass. This was mainly because it was an easier division of work for us.
This was a very demanding assignment! SSA has a lot of edge cases that make it very difficult to implement. To an extent, the pseudocode in the notes was very useful, but there are also quite a few gaps between the logic in the pseudocode and the logic needed in an actual implementation.
Debugging was quite hard because there are a lot of small edge cases that are not easy to detect while coding up the implementation; they were only revealed when running against the entire benchmark suite.
Our implementation works on most of the benchmarks. We are getting different outputs between the baseline and the SSA execution on about 9
In general, the dynamic instruction count goes up quite a bit when using SSA form, but it definitely makes some of the other optimizations a lot easier to implement

Challenges

Like I said, the bugs in this assignment were endless. Namely, it was really hard to know how to map the pseudocode from lecture into something actually usable in our code base. We had to do a lot of refactoring to add extra functionality to our CFGs and dominators because we noticed that we needed it for SSA conversion.
Small ambiguities made our lives a bit difficult. Things like how to specify phi instructions in Bril and where to put label instructions caused a lot of headaches.
If a variable isn't defined along a certain path, the program would crash after SSA conversion. Guaranteeing that this wouldn't happen required carefully reconstructing the program while taking into account the paths that execution can take through the CFG. This was pretty hard to get right, and we added a lot of manual checks to deal with this case.

Results

benchmark	run	result
quadratic	baseline	785
quadratic	to_ssa	1403
quadratic	to_from_ssa	1423
primes-between	baseline	574100
primes-between	to_ssa	877740
orders	baseline	5352
orders	to_ssa	8341
relative-primes	baseline	1923
relative-primes	to_ssa	2670
hanoi	baseline	99
hanoi	to_ssa	137
hanoi	to_from_ssa	128
check-primes	baseline	8468
check-primes	to_ssa	15917
sum-sq-diff	baseline	3038
sum-sq-diff	to_ssa	5665
sum-sq-diff	to_from_ssa	5847
fact	baseline	229
fact	to_ssa	230
fact	to_from_ssa	230
loopfact	baseline	116
loopfact	to_ssa	216
loopfact	to_from_ssa	217
recfact	baseline	104
recfact	to_ssa	105
recfact	to_from_ssa	105
factors	baseline	72
factors	to_ssa	129
factors	to_from_ssa	133
perfect	baseline	232
perfect	to_ssa	436
perfect	to_from_ssa	441
digital-root	baseline	247
digital-root	to_ssa	341
up-arrow	baseline	252
up-arrow	to_ssa	470
sum-divisors	baseline	159
sum-divisors	to_ssa	259
ackermann	baseline	1464231
ackermann	to_ssa	1464232
ackermann	to_from_ssa	1464232
pythagorean_triple	baseline	61518
pythagorean_triple	to_ssa	108152
euclid	baseline	563
euclid	to_ssa	879
euclid	to_from_ssa	890
binary-fmt	baseline	100
binary-fmt	to_ssa	137
binary-fmt	to_from_ssa	150
lcm	baseline	2326
lcm	to_ssa	3359
lcm	to_from_ssa	3483
gcd	baseline	46
gcd	to_ssa	76
gcd	to_from_ssa	85
catalan	baseline	659378
catalan	to_ssa	1003828
catalan	to_from_ssa	974303
armstrong	baseline	133
armstrong	to_ssa	190
armstrong	to_from_ssa	204
pascals-row	baseline	146
pascals-row	to_ssa	280
pascals-row	to_from_ssa	267
collatz	baseline	169
collatz	to_ssa	241
collatz	to_from_ssa	255
sum-bits	baseline	73
sum-bits	to_ssa	101
sum-bits	to_from_ssa	107
rectangles-area-difference	baseline	14
rectangles-area-difference	to_ssa	17
rectangles-area-difference	to_from_ssa	19
reverse	baseline	46
reverse	to_ssa	83
reverse	to_from_ssa	82
fizz-buzz	baseline	3652
fizz-buzz	to_ssa	10262
bitwise-ops	baseline	1690
bitwise-ops	to_ssa	2408

CFG for `gcd`


collinzrj · 2023-10-01T15:13:16Z


Summary

For this task, I implemented to_ssa and from_ssa in task6.py.
I also wrote a test script that automatically compares each program's output with and without to_ssa and from_ssa, to make sure the result is still correct after converting to and from SSA. I didn't use brench for this because I ran into some problems with it, and I found it easier to write my own testing script than to debug them.

Implementation

I encountered several difficulties while implementing this task. First, this task revealed some mistakes in my global analysis that I hadn't noticed when I implemented it. For example, I didn't realize that a block can be in its own dominance frontier (as a loop header is), and that produced some errors in my to_ssa implementation. A second difficulty was handling the case where blocks A and B are both predecessors of block C but a variable is defined only in block B. In that case, we may still have to insert a phi node in block C, but when execution reaches C from A, the program crashes because the phi node has no value for the edge from A. To handle this, I insert __undefined as a placeholder argument and, as a post-processing step, remove all phi nodes that have __undefined among their arguments.
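The self-frontier case is easiest to see in a small loop. Below is a minimal sketch with hypothetical helper names, assuming block names are strings and every block is reachable from `entry`. It computes dominators iteratively, derives immediate dominators, and then builds frontiers by walking up the immediate-dominator tree from each predecessor of a block, in the style of Cooper, Harvey, and Kennedy. Nothing in the walk excludes the block itself, so a back edge puts the loop header in its own frontier.

```python
from typing import Dict, List, Optional, Set

CFG = Dict[str, List[str]]  # block name -> successor block names


def predecessors(succs: CFG) -> Dict[str, List[str]]:
    preds: Dict[str, List[str]] = {b: [] for b in succs}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)
    return preds


def dominators(succs: CFG, entry: str) -> Dict[str, Set[str]]:
    """dom(b) = {b} union the intersection of dom(p) over predecessors p."""
    preds = predecessors(succs)
    every = set(succs)
    dom = {b: set(every) for b in succs}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in succs:
            if b == entry:
                continue
            new = set(every)
            for p in preds[b]:
                new &= dom[p]
            new.add(b)
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def immediate_dominators(dom: Dict[str, Set[str]]) -> Dict[str, str]:
    """idom(b) is the strict dominator d whose own dominators are exactly dom(b) - {b}."""
    idom: Dict[str, str] = {}
    for b, ds in dom.items():
        strict = ds - {b}
        for d in strict:
            if dom[d] == strict:
                idom[b] = d
                break
    return idom  # the entry block has no entry here


def dominance_frontiers(succs: CFG, entry: str) -> Dict[str, Set[str]]:
    idom = immediate_dominators(dominators(succs, entry))
    df: Dict[str, Set[str]] = {b: set() for b in succs}
    for b, ps in predecessors(succs).items():
        stop = idom.get(b)  # None when b is the entry block
        for p in ps:
            runner: Optional[str] = p
            # Each block from p up to (not including) idom(b) dominates a
            # predecessor of b without strictly dominating b, so b is in its
            # frontier. The walk may pass through b itself (a back edge).
            while runner is not None and runner != stop:
                df[runner].add(b)
                runner = idom.get(runner)
    return df
```

With `cfg = {"entry": ["header"], "header": ["body", "exit"], "body": ["header"], "exit": []}`, `dominance_frontiers(cfg, "entry")["header"]` is `{"header"}`, so a variable assigned inside the loop needs a phi in the loop header itself.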

Testing

My testing script runs each program in the Bril benchmarks both without and with my transformation, then checks whether the two outputs are the same. I tested on all the programs in the Bril benchmarks. With to_ssa alone, all test cases passed, but with to_ssa followed by from_ssa, 4 out of 80 cases failed; I wasn't able to solve that problem because of time constraints.
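A comparison script along these lines could be sketched as follows. This is a minimal sketch, not the author's script: `run_pipeline`, `outputs_match`, and `dyn_inst_count` are hypothetical names, and it assumes every stage is a command that reads from stdin and writes to stdout, as `bril2json` and `brili` do, and that `brili -p` reports the count on stderr as `total_dyn_inst: N`. How the SSA passes are invoked (`task6.py to_ssa` below) is an assumption.

```python
import re
import subprocess
import sys
from typing import Optional, Sequence

Pipeline = Sequence[Sequence[str]]  # list of commands, each a list of argv strings


def run_pipeline(stages: Pipeline, stdin_text: str) -> subprocess.CompletedProcess:
    """Run the stages in order, feeding each stage's stdout to the next one."""
    if not stages:
        raise ValueError("pipeline needs at least one stage")
    data = stdin_text
    result = None
    for cmd in stages:
        result = subprocess.run(list(cmd), input=data, capture_output=True, text=True)
        if result.returncode != 0:
            break  # report the first failing stage
        data = result.stdout
    return result


def dyn_inst_count(stderr: str) -> Optional[int]:
    """Extract N from a 'total_dyn_inst: N' line, if present."""
    match = re.search(r"total_dyn_inst:\s*(\d+)", stderr)
    return int(match.group(1)) if match else None


def outputs_match(source: str, baseline: Pipeline, transformed: Pipeline) -> bool:
    """True if both pipelines succeed and print exactly the same stdout."""
    before = run_pipeline(baseline, source)
    after = run_pipeline(transformed, source)
    return (before.returncode == 0 and after.returncode == 0
            and before.stdout == after.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 compare.py prog.bril [program args...]
    path, *prog_args = sys.argv[1:]
    with open(path) as f:
        src = f.read()
    baseline = [["bril2json"], ["brili", "-p", *prog_args]]
    transformed = [["bril2json"],
                   [sys.executable, "task6.py", "to_ssa"],    # assumed CLI
                   [sys.executable, "task6.py", "from_ssa"],  # assumed CLI
                   ["brili", "-p", *prog_args]]
    print("match" if outputs_match(src, baseline, transformed) else "MISMATCH")
```

Because `brili -p` writes the instruction count to stderr, comparing stdout checks only the program's printed output. The last stage's stderr (`result.stderr`) can be passed to `dyn_inst_count` to collect counts like the ones in the table below.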

Result

Here are the results for my implementation. Each cell is the total_dyn_inst (total dynamic instruction count) for that transformation; "missed" means the transformed program was incorrect on that benchmark.

name	original	to_ssa	to_and_from_ssa
quadratic.bril	785	827	827
primes-between.bril	574100	780151	654355
orders.bril	5352	6584	6394
palindrome.bril	298	415	415
totient.bril	253	395	346
relative-primes.bril	1923	2113	2065
hanoi.bril	99	99	99
check-primes.bril	8468	9078	8833
sum-sq-diff.bril	3038	3440	3440
mytest.bril	4	4	4
fact.bril	229	228	228
loopfact.bril	116	133	133
recfact.bril	104	103	103
factors.bril	72	88	88
perfect.bril	232	298	303
bitshift.bril	167	171	171
digital-root.bril	247	291	280
up-arrow.bril	252	370	344
sum-divisors.bril	159	192	219
ackermann.bril	1464231	1464231	1464231
pythagorean_triple.bril	61518	69269	69394
dot-product.bril	88	98	100
euclid.bril	563	625	625
binary-fmt.bril	100	100	100
gcd.bril	46	61	61
catalan.bril	659378	757792	757792
armstrong.bril	133	162	162
pascals-row.bril	146	151	151
collatz.bril	169	186	186
sum-bits.bril	73	87	87
rectangles-area-difference.bril	14	15	16
mod_inv.bril	558	684	614
reverse.bril	46	58	58
fizz-buzz.bril	3652	6019	3653
bitwise-ops.bril	1690	2142	2145
cholesky.bril	3761	5669	missed
mat-inv.bril	1044	1078	1086
function_call.bril	59809726	54208816	54208816
ray-sphere-intersection.bril	142	142	142
conjugate-gradient.bril	1999	2738	2316
n_root.bril	733	975	975
newton.bril	217	243	243
euler.bril	1908	1945	1945
riemann.bril	298	352	352
mandelbrot.bril	2720947	2811666	missed
cordic.bril	517	2444	620
pow.bril	36	38	38
sqrt.bril	322	395	359
sieve.bril	3482	3946	3971
bubblesort.bril	253	283	283
primitive-root.bril	11029	11490	11664
adler32.bril	6851	10078	10108
adj2csr.bril	56629	72604	missed
max-subarray.bril	193	226	226
mat-mul.bril	1990407	3183564	missed
fib.bril	121	148	148
vsmul.bril	86036	90133	90133
quicksort.bril	264	293	303
eight-queens.bril	1006454	1075320	1061840
binary-search.bril	78	75	75
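To condense a table like this into one number per transformation, a common choice is the geometric mean of per-benchmark ratios against the original count. The sketch below is hypothetical (not part of the author's script; `overhead_ratios` and `geomean` are illustrative names). It assumes the tab-separated layout shown above, with a header row, and treats any non-numeric cell such as `missed` as absent.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

Ratios = Tuple[Optional[float], ...]


def overhead_ratios(table: str) -> Dict[str, Ratios]:
    """Map each benchmark to (variant / original) for every variant column.

    Expects a header row, then tab-separated rows: name, original, variants...
    Non-numeric cells (e.g. "missed") become None.
    """
    rows = [line for line in table.strip().splitlines() if line.strip()]
    result: Dict[str, Ratios] = {}
    for line in rows[1:]:  # skip the header row
        name, original, *variants = line.split("\t")
        base = int(original)
        result[name] = tuple(
            int(cell) / base if cell.strip().isdigit() else None
            for cell in variants
        )
    return result


def geomean(values: Iterable[Optional[float]]) -> float:
    """Geometric mean of the present values; None entries are skipped."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no values to average")
    return math.exp(sum(math.log(v) for v in present) / len(present))
```

For example, `geomean(r[0] for r in overhead_ratios(text).values())` summarizes the to_ssa column. Skipping missed cells means the to_and_from_ssa mean covers fewer benchmarks than the to_ssa mean.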



Summary

Summarize what you did.

Testing

Explain how you know your implementation works: how did you test it? Which test inputs did you use? Do you have any quantitative results to report?

Difficulties

What was the hardest part of the task? How did you solve this problem?
