Replies: 17 comments 1 reply
-
SummaryFor this week I implemented a bril translation to SSA form and a corresponding translation from SSA form back to normal form. This was significantly more difficult and time consuming than the past assignments. This was because of extra complexity in the implementation, but also because of additional complexity in debugging and manually looking through the bril instructions. I had to cache (in my mind) not only the instructions themselves but also what the dominator tree and dominance frontier (for each node) looks like. I wish there was an easier way to debug these low level optimizations on machine instructions... TestingThe support in brili for executing programs with Phi instructions made testing significantly easier, allowing me to independently test my implementations of to_ssa and from_ssa. I can imagine the mess it would create trying to only test the composition of the two. My general approach was to get it working on a few small examples, where I could clearly see mistakes in my logic and then move on to brench for the core benchmarks directory. I had a pipeline for just to_ssa and then a separate one for to_ssa followed by from_ssa. Surprisingly, by the time I moved onto brench, my solution worked for almost all of them. I dug into the examples that didn't work to find some small edge cases/errors I missed and fixed those. DifficultiesThere were plenty of those this week. The biggest difficulty I faced was that I found a bug in my dominator tree code while testing ssa. I suppose it's only right because I heavily tested my dominators function, and didn't come up with a good way to test my dominator tree function (assuming that any test would just use the same logic that I used to build the tree). For the sake of time, I decided to just use the correct implementation of dominators from the bril repo, instead of re-doing the assignment from last week. This is definitely a good lesson for the future on sufficiently testing my code, especially functions I will call on for future implementations. |
Beta Was this translation helpful? Give feedback.
-
Summarize what you did.
Explain how you know your implementation works—how did you test it? Which test inputs did you use? Do you have any quantitative results to report?
What was the hardest part of the task? How did you solve this problem?There were three main pain points in this task.
so I used a tree traversal helper function to find which argument to use in the Phi instruction.
|
Beta Was this translation helpful? Give feedback.
-
SummaryDetailsI implemented the into-SSA and out-of-SSA conversions using the algorithms from lecture. In particular, I used the basic out-of-SSA algorithm without extra copy propagation or other optimizations. TestingI first used the
DifficultiesThere were a fair number of edge case to deal with, and, as others have noted, debugging was more difficult for this assignment since it is harder to look at a buggy SSA program and immediately see what's wrong. Testing with the small examples in the |
Beta Was this translation helpful? Give feedback.
-
|
Beta Was this translation helpful? Give feedback.
-
SummaryI wrote logic that turns a CFG into SSA as well as back out of SSA. My out of SSA implementation can handle undefined values in phi nodes. ImplementationOne problem I faced and solved was keeping track of the original variable name of a phi node. My initial attempt at going into SSA involved literally inserting the phi instruction into the CFG and then performing the rename step. However, this didn't work because I would end up renaming the phi instruction destinations before I get to the part where I update their args based on their original name using a stack lookup; I've lost their original names at that point because of the rename. On the out of SSA side, I implemented the pseudocode without much problems. (except for Rust borrowing issues, since the pseudocode wanted to mutate via deleting phi nodes + inserting "normal" instructions - I just ended up with two passes, which is suboptimal but 🤷). Also I had to introduce an empty entry block in the CFG to address loops. And to make things easier to implement I inserted jmp instructions even for blocks that simply fall through. TestingI used To test correctness, I ran both modes (into-only and into+outof) on the bril repo benchmark suite as well as custom tests for various edge cases I discovered. Since the interpreter was able to evaluate the program in SSA form, I was able to verify correctness just by comparing outputs. I evaluated the instruction counts against the benchmark suite:
It seems like there are quite a lot of overhead introduced when converting into SSA with the given algorithms, and a bit more tacked on when converting back out. DifficultiesThe biggest challenge when implementing this task (especially with into SSA) was Rust's borrowing shenanigans. When following the pseudocode, there were many parts at which I was supposed to add/remove/modify an instruction but Rust was like "we don't do that here". |
Beta Was this translation helpful? Give feedback.
-
I worked with @SanjitBasker on this project. SummaryImplementation DetailsIn this project, we implemented a transformation in C++ for converting an arbitrary Bril program into SSA form with phi nodes enabled, and an extension to that transformation (also in C++) that removes phi nodes to take the program out of SSA. The first method we implemented was a robust way to generate fresh variable names; we did so by finding a short string that is not a prefix of any variable name in a given function. This proved to be useful when dealing with variable renaming. We also used this functionality to generate variables with placeholder values to put at the top of each function; for use when converting out of SSA form when dealing with variables of each type that are undefined along some paths. For bools, we used a placeholder value of false; for ints and floats, we used 0, and for each pointer type, we allocated memory of size 1, immediately freed it, and used that as the placeholder pointer. We made sure to only generate placeholder values for types that actually existed in the program. After those preliminaries, we first added phi nodes for every variable to the blocks on the dominance frontier of wherever that variable was defined. We also made sure to include the function arguments in this as being defined in the entry block. Funnily enough, through this process we were able to uncover some bugs in our implementation of both the dominator tree and the dominance frontier, which were surprisingly subtle to spot on our first pass through. We are now, though, fully confident that our implementation is correct. (As an aside; here's a useful fact we discovered: a block's immediate parent in the dominator tree is the block's strict dominator that has the most dominators itself.) Next, we implemented variable renaming. This is where our addition of global undefined variables for each type came in: wherever we encountered a variable that was undefined on a path into a block containing a phi node corresponding to that variable, we made that phi node read from the assigned undefined placeholder variable for its type. This ended up solving a lot of the issues that we were having down the line with variables showing up as undefined during the out-of-SSA conversion. Aside from that, we followed the pseudocode very closely, with one exception. When initializing the stacks of variable names, we put the placeholder undefined variables at the bottom of the stack (to deal with the case where the variable is not defined). Then, we added the function argument variable names to the stack (again to deal with that special case). Then, we proceeded as normal. To pop all the names we pushed onto the stack after running the renaming for a given block, we kept track of how many times each variable's stack was pushed onto, and then popped off those many elements. Lastly, we implemented the out-of-SSA transformation. Though it took us awhile to reason through how to implement adding the relevant instructions in an efficient manner, we eventually found a way. For each variable name in a given phi node, collect all the predecessor blocks for it defined in the phi node. Then, run a multi-source BFS from all those blocks to the block where the phi node was initially defined, adding renaming instructions for that variable for all blocks seen along this BFS. This ensures that no variable in a given node is renamed more than once, thus avoiding duplicate computation. TestingFor testing, we used brench heavily to verify that both the to-ssa and out-of-ssa transformations worked as intended. All the brench tests (aside for ones that either time out or error on my machine normally) pass. In addition, we wrote a simple bash script that checks that every output of our program is indeed in SSA form by using the Results detailing the slowdowns of our SSA and roundtrip transformations (which we labeled as Clearly, there is significant overhead when converting to SSA and even more so when performing the roundtrip pass. However, we did not implement any additional optimizations to our SSA generator. We believe that implementing certain optimizations, such as copy propagation and dead code elimination, have the potential to decrease this overhead greatly. DifficultiesThe primary challenge for this project was debugging. As others have mentioned before, it is less clear where exactly an error is in a program turned into SSA form than with other optimizations, since nearly every variable in the program is modified. In addition, coming across bugs in our implementations of the dominator tree and the dominance frontier was unfortunate, but in hindsight not unexpected, since we did not perform adequate testing on those particular utilities in the previous assignment. We also ran into difficulties figuring out what the exact configuration of the variable stacks should be in the phi node renaming process, though we were eventually able to do so successfully. |
Beta Was this translation helpful? Give feedback.
-
@xalbt and I worked together on lesson 6. Summarycore files Implementation
Difficulties
TestingThe extra functions and tools provided were very helpful in debugging. We made extensive use of both interpreting our SSA form program in addition to the
|
Beta Was this translation helpful? Give feedback.
-
summarySSA implementation branch To implement to_SSA, I first iterate through all the code blocks and insert phi nodes when needed. Then I use a stack to rename variables using the newly given names. To implement from_SSA, it has simpler logic. I need to insert code into the phi-containing block’s immediate predecessors along paths, and then remove all the phi node instructions hardest part of implementationMost of the logic is covered in lecture, and pseudocode is given. However, it is still quite hard to implement since logically, the first step is to insert phi nodes, but actually it is impossible to find all the needed information for creating a new phi node in one pass as we iterate though the blocks, so for design I choose to first populate a map from block name to phi_node map {var: [[label list][arg list]]}. And then in renaming variables, populate the rest of information for phi nodes testing
|
Beta Was this translation helpful? Give feedback.
-
Summary@JohnDRubio and I worked on writing python scripts to convert a bril program to SSA form, convert from SSA form back to a runnable program by removing phi nodes, and lastly calculating the overhead of these operations by comparing the dynamic instruction count of the original program, and after converting to and from SSA form. ImplementationThe first function that we implemented was one to add Phi nodes where necessary in our blocks, but with just generic variable names (ex. a = phi a a .label1 .label2). As shown in the algorithm in class, we used the dominance frontier to calculate where to place the phi nodes. Once we used this to figure out where to place a phi node, the next thing was to figure out which of the predecessors for that block were actually reachable from that specific variable definition. To accomplish this, we used our helper function that we used in the previous Lesson tasks called getPaths. This function took a specific start node and destination node and returned a list of paths from that start node to the destination node (Again, the source for where we got this algorithm is here). Then, for the particular block in which we want to add the phi node for the variable x, we loop through all of the paths from the entry block to this block, and for each path which never defines a variable x, we do not count that predecessor as a valid argument in our phi instruction. Once we implemented this, we had all the phi nodes in the correct spots in every block, and the next step was implementing the rename function to actually generate unique names for each definition in our cfg. To do this, we implemented the recursive rename algorithm described in class, starting by renaming the entry block. Most of the algorithm was fairly straight forward based on the pseudo-code, but there were a few areas where we added some logic that wasn't explicitly mentioned that is worth noting:
At this point, we had written a function that converted a bril program into SSA form. The next function we implemented was the one that would convert a function already in SSA form back to one that could be executed (by removing the phi nodes and adding in extra blocks). To do this, we simply following the algorithm presented in class, which was much simpler than converting to SSA. To do this, we started by looping through every phi node in our program. For each one, we created two new blocks before the block containing the phi node. These blocks simply contained an id instruction from the corresponding argument in the phi instruction to the destination of the phi instruction as well as a jump to the block containing the phi instruction. The next thing was looking at the previous immediate predecessor of the current block with the phi instruction, and changing any control flow instructions to the newly created block instead of the current block (We didn't have to worry about if the predecessor fell through to the current block, because now it will fall through to the newly created block). Finally, we remove the phi instruction. TestingThe first simple tests that we ran were running our programs on the simple examples shown in the video for lesson 6. Once we verified that our code worked with these examples, we started doing more complicated tests involving the benchmarks. A main part of our testing strategy was using the provided is_ssa.py file to make sure that our programs after running the toSSA function on them were actually in SSA form. OverheadFor looking at the overhead of our algorithm, we first looked at the dynamic instruction count of the original program for each benchmark in the core directory, and then after running toSSA and fromSSA, looked at the dynamic instruction count again to compare them. We reported these results in a spreadsheet. This contains the number of dynamic instructions for each test in the core benchmarks before and after our transformation along with the percent increase. DifficultiesThe area that caused us the biggest difficulty was the rename function. With the other functions (fromSSA, inserting phi nodes), the pseudo-code was pretty straight forward in the algorithm. The fact that the rename function was recursive made it slightly difficult in itself to reason about, in addition to the pseudo-code being a little less complete and a lot more edge cases that we had to think about. Ultimately, I think that these difficulties were good looking back on it, because resolving them helped us gain a better understanding of the algorithm. There are a few bugs that we are ironing out regarding arguments to functions. Besides this, it seems like our toSSA function works correctly right now, since running is_ssa.py on the output seems to be correct for all the benchmarks. However, our fromSSA is giving us differing results than the original program, which we are trying to work through. |
Beta Was this translation helpful? Give feedback.
-
I worked with @zachary-kent on this project and entered the wonderful world of Haskell. SummarySSA transformation was a complex but rewarding program transformation to implement. We observed successful transformation of many benchmarks and recorded observed overheads. ImplementationA new pass option for ssa program transformation was added to the existing Haskell toolchain.
VSCode with Haskell plugin was our main tool of choice for IDE. Opening the bril-hs directory is important. Testingssa test running bril programs
ResultsSSA with DCE percentage overhead
DifficultiesReaching an intermediate point to test the SSA program translation was an area of difficulty, especially notable with the need to now account for phi instructions in our program representation, parsing and json conversion. This was mildly exacerbated by the difficulty of debugging in Haskell due to lazy evaluation pruning. We mitigated this concern with partial implementations that could return after incomplete or no-op transformations, and piping into bril2txt for printing before more ambitious brili execution.
|
Beta Was this translation helpful? Give feedback.
-
SummaryI implemented a basic version of the “into SSA” and “out of SSA” transformations on bril functions in this week's assignment. Difficulties
Testing
Overhead
|
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
Summary
Implementation
TestingI tested the passes on three sets of test cases:
Challenging partAs the course website says, dealing with missing definitions is a huge headache. There are simply too many corner cases. Also, since the two passes are connected, I have to investigate which one (or both) has problems before debugging. I would say testing is the key to this challenge. The benchmark problems contributed lots of interesting CFGs like nested loops, self loops, empty blocks,... etc., whcih are really useful to search for potential bugs. However, there are still two programs that falls into infinite loop after converting to SSA. I cannot figure out what is the problem before the deadline. |
Beta Was this translation helpful? Give feedback.
-
Summary
ImplementationTo implement SSA, I first started by inserting phi nodes as we discussed in class. I realized I actually didn't get a chance to test my dominance frontier function from last time (although I tested the rest of the dominator tree) and discovered, as one would expect, that there were bugs. I ended up adding phi nodes in an iterative fashion. Since adding phi nodes creates new definitions that may expand the dominance frontier, I kept iterating, adding, or updating the phi nodes until convergence. To handle when certain variables didn't exist along a given path, I decided to simply drop that path from the phi node. So for example, if A, B, C merge into D, but only A and C define
For renaming variables, I followed the approach in class, but took an immutable approach. So instead of having a map of stacks, I used the call stack itself. For going out of SSA, I first started with the basic approach of inserting copies at the end of predecessor blocks of phi nodes, and then removing those phi nodes. The out-of-SSA transformation actually revealed a few issues with my into ssa pass. Once that was working, I took a look at the paper Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency. I didn't follow the paper exactly, but I took a few ideas from it. On top of the basic out-of-ssa transformation, I applied move coalescing exactly as performed in Chaitin style register allocation. One little extra is that I decided to handle the extra case where we have something like:
Since I stubbornly didn't feel like first transforming to CSSA and changing my original out-of-ssa pass, I decided to handle this by first performing an available copies analysis. Then, two values with overlapping live ranges do not interfere if they are copies as determined by the available copies analysis. The extra hitch is that on one path, this extra condition might find two values to be non-interfering, but on another, they may interfere. I handled this by basically ensuring that if values interfere on any path, they will interfere in the interference graph. This ends up being a bit more complicated than the paper because of the fact that when I perform the coalescing, the program is in almost, but not quite ssa form. In particular, it is SSA except for the case where multiple blocks merge into one block where a phi node used to be. In that case, the old phi definition variable can have multiple definitions, one at the end of each predecessor. I believe this extra condition should be able to correctly determine when more aggressive coalescing can be performed. Here is an example of checkPrimes function of check_primes.bril Interference Graph: TestingTesting was done with Turnt on every benchmark I have, as usual. As I usually do, I also implemented everything in stages, checking correctness at each stage. For example, into-SSA was checked before I started out of SSA, out-of-SSA was checked before I started coalescing, etc. I also performed some manual checks with the available copies analysis and interference graph by displaying each in graphs. In terms of performance, I collected the following metrics compared to a dummy-pass, that is, these take into account how I reorder my basic blocks:
Both SSA passes measured were round-trip. I was somewhat surprised to see how coalescing improved performance, however, this makes sense as many benchmarks are generated from frontends which This definitely shows that SSA has quite some overhead though, and I was pretty happy with how coalescing basically dealt with this completely. DifficultiesThere were quite a few in this assignment, some involving the actual algorithms, and others because I made silly decisions. Besides the challenges I mentioned in the implementation section, another big problem was that in past assignments, I didn't output a label if nothing jumped to the particular block. This caused a problem for phi nodes in cases where one of the predecessors was a fallthrough and wasn't given a label. Another issue was dealing with function arguments. My solution to this was to just insert a new start block that would copy each function argument into itself. Ie, if the function as arguments
Then, performing SSA as I already implemented would work because this would get renumbered to:
A particularly dumb decision I made was when writing the available copies dataflow analysis for more aggressive coalescing. For some reason, I thought to myself: "Available copies, I can do this from memory, in one try, without testing first, what could possibly go wrong?" Needless to say, it had two bugs (discovered so far): when a new write occurred, I forgot to remove all copies of the overwritten variable, and I made a mistake in |
Beta Was this translation helpful? Give feedback.
-
Summary
Implementation details
How did you test it? Which test inputs did you use? Do you have any quantitative results to report?
What was the hardest part of the task? How did you solve this problem?
|
Beta Was this translation helpful? Give feedback.
-
SummaryFor this task, @emwangs and @he-andy and I implemented SSA translation:
Implementation Details
Challenges
Results
CFG for
|
Beta Was this translation helpful? Give feedback.
-
SummaryFor this task, I implemented ImplementationI encountered several difficulties while implementing this task. First of all, this task reveals some of the mistakes I didn't realize when I implement the global analysis. For example, I didn't realize that a block can be in the dominance frontier of self, and that produced some errors in my TestingIn my testing script, the script first runs a program in bril benchmark without my transformation and with my transformation. Then I compare these two different results to see if they are the same. I have tested on all the programs in bril benchmark. For the ResultHere is the result for my implementation, each cell represents the
|
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
All reactions