The present software is of a modular design and has been written with OOP principles in mind. However, since SIC/XE has a complicated specification, my program is likewise multifaceted.
I will describe its operation in sequence and from a high level.
The first half of this document (describing pass one) is basically unchanged from the documentation submitted for a previous assignment.
First, we are given the input text file, which we read line by line.
Although we know the input file represents a SIC/XE program overall, the role of each line individually is initially a mystery.
The Program.TryParse method takes a path to the input file and attempts to return an instance of Program.
The "Program" type is little more than a list of Lines.
The most important action of Program.TryParse is to parse each line by calling Line.TryParse on all non-blank, non-comment lines.
In turn, Line.TryParse tries to parse the given string as either an instruction or an assembler directive--whichever works.
Once parsed, these are represented in my program as instances of the Instruction or AssemblerDirective types respectively, which are the two subclasses of the abstract type Line.
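The "whichever works" approach can be illustrated with a minimal sketch. It is written in Python for brevity and is not the program's actual code: the function try_parse_line, the tiny mnemonic tables, and the whitespace-delimited line layout are all assumptions made for the example.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Optional

# Tiny illustrative subsets; a real assembler would carry the full tables.
DIRECTIVES = {"START", "END", "BYTE", "WORD", "RESB", "RESW", "BASE", "LTORG"}
MNEMONICS = {"LDA", "STA", "JSUB", "RSUB"}


@dataclass
class Line(ABC):
    label: str
    mnemonic: str
    operand: str


class Instruction(Line):
    pass


class AssemblerDirective(Line):
    pass


def try_parse_line(text: str) -> Optional[Line]:
    """Return an Instruction or AssemblerDirective, or None if neither fits."""
    tokens = text.split()
    # Try the line first without a label, then with one.
    for label_count in (0, 1):
        if len(tokens) <= label_count:
            break
        label = tokens[0] if label_count else ""
        mnemonic = tokens[label_count]
        operand = " ".join(tokens[label_count + 1:])
        if mnemonic.upper() in DIRECTIVES:
            return AssemblerDirective(label, mnemonic, operand)
        bare = mnemonic[1:] if mnemonic.startswith("+") else mnemonic
        if bare.upper() in MNEMONICS:
            return Instruction(label, mnemonic, operand)
    return None
```

A caller would treat a None result as a syntax error on that line, which is how the sketch mirrors the TryParse naming.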
Once an assumption for the type of line is fixed, actually parsing an AssemblerDirective is quite simple.
Instructions have a similar grammar to AssemblerDirectives, but parsing them is slightly more complicated due to the variety among instruction formats and the need to recognize {+ @ # ,}.
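As a rough illustration of recognizing {+ @ # ,}, the sketch below strips those markers from a format 3 or 4 instruction and records them as flags. The names ParsedInstruction and parse_flags are hypothetical, and the sketch deliberately ignores format 2 instructions, where a comma separates two registers instead.

```python
from dataclasses import dataclass


@dataclass
class ParsedInstruction:
    mnemonic: str
    operand: str
    extended: bool   # '+' prefix on the mnemonic (format 4)
    indirect: bool   # '@' prefix on the operand
    immediate: bool  # '#' prefix on the operand
    indexed: bool    # ',X' suffix on the operand


def parse_flags(mnemonic: str, operand: str) -> ParsedInstruction:
    extended = mnemonic.startswith("+")
    if extended:
        mnemonic = mnemonic[1:]
    indirect = operand.startswith("@")
    immediate = operand.startswith("#")
    if indirect or immediate:
        operand = operand[1:]
    indexed = operand.upper().endswith(",X")
    if indexed:
        operand = operand[:-2]
    return ParsedInstruction(mnemonic, operand, extended, indirect, immediate, indexed)
```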
Once Program.TryParse returns, we have a complete representation of the input program in memory, and we know whether or not it obeys the syntactic rules for SIC/XE programs.
If the input appears to be a valid program, we proceed by creating an instance of Assembler.
This instance will contain all the state we need to produce listing and object files for our particular program (including a reference thereto).
The PassOne method operates on the Program as before in a line-by-line fashion.
Pass one begins by calling PreprocessLiterals, a routine that scans the program for literals and creates BYTE directives for them.
These directives are inserted into the program after the following LTORG, or else at the end of the program. They are assigned a line number of "0" to indicate that they did not appear in the original program but were generated by the assembler.
Eventually, these directives appear in the listing file at the relevant locations, but they do not interfere with the original line numbers.
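The literal handling just described might look roughly like the following sketch. The tuple layout (line number, label, mnemonic, operand), the use of the literal text as the generated label, and the choice to pool a repeated literal only once per pool are assumptions for illustration, not details taken from the program.

```python
def preprocess_literals(lines):
    """Insert generated BYTE directives after each LTORG and at the end.

    lines: list of (line_number, label, mnemonic, operand) tuples.
    """
    out = []
    pending = []  # literals waiting for the next literal pool

    def flush():
        for literal in pending:
            # Line number 0 marks a directive generated by the assembler.
            out.append((0, literal, "BYTE", literal[1:]))
        pending.clear()

    for line in lines:
        out.append(line)
        _, _, mnemonic, operand = line
        if operand.startswith("=") and operand not in pending:
            pending.append(operand)
        if mnemonic == "LTORG":
            flush()
    flush()  # no further LTORG: remaining literals go at the end
    return out
```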
Each line is inspected to determine whether it is an AssemblerDirective or an Instruction.
In the case that the Line is an AssemblerDirective, the mnemonic determines how to handle it.
RESB and RESW are handled almost identically, but nearly every other directive is a special case.
Instructions, meanwhile, are much simpler to handle here.
Overall, when PassOne returns, its two main jobs of setting each line's address and building a symbol table have been done.
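The two jobs of pass one can be sketched as a single loop over the lines. This is a simplified illustration under stated assumptions: the names pass_one, size_of, and FORMATS are hypothetical, the instruction table is a tiny subset, START takes a hexadecimal operand, and unknown mnemonics are silently given size 0, which a real assembler would reject earlier.

```python
# Instruction formats for a tiny illustrative subset of SIC/XE mnemonics.
FORMATS = {"LDA": 3, "STA": 3, "CLEAR": 2, "RSUB": 3, "FIX": 1}


def size_of(mnemonic, operand):
    """Number of bytes a line occupies in memory."""
    if mnemonic.startswith("+"):
        return 4  # extended (format 4) instruction
    if mnemonic in FORMATS:
        return FORMATS[mnemonic]
    if mnemonic == "RESB":
        return int(operand)
    if mnemonic == "RESW":
        return 3 * int(operand)  # a SIC/XE word is 3 bytes
    if mnemonic == "WORD":
        return 3
    if mnemonic == "BYTE":
        body = operand[2:-1]  # text between the quotes of C'...' or X'...'
        return len(body) if operand[0] == "C" else len(body) // 2
    return 0  # START, END, BASE, LTORG occupy no memory


def pass_one(lines):
    """lines: list of (label, mnemonic, operand). Returns (addresses, symtab)."""
    locctr = 0
    addresses = []
    symtab = {}
    for label, mnemonic, operand in lines:
        if mnemonic == "START":
            locctr = int(operand, 16)
        addresses.append(locctr)
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr
        locctr += size_of(mnemonic, operand)
    return addresses, symtab
```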
If pass one succeeds, pass two begins.
Pass two relies on helper methods to emit object code for SIC/XE instructions.
Similarly to pass one, pass two processes the program line by line, and how each line is treated depends first of all on whether it is an instruction or an assembler directive.
For instructions, pass two resolves symbol references using the symbol table generated in pass one.
For format 3 or 4 instructions, the helper method is given not only the source line but also the context that may be needed for assembly, including
-the program's base address
-the current value of the program counter
-the value of the most recent BASE directive, if any.
If one of these methods cannot assemble an instruction (say, because its reference to a symbol cannot be satisfied), an error will be reported and processing will halt.
The result of successful assembly of an instruction is an array of at most 4 bytes.
These bytes are appended to the current "segment" of the output binary. Segments will be discussed later.
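To show why the program counter and the BASE value are needed, here is a minimal sketch of assembling a format 3 instruction. It follows the standard SIC/XE layout (6-bit opcode, flags n i x b p e, 12-bit displacement) and tries PC-relative addressing before base-relative addressing. The function name and parameters are hypothetical, pc is assumed to be the address of the next instruction, and the order in which the actual helper tries the two modes is an assumption.

```python
def encode_format3(opcode, target, pc, base=None, n=1, i=1, x=0):
    """Assemble a format 3 instruction into 3 bytes.

    pc is the address of the next instruction; base is the value set by
    the most recent BASE directive, or None if there was none.
    """
    disp = target - pc
    if -2048 <= disp <= 2047:
        b, p = 0, 1
        disp &= 0xFFF  # two's complement in 12 bits
    elif base is not None and 0 <= target - base <= 4095:
        b, p = 1, 0
        disp = target - base
    else:
        raise ValueError("target not reachable in format 3")
    first = (opcode & 0xFC) | (n << 1) | i
    second = (x << 7) | (b << 6) | (p << 5) | (disp >> 8)  # e bit stays 0
    third = disp & 0xFF
    return bytes([first, second, third])
```

For example, an STL instruction (opcode 0x14) at address 0x0000 that refers to a symbol at 0x0030 sees pc = 0x0003, so the displacement is 0x02D and the result is the bytes 17 20 2D.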
BYTE and WORD directives are simple to process in pass two--just emit their arguments to the current segment as if they came from an instruction.
Meanwhile, handling RESB and RESW directives is non-trivial in pass two, which motivated the aforementioned "segments".
A segment represents a contiguous string of bytes that will be loaded into the SIC/XE machine's memory at a specified location.
An assembled binary that contains no RESB or RESW directives will have only one segment, but in general, a binary consists of a collection of segments.
When pass two encounters one of these directives, it creates a new segment and considers this the "current segment".
(The current segment is the one that is written to whenever object code is generated.)
The new segment has a base address computed by adding the base address of the previous segment to the size thereof, and then adding the number of bytes to be reserved for RESW or RESB.
The first segment's base address is given by the value of the START directive.
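The segment bookkeeping can be sketched as follows. The class names Segment and Binary and the methods emit and reserve are hypothetical; only the base-address arithmetic is taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base: int                                   # load address of the first byte
    data: bytearray = field(default_factory=bytearray)


class Binary:
    def __init__(self, start):
        # The first segment begins at the value of the START directive.
        self.segments = [Segment(start)]

    @property
    def current(self):
        return self.segments[-1]

    def emit(self, code):
        """Append object code to the current segment."""
        self.current.data.extend(code)

    def reserve(self, count):
        """Handle RESB/RESW: skip count bytes by opening a new segment."""
        prev = self.current
        self.segments.append(Segment(prev.base + len(prev.data) + count))
```

With this shape, a RESW of n words would call reserve(3 * n), since a SIC/XE word is 3 bytes, while RESB would pass its operand directly.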
The binary's entry point is given by the END directive. (Segments do not have entry points; only the binary does.)
If the input file does not specify it with an END directive, the entry point is taken to be the program's base address.
This is in keeping with the behavior of UNF's 'sicasm', though I opine a better idea would be to take it to be the first instruction's address.
Finally, when pass two is complete, an OBJ file is written for the output binary.
The routine to print this file simply iterates over each segment generated in pass two.
The result is an OBJ file just like the ones 'sicasm' produces, except that object code associated with each program line is not separated by a newline if it appears in the same segment.
This absence of newlines is immaterial to the loader.
The object file created will have the path of the input file with ".obj" appended.
The listing file created will have the path of the input file with ".lst" appended.