Advance Compiler 2004 Fall
2004/09/15
- parallelizing compilter ⇒ seq input, output parallel intrucstion
- vector compiler
- GCC 3.3.1
- make bootstrap
- 3 stage, do the same thing.
- Stage 1: old gcc → new gcc(1)
- Stage 2: new gcc(1) → new gcc(2)
- Stage 3: new gcc(2) → new gcc(3)
- Only if 2 & 3 same, do make all
- Generate machine-dependent files: gen*.o, feed *.md, output insn-*.[ch]
- cpp
- libbackend
- cc1, language frontend
- xgcc → (gcc)
- cpp0
- Internals
- C → tree → RTL → assembly (Chap 7,8)
- flow of parser in Chap6
- trace
# TO tag FROM line in file/text 1 1 toplev_main 35 main.c 2 1 do_compile 5414 toplev.c 3 1 compile_file 5384 toplev.c 4 1 lang_hooks 2130 toplev.c 5 1 c_common_parse_file 57 c-lang.c 6 1 yyparse 159 c-lex.c 7 1 fndef 357 c-parse.y 8 1 finish_function 403 c-parse.y
- requirement
- clchen teaches half
- other half, report one part of gcc (include binutils…)
- no exam, grades from in class report / indivisual
- homework
- modify main() in main.c XD
Installation GCC 3.3.1 on FreeBSD 6-CURRENT (Sep 18)
- cd gcc-3.3.1
- mkdir build && cd build (the ports way)
- ../configure –prefix=/home/rafan/tmp/gcc –disable-nls –with-system-zlib –enable-languages=c
- make bootstrap (native compiler)
- make install
It's very fast (about 10mins) on my X31 (1.4G). To define a compile-time variable, I modified the CFLAGS, {BOOT,STAGE1}_CFLAGS. (The 3-stage bootstrap is very complicated, normal make -DBLAH doesn't work.)
2004/09/22
- gcc.c
- use gccspec to read args
- fork cpp (gccmain.c)
- fork cc1 (main.c)
- frontend
- machine-independent, also language-dependent
- backend
- machine-dependent
- main.c:5380 backend_init()
- get token path
# TO tag FROM line in file/text 1 1 toplev_main 43 gcc/main.c 2 1 do_compile 5414 gcc/toplev.c 3 1 compile_file 5384 gcc/toplev.c 4 1 yyparse 162 yyparse (); 5 1 yyparse 162 yyparse (); 6 1 YYLEX 4658 gcc/c-parse.c 7 1 yylex 2102 gcc/c-parse.c 8 1 _yylex 5918 gcc/c-parse.c 9 1 c_lex 5810 gcc/c-parse.c 9 1 last_token 5810 gcc/c-parse.c 10 1 cpp_ttype 5425 gcc/c-parse.c 11 1 TTYPE_TABLE 148 TTYPE_TABLE 12 1 TTYPE_TABLE 148 TTYPE_TABLE 10 1 cpp_get_token 680 tok = cpp_get_token (parse_in);
- -fdump-transition-unit
- finish_function: tree fndecl = current_function_decl;
- finish_fname_decls, finish_stmt_tree modify tree
- nested! in finish_function
- c_expand_body generates rtl from tree
- history: flag_inline_trees (tree_inline.c), good point to modify tree here
- homework
- trace yyparse(), not an easy task
2004/09/29
- tree (Internal Chap 7)
- tree.def
- DEFTREECODE, macro (tree.h)
#define DEFTREECODE(SYM, STRING, TYPE, NARGS) SYM, enum tree_code { #include "tree.def" LAST_AND_UNUSED_TREE_CODE /* A convenient way to get a value for NUM_TREE_CODE. */ }; #undef DEFTREECODE
-
-
- c-lang.c
-
#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE, const char tree_code_type[] = { #include "tree.def" 'x', #include "c-common.def" }; #undef DEFTREECODE
- statement is NOT handled by tree.def (language-indepenent), it's in c-common.def
enum c_tree_code { C_DUMMY_TREE_CODE = LAST_AND_UNUSED_TREE_CODE, // "link" to tree_node !! #include "c-common.def" LAST_C_TREE_CODE };
-
- tree_node (tree.h), union of 13 kinds of struct
- array is put in the bottom in a structure. If we want more than 1, just allocate enough space.
struct tree_exp GTY(()) { struct tree_common common; int complexity; tree GTY ((special ("tree_exp"), desc ("TREE_CODE ((tree) &%0)"))) operands[1]; };
-
- tree.c
- make_node
- homeowork
- precedence in yacc
2004/10/06
- tree op0 is the right one, op1 is the left one
- build_stmt → make_node → ggc_alloc_tree build 'if' node
- c_expand_start_cond puts 'cond' body by IF_COND (if_stmt) = cond;
- c_common_truthvalue_conversion # convert true/false value here!
- optimization, -d x, generates file.\d\d.xxxx (dump RTL) or -d a
- RTL
- c_expand_body
- init_function_start
- expand_function_start
- walk_tree
- expand_main_function
- expand_stmt
- *lang_expand_function_end
- expand_function_end
- RTX (type # ptr_prev# ptr_next#
#define RTX_CODE enum rtx_code enum rtx_code { #define DEF_RTL_EXPR(ENUM, NAME, FORMAT, CLASS) ENUM , #include "rtl.def" /* rtl expressions are documented here */ #undef DEF_RTL_EXPR LAST_AND_UNUSED_RTX_CODE}; /* A convenient way to get a value for NUM_RTX_CODE. Assumes default enum value assignment. */
- rtl.c: rtx_format, rtl.h: rtl_def, macro, rtl.def
- homework
- precedence (yacc)
Each state contains a action * which is a linked list. In remove_conflict() called by make_parser() utilizes the precdence and association information which comes from symbol to determime whethere this action is applied first (SHIFT action)
In details, it uses a (action *) pref to record preferred action and (action *) p for current action. The loop is 2 level, first level is for each states, second is for each actions in that state. When it found an action with higher precedence, it modifies oldaction→suppressed to 2, i.e., the new action is dominated. (suppressed is used to indicated if this action is suppressed.)
yydefred, defreds()→ sole_reduction(), in this function, this handles '.' reduction, as the name ``sole'' implies.
-
- precedence (bison)
Oh, bison's source is more clear than yacc.
In main(), it calls conflicts_solve() → set_conflicts(state), it calls resolve_sr_conflict(state) (only if the action has prec) to solve shift/reduce conflicts by precedence and associcatively. Inside it, it first finds the current reduction action for this state then if a shift has higher prec. Once the solution is out, it calls log_resolution(cur_reduction, which action, shift/reduce/right/left/…) to record the solution and records curresponding things.
- read rtx
2004/10/13
- c_expand_body → expand_stmt(), tree travesel, linked list
- expand_expr enters machine dependent routines here: (there is a op table in assembly)
this_optab = ! unsignedp && flag_trapv && (GET_MODE_CLASS(mode) == MODE_INT) ? subv_optab : sub_optab; goto binop2; expand_binop()
- i386.md
- define_insn, a tree node → a RTL expr (one rtx), also describes RTL → assembly
- define_expand, a tree node → RTL exprs (many rtx)
- define_split, rtx → many rtx (optimization)
- deinfe_peephole2, many rtx → many rtx (optimization)
- define_insn_and_split, tree → one rtx, rtx → assembly (many rtx)
- homework
- i386.md (20K lines!!)
2004/10/20
- gensupport.c, readctl.c support for common routines for gen*
- gencodes.c:
- pattern matters!
- c_expand_body; rest_of_compilation() to do optimize timevar_push/pop to measure elapsed time in each optimization
2004/10/27
- Topics
- yacc? XD
- profiling?
- 1st level RTX
- INSN
- JUMP_INSN
- CALL_INSN
- BARRIER
- CODE_LABEL
- NOTE
- Basic block
- data structure is the key
- node, edge, traverse
- no branch inside!
2004/11/03
(define_expand "addsi3" [(parallel [(set (match_operand:SI 0 "nonimmediate_operand" "") (plus:SI (match_operand:SI 1 "nonimmediate_operand" "") (match_operand:SI 2 "general_operand" ""))) (clobber (reg:CC 17))])]
- peephole optimization
- modify particular pattern in a small window (2-3 instructions)
- constant folding, multiple → shift, null op
- use decision tree to match all patterns (i386 has 64 peephole2)
- use basic block as the small window
- recog.c, insn-recog.c, insn-emit.c
2004/11/10
- rtx → asm
- md describs how to convert a rtx pattern to a output asm pattern
2004/11/24
if conversion
- When?
- one of then or else part is empty
- condition can be decided in compile time
- same condition in nested if
- How?
- control dependency → data dependency
- Implemetation
- use ce_if_block struct to store then, else part
- convert until convengence
- find if → can convert? → do convert
sibling call
- tail call / tail recursion
- different to inline is that you don't do return ( return ( return … ) ) )
- use call_placeholder to decide to use sibiling call or tail recursion optimization
- parameter # is matter (since caller's frame size have to be larger than callee's)
- RTFS: toplev.c: static const lang_independent_options f_options[] =
2004/12/01
Loop Optimization
- terms
- basic induction variable, X = X + c or X = X - c
- induction variable A = a * B + b, B is basic induction variable.
- strength-reduction
- loop unrolling
Control Flow Graph (CFG) and Jump Optimization
- copy loop header, we can save one bb
- no new DS, lots of condition (no pattern, dirty code)
2004/12/08 profiling
- about 60mins
- topics (tentative)
- what's profiling
- how to use / example
- Chapter 9, implementation (rtx?)
- import point
- data structure
- before/after rtx !!
2004/12/29
- porting to .Net
- generate CIL code (intermediate language)