(字句解析ツール)
http://www.sw.it.aoyama.ac.jp/2017/Compiler/lecture5.html
© 2005-17 Martin J. Dürst 青山学院大学
flex
flex
exercisesBring your notebook PC (with flex
, bison
,
gcc
, make
, diff
, and m4
installed and usable)
These all have the same power, describe/recognize regular languages, and can be converted into each other.
getNextToken()
)Choices:
flex
)flex
lex
, with various extensionslex
: Lexical analyzer generator available with Unix
(creator: Mike Lesk)bison
(reminder)
ls
: list the files in a directorymkdir
: create a new directorycd
: change (working) directorypwd
: print (current) working directorygcc
: compile a C program./a
: execute a compiled program(reminder)
C:\cygwin
C:\cygwin
can be reachedC:\cygwin\home\user1
/home/user1
with
pwd
C:\cygwin
:cd /cygdrive/c
(change directory to the C drive of MS
Windows)flex
Usage Stepsflex
(a (f)lex file), with the
extension .l
(example: test.l
)flex
to convert test.l
to a C program:flex test.l
lex.yy.c
)lex.yy.c
with a C compiler (maybe together with
other files)flex
Call the yylex()
function once from the main
function
Repeatedly call yylex()
from the parser, and return a token
with return
In today's exercises and homework, we will use 1.
In the second half of this cours, we will use 2. together with
bison
flex
Input Formatint num_lines = 0, num_chars = 0; %% \n ++num_lines; ++num_chars; . ++num_chars;
%% int main(void) { yylex(); printf( "# of lines = %d, # of chars = %d\n", num_lines, num_chars ); } int yywrap () { return 1; }
flex
Exercise 1Process and execute the flex
program on the previous slide
test.l
and copy the contents of the previous
slide to the filelex.yy.c
withflex test.l
a.exe
withgcc lex.yy.c
./a <file
flex
Input Formatdeclarations,... (C program language)
declarations,... (C program language)
%%
regexp statement (C program language)
regexp statement (C program language)
%%
functions,... (C program language)
functions,... (C program language)
flex
Input FormatMixture of flex
-specific instructions and C program
fragments
Three main parts, separated by two %%
:
#include
s, definition and initialization of global
variables, definition of regular expression componentsNewlines and indent can be significant!
flex
Caution: A manual is not a novel.
flex
(lex.yy.c
)flex
for different inputsflex
(flex
also uses
lexical analyzis, which is written using flex
)flex
-v
)flex
flex
:
.l
file.l
fileflex
Processes its Input.l
fileflex
Exercise 2The table below shows how to escape various characters in XML
Create a program in flex
(for this conversion, and) for the
reverse conversion
Raw text | XML escapes |
---|---|
' |
' |
" |
" |
& |
& |
< |
< |
> |
> |
flex
Exercise 3: Detect NumbersCreate a program with flex
to output the input without changes,
except that numbers are enclosed with >>>
and
<<<
Example input:
abc123def345gh
Example output:
abc>>>123<<<def>>>345<<<gh
Hint: The string recognized by a regular expression is available with the
variable yytext
flex
Exercise 4 (Homework):Deadline: May 25, 2017 (Thursday), 19:00
Where to submit: Box in front of room O-529 (building O, 5th floor)
(start early, so that you can ask questions on May 19)
Format:
.l
fileCollaboration: The same rules as for Computer Practice I (計算機実習 I) apply
flex
Exercise 4 (Homework):Using flex
, create a program for lexical analysis of Ruby
programs. Output the tokens below on a single line.
kind | explanation | example | output (example) |
instance variable | starts with @ | @abc |
Instance variable: @abc |
class variable | starts with @@ | @@def |
Class variable: @@def |
method name | ends with single !, ?, or = | nil? |
Method: nil? |
method or local variable | starts with lower-case letter | my_var |
Method or variable: my_var |
global variable | starts with $ | $global |
Global variable: $global |
constant | starts with upper-case letter | String |
Constant: String |
integer (decimal) | starts with 0d, 0D, or any digit except 0 | 1_234 |
Decimal Integer: 1234 |
integer (octal) | starts with 0 | 0765 |
Octal Integer: 501 |
integer (hexadecimal) | starts with 0x or 0X | 0xAB_D3 |
Hex Integer: 43987 |
integer (binary) | starts with 0b or 0B | 0b1010_1100 |
Binary Integer: 172 |
comment | from # to end of line | # hello |
Comment: hello |
[error] | all other single characters except whitespace | \ |
Error: '\' |
Advanced problem: Deal with other Ruby lexical tokens (e.g. operators, strings, regular expressions, floating point numbers,...).
flex
.l
file, flex
may
still run without errors
Solution: Always start with flex
:
> flex file.l && gcc lex.yy.c && ./a
<input.txt
.l
file (before the first
%%
), C program fragments have to be indented by at least one
space int yywrap () { return 1; }
yytext
putchar(yytext[0]);
(output the first character of
the matched text)\
or quoted within ""
There will be a minitest (ca. 30 minutes) next week. Please prepare well.