(字句解析ツール)
http://www.sw.it.aoyama.ac.jp/2018/Compiler/lecture5.html
© 2005-18 Martin J. Dürst 青山学院大学
flex
flex
exercisesConstruct the state transition diagram for the NFA corresponding to the
following grammar
S → εA | bB | cB | cC, A → bC | aD | a | cS, B → aD | aC | bB | a, C
→εA | aD | a
(Caution: In right linear grammars, ε is not allowed except in the rule S →
ε)
(Hint: Create a new accepting state F)
Convert the result of last week's homework 3 (after rewriting, see this
handout) to a right linear grammar
Construct the state transition diagram for the regular expression
ab|c*d
(write down both the result of the procedure explained during this lecture
(with all ε transitions) as well as a version that is as simple as
possible)
Bring your notebook PC (with flex
, bison
,
gcc
, make
, diff
, and m4
installed and usable)
These all have the same power, describe/recognize regular languages, and can be converted into each other.
getNextToken()
)Choices:
flex
)flex
lex
, with various extensionslex
: Lexical analyzer generator available with Unix
(creator: Mike Lesk)bison
(reminder)
ls
: list the files in a directorymkdir
: create a new directorycd
: change (working) directorypwd
: print (current) working directorygcc
: compile a C program./a
: execute a compiled program(reminder)
C:\cygwin
C:\cygwin
can be reachedC:\cygwin\home\user1
/home/user1
with
pwd
C:\cygwin
:cd /cygdrive/c
(change directory to the C drive of MS
Windows)flex
Usage Stepsflex
(a (f)lex file), with the
extension .l
(example: test.l
)flex
to convert test.l
to a C program:flex test.l
lex.yy.c
)lex.yy.c
with a C compiler (maybe together with
other files)flex
Call the yylex()
function once from the main
function
Repeatedly call yylex()
from the parser, and return a token
with return
In today's exercises and homework, we will use 1.
In the second half of this course, we will use 2. together with
bison
flex
Input Formatint num_lines = 0, num_chars = 0; %% \n ++num_lines; ++num_chars; . ++num_chars;
%% int main(void) { yylex(); printf( "# of lines = %d, # of chars = %d\n", num_lines, num_chars ); } int yywrap () { return 1; }
flex
Exercise 1Process and execute the flex
program on the previous slide
test.l
and copy the contents of the previous
slide to the filelex.yy.c
withflex test.l
a.exe
withgcc lex.yy.c
./a <file
flex
Input Formatdeclarations,... (C program language)
declarations,... (C program language)
%%
regexp statement (C program language)
regexp statement (C program language)
%%
functions,... (C program language)
functions,... (C program language)
flex
Input FormatMixture of flex
-specific instructions and C program
fragments
Three main parts, separated by two %%
:
#include
s, #define
sNewlines and indent can be significant!
flex
Caution: A manual is not a novel.
flex
(lex.yy.c
)flex
for different inputsflex
(flex
also uses
lexical analysis, which is written using flex
)flex
-v
)flex
Works.l
file).l
fileflex
Works.l
fileflex
Exercise 2The table below shows how to escape various characters in XML
Create a program in flex
(for this conversion, and) for the
reverse conversion
Raw text | XML escapes |
---|---|
' |
' |
" |
" |
& |
& |
< |
< |
> |
> |
flex
Exercise 3: Detect NumbersCreate a program with flex
to output the input without changes,
except that numbers are enclosed with >>>
and
<<<
Example input:
abc123def345gh
Example output:
abc>>>123<<<def>>>345<<<gh
Hint: The string recognized by a regular expression is available with the
variable yytext
flex
Exercise 4 (Homework):Deadline: May 31, 2017 (Thursday), 19:00
Where to submit: Box in front of room O-529 (building O, 5th floor)
(start early, so that you can ask questions on May 25)
Format:
.l
fileCollaboration: The same rules as for Computer Practice I (計算機実習 I) apply
flex
Exercise 4 (Homework):ISO 8601 is the standard format for dates and times (see Wikipedia article).
Detect dates and times in ISO 8601 format, and output them one line per item.
As an example, if you find 2018-05-18
in the input, then output it
as:
year: 2018, month: 5, day: 18
Make sure your program ignores impossible dates (e.g. 2018-02-29, 2018-04-31, 2018-05-32,...). NEW for future: USE regular expressions to distinguish impossible dates; USE C ONLY for output.
Detect all the date and time formats in the table below.
形式 | 項目 | 実例 |
YYYY-MM-DD | year/month/day | 2014-05-09 |
YYYY-MM | year/month | 2014-05 |
YYYY | year | 2014 |
YYYY-DDD | year/day | 2014-094 |
YYYY-Www-D | year/week/weekday | 2014-W15-3 |
HH:MM:SS | hour/minute/second | 12:23:47 |
HH:MM | hour/minute | 12:23 |
Advanced exercise (発展問題、加点): Also detect other ISO 8601 formats (combinations of dates and times, times with timezones, durations)
flex
.l
file, flex
may
still run without errors
Solution: Always start with flex
:
> flex file.l && gcc lex.yy.c && ./a
<input.txt
.l
file (before the first
%%
), C program fragments have to be indented by at least one
space int yywrap () { return 1; }
yytext
putchar(yytext[0]);
will output the first character
of the matched text\
or quoted within ""
There will be a minitest (ca. 30 minutes) next week. Please prepare well!