Lab 10 - RegExp

Lab Session #10 #

Agenda #

  1. RegExp to (N)FSA: Thompson’s Construction
  2. FSA to RegExp: Kleene’s Algorithm

From Regular Expression to (N)FSA. #

The Thompson’s Construction #

  • It is an algorithm for transforming a regexp into an equivalent (N)FSA.

  • I This (N)FSA can be used to match strings against the regular expression.


The Algorithm #

  • The algorithm works recursively by splitting an expression into its constituent subexpressions, from which the (N)FSA will be constructed using a set of rules (see below).

Rule: the Empty Expression #

The empty-expression \(ϵ\) is converted to:

1


Rule: a symbo #

A symbol a of the input alphabet is converted to

2


Rule: Concatenation Expression #

The concatenation expression st is converted to

3

N(s) and N(t) are the (N)FSA of the subexpression s and t, respectively.


Rule: Union Expression #

The union expression s|t is converted to

4

N(s) and N(t) are the (N)FSA of the subexpression s and t, respectively.


Rule: Kleene Star Expression #

The Kleene star expression \(s ^∗\) is converted to

5

N(s) is the (N)FSA of the subexpression s.


Example 1 #

Build a (N)FSA for \((1 | 01)^*\)

Solution:


Exercises #

Build a (N)FSA for:

  1. \(01^*\)
  2. \((0 | 1)01\)
  3. \(00(0 | 1)^*\)

Solution. #


FSA to RegExp #

Kleene’s Algorithm: from FSA to Regular Expression #

It transforms a given finite state automaton (FSA) into a regular expression. Description: Given an FSA \( M = (Q, A, δ, q_0, F)\) , with

\(Q = {q0, . . . , qn}\) its set of states, the algorithm computes

  • the sets \(R^k_{ij}\) of all strings that take M from state \(q_{ij}\) to \(q_j\) without going through any state numbered higher than k.

  • each set \(R^k_{ij}\) represented by a regular expression.

  • the algorithm computes them step by step for \(k = −1, 0, . . . , n.\)

  • since there is no state numbered higher than n, the regular expression \(R^n_{0j}\) represents the set of all strings that take M from its start state \(q_i\) to \(q_j\) .

  • If \(F = {q_i , . . . , q_f }\) is the set of accept states, the regular expression \(R^ n _{0i} | . . . | R^ n _{0f}\) represents the language accepted by M.


Kleene’s Algorithm #

The initial regular expressions, for k = −1, are computed as

  • \(R^ {−1} _{ij} = a_1 | . . . | a_m\) if \(i = j\) ,where \(δ (q_i , a_1) = . . . = δ (q_i , a_m) = q_j\)

  • \(R^ {−1} _{ij} = a_1 | . . . | a_m|ϵ\) if \(i = j\) ,where \(δ (q_i , a_1) = . . . = δ (q_i , a_m) = q_j\)

After that, in each step the expressions \(R^k_il\) are computed from the previous ones by

\(R^k_{ij}\) = \(R^{k-1}_{ik}\) \((R^{k-1}_{kk})^*\) \(R^{k-1}_{kj}\) | \(R^{k-1}_{ij}\)


Kleene’s Algorithm: Example #

Give a regular expression that describes the language accepted by:

11

Initial Regular Expression (Step -1)

\(R^{-1}_{00}\) = 0 | \(ϵ\)

\(R^{-1}_{01}\) = 1

\(R^{-1}_{10}\) = 0

\(R^{-1}_{11}\) = \(ϵ\)

Step 0

\(R^{0}_{00}\) = \((0 |ϵ)\) \((0 |ϵ)^*\) \((0 |ϵ)\) | \((0 |ϵ)\) = \(0^*\)

\(R^{0}_{01}\) = \((0 |ϵ)\) \((0 |ϵ)^*\) 1 |1= \(0^*1\)

\(R^{0}_{10}\) = 0 \((0 |ϵ)^*\) \((0 |ϵ)\) | 0 = \(00^*\)

\(R^{0}_{11}\) = 0 \((0 |ϵ)^*\) 1 | \(ϵ\) = \(00^*1\) | \(ϵ\)

step 1

\(R^1_{00}\) = \((0^*1)\) \((00^*1| ϵ)^*\) \((00^*)|0^*\) = \((0^*1)\) \((00^*1)^*\) \((00^*)|0^*\)

Do we need to compute the rest? (i.e \(R^{1}_{01},R^{1}_{10},R^{1}_{11}\) )


Exercises #

Give a regular expression that describes the language accepted by:

16

solution #


Homework #

  1. Give a regular expression that describes the language accepted by:

18

  1. Build a (N)FSA for: \((11)^*(0 | 1)\)