
Regular Definitions for the Python Language and Their Implementation


edisoncgh, 2 years ago

Tags: python, compiler principles

Regular Definitions for Python Identifiers and Operators

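0. Problem statement:

Write regular definitions for the identifiers, operators, and punctuation of the Python language, draw the state transition diagram, and implement it in a program.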

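1. Common rules for Python identifiers

Naming rules
1. A name consists of the 26 English letters (upper or lower case), the digits 0-9, and the underscore.
2. A name cannot start with a digit, e.g. 3ab = 1  # error
3. A name cannot be a keyword or reserved word, but it may contain one.
4. Names are case-sensitive and have no length limit, e.g. totalNum = 10 and n = 90
5. A name cannot contain spaces, e.g. a b = 90  # error
Naming conventions
1. Package names: aaa.bbb.ccc  # e.g. com.ctgu.cn
2. Class and interface names: XxxYyyZzz
3. Variable and method names: xxxYyyZzz
4. Constant names: XXX_YYY_ZZZ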
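The naming rules above can be checked mechanically. The following is a minimal sketch, assuming ASCII-only names as in rule 1; `is_valid_name` is a hypothetical helper introduced here for illustration, while `str.isascii`, `str.isidentifier`, and `keyword.iskeyword` come from the standard library.

```python
import keyword

def is_valid_name(name):
    """Return True if `name` satisfies naming rules 1-5 (ASCII-only assumption)."""
    # isidentifier() rejects empty strings, leading digits, and spaces;
    # isascii() restricts the alphabet to the characters named in rule 1.
    if not (name.isascii() and name.isidentifier()):
        return False
    # Rule 3: a keyword on its own is rejected, but a name may contain one.
    return not keyword.iskeyword(name)

for candidate in ["totalNum", "3ab", "a b", "class", "class_name"]:
    print(candidate, is_valid_name(candidate))
```

Note that `str.isidentifier` on its own would also accept non-ASCII letters, which is why the sketch adds the `isascii` check to stay within rule 1.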

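2. Regular definitions for Python identifiers

Identifiers

// The 26 letters in upper and lower case, plus the underscore
letter -> A|B|C|D|E|...|a|b|c|d|e|...|_
// The ten digits 0-9
digit -> 0|1|2|...|9
// A non-empty string of unbounded length that starts with a letter or underscore and consists of letters, digits, and underscores
id -> letter(letter|digit)*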
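The `id` definition translates almost symbol for symbol into a regular expression. A minimal sketch using the standard `re` module; the variable names `letter`, `digit`, and `id_pattern` simply mirror the definition and are not part of any library.

```python
import re
import string

# letter -> A|B|...|a|b|...|_
letter = "[" + string.ascii_letters + "_]"
# digit -> 0|1|...|9
digit = "[" + string.digits + "]"
# id -> letter(letter|digit)*
id_pattern = re.compile(f"{letter}({letter}|{digit})*")

for text in ["_tmp1", "totalNum", "1abcd", ""]:
    print(repr(text), id_pattern.fullmatch(text) is not None)
```

`fullmatch` requires the whole string to be an identifier; `id_pattern.match` would instead accept the longest identifier prefix, which is what a scanner typically wants.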

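Unsigned numbers

// A single digit is any one of 0-9
digit -> 0|1|2|...|9
// An unsigned integer: a non-empty string of digits of unbounded length
digits -> digit digit*
// The optional fraction is either empty or a "." followed by digits
optionalFraction -> (. digits)|ε
// The optional exponent is either empty or an "E", an optional sign, and then digits
// Ex means 10^x
optionalExponent -> (E(+|-|ε)digits)|ε
// An unsigned number is the concatenation of the definitions above
number -> digits optionalFraction optionalExponent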
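The unsigned-number definition can be assembled the same way, one sub-pattern per rule. This is a sketch under two stated assumptions: the fraction is optional as a whole (a "." must be followed by digits), and the sign inside the exponent may be omitted. The variable names mirror the definition and are illustrative only.

```python
import re

# digits -> digit digit*
digits = r"[0-9]+"
# optionalFraction: either empty or "." followed by digits (assumption)
optional_fraction = rf"(\.{digits})?"
# optionalExponent: either empty or "E", an optional sign, then digits (assumption)
optional_exponent = rf"(E[+-]?{digits})?"
# number -> digits optionalFraction optionalExponent
number = re.compile(digits + optional_fraction + optional_exponent)

for text in ["42", "3.14", "6.02E23", "1E-9", ".5", "1.", "1E"]:
    print(repr(text), number.fullmatch(text) is not None)
```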

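Operators

// Addition, including the increment form ++
plus -> +(+|ε)
// Subtraction, including the decrement form --
minus -> -(-|ε)
// Multiplication, including the power operator **
times -> *(*|ε)
// Division
divide -> /
// Relational symbols
relation -> =|>|<
// Logical operators (the last alternative is the literal "|" character)
logical -> &|!|^|"|"
// Brackets
bracket -> )|(|]|[|}|{
// Modulo
mod -> %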

Punctuation

punctuation -> :|,|?|.
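The operator, bracket, and punctuation definitions can be sketched as three regular expressions. The grouping into `OPERATOR`, `BRACKET`, and `PUNCTUATION` and the helper `classify_symbol` are hypothetical names chosen for this example; the two-character forms `**`, `++`, and `--` are listed first so that they win over their one-character prefixes.

```python
import re

# Two-character operators come first so "**" is not read as two "*".
OPERATOR = re.compile(r"\*\*|\+\+|--|[+\-*/%=<>&!^|]")
BRACKET = re.compile(r"[()\[\]{}]")
PUNCTUATION = re.compile(r"[:,?.]")

def classify_symbol(text):
    """Return 'operator', 'bracket', or 'punctuation', or None if no class fits."""
    classes = (("operator", OPERATOR), ("bracket", BRACKET), ("punctuation", PUNCTUATION))
    for label, pattern in classes:
        if pattern.fullmatch(text):
            return label
    return None

for symbol in ["**", "+", "|", "[", "?", "a"]:
    print(symbol, classify_symbol(symbol))
```

Ordering the alternatives longest-first matters because Python's `re` takes the first alternative that matches at a position, not the longest one.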

3. State transition diagram for the regular definitions

[Figure: state transition diagram]
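A state transition diagram for identifiers and unsigned numbers can also be written down as a transition table. The sketch below is one possible encoding, not a transcription of the figure: the state names (`start`, `id`, `int`, `dot`, `frac`, `exp`, `sign`, `expd`) and the helpers `char_class` and `recognize` are assumptions made for illustration, and the exponent sign is treated as optional.

```python
import string

def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch == "E":
        return "E"  # kept separate: a letter in names, the exponent marker in numbers
    if ch in string.ascii_letters or ch == "_":
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch == ".":
        return "dot"
    if ch in "+-":
        return "sign"
    return "other"

# (current state, input class) -> next state; a missing entry means "reject".
TRANSITIONS = {
    ("start", "letter"): "id", ("start", "E"): "id", ("start", "digit"): "int",
    ("id", "letter"): "id", ("id", "E"): "id", ("id", "digit"): "id",
    ("int", "digit"): "int", ("int", "dot"): "dot", ("int", "E"): "exp",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac", ("frac", "E"): "exp",
    ("exp", "sign"): "sign", ("exp", "digit"): "expd",
    ("sign", "digit"): "expd",
    ("expd", "digit"): "expd",
}

# Accepting states and the token class each one reports.
ACCEPTING = {"id": "id", "int": "number", "frac": "number", "expd": "number"}

def recognize(text):
    """Run the automaton over `text`; return 'id', 'number', or None."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return ACCEPTING.get(state)

for text in ["abc_1", "E5", "1abcd", "3.14", "2E-3", "2E"]:
    print(repr(text), recognize(text))
```

Keeping the table as data means that adding a token class only requires new entries, not new control flow.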

4. Implementing the state transition diagram (in Python)

Program flowchart
Implementation

import re

# Path to the input file
dir = "test.in"

# Keywords
keywords = [
    'False', 'True',
    'finally','class','return','def','global','yield','lambda',
    'print','sum','int','str',
    'isNone','and','not','with','as','or','except','in',
    'continue','for','from','while','break',
    'elif','if','else',
    'pass','import','assert','except'
]

# Symbols
symbols = [
    '+','-','*','/',
    '&','|','!','^',
    '>','<','=',
    '(',')','[',']','{','}',
    ',','?',':','.',
    '\'', '\"'
]

# State codes
keywords_size = len(keywords)
symbols_size = len(symbols)
is_variable = keywords_size + symbols_size + 1
is_constant = is_variable + 1
illegal_word = is_constant + 1

def process_irrelevant_words(sourcefile):
    """Read the whole file and split it into whitespace-separated words."""
    # split() with no argument treats newlines, tabs, and repeated spaces alike
    # and never yields empty strings.
    return sourcefile.read().split()

def classification_words(wordsDict):
    """Classify each word and return a list of formatted token strings."""
    result = []
    for word in wordsDict:
        if word in keywords:
            result.append(standardize(word, keywords.index(word)))
        elif word in symbols:
            result.append(standardize(word, keywords_size + symbols.index(word)))
        else:
            # The word mixes several tokens: peel off embedded keywords first.
            for key in keywords:
                while key in word:
                    result.append(standardize(key, keywords.index(key)))
                    word = word.replace(key, "", 1)

            # Then peel off embedded symbols.
            for symbol in symbols:
                while symbol in word:
                    result.append(standardize(symbol, keywords_size + symbols.index(symbol)))
                    word = word.replace(symbol, "", 1)

            # What remains is a name if it contains a letter or underscore,
            # otherwise a constant.
            if re.search(r'[A-Za-z_]', word):
                if word[0].isdigit():
                    result.append(standardize(word, illegal_word))
                else:
                    result.append(standardize(word, is_variable))
            elif word != '':
                result.append(standardize(word, is_constant))

    return result

def standardize(letters, state):
    """Format a token as "kind:(state, text)" according to its state code."""
    if state < keywords_size:
        return f"keyword:({state}, {letters})"
    elif state < symbols_size + keywords_size:
        return f"symbol:({state}, {letters})"
    elif state == is_variable:
        return f"variable:({state}, {letters})"
    elif state == is_constant:
        return f"constant:({state}, {letters})"
    else:
        return letters + " is an illegal token!"

def print_result(res):
    for el in res:
        print(el)

def main():
    with open(dir) as file:
        wd = process_irrelevant_words(file)
        res = classification_words(wd)
        print_result(res)

if __name__ == '__main__':
    main()
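Because the program above strips keyword substrings out of longer words, a name that merely contains a keyword (which naming rule 3 allows) gets split apart. As an alternative, here is a minimal sketch of a scanner built on a single combined regular expression with named groups, a pattern commonly used with `re.finditer`. It is a different technique from the word-splitting approach above, not the author's method. The names `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical, keywords are taken from the standard `keyword` module rather than the list above, and the quote characters are left out of the symbol set for brevity.

```python
import keyword
import re

# Order matters: the first alternative that matches at a position wins.
TOKEN_SPEC = [
    # A number must not run straight into a letter, digit, or underscore.
    ("number", r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?(?![A-Za-z0-9_])"),
    # A name that starts with a digit, such as 1abcd, is reported as illegal.
    ("illegal", r"[0-9]+[A-Za-z_][A-Za-z0-9_]*"),
    ("name", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("symbol", r"\*\*|[-+*/%&|!^<>=()\[\]{},?:.]"),
    ("skip", r"\s+"),
    ("unknown", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, text) pairs for the tokens found in `text`."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "skip":
            continue  # whitespace only separates tokens
        if kind == "name" and keyword.iskeyword(value):
            kind = "keyword"
        tokens.append((kind, value))
    return tokens

for line in ["1abcd", "for i in abcde", "index", "a**2"]:
    print(tokenize(line))
```

Since names are matched whole before the keyword check, `index` stays a single name even though it contains `in`.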

Testing
Test input

1abcd
import
for i in abcde
print(x)
if:
elif:
else
(a > b) ? a = b : b = a

Test output

1abcd is an illegal token!
variable:(56, abcde)
keyword:(8, print)
symbol:(43, ()
symbol:(44, ))
variable:(56, x)
keyword:(26, if)
symbol:(51, :)
keyword:(25, elif)
symbol:(51, :)
keyword:(27, else)
symbol:(43, ()
variable:(56, a)
symbol:(40, >)
symbol:(44, ))
variable:(56, b)
symbol:(50, ?)
variable:(56, a)
symbol:(42, =)
variable:(56, b)
symbol:(51, :)
variable:(56, b)
symbol:(42, =)
variable:(56, a)
