
Regular Definitions for Python Identifiers and Operators

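0. Problem statement:

Write regular definitions for Python identifiers, operators, and punctuation, draw the state transition diagram, and implement it in a program.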

I. Common conventions for Python identifiers

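  1. Naming rules
    1. An identifier consists of upper- and lowercase English letters, the digits 0-9, and underscores.
    2. It cannot start with a digit, e.g. 3ab = 1  # error
    3. It cannot be a keyword or reserved word, but it may contain one (e.g. format contains for).
    4. It is case-sensitive and has no length limit, e.g. totalNum = 10; n = 90
    5. It cannot contain spaces, e.g. a b = 90  # error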
  2. Naming conventions
    1. Package names: aaa.bbb.ccc (all lowercase), e.g. com.ctgu.cn
    2. Class names: XxxYyyZzz
    3. Variable and function names: xxx_yyy_zzz
    4. Constant names: XXX_YYY_ZZZ
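
The naming rules above can be checked mechanically. The following is a minimal sketch using only the standard library; the helper name `is_valid_identifier` is hypothetical. Note that `str.isidentifier()` also accepts non-ASCII letters, which is broader than the ASCII-only rule stated above.

```python
import keyword


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` is usable as a Python identifier."""
    # str.isidentifier() checks the lexical shape (no leading digit, no spaces);
    # keyword.iskeyword() rejects reserved words such as `for` or `class`.
    return name.isidentifier() and not keyword.iskeyword(name)


if __name__ == "__main__":
    for candidate in ["totalNum", "3ab", "for", "format", "a b"]:
        print(candidate, is_valid_identifier(candidate))
```

Here `format` passes even though it contains the keyword `for`, which matches rule 3.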

II. Regular definitions for Python identifiers

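  1. Identifiers
// The 26 letters in upper and lower case, plus the underscore
letter -> A|B|C|D|E|...|a|b|c|d|e|...|_
// The ten digits 0-9
digit -> 0|1|2|...|9
// A non-empty string of unbounded length that starts with a letter or underscore
// and consists of letters, digits, and underscores
id -> letter(letter|digit)*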
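  2. Unsigned numbers
// A single digit, any one of 0-9
digit -> 0|1|2|...|9
// An unsigned integer: a non-empty digit string of unbounded length
digits -> digit digit*
// The optional fractional part: a decimal point "." followed by digits, or empty
optionalFraction -> (. digits)|ε
// The optional exponent part: "E", then an optional sign, then digits, or empty
// Ex means 10^x
optionalExponent -> (E(+|-|ε)digits)|ε
// An unsigned number is the concatenation of the definitions above
number -> digits optionalFraction optionalExponent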
  3. Operators
// Addition (Python has no ++ increment operator)
plus -> +
// Subtraction (Python has no -- decrement operator)
minus -> -
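// Multiplication, including the power operator **
times -> *(*|ε)
// Division
divide -> /
// Relational symbols
relation -> =|>|<
// Logical operators; the quoted "|" is the literal bar character, not alternation
logical -> &|!|^|"|"
// Brackets
bracket -> )|(|]|[|}|{
// Modulo
mod -> %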
  4. Punctuation
punctuation -> :|,|?|.
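
For reference, the identifier and unsigned-number definitions can be written as Python `re` patterns. This is a sketch under two assumptions that should be checked against the intended grammar: the fractional part is a dot followed by at least one digit, and the sign in the exponent is optional. The names `ID_RE`, `NUMBER_RE`, and `matches` are illustrative only.

```python
import re

# id -> letter (letter | digit)*, where letter includes the underscore
ID_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# number -> digits optionalFraction optionalExponent
#   optionalFraction: "." followed by digits, or empty
#   optionalExponent: "E", optional sign, digits, or empty
NUMBER_RE = re.compile(r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True only if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["_abc1", "3ab", "12", "3.14", "6.02E+23", "1."]:
        print(text, matches(ID_RE, text), matches(NUMBER_RE, text))
```

`fullmatch` is used rather than `match` so that a valid prefix followed by junk (for example `1.`) is rejected.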

III. State transition diagram for the regular definitions

[Figure: state transition diagram]
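
As a rough illustration of how a state transition diagram maps to code, here is a minimal table-driven sketch. The state names (`start`, `in_id`, `in_number`) and helper functions are hypothetical and are not taken from the original diagram. It covers identifiers and unsigned integers only.

```python
def char_class(ch: str) -> str:
    """Map a character to the input class used by the transition table."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# (current state, input class) -> next state; missing entries mean "reject"
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_number",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_number", "digit"): "in_number",
}

# Accepting states and the token kind each one recognizes
ACCEPTING = {"in_id": "identifier", "in_number": "number"}


def classify(text: str):
    """Run the automaton over `text`; return the token kind or None."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None  # no transition defined: the string is rejected
    return ACCEPTING.get(state)


if __name__ == "__main__":
    for text in ["abc1", "_x", "123", "1abc"]:
        print(text, classify(text))
```

Each edge of a diagram becomes one entry in `TRANSITIONS`, so extending the sketch to fractions or exponents would mean adding states and edges rather than changing `classify`.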

IV. Implementing the state transition diagram (in Python)

  1. Program flowchart
    [Figure: program flowchart]
  2. Implementation
import re

# Path to the input file
dir = "test.in"

# Keywords
keywords = [
    'False', 'True',
    'finally','class','return','def','global','yield','lambda',
    'print','sum','int','str',
    'isNone','and','not','with','as','or','except','in',
    'continue','for','from','while','break',
    'elif','if','else',
    'pass','import','assert'
]

# Symbols
symbols = [
    '+','-','*','/',
    '&','|','!','^',
    '>','<','=',
    '(',')','[',']','{','}',
    ',','?',':','.',
    '\'', '\"'
]

# State codes
keywords_size = len(keywords)
symbols_size = len(symbols)
is_variable = keywords_size + symbols_size + 1
is_constant = is_variable + 1
illegal_word = is_constant + 1

def process_irrelevant_words(sourcefile):
    # Split the source text on any whitespace (spaces, tabs, newlines);
    # split() with no argument also drops empty strings
    return sourcefile.read().split()

def classification_words(wordsDict):
    # Classify each word as a keyword, symbol, variable, constant, or illegal token
    result = []
    for word in wordsDict:
        if word in keywords:
            result.append(standardize(word, keywords.index(word)))
        elif word in symbols:
            result.append(standardize(word, keywords_size + symbols.index(word)))
        else:
            for key in keywords:
                while key in word:
                    result.append(standardize(key, keywords.index(key)))
                    word = word.replace(key, "", 1)

            for symbol in symbols:
                while symbol in word:
                    result.append(standardize(symbol, keywords_size + symbols.index(symbol)))
                    word = word.replace(symbol, "", 1)

            if re.search(r'[A-Za-z_]', word):
                if word[0].isdigit():
                    result.append(standardize(word, illegal_word))
                else:
                    result.append(standardize(word, is_variable))
            else:
                if word != '':
                    result.append(standardize(word, is_constant))

    return result

def standardize(letters, state):
    if state < keywords_size:
        return "keyword:" + '(' + str(state) + ', ' + letters + ')'
    elif state < symbols_size + keywords_size:
        return "symbol:" + '(' + str(state) + ', ' + letters + ')'
    elif state == is_variable:
        return "variable:" + '(' + str(state) + ', ' + letters + ')'
    elif state == is_constant:
        return "constant:" + '(' + str(state) + ', ' + letters + ')'
    else:
        return letters + " is an illegal token!" 

def print_result(res):
    for el in res:
        print(el)

def main():
    with open(dir) as file:
        wd = process_irrelevant_words(file)
        res = classification_words(wd)
        print_result(res)

if __name__ == '__main__':
    main()
  3. Test results
    Test input
1abcd
import
for i in abcde
print(x)
if:
elif:
else
(a > b) ? a = b : b = a

Test output

1abcd is an illegal token!
variable:(56, abcde)
keyword:(8, print)
symbol:(43, ()
symbol:(44, ))
variable:(56, x)
keyword:(26, if)
symbol:(51, :)
keyword:(25, elif)
symbol:(51, :)
keyword:(27, else)
symbol:(43, ()
variable:(56, a)
symbol:(40, >)
symbol:(44, ))
variable:(56, b)
symbol:(50, ?)
variable:(56, a)
symbol:(42, =)
variable:(56, b)
symbol:(51, :)
variable:(56, b)
symbol:(42, =)
variable:(56, a)
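
One limitation of the implementation above is that it looks for keywords as substrings, so an identifier such as `information` would be split at `in`, `for`, and `or`. A possible alternative, sketched below, scans the text left to right with a single regular expression and decides keyword versus variable only after a whole word has been read. The reduced `KEYWORDS` set, the group names, and the `tokenize` function are assumptions made for illustration, not part of the original program.

```python
import re

# Hypothetical, shortened keyword set for illustration only
KEYWORDS = {"if", "elif", "else", "for", "in", "import", "print"}

TOKEN_RE = re.compile(r"""
    (?P<illegal>\d+[A-Za-z_]\w*)               # digits then letters: 1abcd
  | (?P<constant>\d+(?:\.\d+)?)                # unsigned integer or decimal
  | (?P<word>[A-Za-z_]\w*)                     # identifier or keyword
  | (?P<symbol>[-+*/&|!^<>=()\[\]{},?:.'"])    # single-character symbols
  | (?P<skip>\s+)                              # whitespace between tokens
""", re.VERBOSE | re.ASCII)


def tokenize(text: str):
    """Return a list of (kind, lexeme) pairs in source order."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            # Character not covered by any rule: report it and move on
            tokens.append(("illegal", text[pos]))
            pos += 1
            continue
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "word":
            # Decide keyword vs. variable only on the complete word
            kind = "keyword" if lexeme in KEYWORDS else "variable"
        tokens.append((kind, lexeme))
    return tokens


if __name__ == "__main__":
    for line in ["1abcd", "print(x)", "information", "if:"]:
        print(tokenize(line))
```

Unlike the original program, this sketch emits tokens in the order they appear in the source, so `print(x)` yields `print`, `(`, `x`, `)` rather than grouping the symbols before the variable.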