🐛 Bayes_Project2/Bayes.py: bag-of-words tokenization is wrong #6

@Grifcc

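🐛 The regex split does not pick out the correct words.
♐ The version below works, though I'm not sure whether there is a better solution.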

def textParse(bigString):
    # Split on whitespace, then keep only the alphabetic characters of each chunk.
    # This replaces the regex split on \W* (zero or more characters that are not
    # letters, digits, or underscores, i.e. [^a-zA-Z0-9_]).
    listOfTokens = []
    for chunk in bigString.split():
        listOfTokens.append("".join(filter(str.isalpha, chunk)))
    # Lowercase every token and drop tokens of two characters or fewer
    return [tok.lower() for tok in listOfTokens if len(tok) > 2]
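
One possible alternative, offered as a sketch rather than a confirmed fix: since Python 3.7, `re.split` also splits on empty matches, so a pattern such as `\W*` that can match the empty string breaks text into single characters, which may be what went wrong here. Splitting on `\W+` avoids that. The function name `text_parse_regex` is hypothetical and not part of the repository.

```python
import re


def text_parse_regex(big_string):
    # Hypothetical helper, not from the repository.
    # \W+ needs at least one non-word character, so it never matches
    # the empty string and words are not split into single letters.
    tokens = re.split(r"\W+", big_string)
    # Lowercase every token and drop tokens of two characters or fewer
    # (this also discards the empty strings re.split can produce).
    return [tok.lower() for tok in tokens if len(tok) > 2]


print(text_parse_regex("Hello, world! It is a test."))
# → ['hello', 'world', 'test']
```

Note that the behavior differs slightly from the `str.isalpha` version: digits and underscores are kept as part of a token, and a word like "don't" is split at the apostrophe instead of being joined into "dont".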
