How to resolve "UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 566: invalid start byte" 

I have some large text files that contain such characters. I would like to ignore those characters and proceed with the Sent2Vec conversion. I get the error below; please help me fix this.
File "kfold1.py", line 34, in <module>
    model = Sent2Vec(LineSentence(sent_file), model_file=input_file + '.model')
  File "/Users/ypochampally/Documents/RESEARCH/workspace_latest/Wiki_Actors/src/om_TextClassification/word2vec.py", line 800, in __init__
    self.reset_sent_vec(sentences)
  File "/Users/ypochampally/Documents/RESEARCH/workspace_latest/Wiki_Actors/src/om_TextClassification/word2vec.py", line 809, in reset_sent_vec
    for sent in sentences:
  File "/Users/ypochampally/Documents/RESEARCH/workspace_latest/Wiki_Actors/src/om_TextClassification/word2vec.py", line 1113, in __iter__
    yield utils.to_unicode(line).split()
  File "/Users/ypochampally/Documents/RESEARCH/workspace_latest/Wiki_Actors/src/om_TextClassification/utils.py", line 190, in any2unicode
    return unicode(text, encoding, errors=errors)
  File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 566: invalid start byte
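
One possible way to skip the offending bytes is sketched below. Byte 0xff never occurs in valid UTF-8, so the file is probably not pure UTF-8 (it may, for example, start with a UTF-16 byte-order mark, in which case decoding with the correct encoding would be the better fix). The traceback shows that the text is decoded with `unicode(text, encoding, errors=errors)`, so decoding with `errors='ignore'` instead of the strict default should drop the invalid bytes. The class name `LenientLineSentence` is hypothetical and is not part of this repository; the sketch also assumes that `Sent2Vec` only needs an iterable of token lists, as the `for sent in sentences:` frame suggests.

```python
import io


class LenientLineSentence(object):
    """Hypothetical stand-in for LineSentence that skips undecodable bytes."""

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        self.encoding = encoding

    def __iter__(self):
        # Read raw bytes so that decoding happens under our control.
        with io.open(self.path, "rb") as handle:
            for raw_line in handle:
                # errors="ignore" silently drops bytes that are invalid
                # in the chosen encoding instead of raising an exception.
                text = raw_line.decode(self.encoding, errors="ignore")
                # One sentence per line, tokens separated by whitespace.
                yield text.split()
```

If that assumption holds, the call in `kfold1.py` could become `Sent2Vec(LenientLineSentence(sent_file), model_file=input_file + '.model')`. Note that ignoring bytes discards data without warning, so it is worth checking how many lines are affected before relying on it.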

