From https://stackoverflow.com/questions/30922721/remove-all-characters-which-cannot-be-decoded-in-python: read a file in Python and silently drop any bytes that cannot be decoded as UTF-8, instead of raising UnicodeDecodeError.
with open('filename', 'r', encoding='utf8', errors='ignore') as f: ...
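
A minimal sketch of what this does in practice, assuming a made-up sample file containing one stray byte (0xFF) that is not valid UTF-8. The sample bytes and the temporary file are illustrative only, not part of the original answer. It also reads the file with errors='replace' for comparison, which keeps a visible U+FFFD marker where data was lost rather than dropping it silently.

```python
import os
import tempfile

# Hypothetical sample data: "café", a space, one invalid byte (0xFF), a space, "ok".
raw = b"caf\xc3\xa9 \xff ok"

# Write the bytes to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # errors='ignore' drops the undecodable byte without any trace.
    with open(path, 'r', encoding='utf8', errors='ignore') as f:
        ignored = f.read()

    # errors='replace' substitutes U+FFFD for the undecodable byte.
    with open(path, 'r', encoding='utf8', errors='replace') as f:
        replaced = f.read()
finally:
    os.remove(path)

print(ascii(ignored))   # 'caf\xe9  ok'
print(ascii(replaced))  # 'caf\xe9 \ufffd ok'
```

Note that 'ignore' loses data with no indication of where; if the file might be in a different encoding altogether (for example Latin-1), the result can be quietly wrong rather than merely incomplete.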