I found a CSV file created by Crystal Reports that had double-quoted fields, some of which contained commas between the quotes. We don't want to split at those commas, so a plain aString.split(',') won't work. The code goes through the string twice: once to find the slice boundaries (the start and end index of each field), and then a second time to cut the fields out at those boundaries and strip the quotes.
def splitStringWithBadlyFormedCSV(aString):
    """
    Return list of fields in aString, splitting by commas while ignoring
    commas that appear within double-quoted strings.
    """
    ignore = False
    splices = []
    anchor = 0
    for cursor, character in enumerate(aString):
        # Flag toggles each time it encounters a double quote.
        if character == '"':
            ignore = not ignore
        if character == ',' and not ignore:
            splices.append((anchor, cursor))
            # We want to drop anchor 1 past the comma.
            anchor = cursor + 1
    # Last field.
    splices.append((anchor, len(aString)))
    # Now we can strip out the stupid double quotes in each field.
    fields = [aString[start:end].replace('"', '') for start, end in splices]
    return fields
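
For comparison, here is a minimal sketch of the same split done with Python's standard csv module instead of the hand-rolled scan above. The sample line is made up for illustration and is not taken from the Crystal Reports file; whether csv.reader copes with every quirk of that file is an assumption that would need checking against the real data.

```python
import csv

# Hypothetical sample line: the middle field is quoted and contains a comma.
line = 'Smith,"Portland, OR",42'

# A plain split breaks the quoted field at its embedded comma.
naive_fields = line.split(',')

# csv.reader accepts any iterable of lines, so a one-element list works.
csv_fields = next(csv.reader([line]))

print(naive_fields)  # ['Smith', '"Portland', ' OR"', '42']
print(csv_fields)    # ['Smith', 'Portland, OR', '42']
```

One difference worth knowing: with its default settings, csv.reader treats a doubled quote inside a quoted field ("") as one literal quote character and keeps it, whereas the function above removes every double quote from the output.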