I have a foreign-language-to-English dictionary that I'm trying to import into a SQL database. The dictionary is a text file, and its lines look like this:
field1 field2 [romanization] /definition 1/definition 2/definition 3/
I'm using regular expressions in Python to identify the delimiters. So far I've been able to isolate every delimiter except the space between field 1 and field 2.
(?<=\S)\s\[|\]\s/(?=[A-Za-z])|/
#(?<=\S)\s\[ matches the space and opening square bracket after field 2
#\]\s/(?=[A-Za-z]) matches the closing square bracket, space, and slash after the romanization
#/ matches the forward slashes between definitions
#????????? should match the space between field 1 and field 2
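A minimal sketch of two possible approaches, assuming field 1 and field 2 never contain whitespace, the romanization never contains `]`, and each line ends with a trailing `/`. `DELIMS` keeps the split-based approach above and adds one more alternative: a whitespace character that is followed by a space-free token and then ` [`, which is the gap between field 1 and field 2. This additionally assumes no definition contains a space, a single word, then ` [`. `ENTRY` and `parse_line` are hypothetical names for a different technique: matching the whole line with named groups instead of splitting on delimiters, which avoids having to describe every delimiter separately.

```python
import re

# Split-based approach: the new first alternative matches the space between
# field 1 and field 2 (a space followed by one space-free token, then " [").
# Assumes definitions never contain "<space><word><space>[".
DELIMS = re.compile(r"\s(?=\S+\s\[)|(?<=\S)\s\[|\]\s/(?=[A-Za-z])|/")

# Whole-line approach (hypothetical names): capture each field by position.
# Assumes fields 1 and 2 have no whitespace and the romanization has no "]".
ENTRY = re.compile(
    r"^(?P<field1>\S+)\s+"          # field 1, then the separating space
    r"(?P<field2>\S+)\s+"           # field 2
    r"\[(?P<roman>[^\]]*)\]\s+"     # [romanization], which may contain spaces
    r"/(?P<defs>.*)/\s*$"           # /definition 1/definition 2/.../
)


def parse_line(line):
    """Return (field1, field2, romanization, [definitions]) for one line."""
    m = ENTRY.match(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    definitions = m.group("defs").split("/")
    return m.group("field1"), m.group("field2"), m.group("roman"), definitions


line = "field1 field2 [romanization] /definition 1/definition 2/definition 3/"
print(DELIMS.split(line))
# ['field1', 'field2', 'romanization', 'definition 1', 'definition 2', 'definition 3', '']
print(parse_line(line))
# ('field1', 'field2', 'romanization', ['definition 1', 'definition 2', 'definition 3'])
```

The split version leaves an empty string at the end because of the final `/`. The whole-line version returns each field already separated, which maps directly onto the columns of a database row.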