Separate Address number and name
Denis McMahon
denismfmcmahon at gmail.com
Wed Jan 22 12:35:22 EST 2014
On Tue, 21 Jan 2014 16:06:56 -0800, Shane Konings wrote:
> The following is a sample of the data. There are hundreds of lines that
> need to have an automated process of splitting the strings into headings
> to be imported into Excel with these headings
>
> ID Address StreetNum StreetName SufType Dir City Province
> PostalCode
Ok, the following general method seems to work:
First, use a regex to capture two numeric groups and the rest of the line,
each separated by whitespace. If you can't find all three fields, you have
an unexpected data format.
re.search( r"(\d+)\s+(\d+)\s+(.*)", data )
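As a rough sketch of that first step: the sample line below is made up for illustration (the data from the original post is not reproduced here), and split_leading_numbers is a hypothetical helper name, not something from the re module.

```python
import re

def split_leading_numbers(line):
    """Return (id, street_number, rest_of_line) as strings.

    Raises ValueError if the line does not start with two numeric
    groups followed by some remaining text.
    """
    m = re.search(r"(\d+)\s+(\d+)\s+(.*)", line)
    if m is None:
        raise ValueError("unexpected data format: %r" % line)
    return m.group(1), m.group(2), m.group(3)

# Hypothetical input line, assumed to look roughly like the real data.
sample = "1 1067 Niagara Stone Rd, W, Niagara-On-The-Lake, ON L0S 1J0"
rec_id, street_num, rest = split_leading_numbers(sample)
```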
Second, split the rest of the line on a regex of comma + 0 or more
whitespace.
re.split( r",\s*", data )
Check that the rest of the line has 3 or 4 bits, otherwise you have an
unexpected lack or excess of data fields.
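The split and the field-count check might look something like this. The pattern here uses \s* so that a comma with no whitespace after it still splits, in line with "0 or more whitespace". split_fields is a hypothetical name, and the input strings are invented.

```python
import re

def split_fields(rest):
    """Split the rest of the line on comma plus optional whitespace.

    Raises ValueError unless exactly 3 or 4 pieces come back.
    """
    bits = re.split(r",\s*", rest.strip())
    if len(bits) not in (3, 4):
        raise ValueError("expected 3 or 4 fields, got %d: %r" % (len(bits), rest))
    return bits

# Hypothetical inputs: one with a direction field, one without.
with_dir = split_fields("Niagara Stone Rd, W, Niagara-On-The-Lake, ON L0S 1J0")
without_dir = split_fields("Main St,Toronto,ON M5V 2T6")
```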
Split the first bit of the rest of the line into street name and suffix/
type. If you can't split it, use it as the street name and set the suffix/
type to blank.
re.search( r"(.*)\s+(\w+)", data )
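A sketch of the street name / suffix split, with the fallback to a blank suffix when there is nothing to split on. split_street is a hypothetical name and the street strings are invented. Because the first group is greedy, the suffix ends up being the last whitespace-separated word.

```python
import re

def split_street(street):
    """Return (street_name, suffix); suffix is "" if it can't be split."""
    m = re.search(r"(.*)\s+(\w+)", street)
    if m is None:
        # Single word such as "Broadway": use it all as the street name.
        return street, ""
    return m.group(1), m.group(2)

name, suffix = split_street("Niagara Stone Rd")
```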
If there are 3 bits in rest of line, set direction to blank, otherwise
set direction to the second bit.
Set the city to the last but one bit of the rest of the line.
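Picking out direction and city from the split pieces could be done roughly as follows. It assumes bits is the 3- or 4-element list from the comma split, already checked for length; direction_and_city is a hypothetical name.

```python
def direction_and_city(bits):
    """Return (direction, city) from a 3- or 4-element list of fields."""
    # With 4 fields the second one is the direction; with 3 there is none.
    direction = bits[1] if len(bits) == 4 else ""
    # The city is always the last but one field.
    city = bits[-2]
    return direction, city

# Hypothetical field lists.
four = ["Niagara Stone Rd", "W", "Niagara-On-The-Lake", "ON L0S 1J0"]
three = ["Main St", "Toronto", "ON M5V 2T6"]
```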
Capture one word followed by two words in the last bit of the rest of the
line, and use these as the province and postcode.
re.search( r"(\w+)\s+(\w+\s+\w+)", data )
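For the last piece, something along these lines should do. It assumes a one-word province followed by a two-word postcode, as described above; the example value is invented and split_province_postcode is a hypothetical name.

```python
import re

def split_province_postcode(last_bit):
    """Return (province, postcode) from text like "ON L0S 1J0"."""
    m = re.search(r"(\w+)\s+(\w+\s+\w+)", last_bit)
    if m is None:
        raise ValueError("unexpected province/postcode: %r" % last_bit)
    return m.group(1), m.group(2)

province, postcode = split_province_postcode("ON L0S 1J0")
```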
Providing none of the searches or the split errored, you should now have
the data fields you need to write. The easiest way to write them might be
to assemble them as a list and use the csv module.
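Writing the fields out with the csv module might look like the sketch below. The headings are the ones from the quoted post; the record is made up, and putting the unsplit address text in the Address column is an assumption on my part. write_records is a hypothetical name.

```python
import csv
import io

HEADINGS = ["ID", "Address", "StreetNum", "StreetName", "SufType",
            "Dir", "City", "Province", "PostalCode"]

def write_records(fileobj, records):
    """Write a heading row followed by one row per record (a list of fields)."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADINGS)
    writer.writerows(records)

# Hypothetical record; Address is assumed to hold the unsplit address text.
record = ["1", "1067 Niagara Stone Rd, W, Niagara-On-The-Lake, ON L0S 1J0",
          "1067", "Niagara Stone", "Rd", "W", "Niagara-On-The-Lake",
          "ON", "L0S 1J0"]

buf = io.StringIO()  # stands in for a real file in this sketch
write_records(buf, [record])
```

When writing to a real file under Python 3, open it with newline="" so the csv module controls the line endings itself.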
I'm assuming you're capable of working out from the help on the python re
module what to use for each data, and how to access the captured results
of a search, and the results of a split. I'm also assuming you're capable
of working out how to use the csv module from the documentation. If
you're not, then either go back and ask your lecturer for help, or tell
your boss to hire a real programmer for his quick and easy coding jobs.
--
Denis McMahon, denismfmcmahon at gmail.com