Apr-13-2022, 05:04 PM
I have some generated data files I want to format to XML:
1234=>item1:something11:
something11<COMMA>item4:something12:
12something<END_OF_OBJECT_LINE>
1238=>item8:something12:
something11:<END_OF_OBJECT_LINE>
2345=>item2:something12:
something11:<END_OF_OBJECT_LINE>
123=>item1:something1:
something11<COMMA>item2:something:
11something<COMMA>item4:something:
11something<END_OF_OBJECT_LINE>

What I tried to do is replace certain fixed strings to make it look like XML:

with open("OGfile.data", "r") as f:
    with open("tempfile.data", "w") as fo:
        # formatting file to XML format
        contents = f.readlines()
        contents.insert(0, "<?xml version='1.0' encoding='UTF-8'?>\n<Module>\n<Object id='")
        contents = [w.replace("<END_OF_OBJECT_LINE>\n", "'/>\n</Object>\n<Object id='") for w in contents]
        contents = [w.replace("=>", "'>\n <Attribute name='") for w in contents]
        contents = [w.replace('<COMMA>', "'/>\n <Attribute name='") for w in contents]
        contents = [w.replace(':something', "' value='something") for w in contents]
        # saving formatted file to new file
        contents = "".join(contents)
        fo.write(contents)

# fixing invalid last line from formatted file
with open("tempfile.data", "r") as f2:
    with open("finalfile.data", "w") as fo2:
        contents2 = f2.readlines()
        contents2 = [w.replace("<END_OF_OBJECT_LINE>", "'/>\n</Object>\n</Module>") for w in contents2]
        contents2 = "".join(contents2)
        fo2.write(contents2)

And it works fine, I made it into:

<?xml version='1.0' encoding='UTF-8'?>
<Module>
<Object id='1234'>
<Attribute name='item1' value='something11:
something11'/>
<Attribute name='item4' value='something12:
12something'/>
</Object>
<Object id='1238'>
<Attribute name='item8' value='something12:
something11:'/>
</Object>
<Object id='2345'>
<Attribute name='item2' value='something12:
something11:'/>
</Object>
<Object id='123'>
<Attribute name='item1' value='something1:
something11'/>
<Attribute name='item2' value='something:
11something'/>
<Attribute name='item4' value='something:
11something'/>
</Object>
</Module>

BUT there is one problem. In the line contents = [w.replace(':something', "' value='something") for w in contents] I find the value only because it happens to start with "something"; if a value started with anything else, I would be doomed. I have been thinking about using a regex to capture the string between "Attribute name:" and "<COMMA>" or "<END_OF_OBJECT_LINE>", but my attempts failed miserably because I am quite new to programming and Python. It could also be done much better if I could somehow convert this .data file into a dictionary and then turn that into XML in a proper way, but I have no idea how to split it correctly into a dictionary. Do you have any suggestions?
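
One way the dictionary idea could look, as a minimal sketch rather than a finished solution. It assumes that every object ends with <END_OF_OBJECT_LINE>, that attributes are separated by <COMMA>, that attribute names never contain ":", and that object ids are unique. The function names parse_objects, to_xml and convert_file are made up for this illustration. The value is taken as everything after the first ":" of each attribute, so it no longer matters what the value starts with.

```python
import xml.etree.ElementTree as ET

END = "<END_OF_OBJECT_LINE>"  # marker that closes one object
SEP = "<COMMA>"               # marker that separates attributes


def parse_objects(text):
    """Return {object_id: {attribute_name: value}} from the raw .data text."""
    objects = {}
    for record in text.split(END):
        record = record.strip()
        if not record:
            continue  # skip the empty piece after the last marker
        object_id, _, body = record.partition("=>")
        attributes = {}
        for field in body.split(SEP):
            # Split on the FIRST ':' only, so values may contain ':' themselves.
            name, _, value = field.partition(":")
            attributes[name.strip()] = value
        objects[object_id.strip()] = attributes
    return objects


def to_xml(objects):
    """Build a Module/Object/Attribute element tree from the parsed dictionary."""
    module = ET.Element("Module")
    for object_id, attributes in objects.items():
        obj = ET.SubElement(module, "Object", id=object_id)
        for name, value in attributes.items():
            ET.SubElement(obj, "Attribute", name=name, value=value)
    return ET.ElementTree(module)


def convert_file(src_path, dst_path):
    """Read the .data file, parse it, and write the XML file (paths are up to you)."""
    with open(src_path, "r") as f:
        objects = parse_objects(f.read())
    to_xml(objects).write(dst_path, encoding="UTF-8", xml_declaration=True)
```

Calling convert_file("OGfile.data", "finalfile.data") would replace both passes of string replacement. Because ElementTree builds the document, special characters such as & or ' inside values are escaped automatically, and line breaks inside attribute values are written as character references. If two attributes of one object can share a name, a list of (name, value) pairs would be a safer container than a dict.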
