Jul-03-2020, 08:43 PM
How do I get full XPath extract using Python?
=====================================
Thanks for your response to my threads.
I am trying to use the Python code below, but I am getting abbreviated XPaths instead of full XPaths.
What changes does the code need in order to produce full XPaths?

from lxml import etree

def parseXML(xmlFile, outputFile):
    """
    Parse the XML file and write one "path|text" row per element.
    """
    # Read as bytes: lxml rejects str input that carries an encoding declaration
    with open(xmlFile, 'rb') as fobj:
        xml = fobj.read()
    root = etree.fromstring(xml)
    tree = etree.ElementTree(root)
    with open(outputFile, 'w') as f:  # open the output file for writing
        f.write("%s|%s\n" % ("Field", "Value"))
        for e in root.iter():
            f.write("%s|%s\n" % (tree.getpath(e), e.text))

if __name__ == "__main__":
    print('Loading variables...')
    input_file = 'inputf.xml'
    output_file = input_file + '.csv'
    parseXML(input_file, output_file)

I have a large XML file (inputf.xml), shown below, which I used as the input to the code above.

Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<DataFileFor>
  <DataR>
    <Id>5070022019330a0050hq</Id>
    <NUM>30221730001019</NUM>
    <Postmark>2020-01-03T09:25:57.000-05:00</Postmark>
    <TNO>47647</TNO>
.
.
.
.
.
</DataFileFor>

When I grab the W2 node with xml_grep, I get the following:

xml_grep DataFileFor/DataR/Ret/W2 inputf.xml

Output:
<?xml version="1.0" ?>
<xml_grep version="0.7" date="Fri Jun 26 13:07:11 2020">
  <file filename="inputf.xml">
    <W2 Id="W2" dName="W2" sId="00000000" sVersionNum="String">
      <CorrectedW2Ind>X</CorrectedW2Ind>
      <EmployeeSSN>000000000</EmployeeSSN>
      <EmployerEIN>000000000</EmployerEIN>
      <EmployerNameControlTxt>S</EmployerNameControlTxt>
      <EmployerName>
        <BusinessNameLine1Txt>String</BusinessNameLine1Txt>
        <BusinessNameLine2Txt>String</BusinessNameLine2Txt>
      </EmployerName>
      <EmployerUSAddress>
        <AddressLine1Txt>String</AddressLine1Txt>
        <AddressLine2Txt>String</AddressLine2Txt>
        <CityNm>String</CityNm>
        <StateAbbreviationCd>AL</StateAbbreviationCd>
        <ZIPCd>000000000</ZIPCd>
.
.
.
.
.
    </W2>

When I run the Python code above, it produces abbreviated XPaths instead of full XPaths. The output looks like this:

/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[10]|X
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[11]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[12]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[13]|S
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[14]|String

What changes does the code need in order to produce full XPaths?
Also, the attributes of the W2 element
Id="W2" dName="W2" sId="00000000" sVersionNum="String"
do not show up in the output.
What changes does the code need in order to include them?
Thanks for your guidance.
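
For reference, here is a minimal sketch of the kind of output I am after, not a tested fix for the real file. As far as I understand, getpath() falls back to `*[n]` steps for elements that sit in an unprefixed (default) namespace, so the sketch builds each path by hand from the tag names instead. The names `local_name`, `walk`, and `full_paths`, the `urn:example` namespace, and the small sample document are all made up for illustration. It uses the standard-library `xml.etree.ElementTree` so it runs on its own; it relies only on `tag`, `text`, `attrib`, and child iteration, which lxml elements also provide, so I would expect the `root` from the code above to work the same way.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop a leading '{namespace}' from a tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def walk(elem, path):
    """Yield (path, value) rows for elem, its attributes, and its descendants."""
    yield path, (elem.text or "").strip()
    # One extra row per attribute, written as path/@name
    for attr, value in elem.attrib.items():
        yield "%s/@%s" % (path, local_name(attr)), value
    seen = Counter()  # counts same-tag siblings to build 1-based positions
    for child in elem:
        if not isinstance(child.tag, str):
            continue  # skip comments and processing instructions (lxml)
        seen[child.tag] += 1
        step = "%s[%d]" % (local_name(child.tag), seen[child.tag])
        yield from walk(child, path + "/" + step)


def full_paths(root):
    """Return all (path, value) rows for the document rooted at root."""
    return list(walk(root, "/" + local_name(root.tag)))


# Hypothetical sample; the namespace URI is invented for this example
sample = (
    '<DataFileFor><DataR><Ret xmlns="urn:example">'
    '<W2 Id="W2" dName="W2"><EmployeeSSN>000000000</EmployeeSSN></W2>'
    '</Ret></DataR></DataFileFor>'
)
for path, value in full_paths(ET.fromstring(sample)):
    print("%s|%s" % (path, value))
```

Because the namespace is stripped from every step, these paths are meant for reporting and would not necessarily evaluate as XPath expressions against a namespaced document.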
