6

I am trying to run the following commands:

import nltk
nltk.download('all')

But I am getting this error:

Traceback (most recent call last):
  File "./update.py", line 3, in <module>
    nltk.download('all')
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 534, in incr_download
    try: info = self._info_or_id(info_or_id)
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 508, in _info_or_id
    return self.info(info_or_id)
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 875, in info
    self._update_index()
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 825, in _update_index
    ElementTree.parse(compat.urlopen(self._url)).getroot())
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 597, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 23, column 143

I am new to Python, so I am not sure what to do. I looked at the module named in the traceback and noticed that it downloads an XML file. So I ran the command below, and it did not raise any error.

compat.urlopen('https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml')

So I presume the problem is not in the download but in the parser. Can someone suggest how to proceed from here?
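One way to narrow this down is to parse the downloaded text yourself and print the part of the file the parser complains about. The following is only a sketch: fetch and error_context are hypothetical helper names, not part of NLTK, and the URL is the one from the command above.

```python
import urllib.request
import xml.etree.ElementTree as ET

INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"


def fetch(url=INDEX_URL):
    """Download the index as text (hypothetical helper, needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def error_context(xml_text, width=30):
    """Return None if xml_text parses, else (line, column, nearby text)."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple; the line is 1-based.
        line, column = err.position
        lines = xml_text.splitlines()
        bad_line = lines[line - 1] if line - 1 < len(lines) else ""
        start = max(0, column - width)
        return line, column, bad_line[start:column + width]
```

Calling error_context(fetch()) returns None when the index is well-formed; otherwise it returns the reported position together with the text around it, which is usually enough to spot the invalid token by eye.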

3
  • Same problem here Commented Apr 14, 2017 at 13:48
  • I also got this problem Commented Apr 14, 2017 at 14:53
  • Started happening a few hours ago with me Commented Apr 14, 2017 at 15:10

2 Answers

6

index.xml had a typo. It has already been patched. I just checked, and nltk.download('all') works fine!

see: nltk/nltk_data#70


1

The problem is in the XML index that the NLTK downloader fetched.

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 23, column 143

At line 23, column 143 we see the problem: a missing '=' after unzipped_size:

... unzip="1" unzipped_size"1917" url="https...

NLTK will surely fix this soon; until then, I'm not sure what the best response is.
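To check that a missing '=' is enough to make the parser reject the file, here is a minimal sketch. The one-line fragment is hypothetical: it is modeled on the excerpt above, with placeholder id and url values rather than the real index.xml content, and parses is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the excerpt above; the id and url values
# are placeholders, not the real index.xml content.
BROKEN = '<package id="demo" unzip="1" unzipped_size"1917" url="https://example.invalid/demo.zip" />'
FIXED = BROKEN.replace('unzipped_size"1917"', 'unzipped_size="1917"')


def parses(text):
    """Return True if text is well-formed XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses(BROKEN))  # the attribute without '=' is rejected
print(parses(FIXED))   # the same fragment with '=' restored is accepted
```

Because the parser stops at the first invalid token, a single malformed attribute like this prevents the whole index from loading.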
