Message 326592 - Python tracker

Author	tuxcell
Recipients	tuxcell, xtreak
Date	2018-09-27.20:36:23
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<327915814.549820.1538080574491@mail.yahoo.com>
In-reply-to	<1537857504.82.0.545547206417.issue34777@psf.upfronthosting.co.za>

Content

Thank you for the quick reply. You are correct about the difficulties of using a universally accepted list. Just for the record, this is one example that generates errors on the server side.

#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError

# process SSB data
url1 = 'https://raw.githubusercontent.com/mapnik/test-data/master/csv/points.csv'
url2 = 'https://gitlab.cncf.ci/kubernetes/kubernetes/raw/c69582dffba33e9f1c08ff2fc67924ea90f1448c/test/test_owners.csv'
url3 = 'http://data.ssb.no/api/klass/v1/classifications/131/changes?from=2016-01-01&to=9999-12-31'
headers1 = {'Accept': 'text/csv'}
headers2 = {'Akcept': 'text/csv'}
headers3 = {'Accept': 'tekst/cxv'}
headers4 = {'Accept': '1234'}
req = Request(url3, headers=headers4)
resp = urlopen(req)
# get the character encoding from the server response
content = resp.read().decode(resp.headers.get_content_charset())
print(content)

'''req = Request(url3, headers=headers3)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error

req = Request(url3, headers=headers4)
urllib.error.HTTPError: HTTP Error 406: Not Acceptable'''
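
Since the server answers a bad Accept value with 406 or 500, a caller may want to catch the error instead of letting the traceback escape. The following is only a sketch: `fetch_text` and its `opener` parameter are hypothetical names introduced here for illustration, and the UTF-8 fallback is an assumption for servers that do not declare a charset (in which case `get_content_charset()` returns None).

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_text(url, accept, opener=urlopen):
    """Return (status, text); on failure, text holds the reported reason."""
    req = Request(url, headers={'Accept': accept})
    try:
        with opener(req) as resp:
            # Assumption: fall back to UTF-8 when no charset is declared.
            charset = resp.headers.get_content_charset() or 'utf-8'
            return resp.status, resp.read().decode(charset)
    except HTTPError as err:
        # 4xx/5xx answers such as the 406 and 500 reported above land here.
        # HTTPError must be caught before URLError, which is its base class.
        return err.code, str(err.reason)
    except URLError as err:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return None, str(err.reason)
```

Calling `fetch_text(url3, '1234')` with the URL from the script above would then return the status code and reason rather than raise, assuming the server still behaves as described.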

    On Tuesday, September 25, 2018, 8:38:26 AM GMT+2, Karthikeyan Singaravelan <report@bugs.python.org> wrote:  

Karthikeyan Singaravelan <tir.karthi@gmail.com> added the comment:

Thanks for the report. I tried similar requests, and other tools like curl behave the same way, since Akcept could be a custom header in some use cases, though it could be a typo in this context. As far as I can see from https://tools.ietf.org/html/rfc2616#section-14.1, there is no predefined set of media types that we need to validate against, and it is up to the server configuration to do the validation. It's hard for Python to maintain a list of acceptable MIME types for validation across releases. A list of registered MIME types that is updated periodically: https://www.iana.org/assignments/media-types/media-types.xhtml and the RFC for registration: https://tools.ietf.org/html/rfc6838

Some sample requests from curl with invalid headers.

curl -X GET https://httpbin.org/get -H 'Authorization: Token bc23f14356c114a8ffa319773583426878b7b37f' -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -H 'Akcept: tekst/csv'
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Akcept": "tekst/csv",
    "Authorization": "Token bc23f14356c114a8ffa319773583426878b7b37f",
    "Cache-Control": "no-cache",
    "Connection": "close",
    "Content-Type": "application/json",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.37.1"
  },
  "origin": "182.73.135.26",
  "url": "https://httpbin.org/get"
}

curl -X GET https://httpbin.org/get -H 'Authorization: Token bc23f14356c114a8ffa319773583426878b7b37f' -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -H 'Accept: tekst'
{
  "args": {},
  "headers": {
    "Accept": "tekst",
    "Authorization": "Token bc23f14356c114a8ffa319773583426878b7b37f",
    "Cache-Control": "no-cache",
    "Connection": "close",
    "Content-Type": "application/json",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.37.1"
  },
  "origin": "182.73.135.26",
  "url": "https://httpbin.org/get"
}
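
The same pass-through can be observed in urllib without sending anything over the network. This is a small sketch that only builds a `Request` and inspects the headers it stored; the URL is the httpbin endpoint from the curl examples and is never contacted here.

```python
from urllib.request import Request

req = Request(
    'https://httpbin.org/get',
    headers={'Akcept': 'tekst/csv', 'Accept': 'tekst'},
)

# Nothing is sent; this only looks at what urllib stored on the request.
# urllib normalizes header names with str.capitalize() but does not
# check the names or the values against any list.
stored = dict(req.header_items())
print(stored)
```

Both the misspelled header name and the unregistered media type are kept as given, which matches what curl does above.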

Feel free to add in if I am missing something here but I think it's hard for Python to maintain the updated list and adding warning/error might break someone's code.
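
An application that wants a check can do it on its own side without any list of registered types. The sketch below is not part of urllib: `looks_like_accept` is a hypothetical helper that only tests whether each entry has the type/subtype shape built from HTTP token characters. It ignores parameters such as q-values, and it cannot tell a registered type from a misspelled one.

```python
import re

# Characters allowed in an HTTP "token" (assumed here to follow RFC 7230).
_TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
_MEDIA_RANGE = re.compile(rf"{_TOKEN}/{_TOKEN}")


def looks_like_accept(value):
    """True if every comma-separated entry has the type/subtype shape."""
    # Drop parameters (";q=0.8" and similar) before checking the shape.
    parts = [p.split(';', 1)[0].strip() for p in value.split(',')]
    return all(_MEDIA_RANGE.fullmatch(p) for p in parts)


for candidate in ('text/csv', '*/*', 'tekst/cxv', 'tekst', '1234'):
    print(candidate, looks_like_accept(candidate))
```

A value like 'tekst/cxv' still passes because it is syntactically fine, so a shape check of this kind would catch '1234' or 'tekst' but not a typo inside a well-formed type.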

Thanks

----------
nosy: +xtreak

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue34777>
_______________________________________

History
Date	User	Action	Args
2018-09-27 20:36:23	tuxcell	set	recipients: + tuxcell, xtreak
2018-09-27 20:36:23	tuxcell	link	issue34777 messages
2018-09-27 20:36:23	tuxcell	create