mb_detect_encoding does not return the first matching encoding anymore #8279

@come-nc

Description

The following code:

<?php

$str = '/dav/files/admin/%C3%BC%C3%B6%C3%A4%C3%B6%C3%A4%C3%BC%C3%B6%C3%A4%C3%BB%C5%B7%C3%AE';
$rawstr = rawurldecode($str);

var_dump(
    mb_detect_encoding($rawstr, ['UTF-8', 'ISO-8859-1']),
    mb_detect_encoding($rawstr, ['ISO-8859-1', 'UTF-8']),
    mb_check_encoding($rawstr, 'ISO-8859-1'),
    mb_check_encoding($rawstr, 'UTF-8'),
);

https://3v4l.org/kqHre

Resulted in this output:

string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
bool(true)
bool(true)

But I expected this output instead:

string(5) "UTF-8"
string(10) "ISO-8859-1"
bool(true)
bool(true)

The behavior of mb_detect_encoding appears to have changed in PHP 8.1; it is not clear whether this is intentional.
The documentation of mb_detect_encoding suggests that it returns the first matching encoding, which it does up to and including PHP 8.0.
With 8.1, however, it returns ISO-8859-1 even though mb_check_encoding returns true for both UTF-8 and ISO-8859-1.
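
For reference, the "first matching encoding" behavior I expected can be sketched in userland with mb_check_encoding alone. This is only a minimal illustration, not a statement about how mbstring works internally; the helper name first_valid_encoding is hypothetical and not part of the extension, and the string|false return type assumes PHP 8.0 or later.

```php
<?php

// Hypothetical helper (not part of mbstring): walks the candidate list in
// order and returns the first encoding for which the string is valid,
// or false if none of the candidates matches.
function first_valid_encoding(string $str, array $candidates): string|false
{
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($str, $encoding)) {
            return $encoding;
        }
    }

    return false;
}

$str = '/dav/files/admin/%C3%BC%C3%B6%C3%A4%C3%B6%C3%A4%C3%BC%C3%B6%C3%A4%C3%BB%C5%B7%C3%AE';
$rawstr = rawurldecode($str);

var_dump(
    first_valid_encoding($rawstr, ['UTF-8', 'ISO-8859-1']),
    first_valid_encoding($rawstr, ['ISO-8859-1', 'UTF-8']),
);
```

Since mb_check_encoding returns true for both encodings on this input, the helper should return whichever candidate is listed first, which matches the expected output above.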

PHP Version

8.1

Operating System

No response
