Permalinks: Sanitize non-visible characters inside sanitize_title_with_dashes().

This change prevents non-visible characters in titles from producing percent-encoded values in permalinks, opting instead for the following replacement strategy:

* Non-visible non-zero-width characters are replaced with hyphens
* Non-visible zero-width characters are removed entirely

Included with this change are 64 additional PHPUnit assertions to confirm that only the targeted non-visible characters are sanitized as intended.

Before this change, URLs unintentionally contained percent-encoded values wherever these non-visible characters appeared in the title. After this change, these characters are intentionally stripped out or replaced with hyphens.
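
As an illustration of the strategy described above, here is a minimal sketch that applies only the two replacements to a small sample of sequences. The helper name `demo_sanitize_non_visible()` is hypothetical and is not part of WordPress. The sketch assumes the title has already been percent-encoded and lowercased, as the lowercase hex sequences in the diff suggest.

```php
<?php
// Hypothetical helper for illustration only; not the WordPress function.
// Assumes $encoded is already percent-encoded and lowercased.
function demo_sanitize_non_visible( string $encoded ): string {
    // A few zero-width characters: removed entirely.
    $zero_width = array( '%e2%80%8b', '%e2%80%8c', '%e2%80%8d', '%ef%bb%bf' );
    // A few non-visible characters that occupy width: replaced with hyphens.
    $with_width = array( '%e2%80%80', '%e2%80%83', '%e2%80%a8', '%e2%80%af' );

    $encoded = str_replace( $zero_width, '', $encoded );
    return str_replace( $with_width, '-', $encoded );
}

// "hello" + U+200B (zero width space) + "world" + U+2003 (em space) + "again"
$title = strtolower( rawurlencode( "hello\u{200B}world\u{2003}again" ) );

echo $title, "\n";                              // hello%e2%80%8bworld%e2%80%83again
echo demo_sanitize_non_visible( $title ), "\n"; // helloworld-again
```

The zero-width space disappears without leaving a separator, while the em space becomes a hyphen, so the two words it separated stay distinct in the slug.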

Props costdev, dhanendran, hellofromtonya, paaljoachim, peterwilsoncc, poena, sergeybiryukov.

Fixes #47912.

git-svn-id: https://develop.svn.wordpress.org/trunk@51984 602fd350-edb4-49c9-b593-d223f7449a82
John James Jacoby
2021-11-02 18:46:36 +00:00
parent d7518d1927
commit 8f9eea80f9
2 changed files with 244 additions and 0 deletions


@@ -2288,11 +2288,45 @@ function sanitize_title_with_dashes( $title, $raw_title = '', $context = 'displa
            '%cc%80',
            '%cc%84',
            '%cc%8c',
            // Non-visible characters that display without a width.
            '%e2%80%8b',
            '%e2%80%8c',
            '%e2%80%8d',
            '%e2%80%8e',
            '%e2%80%8f',
            '%e2%80%aa',
            '%e2%80%ab',
            '%e2%80%ac',
            '%e2%80%ad',
            '%e2%80%ae',
            '%ef%bb%bf',
        ),
        '',
        $title
    );
    // Convert non-visible characters that display with a width to hyphen.
    $title = str_replace(
        array(
            '%e2%80%80',
            '%e2%80%81',
            '%e2%80%82',
            '%e2%80%83',
            '%e2%80%84',
            '%e2%80%85',
            '%e2%80%86',
            '%e2%80%87',
            '%e2%80%88',
            '%e2%80%89',
            '%e2%80%8a',
            '%e2%80%a8',
            '%e2%80%a9',
            '%e2%80%af',
        ),
        '-',
        $title
    );
    // Convert &times to 'x'.
    $title = str_replace( '%c3%97', 'x', $title );
}
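
The strings in the arrays above are percent-encoded UTF-8 byte sequences. To see which Unicode code point each one represents, a small decoder can help. This is a sketch only: `demo_sequence_to_code_point()` is a hypothetical helper, not part of WordPress, and it handles only the two- and three-byte sequences that occur in this hunk.

```php
<?php
// Hypothetical helper for illustration only; not part of WordPress.
// Converts a percent-encoded UTF-8 sequence (2 or 3 bytes) to "U+XXXX" notation.
function demo_sequence_to_code_point( string $sequence ): string {
    // unpack() returns a 1-indexed array, so re-index it from 0.
    $bytes = array_values( unpack( 'C*', rawurldecode( $sequence ) ) );

    if ( 2 === count( $bytes ) ) {
        // 110xxxxx 10xxxxxx
        $code_point = ( ( $bytes[0] & 0x1F ) << 6 ) | ( $bytes[1] & 0x3F );
    } elseif ( 3 === count( $bytes ) ) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        $code_point = ( ( $bytes[0] & 0x0F ) << 12 )
            | ( ( $bytes[1] & 0x3F ) << 6 )
            | ( $bytes[2] & 0x3F );
    } else {
        throw new InvalidArgumentException( 'Only 2- and 3-byte sequences are supported.' );
    }

    return sprintf( 'U+%04X', $code_point );
}

$samples = array( '%e2%80%8b', '%ef%bb%bf', '%e2%80%80', '%e2%80%a8', '%e2%80%af', '%c3%97' );
foreach ( $samples as $sequence ) {
    echo $sequence, ' => ', demo_sequence_to_code_point( $sequence ), "\n";
}
// %e2%80%8b => U+200B (zero width space)
// %ef%bb%bf => U+FEFF (byte order mark)
// %e2%80%80 => U+2000 (en quad)
// %e2%80%a8 => U+2028 (line separator)
// %e2%80%af => U+202F (narrow no-break space)
// %c3%97 => U+00D7 (multiplication sign)
```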