Protect newlines inside of CDATA sections. Unprotected newlines were breaking things, notably inline JS that used comments for HTML standards compatibility.

* Tokenize newlines in `WP_Embed::autoembed()` before running `->autoembed_callback()`
* Tokenize newlines with placeholders in `wpautop()` 
* Introduce `wp_html_split()` to DRY the RegEx from `wp_replace_in_html_tags()` and `do_shortcodes_in_html_tags()`
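
The placeholder approach in the list above can be illustrated with a minimal sketch. It is not the code from this commit: `sketch_html_split()`, `sketch_protect_cdata_newlines()`, the simplified regex, and the `@@NL@@` placeholder are all hypothetical names chosen for illustration, and the real `wp_html_split()` may differ in how it matches comments and CDATA.

```php
<?php
// Hypothetical sketch: split content into text and HTML elements, keeping
// comments and CDATA sections as single pieces. Simplified regex, assumed
// for illustration only.
function sketch_html_split( $content ) {
	$regex = '/('
		. '<!--.*?(?:-->|\z)'            // Comment, or rest of input if unterminated.
		. '|<!\[CDATA\[.*?(?:\]\]>|\z)'  // CDATA section, or rest of input if unterminated.
		. '|<[^>]*>?'                    // Any other element.
		. ')/s';

	// Without PREG_SPLIT_NO_EMPTY, adjacent elements yield empty strings.
	return preg_split( $regex, $content, -1, PREG_SPLIT_DELIM_CAPTURE );
}

// Hypothetical sketch: swap newlines inside CDATA sections for a placeholder
// so that later line-based processing leaves them alone.
function sketch_protect_cdata_newlines( $content, $placeholder ) {
	$parts = sketch_html_split( $content );

	foreach ( $parts as &$part ) {
		// Skip empty pieces and plain text.
		if ( '' === $part || '<' !== $part[0] ) {
			continue;
		}
		if ( 0 === strpos( $part, '<![CDATA[' ) ) {
			$part = str_replace( "\n", $placeholder, $part );
		}
	}
	unset( $part ); // Break the reference left over from the loop.

	return implode( '', $parts );
}

$input     = "a\n<script>// <![CDATA[\nx > 1;\n// ]]></script>\nb";
$protected = sketch_protect_cdata_newlines( $input, '@@NL@@' );
// ... line-based processing would run here ...
$restored  = str_replace( '@@NL@@', "\n", $protected );
```

Because the split keeps empty strings between adjacent elements, callers that index into each piece need a guard such as `'' == $element` before reading `$element[0]`.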

Adds unit tests.

Props miqrogroove, kitchin, azaozz.
Fixes #33106.


git-svn-id: https://develop.svn.wordpress.org/trunk@33469 602fd350-edb4-49c9-b593-d223f7449a82
This commit is contained in:
Scott Taylor
2015-07-28 23:02:04 +00:00
parent 1558be9dfa
commit 4f814ec9ae
5 changed files with 225 additions and 46 deletions

@@ -333,29 +333,10 @@ function do_shortcodes_in_html_tags( $content, $ignore_html ) {
 $trans = array( '[' => '&#91;', ']' => '&#93;' );
 $pattern = get_shortcode_regex();
-$comment_regex =
-'!' // Start of comment, after the <.
-. '(?:' // Unroll the loop: Consume everything until --> is found.
-. '-(?!->)' // Dash not followed by end of comment.
-. '[^\-]*+' // Consume non-dashes.
-. ')*+' // Loop possessively.
-. '(?:-->)?'; // End of comment. If not found, match all input.
-$regex =
-'/(' // Capture the entire match.
-. '<' // Find start of element.
-. '(?(?=!--)' // Is this a comment?
-. $comment_regex // Find end of comment.
-. '|'
-. '[^>]*>?' // Find end of element. If not found, match all input.
-. ')'
-. ')/s';
-$textarr = preg_split( $regex, $content, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY );
+$textarr = wp_html_split( $content );
 foreach ( $textarr as &$element ) {
-if ( '<' !== $element[0] ) {
+if ( '' == $element || '<' !== $element[0] ) {
 continue;
 }
@@ -370,7 +351,7 @@ function do_shortcodes_in_html_tags( $content, $ignore_html ) {
 continue;
 }
-if ( $ignore_html || '<!--' === substr( $element, 0, 4 ) ) {
+if ( $ignore_html || '<!--' === substr( $element, 0, 4 ) || '<![CDATA[' === substr( $element, 0, 9 ) ) {
 // Encode all [ and ] chars.
 $element = strtr( $element, $trans );
 continue;