wordpress-develop/tests/phpunit/tests
Dennis Snell 616e673d3e HTML API: Scan all syntax tokens in a document, read modifiable text.
Since its introduction in WordPress 6.2 the HTML Tag Processor has
provided a way to scan through all of the HTML tags in a document and
then read and modify their attributes. In order to reliably do this, it
also needed to be aware of other kinds of HTML syntax, but it didn't
expose those syntax tokens to consumers of the API.

In this patch the Tag Processor introduces a new scanning method and a
few helper methods to read information about or from each token. Most
significantly, this introduces the ability to read `#text` nodes in the
document.

What's new in the Tag Processor?
================================

 - `next_token()` visits every distinct syntax token in a document.
 - `get_token_type()` indicates what kind of token it is.
 - `get_token_name()` returns something akin to `DOMNode.nodeName`.
 - `get_modifiable_text()` returns the text associated with a token.
 - `get_comment_type()` indicates why a token represents an HTML comment.

Example usage.
==============

{{{
<?php
function strip_all_tags( $html ) {
        $text_content = '';
        $processor    = new WP_HTML_Tag_Processor( $html );

        while ( $processor->next_token() ) {
                if ( '#text' !== $processor->get_token_type() ) {
                        continue;
                }

                $text_content .= $processor->get_modifiable_text();
        }

        return $text_content;
}
}}}

What changes in the Tag Processor?
==================================

Previously, the Tag Processor would scan the opening and closing tag of
every HTML element separately. Now, however, there are special tags
which it only visits once, as if those elements were void tags without
a closer.

These are special tags because their content contains no other HTML or
markup, only non-HTML content.

 - SCRIPT elements contain raw text which is isolated from the rest of
   the HTML document and fed separately into a JavaScript engine. There
   are complicated rules to avoid escaping the script context in the HTML.
   The contents are left verbatim, and character references are not decoded.

 - TEXTAREA and TITLE elements contain plain text which is decoded
   before display, e.g. transforming `&amp;` into `&`. Any markup which
   resembles tags is treated as verbatim text and not a tag.

 - IFRAME, NOEMBED, NOFRAMES, STYLE, and XMP elements are similar to the
   textarea and title elements, but no character references are decoded.
   For example, `&amp;` inside a STYLE element is passed to the CSS engine
   as the literal string `&amp;` and _not_ as `&`.
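
The decoding rules in the list above can be sketched in a few lines. This is only an illustration: `special_element_text()` is a hypothetical helper that is not part of WordPress, and PHP's `html_entity_decode()` stands in as an approximation of the HTML character reference decoder.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: returns the text that the
 * raw contents of one of the special elements represent.
 */
function special_element_text( string $tag_name, string $raw_contents ): string {
	switch ( strtoupper( $tag_name ) ) {
		case 'TEXTAREA':
		case 'TITLE':
			// Character references are decoded; tag-like text stays plain text.
			// html_entity_decode() only approximates the HTML decoding rules.
			return html_entity_decode( $raw_contents, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

		case 'SCRIPT':
		case 'IFRAME':
		case 'NOEMBED':
		case 'NOFRAMES':
		case 'STYLE':
		case 'XMP':
			// Contents are left verbatim; no character references are decoded.
			return $raw_contents;

		default:
			throw new InvalidArgumentException( "{$tag_name} is not one of the special elements." );
	}
}
```

For instance, `special_element_text( 'title', 'Is <img> &gt; <image>?' )` gives `Is <img> > <image>?`, while `special_element_text( 'style', 'a &amp; b' )` gives back `a &amp; b` unchanged.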

Because it's important not to treat this inner content separately from the
elements containing it, the Tag Processor combines them when scanning
into a single match and makes their content available as modifiable
text (see below).

This means that the Tag Processor will no longer visit a closing tag for
any of these elements unless that tag is unexpected.

{{{
    <title>There is only a single token in this line</title>
    <title>There are two tokens in this line></title></title>
    </title><title>There are still two tokens in this line></title>
}}}

What are tokens?
================

The term "token" here is a parsing term, which means a primitive unit in
HTML. There are only a few kinds of tokens in HTML:

 - a tag has a name, attributes, and a closing or self-closing flag.
 - a text node, or `#text` node contains plain text which is displayed
   in a browser and which is decoded before display.
 - a DOCTYPE declaration indicates how to parse the document.
 - a comment is hidden from the display on a page but present in the HTML.

There are a few more kinds of tokens that the HTML Tag Processor will
recognize, some of which don't exist as concepts in HTML. These mostly
comprise XML syntax elements that aren't part of HTML (such as CDATA and
processing instructions) and invalid HTML syntax that transforms into
comments.
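
As a rough illustration of the token kinds described above, the following sketch tallies how many tokens of each type a document contains. It uses only `next_token()` and `get_token_type()` from this patch; `count_token_types()` is a hypothetical helper, and the code assumes it runs inside WordPress with this patch applied so that `WP_HTML_Tag_Processor` is available.

```php
<?php
/**
 * Hypothetical helper: counts the tokens of each type in a document.
 * Assumes WordPress (with this patch) is loaded, which provides
 * the WP_HTML_Tag_Processor class.
 */
function count_token_types( string $html ): array {
	$counts    = array();
	$processor = new WP_HTML_Tag_Processor( $html );

	while ( $processor->next_token() ) {
		// The token type is used as the array key, e.g. `#text`.
		$type            = $processor->get_token_type();
		$counts[ $type ] = ( $counts[ $type ] ?? 0 ) + 1;
	}

	return $counts;
}
```

The returned array is keyed by whatever `get_token_type()` reports, so the sketch does not need to know the full set of token types in advance.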

What is a funky comment?
========================

This patch treats a specific kind of invalid comment in a special way.
A closing tag with an invalid name is considered a "funky comment." In
the browser these become HTML comments just like any other, but their
syntax is convenient for representing a variety of bits of information
in a well-defined way that cannot be nested or recursive, given the
parsing rules for this invalid syntax.

 - `</1>`
 - `</%avatar_url>`
 - `</{"wp_bit": {"type": "post-author"}}>`
 - `</[post-author]>`
 - `</__( 'Save Post' );>`

All of these examples become HTML comments in the browser. The content
inside a funky comment is easy to parse: the only rule is that it
starts at the `<` and continues until the nearest `>`. There
can be no funky comment inside another, because that would imply having
a `>` inside of one, which would actually terminate the first one.
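
The "nearest `>`" rule can be sketched without the Tag Processor. The helper below is hypothetical and deliberately naive: it scans a plain string and ignores context such as SCRIPT contents, attribute values, or regular comments, all of which the real parser accounts for.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: lists the inner text of
 * every funky comment found by a naive scan of the given string.
 */
function find_funky_comments( string $html ): array {
	$found  = array();
	$offset = 0;

	while ( false !== ( $at = strpos( $html, '</', $offset ) ) ) {
		$next = $html[ $at + 2 ] ?? '';

		// A letter starts a real closing tag; `</>` and a trailing `</` are not comments.
		if ( '' === $next || '>' === $next || preg_match( '/^[A-Za-z]$/', $next ) ) {
			$offset = $at + 2;
			continue;
		}

		$end = strpos( $html, '>', $at + 2 );
		if ( false === $end ) {
			break; // The input ended before any closing `>`.
		}

		// Everything between `</` and the nearest `>` is the comment's content.
		$found[] = substr( $html, $at + 2, $end - $at - 2 );
		$offset  = $end + 1;
	}

	return $found;
}
```

Calling `find_funky_comments( '<p></1></p></%avatar_url></>' )` returns `array( '1', '%avatar_url' )`: the `</p>` closer is a real tag and `</>` is skipped.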

What is modifiable text?
========================

Modifiable text is similar to the `innerText` property of a DOM node.
It represents the span of text for a given token which may be modified
without changing the structure of the HTML document or the token.

There is currently no mechanism to change the modifiable text, but this
is planned to arrive in a later patch.

Tags
====

Most tags have no modifiable text because they have child nodes where
text nodes are found. Only the special tags mentioned above have
modifiable text.

{{{
    <div class="post">Another day in HTML</div>
    └─ tag ──────────┘└─ text node ─────┘└────┴─ tag
}}}

{{{
    <title>Is <img> &gt; <image>?</title>
    │      └ modifiable text ───┘       │ "Is <img> > <image>?"
    └─ tag ─────────────────────────────┘
}}}

Text nodes
==========

Text nodes are entirely modifiable text.

{{{
    This HTML document has no tags.
    └─ modifiable text ───────────┘
}}}

Comments
========

The modifiable text inside a comment is the portion of the comment that
doesn't form its syntax. This also applies to a number of invalid comments.

{{{
    <!-- this is inside a comment -->
    │   └─ modifiable text ──────┘  │
    └─ comment token ───────────────┘
}}}

{{{
    <!-->
    This invalid comment has no modifiable text.
}}}

{{{
    <? this is an invalid comment -->
    │ └─ modifiable text ────────┘  │
    └─ comment token ───────────────┘
}}}

{{{
    <![CDATA[this is an invalid comment]]>
    │        └─ modifiable text ───────┘ │
    └─ comment token ────────────────────┘
}}}

Other token types also have modifiable text. Consult the code or tests
for further information.

Developed in https://github.com/WordPress/wordpress-develop/pull/5683
Discussed in https://core.trac.wordpress.org/ticket/60170

Follows [57575]

Props bernhard-reiter, dlh, dmsnell, jonsurrell, zieladam
Fixes #60170



git-svn-id: https://develop.svn.wordpress.org/trunk@57348 602fd350-edb4-49c9-b593-d223f7449a82
2024-01-24 23:35:46 +00:00
admin Administration: Introduce new_admin_email_subject filter. 2024-01-14 10:59:48 +00:00
ajax Tests: Use assertSame() in some newly introduced tests. 2024-01-06 12:59:49 +00:00
attachment Coding Standards: Remove superfluous blank lines at the end of various classes. 2023-09-08 09:30:38 +00:00
block-supports Editor: fix fluid font division by zero error when min and max viewport widths are equal. 2024-01-23 05:55:49 +00:00
block-templates Tests: Fix tests following r57265. 2024-01-10 21:12:07 +00:00
blocks Editor: Support deferred block variation initialization on the server. 2024-01-19 20:52:06 +00:00
bookmark Tests: Use the function get_num_queries across all unit tests. 2023-05-11 10:05:51 +00:00
canonical Tests: Improve the @group annotation accuracy and consistency. 2023-10-19 13:51:04 +00:00
category Tests: Improve the @group annotation accuracy and consistency. 2023-10-19 13:51:04 +00:00
comment Tests: Improve the @group annotation accuracy and consistency. 2023-10-19 13:51:04 +00:00
compat Tests: Remove some unnecessary function_exists() checks for compat functions. 2023-10-18 10:39:19 +00:00
cron Cron API: Modify _get_cron_array() to always return an array. 2022-07-29 03:32:58 +00:00
customize Tests: Add a @ticket reference for WP_Customize_Manager::trash_changeset_post() test. 2024-01-05 11:38:34 +00:00
date Tests: Add unit tests for wp_checkdate(). 2023-12-12 12:19:38 +00:00
db Docs: Replace "sanity" with "confidence" for inclusive language. 2024-01-03 21:57:32 +00:00
dependencies Script Loader: Only emit CDATA wrapper comments in wp_get_inline_script_tag() for JavaScript. 2024-01-23 17:54:25 +00:00
diff Code Modernization: Use wp_trigger_error() in WP_Text_Diff_Renderer_Table magic methods. 2023-09-07 20:46:53 +00:00
editor Tests: Improve the @group annotation accuracy and consistency. 2023-10-19 13:51:04 +00:00
error-protection Tests: Rename classes in phpunit/tests/error-protection/ per the naming conventions. 2021-08-21 15:32:53 +00:00
feed Tests: Reduce usage of assertEquals 2023-09-29 15:22:12 +00:00
filesystem Tests: Improve the @group annotation accuracy and consistency. 2023-10-19 13:51:04 +00:00
fonts/font-face Fonts: Get font-family name from 'fontFamily' field. 2023-09-25 21:27:51 +00:00
formatting Docs: Replace "sanity" with "confidence" for inclusive language. 2024-01-03 21:57:32 +00:00
functions Tests: Move wp_parse_list() tests to their own file. 2024-01-14 17:15:49 +00:00
general Tests: Use a @requires annotation for readonly() function test. 2023-10-26 20:31:45 +00:00
hooks Tests: Add hook priority call order tests. 2024-01-09 16:32:14 +00:00
html-api HTML API: Scan all syntax tokens in a document, read modifiable text. 2024-01-24 23:35:46 +00:00
http Coding Standards: Include one space after function keyword for closures. 2023-09-12 15:21:02 +00:00
image Media: Fix handling of multibyte exif description metadata. 2024-01-10 21:57:50 +00:00
import Coding Standards: Remove redundant ignore annotations. 2023-09-28 00:02:47 +00:00
includes Code Modernization: Explicitly declare all properties created in set_up() methods of various test classes. 2022-08-27 12:30:08 +00:00
kses Editor: Add PHPUnit tests for 5.9.0 new functions. 2022-10-04 14:20:18 +00:00
l10n I18N: Improve docblocks after [57337]. 2024-01-24 07:55:53 +00:00
link Posts, Post Types: Don't force trailing slash in get_pagenum_link(). 2023-10-16 00:05:28 +00:00
load Tests: Improve the @group annotation accuracy and consistency. 2023-10-19 13:51:04 +00:00
media Media: Consider inline image CSS width to backfill width and height attributes. 2024-01-16 17:01:30 +00:00
menu Coding Standards: Include one space after function keyword for closures. 2023-09-12 15:21:02 +00:00
meta Revisions: framework for storing post meta revisions. 2023-09-26 15:30:34 +00:00
multisite Coding Standards: Use pre-increment/decrement for stand-alone statements. 2023-09-09 09:26:01 +00:00
oembed Embeds: Ensure the deprecated function print_emoji_styles isn't used 2024-01-17 21:34:24 +00:00
option Options, Meta APIs: Fast follow fixes for option cache priming functions. 2023-10-30 22:56:25 +00:00
pluggable Coding Standards: Use pre-increment/decrement for stand-alone statements. 2023-09-09 09:26:01 +00:00
pomo Code Modernization: Use dirname() with the $levels parameter. 2023-09-11 04:51:09 +00:00
post Docs: Fix broken @covers tag in _wp_post_thumbnail_context_filter() tests. 2023-11-10 22:52:06 +00:00
privacy Docs: Fix various incorrect WP-flavored array specifications. 2023-11-09 09:49:41 +00:00
query Build/Test Tools: Fix unstable query tests. 2024-01-18 18:38:23 +00:00
rest-api Editor: Support deferred block variation initialization on the server. 2024-01-19 20:52:06 +00:00
rewrite Coding Standards: Remove superfluous blank lines at the end of various classes. 2023-09-08 09:30:38 +00:00
script-modules Script Loader: Load the modules to the footer in classic themes 2024-01-24 10:37:54 +00:00
sitemaps Sitemaps: add lastmod for individual posts and the homepage. 2023-10-23 15:40:44 +00:00
style-engine Editor: add size and repeat to background image support. 2024-01-09 06:10:09 +00:00
taxonomy Build/Test Tools: Implement use of the void solution. 2021-08-07 10:29:41 +00:00
term Taxonomy: Check for empty term after DB sanitization in wp_insert_term(). 2024-01-08 22:42:49 +00:00
theme Editor: Update the ThemeJson unit test to cover custom CSS feature. 2024-01-23 09:04:51 +00:00
url Tests: Second pass at merging file-level and class-level DocBlocks in various unit test files. 2023-03-03 14:42:42 +00:00
user Docs: Correct the WP_User Query location reference in query cache tests. 2024-01-22 19:40:18 +00:00
widgets Tests: Reduce usage of assertEquals 2023-09-29 15:22:12 +00:00
wp Tests: Move the tests for WP class methods to the wp directory. 2022-10-28 14:08:20 +00:00
xmlrpc XML-RPC: Add alt attribute value to media item API. 2023-09-20 19:29:30 +00:00
actions.php Tests: Add hook priority call order tests. 2024-01-09 16:32:14 +00:00
adminbar.php Tests: Add tests to ensure the contribute Toolbar node is added when appropriate. 2023-07-13 12:39:42 +00:00
auth.php REST API: Correct parsing of password from Authorization header when processing Application Password credentials. 2023-10-09 14:47:57 +00:00
avatar.php Coding Standards: Remove superfluous blank lines at the end of various classes. 2023-09-08 09:30:38 +00:00
basic.php Tests: Separate the tests in basic.php for clarity. 2022-07-07 23:55:13 +00:00
block-template-utils.php Tests: Improve code coverage for _build_block_template_result_from_file 2023-10-23 05:36:50 +00:00
block-template.php Themes: Improve the performance of _get_block_templates_paths. 2023-12-20 20:00:04 +00:00
cache.php Coding Standards: Always use parentheses when instantiating an object. 2022-11-29 15:49:49 +00:00
canonical.php Media: Revert [57310]. 2024-01-19 23:58:08 +00:00
comment.php Tests: Reset the current user before performing assertions in some comment tests. 2023-10-15 08:07:11 +00:00
cron.php Coding Standards: Include one space after function keyword for closures. 2023-09-12 15:21:02 +00:00
db.php Database: Reinstate wpdb::$use_mysqli property. 2023-11-08 20:31:34 +00:00
dependencies.php Coding Standards: Remove superfluous blank lines at the end of various functions. 2023-09-08 10:01:14 +00:00
file.php Coding Standards: Include one space after function keyword for closures. 2023-09-12 15:21:02 +00:00
filters.php Tests: Add hook priority call order tests. 2024-01-09 16:32:14 +00:00
functions.php Tests: Move wp_parse_list() tests to their own file. 2024-01-14 17:15:49 +00:00
https-detection.php Security: remove the cron event that checked for https support. 2023-09-22 19:06:45 +00:00
https-migration.php Coding Standards: Include one space after function keyword for closures. 2023-09-12 15:21:02 +00:00
kses.php KSES: Add background-repeat to the list of safe CSS properties. 2023-12-26 14:22:45 +00:00
l10n.php Tests: Ensure prerequisites are met for draft length tests in Tests_L10n. 2022-10-01 15:47:13 +00:00
link.php Posts, Post Types: Don't force trailing slash in get_pagenum_link(). 2023-10-16 00:05:28 +00:00
locale.php I18N: Introduce word_count_type property to WP_Locale. 2023-02-07 17:26:14 +00:00
media.php Themes: Skip wrapping block template for singular content with a main query loop when the template was injected from outside the current theme. 2023-10-27 18:16:05 +00:00
meta.php Tests: Reduce usage of assertEquals 2023-09-29 15:22:12 +00:00
post.php Docs: Replace "sanity" with "confidence" for inclusive language. 2024-01-03 21:57:32 +00:00
query.php Coding Standards: Include one space after function keyword for closures. 2023-09-12 15:21:02 +00:00
readme.php General: Bump the recommended MySQL version in readme.html. 2023-10-22 09:03:12 +00:00
rest-api.php Tests: Avoid an infinite loop in Spy_REST_Server if a non-existing method is called. 2023-11-23 14:39:16 +00:00
rewrite.php Code Modernization: Check the return type of parse_url() in url_to_postid(). 2022-10-01 03:23:41 +00:00
robots.php Coding Standards: Include one space after function keyword for closures. 2023-09-12 15:21:02 +00:00
shortcode.php Coding Standards: Remove superfluous blank lines at the end of various functions. 2023-09-08 10:01:14 +00:00
taxonomy.php Taxonomy: add taxonomy for user pattern categories. 2023-09-21 04:23:12 +00:00
template.php Themes: Deprecate usage of TEMPLATEPATH and STYLESHEETPATH constants. 2023-09-20 17:25:26 +00:00
term.php Tests: Improve the @group annotation accuracy and consistency. 2023-10-19 13:51:04 +00:00
theme-previews.php Editor: Add function prefix to avoid conflicts. 2023-10-02 22:40:36 +00:00
theme.php Tests: Use assertSame() in some newly introduced tests. 2024-01-06 12:59:49 +00:00
upload.php Coding Standards: Remove superfluous blank lines at the end of various classes. 2023-09-08 09:30:38 +00:00
url.php Coding Standards: Use pre-increment/decrement for stand-alone statements. 2023-09-09 09:26:01 +00:00
user.php Tests: Reduce usage of assertEquals 2023-09-29 15:22:12 +00:00
utils.php Coding Standards: Remove superfluous blank lines at the end of various functions. 2023-09-08 10:01:14 +00:00
vars.php Tests: Improve the @group annotation accuracy and consistency. 2023-10-19 13:51:04 +00:00
walker.php Coding Standards: Remove superfluous blank lines at the end of various functions. 2023-09-08 10:01:14 +00:00
widgets.php Coding Standards: Include one space after function keyword for closures. 2023-09-12 15:21:02 +00:00

<?php
/**
 * Validate recommended versions for dependencies referenced in `readme.html`,
 * based on external site support pages.
 *
 * @group external-http
 */
class Tests_Readme extends WP_UnitTestCase {

	/**
	 * @coversNothing
	 */
	public function test_readme_php_version() {
		$this->markTestSkipped(
			'Temporarily disabled. Test should be re-enabled once WordPress is fully compatible with PHP 8.0+.'
		);

		// This test is designed to only run on trunk.
		$this->skipOnAutomatedBranches();

		$readme = file_get_contents( ABSPATH . 'readme.html' );

		preg_match( '#Recommendations.*PHP</a> version <strong>([0-9.]*)#s', $readme, $matches );

		$response_body = $this->get_response_body( 'https://www.php.net/supported-versions.php' );

		preg_match_all( '#<tr class="stable">\s*<td>\s*<a [^>]*>\s*([0-9.]*)#s', $response_body, $php_matches );

		$this->assertContains( $matches[1], $php_matches[1], "readme.html's Recommended PHP version is too old. Remember to update the WordPress.org Requirements page, too." );
	}

	/**
	 * @coversNothing
	 */
	public function test_readme_mysql_version() {
		// This test is designed to only run on trunk.
		$this->skipOnAutomatedBranches();

		$readme = file_get_contents( ABSPATH . 'readme.html' );

		preg_match( '#Recommendations.*MySQL</a> version <strong>([0-9.]*)#s', $readme, $matches );

		$response_body = $this->get_response_body( "https://dev.mysql.com/doc/relnotes/mysql/{$matches[1]}/en/" );

		// Retrieve the date of the first GA release for the recommended branch.
		preg_match( '#.*(\d{4}-\d{2}-\d{2}), General Availability#s', $response_body, $mysql_matches );

		/*
		 * Per https://www.mysql.com/support/, Oracle actively supports MySQL releases for 5 years from GA release.
		 *
		 * The currently recommended MySQL 8.0 branch moved from active support to extended support on 2023-04-19.
		 * As WordPress core may not be fully compatible with MySQL 8.1 at this time, the "supported" period here
		 * is increased to 8 years to include extended support.
		 *
		 * TODO: Reduce this back to 5 years once MySQL 8.1 compatibility is achieved.
		 */
		$mysql_eol    = gmdate( 'Y-m-d', strtotime( $mysql_matches[1] . ' +8 years' ) );
		$current_date = gmdate( 'Y-m-d' );

		$this->assertLessThan( $mysql_eol, $current_date, "readme.html's Recommended MySQL version is too old. Remember to update the WordPress.org Requirements page, too." );
	}

	/**
	 * @coversNothing
	 */
	public function test_readme_mariadb_version() {
		// This test is designed to only run on trunk.
		$this->skipOnAutomatedBranches();

		$readme = file_get_contents( ABSPATH . 'readme.html' );

		preg_match( '#Recommendations.*MariaDB</a> version <strong>([0-9.]*)#s', $readme, $matches );
		$matches[1] = str_replace( '.', '', $matches[1] );

		$response_body = $this->get_response_body( "https://mariadb.com/kb/en/release-notes-mariadb-{$matches[1]}-series/" );

		// Retrieve the date of the first stable release for the recommended branch.
		preg_match( '#.*Stable.*?(\d{2} [A-Za-z]{3} \d{4})#s', $response_body, $mariadb_matches );

		// Per https://mariadb.org/about/#maintenance-policy, MariaDB releases are supported for 5 years.
		$mariadb_eol  = gmdate( 'Y-m-d', strtotime( $mariadb_matches[1] . ' +5 years' ) );
		$current_date = gmdate( 'Y-m-d' );

		$this->assertLessThan( $mariadb_eol, $current_date, "readme.html's Recommended MariaDB version is too old. Remember to update the WordPress.org Requirements page, too." );
	}

	/**
	 * Helper function to retrieve the response body or skip the test on HTTP timeout.
	 *
	 * @param string $url The URL to retrieve the response from.
	 * @return string The response body.
	 */
	public function get_response_body( $url ) {
		$response = wp_remote_get( $url );

		$this->skipTestOnTimeout( $response );

		$response_code = wp_remote_retrieve_response_code( $response );
		$response_body = wp_remote_retrieve_body( $response );

		if ( 200 !== $response_code ) {
			$parsed_url = parse_url( $url );

			$error_message = sprintf(
				'Could not contact %1$s to check versions. Response code: %2$s. Response body: %3$s',
				$parsed_url['host'],
				$response_code,
				$response_body
			);

			if ( 503 === $response_code ) {
				$this->markTestSkipped( $error_message );
			}

			$this->fail( $error_message );
		}

		return $response_body;
	}
}