User:Topbanana/Reports/This article contains a malformed HTML entity
Overview
The list below shows articles containing malformed HTML entities. It was generated on 7th December 2004 from the 15th November 2004 database dump.
Preamble
Malformed HTML entities may not be rendered correctly in some browsers. A well-formed entity begins with an ampersand, ends with a semicolon and contains a valid token in between.
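As an illustration of the rule above, here is a minimal Python sketch that checks whether a string is a single well-formed entity. The name `is_well_formed_entity` and the short `KNOWN_NAMES` whitelist are assumptions made for this example; HTML defines many more entity names than are listed here.

```python
import re

# Hypothetical, deliberately small whitelist; real HTML defines many more names.
KNOWN_NAMES = {"amp", "lt", "gt", "nbsp", "mdash", "sup1", "sup2", "sup3"}

# An ampersand, then a decimal reference, a hex reference or a name, then a semicolon.
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def is_well_formed_entity(text: str) -> bool:
    """Return True if text is exactly one well-formed entity."""
    match = ENTITY.fullmatch(text)
    if match is None:
        return False
    token = match.group(1)
    # Numeric references are accepted as-is; names must be on the whitelist.
    return token.startswith("#") or token in KNOWN_NAMES
```

With this sketch, `&amp;` passes, while `&amp` (no semicolon), `amp;` (no ampersand) and `&nsbp;` (unknown token) are all rejected.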
How does this work?
- Examine the links shown on the report below and correct those that are wrong.
- Delete from the list the entries you've fixed, or
- Score out false positive suggestions by enclosing them within <s></s> tags, and explain why the suggested correction is inappropriate.
- Optionally, mark your edit comment with the following text, so that other users are drawn to this page thus increasing the number of people fixing wiki errors: [[User:Topbanana/Reports/This article contains a malformed HTML entity|Help Wikipedia fix suspected malformed HTML entities - click here!]]
Regenerating this report
This report is generated from a Link Analysis Database using the SQL:
-- Lookup table of the entity names to check for
DROP TABLE IF EXISTS html_entity;
CREATE TABLE html_entity (
  code varchar(32) NOT NULL,
  PRIMARY KEY ( code )
) ENGINE=MyISAM;

INSERT INTO html_entity VALUES ( 'sup1' );
INSERT INTO html_entity VALUES ( 'sup2' );
INSERT INTO html_entity VALUES ( 'sup3' );
INSERT INTO html_entity VALUES ( 'amp' );
INSERT INTO html_entity VALUES ( 'lt' );
INSERT INTO html_entity VALUES ( 'gt' );
INSERT INTO html_entity VALUES ( 'nbsp' );
INSERT INTO html_entity VALUES ( 'mdash' );

-- One report line per article, listing every entity found without its closing semicolon
SELECT concat( '*[[', art_title, ']] - check ', group_concat( code ) )
FROM art, html_entity
WHERE art_text REGEXP concat( '&', code, '([^;]|$)' )
GROUP BY art_title
ORDER BY art_title;
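For readers without access to the database, the same check can be sketched in Python. This is only an approximation of the query above, not the code used to produce the report: the names `unclosed_entities` and `build_report` and the dictionary of article texts are assumptions for the example, and details such as case sensitivity and sort order may differ from MySQL's.

```python
import re

# Same entity names as the html_entity table.
ENTITY_CODES = ["sup1", "sup2", "sup3", "amp", "lt", "gt", "nbsp", "mdash"]


def unclosed_entities(text):
    """Return the codes that appear as '&code' without a following semicolon."""
    found = []
    for code in ENTITY_CODES:
        # Same pattern as the SQL: '&' + code, then a non-semicolon or end of text.
        if re.search("&" + re.escape(code) + r"([^;]|$)", text):
            found.append(code)
    return found


def build_report(articles):
    """articles maps title -> wikitext; returns report lines sorted by title."""
    lines = []
    for title in sorted(articles):
        codes = unclosed_entities(articles[title])
        if codes:
            lines.append("*[[" + title + "]] - check " + ",".join(codes))
    return lines
```

Calling `build_report({"B": "x &amp y", "A": "fine &amp; here"})` gives a single line for article B, since the entity in A is properly closed.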
Suggested improvements
- Specify the line on which the HTML is malformed, and/or specify the nature of the malformation.
- This regexp only searches for one type of malformation: leaving off the final semicolon. --ChrisRuvolo 17:58, 18 Nov 2004 (UTC)
- Flag false positives that aren't really malformed HTML entities, for example the use of &c instead of etc or similar. (There were several of these in Abbey.) It would also be worth searching for &c;, which is valid if uncommon English masquerading as a correctly formed but non-existent HTML entity. -- Avaragado 20:59, 24 Nov 2004 (UTC)
- Well, even if they're not supposed to be HTML entities, they still make malformed HTML and should have the ampersand replaced with &amp;. DopefishJustin (・∀・) 23:16, Nov 29, 2004 (UTC)
- Search for other HTML entities. Common ones should include: lt, gt, amp, nbsp, mdash, sup1, sup3, and numeric entities (e.g. &#x391;, which renders as Α). A regexp like this might work:
REGEXP '&(sup[123]|amp|lt|gt|nbsp|mdash|#x[0-9a-fA-F]*)([^;]|$)'
- --ChrisRuvolo 22:59, 24 Nov 2004 (UTC)
- Search for a missing leading ampersand. Possible regexp:
REGEXP '(^|[^&])(sup[123]|amp|lt|gt|nbsp|mdash|#x[0-9a-fA-F]*);'
- --ChrisRuvolo 22:59, 24 Nov 2004 (UTC)
- Note, this needs improvement. As it stands, it would hit false positives showing how HTML entities should look. e.g.: &sup2; --ChrisRuvolo 23:03, 24 Nov 2004 (UTC)
- And also words like "lamp", "felt", etc, when followed by a semi-colon. IMHO checking for missing leading ampersand is unnecessary. I'm not sure I've ever seen one of these in the wild - people forget the semi-colon all the time, but remember the ampersand. -- Avaragado 23:13, 24 Nov 2004 (UTC)
- This is true. I withdraw my suggestion. --ChrisRuvolo 23:56, 24 Nov 2004 (UTC)
- List updated. I've opted for a slightly more databasey mechanism to report on several different HTML entities. Yes, this still just checks for unclosed entities - there's work to be done on other malformations, not forgetting out-and-out typos such as &nsbp;. However, as it stands, the list shows enough problems to keep everyone occupied for now ;) - TB 09:30, 2004 Dec 7 (UTC)
- It's been nearly a year. Can we get another run of this report, TB? Thanks. --ChrisRuvolo (t) 04:47, 1 November 2005 (UTC)
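The two regexps proposed in the discussion above can be tried out in Python. This sketch was only checked against Python's `re` module, so MySQL's REGEXP may differ in details. One thing it shows is that the first pattern, as written, also flags a correctly closed hex reference, because the hex run may stop one digit short. `STRICTER` is a hypothetical variant added for this example and is not part of the original suggestions.

```python
import re

# Pattern as suggested in the discussion (entity missing its trailing semicolon).
SUGGESTED = re.compile(r"&(sup[123]|amp|lt|gt|nbsp|mdash|#x[0-9a-fA-F]*)([^;]|$)")

# Hypothetical variant: after a hex run, the next character may be neither a
# semicolon nor another hex digit, so the run cannot be cut short.
STRICTER = re.compile(
    r"&(?:(?:sup[123]|amp|lt|gt|nbsp|mdash)(?:[^;]|$)"
    r"|#x[0-9a-fA-F]+(?:[^;0-9a-fA-F]|$))"
)

# Pattern from the withdrawn "missing leading ampersand" suggestion.
NO_AMPERSAND = re.compile(r"(^|[^&])(sup[123]|amp|lt|gt|nbsp|mdash|#x[0-9a-fA-F]*);")


def flagged(pattern, text):
    """Return True if the pattern reports a problem anywhere in text."""
    return pattern.search(text) is not None
```

For example, `flagged(SUGGESTED, "&#x391;")` is true even though the reference is well formed, while `flagged(STRICTER, "&#x391;")` is false. `flagged(NO_AMPERSAND, "a lamp; a felt;")` is true, which is the false positive on ordinary words described above.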
List
The list below omits sup2s for now - most seem to have been fixed after the database dump I'm working from was taken. - TB 11:21, 2004 Dec 7 (UTC)
False Positives
Franklin_W._Olin_College_of_Engineering - check lt - false positive: part of the URL for an external link
U.S._presidential_election,_2012 - check nbsp - false positive: nothing found in substub article, no significant history
University_of_Dallas - check lt - false positive: it's part of the URL for an external link
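False positives of the URL kind could perhaps be avoided by blanking out URLs before scanning. The following Python sketch shows one way to do that. The `URL` pattern is a rough assumption about what counts as a URL in wikitext, the function name `unclosed_outside_urls` is made up for the example, and the sample address in the usage note is invented rather than taken from either article.

```python
import re

# Rough, assumed notion of a URL: a scheme, then everything up to the next
# whitespace, bracket, angle bracket or double quote.
URL = re.compile(r"(?:https?|ftp)://[^\s\[\]<>\"]+")

# Entity name not followed by a semicolon; the lookahead does not consume the
# next character, so adjacent entities are each examined.
UNCLOSED = re.compile(r"&(sup[123]|amp|lt|gt|nbsp|mdash)(?!;)")


def unclosed_outside_urls(text):
    """Return entity codes left unclosed, ignoring anything inside a URL."""
    without_urls = URL.sub(" ", text)
    return sorted({m.group(1) for m in UNCLOSED.finditer(without_urls)})
```

Given `"[http://example.org/q?a=1&lt=2 Search] and 5 &lt 6"`, only the second `&lt` is reported. If the text consists of the bracketed link alone, nothing is reported.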
Additional problems found
List_of_Famicom_games - Malformed HTML entity fixed, but faulty table markup leaves an extraneous <tr><td> at top. Fixed. --Phil | Talk 12:26, Mar 9, 2005 (UTC)