User:Inductiveload/Parser migration

MediaWiki is preparing to change the underlying parser that is used to interpret Wikitext and produce HTML. The old parser is called "Tidy" and the new one is "Remex".

There are many (hundreds of thousands of) cases where the Wikitext at Wikisource is invalid in some way. In some cases, this might cause different results when the new parser is used. However, in a lot of cases, there is no real visible difference.

There is a tool called a linter which can show many of these problems. They are shown at Special:LintErrors.
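The same data can also be fetched in bulk rather than browsed by hand. The following is a minimal sketch, assuming the Linter extension's API list module (list=linterrors, with "lnt"-prefixed parameters) is enabled on the wiki and that "misnested-tag" is one of its category names. The function name buildLintErrorsUrl and the choice of parameters are illustrative only.

```javascript
// Build a request URL for the Linter extension's API list module.
// Assumptions: the wiki exposes api.php at apiBase, and the module accepts
// lntcategories and lntlimit. buildLintErrorsUrl is a hypothetical helper.
function buildLintErrorsUrl(apiBase, category, limit) {
    const params = new URLSearchParams({
        action: "query",
        list: "linterrors",
        lntcategories: category,
        lntlimit: String(limit),
        format: "json"
    });
    return apiBase + "?" + params.toString();
}

const url = buildLintErrorsUrl(
    "https://en.wikisource.org/w/api.php",
    "misnested-tag",
    10
);
console.log(url);
```

The resulting URL can be opened in a browser or passed to any HTTP client. Check the API sandbox on the wiki for the exact parameters it supports.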

How to check for linter errors
Other than trawling Special:LintErrors, there is a tool by de:w:PerfektesChaos which adds a linter checker to the top of pages. There are instructions at en:w:User:PerfektesChaos/js/lintHint, but to activate it on all Wikisource pages, add this to your Special:MyPage/common.js:

// linter config object
var myLintHints = { };

// lint in all namespaces
myLintHints.rooms = "*";

// communicate user defined object
mw.hook( "lintHint.config" ).fire( myLintHints );

// finally, load gadget
mw.loader.load( "https://en.wikipedia.org/w/index.php?title=User:PerfektesChaos/js/lintHint/r.js&action=raw&bcache=1&maxage=86400&ctype=text/javascript" );

A yellow button should appear on pages near the top and when clicked it will show the linter errors on the page. When you are editing a page, clicking it will check the current editor contents live.

Note: currently doesn't seem to work in Page: or Index: namespaces. The development version (as of 2.14) does work. Replace  with   above.

How to compare parser outputs
There is a tool that you can activate in your Preferences under Editing called "Parser Migration tool". This allows you to see a page as it is processed by each parser. Ideally, both sides will be identical.

Errors and templates
The error reporting at Special:LintErrors often includes the template the error is in. This can be useful, but it can also be misleading: it could indicate that the error is in a parameter of the template, that it is somewhere in the template code, or that it is an interaction of the two.

Error in parameter
In this case, the error has nothing to do with the template, it just happens to be in a parameter:

This will be reported as being through larger block, but the problem is not found in the template code.

Common linter errors
There is a description of each linter error reported at mw:Help:Extension:Linter.

Below is a quick description of some common ones in the context of Wikisource. Nearly all linter errors are harmless, in that the affected code does not generally render out differently, even if the two parsers might disagree about something. However, they do also often indicate low-quality markup or markup that has a typo or has been broken accidentally.

Span vs block
HTML has "span" and "block" elements. Span elements generally look like,  ,   etc. Block elements are like   and.

Span elements should not contain block elements. However, this is often done inadvertently by including block elements inside a template that represents a span element:

Foo

bar

In this case, a block element is produced by the paragraph break and is put inside a span-based template. This produces either a "Misnested tag with different rendering in HTML5 and HTML4" or a "Misnested tag" linter error. To resolve it, either use a separate larger for each line, or use the larger block template, which can contain block elements.

If the misnested tag is a <font> tag, consider replacing it with a block sizing template like larger block, as the font tag is deprecated anyway.

These errors are marked as "high-priority" (for the HTML4/5 ones). These problems will cause differences in rendering:



The others are "medium-priority", and do not generally appear to cause much visual disturbance between the two parsers.
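The paragraph-break case described above can be spotted mechanically before saving. This is a minimal sketch, assuming that a blank line in a template parameter is what produces the block element. hasParagraphBreak is a hypothetical helper, not part of any gadget.

```javascript
// Returns true if the text contains a blank line, which MediaWiki turns
// into a paragraph break (a block element). Passing such text to a
// span-based template is what triggers the misnesting described above.
function hasParagraphBreak(text) {
    return /\n[ \t]*\n/.test(text);
}

console.log(hasParagraphBreak("Foo\n\nbar")); // true
console.log(hasParagraphBreak("Foo bar"));    // false
```

This only catches blank lines. Other block-producing markup (lists, tables, headings) would need additional checks.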

Tag interleaving
This error can also happen when tags are interleaved rather than nested:

<b> fdsdfsf <i> sddsf </b> adsafd </i>

In this case, the bold tag is closed before the italic tag, even though it was opened first. In HTML, the tags must be nested, so that if you open a tag inside another one, you close the inner one before you close the containing one. Exactly how you do this will depend somewhat on what the markup was trying to achieve but might be like this:

<b> fdsdfsf <i> sddsf </i></b> <i> adsafd </i>

This kind of mis-nesting is fairly rare as it's harder to do with templates, and generally the two parsers come up with the same output.

It can still happen with wikitext:

dasdas asdads adad
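The nesting rule described in this section (close the inner tag before the containing one) can be checked with a stack. The following is a rough sketch for plain HTML tags only. It does not understand wikitext quote markup or templates, and findInterleavedTags and its small list of void elements are assumptions made for illustration.

```javascript
// Report closing tags that arrive while a later-opened tag is still open.
// Assumption: input is plain HTML-style markup; wikitext is not parsed.
function findInterleavedTags(html) {
    const voidTags = new Set(["br", "hr", "img", "wbr"]);
    const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>/g;
    const open = [];      // stack of currently open tag names
    const problems = [];
    let match;
    while ((match = tagPattern.exec(html)) !== null) {
        const isClosing = match[1] === "/";
        const name = match[2].toLowerCase();
        const selfClosing = match[3] === "/" || voidTags.has(name);
        if (selfClosing) {
            continue;
        }
        if (!isClosing) {
            open.push(name);
            continue;
        }
        if (open[open.length - 1] === name) {
            open.pop(); // properly nested
            continue;
        }
        const idx = open.lastIndexOf(name);
        if (idx !== -1) {
            // Tags opened after this one are still open: interleaving.
            problems.push({ closed: name, stillOpen: open.slice(idx + 1) });
            open.splice(idx, 1);
        }
        // A closing tag with no opener at all is ignored in this sketch.
    }
    return problems;
}

console.log(findInterleavedTags("<b>a<i>b</b>c</i>"));
console.log(findInterleavedTags("<b>a<i>b</i></b><i>c</i>"));
```

The first call reports that b was closed while i was still open. The second, properly nested, version reports nothing.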

Unterminated markup
Markup like the following confuses the parser as it has to guess where to put the closing tag. When this happens, it often makes the right choice (which is why editors don't notice). In any case, it raises a Missing end tag error.

''italic, but where is the end?

bold, but where is the end?

There are many tags that can be left unterminated (upsetting the linter but maybe producing valid output). The linter will tell you what the tag is. The majority are italic or bold markup, either HTML tags or Wikitext.
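For the wikitext case, a rough line-by-line check can flag italic or bold markup that is never closed. This is only a heuristic sketch: it assumes quote markup does not carry across line breaks and treats runs of apostrophes much more simply than the real parser does. findUnclosedQuotes is a hypothetical name.

```javascript
// Flag lines where '' (italic) or ''' (bold) is opened but not closed.
// Heuristic: 2 apostrophes toggle italic, 3 or 4 toggle bold,
// 5 or more toggle both. The real parser has more special cases.
function findUnclosedQuotes(wikitext) {
    const results = [];
    wikitext.split("\n").forEach((line, index) => {
        let italic = false;
        let bold = false;
        for (const run of line.match(/'{2,}/g) || []) {
            if (run.length === 2) {
                italic = !italic;
            } else if (run.length === 3 || run.length === 4) {
                bold = !bold;
            } else {
                italic = !italic;
                bold = !bold;
            }
        }
        if (italic || bold) {
            results.push({ line: index + 1, italic: italic, bold: bold });
        }
    });
    return results;
}

console.log(findUnclosedQuotes("''italic, but where is the end?\n''fine''"));
```

Only the first line is reported here, since the second line closes its italic markup.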

Obsolete tags
Some tags are deprecated in HTML, mostly because they violate the separation between content and layout. The most common are <center> and <font>.

<center> can generally be easily replaced by the center template.

<font> can usually be replaced with a colour template like red or one of the text size templates like larger, depending on what it's being used for.
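To get a quick idea of how many obsolete tags a page contains, the wikitext can be scanned with a regular expression. This is a minimal sketch. The list of tag names is an assumption about what the linter's "Obsolete tag" category covers, and countObsoleteTags is a hypothetical helper.

```javascript
// Tag names assumed to be reported as obsolete; adjust as needed.
const OBSOLETE_TAGS = ["big", "center", "font", "strike", "tt"];

// Count opening tags of each obsolete kind in a piece of wikitext.
function countObsoleteTags(wikitext) {
    const counts = {};
    const pattern = new RegExp(
        "<(" + OBSOLETE_TAGS.join("|") + ")\\b[^>]*>",
        "gi"
    );
    let match;
    while ((match = pattern.exec(wikitext)) !== null) {
        const name = match[1].toLowerCase();
        counts[name] = (counts[name] || 0) + 1;
    }
    return counts;
}

console.log(countObsoleteTags(
    '<center>Title</center> <font color="red">x</font> <table>'
));
```

Only opening tags are counted, and the word boundary keeps tags such as table from matching tt.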

Stripped tags
These are when the parser doesn't know what to do with a tag and discards it. Very often it's due to a superfluous closing tag, possibly left when the opening tag was removed in the past:

Lorem ipsum

These errors are probably harmless, as the parser discards the stray tags anyway, but they might be a sign that some formatting has been broken.

Tidy bug affecting font tags wrapping links
This is a very specific error caused by code like this:

A link

When the new parser interprets this, the link will not be coloured as expected. Generally, at WS, this only happens in user signatures, so it's not a critical problem affecting content. As <font> tags are deprecated anyway, the correct markup for it would be:

A link
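If many signatures need the same repair, the rewrite can be scripted. The sketch below shows one possible approach, assuming the problem markup is a font tag with only a color attribute wrapped around a single wikilink, and that moving the colour into a span inside the link label is an acceptable fix. fixFontWrappedLinks is a hypothetical helper and will not match other attribute combinations.

```javascript
// Rewrite <font color="X">[[Target]]</font> so the colour is applied
// inside the link label instead of around the link.
// Assumption: a single color attribute and a single wikilink.
function fixFontWrappedLinks(wikitext) {
    const pattern = /<font\s+color\s*=\s*["']?([^"'>\s]+)["']?\s*>\s*\[\[([^\]|]+)(?:\|([^\]]+))?\]\]\s*<\/font>/gi;
    return wikitext.replace(pattern, (whole, color, target, label) => {
        const text = label || target;
        return "[[" + target + '|<span style="color:' + color + '">' +
            text + "</span>]]";
    });
}

console.log(fixFontWrappedLinks('<font color="red">[[A link]]</font>'));
// [[A link|<span style="color:red">A link</span>]]
```

Review each change by hand, as signatures often carry extra attributes (face, size) that this pattern deliberately leaves alone.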

Error in template
Any parser error that happens in the template will happen on all pages that use the template.

Imagine a template like this that used <center> to center its only parameter:

Every page that uses this template will show up with an "Obsolete tag" lint error. Each error will be reported as being through this template. Changing <center> to the center template would fix every page the template is used on. It can take a while for Special:LintErrors to update when these errors are fixed.

Error caused by interaction of parameters and template
Imagine a template that makes text red:

As this is a span-based template, if you feed it block elements, it will cause "Misnested tag" errors. Again, it will be reported as being through the template, but it's not fully the template's fault, and it's not fully the parameter's fault. The errors can be avoided in two ways:


 * Fix the parameters to not cause the errors. This may mean you can't format something quite how you wanted if the template isn't written to allow it (e.g. paragraph breaks in a template that wraps the input in a span).
 * Change the template to accommodate the input you want to give (in the above case, use a div). Bear in mind, this might break existing users of the template (in the above example, you would no longer be able to use the template within a line of text without a line break).

Another example of this could be a template like this:



If you call this template like this:

{{mytemplate| text

You will get an error, because the wikitext expands to:

 text 

Which is invalid, as the span and italic tags will be interleaved rather than nested as you might expect (as '' stands for <i> and </i>, depending on context). In this case, you could:
 * replace one or both of the 's with
 * rethink why you are italicising text that's already being italicised by a template - are you mistaken or is the template too inflexible?