texvccheck: use texvc as input verification only.

texvc, the input language for formulae in Wikipedia, has become familiar to a wide range of Wikipedia editors. texvc accepts a selection of LaTeX commands, augmented with customized macros, as user input. In some cases the input is transformed to HTML or MathML. In the majority of cases, however, texvc creates a LaTeX document that is rendered via latex to dvi and then converted to an image. The intermediate tex, dvi and aux files are cleaned up and are no longer accessible after the conversion is completed.

texvc works quite well if the goal is to get png images. However, if a more modern output is desired (MathML, SVG or a LaTeX string to be rendered in the browser via MathJax), texvc does not help a lot, since the intermediate information is not accessible. Therefore, we started a project to decrypt the magic that happens inside the texvc ocaml code. In a second step we create a light version of texvc (texvc-light) that performs only the checking and the expansion of the user-defined commands. After this code is proven to work, a future step might be to create a grammar, e.g. using antlr, which is more flexible and can be used for a wide range of projects.

texvccheck aims to tackle the following issues with texvc:
  • no detailed documentation
  • no option to customize allowed input
  • no access to the tex code used to generate the png image
texvccheck (initial codename Texvc-light) removes the following features of texvc:
  • MathML generation
  • HTML output
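
The checking and expansion step described above can be illustrated with a small sketch. It is written in Python rather than OCaml and is not the texvccheck implementation: the whitelist `ALLOWED`, the macro table `MACROS` and the function `check_and_expand` are hypothetical names chosen for this example, and the handful of commands listed stands in for the real, much longer list.

```python
import re

# Hypothetical whitelist and macro table, for illustration only.
# The real lists are defined in the texvc source and are far longer.
ALLOWED = {"\\alpha", "\\frac", "\\sqrt", "\\in", "\\mathbb"}
MACROS = {"\\R": "\\mathbb{R}", "\\N": "\\mathbb{N}"}

# A TeX control word: a backslash followed by letters.
COMMAND = re.compile(r"\\[A-Za-z]+")


def check_and_expand(tex):
    """Return (True, expanded_tex) if every command is permitted,
    otherwise (False, first_rejected_command)."""

    def expand(match):
        name = match.group(0)
        if name in MACROS:
            return MACROS[name]   # expand a customized macro
        if name in ALLOWED:
            return name           # keep a permitted LaTeX command
        raise ValueError(name)    # reject everything else

    try:
        return True, COMMAND.sub(expand, tex)
    except ValueError as err:
        return False, str(err)
```

For example, `check_and_expand(r"x \in \R")` yields the expanded string `x \in \mathbb{R}`, while `check_and_expand(r"\input{file}")` is rejected because `\input` is not in the whitelist. No image, HTML or MathML is produced; the output is only the verified TeX string.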

decryption of texvc's ocaml magic

The list of allowed commands and their translations was extracted with a sequence of regular expressions, so there is no magic and no error-prone manual work involved: texvc commands
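
A regular-expression extraction of this kind might look like the following sketch. The `SAMPLE` text is an assumption: it is only shaped like OCaml match arms and is not copied from the texvc source, and the pattern `ARM` and the helper `extract_commands` are hypothetical names for this example. The actual expressions used for the extraction are not reproduced here.

```python
import re

# Illustrative input shaped like OCaml match arms (an assumption,
# not an excerpt from the texvc source).
SAMPLE = r'''
    | "\\alpha"  -> LITERAL "\\alpha "
    | "\\R"      -> LITERAL "\\mathbb{R}"
    | "\\sqrt"   -> FUN_AR1 "\\sqrt "
'''

# One match arm: | "<command>" -> <constructor> "<tex output>"
ARM = re.compile(
    r'\|\s*"((?:[^"\\]|\\.)*)"\s*->\s*(\w+)\s+"((?:[^"\\]|\\.)*)"'
)


def unescape(s):
    """Turn an OCaml-escaped backslash pair into a single backslash."""
    return s.replace("\\\\", "\\")


def extract_commands(source):
    """Return a list of (command, constructor, tex_output) tuples."""
    return [
        (unescape(m.group(1)), m.group(2), unescape(m.group(3)).strip())
        for m in ARM.finditer(source)
    ]
```

Calling `extract_commands(SAMPLE)` gives one tuple per match arm, for instance the command `\R` together with its translation `\mathbb{R}`. Because the table is produced mechanically, it can be regenerated whenever the source changes.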

texvccheck

texvccheck was merged into the MediaWiki Math extension on 14 December 2013 as part of Math 2.0: code of texvccheck

texvc grammar

The first attempt was to write a grammar using antlr and compile it into php code that could be run directly from the MediaWiki software. We tried to use antlr and phpantrl-runtime. However, the drawbacks of this solution (it needs antlr 3 instead of 4) and the fact that phpantrl is not known to be used anywhere in production make it unlikely that we will pursue further research in that direction. Comments and suggestions on suitable tools and approaches are welcome.
