
updated starmath writer #289

Open
jdpipe wants to merge 17 commits into jgm:master from jdpipe:master

@jdpipe

@jdpipe jdpipe commented Apr 29, 2026

a new PR based on the jdpipe-master plus changes to address comments from #285

to generate the visual comparison doc/pdf files (eg starmath-review.pdf) use the following:

./tools/make-starmath-review.py --output-dir /tmp/starmath-review && xdg-open /tmp/starmath-review/starmath-review.odt

this change uses the 'native' representation as the base for tests, and includes all tex-to-native tests from the existing test suite in addition to the new ones earlier provided just for starmath.

there are a number of limitations in starmath exposed. the main issue is around \mathcal and \mathbb and \mathfrak which don't exist 'easily' in LO. my solution was to request the user to select 'Latin Modern Math' as their custom 'serif' font in LO Math, and then provide these chars as unicode in the equation. converted equations include a comment telling users to do that.

@jdpipe
Author

jdpipe commented Apr 30, 2026

all tests seem to be passing now -- some changes required to overcome an issue with API changes in the typst-symbols package, which was blocking the build for this PR. I patched for that, but if you have a better/different solution for that issue, please use that instead.

@jgm
Owner

jgm commented Apr 30, 2026

You should be able to simply merge/rebase the commit to texmath that handles the typst-symbols changes.

@jgm
Owner

jgm commented Apr 30, 2026

> there are a number of limitations in starmath exposed. the main issue is around \mathcal and \mathbb and \mathfrak which don't exist 'easily' in LO. my solution was to request the user to select 'Latin Modern Math' as their custom 'serif' font in LO Math, and then provide these chars as unicode in the equation. converted equations include a comment telling users to do that.

Do these work currently in LibreOffice with pandoc's current MathML output?

@jdpipe
Author

jdpipe commented Apr 30, 2026

> > there are a number of limitations in starmath exposed. the main issue is around \mathcal and \mathbb and \mathfrak which don't exist 'easily' in LO. my solution was to request the user to select 'Latin Modern Math' as their custom 'serif' font in LO Math, and then provide these chars as unicode in the equation. converted equations include a comment telling users to do that.
>
> Do these work currently in LibreOffice with pandoc's current MathML output?

I'm glad you asked... I learned something.

By testing 'standard' pandoc/LO on a separate/clean Raspberry Pi system, I found that the existing MathML-based behaviour for texmath is already good specifically with these fonts. The default fonts (Liberation Serif etc) in LO appear to provide the missing symbols via font fallback behaviour (to Noto Sans Math, as confirmed using gucharmap). The following LaTeX code

\mathbb{A} \mathcal{A} \mathfrak{A}

can be used directly with pdflatex to get
image

Meanwhile, un-patched pandoc converts this (via MathML) into the perfectly serviceable Starmath code:

{ nitalic 𝔸 nitalic 𝓐 nitalic 𝕬 }

This could be only slightly improved by rendering as

nitalic {𝔸𝓐𝕬}
or perhaps
nitalic {𝔸 𝓐 𝕬}

or some variants on that theme. A bit of testing with LibreOffice in Raspberry Pi (standard/default fonts, no tweaks or special fonts added) gave me:

image

Whereas LO with Ubuntu 24.04 (with whatever added fonts I happened to have, and with the 'serif' font set for Starmath set to 'Latin Modern Math'):

image

So clearly if we choose 'nitalic {𝔸𝓐𝕬}' then we get some glitchy behaviour in Raspberry Pi with the standard font (i.e. it's the fallback to Noto Sans Math that seems to be the issue). But actually even when that glitch is not there, I felt the symbols were a bit too close. LO is not as nuanced with symbol spacing as LaTeX is. The best compromise seems to be

nitalic {𝔸 𝓐 𝕬}

rather than

nitalic {𝔸𝓐𝕬}

and I felt this is also better than the current pandoc behaviour, which is
{ nitalic 𝔸 nitalic 𝓐 nitalic 𝕬 }

But having decided on the 'nitalic + unicode' approach, it's all much of a muchness. It appears that Ubuntu (or at least my local install) has got better font fallbacks, and more work is probably needed on math fonts for LO in general. But the unicode-encoding of \mathcal and \mathbb and \mathfrak (etc) is all we need for now, we don't need to require the user to install/select a special font as I had earlier been suggesting. That is because fonts can selectively fall back to other fonts where specific glyphs are missing, and that is already working correctly here in this specific case, at least on the Linux systems I checked.
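To make the 'nitalic + unicode' approach concrete, here is a minimal Python sketch of the idea. It is not the PR's Haskell code. The names `styled_char` and `nitalic_group` are hypothetical, and mapping \mathcal and \mathfrak to the bold script and bold fraktur alphabets is an assumption based on the glyphs shown in this thread. The lookup goes by Unicode character name, so letters that live outside the Mathematical Alphanumeric Symbols block (for example double-struck C at U+2102) are still found.

```python
import unicodedata

# Unicode name fragments for each TeX alphabet command.
# Assumption: \mathcal and \mathfrak use the bold script / bold fraktur
# alphabets, matching the glyphs quoted in this thread.
STYLE_NAMES = {
    "mathbb": "DOUBLE-STRUCK",
    "mathcal": "BOLD SCRIPT",
    "mathfrak": "BOLD FRAKTUR",
}


def styled_char(style, ch):
    """Return the styled Unicode letter for an ASCII letter, else ch unchanged."""
    if not (ch.isascii() and ch.isalpha()):
        return ch
    case = "CAPITAL" if ch.isupper() else "SMALL"
    base = f"{STYLE_NAMES[style]} {case} {ch.upper()}"
    # Most letters are named "MATHEMATICAL <style> ..."; a few older ones
    # (e.g. double-struck C, N, R) drop the "MATHEMATICAL" prefix.
    for name in (f"MATHEMATICAL {base}", base):
        try:
            return unicodedata.lookup(name)
        except KeyError:
            continue
    return ch


def nitalic_group(pairs):
    """Build one space-separated 'nitalic {...}' group from (style, letter) pairs."""
    chars = [styled_char(style, ch) for style, ch in pairs]
    return "nitalic {" + " ".join(chars) + "}"


print(nitalic_group([("mathbb", "A"), ("mathcal", "A"), ("mathfrak", "A")]))
```

The space-separated join corresponds to the spaced variant preferred above; joining with an empty string instead would give the tighter form that showed glitches with the Noto Sans Math fallback.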

It is nevertheless true that the Latin Modern Math font is markedly better on character spacing than the default fallback fonts, in both cases. I feel that there's probably a valid bug here to be reported to font-config or whatever the relevant project is...

@jdpipe
Author

jdpipe commented May 11, 2026

@jgm i think the rebase is done now... hopefully the build is clean...

@jgm
Owner

jgm commented May 11, 2026

As for the linux/stack build failure...be sure to include .github/workflows/ci.yml in your rebase; I have removed the stack build from master...

Comment thread src/Text/TeXMath/Writers/StarMath.hs Outdated
mapStyledUnicode :: (Char -> Maybe Char) -> T.Text -> Maybe T.Text
mapStyledUnicode f t = T.pack <$> mapM f (T.unpack t)

scriptChar :: Char -> Maybe Char
Owner

Instead of hard-coding these tables scriptChar etc., can't you just use toUnicodeChar from Text.TeXMath.Unicode.ToUnicode ?

Author

thanks for this -- i'll have a look.

Comment thread .github/workflows/ci.yml
run: |
cabal v2-update
cabal v2-build --dependencies-only --enable-tests --disable-optimization -fexecutable all
cabal v2-build --dependencies-only --enable-tests --disable-optimization -fexecutable -f-server all
Owner

Why remove testing the server build?

Author

sorry, i think this was an artifact of the rebasing. wasn't intentional. i'll sort it out.

Author

ok i think this is correct now? the "-f-server" build step is there and being run in GHA.

@jdpipe
Author

jdpipe commented Jun 3, 2026

a note -- to get better visual agreement of inline vs display equations, i added some new logic, and new visual review code also. i'll circle back next to your review comments.

… etc., can't you just use toUnicodeChar from Text.TeXMath.Unicode.ToUnicode ?"
@jdpipe
Author

jdpipe commented Jun 4, 2026

I think a tex writer error was found in the test suite here, I logged it as #290 rather than seeking to patch here.

@jdpipe
Author

jdpipe commented Jun 4, 2026

OK i think we're good to go now. Here's the visual test document, in case you want to review it: starmath-review.pdf

@jgm
Owner

jgm commented Jun 5, 2026

Thanks for providing the review document. But it seems to me that the review we really need would compare: (a) LibreOffice using the starmath generated by this library with (b) LibreOffice using the mathml generated by this library [as in current pandoc]. If this comparison is favorable to starmath, it would be reason to switch.

@jdpipe
Author

jdpipe commented Jun 6, 2026

Hi -- I think the code produced by this new writer is clearly better than the earlier code, because it renders as idiomatic starmath. When using the mathml syntax, the conversion to starmath occurs within LO, only on the occasion of opening and attempting to edit the equation. At this point, the poor-quality LO converter generates unreadable and unmaintainable starmath code. This was the motivation for the new writer in fact, more than the visual equivalence of the latex and the mathml. I think that's important to note.

however, you're right, we should run a visual check also against the previous conversion pathway. i'll have a go.

@jdpipe
Author

jdpipe commented Jun 6, 2026

Ok so i found errors in the following from the MathML pathway. The converter produces equations that LO cannot render without errors (upside down red question marks). The produced report is here:
starmath-review.pdf

differentiable_manifold
schwinger_dyson
sphere_volume
subsup
002_common_accents
010_absolute_value_bars_normalize_to_delimiters

other points of difference:

  • the starmath writer doesn't have a great solution to the 'prime' issue in LO. neither does the mathml converter. i've reported a bug in LO around the prime symbol, but it's a real gap in the language at this point.
  • differences in spacing for \quad and \qquad
  • the starmath writer produces different output now for $...$ and $$...$$ equations (inline vs 'display' equations). the mathml writer doesn't do that.

I'll commit the revised code so that you can reproduce this document yourself if desired.

@jdpipe
Author

jdpipe commented Jun 6, 2026

the script also produces .odt output, see attached.
starmath-review.odt

here are some comparisons of what the resulting starmath code looks like for the end user: (first is starmath, second is via mathml):

boxed

\boxed{x^{2} + y^{2} + z^{2}}

we force a deliberate render error above, because \boxed has no equivalent in starmath

{ x ^ 2 + y ^ 2 + z ^ 2 }

subsup

x_b^a ~ x_b^a ~ {func min} csub {A}~ {func max} csub {B}~ {func det} csub {C}~ {func Pr} csub {A}~ {func gcd} csub {A}~ {dot u}^2 ~ {overline u}_%ivarepsilon ~ {underline u}_b^a ~ {a + b} overbrace "term"~ {a + b} overbrace c~ {a + b} underbrace c~ {a + b} underbrace c{~}^H e3{~}_x A{~}_x^3
{ x ^ a _ b `` x ^ a _ b `` nitalic min csub A `` nitalic max csub B `` nitalic det csub C `` nitalic Pr csub A `` nitalic gcd csub A `` u csup ̇ ^ 2 `` u csup ¯ _ ε `` u csub _ ^ a _ b `` { a + b } csup ⏞ csup "term" `` { a + b } csup ⎴ csup c `` { a + b } csub ⏟ csub c `` { a + b } csub ⎵ csub c `` ^ H e 3 `` _ x A `` ^ 3 _ x }

(mathml version has render errors)

002_common_accents

ddot x + hat x + tilde x + vec x + bar x
{ x csup ̈ + x csup ̂ + x csup ̃ + x csup toward + x csup ‾ }

(again, rendering errors, and non-idiomatic starmath)
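As a rough illustration of what 'idiomatic' means for the accents case, the sketch below maps a combining accent character to the corresponding starmath attribute keyword instead of stacking the raw character with csup. This is Python, not the writer's Haskell, and the table `ACCENT_KEYWORDS` and the function `accent_to_starmath` are hypothetical names; the particular code points chosen for each accent are an assumption.

```python
# Combining accent -> starmath attribute keyword.
# Assumption: these code points are the ones a reader would emit for
# \dot, \ddot, \hat, \tilde, \vec and \bar.
ACCENT_KEYWORDS = {
    "\u0307": "dot",    # COMBINING DOT ABOVE
    "\u0308": "ddot",   # COMBINING DIAERESIS
    "\u0302": "hat",    # COMBINING CIRCUMFLEX ACCENT
    "\u0303": "tilde",  # COMBINING TILDE
    "\u20d7": "vec",    # COMBINING RIGHT ARROW ABOVE
    "\u0304": "bar",    # COMBINING MACRON
}


def accent_to_starmath(base, accent):
    """Return '<keyword> <base>' for a known accent, or None if unmapped."""
    keyword = ACCENT_KEYWORDS.get(accent)
    if keyword is None:
        return None
    return f"{keyword} {base}"


terms = [accent_to_starmath("x", a) for a in "\u0308\u0302\u0303\u20d7\u0304"]
print(" + ".join(terms))
```

Returning None for an unmapped accent leaves the caller free to pick a fallback, rather than silently emitting a character that LO may not render.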

So Texmath is generating MathML that is at worst wrong or at best not compatible with the LO mathml-to-starmath converter, and hence not really useful. Some of this issue can be addressed with bug-fixes in LO -- I'll take that up separately at some point, hopefully!

@jdpipe jdpipe closed this Jun 6, 2026
@jdpipe
Author

jdpipe commented Jun 6, 2026

(stray keystroke caused closing the issue -- I immediately re-opened, sorry)

@jdpipe jdpipe reopened this Jun 6, 2026
@jgm
Owner

jgm commented Jun 6, 2026

Fantastic, this is very helpful. I am convinced that in many cases the MathML we currently produce is inferior to what you're getting with starmath. It would be a net gain to switch at this point.

I did notice a few little things that I thought I'd point out here. I don't know if they can be fixed easily; some are among the issues you already noted:

Here the starmath version has a stray character at the start, rendered as upside down question mark - why?

image

In the cancel case we also see this -- the MathML version is better, because you don't get these upside-down question marks. But maybe not a huge deal because this is an unsupported feature.

image

In this case the prime is added in the wrong place by the starmath version, and the MathML version does better (image shows the MathML):
image

042_adjacent_bold_terms_get_separator seems to point to a bug in the MathML renderer, or maybe TeX reader -- mathbf(n) seems to generate a non-italicized n, even though n by itself would be italicized. Worth reporting separately.

deMorgans_law
the starmath version wrongly italicizes the Union symbol (is it perhaps a literal letter U in the starmath)?
the mathml version italicizes Union and Intersection uniformly.
the tex italicizes neither.
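One quick way to check the 'literal letter U' question is to list the code points in the generated starmath string. A minimal Python sketch follows; `describe_chars` is a hypothetical helper, and the sample strings are made up for illustration, not taken from the writer's actual output.

```python
import unicodedata


def describe_chars(s):
    """List (code point, Unicode name) for every non-space character in s."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
        for c in s
        if not c.isspace()
    ]


# Made-up samples: an ASCII 'U' versus the real union operator U+222A.
for sample in ("A U B", "A \u222a B"):
    print(describe_chars(sample))
```

The same listing would also expose an unexpected or unnamed character at the start of an equation, which may help with the stray upside-down question mark noted earlier.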

math-in-text
Here, the starmath version inserts some extra unwanted spaces.
Compare starmath:
image
and MathML:
image

026_greek_identifier_spacing_in_products
Here the starmath version has insufficient spacing between the line and the denominator.
Better in tex and MathML.
image
