Intent to Ship: MathML in Chromium
29 points • 1 comments
Recent @dginev Activity
Hi, ar5iv creator here. Just to jot down that:
1. This was first reported to HN 55 days ago, discussion is here: https://news.ycombinator.com/item?id=30163308
2. There is an official arXiv blog announcement with a few extra details here: https://blog.arxiv.org/2022/02/21/arxiv-articles-as-responsi...
When I first commented on HN, I mentioned that I really hoped we could find a way for arXiv to officially endorse the project and adopt it, so that we solve the copyright issues and get on a healthy path to full integration with the service. That happened a month ago! More healthy work ahead.
Thanks to everyone on HN for the feedback so far. If/when you see broken renderings (still plenty of examples) do comment with the article link, or (even better) use the "report an issue" button at the bottom of each document page. Cheers!
The arXiv of the future will not look like the arXiv
118 points • 51 comments
Oh, and the last question - yes, if arXiv integrates the feature, they will be able to serve any of the versions, including the most recent one.
I can technically implement that, but I really don't want to, as I see it as crossing a certain line. Seeing ar5iv as a limited, constrained service is a good thing - I think it clearly communicates that I do not want to compete with arXiv.
Certainly, sorry for the confusion.
There are actually multiple "we", since there are two institutions involved, and one foil character - I'm the only one responsible for ar5iv "the website", in a personal capacity.
The "we" behind the fidelity of the generator is the team behind LaTeXML, the TeX-to-HTML conversion tool. That is in many ways the most important project to remember here, as that is what we want to actively improve to a point where it is "good enough" at creating HTML over the entirety of arXiv.
The institution hosting the website, and wanting to "serve a community", is KWARC, a research group at FAU Erlangen-Nürnberg in Germany. There are all kinds of projects and services brewing on that end, which have interplay with the HTML data behind ar5iv, but are not directly on the site.
And as to all of us reading HN, I think we are actually interested in arXiv itself being maximally useful. And so is the ar5iv site - it's a temporary deployment, that really is aiming to reintegrate back into the arxiv.org site, and general infrastructure.
If/when that happens is unclear, but in the meantime there are a lot of improvements that can be made, both in what HTML can be generated and in deciding what the markup of scientific documents ought to be in the first place, as well as gaining some insights into what new problems arXiv would encounter if they served HTML.
Got it. So even with hyphenation, the gaps are still bad enough that you'd consider the current ar5iv rendering bumpy and distracting?
I think I can see that, but it's almost there, which is why it feels like there has to be something I'm missing for it to justify "just right".
But yes, at the least you've convinced me we should have a separate theme that goes left-aligned, and possibly makes a number of other choices that maximize readability.
Since I'd still want the folks that want "as good as PDF", to feel justified for sticking around.
What's a Dartmouth?
Same HTML backend generator (latexml), different frontends, and different coverage of arXiv.
Also, ar5iv may disappear very quickly, since I am unsure if it's more helpful or harmful. But I'll definitely lean on the public attention to keep asking arXiv to integrate an HTML preview for their articles. In the one-and-only arxiv.org itself.
Lastly, one difference that may ignite a curious debate is that ar5iv is committed to being MathML-native. Yes. MathML is the only markup used for math syntax, and you'll see it rendered directly, undisturbed, with Firefox today.
Over 500 million MathML elements in the full dataset too, pretty awe-inspiring.
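For a sense of what counting MathML elements means in practice, here is a minimal Python sketch using only the standard library. It is not how the figure above was computed; the class name, function name and sample fragment are made up for illustration.

```python
from html.parser import HTMLParser


class MathMLCounter(HTMLParser):
    """Counts every element opened inside a <math> subtree, <math> included."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <math> elements we are currently nested in
        self.count = 0   # running total of MathML elements seen

    def handle_starttag(self, tag, attrs):
        if tag == "math":
            self.depth += 1
        if self.depth > 0:
            self.count += 1

    def handle_endtag(self, tag):
        if tag == "math" and self.depth > 0:
            self.depth -= 1


def count_mathml_elements(html: str) -> int:
    """Return the number of MathML elements in an HTML string."""
    parser = MathMLCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical sample: one paragraph with a single inline formula.
SAMPLE = (
    "<p>Energy: <math><mi>E</mi><mo>=</mo><mi>m</mi>"
    "<msup><mi>c</mi><mn>2</mn></msup></math></p>"
)

if __name__ == "__main__":
    print(count_mathml_elements(SAMPLE))
```

The surrounding `<p>` is not counted, only `<math>` and its descendants.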
If you have a spare minute, please pay a visit to the "report issue" button at the bottom.
Indeed - hard problem and a messy solution. I have no easy answers.
Which is intentional. ar5iv does not aim to be a live preview service, or replace arXiv.
The primary aim is to serve the community with the outputs we have, while we improve the coverage and fidelity of our generator.
And yes, using only the official sources arXiv has released for reuse: https://arxiv.org/help/bulk_data_s3
This is indeed a major difference with -vanity
HN can obviously summon the site author (hi!)
What I was wondering - and still am - shouldn't it be possible to get to a "Pleasant" justified layout on the web in general?
The jagged left-aligned paragraphs are some of the first bits people point to when invoking "my PDF looks better". I definitely am not saying I did it perfectly, but shouldn't it be possible to get a good justified scientific article on the web? Why not?
You make it sound as if you write HTML5 by hand when entering content in web pages. Do you have the same complaint for SVG diagrams or HTML tables, which are even worse for writing by hand than MathML?
I hope not. The key gain here is that the syntax used for mathematics in HTML needs to be easy to integrate in the web platform of 2021. Can you still use the DOM with MathML? Yes. Can you naturally interleave it with other HTML elements to create multi-modal constructions? Yes. With diagrams? SVG+MathML is possible, yes.
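To make the interleaving point concrete, here is a small sketch, in Python rather than browser JavaScript so it runs anywhere. The document fragment is invented for illustration; the namespace URIs are the standard ones for XHTML, MathML and SVG. The same generic tree queries reach math inside a paragraph and math inside an SVG `foreignObject`.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs; the short prefixes are local to this sketch.
NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "m": "http://www.w3.org/1998/Math/MathML",
    "s": "http://www.w3.org/2000/svg",
}

# Hypothetical fragment: a formula in running text, plus a labelled diagram.
DOC = """<div xmlns="http://www.w3.org/1999/xhtml">
  <p>Area: <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><msup><mi>r</mi><mn>2</mn></msup></math></p>
  <svg xmlns="http://www.w3.org/2000/svg" width="120" height="60">
    <circle cx="30" cy="30" r="20"/>
    <foreignObject x="60" y="20" width="60" height="30">
      <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>r</mi></math>
    </foreignObject>
  </svg>
</div>"""

root = ET.fromstring(DOC)

# Every formula in the document, wherever it is embedded.
all_math = root.findall(".//m:math", NS)

# Only the formulas that label the SVG diagram.
math_in_svg = root.findall(".//s:foreignObject/m:math", NS)

# All identifiers, in document order, with no TeX parsing involved.
identifiers = [mi.text for mi in root.findall(".//m:mi", NS)]

if __name__ == "__main__":
    print(len(all_math), len(math_in_svg), identifiers)
```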
Indeed it is designed for technologists, solving a problem on the world wide web, as well as in other structured formats. And no, no one is asking mathematicians to type XML or HTML by hand in 2021.
An author can keep using latex, asciimath, Word or OpenOffice, MathType, MathQuill, or even Mathematica/Matlab/scipy syntax, as they've done until now. And then have their toolchain of choice prepare MathML to be served on the web, (or epub ebooks, etc), in a uniform manner understood by all vendors. Mere mortals can then finally create web apps that can natively access the math, without writing half-baked tex parsers that have hundreds of awkward and undocumented special cases.
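As a rough illustration of "natively accessing the math": once a toolchain has emitted MathML, an app can pull out identifiers, numbers and operators with any stock XML parser. The sketch below assumes presentation MathML in the standard namespace; the helper name and the sample expression (x² + 2xy) are hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def tokens(mathml: str, tag: str) -> list[str]:
    """Return the text of every MathML token element named `tag` (e.g. 'mi')."""
    root = ET.fromstring(mathml)
    return [(el.text or "").strip() for el in root.iter(f"{{{MATHML_NS}}}{tag}")]


# Hypothetical toolchain output for the expression x^2 + 2xy.
SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow>'
    "<msup><mi>x</mi><mn>2</mn></msup><mo>+</mo>"
    "<mn>2</mn><mi>x</mi><mi>y</mi>"
    "</mrow></math>"
)

if __name__ == "__main__":
    print("identifiers:", tokens(SAMPLE, "mi"))
    print("numbers:", tokens(SAMPLE, "mn"))
    print("operators:", tokens(SAMPLE, "mo"))
```

No knowledge of the author's original input syntax is needed; the same function works whether the MathML came from latex, asciimath or a visual editor.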
TeX syntax has no path to richer integrations into the web platform and keeps mathematics out of reach for all modern ecosystem trends. I certainly think MathML can still be improved quite a bit - mostly by making it smaller and simpler, so that anyone can pick it up in an hour and write a new app in a day.
You ought to consider a name change seriously, at least for the 3.0 version, in my view. It's hard to estimate how much a name impacts adoption, but it is quite likely people would be reluctant to even try a tool which creates associations that they find unappealing.
Switching to some phonetically adjacent name, say TypeMax or SciMacs or ... could be a friendly invitation to a new generation of users to give the platform a spin.
Deep Expertise: The Need for Fine-Tuning at Scale
3 points • 0 comments
Don't compose non-standard semantics: negative one
1 points • 0 comments
I think the new value-add is based on a combination of: 1) the internet and 2) being further down Moore's Law
The increase in both computational power and connectivity has had impact all over the place, but has also transformed science. Even the most analog disciplines now have computational modeling branches, are building datasets, and need best practices for presenting and publishing computational artifacts. From Digital Humanities, to Computational Biophysics.
Printed pages struggle to convey the depth of these results, as do (originally) flat data formats like PDF, which are oriented towards pixels, rather than datatypes.
But I have to disagree strongly about sniping LaTeX as (completely) outdated, because of the need to supplement the printed page with a "content browser". Yes, we most certainly need a computational toolkit for presenting scientific results and aiding peer-review. But no, typesetting is not going away. The human taste for aesthetically pleasing documents is here to stay, and beautifully laid out narratives are instrumental for getting your message across to your audience. Delighting the eye allows the reader to focus on content, rather than struggle with an ugly form.
So my take on my work at Authorea (yep, beware insider bias!) is to transfer our typesetting best practices to the web as the baseline, but then focus and innovate on the "depth" of web-first science articles. To improve academic writing, we need to expose and interact with data, do the well-known social collaboration gig, and embed guarantees for transparency and reproducibility. Machine-assisted quality control and authoring are the two large impact sweet-spots to hit in the coming decade on this front. Time will show if we're in the know or not, but LaTeX is certainly a point of departure that has to be supported, and learning it is a healthy thing to do in 2017. We have a lot of important work ahead of us!
-1 has clear semantics? Hold my beer
2 points • 0 comments
I'm not sure that image is hi-res.