Comments (9)
Having auto-generated tables of contents would save a lot of developer time.
"Let's just let the community make multiple incompatible implementations" has resulted in the status quo: still no autogenerated TOC for CommonMark documents.
AFAIU, one argument presented in [1] is that:
- anchor generation is best as a CommonMark extension or post-processing step
- because nobody can decide which anchor generation strategy is best for all applications
- tables of contents depend on anchor generation
- therefore, tables of contents must also be either an extension or a not-CommonMark thing
As a result:
- there is still no standard way to add an autogenerated Table of Contents to CommonMark documents
- because CommonMark cannot / will not solve for this feature, it's up to e.g. GitHub (in e.g. the github/markup repository) to implement a non-standardized TOC post-processing step on top of their already-implemented non-standardized anchor generation post-processing step
- GitHub expressed in [2] that they thought it would be best for CommonMark to standardize TOC markup (so that CommonMark documents are portable across implementations of CommonMark)
[1] https://talk.commonmark.org/t/feature-request-automatically-generated-ids-for-headers/115
How about:
- pluggable and/or configurable anchor generation
- a standard syntax for including a TOC (with parameters) that is backwards-compatible
from cmark.
The issue of link anchors is a tricky one and it has been extensively discussed here, as you probably know: https://talk.commonmark.org/t/feature-request-automatically-generated-ids-for-headers/115/10
I'm thinking about the following approach:
- Add an identifier field to every node.
- Add functions to the API to set and get this identifier.
- Make the HTML renderer sensitive to this identifier, so that if it is set, it gets included on the element (assuming the element has a tag that takes an ID).
- Don't change anything else about cmark.
With this setup, you could postprocess the AST after parsing and add header identifiers, using any scheme that makes sense for your application, and you could also insert a TOC at that point. This could be done using the cmark API and the iterator interface.
What do you think? If we wanted to, we could provide default functions for doing these transformations, and include an option with the command-line tool.
A more flexible approach would be to add an attributes field to every node: this would be a linked list of (attribute, string value) pairs.
None of this would require the spec to say anything in particular about header identifiers.
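The approach above can be sketched in a few lines. This is a minimal Python illustration, not the cmark API: the `Node` class, `walk`, `simple_slug`, `add_heading_ids`, `build_toc`, and `render_html` are all hypothetical names, and the slug scheme is just one of many an application might plug in.

```python
import html
import re
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical AST node; not the cmark node type."""
    kind: str                           # "document", "heading" or "paragraph"
    text: str = ""
    level: int = 0                      # heading level, 0 for non-headings
    identifier: Optional[str] = None    # the proposed per-node identifier
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal, standing in for an AST iterator."""
    yield node
    for child in node.children:
        yield from walk(child)


def simple_slug(text: str) -> str:
    """One possible scheme: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def add_heading_ids(root: Node, make_id: Callable[[str], str] = simple_slug) -> None:
    """Post-processing pass: set an identifier on every heading."""
    for node in walk(root):
        if node.kind == "heading":
            node.identifier = make_id(node.text)


def build_toc(root: Node) -> List[str]:
    """Collect one indented link line per heading that has an identifier."""
    return [
        "  " * (n.level - 1) + f"- [{n.text}](#{n.identifier})"
        for n in walk(root)
        if n.kind == "heading" and n.identifier
    ]


def render_html(node: Node) -> str:
    """Renderer that emits an id attribute only when an identifier is set."""
    if node.kind == "document":
        return "".join(render_html(c) for c in node.children)
    tag = f"h{node.level}" if node.kind == "heading" else "p"
    attr = f' id="{node.identifier}"' if node.identifier else ""
    return f"<{tag}{attr}>{html.escape(node.text)}</{tag}>\n"


doc = Node("document", children=[
    Node("heading", "Getting Started", level=1),
    Node("paragraph", "Some text."),
    Node("heading", "Install & Build", level=2),
])
add_heading_ids(doc)        # any other make_id callable could be passed here
toc = build_toc(doc)
rendered = render_html(doc)
```

Because the ID scheme is passed in as a callable, the parser and renderer stay unaware of how identifiers are chosen, which is the point of keeping this out of the spec.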
@nwellnhof @MathieuDuponchelle @coding-horror @vmg @gjtorikian
I'd be interested in any thoughts on this too.
Not sure about that, the problem with this approach is portability.
The use case for anchors most people are interested in, afaict, isn't table of contents generation, but in-document linking. If each tool uses the API in a different way to set ids, then switching tools to render the same manually written markdown file breaks any links to anchors that it contains.
My preferred solution would be to have a simple add-anchors option, which would generate ids guaranteed to be valid in html, like the auto_identifiers extension does in pandoc.
+++ Mathieu Duponchelle [Jun 24 16 11:23 ]:
Not sure about that, the problem with this approach is portability.
The use case for anchors most people are interested in, afaict, isn't
table of contents generation, but in-document linking. If each tool
uses the API in a different way to set ids, then when switching tools
to render the same manually written markdown file containing links to
anchors breaks.
My preferred solution would be to have a simple add-anchors option,
which would generate ids guaranteed to be valid in html, like the
auto_identifiers extension does in pandoc.
As you can see from the linked talk page, there are lots of differences of opinion about how the automatic links should be generated, or whether they should be generated at all. It may be that different kinds of sites are going to need different approaches for this. And so I'm not persuaded that the spec should demand a particular method for generating automatic IDs... The pandoc method has the drawback that if you re-order sections in your document, and some sections have the same name, links may break.
The proposal here would be compatible with eventually standardizing on one way of doing this. But it would make it possible for people to add IDs now, in a way that suits their purposes, without any decision being made in the spec.
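To make the re-ordering drawback concrete, here is a small Python sketch. It only approximates pandoc-style automatic identifiers (the real rules have more cases), and `base_id` and `assign_ids` are hypothetical helper names. The key behaviour is that repeated headings receive numeric suffixes in document order.

```python
import re
from typing import Dict, List


def base_id(title: str) -> str:
    """Rough approximation of pandoc-style identifiers (not the exact rules)."""
    s = title.lower()
    s = re.sub(r"[^\w\s.-]", "", s)       # drop punctuation except _ - .
    s = re.sub(r"\s+", "-", s.strip())    # whitespace becomes hyphens
    s = re.sub(r"^[^a-z]+", "", s)        # drop everything before the first letter
    return s or "section"


def assign_ids(titles: List[str]) -> List[str]:
    """Give repeated headings numeric suffixes in document order."""
    seen: Dict[str, int] = {}
    ids = []
    for title in titles:
        base = base_id(title)
        count = seen.get(base, 0)
        ids.append(base if count == 0 else f"{base}-{count}")
        seen[base] = count + 1
    return ids


# The same four sections, with the two top-level parts swapped.
before = assign_ids(["API", "Examples", "CLI", "Examples"])
after = assign_ids(["CLI", "Examples", "API", "Examples"])
```

In `before`, the identifier `examples-1` belongs to the examples under "CLI"; in `after`, the same identifier belongs to the examples under "API", so an existing link to `#examples-1` silently changes its target.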
The pandoc method has the drawback that if you re-order sections in your document, and some sections have the same name, links may break.
Right, but there is no perfect solution to this. Writers wanting to make sure such links don't break could use the "custom attribute syntax" once it has been agreed upon :)
GitHub generates unique IDs and anchors for CommonMark headings. How do they do it?
See:
- github/markup#904
- https://talk.commonmark.org/t/feature-request-automatically-generated-ids-for-headers/115
It is done in a post-processing step, similarly to how we linkify @mentions (like @westurner), add issue references (#78), etc. I can detail the (very simple) algorithm for generating the anchors if you like, but in short, it's not related to CommonMark processing.
A table of contents is a presentation issue. Adding a syntax to inject a ToC adds little information to the document that cannot already be derived from the document headings. Take a look at KeenWrite's PDF themes:
https://github.com/DaveJarvis/keenwrite/blob/master/docs/screenshots.md#pdf-themes
In particular, look at the upper-right corner of the following image:
https://raw.githubusercontent.com/DaveJarvis/keenwrite/master/docs/images/screenshots/08.png
That ToC (in green) was generated by first converting Markdown to XHTML using flexmark-java, then the XHTML was typeset using ConTeXt. The ConTeXt typesetting engine provides control over the ToC colours, number of levels, leader dots, font sizes, location in the document, etc.
pandoc has the same functionality for generating a ToC in HTML pages by passing the --toc command-line option.
IMO, GitHub needs to add an externally defined configuration file that instructs its Markdown parser how to generate the corresponding HTML output. This could include exporting a ToC, tweaking the heading level depth, and defining variables that are interpolated.
Consider a file named .config.yaml:
---
meta:
  toc:
    insert: true
    depth: 3
application:
  name: My Super App
  version: 1.2.3
Alongside the following README.md:
# {{application.name}}
Changes to version {{application.version}} include:
* Bug fix
* Feature creep
Would render as:
My Super App
Changes to version 1.2.3 include:
- Bug fix
- Feature creep
What would be great to standardize is the configuration file syntax so that pandoc, GitHub, and Markdown renderers/editors could all parse the same standard metadata.
Updating .config.yaml as part of the build process ensures that the application name, version, and other build-related information have a single source of truth.
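The interpolation step could look roughly like the following Python sketch. It assumes the configuration has already been parsed into nested dictionaries (hard-coded here, with `application` assumed to be a top-level key as the `{{application.name}}` placeholder suggests); `lookup` and `interpolate` are hypothetical helper names, not part of any existing tool.

```python
import re
from typing import Any, Dict

# Stand-in for the parsed .config.yaml; a real tool would load the file
# with a YAML parser instead of hard-coding the values. The nesting shown
# here is an assumption.
config: Dict[str, Any] = {
    "meta": {"toc": {"insert": True, "depth": 3}},
    "application": {"name": "My Super App", "version": "1.2.3"},
}

# Matches {{dotted.key}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(data: Dict[str, Any], dotted_key: str) -> Any:
    """Resolve a dotted key such as 'application.name' in nested dicts."""
    value: Any = data
    for part in dotted_key.split("."):
        value = value[part]    # raises KeyError for unknown keys
    return value


def interpolate(text: str, data: Dict[str, Any]) -> str:
    """Replace every {{dotted.key}} placeholder with its configured value."""
    return PLACEHOLDER.sub(lambda m: str(lookup(data, m.group(1))), text)


readme = "# {{application.name}}\nChanges to version {{application.version}} include:\n"
rendered = interpolate(readme, config)
```

Failing loudly on an unknown key is a deliberate choice in this sketch: a silently empty placeholder would defeat the single-source-of-truth goal.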