policystat / ckeditor-spell-check-plugin-js-dev-edge Goto Github PK

This project forked from nanospell/ckeditor-spell-check-plugin-js-dev-edge

0.0 0.0 0.0 999 KB

User interface enhancements and beta features for http://ckeditor-spellcheck.nanospell.com . This repo is for nanospell developers and trusted partners to modify and fork plugin.js

License: Other

HTML 4.36% JavaScript 56.72% CSS 0.52% ASP 2.43% PHP 35.97% ApacheConf 0.01%

ckeditor-spell-check-plugin-js-dev-edge's People

Contributors

Watchers

ckeditor-spell-check-plugin-js-dev-edge's Issues

Short circuit the many empty nodes case

Something crazy happened and we ended up with thousands of p tags with a nonbreaking space inside. This causes the document to take upwards of 4 minutes to be responsive after loading. Turning off the spellcheck reduces this wait time to about 10 seconds.

Unfortunately the resulting Chrome timeline causes Chrome debugger to crash. Need to spend some amount of time reproducing this.

Look into ways to short circuit the spellcheck. Perhaps it can use element html to exclude empty nodes from the range that is walked, or something.

Reuse the selection bookmark when doing batched scan/render

Bookmarking a selection actually can be quite expensive, because we have to do it in the DOM-inserty way. This gets extra bad when doing the initial batched operation as well as any other large spellcheck jobs.

getWords and render should support a parameter that tells it a bookmark already exists.

The bookmarking can also be moved up the call tree (basically the nearest spot that lets it cover loops of calls to those functions)

Spellcheck spans are modifying document HTML

Under some common condition (it's happened in 3 out of 3 edited PolicyStat documents I've checked so far), spell check modifies the HTML, even if the user doesn't use the spellcheck suggestions.

The goal should be to not alter HTML structure. My guess is that the structure is also altered when you accept a suggestion, but I didn't test that.

Some examples of HTML before editor load and then the resulting HTML after the document was saved:

First word spelling suggestion

Wes Q3
Wes Q3

Last word spelling suggestion

<a href="https://policystat.zendesk.com">Site to Site VPN</a>
<a href="https://policystat.zendesk.com">Site to Site </a><a href="https://policystat.zendesk.com">VPN</a>

First word again, with `a` tag

<a href="support.policystat.com">PolicyStat Learning Center</a>
<a href="support.policystat.com">PolicyStat</a><a href="support.policystat.com"> Learning Center</a>

Change some of the event handlers into commands

Mainly the marking function should probably be a CKEditor Command. It won't be called by the user, but this allows for better control over undo-ability.

Minimize DOM manipulations by not clearing spellcheck spans

Currently, each spellcheck cycle clears all the spellchecking spans under the element being marked. For large typo-heavy blocks, this is super inefficient.

Suppose you have a paragraph with 100 typos in it. Each time spellcheck is called on that block (for example, for each new word), all 100 spans are removed and their text nodes put back into the paragraph. Then the paragraph is scanned for any new words. AJAX call is made if required, then all typos are once again wrapped in spans.

Some ideas:

the word scanning should ignore words that are already in spellcheck spans
the word marking should not re-mark words that are already in spellcheck spans

Partial words being marked as typos

I did some editing today and noticed that on at least two occasions, spellcheck marked a word as a typo because it chopped my word. I'm working on the exact reproduction steps, but it was while I was just typing up new content and using my keyboard to navigate around and edit it.

Example 1

<p><a data-cke-saved-href="https://en.wikipedia.org/wiki/Technical_debt" href="https://en.wikipedia.org/wiki/Technical_debt">Technical Debt</a>&nbsp;is a metaphor used to describe areas where we've consciously&nbsp;chosen to trade off the short term versus long term by building something <span class="nanospell-typo"></span><span class="nanospell-typo"><span class="nanospell-typo">qui</span>ckly</span> instead of in the best way. Just like using a credit card to pay for <span class="nanospell-typo"></span><span class="nanospell-typo"><span class="nanospell-typo">thi</span>ngs</span> in our personal life, it can be a smart strategy as long as we keep it under control.</p>

"quickly" was split into "qui" and "ckly"
"things was split into "thi" and "ngs"

Example 2

<p>The flex engineer helps to <span class="nanospell-typo">kee</span>p this debt under control by taking projects to improve the development process, upgrade libraries, refactor code, and optimize performance.</p>

"keep" was split into "kee" and "p", but "p" wasn't marked as a typo.

Parallelize initial spellcheck

Although #5 improved performance while actually working on the document, it did not solve the initial load. This uses markAllTypos which calls markTypos in sequence on each block.

This entire sequence for the initial load is as follows:

Plugin is loaded and initialized
startSpellCheckTimer is called with a null root element
This checks if there is a current timer running (self._timer). because there can't be, it calls checkNow() which sets up some locks and on to the next thing
START_SCAN_WORDS/scanWords on the entire document
START_CHECK_WORDS / rpc call is fired with the unique words
START_MARK_TYPOS / markTypos across the entire document

All the events/functions in steps 4-6 are already set up to be able to be run only under a given root element.

One thing that could help responsiveness is to also use this parallelization for the initial load. Instead of just getting stuck waiting for full document scans, AJAX call with all the words, and full document typo marking, the initial load could simply find all the paragraph blocks, and do steps 4-6 for each one.

The tricky part (which would also fix some other issues) is that currently, the "lock" on spellcheck is global. If one scan/check/mark spellcheck cycle is running, no other spellcheck cycles can run.

Things to do:

spellcheck "lock" should be per "paragraph-like" block
initial run should call spellcheck on all found "paragraph-like" blocks asynchronously, in parallel

Some things to worry about:

what if a block is moved/removed while things are running?

Batch spellcheck XHR requests, at least for the initial load

scanning for words, and marking blocks should still be per-block.

But the in between step of doing the XHR check should be batched.

When for example, 5 blocks have been scanned, do the XHR to check the words. When this comes back, mark the blocks that were included in the XHR request.

have a way of resetting the plugin for testing

There isn't a great way of resetting the plugin state, which hurts testability.

For example, if two tests have identical starting HTML, the second test won't make an ajax call, because it will already have suggestions in cache.

Reuse range iterator across renders (or more efficiently batch renders together)

Huge part of current cost is that the first getNextRange() call on a range iterator sucks: (see the comment on first iteration)

http://docs.ckeditor.com/source/rangelist.html#CKEDITOR-dom-rangeListIterator-method-getNextRange

We pass the range iterator an array of ranges indicating the bad words in the current block. This means that in cases where the block has 1 typo are as slow to mark as blocks that have 99 typos.

some numbers:

in a typical actual-paragraph, the first getNextRange() takes 18ms due to the setup step. Subsequent calls to getNextRange() for additional words are 0.13ms or so.

This also explains why tables can be so stupid:

Suppose you have a table of 1 acronym for cell, each cell will create a new iterator (because each cell is a paragraph.) This is stupid.

Contractions being marked as typos

It seems like most (all?) contractions are being marked as typos. All of these examples were in my document on initial load, so it's not any kind of interaction issue.

doesn't

<li><strong>Wait a month</strong>: It probably <span class="nanospell-typo">doesn</span>'t make sense for us to wait a month to prioritize that typo fix as part of an <span class="nanospell-typo">OKR</span> cycle. We want to be more responsive than that.</li>

shouldn't

<p>For all other product enhancements (which includes bug fixes), the best way to <strong>increase the visibility is to increase the number of votes the idea has in UserVoice</strong>. The best way to do this is to record votes for any customers you speak with who express interest in your idea. You <span class="nanospell-typo">shouldn</span>'t be afraid to bring it up with them while talking about something else.&nbsp;You can also campaign internally to get other team members to vote on it.</p>

aren't

<p>Votes <span class="nanospell-typo">aren</span>'t a perfect system, but they're a concrete way of quantifying the scope of an idea's impact. It does have some distinct advantages over fuzzier systems.</p>

Words get mashed together when being walked, if the words are around a nested list

The nested list doesn't correctly separate words,

therefore

            '<ul>' +
                '<li>foo' +
                    '<ol>' +
                        '<li>bar</li>' +
                    '</ol>' +
                'baz</li>' +
            '</ul>'

returns "foobaz" and "bar" through the wordwalker.

https://github.com/PolicyStat/ckeditor-spell-check-plugin-js-dev-edge/blob/pstat_rewrite/tests/typowalking/wordwalker.js#L138

Stop creating bookmarks when touching blocks that the selection is not in

I was talking through this problem and then realized this is horrible to do. This must be leftover cruft from the initial "touch the whole document" version.

Handle LocalStorage unavailability properly

Some of our customers have LocalStorage disabled as an IT policy, which results in a trace like this:

Error: Access is denied.

   at hasPersonal (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:839:4)
   at validWordToken (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:876:4)
   at getUnknownWords (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:616:6)
   at scanWordsInRange (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:643:6)
   at scanWords (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:411:5)
   at h (https://d2zk9fgwitlpui.cloudfront.net/compress/js/624753d82da0.js:1256:2803)
   at Anonymous function (https://d2zk9fgwitlpui.cloudfront.net/compress/js/624753d82da0.js:1256:3675)
   at CKEDITOR.editor.prototype.fire (https://d2zk9fgwitlpui.cloudfront.net/compress/js/624753d82da0.js:1256:4343)
   at checkNow (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:401:6)

make editor more responsive when marking typos

Span reuse issues - nbsps being marked and occasional nested spans

Not sure what is happening yet, just documenting.

From playing around, typing & pausing:

<p><span class="nanospell-typo">zxp</span><span class="nanospell-typo">dpasda</span><span class="nanospell-typo"> <span class="nanospell-typo">clzcoxzkoad</span>&nbsp;<span class="nanospell-typo"><span class="nanospell-typo">dsaiajkdis</span>&nbsp;zcxxzp</span>&nbsp;dsadas</span><span class="nanospell-typo">&nbsp;cxziczi</span><span class="nanospell-typo">&nbsp;cpzcxxzoa</span>&nbsp;&nbsp;</p>

Some of the typos have leading stuck into them, so they will never have suggestions.

Other typos somehow end up nested inside an outer typo. Rightclick handler goes first to the outer typo which won't have any suggestions.

Fix the backwards events tests

Stumbled into this when working on #39 - our event observer actually fires backwards (and our tests catch things in reverse order) because the event priority is wrong. Kinda.

Selection is not stable in IE11 (probably due to bookmarks)

Creating a bookmark in IE11 modifies the selection ranges.

In the render method, before going in and marking spans, we create a bookmark at the current selection. This causes the really simple "ignore the current selection" code in the Wordwalker to fail.

There are a few ways out of this:

Really optimistically: because we no longer care about marking the current word, we no longer need to set bookmarks in order for the cursor to not move (very unlikely this actually works, but worth trying)
get the selection before creating the bookmark, since we only care about it for the purposes of marking the text node that we don't care about. this may still not work since the bookmark might do something bad.
take another pass at the "find the currently selected word" code. It should also ignore empty trailing children instead of just always looking at the last child. (I am most sure that this will work, though it feels the most hacky)

Break elements in list items don't count as word boundaries

For example:

<ol class="list-lower-alpha">
                <li>Surrogate may not consent to:<br />
                <br />
                Voluntary inpatient mental health services<br />
                Electroconvulsive therapy<br />
                Appointment of another surrogate decision maker.</li>
            </ol>

a spellcheck span instead wraps serviceselectroconvulsive and therapyAppointment

One enormous block
Multiple blocks with identical content (there is a bug, somewhere, and the words get re-checked)
many tiny blocks (it should batch by number of words, not number of blocks, but we were in a hurry at the time and that fixed most real world cases.)