Giter Site home page Giter Site logo

policystat / ckeditor-spell-check-plugin-js-dev-edge Goto Github PK

View Code? Open in Web Editor NEW

This project forked from nanospell/ckeditor-spell-check-plugin-js-dev-edge

0.0 0.0 0.0 999 KB

User interface enhancements and beta features for http://ckeditor-spellcheck.nanospell.com . This repo is for nanospell developers and trusted partners to modify and fork plugin.js

License: Other

HTML 4.36% JavaScript 56.72% CSS 0.52% ASP 2.43% PHP 35.97% ApacheConf 0.01%

ckeditor-spell-check-plugin-js-dev-edge's People

Contributors

caffodian avatar sbaughman avatar ztkemsley avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

ckeditor-spell-check-plugin-js-dev-edge's Issues

Short circuit the many empty nodes case

Something crazy happened and we ended up with thousands of p tags with a nonbreaking space inside. This causes the document to take upwards of 4 minutes to be responsive after loading. Turning off the spellcheck reduces this wait time to about 10 seconds.

Unfortunately the resulting Chrome timeline causes Chrome debugger to crash. Need to spend some amount of time reproducing this.

Look into ways to short circuit the spellcheck. Perhaps it can use element html to exclude empty nodes from the range that is walked, or something.

Reuse the selection bookmark when doing batched scan/render

Bookmarking a selection actually can be quite expensive, because we have to do it in the DOM-inserty way. This gets extra bad when doing the initial batched operation as well as any other large spellcheck jobs.

getWords and render should support a parameter that tells it a bookmark already exists.

The bookmarking can also be moved up the call tree (basically the nearest spot that lets it cover loops of calls to those functions)

Spellcheck spans are modifying document HTML

Under some common condition (it's happened in 3 out of 3 edited PolicyStat documents I've checked so far), spell check modifies the HTML, even if the user doesn't use the spellcheck suggestions.

The goal should be to not alter HTML structure. My guess is that the structure is also altered when you accept a suggestion, but I didn't test that.

Some examples of HTML before editor load and then the resulting HTML after the document was saved:

First word spelling suggestion

  • <strong>Wes Q3</strong>
  • <strong>Wes</strong><strong> Q3</strong>

Last word spelling suggestion

  • <a href="https://policystat.zendesk.com">Site to Site VPN</a>
  • <a href="https://policystat.zendesk.com">Site to Site </a><a href="https://policystat.zendesk.com">VPN</a>

First word again, with a tag

  • <a href="support.policystat.com">PolicyStat Learning Center</a>
  • <a href="support.policystat.com">PolicyStat</a><a href="support.policystat.com"> Learning Center</a>

Minimize DOM manipulations by not clearing spellcheck spans

Currently, each spellcheck cycle clears all the spellchecking spans under the element being marked. For large typo-heavy blocks, this is super inefficient.

Suppose you have a paragraph with 100 typos in it. Each time spellcheck is called on that block (for example, for each new word), all 100 spans are removed and their text nodes put back into the paragraph. Then the paragraph is scanned for any new words. AJAX call is made if required, then all typos are once again wrapped in spans.

Some ideas:

  • the word scanning should ignore words that are already in spellcheck spans
  • the word marking should not re-mark words that are already in spellcheck spans

Partial words being marked as typos

I did some editing today and noticed that on at least two occasions, spellcheck marked a word as a typo because it chopped my word. I'm working on the exact reproduction steps, but it was while I was just typing up new content and using my keyboard to navigate around and edit it.

Example 1

<p><a data-cke-saved-href="https://en.wikipedia.org/wiki/Technical_debt" href="https://en.wikipedia.org/wiki/Technical_debt">Technical Debt</a>&nbsp;is a metaphor used to describe areas where we've consciously&nbsp;chosen to trade off the short term versus long term by building something <span class="nanospell-typo"></span><span class="nanospell-typo"><span class="nanospell-typo">qui</span>ckly</span> instead of in the best way. Just like using a credit card to pay for <span class="nanospell-typo"></span><span class="nanospell-typo"><span class="nanospell-typo">thi</span>ngs</span> in our personal life, it can be a smart strategy as long as we keep it under control.</p>
  • "quickly" was split into "qui" and "ckly"
  • "things was split into "thi" and "ngs"

Example 2

<p>The flex engineer helps to <span class="nanospell-typo">kee</span>p this debt under control by taking projects to improve the development process, upgrade libraries, refactor code, and optimize performance.</p>

"keep" was split into "kee" and "p", but "p" wasn't marked as a typo.

Parallelize initial spellcheck

Although #5 improved performance while actually working on the document, it did not solve the initial load. This uses markAllTypos which calls markTypos in sequence on each block.

This entire sequence for the initial load is as follows:

  1. Plugin is loaded and initialized
  2. startSpellCheckTimer is called with a null root element
  3. This checks if there is a current timer running (self._timer). because there can't be, it calls checkNow() which sets up some locks and on to the next thing
  4. START_SCAN_WORDS/scanWords on the entire document
  5. START_CHECK_WORDS / rpc call is fired with the unique words
  6. START_MARK_TYPOS / markTypos across the entire document

All the events/functions in steps 4-6 are already set up to be able to be run only under a given root element.

One thing that could help responsiveness is to also use this parallelization for the initial load. Instead of just getting stuck waiting for full document scans, AJAX call with all the words, and full document typo marking, the initial load could simply find all the paragraph blocks, and do steps 4-6 for each one.

The tricky part (which would also fix some other issues) is that currently, the "lock" on spellcheck is global. If one scan/check/mark spellcheck cycle is running, no other spellcheck cycles can run.

Things to do:

  • spellcheck "lock" should be per "paragraph-like" block
  • initial run should call spellcheck on all found "paragraph-like" blocks asynchronously, in parallel

Some things to worry about:

  • what if a block is moved/removed while things are running?

Batch spellcheck XHR requests, at least for the initial load

scanning for words, and marking blocks should still be per-block.

But the in between step of doing the XHR check should be batched.

When for example, 5 blocks have been scanned, do the XHR to check the words. When this comes back, mark the blocks that were included in the XHR request.

have a way of resetting the plugin for testing

There isn't a great way of resetting the plugin state, which hurts testability.

For example, if two tests have identical starting HTML, the second test won't make an ajax call, because it will already have suggestions in cache.

Reuse range iterator across renders (or more efficiently batch renders together)

Huge part of current cost is that the first getNextRange() call on a range iterator sucks: (see the comment on first iteration)

http://docs.ckeditor.com/source/rangelist.html#CKEDITOR-dom-rangeListIterator-method-getNextRange

We pass the range iterator an array of ranges indicating the bad words in the current block. This means that in cases where the block has 1 typo are as slow to mark as blocks that have 99 typos.

some numbers:

in a typical actual-paragraph, the first getNextRange() takes 18ms due to the setup step. Subsequent calls to getNextRange() for additional words are 0.13ms or so.

This also explains why tables can be so stupid:

Suppose you have a table of 1 acronym for cell, each cell will create a new iterator (because each cell is a paragraph.) This is stupid.

Contractions being marked as typos

It seems like most (all?) contractions are being marked as typos. All of these examples were in my document on initial load, so it's not any kind of interaction issue.

doesn't

<li><strong>Wait a month</strong>: It probably <span class="nanospell-typo">doesn</span>'t make sense for us to wait a month to prioritize that typo fix as part of an <span class="nanospell-typo">OKR</span> cycle. We want to be more responsive than that.</li>

shouldn't

<p>For all other product enhancements (which includes bug fixes), the best way to <strong>increase the visibility is to increase the number of votes the idea has in UserVoice</strong>. The best way to do this is to record votes for any customers you speak with who express interest in your idea. You <span class="nanospell-typo">shouldn</span>'t be afraid to bring it up with them while talking about something else.&nbsp;You can also campaign internally to get other team members to vote on it.</p>

aren't

<p>Votes <span class="nanospell-typo">aren</span>'t a perfect system, but they're a concrete way of quantifying the scope of an idea's impact. It does have some distinct advantages over fuzzier systems.</p>

Handle LocalStorage unavailability properly

Some of our customers have LocalStorage disabled as an IT policy, which results in a trace like this:

Error: Access is denied.

   at hasPersonal (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:839:4)
   at validWordToken (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:876:4)
   at getUnknownWords (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:616:6)
   at scanWordsInRange (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:643:6)
   at scanWords (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:411:5)
   at h (https://d2zk9fgwitlpui.cloudfront.net/compress/js/624753d82da0.js:1256:2803)
   at Anonymous function (https://d2zk9fgwitlpui.cloudfront.net/compress/js/624753d82da0.js:1256:3675)
   at CKEDITOR.editor.prototype.fire (https://d2zk9fgwitlpui.cloudfront.net/compress/js/624753d82da0.js:1256:4343)
   at checkNow (https://sh-absmc.policystat.com/site_media/lib/ckeditor-new/plugins/nanospell/plugin.js:401:6)

Span reuse issues - nbsps being marked and occasional nested spans

Not sure what is happening yet, just documenting.

From playing around, typing & pausing:

<p><span class="nanospell-typo">zxp</span><span class="nanospell-typo">dpasda</span><span class="nanospell-typo"> <span class="nanospell-typo">clzcoxzkoad</span>&nbsp;<span class="nanospell-typo"><span class="nanospell-typo">dsaiajkdis</span>&nbsp;zcxxzp</span>&nbsp;dsadas</span><span class="nanospell-typo">&nbsp;cxziczi</span><span class="nanospell-typo">&nbsp;cpzcxxzoa</span>&nbsp;&nbsp;</p>

Some of the typos have leading ย  stuck into them, so they will never have suggestions.

Other typos somehow end up nested inside an outer typo. Rightclick handler goes first to the outer typo which won't have any suggestions.

Fix the backwards events tests

Stumbled into this when working on #39 - our event observer actually fires backwards (and our tests catch things in reverse order) because the event priority is wrong. Kinda.

Selection is not stable in IE11 (probably due to bookmarks)

Creating a bookmark in IE11 modifies the selection ranges.

In the render method, before going in and marking spans, we create a bookmark at the current selection. This causes the really simple "ignore the current selection" code in the Wordwalker to fail.

There are a few ways out of this:

  1. Really optimistically: because we no longer care about marking the current word, we no longer need to set bookmarks in order for the cursor to not move (very unlikely this actually works, but worth trying)
  2. get the selection before creating the bookmark, since we only care about it for the purposes of marking the text node that we don't care about. this may still not work since the bookmark might do something bad.
  3. take another pass at the "find the currently selected word" code. It should also ignore empty trailing children instead of just always looking at the last child. (I am most sure that this will work, though it feels the most hacky)

Break elements in list items don't count as word boundaries

For example:

<ol class="list-lower-alpha">
                <li>Surrogate may not consent to:<br />
                <br />
                Voluntary inpatient mental health services<br />
                Electroconvulsive therapy<br />
                Appointment of another surrogate decision maker.</li>
            </ol>

a spellcheck span instead wraps serviceselectroconvulsive and therapyAppointment

Degenerate cases with initial spellcheck

Following cases are very bad:

  • One enormous block
  • Multiple blocks with identical content (there is a bug, somewhere, and the words get re-checked)
  • many tiny blocks (it should batch by number of words, not number of blocks, but we were in a hurry at the time and that fixed most real world cases.)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.