Giter Site home page Giter Site logo

Comments (19)

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024 1

I was able to reproduce this issue in #3027. Indeed, the GitHub in place editor replaces all POSIX LF newlines by CR LF.

I wonder whether this is somehow caused by the addition of pre-commit. I'll try with a repository that doesn't have pre-commit enabled.

from codespell.

peternewman avatar peternewman commented on June 26, 2024

Worse, this means you can't review and make or merge suggested changes because the diff is now too large. 😒

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

Ah, I thought this was somehow because the dictionary file had grown too large, but yes, GitHub complains the diff is too large (50,000 lines). I was wondering where this comes from.

Which files are being changed by pre-commit? 50,000 lines means it's the dictionary file, but then why does it mess with the dictionary file?

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

Is 174a5bf just about whitespace changes? How can a couple of whitespace changes result in a diff of 50,000 lines? I'm lost.

from codespell.

peternewman avatar peternewman commented on June 26, 2024

I've not looked properly, but I suspect it's changing all the line endings back and forth.

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

To/from DOS/POSIX line endings? But then it would be an effect of mixed-line-ending, not trailing-whitespace.

It's still a mystery to me what triggers the massive diffs. Does the GitHub editor write files with DOS endings?

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

In the short term, the --fix=no option of mixed-line-ending might work around these back & forth line endings changes.

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

It looks like the editor you are using in GitHub changes the newlines in all of dictionary.txt.

  • I am at a loss why it does that, a Google search comes up with nothing.
  • I am at a loss why we hadn't noticed that before.

from codespell.

peternewman avatar peternewman commented on June 26, 2024

I wonder if something has changed, or it's just that pre-commit is changing them to say LF, whereas GitHub generates CRLF, and most other editors won't explicitly change things, so it never used to have an impact.

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

Some of your commits change all newlines to CR LF. For example, after 95a2fbc, all lines end with \r\n:

$ git checkout peternewman-obsloete 
$ 
$ git checkout 95a2fbc985065cff6f47c982e82355bb22d885b4
$ 
$ od -c dictionary.txt
0000000   1   n   d   -   >   1   s   t  \r  \n   2   r   d   -   >   2
0000020   n   d  \r  \n   2   s   t   -   >   2   n   d  \r  \n   3   n
0000040   d   -   >   3   r   d  \r  \n   3   r   t   -   >   3   r   d
0000060  \r  \n   3   s   t   -   >   3   r   d  \r  \n   4   r   d   -
0000100   >   4   t   h  \r  \n   _   _   a   t   t   r   i   b   y   t
[...]

I suspect this is new behaviour. I mean, either we would have CR LF in all files by now, or we would have seen similar diffs of 50,000 lines in the past. The question is why now? Does pre-commit somehow affect the GitHub editor? Was this specific commit really created using the GitHub editor?

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

Which of the two following options did you choose to edit the dictionary?

Edit

from codespell.

peternewman avatar peternewman commented on June 26, 2024

They were all edit in place. I used find (and maybe replace), which is a fairly new feature in the basic editor, but that was about all of note.

from codespell.

peternewman avatar peternewman commented on June 26, 2024

Well I'm glad I'm not imagining it!

I wonder if it could be that maybe the first run of pre-commit fixed all the line endings to be consistent and the presumed auto-mode behaves differently when the file is consistent. Or worse it's clever enough to realise I'm editing on Windows and behave accordingly!

We could try something like this to fix it (if that editor honours it):
https://www.aleksandrhovhannisyan.com/blog/crlf-vs-lf-normalizing-line-endings-in-git/

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

The newlines in dictionaries had always been a simple LF. I checked old commits to make sure.

I have already run dos2unix on dictionaries, the dictionaries haven't changed. This means all newlines are standard POSIX LF newlines.

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

This is a bug of the GitHub in place editor. What appears to trigger it is not pre-commit, but the size/contents of the modified file.

The size of file dictionary.txt increased considerably in the last few weeks. Between 8d0d82b and current master branch:

  • 813 β†’1,129 KB
  • 37,406 β†’51,304 lines

I suspect the size or number of lines triggered some bug, perhaps a variable overflow:

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

The bug is triggered when adding just 1 byte (and no new lines) to a file of size > 1 MB .
It is not triggered when adding 1 byte (or more) to a file of size <= 1 MB.

Tried with these two files:

I guess this clears things up: this really is a bug of the GitHub in place editor. Does this mean we should split dictionary.txt into chunks < 1 MB? I would recommend we leave it as is for now. I have notified GitHub about the bug.

from codespell.

peternewman avatar peternewman commented on June 26, 2024

I guess this clears things up: this really is a bug of the GitHub in place editor. Does this mean we should split dictionary.txt into chunks < 1 MB? I would recommend we leave it as is for now.

That's some excellent digging! Can't we fix this in the short term by adding a pre-commit or repo .gitattributes to force it back to LF, which will at least mean the overall diff of a PR will be small, even if the changes per commit are significant?

I have notified GitHub about the bug.

Excellent, is that something public you can link to, or a private report option they have?

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

We already have a pre-commit hook that forces back to LF (hence the unreadable diffs). I have tried a .gitattributes file that would force LF, but it doesn't seem to be taken into account by GitHub.

Unfortunately, the ticket I managed to create appears to be private – makes sense since GitHub is not open source.

from codespell.

DimitriPapadopoulos avatar DimitriPapadopoulos commented on June 26, 2024

This is a know GitHub bug when editing files > 1 MB. it had already been reported to them. They will probably fix it in time.

Also, see #1275 about breaking up dictionary.txt into separate files, which might result in files <= 1 MB.

@peternewman Can we close this ticket now that it is clear this is a GutHub bug for files > 1 MB, unrelated to pre-comit?

from codespell.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.