stenskjaer commented on August 27, 2024

This was a bit challenging (as #32 is going to be). But I have pushed a suggestion to branch "issue-31". Does that work as you would expect?

from samewords.

floriandk commented on August 27, 2024

I am working through the test cases I can come up with, but some of them fail to compile because they necessarily produce \edtext within \sameword. So I am not totally sure yet.

stenskjaer commented on August 27, 2024

In a recent branch of reledmac (maieul/ledmac#767) this does not give compilation errors. There is still some uncertainty about how the numbering should be done. But soon the solution to the compilation error should be released, and then we'll close this, if it looks as expected.

floriandk commented on August 27, 2024

Unfortunately there still seem to be issues with the basic functionality, though...

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
aa bb 
\edtext{cc 
	\edtext{%
			\edtext{aa}{\Afootnote{AA \emph{X}}}
		bb}%
		{\Afootnote{BB AA \emph{Y}}}%
		}%
	{\lemma{cc–bb}\Afootnote{\emph{Ø}}}.
\pend
\endnumbering

\end{document}

gives

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
\sameword{\sameword{aa} \sameword{bb}} 
\edtext{cc 
	\edtext{%
			\edtext{\sameword[3]{\sameword[2]{aa}}{\Afootnote{AA \emph{X}}}
		\sameword[1]{bb}}}%
		{\Afootnote{BB AA \emph{Y}}}%
		}%
	{\lemma{cc–\sameword{bb}}\Afootnote{\emph{Ø}}}.
\pend
\endnumbering

\end{document}

which breaks the reledmac-compilation because of wrong bracketing:

                \par 
l.29 ...mma{cc–\sameword{bb}}\Afootnote{\emph{Ø}}}
                                                  .
? 
Runaway argument?
./min1SW.tex:29: Paragraph ended before \edtext was complete.
<to be read again> 
                   \par 
l.29 ...mma{cc–\sameword{bb}}\Afootnote{\emph{Ø}}}
                                                  .
? 
./min1SW.tex:31: Argument of \edtext has an extra }.
<inserted text> 
                \par 
l.31 \endnumbering
                  
? 
Runaway argument?
./min1SW.tex:31: Paragraph ended before \edtext was complete.
<to be read again> 
                   \par 
l.31 \endnumbering
                  
? 
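The failure above can be reproduced without running LaTeX. A minimal sketch, assuming that every \edtext must be followed by exactly two brace groups (the edited text and the apparatus entry): the hypothetical helper `malformed_edtexts` below reports each \edtext for which this does not hold. The function names are illustrative and not part of samewords; braces hidden inside `%` comments are not handled.

```python
import re

EDTEXT = re.compile(r"\\edtext(?![A-Za-z])")


def skip_filler(tex: str, pos: int) -> int:
    """Skip whitespace and %-comments, which TeX ignores between arguments."""
    while pos < len(tex):
        if tex[pos].isspace():
            pos += 1
        elif tex[pos] == "%":
            newline = tex.find("\n", pos)
            pos = len(tex) if newline == -1 else newline + 1
        else:
            break
    return pos


def skip_group(tex: str, pos: int) -> int:
    """Return the index just past the balanced {...} group at pos, or -1."""
    if pos >= len(tex) or tex[pos] != "{":
        return -1
    depth = 0
    while pos < len(tex):
        char = tex[pos]
        if char == "\\":  # skip the backslash and the character it escapes
            pos += 2
            continue
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return pos + 1
        pos += 1
    return -1  # the group never closes


def malformed_edtexts(tex: str) -> list[int]:
    """Offsets of every \\edtext that is not followed by two brace groups."""
    bad = []
    for match in EDTEXT.finditer(tex):
        pos = match.end()
        for _ in range(2):
            pos = skip_group(tex, skip_filler(tex, pos))
            if pos == -1:
                bad.append(match.start())
                break
    return bad
```

Note that the annotated output is balanced if one only counts braces; it is the innermost \edtext that ends up without a second argument, which is why a plain brace count would not catch this case.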

When the bracket from behind \sameword[1]{bb}} is moved to behind {\sameword[2]{aa}, i.e.

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
\sameword{\sameword{aa} \sameword{bb}} 
\edtext{cc 
	\edtext{%
			\edtext{\sameword[3]{\sameword[2]{aa}}}{\Afootnote{AA \emph{X}}}
		\sameword[1]{bb}}%
		{\Afootnote{BB AA \emph{Y}}}%
		}%
	{\lemma{cc–\sameword{bb}}\Afootnote{\emph{Ø}}}.
\pend
\endnumbering

\end{document}

it compiles but it renders "aa³³". (The double numbers are discussed elsewhere.)

But I am wondering whether leaving out \sameword[3] might be the simple solution here? Or do I get this wrong?

stenskjaer commented on August 27, 2024

Ahh yes, I see. This is a problem, as actually I think it should just be \sameword[2,3]{aa}.
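To illustrate the merged form mentioned above, here is a rough sketch of a post-processing step that collapses directly nested \sameword levels into one comma-separated list. The function `merge_levels` is hypothetical and not how samewords itself produces its output; it only handles the case where one numbered \sameword wraps another and nothing else.

```python
import re

# One numbered \sameword that directly wraps another numbered \sameword.
NESTED = re.compile(
    r"\\sameword\[(\d+(?:,\d+)*)\]\{"
    r"\\sameword\[(\d+(?:,\d+)*)\]\{([^{}]*)\}\}"
)


def merge_levels(tex: str) -> str:
    """Rewrite \\sameword[3]{\\sameword[2]{aa}} as \\sameword[2,3]{aa}."""

    def join(match: re.Match) -> str:
        levels = sorted(
            {int(n) for part in match.group(1, 2) for n in part.split(",")}
        )
        return r"\sameword[%s]{%s}" % (
            ",".join(str(level) for level in levels),
            match.group(3),
        )

    previous = None
    while previous != tex:  # repeat until deeper nestings are collapsed too
        previous, tex = tex, NESTED.sub(join, tex)
    return tex
```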

I will get back to these issues as soon as possible, hopefully during this week (travelling and work tend to get in the way of moonlight projects like this, unfortunately).

floriandk commented on August 27, 2024

Of course! I am very well aware that this project is something on the side which probably already has cost you way more time than you ever thought it would.

All the more, I'd appreciate it if you could find the time to iron out the (final?) bugs!

(I have this gigantic file I need to annotate very soon and if I am to do this by hand I won't do anything else all summer~…)

stenskjaer commented on August 27, 2024

Okay, the problem is that this should be annotated as follows (according to my view and logic):

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
\sameword{aa bb}
\edtext{
  cc
	\edtext{%
    \sameword[2]{\edtext{\sameword[3]{aa}}{\Afootnote{AA \emph{X}}}
      bb}}%
  {\Afootnote{BB AA \emph{Y}}}%
}%
{\lemma{cc–bb}\Afootnote{\emph{Ø}}}.
\pend
\endnumbering

\end{document}

But with this annotation it will not make any sameword counting in reledmac. So I will write an issue about that.

EDIT: Until it works in reledmac (or we know another way of doing it), I won't implement it, as it is not completely trivial. So it doesn't make sense to spend too much time on it before we know what reledmac will expect. Sorry for the temporarily broken annotations that this may result in :/

floriandk commented on August 27, 2024

After some brooding and experimenting, I'd suggest just to let multiword-samewords be, at least for the time being. Cf my comment on ledmac/issues/771

Also (if I understand the mechanism correctly) you could generally leave out all the [X] from \sameword[X]{word} if this particular word isn't reused in a \lemma{word…}
As it is now samewords adds lots of [X]es even to a tex-file completely without \lemmas.

(For the example above we also need \sameword{bb} within the \lemma{…} -- samewords does this, so that's obviously only a typo in the example)

If we disregard the "doubles" I think the correct markup for the example above should be:

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
\sameword{aa} \sameword{bb}
\edtext{%
  cc
  \edtext{%
    \edtext{\sameword{aa}}{\Afootnote{AA \emph{X}}}
    \sameword[1]{bb}}%
  {\Afootnote{BB AA \emph{Y}}}}%
  {\lemma{cc–\sameword{bb}}%
\Afootnote{\emph{Ø}}}.
\pend
\endnumbering

\end{document}

stenskjaer commented on August 27, 2024

Hmm... I would be sorry to give up the multiword annotations completely, as they already work in a lot of cases, and changes have also been made to reledmac to facilitate them. So we are already more than halfway there.

But I see that this particular case, where you have nested critical notes inside the sameword annotations, causes problems.

I would love for there to be a solution to this, but if it is too complicated at the reledmac end then of course it is hard to solve.

But how about this ("temporary", I whisper to myself, hopefully) solution: Whenever the script registers that it runs into a case of this particular type, a multiword with a nested critical note, it reverts to single word annotation.
Could that be a way to make compilable editions that, at least for now, can give you usable, if not perfect, disambiguation?

I think I'll make an issue about enabling or disabling multiword annotations. If you have opinions about that, please let me know.

floriandk commented on August 27, 2024

Don't get me wrong: I'd love this to work automatically!
I just see that Maïeul already has a lot of requests on his table, and in the end this is a rare case that will be disambiguated correctly, though inelegantly, if we stick to the single word level. As it is now, the files won't compile without manual fixing.

So I am all for the fallback that you describe.
Alternatively you could perhaps consider offering single word markup as default for the time being and put the wordpair processing into a development branch until reledmac can handle all cases correctly.

floriandk commented on August 27, 2024

Also (if I understand the mechanism correctly) you could generally leave out all the [X] from \sameword[X]{word} if this particular word isn't reused in a \lemma{word…}
As it is now samewords adds lots of [X]es even to a tex-file completely without \lemmas.

… it only just occurred to me that always adding the [X] regardless of the existence of a \lemma probably is the easy way, isn't it?
So just forget about this comment…

stenskjaer commented on August 27, 2024

Just a quick note: I think you are right that the multiword lemmas give more problems than we (read: I) had foreseen. They open up lots of overlapping problems, as the sameword matches can go across edtext boundaries.

So I'm thinking about a suggestion for a labelling approach (like we do in so many other cases where stuff overlaps). I'll sleep on it a couple of days and maybe present the idea to Maieul and see if he thinks it is a better solution from his reledmac side of the table.

And I agree that for now focusing on multiple single word annotations is maybe a safer way to go. With a bit of luck we can move to that solution by the end of the weekend, when I'll have some hours to work on this.

EDIT: And yeah, you're right about the level numbering, it is easier just always to write it explicitly. I understand it seems too much sometimes, but it's no harm when it's there. I may think about changing it at a later point, but it has a lower priority for me.

stenskjaer commented on August 27, 2024

I have switched the default annotation to be by single word in the branch "issue-31". There is currently no way to try the multiword annotation, as there are still problems with that. But there is nothing in the way of giving the single word annotation a try while I think about the multiword stuff.

So if you get the chance at some point to check whether the problems with overlapping entries, like in this one, are resolved, that would be good:

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
\sameword{\sameword{aa} \sameword{bb}} 
\edtext{cc 
	\edtext{%
			\edtext{\sameword[3]{\sameword[2]{aa}}}{\Afootnote{AA \emph{X}}}
		\sameword[1]{bb}}%
		{\Afootnote{BB AA \emph{Y}}}%
		}%
	{\lemma{cc–\sameword{bb}}\Afootnote{\emph{Ø}}}.
\pend
\endnumbering

\end{document}

For the multiword annotation my idea is to run multiword annotation, then have a validation to check for problems with the result, and in case there are problems, to switch to single word annotation for a relevant section of the text and warn the user about it. But the details of that are still giving me some problems.

But if we can just get the single word annotation working, that will be a start (even though I'm not completely happy with the result that you get in the apparatus).
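The validate-then-fall-back idea described above could be sketched roughly as follows. All names here are hypothetical: `multiword`, `singleword` and `is_valid` stand in for whatever annotation and validation routines the script ends up with, and the unit of fallback is assumed to be a single paragraph.

```python
import warnings
from typing import Callable

Annotator = Callable[[str], str]


def annotate_with_fallback(
    paragraph: str,
    multiword: Annotator,
    singleword: Annotator,
    is_valid: Callable[[str], bool],
) -> str:
    """Try multiword annotation first; revert to single words if it fails."""
    candidate = multiword(paragraph)
    if is_valid(candidate):
        return candidate
    # Tell the user which strategy was actually applied to this paragraph.
    warnings.warn(
        "Multiword annotation failed validation; "
        "falling back to single word annotation for this paragraph."
    )
    return singleword(paragraph)
```

Keeping the validator as a separate callable means the check can start out simple (for example, a test that every \edtext still has two arguments) and be tightened later without touching the fallback logic.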

floriandk commented on August 27, 2024

Thank you very much for this!
I've tried both different test cases and my huge project, and even the latter can now be annotated automatically -- for a huge part at least.
I am still chasing one hang in the second part of the file though and will put together a MnWE as soon as I can.

(There are also still some weird error-messages coming up when compiling the result with reledmac but the pdf looks correct as far as I can see. So this is probably a reledmac-glitch which I also hope to identify soon.)

stenskjaer commented on August 27, 2024

I'm happy that you like the progress.

I have been working more on it to address some of the problems of multiword annotations. I have just pushed a couple of changes to the same branch that should make it possible to make some cases of multiword wrapping work much better (including the original problem).

The original problem is now (edit from "not") annotated as follows (I removed a linebreak from the original MWE for testing purposes):

\documentclass{article}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
\sameword{\sameword{aa} \sameword{bb}}
\edtext{
  cc
  \edtext{\sameword[2]{\edtext{\sameword[3]{aa}}{\Afootnote{AA \emph{X}}}
    \sameword[1]{bb}}
  }%
  {\Afootnote{BB AA \emph{Y}}}%
}%
{\lemma{cc–\sameword{bb}}\Afootnote{\emph{Ø}}}.
\pend
\endnumbering

\end{document}

I believe this to be correct, and to reflect the structure of the critical notes. This compiles (yuhuu), but the annotation is still the double annotation that I believe is unsatisfactory if a user opts for multiword annotation. But as we already know, that is a reledmac issue, rather than an issue with this script.

This does however not address the cases where there is no possible way to annotate it with a multiword annotation that will produce a compilable output. Such are cases where a \sameword{} annotation should begin before and end inside another \edtext{}{} macro, or vice versa.

Next step is to make some sort of exit-and-repair strategy for those cases.
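The impossible cases can be described as crossing spans: one span starts inside another and ends outside it. A minimal sketch of detecting them, assuming each \sameword or \edtext region has already been reduced to a hypothetical half-open `(start, end)` pair of token positions:

```python
Span = tuple[int, int]


def crossing_pairs(spans: list[Span]) -> list[tuple[Span, Span]]:
    """Return pairs of spans that partially overlap (neither nests the other)."""
    ordered = sorted(spans)
    crossings = []
    for index, (a_start, a_end) in enumerate(ordered):
        for b_start, b_end in ordered[index + 1:]:
            if b_start >= a_end:
                break  # sorted by start, so no later span can touch this one
            # b starts strictly inside a and ends strictly after it
            if a_start < b_start and a_end < b_end:
                crossings.append(((a_start, a_end), (b_start, b_end)))
    return crossings
```

Properly nested or disjoint spans yield an empty list, so a non-empty result would be the signal to leave the multiword annotation out for that passage.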

EDIT: I am of course also curious (anxious?) to see the problems that still remain once you manage to isolate them.

EDIT 2: The branch is not ready yet. It gives problems in other contexts that I completely ignored. Apologies.

floriandk commented on August 27, 2024

For now I'll keep working with the version from this morning that only does single words.

Here I have a funny one for you:

\documentclass[a5paper]{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
\edtext{aa
bb
cc 
dd
ee
ff
gg
hh
\edtext{}{\xxref{mylabel-start}{mylabel-end}\lemma{ii–line}\Aendnote{A}}\edlabel{mylabel-start}ii
jjjjjjjjj}{\lemma{aa–jjjjjjjjj}\Aendnote{B}} some more text to reach the next line\edlabel{mylabel-end} ii.
\pend
\endnumbering
\doendnotes{A}

\end{document}

gives

\documentclass[a5paper]{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
\edtext{aa
bb
cc 
dd
ee
ff
gg
hh
\edtext{\sameword[2]{}}{\xxref{mylabel-start}{mylabel-end}\lemma{\sameword{ii}–line}\Aendnote{A}}\edlabel{mylabel-start}ii
jjjjjjjjj}{\lemma{aa–jjjjjjjjj}\Aendnote{B}} some more text to reach the next line\edlabel{mylabel-end} \sameword{ii}.
\pend
\endnumbering
\doendnotes{A}

\end{document}

Compiling stops with:

16: Missing number, treated as zero.
<to be read again> 
                   }
l.16 {ii}–line}{A}{A}{}{L}{edtxt@2}
                                     %
? s

but produces the expected result when I ignore the error. Weirdly enough the error goes away if the empty annotation is changed manually to \edtext{\sameword[1]{}} or when the xxref-span lies within the same line (eg compile with a4paper).
I suppose this is easy to fix, as I cannot see any reason to ever annotate an empty \edtext{}.
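Until that is fixed in the script, the empty wrappers can be removed from the annotated file afterwards. A small sketch of such a workaround, using a hypothetical helper `strip_empty_samewords` that is not part of samewords:

```python
import re

# A \sameword with an optional level list and an empty argument.
EMPTY_SAMEWORD = re.compile(r"\\sameword(?:\[\d+(?:,\d+)*\])?\{\}")


def strip_empty_samewords(tex: str) -> str:
    """Remove \\sameword{} and \\sameword[n]{} wrappers that enclose nothing."""
    return EMPTY_SAMEWORD.sub("", tex)
```

Only wrappers with a completely empty argument are touched, so `\sameword{ii}` inside a \lemma is left alone.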

Perhaps you could also address the problem that the "ii" directly after the first \edlabel{mylabel-start} doesn't get annotated. Just adding \edlabel to the exclude_macros doesn't seem to do the trick.

floriandk commented on August 27, 2024

Bummer, that's not all there is to it… Removing the empty \samewords makes most of the errors go away. But some constellations keep being problematic. E.g.:

\documentclass[a5paper]{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
we need some more text to reach the second line again
\edtext{lorem
ipsum
\edtext{}{\xxref{start}{end}\lemma{and–elit}\Afootnote{X}}\edlabel{start}and}{\lemma{lorem–and}\Afootnote{Ø}}
and dolor sit amet, consectetuer adipiscing elit\edlabel{end}.
\pend
\endnumbering
\doendnotes{A}

\end{document}

->

\documentclass[a5paper]{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
we need some more text to reach the second line again
\edtext{\sameword[1]{lorem}
ipsum
\edtext{\sameword[2]{}}{\xxref{start}{end}\lemma{\sameword{and}–elit}\Afootnote{X}}\edlabel{start}and}{\lemma{lorem–\sameword{and}}\Afootnote{Ø}}
\sameword{and} dolor sit amet, consectetuer adipiscing elit\edlabel{end}.
\pend
\endnumbering
\doendnotes{A}

\end{document}

Even after removing \sameword[2]{} I still get:

15: Missing number, treated as zero.
<to be read again> 
                   }
l.15 \endnumbering

with superfluous \sameword[1]{lorem} and missing \edlabel{start}\sameword[1]{and}. Here it seems to be the missing markup of "and" that bothers reledmac; adding it will at least make the error message go away.

stenskjaer commented on August 27, 2024

Okay, but it seems that those problems are primarily caused by the overlapping apparatuses and use of \xxref{}. I have merged the solutions from the branch on issue #32 into the current tip of branch "issue-31", where you can try it again.

Basically the empty \edtext{}{} in connection with the \xxref is a part of the problem, and now it will raise a warning in the cases where it thinks that it is relevant.

But because the overlapping does not detect samewords properly, as we have already discussed in #32, this is only a partial solution as of now.

(There is no need to worry about the still existing problems with multiword annotation, because that is not enabled by default.)

floriandk commented on August 27, 2024

Great! I just updated it and will have a close look at it.

One strange break-off from the earlier versions is still there unfortunately:

\beginnumbering
\pstart
\edtext{}{\lemma{arbitrary text}\Aendnote{A}}
\somemacro{something}%
\edtext{}{\xxref{start}{end}\lemma{first–last}\Aendnote{}}\edlabel{start}\edtext{first
other
last\edlabel{end}
more}{\lemma{first–more}\Aendnote{A}}
\pend
\endnumbering

will break off with

Traceback (most recent call last):
  File "~/sameword-test/samewords-issue-31/samewords/tokenize.py", line 482, in _tokenize
    open_idx = self._stack_bracket[-1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/samewords", line 11, in <module>
    load_entry_point('samewords', 'console_scripts', 'samewords')()
  File "~/sameword-test/samewords-issue-31/samewords/cli.py", line 110, in main
    print(samewords.core.process_document(filename, procedure))
  File "~/sameword-test/samewords-issue-31/samewords/core.py", line 32, in process_document
    for par in chunk_pars(chunk)])
  File "~/sameword-test/samewords-issue-31/samewords/core.py", line 32, in <listcomp>
    for par in chunk_pars(chunk)])
  File "~/sameword-test/samewords-issue-31/samewords/core.py", line 10, in run_annotation
    tokenization = Tokenizer(input_text)
  File "~/sameword-test/samewords-issue-31/samewords/tokenize.py", line 362, in __init__
    self.wordlist = self._wordlist()
  File "~/sameword-test/samewords-issue-31/samewords/tokenize.py", line 372, in _wordlist
    word, pos = self._tokenize(self.data, pos)
  File "~/sameword-test/samewords-issue-31/samewords/tokenize.py", line 489, in _tokenize
    word.close_macro(0)
  File "~/sameword-test/samewords-issue-31/samewords/tokenize.py", line 179, in close_macro
    raise IndexError('The word does not have any open macros.')
IndexError: The word does not have any open macros.

It doesn't seem to matter whether the \somemacro is excluded in the .json file or not.

Removing \somemacro{something}% will make it run through, as will adding a space/linebreak behind it.

floriandk commented on August 27, 2024

Okay, but it seems that those problems are primarily caused by the overlapping apparatuses and use of \xxref{}. I have merged the solutions from the branch on issue #32 into the current tip of branch "issue-31", where you can try it again.

Basically the empty \edtext{}{} in connection with the \xxref is a part of the problem, and now it will raise a warning in the cases where it thinks that it is relevant.

But because the overlapping does not detect samewords properly, as we have already discussed in #32, this is only a partial solution as of now.

Yes, all I have found until now are related to \xxref. I don't have any illusions that they will be marked up correctly before Maïeul will propose a new system in due course.
I just hope for samewords not to be thrown off by them, so I can flag and, if necessary, number the occurrences manually at a late proofing stage.

floriandk commented on August 27, 2024

The issues raised in #31 (comment) and #31 (comment) work fine with the updated branch-31

As for #31 (comment), I have to apologize for

It doesn't seem to matter whether the \somemacro is excluded in the .json file or not.

This was wrong -- adding \somemacro to the exclude_macros does indeed make it run through, I had only specified a wrong path to the .json-file. (But perhaps you could at some point add an error-message for a missing specified config-file to help people like me with bad file organisation…?)

stenskjaer commented on August 27, 2024

I have added an error when the config file does not exist. It is in the current "issue-31" branch, and should go into the next release. I'm not sure you need to test it, but if you have a test of the tip running, you could always give it a try.
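For readers wondering what such a check might look like, here is a minimal sketch, assuming a JSON config file such as the one holding `exclude_macros`. The function name `load_config` and the error message are illustrative and are not taken from the samewords code base.

```python
import json
from pathlib import Path


def load_config(path: str) -> dict:
    """Load a JSON config file, failing loudly if the path is wrong."""
    config_path = Path(path).expanduser()
    if not config_path.is_file():
        # Without this check, a mistyped path would silently mean "no config".
        raise FileNotFoundError(f"Config file not found: {config_path}")
    with config_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```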

More generally: I feel that I am getting ready to finish the next release and close these issues concerning the nested multiword and singleword annotation. I feel that they are mostly solved with the current progress. There are still the problems with the handling in reledmac of xxrefs and the presentation of some cases of multiword annotations, but I don't think I can do much about that from my end.

I'll create a new issue to handle the impossible overlapping cases separately, and I'm not sure that will make it into the upcoming version.

stenskjaer commented on August 27, 2024

I have merged the changes into master and released 0.5. I'm closing this for now.
