dteviot,webtoepub

Comments (25)

dteviot commented on July 25, 2024

Thanks for the list of special cases.

On Wed, Jun 29, 2016 at 4:04 PM, dreamer2908 [email protected]
wrote:

Images can be embedded in B-T stories as inline images instead of
thumbnails. The resulting XHTML will be (slightly) invalid if WebToEpub
encounters this type of image: a div tag ends up inside a p tag.
Example: All non-gallery images here: Utsuro no Hako:Volume 1
https://www.baka-tsuki.org/project/index.php?title=Utsuro_no_Hako:Volume_1
Result xhtml code for the first image:
https://www.baka-tsuki.org/project/index.php?title=File:Utsuro_no_Hako_vol1_pic1.jpg

Epubcheck error message: ERROR: /home/yumi/Downloads/Utsuro_no_...koVolume_1.epub/OEBPS/Text/0000_Novel_Illustrations.xhtml(2,34): element "div" not allowed here; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns=" http://www.w3.org/2000/svg")

WebToEpub doesn't convert the deprecated u tag (underline) into a
form suitable for epub.
<u>Normal underline</u>
should become a span styled with the CSS property text-decoration: underline.
Sample: same as above. Epubcheck error message: ERROR: /home/yumi/Downloads/Utsuro_no_...koVolume_1.epub/OEBPS/Text/0001_Prologue.xhtml(4,85): element "u" not allowed anywhere; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns=" http://www.w3.org/2000/svg")
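A minimal sketch of that conversion, assuming a hypothetical plain-object node model (`HElement`) instead of the browser DOM that WebToEpub actually works on. `replaceUnderline` and `appendStyle` are illustrative names, not WebToEpub functions:

```typescript
// Hypothetical minimal node model standing in for the browser DOM.
interface HElement {
  tag: string;
  attrs: { [name: string]: string };
  children: HNode[];
}
type HNode = HElement | string;

// Append one CSS declaration to an element's inline style, keeping existing rules.
function appendStyle(el: HElement, declaration: string): void {
  const existing = el.attrs["style"] || "";
  el.attrs["style"] =
    existing.trim() === ""
      ? declaration
      : existing.replace(/;?\s*$/, "; ") + declaration;
}

// Rename every <u> to a <span> underlined via CSS, recursing into all descendants.
function replaceUnderline(node: HElement): void {
  for (const child of node.children) {
    if (typeof child === "string") continue;
    replaceUnderline(child);
    if (child.tag === "u") {
      child.tag = "span";
      appendStyle(child, "text-decoration: underline;");
    }
  }
}
```

Merging into an existing `style` attribute (rather than overwriting it) keeps any colour or other inline rules the translator already applied to the underlined text.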

Invalid ids in span tags inside h* tags are not fixed, e.g. in the heading
1st time

Epubcheck error message: ERROR: /home/yumi/Downloads/Utsuro_no_...koVolume_1.epub/OEBPS/Text/0002_1st_time.xhtml(1,497): value of attribute "id" is invalid; must be an XML name without colons
Side note: BTE-GEN converts it into
, but it's still
not fixed, and not useful here.

Well, some more, but I lost the samples.

The center tag isn't allowed in epub either. Centered text should
instead use the CSS style text-align: center.

The align attribute in p/span/div should be converted into the CSS
text-align property.
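A sketch covering both the center tag and the align attribute, under the same assumptions as before (hypothetical `HElement` model; `convertAlignment` and `appendStyle` are illustrative names, not WebToEpub's code). It maps `<center>` to a `div`, since center is a block-level element:

```typescript
// Hypothetical minimal node model standing in for the browser DOM.
interface HElement {
  tag: string;
  attrs: { [name: string]: string };
  children: HNode[];
}
type HNode = HElement | string;

// Append one CSS declaration to an element's inline style, keeping existing rules.
function appendStyle(el: HElement, declaration: string): void {
  const existing = el.attrs["style"] || "";
  el.attrs["style"] =
    existing.trim() === ""
      ? declaration
      : existing.replace(/;?\s*$/, "; ") + declaration;
}

// Replace <center> with a styled <div>, and turn align="..." on p/span/div into text-align.
function convertAlignment(node: HElement): void {
  for (const child of node.children) {
    if (typeof child === "string") continue;
    convertAlignment(child);
    if (child.tag === "center") {
      child.tag = "div"; // <center> is block-level, so a div is the closest match
      appendStyle(child, "text-align: center;");
    }
    const align = child.attrs["align"];
    if (align !== undefined && /^(p|span|div)$/.test(child.tag)) {
      delete child.attrs["align"];
      appendStyle(child, "text-align: " + align.toLowerCase() + ";");
    }
  }
}
```

Note that text-align only affects block containers, so on a span the converted style is valid but has no visible effect.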

BTE-GEN promotes headings when higher levels are missing, i.e. h2 becomes h1
and h3 becomes h2 if there's no h1. Can this be considered?
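A sketch of that promotion, assuming the hypothetical `HElement` model again (`promoteHeadings` and its helpers are illustrative names). It finds the highest heading level in use and shifts every heading up by the same amount:

```typescript
// Hypothetical minimal node model standing in for the browser DOM.
interface HElement {
  tag: string;
  attrs: { [name: string]: string };
  children: HNode[];
}
type HNode = HElement | string;

// Returns 1-6 for h1-h6, or 0 for any other tag.
function headingLevel(tag: string): number {
  const m = /^h([1-6])$/.exec(tag);
  return m ? Number(m[1]) : 0;
}

// Visit every element below `node` in document order.
function forEachElement(node: HElement, visit: (el: HElement) => void): void {
  for (const child of node.children) {
    if (typeof child === "string") continue;
    visit(child);
    forEachElement(child, visit);
  }
}

// Shift all headings up by the same amount so the highest level in use becomes h1.
function promoteHeadings(root: HElement): void {
  const levels: number[] = [];
  forEachElement(root, (el) => {
    const level = headingLevel(el.tag);
    if (level > 0) levels.push(level);
  });
  if (levels.length === 0) return;
  const shift = Math.min(...levels) - 1;
  if (shift === 0) return; // an h1 already exists, nothing to do
  forEachElement(root, (el) => {
    const level = headingLevel(el.tag);
    if (level > 0) el.tag = "h" + (level - shift);
  });
}
```

This only shifts by a fixed amount; it does not close gaps (h1 followed by h3 stays as is), which is how I read the BTE-GEN behaviour described above.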

In the list of references (translator's notes) on the B-T website, the link
that jumps back up to where the reference is used has only a single ↑ symbol.
The same goes for BTE-GEN's output. In WebToEpub's output, it becomes
"Jump up ↑". If you remove the elements with class cite-accessibility-label,
the "Jump up" text will stop popping up out of nowhere.
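A sketch of removing those labels, assuming the hypothetical `HElement` model (`removeByClass` and `hasClass` are illustrative names). The structure used in the test below, with the label span inside the back-link, is an assumed example rather than a verified copy of the B-T markup:

```typescript
// Hypothetical minimal node model standing in for the browser DOM.
interface HElement {
  tag: string;
  attrs: { [name: string]: string };
  children: HNode[];
}
type HNode = HElement | string;

// True if the element's class attribute contains `name` as a whole word.
function hasClass(el: HElement, name: string): boolean {
  return (el.attrs["class"] || "").split(/\s+/).indexOf(name) !== -1;
}

// Remove every element carrying `className` (and its content) anywhere below `node`.
function removeByClass(node: HElement, className: string): void {
  node.children = node.children.filter(
    (c) => typeof c === "string" || !hasClass(c, className),
  );
  for (const child of node.children) {
    if (typeof child !== "string") removeByClass(child, className);
  }
}
```

Called as `removeByClass(body, "cite-accessibility-label")`, this leaves only the ↑ inside each back-link.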

Full disclosure: I'm developing my own (not easy-to-use) Baka-Tsuki to epub
converter, which is for freaks like me, and not for normal users at all.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#32, or mute the thread
https://github.com/notifications/unsubscribe/AE6w2Umnuteh5N5vw1lswaMLLCA7q25fks5qQe7LgaJpZM4JAwH7
.

from webtoepub.

dreamer2908 commented on July 25, 2024

One more (potential) issue, well, if you have free time to play with.

Images, both in thumbnail and inline form, can have a custom target link rather than a link to the image page. I haven't seen anyone using it in B-T, so it's not really important.

Example: User_talk:Dreamer2908. WebToEpub breaks pretty badly.

dteviot commented on July 25, 2024

Yup. That's one of the two key problems I'm currently trying to solve for
#9
Hopefully I'll have it solved by the end of the weekend.

On Wed, Jun 29, 2016 at 5:21 PM, dreamer2908 [email protected]
wrote:

One more (potential) issue, well, if you have free time to play with.

Images, both in thumbnail and inline form, can have a custom target link,
rather than link to image page. I haven't seen anyone using it in B-T, so
it's not really important.

Example: User_talk:Dreamer2908
https://www.baka-tsuki.org/project/index.php?title=User_talk:Dreamer2908.
WebToEpub breaks pretty hard.

belldandu commented on July 25, 2024

Well @dreamer2908

1. Inline images: check if the div's parent is a p and, if so, move the div out and remove the p tag.
2. The u tag: easy to fix.
3. The invalid id is because a number is the first thing it sees in the ID. There is no real "easy" way to fix this. Having a span tag inside a header tag is perfectly valid. The problem is that epubcheck and epub readers do not like having a number as the first character of an ID. We could check the first character of every element's ID and, if it is a number, prepend something to the ID (so that epubcheck doesn't go derp). However, I'm not sure how dteviot would like this approach, and I'm kind of against it, mostly because of CPU cycles.
4. Center: same as 2.
5. Align: same as 2.
6. Custom target link: relatively easy, actually. The real question is whether we should ignore/remove these custom links or follow them.

@dteviot

belldandu commented on July 25, 2024

@dteviot Will you be doing this or should I? Also, it would be nice if I could assign myself to certain issues.

dteviot commented on July 25, 2024

@belldandu

Will you be doing this or should I

If you want to do it, that's fine with me. As I've said, at the moment I'm trying to get the "use URL to specify cover image" feature working. I think that's the highest-gain item on the list currently.

Also, it would be nice if I could assign myself to certain issues.

Fine with me. Tell me what I need to do to give you the rights and I'll get it done.

belldandu commented on July 25, 2024

I should be able to if I'm at contributor rank @dteviot

dteviot commented on July 25, 2024

@belldandu you should be a contributor now. If not, please let me know.

dreamer2908 commented on July 25, 2024

Well, I'll just leave this here.

WebToEpub v0.0.8 encounters a parsing error on this page: Utsuro_no_Hako:Volume2_May_2 (and Utsuro_no_Hako:Volume2, which includes it).

Screenshot: https://i.imgur.com/guVoRXM.png

belldandu commented on July 25, 2024

@dteviot I spelled that wrong, it's collaborator.
@dreamer2908 I'm looking into that.

dteviot commented on July 25, 2024

@belldandu try this: https://github.com/dteviot/WebToEpub/invitations

belldandu commented on July 25, 2024

@dteviot there we go.

dteviot commented on July 25, 2024

@dreamer2908

WebToEpub v0.0.8 encounters a parsing error on this page: Utsuro_no_Hako:Volume2_May_2 (and Utsuro_no_Hako:Volume2, which includes it).

D'oh!
Fixed.
My apologies for not noticing this sooner.

dreamer2908 commented on July 25, 2024

@dteviot
Thanks. I completely forgot about this.

I checked out version 0.0.0.14, and it indeed no longer throws errors. But I noticed something strange: texts in part "May 2nd (Saturday) 00:31" are all italic in Baka-Tsuki, but in the generated epub, only the last sentence is italic.

dteviot commented on July 25, 2024

@dreamer2908

I noticed something strange: texts in part "May 2nd (Saturday) 00:31" are all italic in Baka-Tsuki, but in the generated epub, only the last sentence is italic.

That's odd. I'll add investigating to my ToDo list.

dteviot commented on July 25, 2024

@dreamer2908

I'm looking at fixing this issue

> Images can be embedded in B-T stories in form of inline images instead of thumbnails. The result xhtml code will be (slightly) invalid if WebToEpub encounters this type of images: div tag is inside p

This occurs because I'm wrapping the <svg> element in a <div class="svg_outer svg_inner">. I'm wrapping it in a <div> so that a style is applied to the <div>.

div.svg_outer {
   display: block;
   margin-bottom: 0;
   margin-left: 0;
   margin-right: 0;
   margin-top: 0;
   padding-bottom: 0;
   padding-left: 0;
   padding-right: 0;
   padding-top: 0;
   text-align: left;
}
div.svg_inner {
   display: block;
   text-align: center;
}

The reason I'm doing this is because Lord Simon told me to do this. (He's the one who wrote BTE-GEN.)

An obvious fix (to me) would be to not have a wrapping <div> tag and apply the style directly to the <svg> element. For that matter, I'm also puzzled why there's both a svg_outer and a svg_inner style.

Anyway, my knowledge of CSS is extremely limited (as you've probably guessed from my statements above), so I'm hoping you could tell me WHY Lord Simon told me to do this, and why the changes I've suggested would be a bad idea. Failing that, can you point me in the direction of some good CSS documentation?

Thanks for your time.

dteviot commented on July 25, 2024

@dreamer2908

I noticed something strange: texts in part "May 2nd (Saturday) 00:31" are all italic in Baka-Tsuki, but in the generated epub, only the last sentence is italic.

OK, I know what's happening here.
The entire chapter, except for the final line, is wrapped in an <i> tag, i.e. the chapter looks like this:

<i>
<h3>May 2nd (Saturday) 00:31</h3>
<p>Exactly 15 minutes …
</p>
</i>
<p><i>I think that makes us a perfect match, … </i></p>

But one of the steps of WebToEpub is to “flatten” the HTML so that all header tags are immediate children of the body, and as a result the italic tag is discarded.

In this case, rather than trying to fix WebToEpub, I'm going to suggest that the easiest fix is to edit the page on Baka-Tsuki, moving the <i> to after the </h3> tag.

I will attempt to make the change.

dreamer2908 commented on July 25, 2024

@dteviot

About inline images and div inside p, rather than changing the way you handle images, I think it's easier to find a suitable place for it. Either moving the image out of p before processing, or doing some sanity checking like whether div can really be inserted there would do.

About the italic stuff, well, it's indeed easy to fix the page on Baka-Tsuki. I already know what changes to make to the page, if you want to end the case with this. But erroneous html is everywhere (i isn't even allowed to wrap p), so some degree of error correction will be necessary eventually.
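One possible form of that error correction, sketched under the same assumptions (hypothetical `HElement` model, illustrative names; this is not what WebToEpub currently does): instead of discarding an inline wrapper such as `<i>` that encloses block elements, push a copy of it down into each block. Headers then stay direct children of the body for flattening, but the text keeps its formatting.

```typescript
// Hypothetical minimal node model standing in for the browser DOM.
interface HElement {
  tag: string;
  attrs: { [name: string]: string };
  children: HNode[];
}
type HNode = HElement | string;

const BLOCK_TAGS = /^(p|div|h[1-6]|ul|ol|li|blockquote|table)$/;
const INLINE_WRAPPERS = /^(i|b|em|strong|u|span)$/;

function isBlock(n: HNode): n is HElement {
  return typeof n !== "string" && BLOCK_TAGS.test(n.tag);
}

// A fresh copy of the inline wrapper `w` around `children`.
function cloneWrapper(w: HElement, children: HNode[]): HElement {
  return { tag: w.tag, attrs: { ...w.attrs }, children };
}

// Turn <i><h3>..</h3><p>..</p></i> into <h3><i>..</i></h3><p><i>..</i></p>.
function distributeInlineWrappers(parent: HElement): void {
  const result: HNode[] = [];
  for (const child of parent.children) {
    if (typeof child === "string") {
      result.push(child);
      continue;
    }
    distributeInlineWrappers(child); // fix nested content first
    if (!INLINE_WRAPPERS.test(child.tag) || !child.children.some(isBlock)) {
      result.push(child);
      continue;
    }
    for (const inner of child.children) {
      if (typeof inner === "string" && inner.trim() === "") {
        result.push(inner); // keep whitespace between blocks as-is
      } else if (isBlock(inner)) {
        inner.children = [cloneWrapper(child, inner.children)];
        distributeInlineWrappers(inner); // push further down if inner holds blocks too
        result.push(inner);
      } else {
        result.push(cloneWrapper(child, [inner])); // stray inline content keeps its own wrapper
      }
    }
  }
  parent.children = result;
}
```

Applied to the May 2nd example, the heading and each paragraph would end up italic individually, which matches how the browser renders the Baka-Tsuki page.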

dteviot commented on July 25, 2024

@dreamer2908

You might like to take the latest version of the Sonako branch https://github.com/dteviot/WebToEpub/tree/sonako for a spin; I've been busy today.

rather than changing the way you handle images, I think it's easier to find a suitable place for it.
Either moving the image out of p before processing

Yes, that's what I'm doing now. If the parent is a <p>, put the image before that tag.
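A minimal sketch of that rule, assuming the hypothetical `HElement` model (`hoistDivsOutOfParagraphs` is an illustrative name, not WebToEpub's function). Any div sitting directly inside a p is moved to just before the p, and the p is dropped if only whitespace remains, as belldandu suggested:

```typescript
// Hypothetical minimal node model standing in for the browser DOM.
interface HElement {
  tag: string;
  attrs: { [name: string]: string };
  children: HNode[];
}
type HNode = HElement | string;

function isDiv(n: HNode): n is HElement {
  return typeof n !== "string" && n.tag === "div";
}

// True when an element has nothing left but whitespace text.
function isBlank(el: HElement): boolean {
  return el.children.every((c) => typeof c === "string" && c.trim() === "");
}

// Move each <div> that is a direct child of a <p> to just before that <p>.
function hoistDivsOutOfParagraphs(parent: HElement): void {
  const result: HNode[] = [];
  for (const child of parent.children) {
    if (typeof child === "string") {
      result.push(child);
      continue;
    }
    hoistDivsOutOfParagraphs(child); // fix nested content first
    if (child.tag !== "p") {
      result.push(child);
      continue;
    }
    const divs = child.children.filter(isDiv);
    child.children = child.children.filter((c) => !isDiv(c));
    result.push(...divs);
    // Only drop the <p> if we emptied it; untouched empty paragraphs are left alone.
    if (divs.length === 0 || !isBlank(child)) result.push(child);
  }
  parent.children = result;
}
```

The image ends up ahead of any text that preceded it inside the paragraph, which is the trade-off of "put the image before the tag" over splitting the paragraph in two.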

WebToEpub doesn't convert the deprecated u tag (underline)

It does now.

center tag isn't allowed in epub, too

Also fixed.

If you remove cite-accessibility-label (class), the Jump up text will stop popping up out of nowhere

Done

Invalid id in span tag inside h* tag are not fixed, like

Those links were only needed for the table of contents on the original page. As they're no longer needed (page is split on Header tags) I'm removing them. (At least, the code is now supposed to remove them.)

About the italic stuff, well, it's indeed easy to fix the page on Baka-Tsuki. I already know what changes to make to the page.

So do I. Looks like someone put an opening italic tag at the start of the preceding chapter and didn't close it until the end of the following chapter. So there are two chapters in italics. In this case, I think it's an error by the translator. That is, the chapters are not supposed to be italic. I've sent a PM to the translator and we'll see what happens. My guess is nothing.

But erroneous html is everywhere (i isn't even allowed to wrap p), so some degree of error correction will be necessary eventually.

Agreed, error handling will be necessary. However, in this case, I think what the parser is doing is reasonable. (Discarding the weird italic tag.) But if you find other cases where the parser has problems please let me know.

dreamer2908 commented on July 25, 2024

@dteviot

I've checked out the latest sonako branch, and it seems to work as expected. GJ.

But if you find other cases where the parser has problems please let me know.

Well, if leftover weirdness of a similar kind counts as a problem, here's one.

It seems that Baka-Tsuki would output weird html if a long section is italic/bold/etc and there's anything that is not text inside. Example: HEAVY_OBJECT:Volume11_Chapter_3#Part_12. The weirdness still remains in the generated epub.

This kind of italic/bold usage looks so familiar that I'm afraid it's everywhere.

dteviot commented on July 25, 2024

@dreamer2908

It seems that Baka-Tsuki would output weird html if a long section is italic/bold/etc and there's anything that is not text inside. Example: HEAVY_OBJECT:Volume11_Chapter_3#Part_12. The weirdness still remains in the generated epub.

This kind of italic/bold usage looks so familiar that I'm afraid it's everywhere.

I'm going to call it a bug. This issue has so many sub-issues in it that I'm starting to lose track of them all, so I'm raising this as a new issue.

dteviot commented on July 25, 2024

I could have sworn i already fixed this in an earlier commit.

You tried, it didn't work properly. There were two problems.

It didn't always result in a valid id (for example, Fate/Zero has IDs that start with a '-').
And it didn't update any hyperlinks that referred to the id.
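A sketch that addresses both problems, assuming the hypothetical `HElement` model (`toXmlId`, `fixIds` and `walk` are illustrative names). It rewrites each id to a conservative ASCII subset of an XML name without colons, keeps the results unique, and then updates same-document links. It assumes links are written as `#id` using the id verbatim:

```typescript
// Hypothetical minimal node model standing in for the browser DOM.
interface HElement {
  tag: string;
  attrs: { [name: string]: string };
  children: HNode[];
}
type HNode = HElement | string;

// Visit `node` and every element below it.
function walk(node: HElement, visit: (el: HElement) => void): void {
  visit(node);
  for (const child of node.children) {
    if (typeof child !== "string") walk(child, visit);
  }
}

// Conservative ASCII subset of an XML name without colons: letters, digits,
// '_', '-', '.'; the first character must be a letter or '_'.
function toXmlId(raw: string): string {
  let id = raw.replace(/[^A-Za-z0-9_.\-]/g, "_");
  if (!/^[A-Za-z_]/.test(id)) id = "_" + id; // e.g. "1st_time", "-foo"
  return id;
}

function fixIds(root: HElement): void {
  // Prototype-free maps so ids such as "constructor" behave like any other key.
  const renamed: { [oldId: string]: string } = Object.create(null);
  const taken: { [id: string]: boolean } = Object.create(null);
  // Pass 1: rewrite every id, remembering old -> new.
  walk(root, (el) => {
    const old = el.attrs["id"];
    if (old === undefined) return;
    const base = toXmlId(old);
    let id = base;
    for (let n = 2; taken[id]; n++) id = base + "_" + n; // keep ids unique
    taken[id] = true;
    el.attrs["id"] = id;
    renamed[old] = id;
  });
  // Pass 2: point same-document links at the new ids.
  walk(root, (el) => {
    const href = el.attrs["href"];
    if (href === undefined || href.charAt(0) !== "#") return;
    const target = renamed[href.substring(1)];
    if (target !== undefined) el.attrs["href"] = "#" + target;
  });
}
```

XML names also allow many non-ASCII letters, so replacing them with '_' is stricter than necessary; it just keeps the sketch simple.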

Uhhhh doesn't removing them also break the citations at the bottom of the
page?

If you mean footnotes, I'm only removing the ids that are not referred to.
Footnotes seem to have valid IDs.
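A sketch of "only remove the ids that are not referred to", assuming the hypothetical `HElement` model (`removeUnreferencedIds` and `walk` are illustrative names). It only looks at same-document `#id` links, so links that point into another file after the page is split would need extra handling:

```typescript
// Hypothetical minimal node model standing in for the browser DOM.
interface HElement {
  tag: string;
  attrs: { [name: string]: string };
  children: HNode[];
}
type HNode = HElement | string;

// Visit `node` and every element below it.
function walk(node: HElement, visit: (el: HElement) => void): void {
  visit(node);
  for (const child of node.children) {
    if (typeof child !== "string") walk(child, visit);
  }
}

// Drop id attributes that no same-document link ("#id") points to.
function removeUnreferencedIds(root: HElement): void {
  const referenced: { [id: string]: boolean } = Object.create(null);
  walk(root, (el) => {
    const href = el.attrs["href"];
    if (href !== undefined && href.charAt(0) === "#") referenced[href.substring(1)] = true;
  });
  walk(root, (el) => {
    const id = el.attrs["id"];
    if (id !== undefined && !referenced[id]) delete el.attrs["id"];
  });
}
```

Footnote targets survive because their back-links reference them, while the heading anchors that only served the original page's table of contents disappear.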

On Sat, Jul 30, 2016 at 6:45 AM, Belldandu [email protected] wrote:

Invalid id in span tag inside h* tag are not fixed, like

Those links were only needed for the table of contents on the original
page. As they're no longer needed (page is split on Header tags) I'm
removing them. (At least, the code is now supposed to remove them.)

Uhhhh doesn't removing them also break the citations at the bottom of the
page?

I could have sworn i already fixed this in an earlier commit.

dteviot commented on July 25, 2024

@dreamer2908

BTE-GEN moves up heading if higher levels are missing, i.e h2 to h1, h3 to h2 if there's no h1. Can this be considered?

Done in latest commit to Sonako branch.

align attribute in p/span/div should be converted into css style text-align:

Any chance you can locate an example or two of this please? I haven't found an example yet.

dreamer2908 commented on July 25, 2024

@dteviot

Here: Leviathan:Volume_5_Afterword

I've just looked at the wikitext and it turns out that the translator used a weird way to right align text. Feel free to skip this.

belldandu commented on July 25, 2024

Just a heads up @dteviot collections hit so I didn't get in ;-; and I have a job interview this Friday at 1 pm. Also my computer broke.

Baka-Tsuki - epubcheck errors about webtoepub (open, 25 comments)
