Comments (25)
Thanks for the list of special cases.
On Wed, Jun 29, 2016 at 4:04 PM, dreamer2908 wrote:
- Images can be embedded in B-T stories as inline images instead of thumbnails. The resulting xhtml code will be (slightly) invalid if WebToEpub encounters this type of image: a div tag ends up inside a p tag.
Example: All non-gallery images here: Utsuro no Hako:Volume 1
https://www.baka-tsuki.org/project/index.php?title=Utsuro_no_Hako:Volume_1
Result xhtml code for the first image: https://www.baka-tsuki.org/project/index.php?title=File:Utsuro_no_Hako_vol1_pic1.jpg
Epubcheck error message:
ERROR: /home/yumi/Downloads/Utsuro_no_...koVolume_1.epub/OEBPS/Text/0000_Novel_Illustrations.xhtml(2,34): element "div" not allowed here; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns="http://www.w3.org/2000/svg")
- WebToEpub doesn't convert the deprecated u tag (underline) into a form suitable for epub. Normal underline> should become Normal underline>
Sample: same as above. Epubcheck error message:
ERROR: /home/yumi/Downloads/Utsuro_no_...koVolume_1.epub/OEBPS/Text/0001_Prologue.xhtml(4,85): element "u" not allowed anywhere; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns="http://www.w3.org/2000/svg")
- Invalid ids in span tags inside h* tags are not fixed, like 1st time
Epubcheck error message:
ERROR: /home/yumi/Downloads/Utsuro_no_...koVolume_1.epub/OEBPS/Text/0002_1st_time.xhtml(1,497): value of attribute "id" is invalid; must be an XML name without colons
Side note: BTE-GEN converts it into, but it's still not fixed, and not useful here.
Well, some more, but I lost the samples.
- center tag isn't allowed in epub, either. text should become
- align attribute in p/span/div should be converted into css style text-align:
- BTE-GEN moves up headings if higher levels are missing, i.e. h2 to h1, h3 to h2 if there's no h1. Can this be considered?
- In the list of references (translator's notes) on the B-T web, the link to jump up to where the reference belongs has only a single ↑ symbol. The same goes for BTE-GEN's output. In WebToEpub's output, it becomes Jump up ↑. If you remove cite-accessibility-label (class), the Jump up text will stop popping up out of nowhere.
Full disclosure: I'm developing my own (not easy-to-use) Baka-Tsuki to epub converter, which is for freaks like me, and not for normal users at all.
from webtoepub.
One more (potential) issue, well, if you have free time to play with.
Images, both in thumbnail and inline form, can have a custom target link, rather than link to image page. I haven't seen anyone using it in B-T, so it's not really important.
Example: User_talk:Dreamer2908. WebToEpub breaks pretty badly.
from webtoepub.
Yup. That's one of the two key problems I'm currently trying to solve for #9.
Hopefully I'll have it solved by the end of the weekend.
On Wed, Jun 29, 2016 at 5:21 PM, dreamer2908 wrote:
One more (potential) issue, well, if you have free time to play with.
Images, both in thumbnail and inline form, can have a custom target link, rather than link to image page. I haven't seen anyone using it in B-T, so it's not really important.
Example: User_talk:Dreamer2908
https://www.baka-tsuki.org/project/index.php?title=User_talk:Dreamer2908.
WebToEpub breaks pretty hard.
from webtoepub.
Well @dreamer2908
1. Check if the div's parent is p, and if so move the div out and remove the p tag.
2. Easy to fix.
3. This is because a number is the first thing it sees in the ID. There is no real "easy" way to fix this. Having a span tag inside a header tag is perfectly valid. The problem is that epubcheck and epub readers do not like a number as the first character of the ID. We could check the first character of every element's ID and, if it is a number, prepend something to the ID (so that epubcheck doesn't go derp). However, I'm not sure how dteviot would like this approach, and I'm kind of against it, mostly because of CPU cycles.
4. Center. Same as 2.
5. Align. Same as 2.
6. Custom target link. Relatively easy, actually, although the real question is whether we should ignore/remove these custom links or follow them.
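The ID check described above (prepend something when the first character is a number) could be sketched roughly like this. It is only an illustration, not WebToEpub's code: the name `sanitizeId` and the `id_` prefix are made up for this example, and the ASCII-only character set is a deliberately conservative assumption, since XML names allow more characters than this.

```javascript
// Hypothetical helper, not part of WebToEpub.
const ID_PREFIX = "id_"; // assumed prefix; any string starting with a letter would do

function sanitizeId(id) {
  // Replace anything outside a conservative ASCII subset (this also removes colons).
  const cleaned = id.replace(/[^A-Za-z0-9_.-]/g, "_");
  // An XML name has to start with a letter or underscore, not a digit, '-' or '.'.
  return /^[A-Za-z_]/.test(cleaned) ? cleaned : ID_PREFIX + cleaned;
}

console.log(sanitizeId("1st_time"));
```

Any link that points at the old id (for example `href="#1st_time"`) would need the same rewrite, and two different ids could collide after cleaning, so a real implementation would have to handle both.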
from webtoepub.
@dteviot Will you be doing this or should i? Also it would be nice if i could self assign myself to certain issues.
from webtoepub.
> Will you be doing this or should i
If you want to do it, that's fine with me. As I've said, at the moment I'm trying to get "use URL to specify cover image" working. I think that's the highest gain item on the list currently.
> Also it would be nice if i could self assign myself to certain issues.
Fine with me. Tell me what I need to do to give you the rights and I'll get it done.
from webtoepub.
I should be able to if i'm contributor rank @dteviot
from webtoepub.
@belldandu you should be a contributor now. If not, please let me know.
from webtoepub.
Well, I'll just leave this here.
WebToEpub v0.0.8 encounters a parsing error on this page: Utsuro_no_Hako:Volume2_May_2 (and Utsuro_no_Hako:Volume2, which includes it).
Screenshot: https://i.imgur.com/guVoRXM.png
from webtoepub.
@dteviot I spelled that wrong, it's "collaborator".
@dreamer2908 i'm looking into that.
from webtoepub.
@belldandu try this: https://github.com/dteviot/WebToEpub/invitations
from webtoepub.
@dteviot there we go.
from webtoepub.
> WebToEpub v0.0.8 encounters a parsing error on this page: Utsuro_no_Hako:Volume2_May_2 (and Utsuro_no_Hako:Volume2, which includes it).
D'oh!
Fixed.
My apologies for not noticing this sooner.
from webtoepub.
@dteviot
Thanks. I completely forgot about this.
I checked out version 0.0.0.14, and it indeed no longer throws errors. But I noticed something strange: texts in part "May 2nd (Saturday) 00:31" are all italic in Baka-Tsuki, but in the generated epub, only the last sentence is italic.
from webtoepub.
> I noticed something strange: texts in part "May 2nd (Saturday) 00:31" are all italic in Baka-Tsuki, but in the generated epub, only the last sentence is italic.
That's odd. I'll add investigating to my ToDo list.
from webtoepub.
I'm looking at fixing this issue
> Images can be embedded in B-T stories in form of inline images instead of thumbnails. The result xhtml code will be (slightly) invalid if WebToEpub encounters this type of images: div tag is inside p
This occurs because I'm wrapping the <svg> element in a <div class="svg_outer svg_inner">. I'm wrapping it in a <div> so that a style is applied to the <div>.
div.svg_outer {
    display: block;
    margin-bottom: 0;
    margin-left: 0;
    margin-right: 0;
    margin-top: 0;
    padding-bottom: 0;
    padding-left: 0;
    padding-right: 0;
    padding-top: 0;
    text-align: left;
}
div.svg_inner {
    display: block;
    text-align: center;
}
The reason I'm doing this is because Lord Simon told me to do this. (He's the one who wrote BTE-GEN.)
An obvious fix (to me) would be to not have a wrapping <div> tag and apply the style directly to the <svg> element. For that matter, I'm also puzzled why there's both a svg_outer and svg_inner style.
Anyway, my knowledge of CSS is extremely limited (as you've probably guessed from my statements above), so I'm hoping you could tell me WHY Lord Simon told me to do this, and why the changes I've suggested would be a bad idea. Failing that, can you point me in the direction of some good CSS documentation?
Thanks for your time.
from webtoepub.
> I noticed something strange: texts in part "May 2nd (Saturday) 00:31" are all italic in Baka-Tsuki, but in the generated epub, only the last sentence is italic.
OK, I know what's happening here.
The entire chapter, except for the final line, is wrapped in an <i> tag, i.e. the chapter looks like this:
<i>
<h3>May 2nd (Saturday) 00:31</h3>
<p>Exactly 15 minutes …
</p>
</i>
<p><i>I think that makes us a perfect match, … </i></p>
But one of the steps of WebToEpub is to “flatten” the HTML, so that all header tags are immediate children of the body, so the italic tag is being discarded.
In this case, rather than trying to fix WebToEpub, I'm going to suggest that the easiest fix is to edit the page on Baka-Tsuki, moving the <i> to after the </h3> tag.
I will attempt to make the change.
from webtoepub.
About inline images and div inside p: rather than changing the way you handle images, I think it's easier to find a suitable place for it. Either moving the image out of p before processing, or doing some sanity checking, like whether div can really be inserted there, would do.
About the italic stuff, well, it's indeed easy to fix the page on Baka-Tsuki. I already know what changes to make to the page, if you want to end the case with this. But erroneous html is everywhere (i isn't even allowed to wrap p), so some degree of error correction will be necessary eventually.
from webtoepub.
You might like to take the latest version of the Sonako branch https://github.com/dteviot/WebToEpub/tree/sonako for a spin, I've been busy today.
> rather than changing the way you handle images, I think it's easier to find a suitable place for it.
> Either moving the image out of p before processing
Yes, that's what I'm doing now. If the parent is a <p>, I put the image before that tag.
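A rough sketch of that "put the image before the <p>" step, for illustration only. It works on plain objects of the form `{ tag, children }` rather than real DOM nodes, only looks one level deep, and the function name `hoistDivsOutOfParagraphs` is made up here; WebToEpub's actual code may look quite different.

```javascript
// Hypothetical stand-in for DOM nodes: { tag, children }, where each child
// is either another node or a plain string of text.
function hoistDivsOutOfParagraphs(nodes) {
  const result = [];
  for (const node of nodes) {
    if (typeof node === "string" || node.tag !== "p") {
      result.push(node);
      continue;
    }
    const kept = [];
    for (const child of node.children) {
      if (typeof child !== "string" && child.tag === "div") {
        result.push(child); // the image wrapper goes before the paragraph
      } else {
        kept.push(child);
      }
    }
    result.push({ tag: "p", children: kept });
  }
  return result;
}
```

A paragraph that contained nothing but the image is left behind empty in this sketch; whether to drop such paragraphs is a separate decision.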
> WebToEpub doesn't convert the deprecated u tag (underline)
It does now.
> center tag isn't allowed in epub, too
Also fixed.
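For readers wondering what such a conversion can look like, here is a minimal sketch. It is an assumption, not the code committed to the Sonako branch: it uses the same plain `{ tag, attrs, children }` objects instead of real DOM nodes, and the replacement tags and inline styles in `DEPRECATED_TAGS` are one plausible choice among several (a CSS class would work just as well).

```javascript
// Hypothetical mapping from deprecated presentational tags to a replacement
// tag plus an inline style. The exact replacements are an assumption.
const DEPRECATED_TAGS = new Map([
  ["u", { tag: "span", style: "text-decoration: underline;" }],
  ["center", { tag: "div", style: "text-align: center;" }],
]);

function replaceDeprecatedTags(node) {
  if (typeof node === "string") {
    return node; // text nodes are left alone
  }
  const children = node.children.map(replaceDeprecatedTags);
  const replacement = DEPRECATED_TAGS.get(node.tag);
  if (replacement === undefined) {
    return { ...node, children };
  }
  const attrs = { ...(node.attrs || {}), style: replacement.style };
  return { tag: replacement.tag, attrs, children };
}
```

Note that turning center into a div can reintroduce the "div inside p" problem when the center tag sat inside a paragraph, so the order of the clean-up steps matters.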
> If you remove cite-accessibility-label (class), the Jump up text will stop popping up out of nowhere
Done
> Invalid id in span tag inside h* tag are not fixed, like
Those links were only needed for the table of contents on the original page. As they're no longer needed (page is split on Header tags) I'm removing them. (At least, the code is now supposed to remove them.)
> About the italic stuff, well, it's indeed easy to fix the page on Baka-Tsuki. I already know what changes to make to the page.
So do I. Looks like someone put an open italic command at the start of the preceding chapter, and didn't close it until the end of the following chapter. So there are two chapters in italics. In this case, I think it's an error by the translator. That is, the chapters are not supposed to be italic. I've sent a PM to the translator and we'll see what happens. My guess is nothing.
> But erroneous html is everywhere (i isn't even allowed to wrap p), so some degree of error correction will be necessary eventually.
Agreed, error handling will be necessary. However, in this case, I think what the parser is doing is reasonable. (Discarding the weird italic tag.) But if you find other cases where the parser has problems please let me know.
from webtoepub.
I've checked out the latest sonako branch, and it seems to work as expected. GJ.
> But if you find other cases where the parser has problems please let me know.
Well, if similar weirdness that remains unfixed counts as a problem:
It seems that Baka-Tsuki would output weird html if a long section is italic/bold/etc and there's anything that is not text inside. Example: HEAVY_OBJECT:Volume11_Chapter_3#Part_12. The weirdness still remains in the generated epub.
This kind of usage of italic/bold is so familiar that I'm afraid it's everywhere.
from webtoepub.
> It seems that Baka-Tsuki would output weird html if a long section is italic/bold/etc and there's anything that is not text inside. Example: HEAVY_OBJECT:Volume11_Chapter_3#Part_12. The weirdness still remains in the generated epub.
> This kind of usage of italic/bold is awfully familiar that I'm afraid it's everywhere.
I'm going to call it a bug. As this incident has so many issues in it that I'm starting to lose track of them all, I'm raising this as a new issue.
from webtoepub.
> I could have sworn i already fixed this in an earlier commit.
You tried, but it didn't work properly. There were two problems.
- It didn't always result in a valid id. (For example, Fate/Zero has IDs that start with a '-'.)
- It didn't update any hyperlinks that referred to the id.
> Uhhhh doesn't removing them also break the citations at the bottom of the page?
If you mean footnotes, I'm only removing the ids that are not referred to.
Footnotes seem to have valid IDs.
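The "only remove ids that are not referred to" rule could be sketched as two passes over the content: first collect every `#fragment` link target, then drop the ids nothing points at. This is an illustration on plain `{ tag, attrs, children }` objects, not the actual WebToEpub code, and both function names are hypothetical. It also only considers links within the same piece of content.

```javascript
// Pass 1: collect the targets of all in-page links (href="#something").
function collectLinkTargets(node, targets = new Set()) {
  if (typeof node === "string") {
    return targets;
  }
  const href = node.attrs ? node.attrs.href : undefined;
  if (typeof href === "string" && href.startsWith("#")) {
    targets.add(href.slice(1));
  }
  for (const child of node.children) {
    collectLinkTargets(child, targets);
  }
  return targets;
}

// Pass 2: return a copy of the tree in which ids that nothing links to are removed.
function removeUnreferencedIds(node, targets) {
  if (typeof node === "string") {
    return node;
  }
  const attrs = { ...(node.attrs || {}) };
  if ("id" in attrs && !targets.has(attrs.id)) {
    delete attrs.id;
  }
  const children = node.children.map((child) => removeUnreferencedIds(child, targets));
  return { ...node, attrs, children };
}
```

With this rule a footnote id stays because its back-link or citation link refers to it, while a heading anchor that only the original table of contents used disappears.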
On Sat, Jul 30, 2016 at 6:45 AM, Belldandu wrote:
Invalid id in span tag inside h* tag are not fixed, like
Those links were only needed for the table of contents on the original page. As they're no longer needed (page is split on Header tags) I'm removing them. (At least, the code is now supposed to remove them.)
Uhhhh doesn't removing them also break the citations at the bottom of the page?
I could have sworn i already fixed this in an earlier commit.
from webtoepub.
> BTE-GEN moves up heading if higher levels are missing, i.e h2 to h1, h3 to h2 if there's no h1. Can this be considered?
Done in latest commit to Sonako branch.
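As a sketch of what "moving headings up" can mean, assuming the simplest interpretation: find the highest heading level actually used and shift every heading by the same amount so that it becomes h1. The function name `promoteHeadingLevels` is made up, and how gaps in the middle (say h1 and h3 but no h2) should be treated is not covered by the description above, so this sketch leaves them alone.

```javascript
// Takes the heading levels found in a document (2 for h2, 3 for h3, ...)
// and shifts them so that the smallest level used becomes 1.
function promoteHeadingLevels(levels) {
  if (levels.length === 0) {
    return [];
  }
  const shift = Math.min(...levels) - 1;
  return levels.map((level) => level - shift);
}

// Example: a page that uses only h2 and h3.
console.log(promoteHeadingLevels([2, 3, 3, 2]));
```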
> align attribute in p/span/div should be converted into css style text-align:
Any chance you can locate an example or two of this please? I haven't found an example yet.
from webtoepub.
Here: Leviathan:Volume_5_Afterword
I've just looked at the wikitext and it turns out that the translator used a weird way to right align text. Feel free to skip this.
from webtoepub.
Just a heads up @dteviot collections hit so I didn't get in ;-; and I have a job interview this Friday at 1 pm. Also my computer broke.
from webtoepub.