Comments (2)
The reason for this behavior is that your HTML is invalid.
Any tag must have an end tag, aside from a few exceptions listed here:
https://html.spec.whatwg.org/multipage/syntax.html#optional-tags
Also, custom element names must contain a hyphen (e.g. <a-player>), but JSoup does not seem to enforce this.
https://html.spec.whatwg.org/multipage/custom-elements.html#custom-elements-core-concepts
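To make the hyphen rule a bit more concrete, here is a rough sketch of a name check. It is only an ASCII approximation of the rule in the spec linked above (the spec also allows many non-ASCII characters), and the class `CustomElementNames` and method `looksValid` are names made up for this example; they are not part of JSoup.

```java
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of JSoup or any library. */
final class CustomElementNames {
    // Names the HTML spec reserves even though they contain a hyphen.
    private static final Set<String> RESERVED = Set.of(
            "annotation-xml", "color-profile", "font-face", "font-face-src",
            "font-face-uri", "font-face-format", "font-face-name", "missing-glyph");

    // ASCII-only approximation: a lowercase letter first, at least one hyphen,
    // and no uppercase letters anywhere in the name.
    private static final Pattern ASCII_NAME =
            Pattern.compile("[a-z][a-z0-9._]*-[a-z0-9._-]*");

    /** Returns true if the name passes the simplified check described above. */
    static boolean looksValid(String name) {
        return ASCII_NAME.matcher(name).matches() && !RESERVED.contains(name);
    }
}
```

Under this check `a-player` passes, while `player` fails because it has no hyphen.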
I am not aware of any setting that ignores custom tags but there are two other options:
- You escape the angle brackets in <player>:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// <player> is written as &lt;player&gt;, so JSoup parses it as text, not as a tag
Document doc = Jsoup.parse("""
<!DOCTYPE html>
<html lang="en">
<head><title>Title</title></head>
<body>
<u>Oh, hello! You must be the person I've been waiting on all morning. </u><strong><u>You wouldn't happen to be &lt;player&gt; would you?</u></strong>
</body>
</html>
""");
// Turn the entities back into literal angle brackets in the output
String output = Parser.unescapeEntities(doc.select("body").html(), true);
Output:
<u>Oh, hello! You must be the person I've been waiting on all morning. </u><strong><u>You wouldn't happen to be <player> would you?</u></strong>
- You deal with the end tag in your program, disable the additional line breaks in JSoup, and hope that future versions of JSoup will enforce neither the rule about end tags nor the one about hyphens in custom elements:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// <player> is left unescaped here, so JSoup treats it as a tag and adds </player>
Document doc = Jsoup.parse("""
<!DOCTYPE html>
<html lang="en">
<head><title>Title</title></head>
<body>
<u>Oh, hello! You must be the person I've been waiting on all morning. </u><strong><u>You wouldn't happen to be <player> would you?</u></strong>
</body>
</html>
""");
// Disable pretty printing so JSoup does not add extra line breaks
Document.OutputSettings outputSettings = new Document.OutputSettings();
outputSettings.prettyPrint(false);
doc.outputSettings(outputSettings);
String output = doc.select("body").html().trim();
Output:
<u>Oh, hello! You must be the person I've been waiting on all morning. </u><strong><u>You wouldn't happen to be <player> would you?</player></u></strong>
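For what it's worth, the escaping from the first option can also be done programmatically before the string is handed to JSoup. Below is a minimal sketch using only the JDK. `PlaceholderEscaper` and `escape` are hypothetical names, not JSoup API, and the sketch assumes the placeholders are bare start tags without attributes, such as <player>.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of JSoup: escapes placeholder tags before parsing. */
final class PlaceholderEscaper {
    // Matches a bare start tag without attributes, e.g. <player>; group 1 is the tag name.
    private static final Pattern BARE_TAG = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)>");

    /** Replaces <name> with &lt;name&gt; for every name contained in placeholders (case-sensitive). */
    static String escape(String html, Set<String> placeholders) {
        Matcher m = BARE_TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Escape only the listed placeholders; leave real tags such as <u> untouched.
            String replacement = placeholders.contains(m.group(1))
                    ? "&lt;" + m.group(1) + "&gt;"
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling `PlaceholderEscaper.escape(html, Set.of("player"))` before `Jsoup.parse` should leave real tags alone and escape only the placeholder, after which `Parser.unescapeEntities` can restore it as in the first option.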
from jsoup.
Thank you for the response, and I apologize for my late one.
I was aware that it was identifying it as a tag and trying to treat it as such. Thankfully, I was able to work around it in my program and circumvent the entire thing. Unfortunately, that ended up massively overcomplicating my code.
The problem with this API, in my opinion, is that there is no way to turn off the autocorrection in the parse function. I'm not asking for the API to ignore such tags entirely, but there should be a way to have JSoup parse the strings without calling whatever function is inserting text into my Strings without my permission. It's made worse by the fact that I have no control whatsoever. Even a callback when it edits the string would be nice, mostly so I could override it and have it not touch the string.
If this project is ever updated, I suggest the feature to work something like this:
JSoup.setFixErrors(false);
If this is set to false, the code that automatically inserts the end tag will simply not be called, and the text will not be parsed by the system. Ideally, it would also include an optional callback that catches the "error" and feeds it into the function.
If I could please be directed to the code in this API that handles this autocorrecting functionality, perhaps I could look into adding the support to help out, or at least have the change locally.
Thank you again for your time!