Comments (5)
Hi @LorenzBuehmann ,
I vaguely (it was a very long time ago!) recall this coming up before. A difference now is that the only site this affects is likely to be Wikidata (and then, only for now).
Here is an MVCE:
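import org.apache.jena.sparql.exec.RowSet;
import org.apache.jena.sparql.exec.RowSetOps;
import org.apache.jena.sparql.exec.http.QueryExecHTTP;
import org.apache.jena.sparql.exec.http.QuerySendMode;

public class MVCE {
    public static void main(String... args) {
        // Literal ç (U+00E7) sent as-is in the query string.
        String qs = "SELECT ?x { BIND('Curaçao' As ?x) }";
        // Same query using the SPARQL escape \u00E7; pass it to queryString(...) to compare.
        String qsx = "SELECT ?x { BIND('Cura\\u00E7ao' As ?x) }";
        RowSet rowSet = QueryExecHTTP
                .service("https://query.wikidata.org/sparql")
                //.sendMode(QuerySendMode.asPostForm)
                //.sendMode(QuerySendMode.asPost)
                .sendMode(QuerySendMode.asGetAlways)
                .queryString(qs)
                .select();
        RowSetOps.out(rowSet);
    }
}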
After checking, the corruption happens on the receiving side of the request, and qsx works in all three cases.
The three different sendModes give three different results:
- asGetAlways works
- asPost is corrupted in a way that looks like UTF-8 read as ISO-8859-?
- asPostForm is a different corruption, not sure what, and that might be Jena.
I don't know why ISO-8859 is being used if their servers are Linux (system default). It hints that this is a choice in the Blazegraph code.
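The asPost symptom can be reproduced without any server. This is a minimal sketch, assuming the receiving side decodes with ISO-8859-1 (the observation above only says ISO-8859-?); the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode as UTF-8 (what the client sends), then decode as ISO-8859-1
    // (the assumed misreading on the receiving side).
    static String utf8ReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // \u00E7 is ç; escaped so the source file encoding does not matter.
        // ç is the two bytes C3 A7 in UTF-8, which ISO-8859-1 reads as Ã and §.
        System.out.println(utf8ReadAsLatin1("Cura\u00E7ao"));
    }
}
```

The result is the string CuraÃ§ao: one character turned into two, which is the typical signature of this kind of mismatch.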
from jena.
Hi @afs
Yes, as I expected, it is a limitation of the Wikidata backend or at least of their server setup. I was just confused by the different behaviour of Jena 4.1.0 vs the latest versions, and then I remembered that you changed the HTTP API being used.
static private String getQueryString(final HttpServletRequest req)
        throws IOException {
    if (RESTServlet.hasMimeType(req, MIME_SPARQL_QUERY)) {
        // return the body of the POST, see trac 711
        return readFully(req.getReader());
    }
    return req.getParameter(ATTR_QUERY) != null ? req
            .getParameter(ATTR_QUERY) : (String) req
            .getAttribute(ATTR_QUERY);
}
Unfortunately they rely on the old HTTP API, and the HttpServletRequest sticks to ISO-8859- by default if no encoding is specified in the HTTP request, and you can't change the default encoding afaik. The only fix would be to set the encoding on the request object, i.e.
req.setCharacterEncoding("UTF-8");
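To illustrate what that one line changes, here is a plain-Java sketch of the decoding decision with no servlet container involved. It assumes the Servlet specification fallback of ISO-8859-1 when the Content-Type carries no charset; `BodyDecodingSketch`, `decodeBody` and `charsetFrom` are hypothetical names, not Blazegraph or Jetty code, and quoted charset values are not handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BodyDecodingSketch {
    // 'forced' plays the role of req.setCharacterEncoding(...): when non-null it wins.
    static String decodeBody(byte[] body, String contentType, Charset forced) {
        Charset cs = forced;
        if (cs == null) cs = charsetFrom(contentType);
        if (cs == null) cs = StandardCharsets.ISO_8859_1; // assumed container default
        return new String(body, cs);
    }

    // Extract the charset parameter from a Content-Type value, or null if absent.
    static Charset charsetFrom(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8))
                return Charset.forName(p.substring(8).trim());
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] body = "Cura\u00E7ao".getBytes(StandardCharsets.UTF_8);
        // No charset anywhere: falls back to ISO-8859-1 and garbles the ç.
        System.out.println(decodeBody(body, "application/sparql-query", null));
        // Encoding forced on the receiving side: decoded correctly.
        System.out.println(decodeBody(body, "application/sparql-query", StandardCharsets.UTF_8));
    }
}
```

The same body decodes correctly either when the receiver forces UTF-8 or when the sender declares a charset in the Content-Type header.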
So I'm not sure how to continue. We'll raise an issue on Blazegraph, but I don't think that fix will even make it to the Wikidata setup, as they would have to rebuild and redeploy Blazegraph.
Or they could set the default encoding in their Jetty server, if that is possible.
Regarding POST form: via curl it works:
curl -X POST --data "query=SELECT ?x { BIND('Curaçao' As ?x) }" https://query.wikidata.org/sparql
For Jena, I guess we can close this issue (once you have an idea about the POST form issue you mentioned) and at least keep it for reference and documentation as a known limitation. It might affect other users as well.
from jena.
@afs a follow up issue/question (we could also open another issue for better reference)
Wikidata people argued to use POST form because it works ...
We tried to set the SERVICE request mode via Fuseki assembler config:
ja:context [ ja:cxtName "arq:httpServiceSendMode" ; ja:cxtValue "asGetWithLimitForm" ] ;
This indeed fails, as Context::get tries to return an object of the expected type in the Service::chooseQuerySendMode method, which in that case is QuerySendMode, and casting a String to this type fails.
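The failure mode can be shown in isolation. The sketch below is a stand-in, not Jena's Context class: it assumes a settings map with a generic getter that does an unchecked cast, and a hypothetical `Mode` enum in place of QuerySendMode. The cast inside the getter is erased, so the error only surfaces where the caller assigns the result to the enum type.

```java
import java.util.HashMap;
import java.util.Map;

class ContextCastDemo {
    enum Mode { asGetAlways, asGetWithLimitForm } // hypothetical stand-in for QuerySendMode

    static final Map<String, Object> settings = new HashMap<>();

    // Generic getter with an unchecked cast: nothing is verified here at runtime.
    @SuppressWarnings("unchecked")
    static <T> T get(String key, T dft) {
        Object v = settings.get(key);
        return v == null ? dft : (T) v;
    }

    static String lookup() {
        // A config file can only supply a String, not an enum constant.
        settings.put("mode", "asGetWithLimitForm");
        try {
            Mode m = get("mode", Mode.asGetAlways); // the compiler-inserted cast to Mode fails here
            return "ok: " + m;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup());
    }
}
```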
A quick fix would work around the limitation and handle at least the two different types of the context value, i.e. i) a String coming from an assembler config or ii) a QuerySendMode coming from, say, some Java API setup:
private static QuerySendMode chooseQuerySendMode(String serviceURL, Context context, QuerySendMode dftValue) {
    if ( context == null )
        return dftValue;
    Object querySendMode = context.<Object>get(httpServiceSendMode, dftValue);
    if (querySendMode instanceof String) { // handle string type from assembler config
        return QuerySendMode.valueOf((String) querySendMode);
    } else if (querySendMode instanceof QuerySendMode) { // handle enum type from Java API
        return (QuerySendMode) querySendMode;
    }
    // handle null value and other non-supported types
    return context.get(httpServiceSendMode, dftValue);
}
from jena.
Separate issue and PR please!
With error handling.
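One possible shape for "with error handling", as a sketch only: it uses a hypothetical `Mode` enum standing in for QuerySendMode (with the mode names mentioned in this thread) and a made-up `resolve` helper, so it says nothing about how the eventual PR is written. The idea is to turn both an unknown name and an unsupported value type into an explicit, descriptive error instead of a bare ClassCastException.

```java
class SendModeResolver {
    // Hypothetical stand-in for QuerySendMode.
    enum Mode { asGetAlways, asGetWithLimitForm, asPostForm, asPost }

    static Mode resolve(Object value, Mode dft) {
        if (value == null)
            return dft;                       // setting absent: use the default
        if (value instanceof Mode)
            return (Mode) value;              // set programmatically
        if (value instanceof String) {        // set from a configuration file
            String name = ((String) value).trim();
            try {
                return Mode.valueOf(name);
            } catch (IllegalArgumentException ex) {
                throw new IllegalArgumentException("Unrecognized send mode: '" + name + "'", ex);
            }
        }
        throw new IllegalArgumentException(
                "Unsupported type for send mode: " + value.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(resolve("asGetWithLimitForm", Mode.asPost));
        System.out.println(resolve(null, Mode.asPost));
    }
}
```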
from jena.
We might as well put back the "charset=utf8". It didn't seem to cause problems.
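For illustration, this is roughly what a direct POST with an explicit charset looks like using the JDK's own java.net.http client. It is a sketch, not Jena's code: `SparqlPostSketch` and `build` are made-up names, and spelling the parameter as `charset=utf-8` is an assumption. The request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class SparqlPostSketch {
    // Build a SPARQL "direct POST": the query is the body, and the
    // Content-Type states which charset the body bytes are in.
    static HttpRequest build(String endpoint, String query) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/sparql-query; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(query, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build("https://query.wikidata.org/sparql",
                "SELECT ?x { BIND('Cura\u00E7ao' As ?x) }");
        System.out.println(req.method() + " " + req.uri());
        System.out.println(req.headers().map());
    }
}
```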
I noticed another problem - GET and POST+form are not %-encoding characters outside printable ASCII.
Everything works, including Wikidata, but strictly it is wrong.
A fix is quite easy - PR #1269.
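As a point of comparison, here is how a strictly encoded form body looks when built with the JDK's URLEncoder. This is only a sketch of the expected wire format, not the code in the PR; `FormEncodingSketch` and `formBody` are hypothetical names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FormEncodingSketch {
    // application/x-www-form-urlencoded body: non-ASCII characters are
    // converted to UTF-8 bytes and each byte is written as %XX.
    static String formBody(String query) {
        return "query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ç (U+00E7) is C3 A7 in UTF-8, so it is sent as %C3%A7.
        System.out.println(formBody("Cura\u00E7ao"));
    }
}
```

With this encoding the bytes on the wire are pure ASCII, so the result no longer depends on which charset the server assumes for the request.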
from jena.