ibm-watson-data-lab / simple-search-service
A faceted search engine and content API.
Sometimes it's useful to specify a Cloudant URL without binding to a Cloudant service directly (e.g. if you want to use a proxy).
This should be possible by specifying a URL using SSS_CLOUDANT_URL, but SSS always looks for a Cloudant service in VCAP_SERVICES if it's present, preventing SSS_CLOUDANT_URL from being used in Bluemix deployments.
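One way the requested precedence could look is sketched below. This is not the SSS implementation: the helper name resolveCloudantUrl is hypothetical, and it assumes the bound service appears under the cloudantNoSQLDB key of VCAP_SERVICES with a credentials.url field.

```javascript
// Hypothetical helper: prefer an explicit SSS_CLOUDANT_URL over a bound service.
// Assumes VCAP_SERVICES lists Cloudant under the "cloudantNoSQLDB" key.
function resolveCloudantUrl(env) {
  // An explicitly configured URL always wins, even on Bluemix.
  if (env.SSS_CLOUDANT_URL) {
    return env.SSS_CLOUDANT_URL;
  }
  if (env.VCAP_SERVICES) {
    let services;
    try {
      services = JSON.parse(env.VCAP_SERVICES);
    } catch (e) {
      return null; // malformed VCAP_SERVICES, nothing usable
    }
    const bound = services.cloudantNoSQLDB;
    if (Array.isArray(bound) && bound.length > 0 && bound[0].credentials) {
      return bound[0].credentials.url || null;
    }
  }
  return null; // caller decides how to handle a missing configuration
}
```

Calling resolveCloudantUrl(process.env) at startup would then let a proxy URL take effect even when a Cloudant service is bound.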
Raised by @mikebroberg and transferred from another repo:
Whenever I try to facet on a number value, I can't render any search results. This happened both with CSVs and TSVs.
With this, we can expose to users access to their Bluemix, Cloudant (and eventually, Redis) dashboards.
Use jshint and jscs tasks to verify that code changes are clean.
Using local config for Cloudant
ERR: Invalid or unexpected value passed to Simple Service Registration module
body-parser deprecated bodyParser: use individual json/urlencoded middlewares app.js:67:40
server starting on http://localhost:6001
(node:6521) DeprecationWarning: Using Buffer without `new` will soon stop working. Use `new Buffer()`, or preferably `Buffer.from()`, `Buffer.allocUnsafe()` or `Buffer.alloc()` instead.
_http_server.js:193
throw new RangeError(`Invalid status code: ${statusCode}`);
The Shared plan is no longer available. D2BM will fail without this change.
Please refer to the following document for details and migration instructions https://github.com/IBM-Bluemix/cf-deployment-tracker-client-node/wiki/migrating-to-the-new-metrics-tracker-client.
After data has been uploaded there's a continue button on the Get Data, Upload Data and Create Index pages. There's no help text on any of these pages outlining what one would have to do to load another data set. One basically has to open every SSS page to find the answer (Settings > Delete Data).
Trying to insert/update rows using the /rows endpoints, SSS returned
"statusCode":404,"body":"{\"error\":\"COL1 is not a valid parameter,COL2 is not a valid parameter,COL3 is not a valid parameter,COL4 is not a valid parameter\",\"reason\":\"Validation failure\"
Debugging the issue I noticed that an error occurred loading the schema. However, that error is not propagated and an empty default schema is returned ...
    seamsdb.get("schema", function(err, data) {
      if (err) {
        return callback(null, defaultSchema);
      }
      callback(err, data);
    });
... causing validation failures because COL1, COL2 etc are unknown.
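A possible fix is sketched below: fall back to the default schema only when the schema document does not exist, and propagate every other error. The function name loadSchema, the shape of defaultSchema, and the assumption that the database client reports a missing document with err.statusCode === 404 are all illustrative, not taken from the SSS code.

```javascript
// Hypothetical default, used only when no schema document has been saved yet.
const defaultSchema = { fields: [] };

// db is assumed to expose get(id, callback) like the seamsdb object above.
function loadSchema(db, callback) {
  db.get("schema", function (err, data) {
    if (err) {
      // Assumption: a missing document is reported with statusCode 404.
      if (err.statusCode === 404) {
        return callback(null, defaultSchema);
      }
      // Anything else (network failure, auth error) is a real error.
      return callback(err);
    }
    callback(null, data);
  });
}
```

With this shape, the /rows handlers would see the load failure directly instead of validating against an empty schema.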
It appears that if a data value is null, an empty string is displayed in the rows preview. I loaded the movies sample data set and previewed the search. The first 20 rows did not display any data under the rating and earnings_rank headings, giving the appearance that there's a bug. Only after inspecting the raw data did I realize that the sample rows shown contained null values. It might be good if a placeholder string could be displayed instead of the empty string to avoid confusion.
For larger datasets, uploading can be a pain. I spent 2-3 hours uploading an 800MB file over a poor connection outside of Berlin. I realize my use case is narrow, but I think these proposed enhancements could help others:
In my case, I wound up setting up a remote Ubuntu desktop box, uploading a gzipped file (91MB) to it, and then uncompressing it there to upload through the SSS GUI.
Good luck and thanks for everyone's hard work so far!
P.S. I realized that @bradnoble and I used to work together back at Mullen in 2004 or 2005. I was a 23yo account guy back then and would be surprised if you remembered me. Glad to see you're doing well. :)
Should follow the same pattern as Create Index and Preview Search.
Raised by @mikebroberg from another repo:
When you write data to file from MySQL, it uses a shorthand for NULL values in the resulting CSV or TSV: \N See https://dev.mysql.com/doc/refman/5.0/en/null-values.html for more.
It would be cool if SEaMS could support this. I guess you wouldn't want to have the option to import as JavaScript's null datatype, but maybe have it in there as a string? Dunno.
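A minimal sketch of what that support might look like when splitting a TSV line is shown below. The function name parseTsvLine and the nullAs parameter are hypothetical. The caller chooses whether \N becomes JavaScript null or a string such as "NULL", since the issue leaves that open.

```javascript
// Hypothetical helper: split one TSV line and map MySQL's \N marker
// to whatever value the caller wants to store for NULL.
function parseTsvLine(line, nullAs) {
  return line.split("\t").map(function (cell) {
    // "\\N" is the two-character sequence backslash + N written by MySQL.
    return cell === "\\N" ? nullAs : cell;
  });
}

// Store NULLs as a string:
//   parseTsvLine("glynn\t\\N\tbird", "NULL")
// Store NULLs as JavaScript null:
//   parseTsvLine("glynn\t\\N\tbird", null)
```

This naive split ignores quoting and escaped tabs, so it only illustrates the NULL mapping, not a complete parser.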
So imagine we have data like this
firstname,lastname,home address
glynn,bird,10 front street
The firstname and lastname fields are ok, but the home address field causes us problems.
We try and fudge it in the front end (https://github.com/ibm-cds-labs/simple-search-service/blob/master/public/js/seams.js#L153) to replace spaces with underscores. But the back end needs to do the same because Cloudant doesn't like names of indexes to have spaces in either.
So I tried this change at this line https://github.com/ibm-cds-labs/simple-search-service/blob/master/lib/schema.js#L65:
    for (var i in schema.fields) {
      var f = schema.fields[i];
      var nicename = f.name.replace(/ /g,"_");
      if (f.name != "_id") {
        func += ' indy("' + nicename + '", doc["'+ f.name + '"], ' + f.facet + ');\n';
      }
    }
to do the same fudge at the back end. But this only fudges the index name.
So then I tried removing the front-end fudge so that it leaves the field name with a space in it, but then I get JS errors at the front-end:
jquery.js:1496 Uncaught Error: Syntax error, unrecognized expression: select[name=Business Unit]
because selectors with spaces in freak out too.
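Two small helpers illustrate the pieces involved; both names are hypothetical and neither is from the SSS code. The first applies the same space-to-underscore fudge that the front end and back end would need to share. The second shows that a field name containing spaces can still be used in a selector if the attribute value is quoted, which is what the unquoted select[name=Business Unit] expression lacks.

```javascript
// Hypothetical shared helper: the same fudge for index names on both sides.
function safeFieldName(name) {
  return name.replace(/ /g, "_");
}

// Hypothetical helper: build an attribute selector with a quoted value,
// escaping backslashes and double quotes inside the name.
function nameSelector(tag, name) {
  const escaped = name.replace(/(["\\])/g, "\\$1");
  return tag + '[name="' + escaped + '"]';
}

// safeFieldName("home address")           gives "home_address"
// nameSelector("select", "Business Unit") gives 'select[name="Business Unit"]'
```

Quoting the selector only removes the front-end error; the field name and the index name would still need one agreed mapping, such as safeFieldName, applied consistently.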
In summary
I just completed the Simple Search Service tutorial and it was actually pretty easy - I did it front to end in under 10 minutes. The only wall I ran into was when downloading the sample movie data set from github - the link took me straight to that movie TSV file code rather than the master open data folder where I could actually have the option to download it. I understand why you did that - so that you can see exactly which data you're supposed to use and upload to the service - but it's not made very clear in the directions that you then need to go to the master directory and download the whole zip. Unless there is a way around this, which I didn't see.
If an environment variable LOCKDOWN is present and set to "true", then UI should not allow data to be deleted or added. SSS would just behave as a search API.
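A rough sketch of how such a switch could be enforced on the server side follows. It uses a Connect/Express-style middleware signature. The names isLockedDown and lockdownGuard are hypothetical, and treating every non-GET request as a data change is an assumption made for illustration, not a description of SSS routes.

```javascript
// Hypothetical check: lockdown is on only when LOCKDOWN is exactly "true".
function isLockedDown(env) {
  return env.LOCKDOWN === "true";
}

// Hypothetical middleware factory with the usual (req, res, next) signature.
// Assumption: anything other than GET modifies data and must be refused.
function lockdownGuard(env) {
  return function (req, res, next) {
    if (isLockedDown(env) && req.method !== "GET") {
      res.statusCode = 403;
      return res.end("Data changes are disabled in lockdown mode");
    }
    next();
  };
}
```

The UI could call the same isLockedDown check to hide the upload and delete controls, so the service behaves as a search API only.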
this is linked to from the README but 404s https://github.com/ibm-cds-labs/simple-search-service/blob/master/API%20Reference.md
When a user is uploading data, we show them the preview of the data that's about to get indexed, but we don't allow them to "go back" and upload again. Poses a problem when, for example, a user is trying to upload/index a CSV with a poorly formatted header row.
Requires a hard refresh of the app to resolve.
Currently text/plain is returned.
If a user tries to upload a CSV or TSV that doesn't include a header row, we should alert them. (Without that header row, we can't present a coherent "Create Index" screen, among other issues.)
The search index that gets created uses the standard analyzer. This standard analyzer has the following stop words: "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
Stop words do not get indexed, so when trying to perform a faceted search with one of these words, no results are returned and there are no messages (error or warning). As a result, the user may think the search failed or that there is an issue with the search service.
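One way to surface this to the user is to check the query against the stop word list before searching and show a warning. The sketch below uses the list quoted above; the function name findStopWords is hypothetical and the whitespace tokenization is a simplification of what the analyzer actually does.

```javascript
// Stop words of the standard analyzer, as listed in this issue.
const STOP_WORDS = new Set([
  "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
  "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
  "the", "their", "then", "there", "these", "they", "this", "to", "was",
  "will", "with"
]);

// Hypothetical helper: return the words in a query that will not be indexed.
function findStopWords(query) {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter(function (word) {
      return STOP_WORDS.has(word);
    });
}

// A non-empty result could trigger a warning such as
// "these words are ignored by the search index".
```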
In certain scenarios it would be desirable to have the ability to programmatically remove all data from the index and define a custom schema. The existing /import API endpoint requires a prior data upload.
Raised by @mikebroberg (transferred from another repo)
I had values in a TSV file that I wanted to upload to SEaMS as type "arrayofstrings". I edited my TSV in Excel to do some stupid sorting and didn't realize that Excel had wrapped some of the values in double-quotes.
When attempting to upload the TSV with double-quotes to SEaMS, the first step, "1 of 3 - Upload data," would read 100 percent, but the UI would not go on to the second step, "2 of 3 - Schema." Trying to return to the main page for my SEaMS app (http://seams-broberg-152.mybluemix.net/admin/home) would fail. That's because the app crashed in Bluemix.
I didn't check this functionality in a CSV because the TSV better handled my commas, but if there is some scenario where this would happen with a CSV, Brad suggested checking that out too. Worth fixing for the TSV though, if people have a field they want to upload as an array so they can facet on its elements. Thanks!
We still need to be able to facet on array_of_strings facets. It's just that the Content API (api.html) view doesn't present content correctly when the “automagical content discovery” happens on array_of_strings facets.
This is the comment.
Super weird, but if you go to my demo app using the optional Redis service at http://seams-broberg-1040.mybluemix.net/admin/search you'll see all the nice facets. Try searching for genre:'C' and it works beautifully. So do all the other elements except for genre:'A'
Pretty sure this is related to the Redis cache option, because this was working perfectly yesterday with the default IBM cache. Thanks.
Dataset at https://www.dropbox.com/s/4fp7v7ikndtbc1p/movieSEaMS_001.tsv?dl=0
... it shouldn't have to be set on API calls.
So, the app needs to change, and requests out to /search should no longer be appended with the limit query param.
We have several instances of the SSS deployed. Each application operates on its own Cloudant instance. It would be nice if all instances could use a dedicated repository database in a single instance to reduce the number of Bluemix resources.
I'm not sure how you want to phrase it, but without finding and parsing the code, I didn't realize I had to exactly match the name "Redis by Compose" in my Bluemix deployment since Bluemix usually appends something like "-xy" to the name when binding a service.
Ensure cache is cleared when the "Delete" button is pressed in the admin.
The following text is displayed:
Example with Dynamic Content
Here, in this example, let's pretend the content in this content area is all about when the is/are . Good news, you indexed your data with as a facet, so we can use the Simple Search Service to retrieve exactly the results you want, and then use Javascript to write the JSON returned by the API into the page.
We shouldn't let users pass by the "Create index" screen without selecting at least one facet. They can decide later to not use the facets, but this demo is about the power of faceting -- everything that comes after that step depends on it.