snow-fox-data / dss-thread

Dataiku Thread™ Data Catalog Plugin by Snow Fox Data

Home Page: https://www.snowfoxdata.com/thread-plugin

License: Other

HTML 0.93% JavaScript 28.09% CSS 52.67% Python 17.74% Makefile 0.56%
data-catalog data-science dataiku

dss-thread's People

Contributors

mc-clifford, rymoore, rymoore99, stevewithington

dss-thread's Issues

Last Scanned

Could you indicate the date of the last scan for each project?

Thread Not Returning Results in DSS 11.1 Instances

First off, dynamite work! This is incredibly well thought out. I will be giving a shout out to Snow Fox and Excelion in the Dataiku Sales Engineering Global Call.

Describe the bug
Thread works perfectly in version 10; in version 11.1, however, Thread returns zero results. This appears to be related to a deprecation in one of the API calls.


2022-11-03 04:19:56,711 INFO 127.0.0.1 - - [03/Nov/2022 04:19:56] "GET /dss-stats HTTP/1.1" 200 -
THREAD datasets do not exist yet
/opt/dataiku/dss_install/dataiku-dss-11.1.0/python/dataikuapi/dss/dataset.py:132: DeprecationWarning: Dataset.get_definition is deprecated, please use get_settings  
warnings.warn("Dataset.get_definition is deprecated, please use get_settings", DeprecationWarning)
/opt/dataiku/dss_install/dataiku-dss-11.1.0/python/dataikuapi/dss/dataset.py:144: DeprecationWarning: Dataset.set_definition is deprecated, please use get_settings  
warnings.warn("Dataset.set_definition is deprecated, please use get_settings", DeprecationWarning)
2022-11-03 04:20:00,312 INFO Initializing dataset writer for dataset THREAD.--Thread-Datasets--
2022-11-03 04:20:00,312 INFO Initializing write session
2022-11-03 04:20:00,336 INFO Starting RemoteStreamWriter
2022-11-03 04:20:00,338 INFO Initializing write data stream (sZ7gv1aINY)
2022-11-03 04:20:00,339 INFO Remote Stream Writer closed
2022-11-03 04:20:00,341 INFO Remote Stream Writer: start generate
2022-11-03 04:20:00,341 INFO Waiting for data to send ...
2022-11-03 04:20:00,341 INFO Got end mark, ending send
0 rows successfully written (sZ7gv1aINY)
2022-11-03 04:20:00,552 INFO Initializing dataset writer for dataset THREAD.--Thread-Index--
2022-11-03 04:20:00,552 INFO Initializing write session
2022-11-03 04:20:00,577 INFO Starting RemoteStreamWriter
2022-11-03 04:20:00,580 INFO Initializing write data stream (B4Qq39Og86)
2022-11-03 04:20:00,581 INFO Remote Stream Writer closed
2022-11-03 04:20:00,583 INFO Remote Stream Writer: start generate
2022-11-03 04:20:00,583 INFO Waiting for data to send ...
2022-11-03 04:20:00,583 INFO Got end mark, ending send
0 rows successfully written (B4Qq39Og86)
2022-11-03 04:20:00,849 INFO Initializing dataset writer for dataset THREAD.--Thread-Column-Mapping--
2022-11-03 04:20:00,850 INFO Initializing write session
2022-11-03 04:20:00,884 INFO Starting RemoteStreamWriter
2022-11-03 04:20:00,888 INFO Initializing write data stream (nFw6UApJBf)
2022-11-03 04:20:00,891 INFO Remote Stream Writer: start generate
2022-11-03 04:20:00,891 INFO Waiting for data to send ...
2022-11-03 04:20:00,891 INFO Remote Stream Writer closed
2022-11-03 04:20:00,892 INFO Got end mark, ending send
0 rows successfully written (nFw6UApJBf)

To Reproduce

  1. Create a new project in an 11.1 instance
  2. Add the Visual Webapp Thread
  3. Begin Scanning the DSS Instance
  4. The above will show in the log.
  5. After a minute or two, the modal shown in the screenshot below appears, reporting "disabled"

Expected behavior
Thread would perform as normal.

Screenshots
At some point after starting the DSS scan.
[screenshot]

Additional context
  • Browsers tried: Firefox and Chrome
  • DSS version 11.1
  • Attempted testing on new and existing DSS instances.

Happy to do any testing as needed.

Unable to create or edit definition if field `Name` contains a hashtag (#)

Describe the bug
Hashtag (#) in field name causes error when attempting to edit definition.

To Reproduce
Steps to reproduce the behavior:

  1. Go to a dataset containing fields with hashtags in the Name.
  2. Click on the field name to edit

Expected behavior

Creating or editing a definition for a field containing a hashtag should not cause an error.

Screenshots
[screenshot: thread-error-01]

[screenshot: thread-error-02]

Additional context

  • Browser: Chrome
  • DSS version: 11.0
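
The report does not identify the cause. One common source of this kind of failure is an unescaped `#` in a URL, where it starts the fragment and truncates everything after it. The sketch below illustrates that effect and the usual remedy, percent-encoding the name before placing it in a path. The host, the route, and the helper name `encode_path_segment` are hypothetical and are not taken from the plugin.

```python
from urllib.parse import quote, unquote, urlsplit


def encode_path_segment(name):
    """Percent-encode a value so it is safe as a single URL path segment."""
    return quote(name, safe="")


column = "item#id"
base = "https://dss.example.com/thread/column/"  # hypothetical route

raw_url = base + column
safe_url = base + encode_path_segment(column)

# Unescaped: everything after '#' is parsed as the fragment.
print(urlsplit(raw_url).path, urlsplit(raw_url).fragment)

# Escaped: the full name survives and can be decoded on the server side.
print(urlsplit(safe_url).path)
print(unquote(urlsplit(safe_url).path))
```

If the name travels in a query string rather than a path, the same encoding applies to the parameter value.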

Failed to post events to event server

Describe the bug
Our entire catalog somehow got wiped out.

Now when trying to re-scan the following message pops up on the Thread Public web interface with only an "OK" button:

dss.my-corp.com says:
'to'

The logs show the following over and over again:

[2022/10/12-21:19:27.347] [Thread-350] [WARN] [dip.http.utils]  - Got response code 502
[2022/10/12-21:19:27.347] [Thread-350] [WARN] [dku.auditmechanism.eventserver]  - Failed to post events to event server
java.io.IOException: Unknown error on command (HTTP code:502):<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>

	at com.dataiku.dip.util.HTTPClientUtils.handleJSONResp(HTTPClientUtils.java:226)
	at com.dataiku.common.rpc.InternalAPIClient.handleJSONResp(InternalAPIClient.java:866)
	at com.dataiku.common.rpc.APIKeyAuthAPIClient.handleJSONResp(APIKeyAuthAPIClient.java:96)
	at com.dataiku.common.rpc.InternalAPIClient.postObject(InternalAPIClient.java:310)
	at com.dataiku.dip.security.audit.targets.EventServerTarget$QueueSender.run(EventServerTarget.java:167)
[2022/10/12-21:19:27.457] [qtp722417467-376348] [DEBUG] [dku.tracing]  - [ct: 0] Start call: /api/futures/get-update [GET] user=user.one [futureId=VOvlVVLf]
[2022/10/12-21:19:27.457] [qtp722417467-376348] [DEBUG] [dku.tracing]  - [ct: 0] Done call: /api/futures/get-update [GET] time=0ms user=user.one [futureId=VOvlVVLf]

To Reproduce
Steps to reproduce the behavior:

  1. re-scan DSS

Expected behavior
The catalog gets rebuilt

Additional context
  • Chrome
  • DSS version 11.0.1
  • Thread™ version 1.1.3

Add an `exclude_tags` Project Variable

Is your feature request related to a problem? Please describe.
I can't explicitly exclude datasets from getting scanned.

Describe the solution you'd like
I see we can use limit_to_tags to limit the scan to only datasets with a particular tag.
I would like an exclude_tags entry added to the optional Project Variables so that I can explicitly exclude datasets from being scanned.

Describe alternatives you've considered
Tag all datasets I want scanned with a tag and put that tag in the limit_to_tags Project Variable.
This is unnecessarily tedious when all I want to do is exclude a small number of datasets from a very large number of datasets.
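
A minimal sketch of how the requested behavior could combine with limit_to_tags. The function name `should_scan`, the precedence (exclusion wins over inclusion), and the "any matching tag" semantics are assumptions for illustration, not the plugin's actual logic.

```python
def should_scan(dataset_tags, limit_to_tags=None, exclude_tags=None):
    """Decide whether a dataset is scanned, given its tags.

    Assumed rules: a dataset carrying any excluded tag is skipped;
    otherwise, if limit_to_tags is set, the dataset needs at least
    one of those tags; with neither variable set, everything is scanned.
    """
    tags = set(dataset_tags)
    if exclude_tags and tags & set(exclude_tags):
        return False
    if limit_to_tags:
        return bool(tags & set(limit_to_tags))
    return True


# Illustrative datasets and tags (made up for this example).
datasets = {
    "orders": ["finance"],
    "scratch": ["tmp"],
    "customers": [],
}

scanned = [name for name, tags in datasets.items()
           if should_scan(tags, exclude_tags=["tmp"])]
print(scanned)
```

With this shape, excluding a handful of datasets needs only one tag on those datasets, rather than a tag on every dataset that should be kept.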

Corrupted row in __Thread_Index__

When scanning our DSS instance, the indexing always stops too early and does not complete the entire scan. Some projects are not in the Thread_Index.

The screenshot below shows a subset of a dataframe created directly from Thread_Index. In the last row, the "key" and "last_modified" values appear to have been shifted left by one column, leaving a "NaN" (null) value in the actual "last_modified" column.

The corrupted row also appears to break Thread's search functionality, so the only way to reach nearly any project is to put the Project Key directly in the URL. See the error message at the bottom of this post.

[screenshot]

[2022-05-18 13:48:19,339] [27/MainThread] [ERROR] [dataiku.webapps.backend] Exception on /search [GET]
Traceback (most recent call last):
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/flask/app.py", line 2077, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/flask/app.py", line 1525, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "<string>", line 155, in search
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/pandas/core/frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/pandas/core/frame.py", line 2709, in _getitem_array
    if com.is_bool_indexer(key):
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/pandas/core/common.py", line 107, in is_bool_indexer
    raise ValueError('cannot index with vector containing '
ValueError: cannot index with vector containing NA / NaN values
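
The traceback ends with pandas refusing a boolean mask that contains NaN, which is consistent with a missing value in a searched column. As a rough aid for locating such rows, the sketch below flags index rows where a required column is missing. It works on plain dictionaries rather than a dataframe so that it runs without pandas. The helper names and sample values are made up, and only the "key" and "last_modified" column names come from the report above.

```python
import math


def is_missing(value):
    """True for None or a float NaN, the usual 'null' markers in exported rows."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def find_incomplete_rows(rows, required_columns):
    """Return the positions of rows with a missing value in any required column."""
    return [
        i for i, row in enumerate(rows)
        if any(is_missing(row.get(col)) for col in required_columns)
    ]


# Illustrative rows: the second one mimics the shifted row from the report.
rows = [
    {"key": "SALES|orders", "last_modified": "2022-05-17"},
    {"key": "2022-05-18", "last_modified": float("nan")},
]

print(find_incomplete_rows(rows, ["key", "last_modified"]))
```

In pandas itself, string matching helpers such as `Series.str.contains` accept `na=False` so that missing values count as non-matches instead of propagating NaN into the mask. Whether the search code at line 155 uses that call is not shown in the traceback.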

Unable to add or edit definitions

I am unable to add/edit definitions in the Thread app.
I am an owner of the Thread project within DSS. All I see is a blank screen in place of the definition window, although I am able to view the DSS and Lineage tabs.
Earlier, a project member was unable to add or edit definitions; now the project owner is unable to do so as well.

a3802d80-fae6-45e4-837c-a4692d743ae1

Column Renaming

It is unclear what happens to definitions that were applied to columns that no longer exist, for example after a column is renamed.

Unable to add or edit definitions

I am unable to add/edit definitions in the Thread app.
I am a member of the Thread project within DSS. All I see is a blank screen in place of the definition window, although I am able to view the DSS and Lineage tabs.
[screenshot]

Implement the ability to share Thread catalog between different Dataiku instances

Is your feature request related to a problem? Please describe.
Our organization has multiple Dataiku design nodes (instances). Having a catalog that is consistent and up to date across the instances is important.

Describe the solution you'd like
We would like to be able to share one catalog between the different instances.

Describe alternatives you've considered
One alternative, discussed with @rymoore99, is to export the catalog as a Dataiku project and import it into the other instances. I was not aware of this feature, and it may be useful for migrating instances, but it only partially solves the problem of keeping a single consistent catalog: achieving that on a daily basis would require a lot of manual work and discipline.

Lost "applied to"

When an existing definition is applied to new columns, the columns it was already applied to do not keep the definition.
