za-grad / otvoreni-akti

A better search engine for the City of Zagreb decrees

Home Page: https://akti.za-grad.com

License: GNU General Public License v3.0

Python 4.99% HTML 94.73% CSS 0.27% Shell 0.01%
opendata django elasticsearch postgres croatia

otvoreni-akti's People

Contributors

dependabot[bot], metakermit, mzugcic, ranajaydas, zdeslav

otvoreni-akti's Issues

Search index to be automatically rebuilt whenever documents.py is changed

Problem:
The Heroku app recently returned HTTP 500 errors on every search, because I changed documents.py during the hackathon (this file decides how the acts are indexed and is closely linked to Elasticsearch). Every time this file is updated, we should rebuild the index by running python manage.py search_index --rebuild.

Solution:
The search index must be rebuilt automatically, either on a regular schedule or on every deploy. The exact approach is still to be evaluated.
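
One option to evaluate, as a rough sketch only: record a hash of documents.py and run the rebuild command from the issue whenever the hash changes. The stamp file location and the function names are assumptions for illustration. On Heroku's ephemeral filesystem the digest would have to be stored somewhere persistent, and the rebuild command may need a non-interactive option when run unattended.

```python
import hashlib
import subprocess
from pathlib import Path

# Command taken from the issue text; everything else here is illustrative.
REBUILD_CMD = ["python", "manage.py", "search_index", "--rebuild"]


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def rebuild_if_changed(documents_path, stamp_path, run=subprocess.run):
    """Rebuild the index when documents.py differs from the recorded digest.

    Returns True if a rebuild was triggered, False otherwise.
    """
    stamp = Path(stamp_path)  # hypothetical file holding the last digest
    current = file_digest(documents_path)
    previous = stamp.read_text().strip() if stamp.exists() else None
    if current == previous:
        return False
    # check=True raises if the command fails, so the stamp is only
    # written after a successful rebuild.
    run(REBUILD_CMD, check=True)
    stamp.write_text(current)
    return True
```

This could be called from a deploy hook or a scheduled job. The `run` parameter exists only so the command can be replaced in tests.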

Add celery tasks for automated scraping

Problem:

  1. Currently, the scraper needs to be started manually from the Django shell.

Solution:

  1. Add 2 Celery tasks:
    1.1 A task that rescrapes the last 10 periods of acts every night at 3 am, Croatia time.
    1.2 A task that does a full scrape every night at 4 am, Croatia time.
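
To make "3 am / 4 am Croatia time" concrete, here is a small standard-library sketch of the timezone arithmetic. The job names are hypothetical. With Celery itself this would normally be expressed in the beat schedule (crontab entries with the app timezone set to Europe/Zagreb), so treat this only as an illustration of when each job should fire.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

# Hypothetical job names; the issue only fixes the times (03:00 and 04:00).
NIGHTLY_JOBS = {
    "rescrape_last_10_periods": time(3, 0),
    "full_scrape": time(4, 0),
}


def next_run(job, now, tz=None):
    """Return the next run of `job` as an aware datetime in `tz`.

    `now` should be a timezone-aware datetime; `tz` defaults to Croatia time.
    """
    tz = tz or ZoneInfo("Europe/Zagreb")
    local_now = now.astimezone(tz)
    run_at = NIGHTLY_JOBS[job]
    candidate = local_now.replace(
        hour=run_at.hour, minute=run_at.minute, second=0, microsecond=0
    )
    # If today's slot has already passed, schedule for tomorrow night.
    if candidate <= local_now:
        candidate += timedelta(days=1)
    return candidate
```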

Have an option to show archived search results

Problem:

  1. Otvoreni-akti relies on the Grad Zagreb website being online and reachable to show search results. If the mayor decides to take down the website, there should be an archive of the existing acts.

Possible solutions:

  1. Provide a link next to the app to show an archived version of the act (plain-text).
  2. Add a TextField to the Act model to store the full HTML of the act (this solution is cleaner but not preferred, as it will require a complete rescrape).

Other thoughts:
Not sure if this will be a credible source of information, because the otvoreni-akti database is managed by a 3rd party (us) and there is no guarantee that the data has not been tampered with.

Search ordering by relevance / date etc

Problem:
There are no options for sorting of search results.

Solution:

  1. Add options to sort the results by date (ascending and descending)
  2. Add option to sort the results by relevance (this is the current default)
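
A possible shape for this, sketched under assumptions: the option names and the "date" field are hypothetical, and the returned value is a plain Elasticsearch sort clause that would still need to be wired into the existing search view.

```python
# Hypothetical mapping from a `sort` query parameter to an Elasticsearch
# sort clause. The field name "date" is an assumption about the index.
SORT_OPTIONS = {
    "relevance": ["_score"],
    "date_asc": [{"date": {"order": "asc"}}, "_score"],
    "date_desc": [{"date": {"order": "desc"}}, "_score"],
}


def sort_clause(option=None):
    """Return the sort clause for `option`, defaulting to relevance."""
    return SORT_OPTIONS.get(option, SORT_OPTIONS["relevance"])
```

Unknown or missing options fall back to relevance, which keeps the current default behaviour. The date orderings use "_score" as a tie-breaker.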

simplify the folder structure

Let's try and create a simpler folder structure based on something like this:

https://github.com/metakermit/hellodjangorest

Also, I think we could rename the project itself from skupstina to otvoreni-akti and keep the name skupstina just for the Django app. That way we could, in theory, later add another Django app that deals with some other open data (purely hypothetical, not saying we really will do this), for example data about companies in Zagreb and their public finances.

We could then have:

otvoreni-akti/ <- the root git repository

  • .gitignore

  • .env

  • manage.py

  • otvoreni-akti/ <- the Django root package

    • __init__.py
    • settings.py
    • wsgi.py
    • apps/
      • skupstina/

        • views.py
        • models.py

Repair styling issues

Issues as listed by @ranajaydas:

  1. It’d look nicer if the edges aligned with the main search bar (the period one is a bit to the left)
  2. In mobile view, the boxes shrink a bit so the height isn’t the same as the search box so it looks a bit strange
  3. We should add outline: none; to all the search boxes to remove the outlines that occur on selection
  4. The additional search info drop down would look better if it’s inside the purple header to maintain visual consistency

Improved document classification and filtering by document type

1)

We have several types of Acts:

i) Zaključak (en = Determination or Conclusion)
ii) Obrazloženje (en = Deliberation or Explanation)

Do we already have such tags in the database?

Each point about a specific topic on the list of acts for a given period can have multiple attachments of type (i) and/or (ii). Usually (i) is followed by (ii), but (i) can also be the only attachment on a point.

Example:

Topic: SOME TITLE
Acts: (list of attachments)

  • Zaključak
  • Obrazloženje
  • Zaključak
  • Obrazloženje

More info:

https://legal-dictionary.thefreedictionary.com/acts

act
1 the formally codified result of deliberation by a legislative body; a law, edict, decree, statute, etc. See ACT OF PARLIAMENT.
2 a formal written record of transactions, proceedings, etc., as of a society, committee or legislative body. --> Definition (2) is the one the Mayor’s acts fit into.

https://legal-dictionary.thefreedictionary.com/determination

Can we show a label / tag next to each title to indicate the type of document in which it was found?

2)

Can we add an advanced filter (in the GUI or via a keyword in advanced search) that restricts the search to specific file types? For example, Google search supports filetype:doc (https://support.google.com/webmasters/answer/35287?hl=en).
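
As a rough illustration of the keyword variant, the operator could be split off the raw query before searching. The function name is hypothetical, and how the extracted type is then applied as a filter depends on what the index stores about each attachment.

```python
import re

# Matches tokens such as "filetype:doc"; the operator name follows the
# Google example above, the rest is an assumption.
FILETYPE_RE = re.compile(r"\bfiletype:(\w+)", re.IGNORECASE)


def split_filetype(query):
    """Split a raw query into (search terms, file type or None)."""
    match = FILETYPE_RE.search(query)
    if match is None:
        return query.strip(), None
    terms = FILETYPE_RE.sub("", query, count=1)
    # Collapse whitespace left behind by the removed operator.
    return " ".join(terms.split()), match.group(1).lower()
```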

3)

As I have already shown, a single point on the list of acts can have multiple documents. Can we search by a keyword and have the search expand to display all documents linked to a matching point? For example, searching for “property” would list all documents attached to any point where at least one of the attachments contains that keyword. It could be an optional “advanced” search parameter. The reasoning is that even though the keyword is found in one specific document, the whole story is told by all the attachments together. This is just a thought and may need further discussion with the key users, but I thought it could make sense.

Add a description of the Otvoreni-akti website to the header/about section

Problem:
There is no explanation for what the app does.
"Add a 'user manual', maybe with some examples and general info (what these acts are and what kind of info they contain), so that a person who comes for the first time can get an idea of what they can find or search for."

Solution:
Add a description to the header or create an about section.

Create custom context manager for requests (to handle max retries etc)

We are using requests in a lot of places and this is prone to failure. Instead of having setup and teardown code for checking maximum retries, I suggest adding a context manager for it.

For example, replace:

import time

from requests import exceptions  # assumed source of `exceptions`

# `subject` and `parse_subject_details` come from the existing scraper code.
parse_complete = False
max_retries = 10  # total number of attempts
sleep_time = 10   # seconds to wait between attempts
print('Parsing ', subject['subject_url'])
while not parse_complete and max_retries > 0:
    try:
        subject_details = parse_subject_details(subject['subject_url'])
        parse_complete = True
    except exceptions.ConnectionError as e:
        max_retries -= 1
        print('Connection Error while parsing {}:\n{}\n'.format(subject['subject_url'], e))
        print('Retrying...\n')
        time.sleep(sleep_time)
if max_retries == 0:
    print('Maximum retries exceeded. Please run the scraper again.\n')
    raise exceptions.ConnectionError

with something like:

with request_with_retry(url):
    do_something
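
One caveat: a with block cannot re-run its own body, so a context manager alone cannot retry the failing call. A decorator gives the same clean call site, so the sketch below swaps that in. The name with_retry and its parameters are hypothetical. The defaults mirror the loop above (10 attempts, 10 seconds apart), and for requests the retry_on tuple would need to be requests.exceptions.ConnectionError, which is not a subclass of the built-in ConnectionError used here as a default.

```python
import time
from functools import wraps


def with_retry(max_retries=10, sleep_time=10, retry_on=(ConnectionError,)):
    """Retry the decorated function when it raises one of `retry_on`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on as error:
                    if attempt == max_retries:
                        raise  # out of attempts: surface the last error
                    print('Attempt {} failed: {}. Retrying...'.format(attempt, error))
                    time.sleep(sleep_time)
        return wrapper
    return decorator
```

Usage would then look like decorating parse_subject_details (or a thin wrapper around it) with @with_retry(...) and calling it as before.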

Add Pytest to CI/CD for Heroku to run a test before accepting any changes

Problem:
The Heroku app recently returned HTTP 500 errors on every search, because I changed documents.py during the hackathon (this file decides how the acts are indexed and is closely linked to Elasticsearch). Every time this file is updated, we should rebuild the index by running python manage.py search_index --rebuild.

Solution:
Upon running pytest manually on Heroku, I was able to trace the error because two of the view tests failed. Pytest should therefore be a part of our CI/CD pipeline.

Improve date filter on search page

Problem:

  1. The date picker changes from browser to browser and on Chrome it shows dd-----yyyy
  2. The date picker cannot be cleared once a date is entered on iOS.

Solution:

  1. Use a custom CSS/JS date picker

Advanced search & filtering

Would be cool to be able to search not just by terms, but also to filter by "from" and "to" dates.

Maybe something like:

[Image: Screenshot 2020-01-30 16 35 03]
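
On the backend side, the from/to filter could map onto an Elasticsearch range query. A minimal sketch, assuming the indexed date field is called "date" (the real field name may differ):

```python
from datetime import date


def date_range_filter(date_from=None, date_to=None, field="date"):
    """Build an Elasticsearch range filter, or None if no bound is given."""
    bounds = {}
    if date_from is not None:
        bounds["gte"] = date_from.isoformat()
    if date_to is not None:
        bounds["lte"] = date_to.isoformat()
    return {"range": {field: bounds}} if bounds else None
```

Either bound can be left out, so "from only" and "to only" searches work as well.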

Add exact term search

For example, if I search for:

"Zagreb Park"

it should only show results for that exact term
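
In Elasticsearch terms this is a match_phrase query. A small sketch of how a quoted search string could be routed to it, with the field name "content" being an assumption about the index:

```python
def text_query(raw, field="content"):
    """Return a match_phrase query for quoted input, else a plain match query."""
    stripped = raw.strip()
    # Only treat the input as a phrase if something sits between the quotes.
    if len(stripped) > 2 and stripped[0] == '"' and stripped[-1] == '"':
        return {"match_phrase": {field: stripped[1:-1]}}
    return {"match": {field: stripped}}
```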
