za-grad / otvoreni-akti
A better search engine for the City of Zagreb decrees
Home Page: https://akti.za-grad.com
License: GNU General Public License v3.0
Problem:
The Heroku app recently returned HTTP 500 (server error) responses on every search, because I changed documents.py during the hackathon (this file defines how the acts are indexed and is closely tied to Elasticsearch). Every time this file is updated, the index must be rebuilt by running python manage.py search_index --rebuild.
Solution:
The search index must be rebuilt somehow, either on a regular schedule or on every deploy. The exact solution is still to be evaluated.
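A minimal sketch of the "on every deploy" option, under assumptions: record a hash of documents.py after each rebuild and compare it on the next deploy, so the rebuild only runs when the file has actually changed. The function names and the stamp file are hypothetical, not part of the project.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def index_needs_rebuild(documents_path, stamp_path):
    """Hypothetical check: True if documents.py changed since the last recorded rebuild.

    The stamp file holds the digest of documents.py as it was at the last rebuild.
    A missing stamp file is treated as "changed".
    """
    stamp = Path(stamp_path)
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != file_digest(documents_path)


def record_rebuild(documents_path, stamp_path):
    """Store the current digest of documents.py after a successful rebuild."""
    Path(stamp_path).write_text(file_digest(documents_path))
```

A deploy script could call index_needs_rebuild() and, if it returns True, run python manage.py search_index --rebuild followed by record_rebuild(). The Heroku dyno filesystem is ephemeral, so in practice the digest would need to live somewhere persistent, such as the database.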
Problem:
The search for common terms like "Zagreb" and "točka" (en = point or item) is very slow.
Possible solutions:
Obey the testing goat...BAAAAAAAAAAAAAA
Funny video of Harry Percival demoing TDD for Django:
https://www.youtube.com/watch?v=X9474CgJleg
Harry's book, Test-Driven Development with Python:
https://www.obeythetestinggoat.com/pages/book.html
Other thoughts:
I'm not sure this can be considered a credible source of information, because the otvoreni-akti database is managed by a third party (us) and there is no guarantee that the data has not been tampered with.
Problem:
There are no options for sorting of search results.
Solution:
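One possible approach, sketched in plain Python with hypothetical `score` and `date` fields: offer a few named sort options and order the hits by the chosen one. In the real app the sorting would more likely be delegated to Elasticsearch (for example with elasticsearch-dsl's `Search.sort()`), so treat this only as an illustration of the options.

```python
from datetime import date

# Hypothetical sort options exposed in the UI: name -> (key function, descending?)
SORT_OPTIONS = {
    "relevance": (lambda hit: hit["score"], True),
    "newest": (lambda hit: hit["date"], True),
    "oldest": (lambda hit: hit["date"], False),
}


def sort_results(hits, order="relevance"):
    """Return hits sorted by the chosen option; unknown options fall back to relevance."""
    key, reverse = SORT_OPTIONS.get(order, SORT_OPTIONS["relevance"])
    return sorted(hits, key=key, reverse=reverse)


hits = [
    {"title": "A", "score": 1.2, "date": date(2018, 5, 1)},
    {"title": "B", "score": 3.4, "date": date(2017, 1, 9)},
]
print([h["title"] for h in sort_results(hits, "newest")])  # ['A', 'B']
```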
As stated in the title, it just gets in the way on mobile specifically.
Let's try to create a simpler folder structure, based on something like this:
https://github.com/metakermit/hellodjangorest
Also, I think we could rename the project itself from skupstina to otvoreni-akti and keep the name skupstina just for the Django app. That way, in theory, we could later add another Django app that deals with some other open data (purely hypothetically, not saying we will actually do this), for example data about companies in Zagreb and their public finances.
We could then have:
otvoreni-akti/          <- the root git repository
    .gitignore
    .env
    manage.py
    otvoreni_akti/      <- the Django root package (underscore, since Python package names cannot contain hyphens)
    skupstina/          <- the Django app
Issues as listed by @ranajaydas:
We have several types of Acts:
i) Zaključak (en = Determination or Conclusion)
ii) Obrazloženje (en = Deliberation or Explanation)
Do we already have such tags in the database?
Each point about a specific topic, on a list of acts for a specific fixed time period, can have multiple attachments of type (a) and/or (b). Usually A i is followed by B, but A can also be the only attachment of a point.
Example:
Topic: SOME TITLE
Acts: (list of attachments)
More info:
https://legal-dictionary.thefreedictionary.com/acts
act
1 the formally codified result of deliberation by a legislative body; a law, edict, decree, statute, etc. See ACT OF PARLIAMENT.
2 a formal written record of transactions, proceedings, etc., as of a society, committee or legislative body. --> Definition (2) is the one that fits the Mayor's acts.
https://legal-dictionary.thefreedictionary.com/determination
Can we show a label/tag next to each title indicating the type of act in which the result was found?
Can we add an advanced filter (in the GUI, or via a keyword in advanced search) that restricts the search to specific file types? For example, Google Search supports filetype:doc
(https://support.google.com/webmasters/answer/35287?hl=en).
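A minimal sketch of the keyword variant, assuming the index stores each document's file extension in some field the filter could use (an assumption, not confirmed here): pull `filetype:` tokens out of the raw query and pass the rest on as ordinary search text. The function name is hypothetical.

```python
import re

# Matches tokens like "filetype:doc" or "filetype:PDF" anywhere in the query
FILETYPE_RE = re.compile(r"\bfiletype:(\w+)", re.IGNORECASE)


def split_filetype(query):
    """Split a raw query into (search_text, list_of_lowercase_extensions)."""
    filetypes = [ext.lower() for ext in FILETYPE_RE.findall(query)]
    # Remove the filetype tokens and collapse the leftover whitespace
    text = " ".join(FILETYPE_RE.sub(" ", query).split())
    return text, filetypes


print(split_filetype("property filetype:doc filetype:PDF"))  # ('property', ['doc', 'pdf'])
```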
I have already shown that a specific point on a list of acts can have multiple documents. Could we search by keyword and expand the results to show all documents linked to a matching point? For example, searching for "property" would list all documents attached to any point in which at least one of the child documents/attachments contains that keyword. This could be an optional "advanced" search parameter. The reason is that even though the keyword is found in one specific document, the whole story is told by all of the attachments together. This is just a thought and may need further discussion with the key users, but I think it could make sense.
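A minimal sketch of the expansion step, assuming every indexed document carries a hypothetical `point_id` identifying the point it is attached to: after the keyword search returns its hits, collect all documents that share a point with any hit. With Elasticsearch this would more likely be a second query filtering on the matched point IDs; the in-memory version below only shows the grouping logic.

```python
from collections import defaultdict


def expand_to_point(hits, all_documents):
    """Return {point_id: [all documents of that point]} for every point with at least one hit.

    Both arguments are lists of dicts with a hypothetical "point_id" key.
    Points appear in the order in which they first occur in hits.
    """
    by_point = defaultdict(list)
    for doc in all_documents:
        by_point[doc["point_id"]].append(doc)

    expanded = {}
    for hit in hits:
        point_id = hit["point_id"]
        if point_id not in expanded:
            expanded[point_id] = by_point[point_id]
    return expanded
```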
Example:
ALLOWED_HOSTS should not be set to ['*'] once the app is deployed to production.
The ability to do partial scrapes will be useful when automating routine database updates with Celery.
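A minimal settings.py sketch, assuming the host list is supplied through an environment variable; the variable name DJANGO_ALLOWED_HOSTS is hypothetical. Production would set it to something like akti.za-grad.com, while an unset variable falls back to local development hosts instead of ['*'].

```python
import os


def allowed_hosts_from_env(env=os.environ):
    """Read a comma-separated host list from DJANGO_ALLOWED_HOSTS (hypothetical name).

    Falls back to local development hosts when the variable is unset.
    """
    raw = env.get("DJANGO_ALLOWED_HOSTS", "localhost,127.0.0.1")
    return [host.strip() for host in raw.split(",") if host.strip()]


# In settings.py:
ALLOWED_HOSTS = allowed_hosts_from_env()
```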
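A minimal sketch of what "partial" could mean, under assumptions: skip every subject whose `subject_url` (the key the scraper already uses) is already stored, so a scheduled task only fetches new subjects. Where `known_urls` comes from and how a Celery task would call this are left open; the function name is hypothetical.

```python
def subjects_to_scrape(subjects, known_urls):
    """Keep only subjects whose 'subject_url' is not already in known_urls.

    'subjects' are dicts shaped like the scraper's (with a 'subject_url' key);
    'known_urls' would come from the database.
    """
    known = set(known_urls)
    return [s for s in subjects if s["subject_url"] not in known]
```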
Problem:
There is no explanation for what the app does.
"Add a 'user manual', maybe with some examples and general info (what these acts are and what kind of information they contain), so that people visiting for the first time get an idea of what they can find/search for."
Solution:
Add a description to the header or create an about section.
We use requests in a lot of places, and network calls are prone to failure. Instead of repeating the same retry boilerplate (retry counter, sleep, final error) around every call, I suggest wrapping it in a context manager.
For example, replace:
import time

from requests import exceptions

# subject and parse_subject_details come from the scraper
parse_complete = False
max_retries = 10
sleep_time = 10  # seconds to wait between attempts
print('Parsing ', subject['subject_url'])
while not parse_complete and max_retries > 0:
    try:
        subject_details = parse_subject_details(subject['subject_url'])
        parse_complete = True
    except exceptions.ConnectionError as e:
        parse_complete = False
        max_retries -= 1
        print('Connection Error while parsing {}:\n{}\n'.format(subject['subject_url'], e))
        print('Retrying...\n')
        time.sleep(sleep_time)
# All attempts failed: give up and surface the error
if max_retries == 0:
    print('Maximum retries exceeded. Please run the scraper again.\n')
    raise exceptions.ConnectionError
with something like:
with request_with_retry(url):
    do_something
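One caveat on the context-manager idea: a with block runs its body exactly once, and __exit__ can suppress an exception but cannot re-run the body, so a plain context manager cannot implement retries on its own. A small wrapper function gives the same deduplication. The sketch below is hypothetical (call_with_retry is not an existing helper); it defaults to Python's built-in ConnectionError so it runs without requests installed.

```python
import time


def call_with_retry(func, *args, max_retries=10, sleep_time=10,
                    retry_on=(ConnectionError,), **kwargs):
    """Call func(*args, **kwargs), retrying up to max_retries times on retry_on exceptions.

    Sleeps sleep_time seconds between attempts and re-raises the last
    exception once every attempt has failed.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on as e:
            print('Attempt {}/{} failed: {}'.format(attempt, max_retries, e))
            if attempt == max_retries:
                print('Maximum retries exceeded. Please run the scraper again.')
                raise
            time.sleep(sleep_time)
```

In the scraper this would look like subject_details = call_with_retry(parse_subject_details, subject['subject_url'], retry_on=(exceptions.ConnectionError,)), because requests' ConnectionError is not a subclass of the built-in one. Alternatively, requests can retry at the transport level by mounting an HTTPAdapter configured with urllib3's Retry on a Session.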
Problem:
The Heroku app recently returned HTTP 500 (server error) responses on every search, because I changed documents.py during the hackathon (this file defines how the acts are indexed and is closely tied to Elasticsearch). Every time this file is updated, the index must be rebuilt by running python manage.py search_index --rebuild.
Solution:
By running pytest manually on Heroku, I was able to trace the error, because two of the view tests failed. pytest should therefore be part of our CI/CD pipeline.
Deploy the app online
Problem:
The current search treats words like Zagreb, Zagreba and Zagrebu (different grammatical cases of the same noun) as different terms and shows different results for each. Ideally, a search for Zagreba should return results for all three.
There is an option for snowball filtering in documents.py, but it only works for English words.
Solution:
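A real fix would belong in the Elasticsearch analyzer (for example a hunspell token filter with a Croatian dictionary, which would need to be verified). The toy function below only illustrates the idea: the same normalization has to be applied both when indexing and when querying, so that Zagreb, Zagreba and Zagrebu all map to one token. The suffix list is a hypothetical, deliberately naive stand-in, not a real Croatian stemmer, and it will both over- and under-stem.

```python
# Hypothetical, naive list of a few common Croatian noun endings.
# Longer suffixes come first so "-ima" is tried before "-a".
SUFFIXES = ("ima", "ama", "om", "em", "u", "a", "e", "i", "o")
MIN_STEM = 3  # never cut a word down below this many characters


def naive_stem(word):
    """Lowercase a word and strip the first matching inflectional suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


print([naive_stem(w) for w in ("Zagreb", "Zagreba", "Zagrebu")])  # ['zagreb', 'zagreb', 'zagreb']
```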
For example, if I search for:
"Zagreb Park"
it should only show results containing that exact phrase.
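A minimal sketch of how quoted input could map onto Elasticsearch's query DSL: quoted parts become match_phrase clauses (terms must appear together, in order) and the remaining words a regular match clause, combined in a bool query. The field name "content" and the function name are hypothetical.

```python
import re

PHRASE_RE = re.compile(r'"([^"]+)"')


def build_query(raw, field="content"):
    """Build an Elasticsearch bool query from raw user input.

    Quoted phrases -> match_phrase clauses; leftover words -> one match clause.
    """
    phrases = [p.strip() for p in PHRASE_RE.findall(raw) if p.strip()]
    rest = " ".join(PHRASE_RE.sub(" ", raw).split())
    must = [{"match_phrase": {field: phrase}} for phrase in phrases]
    if rest:
        must.append({"match": {field: rest}})
    return {"query": {"bool": {"must": must}}}


print(build_query('"Zagreb Park"'))
# {'query': {'bool': {'must': [{'match_phrase': {'content': 'Zagreb Park'}}]}}}
```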