okfn-brasil / perfil-politico

A platform for profiling public figures in Brazilian politics

Home Page: https://perfilpolitico.serenata.ai/

License: GNU General Public License v3.0

Languages: Jupyter Notebook 50.72%, Python 46.68%, PLpgSQL 2.30%, Dockerfile 0.29%, Procfile 0.01%
Topics: django, python

perfil-politico's Introduction


Perfil Político

A platform for profiling candidates in the 2022 Brazilian General Election, based entirely on open data.

Install

This project requires Docker and Docker Compose.

Settings

To run the API, you must copy .env.sample to a .env file. You can edit it accordingly if you want to run in a production environment.
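
For reference, the settings module reads these values with python-decouple (see the settings.py traceback quoted in the issues section below). A minimal sketch of how a variable is resolved from .env, assuming python-decouple — MONGO_URL appears in that traceback, while the DEBUG line is a hypothetical illustration:

from decouple import config

# Values are looked up in the environment and in the .env file;
# a missing variable with no default raises decouple.UndefinedValueError.
MONGO_URL = config("MONGO_URL")
DEBUG = config("DEBUG", default=False, cast=bool)  # hypothetical variable with a default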

Creating the container

You need to create the docker container:

$ docker-compose up -d

Note: You can use docker compose instead of docker-compose in this project.

Database initial setup

You should create your database by applying migrations:

$ docker-compose run --rm django ./manage.py migrate

Running

To run the project locally, you can simply use this command:

$ docker-compose up

The website and API will be available at localhost:8000 and the Jupyter at localhost:8888.

Bringing data into your database

Your local data/ directory is mapped, inside the container, to /mnt/data. Each command takes a CSV (optionally compressed as .xz) from a public, openly available source; use --help for more info. Some extra data can also be generated with Django custom commands.

Once you have downloaded the datasets to data/, you can create your own database from scratch by running:

$ docker-compose run --rm django ./manage.py load_affiliations /mnt/data/filiacao.csv
$ docker-compose run --rm django ./manage.py load_candidates /mnt/data/candidatura.csv
$ docker-compose run --rm django ./manage.py link_affiliations_and_candidates
$ docker-compose run --rm django ./manage.py link_politicians_and_election_results
$ docker-compose run --rm django ./manage.py load_assets /mnt/data/bemdeclarado.csv
$ docker-compose run --rm django ./manage.py pre_calculate_stats
$ docker-compose run --rm django ./manage.py load_bills /mnt/data/senado.csv
$ docker-compose run --rm django ./manage.py load_bills /mnt/data/camara.csv
$ docker-compose run --rm django ./manage.py load_income_statements /mnt/data/receita.csv
# make sure to read the instructions on populate_company_info.sql before running the next command
$ docker-compose run --rm postgres psql -U perfilpolitico < populate_company_info.sql

⚠️ Note that this will change the primary keys for all candidates in the database! Be careful when running it in a production environment, because endpoints such as /api/candidate/<pk>/ depend on this primary key to retrieve the correct data.

Alternatively, you can update the data already in your database using these commands:

$ docker-compose run --rm django ./manage.py unlink_and_delete_politician_references
$ docker-compose run --rm django ./manage.py load_affiliations /mnt/data/filiacao.csv clean-previous-data
$ docker-compose run --rm django ./manage.py update_or_create_candidates /mnt/data/candidatura.csv
$ docker-compose run --rm django ./manage.py link_affiliations_and_candidates
$ docker-compose run --rm django ./manage.py link_politicians_and_election_results
$ docker-compose run --rm django ./manage.py load_assets /mnt/data/bemdeclarado.csv clean-previous-data
$ docker-compose run --rm django ./manage.py pre_calculate_stats
$ docker-compose run --rm django ./manage.py load_bills /mnt/data/senado.csv clean-previous-data
$ docker-compose run --rm django ./manage.py load_bills /mnt/data/camara.csv

Note: The code only updates the database with data coming from the CSVs. It does not handle data that is already in the database but, for some reason, no longer appears in the CSV (in that case the data in the database is kept untouched). Commands run with the clean-previous-data option will replace all data for the respective CSV, thus changing all primary keys.

API

GET /api/candidate/<year>/<state>/<post>/

Lists all candidates from a given state for a given post. For example:

/api/candidate/2018/df/deputado-distrital/

Post options for 2018 are:

  • 1o-suplente
  • 2o-suplente
  • deputado-distrital
  • deputado-estadual
  • deputado-federal
  • governador
  • presidente
  • senador
  • vice-governador
  • vice-presidente

State options are the abbreviations of the 27 Brazilian federative units (26 states plus the Federal District), plus br for nationwide posts.
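
A minimal sketch of querying this endpoint with Python's requests library, assuming the API is running locally at localhost:8000 as described above:

import requests

BASE_URL = "http://localhost:8000"  # or the public instance host

# all 2018 candidates for deputado-distrital in the Federal District (df)
response = requests.get(f"{BASE_URL}/api/candidate/2018/df/deputado-distrital/")
response.raise_for_status()
candidates = response.json()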

GET /api/candidate/<pk>/

Returns the details of a given candidate.

GET /api/economic-bonds/candidate/<pk>/

Gets the electoral income history for a given candidate and the companies they are associated with.

Returns an object with the structure:

{
  "companies_associated_with_politician": [
    {
      "cnpj": string,
      "company_name": string,
      "main_cnae": string,
      "secondary_cnaes": string (cnaes separated by ','),
      "uf": string,
      "foundation_date": string (date format 'YYYY/MM/DD'),
      "participation_start_date": string (date format 'YYYY/MM/DD')
    }
    // ... other companies in the same format as above ...
  ],
  "election_income_history": [
    {
      "year": int,
      "value": float,
      "donor_name": string,
      "donor_taxpayer_id": string
      "donor_company_name": string
      "donor_company_cnpj": string
      "donor_economic_sector_code": string,
      "donor_secondary_sector_codes": string
    },
    // ... other income statements in the same format as above ...
  ]
}
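
A sketch of consuming this response in Python, using a hypothetical candidate pk and the same local base URL assumption as above:

import requests
from collections import defaultdict

data = requests.get("http://localhost:8000/api/economic-bonds/candidate/1/").json()  # hypothetical pk

for company in data["companies_associated_with_politician"]:
    print(company["cnpj"], company["company_name"], company["main_cnae"])

# total electoral income per year
totals = defaultdict(float)
for income in data["election_income_history"]:
    totals[income["year"]] += income["value"]
print(dict(totals))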

GET /api/stats/<year>/<post>/<characteristic>/

Gets national statistics for a given characteristic of an elected post.

Post options are:

  • deputado-distrital
  • deputado-estadual
  • deputado-federal
  • governador
  • prefeito
  • presidente
  • senador
  • vereador

Characteristic options are:

  • age
  • education
  • ethnicity
  • gender
  • marital_status
  • occupation
  • party

GET /api/stats/<state>/<year>/<post>/<characteristic>/

Same as above but aggregated by state.
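
For example, fetching the gender distribution of federal deputies elected in 2018, nationally and for a single state (a sketch under the same local base URL assumption as above — note that the state comes first in the per-state URL):

import requests

BASE_URL = "http://localhost:8000"

national = requests.get(f"{BASE_URL}/api/stats/2018/deputado-federal/gender/").json()
by_state = requests.get(f"{BASE_URL}/api/stats/sp/2018/deputado-federal/gender/").json()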

GET /api/asset-stats/

Returns an object with a key called mediana_patrimonios, a list with the median asset value of elected people aggregated by year.

Optionally, you can add query parameters to filter the results by state or by candidate post (valid posts are the same as in the list above).

These parameters accept multiple values if you want to filter by more than one state or post.

Ex: /api/asset-stats?state=MG&state=RJ&candidate_post=governador&candidate_post=prefeito
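
With requests, repeated query parameters like these can be passed as lists (a sketch under the same local base URL assumption):

import requests

params = {
    "state": ["MG", "RJ"],  # encoded as ?state=MG&state=RJ
    "candidate_post": ["governador", "prefeito"],
}
response = requests.get("http://localhost:8000/api/asset-stats/", params=params)
medians = response.json()["mediana_patrimonios"]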

Tests

$ docker-compose run --rm django py.test
$ docker-compose run --rm django black . --check


perfil-politico's Issues

[Optimization][spike] Investigate optimizing the load_affiliations command

Problem
The load_affiliations command takes a very long time to complete.

Task description
Explore ways to optimize the command and propose a strategy for implementing that optimization.

Acceptance criteria

  • Document the investigation in this issue
  • Create a new issue proposing an optimization strategy for the load_affiliations command

Add code climate.

@cuducos, to add Code Climate you just have to add the GitHub app, like this:

  1. Open codeclimate.com
  2. Log in
  3. Select "Add repository"
  4. Add the repo you want to
  5. Have a look at your repository on Code Climate
  6. Install the Code Climate app on the GitHub repository by going to the settings tab

Use the ORM to get the most recent affiliation and the assets for each politician

Solve these TODOs: TODO#load_affiliations and TODO#load_assets

def politicians_from_affiliation():
    # TODO use the ORM (get most recent affiliation for each `voter_id`)
    sql = """
        SELECT core_affiliation.*
        FROM core_affiliation
        INNER JOIN (
            SELECT voter_id, MAX(started_in) AS started_in
            FROM core_affiliation
            GROUP BY voter_id
        ) AS most_recent
        ON most_recent.voter_id = core_affiliation.voter_id
        WHERE status = 'R';
    """
    yield from (
        Politician(current_affiliation=affiliation)
        for affiliation in Affiliation.objects.raw(sql).iterator()
    )

@staticmethod
def assets_per_politician_per_year():
    # TODO use the ORM?
    Row = namedtuple("Row", ("politician_id", "year", "value"))
    sql = """
        SELECT
            core_candidate.politician_id,
            core_candidate.year,
            SUM(core_asset.value) AS total
        FROM core_asset
        INNER JOIN core_candidate
            ON core_candidate.id = core_asset.candidate_id
        WHERE core_candidate.politician_id IS NOT NULL
        GROUP BY core_candidate.politician_id, core_candidate.year
        ORDER BY core_candidate.year DESC
    """

Is it possible to use the ORM in these cases? Would the change make the code more readable?
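
A sketch of how these queries could be expressed with the ORM — the model and field names below are inferred from the raw SQL (Affiliation with voter_id/started_in/status, Asset with value and a candidate foreign key, Candidate with politician and year) and should be checked against the actual models; whether this reads better than the raw SQL is exactly the question above:

from django.db.models import OuterRef, Subquery, Sum

# most recent affiliation per voter_id, keeping only status 'R'
latest_pk = (
    Affiliation.objects
    .filter(voter_id=OuterRef("voter_id"))
    .order_by("-started_in")
    .values("pk")[:1]
)
most_recent = Affiliation.objects.filter(status="R", pk=Subquery(latest_pk))

# total declared assets per politician per year
assets_per_year = (
    Asset.objects
    .filter(candidate__politician__isnull=False)
    .values("candidate__politician", "candidate__year")  # politician pk and election year
    .annotate(total=Sum("value"))
    .order_by("-candidate__year")
)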

Swap the order between running and initial setup on README.md

I think the order of the running and initial setup sections in README.md should be swapped.
When $ docker-compose up is run first, the API is not made available at localhost:8000, so I had to run $ docker-compose run django ./manage.py migrate and only then run $ docker-compose up.
Talking to other people in the chat during the sprint, they reported a similar issue.

The documentation of the `clean-previous-data` argument is not correct

The documentation of the base command's clean-previous-data argument currently does not match what the argument actually does (delete all existing objects).

parser.add_argument(
    "clean-previous-data",
    default=False,
    nargs="?",
    help=(
        "Creates politicians for all affiliations "
        "regardless if the politician exists or not."
    ),
)

Discussion: Where should the data come from?

I think a valid discussion we could have is where the data should come from.
Currently some data comes from the brasil.io website, and it is not updated frequently. So if the idea behind this project is to have something that is updated with a certain frequency, maybe it would be interesting to have a script that gets the data from the TSE website and makes it available in the cloud.

Running tests with pipenv

Traceback (most recent call last):
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/bin/pytest", line 11, in <module>
    sys.exit(main())
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/_pytest/config/__init__.py", line 55, in main
    config = _prepareconfig(args, plugins)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/_pytest/config/__init__.py", line 180, in _prepareconfig
    pluginmanager=pluginmanager, args=args
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/__init__.py", line 617, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/__init__.py", line 222, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/__init__.py", line 216, in <lambda>
    firstresult=hook.spec_opts.get('firstresult'),
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/callers.py", line 196, in _multicall
    gen.send(outcome)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/_pytest/helpconfig.py", line 89, in pytest_cmdline_parse
    config = outcome.get_result()
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/callers.py", line 76, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/callers.py", line 180, in _multicall
    res = hook_impl.function(*args)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/_pytest/config/__init__.py", line 612, in pytest_cmdline_parse
    self.parse(args)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/_pytest/config/__init__.py", line 777, in parse
    self._preparse(args, addopts=addopts)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/_pytest/config/__init__.py", line 739, in _preparse
    early_config=self, args=args, parser=self._parser
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/__init__.py", line 617, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/__init__.py", line 222, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/__init__.py", line 216, in <lambda>
    firstresult=hook.spec_opts.get('firstresult'),
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/callers.py", line 201, in _multicall
    return outcome.get_result()
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/callers.py", line 76, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pluggy/callers.py", line 180, in _multicall
    res = hook_impl.function(*args)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/pytest_django/plugin.py", line 246, in pytest_load_initial_conftests
    dj_settings.DATABASES
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/django/conf/__init__.py", line 56, in __getattr__
    self._setup(name)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/django/conf/__init__.py", line 43, in _setup
    self._wrapped = Settings(settings_module)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/django/conf/__init__.py", line 106, in __init__
    mod = importlib.import_module(self.SETTINGS_MODULE)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/amadeucavalcantefilho/Developer/perfil/api/api/settings.py", line 20, in <module>
    MONGO_URL = config('MONGO_URL')
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/decouple.py", line 197, in __call__
    return self.config(*args, **kwargs)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/decouple.py", line 85, in __call__
    return self.get(*args, **kwargs)
  File "/Users/amadeucavalcantefilho/.local/share/virtualenvs/perfil-hF4nD9Ih/lib/python3.7/site-packages/decouple.py", line 70, in get
    raise UndefinedValueError('{} not found. Declare it as envvar or define a default value.'.format(option))
decouple.UndefinedValueError: MONGO_URL not found. Declare it as envvar or define a default value.

I don't know whether I need to be running mongo while running the tests.

[Raspador Legislativo] Fix the Senate spider

The Senate spider in Raspador Legislativo is not working. It must be fixed so that we can update the bills data in Perfil Político.

Acceptance criteria

  • PR fixing the Senate spider merged into Raspador Legislativo's master

New Affiliation data has null values in electoral_section and float-typed values

When trying to load new affiliation data from Brasil.io using docker-compose run django python manage.py load_affiliations /mnt/data/filiacao.csv, I get some errors:

ValueError: invalid literal for int() with base 10: ''

After removing some rows with null values in the electoral_section field, I got a second error:

ValueError: invalid literal for int() with base 10: '83.0'

The Affiliation model expects a non-null integer field:

electoral_section = models.IntegerField()

but the new data contains null values, and electoral_section values come as floats.
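
A minimal sketch of the kind of coercion that would let such values load, assuming the fix happens at parse time (the field would also need null=True to accept missing sections):

def parse_electoral_section(raw):
    """Coerce CSV values like '', '83' or '83.0' into an int or None."""
    raw = (raw or "").strip()
    if not raw:
        return None
    return int(float(raw))  # '83.0' -> 83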

[Data integration] Implement a live-update for the load_bills command

To update the production database without losing references to existing IDs, the live-update routine for bills needs to be implemented and documented.

Acceptance criteria:

  • Check whether new update subroutines are needed for the load_bills command and, if so, have them implemented and tested
  • Document the database update procedure in the README

Clarification on #142

PR #142 is a bit awkward to me. The description has no context or purpose for the changes, which break the tests:

Surely I can fix that, but first I would like to know more about the context of these changes.

My point is: as the tests weren't updated, the code edits were not mentioned or justified, and no one in the PR discussion seemed to worry about the CI being red, I am not sure whether this new state is intentional. Therefore I don't know which direction the fix should take:

  • Should the tests be updated to match the code state after the PR being merged?
  • Or should the code edited in the PR be reverted (keeping only the docs update)?

cc people involved in that PR @adorilson @dehatanes @BrunaNayara @julianyraiol @sergiomario
also cc @giuliocc

[Data integration] Check that Raspador Legislativo is working

Updating the bills requires data from Raspador Legislativo. We need to check whether the integration of that project's data with Perfil is still working.

Acceptance criteria:

  • Analysis of the Raspador Legislativo integration done and documented in this issue
  • If any problem is found, have the tasks needed to fix it drafted
  • If no problem is found, load_bills command tested and working

Get a new unique identifier for the `Person` model

Since truly unique fields are hard to gather, a name-birthday combination could be considered a "good enough" unique field for our dataset. Since this could be sensitive information, we can transform this slug into a hash.

This would make it easier to work with difficult databases and to use a lot of information that is not properly gathered by some sources.
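
A minimal sketch of the slug-then-hash idea described above — the normalization rules here are assumptions for illustration, not the project's implementation:

import hashlib
from unicodedata import normalize


def person_identifier(name: str, birthday: str) -> str:
    """Build a stable, non-reversible identifier from name + birthday."""
    ascii_name = normalize("NFKD", name).encode("ascii", "ignore").decode()
    slug = f"{ascii_name.strip().lower()}-{birthday}"  # e.g. 'maria da silva-1970-01-01'
    return hashlib.sha256(slug.encode()).hexdigest()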
