alertadengue / pysus

Library to download, clean and analyze openly available datasets from SUS, the Brazilian Universal Health System.

License: GNU General Public License v3.0

Python 98.33% Makefile 0.79% Dockerfile 0.73% Shell 0.15%
data-science geospatial health

pysus's Introduction

AlertaDengue

This repository contains the main applications and services for the InfoDengue web portal.

InfoDengue is an early-warning system covering all states of Brazil. The system is based on the continuous analysis of hybrid data generated through research on climate and epidemiological data and social scraping.

For more information, please visit our website info.dengue.mat.br to see the current epidemiological situation in each state.


Sponsors


How to contribute to InfoDengue

You can find more information about contributing on GitHub. Also check our Team page to see if there is a work opportunity in the project.


How data can be visualized

The InfoDengue website is accessed by many people, and we often hear that its information is used when planning travel and other activities. All data is compiled, analyzed and generated at the national level with the support of the Brazilian Ministry of Health. The weekly reports can be viewed on our website as graphics or downloaded as JSON and CSV files via the API.

API

The InfoDengue API provides the data contained in the compiled reports as JSON or CSV files, and supports custom time ranges. If you don't know Python or R, please check the tutorials here.
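For illustration, a sketch of how the CSV/JSON endpoint might be queried from Python. The alertcity endpoint and parameter names (geocode, disease, format, ew_start, ...) are taken from the public API tutorials and should be checked against the current docs before use:

```python
from urllib.parse import urlencode

# Base URL of the InfoDengue alert API (assumed from the public tutorials).
BASE_URL = "https://info.dengue.mat.br/api/alertcity"

def build_query(geocode, disease="dengue", fmt="csv",
                ew_start=1, ew_end=52, ey_start=2022, ey_end=2022):
    """Build the request URL for a city (IBGE geocode) and time range."""
    params = {
        "geocode": geocode,
        "disease": disease,
        "format": fmt,         # "csv" or "json"
        "ew_start": ew_start,  # first epidemiological week
        "ew_end": ew_end,      # last epidemiological week
        "ey_start": ey_start,  # first year
        "ey_end": ey_end,      # last year
    }
    return f"{BASE_URL}?{urlencode(params)}"

url = build_query(3304557)  # 3304557 = Rio de Janeiro's IBGE geocode
# The CSV response can then be loaded directly with pandas.read_csv(url).
```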

Reports

If you are a member of a Municipal Health Department, or a citizen, interested in detailed information on the transmission alerts of your municipality, just type the name of the city or state here.


Where the data comes from

  • Dengue, Chikungunya and Zika data are provided by SINAN as a notification form that feeds a municipal database, which is then consolidated at the state level and finally, federally, by the Ministry of Health. Only a fraction of these cases are laboratory confirmed; most receive their final classification based on clinical and epidemiological criteria. From the notified cases, the incidence indicators that feed InfoDengue are calculated.
  • Weather and climate data are obtained from REDEMET stations at airports across Brazil.
  • Epidemiological indicators require population size. Demographic data of Brazilian cities are updated each year in InfoDengue using IBGE estimates.
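As a concrete illustration of how population size enters the incidence indicators mentioned above, a minimal sketch assuming the usual cases-per-100,000-inhabitants convention (the exact formula InfoDengue uses is not specified here):

```python
def incidence_per_100k(cases: int, population: int) -> float:
    """Incidence rate: notified cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100_000 * cases / population

# e.g. 523 notified cases in a city of 6,748,000 inhabitants
rate = incidence_per_100k(523, 6_748_000)
```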

Check out below the software we use in the project:

Django postgis docker
celery nginx plotly

pysus's People

Contributors

bcbernardo, daid-tw, danipj, dependabot[bot], esloch, fccoelho, gabrielmcf, jpoehnelt, lfnovo, luabida, mcmrc, r3ck, semantic-release-bot


pysus's Issues

Download of large files from SINAN

Since DATASUS decided to make only country-wide SINAN files available, for some years the file is too big to fit in the memory of an average personal computer.

We need to implement constant memory download of files so that file size is no longer an issue.
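A sketch of what constant-memory download could look like, using ftplib's block-wise retrbinary callback to stream straight to disk; the host and file path in the comment are illustrative:

```python
import ftplib
import io

class ChunkWriter:
    """File-writing callback for ftplib.retrbinary; tracks bytes written."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.written = 0

    def __call__(self, chunk: bytes) -> None:
        self.fileobj.write(chunk)
        self.written += len(chunk)

def download_to_disk(host: str, remote_path: str, local_path: str,
                     blocksize: int = 64 * 1024) -> int:
    """Stream a remote file straight to disk in fixed-size blocks.

    Peak memory is bounded by `blocksize`, regardless of file size.
    Returns the number of bytes written.
    """
    with ftplib.FTP(host) as ftp, open(local_path, "wb") as out:
        ftp.login()  # anonymous login, as used by the DATASUS FTP
        writer = ChunkWriter(out)
        ftp.retrbinary(f"RETR {remote_path}", writer, blocksize=blocksize)
        return writer.written

# e.g. download_to_disk("ftp.datasus.gov.br", "/dissemin/publicos/...", "out.dbc")

# The writer can be exercised without a network connection:
buf = io.BytesIO()
w = ChunkWriter(buf)
w(b"abc")
w(b"de")
```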

No module named 'pysus.utilities._readdbc'

I'm using Ubuntu 18.04 and I have already installed libffi-dev, but when I try to run from pysus.utilities.readdbc import read_dbc I get this error:

----> 1 from pysus.utilities.readdbc import read_dbc

~/.pyenv/versions/3.7.3/envs/informacao_salva/lib/python3.7/site-packages/pysus/utilities/readdbc.py in <module>
     10 from dbfread import DBF
     11 
---> 12 from pysus.utilities._readdbc import ffi, lib
     13 
     14 

ModuleNotFoundError: No module named 'pysus.utilities._readdbc'

Support for chunksize

Considering some SIA files like PA##.dbc, loading them into a dataframe directly consumes around 20 GB of RAM.

Should we add support for the pandas DataFrame chunksize parameter to handle this correctly? If so, can you identify any caveats with this approach, @fccoelho?
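A sketch of what chunksize support looks like on the pandas side, shown here on a small in-memory CSV standing in for a converted SIA-PA file (the column names are made up):

```python
import io
import pandas as pd

# A tiny in-memory CSV stands in for a large converted SIA-PA file.
csv_data = io.StringIO("PROC_ID,QTD\n1,10\n2,20\n3,30\n4,40\n5,50\n")

# With chunksize, read_csv returns an iterator of DataFrames instead of
# one big frame, so peak memory is bounded by the chunk size.
total = 0
n_chunks = 0
for chunk in pd.read_csv(csv_data, chunksize=2):
    # each `chunk` is a regular DataFrame with at most 2 rows
    total += chunk["QTD"].sum()
    n_chunks += 1
```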

Remove Demography package from PySUS

The demography subpackage creates substantial installation problems due to its dependency on the GDAL library.

This issue proposes to offload this subpackage into a separate library, to facilitate the installation and therefore the adoption of PySUS.

Excessive memory consumption during SIA-PA download

There are some SIA-PA files which are huge, and it seems that converting from dbc to parquet consumes too much memory.
Although the following example is related to issue 27 (which occurs when a given month has a lot of data), to simulate the error I collected the file manually through the DATASUS FTP and called only the function that is failing.

from pysus.utilities.readdbc import dbc2dbf, read_dbc
infile = 'PASP2003a.dbc'
outfile = 'PASP2003a.dbf'

dbc2dbf(infile,outfile)
#exception is raised by the line below
df = read_dbc(infile)

I was using Google Colab and even subscribed to the Pro version for a month to increase the memory limit, but it still wasn't enough.

I don't know how to solve this exactly with the dbf file. With CSVs and pandas, I usually read the file in chunks to avoid excessive memory consumption.
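One way to get the same chunked behaviour for DBF data is a generic batching helper over any record iterator; since dbfread's DBF objects are plain iterables, this should bound memory by the batch size. A sketch (demonstrated on a range, not an actual DBF):

```python
from itertools import islice

def chunked(records, size):
    """Yield lists of at most `size` records from any iterator.

    Intended for record streams such as dbfread.DBF(...), which can be
    iterated lazily; memory use is bounded by `size` records at a time.
    """
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Demonstration on a plain iterable:
batches = list(chunked(range(7), 3))  # -> [[0, 1, 2], [3, 4, 5], [6]]
```

Each batch could then be turned into a DataFrame and appended to a parquet or CSV file, so the full table never resides in memory at once.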

DataSUS server

Has the DATASUS FTP been working normally over the last few days?

Error in CNES EQ download

When downloading CNES Equipamentos, the year is not part of the definition array, so the function that validates dates raises an index-out-of-range error.

Suggested fix: add the 2005 date:

    "EQ" :  ["Equipamentos - A partir de Ago/2005", 8],

SINAN earliest year is hardcoded to 2007

On line 85 of SINAN.py, the year 2007 is hardcoded as the earliest possible year, but this is not the case for all diseases; it must check the return of list_availble_years.
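A minimal sketch of the proposed fix, deriving the lower bound from whatever listing helper is available instead of hardcoding 2007 (the helper's return type is an assumption; here it is just a list of years):

```python
def earliest_year(available_years):
    """Derive the earliest valid year from the FTP listing.

    Replaces a hardcoded 2007 lower bound, which is wrong for diseases
    whose series start earlier or later.
    """
    years = [int(y) for y in available_years]
    if not years:
        raise ValueError("no years available for this disease")
    return min(years)

earliest_year([2014, 2007, 2010])  # -> 2007
earliest_year([2001, 2005])        # -> 2001, which a hardcoded 2007 would reject
```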

Doubt about downloading SIA files

Downloaded SIA files cannot be read as dataframes; instead, they come back as a tuple. Why does this happen?

Regards,

Ricardo.

Problem reading the data: File is not available

Tried to follow the example from here: https://pysus.readthedocs.io/en/latest/SINAN.html
It looks like files are no longer available:

"""
Attempt to follow this example:
https://pysus.readthedocs.io/en/latest/SINAN.html
"""
from pysus.online_data import SINAN
import pandas as pd

SINAN.list_diseases()
"""Out[2]:
['Animais Peçonhentos',
'Botulismo',
'Chagas',
'Colera',
'Coqueluche',
'Dengue',
'Difteria',
'Esquistossomose',
'Febre Amarela',
'Febre Maculosa',
'Febre Tifoide',
'Hanseniase',
'Hantavirose',
'Hepatites Virais',
'Intoxicação Exógena',
'Leishmaniose Visceral',
'Leptospirose',
'Leishmaniose Tegumentar',
'Malaria',
'Meningite',
'Peste',
'Poliomielite',
'Raiva Humana',
'Tétano Acidental',
'Tétano Neonatal',
'Tuberculose',
'Violência Domestica']
"""

SINAN.get_available_years('RJ', 'chagas')

#ERROR example:
"""
SINAN.get_available_years('RJ', 'chagas')
Traceback (most recent call last):

File "", line 1, in
SINAN.get_available_years('RJ', 'chagas')

AttributeError: module 'pysus.online_data.SINAN' has no attribute 'get_available_years'
"""
df = SINAN.download('SP',2018,'Chagas')

#ERROR:
"""
df = SINAN.download('SP',2018,'Chagas')
Traceback (most recent call last):

File "/home/yanchik/anaconda3/lib/python3.8/site-packages/pysus/online_data/SINAN.py", line 66, in download
ftp.retrbinary('RETR {}'.format(fname), open(fname, 'wb').write)

File "/home/yanchik/anaconda3/lib/python3.8/ftplib.py", line 425, in retrbinary
with self.transfercmd(cmd, rest) as conn:

File "/home/yanchik/anaconda3/lib/python3.8/ftplib.py", line 382, in transfercmd
return self.ntransfercmd(cmd, rest)[0]

File "/home/yanchik/anaconda3/lib/python3.8/ftplib.py", line 348, in ntransfercmd
resp = self.sendcmd(cmd)

File "/home/yanchik/anaconda3/lib/python3.8/ftplib.py", line 275, in sendcmd
return self.getresp()

File "/home/yanchik/anaconda3/lib/python3.8/ftplib.py", line 248, in getresp
raise error_perm(resp)

error_perm: 550 The system cannot find the file specified.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/yanchik/anaconda3/lib/python3.8/site-packages/pysus/online_data/SINAN.py", line 69, in download
ftp.retrbinary('RETR {}'.format(fname.upper()), open(fname, 'wb').write)

File "/home/yanchik/anaconda3/lib/python3.8/ftplib.py", line 425, in retrbinary
with self.transfercmd(cmd, rest) as conn:

File "/home/yanchik/anaconda3/lib/python3.8/ftplib.py", line 382, in transfercmd
return self.ntransfercmd(cmd, rest)[0]

File "/home/yanchik/anaconda3/lib/python3.8/ftplib.py", line 348, in ntransfercmd
resp = self.sendcmd(cmd)

File "/home/yanchik/anaconda3/lib/python3.8/ftplib.py", line 275, in sendcmd
return self.getresp()

File "/home/yanchik/anaconda3/lib/python3.8/ftplib.py", line 248, in getresp
raise error_perm(resp)

error_perm: 550 The system cannot find the file specified.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "", line 1, in
df = SINAN.download('SP',2018,'Chagas')

File "/home/yanchik/anaconda3/lib/python3.8/site-packages/pysus/online_data/SINAN.py", line 71, in download
raise Exception("{}\nFile {} not available".format(e,fname))

Exception: 550 The system cannot find the file specified.
File ANIMSP18.DBC not available
"""

Handling missing data in SIM

I believe the best place to implement this is in preprocessing. Initially I will do it for only 3 variables: municipality, age and sex. The plan is to distribute the deaths with missing data within each class according to the proportion observed in the entries with complete data (I am new to the health field, so I may be talking nonsense, but that is what I understood...).

The age ranges for the calculation are variable, as are the regions to be considered.

Create geographic dataframes from the processed SIM data

I am actually already working on a function that adds the data of a dataframe to a GeoPandas GeoDataFrame. The user chooses the column containing the geocode, the title columns (to be concatenated and generate new columns in a geographic database) and the value column (which fills in the feature values of the new geographic table).

This approach has a limitation when many variables and universal data are used, because then there is an explosion of combinations that results in an explosion of columns. In that case it may be better to use separate tables and perform joins on the data, which can even be done with QGis and similar tools.

Pysus not finding _readdbc

I'm getting this error when trying to use function read_dbc():

Traceback (most recent call last):
  File "../download.py", line 3, in <module>
    from pysus.utilities.readdbc import read_dbc
  File "/Users/lucas/Monografia/datasus-data-extractor/src-virtualenv-datasus/lib/python3.7/site-packages/pysus/utilities/readdbc.py", line 12, in <module>
    from pysus.utilities._readdbc import ffi, lib
ModuleNotFoundError: No module named 'pysus.utilities._readdbc'

I've already installed cffi.
I also have the files _readdbc.c, _readdbc.cpython-37m-darwin.so and _readdbc.o on the pysus/utilities/.

What could it be?

Direct Download of SUS databases

Implement direct download of databases publicly accessible through the web. We should start with SINASC to complement the DBC decompressing functionality

Add Support to Other tables of SIA

SIA contains other available tables which could be made available through PySUS.
I propose adding a new parameter to the download function of the SIA module, allowing the user to specify which table to download.
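A hypothetical sketch of how such a table parameter could map onto DATASUS file names. The {group}{UF}{YY}{MM} naming pattern is inferred from observed files such as PASP2003a.dbc and is an assumption, not documented behavior:

```python
def sia_filename(group: str, uf: str, year: int, month: int, part: str = "") -> str:
    """Build a SIA file name like 'PASP2003a.dbc'.

    `group` selects the table ('PA' = outpatient production; other
    two-letter codes would select other SIA tables). `part` covers the
    trailing letter seen on files split into multiple parts.
    """
    return f"{group.upper()}{uf.upper()}{year % 100:02d}{month:02d}{part}.dbc"

sia_filename("PA", "SP", 2020, 3, "a")  # -> 'PASP2003a.dbc'
```

A download function could then accept group="PA" (or another table code) and build the remote name with a helper like this.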

Support setting a custom CACHEPATH through an environment variable

Hello and thanks for the awesome lib!

As a small suggestion, I believe it could be useful to let the user set a location for caching downloaded DataSUS files other than the current default, ~/pysus.

As a real use-case example, I usually run PySUS in a Google Colab environment, where I always lose downloaded files when the runtime is reset. If I could set a custom path for the cache, I could persist them in the mounted Google Drive filesystem.

I suggest this be done by the user optionally exporting a PYSUS_CACHEPATH environment variable, which the lib would try to read when setting the CACHEPATH variable. If the environment variable is absent, it would fall back to the default ~/pysus location, effectively leaving the cache location unchanged for existing users.

This should be a small patch in the /pysus/online_data/__init__.py file (plus documenting this behavior). If the suggestion is accepted, I could PR this myself.
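The suggested fallback really is small; a sketch of the proposed logic (PYSUS_CACHEPATH is the variable name proposed above, not yet an actual PySUS feature):

```python
import os
from pathlib import Path

# Sketch of the proposed fallback for pysus/online_data/__init__.py:
# honor PYSUS_CACHEPATH when set, otherwise keep the current ~/pysus default.
CACHEPATH = os.environ.get("PYSUS_CACHEPATH", str(Path.home() / "pysus"))

# e.g. in Colab, before importing pysus:
#   os.environ["PYSUS_CACHEPATH"] = "/content/drive/MyDrive/pysus_cache"
```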

Function to translate variable values

I thought of creating a preprocessing function to translate variable values using the descriptions in the databases. It could follow a scheme like FIOCRUZ's, creating new variables with the description, or actually replacing the values (it could also have a flag to offer that choice), following the SIM structure data (PDF at ftp://ftp.datasus.gov.br/dissemin/publicos/SIM/CID10/DOCS/Estrutura_SIM_para_CD.pdf), for example.

Data is not loading correctly from SINAN

Data for some epidemiological weeks is missing when using the get_chunked_dataframe function.

Reproduce:

SINAN.download(year, disease, return_fname=True)
...
fetch_pq_fname = glob(f"{self.fname}.parquet/*.parquet")

chunks_list = [pd.read_parquet(f) for f in fetch_pq_fname]

df_from_chunks = pd.concat(chunks_list, ignore_index=True)

print(sorted(df_from_chunks.SEM_NOT.unique()))

['01601', '01602', '01603', '01604', '01605', '01606', '01607', '01608', '01609', '01610', '01611', '01612', '01613', '01614', '01615', '01616', '01617', '01618', '01619', '01620', '01621', '01622', '01623', '01627', '01628', '01629', '01630', '201552', '201601', '201602', '201603', '201604', '201605', '201606', '201607', '201608', '201609', '201610', '201611', '201612', '201613', '201614', '201615', '201616', '201617', '201618', '201619', '201620', '201621', '201622', '201623', '201624', '201625', '201626', '201627', '201628', '201629', '201630', '201631']


ZIKABR16_COUNT_SEM_NOT = df_from_chunks.SEM_NOT.value_counts()
ZIKABR16_COUNT_SEM_NOT.sort_index(axis=0, ascending=False, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

SEM_N    COUNT
201631      748
201630     1774
201629     2290
201628     2426
201627     2830
201626     3024
201625     3674
201624     4638
201623     6030
201622     7096
201621     6956
201620    10010
201619    11240
201618    14546
201617    18162
201616    17652
201615    22822
201614    25652
201613    28940
201612    25630
201611    35864
201610    38580
201609    43538
201608    46810
201607    39644
201606    21056
201605    24872
201604    20788
201603    18372
201602    16046
201601    16522
201552     1510
01630         2
01629         2
01628         2
01627         4
01623         4
01622         4
01621         4
01620         8
01619         4
01618         4
01617         2
01616        12
01615         4
01614         8
01613        12
01612        12
01611        20
01610         8
01609        24
01608        40
01607        36
01606         8
01605         8
01604         4
01603         6
01602        12
01601         4

The correct data is:

SEM_N    COUNT
201652      469
201651      469
201650      531
201649      571
201648      566
201647      475
201646      439
201645      523
201644      385
201643      411
201642      483
201641      431
201640      372
201639      349
201638      424
201637      470
201636      563
201635      564
201634      747
201633      771
201632      871
201631      953
201630      887
201629     1145
201628     1213
201627     1415
201626     1512
201625     1837
201624     2319
201623     3015
201622     3548
201621     3478
201620     5005
201619     5620
201618     7273
201617     9081
201616     8826
201615    11411
201614    12826
201613    14470
201612    12815
201611    17932
201610    19290
201609    21769
201608    23405
201607    19822
201606    10528
201605    12436
201604    10394
201603     9186
201602     8023
201601     8261
201552      755
01633         1
01630         1
01629         1
01628         1
01627         2
01623         2
01622         2
01621         2
01620         4
01619         2
01618         2
01617         1
01616         6
01615         2
01614         4
01613         6
01612         6
01611        10
01610         4
01609        12
01608        20
01607        18
01606         4
01605         4
01604         2
01603         3
01602         6
01601         2

Add support to MTBR tables of SIH

Inspect the MTBR tables to see if they should be made available through PySUS. If so, add them to the API and provide usage examples in the documentation.

Error reading DBC

I am trying to convert dbc files with the new functionality in the package.
However, it returns this error when reading the latest months of CNES.

Could you help me?


error Traceback (most recent call last)
in ()
----> 1 df = read_dbc("/content/data/br_ms_cnes/input/STSP2108.dbc", encoding='iso-8859-1')

3 frames
/usr/local/lib/python3.7/dist-packages/dbfread/struct_parser.py in unpack(self, data)
34 def unpack(self, data):
35 """Unpack struct from binary string and return a named tuple."""
---> 36 items = zip(self.names, self.struct.unpack(data))
37 return self.Class(**dict(items))
38

error: unpack requires a buffer of 32 bytes

Refactor FTP operations to a single function

Many of our modules replicate basic FTP file-retrieval operations.
We should remove this code duplication by creating a single function for retrieving files through FTP.
This function should most likely belong in an ftptools module in the utilities package.
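A sketch of what such a shared helper might look like (the ftptools placement is the one proposed above; the DATASUS host default and anonymous login are assumptions based on how the public FTP is normally accessed):

```python
import ftplib
from pathlib import PurePosixPath

def split_remote(path: str):
    """Split an FTP path into (directory, filename) for cwd()/retrbinary()."""
    p = PurePosixPath(path)
    return str(p.parent), p.name

def fetch_ftp_file(path: str, dest: str, host: str = "ftp.datasus.gov.br") -> str:
    """Single shared retrieval routine: connect, login, cwd, fetch, close.

    Every online_data module would call this instead of repeating the
    FTP boilerplate.
    """
    directory, fname = split_remote(path)
    with ftplib.FTP(host) as ftp, open(dest, "wb") as out:
        ftp.login()  # the DATASUS FTP accepts anonymous login
        ftp.cwd(directory)
        ftp.retrbinary(f"RETR {fname}", out.write)
    return dest
```

The path handling can be checked without a network connection via split_remote.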

Update docs with SIM Example

PySUS's functionality for dealing with SIM data is very good but is still not covered in our docs.
We should start by including a notebook with examples, just as has been done for the other databases.

Installation error on Windows 10

Failed building wheel for PySUS
Running setup.py clean for PySUS
Failed to build PySUS
Installing collected packages: PySUS
Running setup.py install for PySUS ... error
Complete output from command C:\ProgramData\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\CALIXTO\AppData\Local\Temp\pip-install-65y7p2b6\PySUS\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\CALIXTO\AppData\Local\Temp\pip-record-iwk6emqg\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\pysus
copying pysus\__init__.py -> build\lib.win-amd64-3.7\pysus
creating build\lib.win-amd64-3.7\pysus\online_data
copying pysus\online_data\CIHA.py -> build\lib.win-amd64-3.7\pysus\online_data
copying pysus\online_data\CNES.py -> build\lib.win-amd64-3.7\pysus\online_data
copying pysus\online_data\SIA.py -> build\lib.win-amd64-3.7\pysus\online_data
copying pysus\online_data\SIH.py -> build\lib.win-amd64-3.7\pysus\online_data
copying pysus\online_data\SIM.py -> build\lib.win-amd64-3.7\pysus\online_data
copying pysus\online_data\sinasc.py -> build\lib.win-amd64-3.7\pysus\online_data
copying pysus\online_data\__init__.py -> build\lib.win-amd64-3.7\pysus\online_data
creating build\lib.win-amd64-3.7\pysus\preprocessing
copying pysus\preprocessing\decoders.py -> build\lib.win-amd64-3.7\pysus\preprocessing
copying pysus\preprocessing\sinan.py -> build\lib.win-amd64-3.7\pysus\preprocessing
copying pysus\preprocessing\__init__.py -> build\lib.win-amd64-3.7\pysus\preprocessing
creating build\lib.win-amd64-3.7\pysus\tests
copying pysus\tests\test_decoders.py -> build\lib.win-amd64-3.7\pysus\tests
copying pysus\tests\test_sinan.py -> build\lib.win-amd64-3.7\pysus\tests
copying pysus\tests\test_utilities.py -> build\lib.win-amd64-3.7\pysus\tests
copying pysus\tests\__init__.py -> build\lib.win-amd64-3.7\pysus\tests
creating build\lib.win-amd64-3.7\pysus\utilities
copying pysus\utilities\readdbc.py -> build\lib.win-amd64-3.7\pysus\utilities
copying pysus\utilities\_build_readdbc.py -> build\lib.win-amd64-3.7\pysus\utilities
copying pysus\utilities\__init__.py -> build\lib.win-amd64-3.7\pysus\utilities
creating build\lib.win-amd64-3.7\pysus\tests\test_data
copying pysus\tests\test_data\test_ciha.py -> build\lib.win-amd64-3.7\pysus\tests\test_data
copying pysus\tests\test_data\test_sia.py -> build\lib.win-amd64-3.7\pysus\tests\test_data
copying pysus\tests\test_data\test_sih.py -> build\lib.win-amd64-3.7\pysus\tests\test_data
copying pysus\tests\test_data\test_sim.py -> build\lib.win-amd64-3.7\pysus\tests\test_data
copying pysus\tests\test_data\test_sinasc.py -> build\lib.win-amd64-3.7\pysus\tests\test_data
copying pysus\tests\test_data\__init__.py -> build\lib.win-amd64-3.7\pysus\tests\test_data
copying pysus\utilities\blast.c -> build\lib.win-amd64-3.7\pysus\utilities
copying pysus\utilities\_readdbc.c -> build\lib.win-amd64-3.7\pysus\utilities
copying pysus\utilities\blast.h -> build\lib.win-amd64-3.7\pysus\utilities
copying pysus\utilities\blast.o -> build\lib.win-amd64-3.7\pysus\utilities
copying pysus\utilities\_readdbc.o -> build\lib.win-amd64-3.7\pysus\utilities
copying pysus\utilities\_readdbc.cpython-36m-x86_64-linux-gnu.so -> build\lib.win-amd64-3.7\pysus\utilities
running build_ext
generating cffi module 'build\temp.win-amd64-3.7\Release\_readdbc.c'
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
building '_readdbc' extension
creating build\temp.win-amd64-3.7\Release\build
creating build\temp.win-amd64-3.7\Release\build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release\build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\pysus
creating build\temp.win-amd64-3.7\Release\pysus\utilities
creating build\temp.win-amd64-3.7\Release\pysus\utilities\c-src
C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.20.27508\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ipysus/utilities\c-src/ -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.20.27508\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /Tcbuild\temp.win-amd64-3.7\Release\_readdbc.c /Fobuild\temp.win-amd64-3.7\Release\build\temp.win-amd64-3.7\Release\_readdbc.obj
_readdbc.c
build\temp.win-amd64-3.7\Release\_readdbc.c(523): fatal error C1083: Cannot open include file: 'unistd.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.20.27508\bin\HostX86\x64\cl.exe' failed with exit status 2

----------------------------------------

Command "C:\ProgramData\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\CALIXTO\AppData\Local\Temp\pip-install-65y7p2b6\PySUS\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\CALIXTO\AppData\Local\Temp\pip-record-iwk6emqg\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\CALIXTO\AppData\Local\Temp\pip-install-65y7p2b6\PySUS\

refactor SIM module

There are some functions in the SIM module which should actually be in a miscelaneous.py module, because they can be useful for analyses not related to SIM. They are:

  • get_CID10_table
  • get_CID9_table
  • get_municipios
  • get_ocupations

Installation Error

Hello

Congratulations for the initiative!
Please, on a machine with Ubuntu and Python 3.6 I tried to install it, but I had the errors below.
Do I need to install other libraries or take some other action?

Thank you
Reinaldo

$ pip install PySUS
Collecting PySUS
Using cached https://files.pythonhosted.org/packages/bf/7b/28dd138e51a3f8774b96b535fb872e08c1f5cf974f53286e11de0d90259c/PySUS-0.1.12.tar.gz
Complete output from command python setup.py egg_info:
Package libffi was not found in the pkg-config search path.
Perhaps you should add the directory containing `libffi.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libffi' found
(the message above is repeated several times)
c/_cffi_backend.c:15:17: fatal error: ffi.h: No such file or directory
compilation terminated.
Traceback (most recent call last):
  File "/usr/lib/python3.6/distutils/unixccompiler.py", line 118, in _compile
    extra_postargs)
  File "/usr/lib/python3.6/distutils/ccompiler.py", line 909, in spawn
    spawn(cmd, dry_run=self.dry_run)
  File "/usr/lib/python3.6/distutils/spawn.py", line 36, in spawn
    _spawn_posix(cmd, search_path, dry_run=dry_run)
  File "/usr/lib/python3.6/distutils/spawn.py", line 159, in _spawn_posix
    % (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'x86_64-linux-gnu-gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/bdist_egg.py", line 169, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/bdist_egg.py", line 155, in call_command
    self.run_command(cmdname)
  File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/usr/lib/python3.6/distutils/command/install_lib.py", line 109, in build
    self.run_command('build_ext')
  File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 75, in run
    _build_ext.run(self)
  File "/usr/lib/python3.6/distutils/command/build_ext.py", line 339, in run
    self.build_extensions()
  File "/usr/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "/usr/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/usr/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "/usr/lib/python3.6/distutils/ccompiler.py", line 574, in compile
    self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
  File "/usr/lib/python3.6/distutils/unixccompiler.py", line 120, in _compile
    raise CompileError(msg)
distutils.errors.CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 158, in save_modules
    yield saved
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 199, in setup_context
    yield
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 254, in run_setup
    _execfile(setup_script, ns)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 49, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-s_38by4p/cffi-1.11.5/setup.py", line 240, in <module>
  File "/usr/lib/python3.6/distutils/core.py", line 163, in setup
    raise SystemExit("error: " + str(msg))
SystemExit: error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/easy_install.py", line 1104, in run_setup
    run_setup(setup_script, args)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 257, in run_setup
    raise
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 199, in setup_context
    yield
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 170, in save_modules
    saved_exc.resume()
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 145, in resume
    six.reraise(type, exc, self._tb)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/pkg_resources/_vendor/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 158, in save_modules
    yield saved
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 199, in setup_context
    yield
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 254, in run_setup
    _execfile(setup_script, ns)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/sandbox.py", line 49, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-s_38by4p/cffi-1.11.5/setup.py", line 240, in <module>
  File "/usr/lib/python3.6/distutils/core.py", line 163, in setup
    raise SystemExit("error: " + str(msg))
SystemExit: error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-ixt2heob/PySUS/setup.py", line 41, in <module>
    install_requires=['pandas', 'dbfread', 'cffi>=1.0.0', 'geocoder', 'requests']
  File "/usr/lib/python3.6/distutils/core.py", line 108, in setup
    _setup_distribution = dist = klass(attrs)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/dist.py", line 335, in __init__
    self.fetch_build_eggs(attrs['setup_requires'])
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/dist.py", line 456, in fetch_build_eggs
    replace_conflicting=True,
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/pkg_resources/__init__.py", line 826, in resolve
    dist = best[req.key] = env.best_match(req, ws, installer)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1092, in best_match
    return self.obtain(req, installer)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1104, in obtain
    return installer(requirement)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/dist.py", line 522, in fetch_build_egg
    return cmd.easy_install(req)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/easy_install.py", line 672, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/easy_install.py", line 698, in install_item
    dists = self.install_eggs(spec, download, tmpdir)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/easy_install.py", line 879, in install_eggs
    return self.build_and_install(setup_script, setup_base)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/easy_install.py", line 1118, in build_and_install
    self.run_setup(setup_script, setup_base, args)
  File "/home/reinaldo/Documentos/Code/raspa/lib/python3.6/site-packages/setuptools/command/easy_install.py", line 1106, in run_setup
    raise DistutilsError("Setup script exited with %s" % (v.args[0],))
distutils.errors.DistutilsError: Setup script exited with error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-ixt2heob/PySUS/

SINAN disease list is incomplete

The SINAN disease list doesn't have Syphilis!
The SINAN system has recorded Syphilis notifications since 2007; is there some reason these notifications are not available?

Fix failed tests

pysus/tests/test_cnes.py::CNESTestCase::test_fetch_estabelecimentos FAILED
pysus/tests/test_init.py::TestInitFunctions::test_last_update 
pysus/tests/test_sim.py::TestDecoder::test_group_and_count FAILED
pysus/tests/test_sim.py::TestDecoder::test_redistribute_missing FAILED
pysus/tests/test_utilities.py::TestReadDbc::test_read_dbc FAILED
pysus/tests/test_utilities.py::TestReadDbc::test_read_dbc_dbf FAILED
pysus/tests/test_data/test_sinan.py::TestSINANDownload::test_filename_only FAILED

Pandas does not concatenate the parquet files

Unable to concatenate parquet files to dataframe.

disease dengue
year 2020

https://github.com/AlertaDengue/PySUS/blob/master/pysus/tests/test_data/test_sinan.py#L43

48 DENGBR20.parquet/6d35585c41984b459cc3f72986f9aa6c-0.parquet
49 DENGBR20.parquet/f1d9168f79e1431a933faa433859c305-0.parquet
50 DENGBR20.parquet/5d8a203ccca04e87b12c5ef465a4d32d-0.parquet
Killed
pysus/tests/test_data/test_sinan.py::TestSINANDownload::test_chunked_df_size Killed

Reproduce the error:

from glob import glob

import pandas as pd
from pysus.online_data.SINAN import download

fn = download(2020, "dengue", return_fname=True)
for i, f in enumerate(glob(f"{fn}/*.parquet")):
    if i == 0:
        df2 = pd.read_parquet(f)
    else:
        df2 = pd.concat([df2, pd.read_parquet(f)], ignore_index=True)
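A lower-memory variant of the loop above (a sketch, assuming the parts still fit in memory once): collect the frames and call pd.concat a single time, which avoids re-copying all previously read rows on every iteration of the loop.

```python
import pandas as pd

def concat_parts(frames) -> pd.DataFrame:
    """Concatenate dataframe parts in a single pd.concat call instead of
    growing one dataframe inside a loop (which copies quadratically)."""
    return pd.concat(list(frames), ignore_index=True)

# Hypothetical usage with the downloaded parquet parts:
# from glob import glob
# df = concat_parts(pd.read_parquet(f) for f in glob(f"{fn}/*.parquet"))
```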

Update download method of SIA data for PA

Currently, DBC files of SIA data for PA are split into several parts distinguished by the letters a, b and c (directory-listing screenshot omitted).

So the download method pysus.online_data.SIA.download(estado, ano, mes) is outdated.
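One way a downloader could cope with the split files (a sketch under assumptions — the exact DATASUS naming shown here is illustrative, e.g. PASP2004a.dbc for a part of São Paulo, 2020-04): match every part for a state/year/month with a pattern that makes the trailing letter optional, instead of assuming one file per period.

```python
import re

def match_sia_parts(listing, estado, ano, mes):
    """Return every SIA 'PA' file for estado/ano/mes found in an FTP
    directory listing, in part order (trailing letter is optional)."""
    pattern = re.compile(
        rf"^PA{estado.upper()}{str(ano)[-2:]}{mes:02d}[a-z]?\.dbc$",
        re.IGNORECASE,
    )
    return sorted(f for f in listing if pattern.match(f))

listing = ["PASP2004a.dbc", "PASP2004b.dbc", "PASP2004c.dbc", "PAAC2004.dbc"]
match_sia_parts(listing, "SP", 2020, 4)  # all three SP parts
```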

Import error on Mac OS X 10.13.6

The error below occurs when executing from pysus.utilities.readdbc import read_dbc under Python 3:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-7723051aeb1f> in <module>()
----> 1 from pysus.utilities.readdbc import read_dbc

~/.pyenv/versions/3.7.0a3/envs/PySus/lib/python3.7/site-packages/pysus/utilities/readdbc.py in <module>()
     10 from dbfread import DBF
     11 
---> 12 from pysus.utilities._readdbc import ffi, lib
     13 
     14 

ImportError: dlopen(/Users/marcelorsa/.pyenv/versions/3.7.0a3/envs/PySus/lib/python3.7/site-packages/pysus/utilities/_readdbc.abi3.so, 2): Symbol not found: _error
  Referenced from: /Users/marcelorsa/.pyenv/versions/3.7.0a3/envs/PySus/lib/python3.7/site-packages/pysus/utilities/_readdbc.abi3.so
  Expected in: flat namespace
 in /Users/marcelorsa/.pyenv/versions/3.7.0a3/envs/PySus/lib/python3.7/site-packages/pysus/utilities/_readdbc.abi3.so

readdbc function with geopandas generates .dbf files that are never deleted

The read_dbc_geopandas function creates an output file that persists until the computer is restarted.
This is a problem for programs that download many files: downloading, for example, five full years of CNES leaves every intermediate .dbf file in the temporary folder.
Example code:

from pysus.online_data import CNES
from itertools import product

estados = ['RO','AC','AM','RR','PA','AP','TO','MA','PI','CE','RN','PB','PE','AL','SE','BA','MG','ES','RJ','SP','PR','SC','RS','MS','MT','GO','DF']
meses = range(1,13)
anos = range(2006,2011)
grupos = ['ST','LT','DC']
for grupo,estado,ano,mes in product(grupos,estados,anos,meses):
    df = CNES.download(grupo,estado,ano,mes)

The read_dbc_geopandas function unlinks the temporary file but, at the same time, creates a .dbf (via the out variable) that is never deleted until the computer is restarted:

def read_dbc_geopandas(filename, encoding='utf-8'):
    """
    Opens a DATASUS .dbc file and return its contents as a pandas
    Dataframe, using geopandas
    :param filename: .dbc filename
    :param encoding: encoding of the data
    :return: Pandas Dataframe.
    """
    if isinstance(filename, str):
        filename = filename
    with NamedTemporaryFile(delete=False) as tf:
        out = tf.name + '.dbf'
        dbc2dbf(filename, out)
        dbf = gpd.read_file(out, encoding=encoding).drop("geometry", axis=1)
        df = pd.DataFrame(dbf)
    os.unlink(tf.name)

    return df

Suggestion: unlink the .dbf file as soon as the dataframe has been read.

def read_dbc_geopandas(filename, encoding='utf-8'):
    """
    Opens a DATASUS .dbc file and return its contents as a pandas
    Dataframe, using geopandas
    :param filename: .dbc filename
    :param encoding: encoding of the data
    :return: Pandas Dataframe.
    """
    if isinstance(filename, str):
        filename = filename
    with NamedTemporaryFile(delete=False) as tf:
        out = tf.name + '.dbf'
        dbc2dbf(filename, out)
        dbf = gpd.read_file(out, encoding=encoding).drop("geometry", axis=1)
        df = pd.DataFrame(dbf)
        os.unlink(out)
    os.unlink(tf.name)

    return df

Ways to avoid memory crash?

This is more a question than an issue. When I try to load a 60+ MB DBC file using PySUS, 16 GB of RAM is not enough and my desktop crashes. Considering that I normally work smoothly with DataFrames of 3+ million rows, is there something I can do to avoid the crash with PySUS? I considered using multiprocessing; is that feasible? Thank you.
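One workaround (a sketch, not a PySUS API): stream the records instead of materializing everything at once. dbfread's DBF object is iterable, so after converting the .dbc to a .dbf you can process the table in fixed-size chunks with a generic chunking helper:

```python
from itertools import islice

def chunked(records, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical usage: aggregate a large table chunk by chunk instead of
# holding all rows in memory at once.
# for batch in chunked(DBF("big_table.dbf", encoding="iso-8859-1"), 100_000):
#     process(pd.DataFrame(batch))
```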

ESUS: __init__() got an unexpected keyword argument 'send_get_body_as'

Hello! I've been trying to acquire data from ESUS using PySUS but I always get this error:

    def test_ingest_pysus_data():
        pai = PysusApiIngestion()
        requested_dataframe = pai.ingest_covid_data(uf = mock_state)

tests/test_ingestion.py:35: 

ingestion/ingest_pysus.py:20: in ingest_covid_data
    dataframe = pd.DataFrame(download(uf=uf))
/usr/local/lib/python3.9/site-packages/pysus/online_data/ESUS.py:36: in download
    fname = fetch(base, uf, url)

base = 'desc-notificacoes-esusve-pe', uf = 'pe', url = 'https://user-public-notificacoes:[email protected]'

    def fetch(base, uf, url):
        UF = uf.upper()
        print(f"Reading ESUS data for {UF}")
        es = Elasticsearch([url], send_get_body_as="POST")
        TypeError: __init__() got an unexpected keyword argument 'send_get_body_as'

/usr/local/lib/python3.9/site-packages/pysus/online_data/ESUS.py:58: TypeError

This is the stack trace. My test function is test_ingest_pysus_data, defined at the top of the trace.

My code is as follows:

import pandas as pd
from pandas import DataFrame
from pysus.online_data.ESUS import download



class PysusApiIngestion():
    """A class to ingest data from TABNET/DATASUS/SUS using pysus library"""

    def ingest_covid_data(self, uf: str) -> DataFrame:
        """
        Function to ingest covid data from Tabnet using PySUS

        params:
            uf: brazilian state for ingestion reference
        """
        dataframe = pd.DataFrame(download(uf=uf))
        
        return dataframe

    def __init__(self) -> None:
        """Init method to call class"""

        pass

Is this supposed to happen? I am using TDD as the design approach for my code.
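A likely cause (an assumption, not confirmed in this thread): send_get_body_as was a client option of older elasticsearch-py releases and was removed in the 8.x series, so an elasticsearch client newer than the one PySUS targets raises this TypeError. Pinning the client to a 7.x release in the environment may work around it, e.g. in requirements.txt:

```
elasticsearch>=7,<8
```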

Update README

  • Add conda usage instructions to the development-environment setup
  • Document the make commands used to deploy Jupyter in the container

setup.py not compiling the C extension

The C extension compiles correctly when the build script is run by itself, but not when calling python setup.py install.

Something must be missing in the parameterization of setup.py.
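A plausible direction (a sketch under assumptions, not the confirmed fix): a CFFI extension is only compiled during python setup.py install when setup() is told about it via the cffi_modules keyword, which hands the build script to setuptools. Assuming the build script lives at pysus/utilities/readdbc_build.py and exposes an FFI object named ffibuilder (both names hypothetical), the declaration would look like:

```
# setup.py fragment (hypothetical paths/names)
from setuptools import setup

setup(
    name="PySUS",
    setup_requires=["cffi>=1.0.0"],
    cffi_modules=["pysus/utilities/readdbc_build.py:ffibuilder"],
    install_requires=["cffi>=1.0.0"],
)
```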

SIM tables have geocodes of varying length

I opened this issue to discuss the best way to handle this. IBGE municipality codes have 7 digits, but in several SIM tables (SP, 2010, for example) they appear with only 6. I found on the DATASUS site a table containing the municipality codes used in its databases, and perhaps the best approach is to follow that table. The table itself is available at ftp://ftp.datasus.gov.br/territorio/tabelas/base_territorial.zip, and the document describing its structure is at ftp://ftp.datasus.gov.br/territorio/doc/bases_territoriais.pdf.

My suggestion is to change the is_valid_geocode function to check the values in this table provided by DATASUS instead of verifying the seventh digit. This approach has the advantage of also working for the SIM tables whose municipality codes have only 6 digits, which DATASUS considers valid.
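A complementary option (a sketch; validating against the DATASUS territorial table, as suggested above, remains the more robust route): a 6-digit SIM geocode can be promoted to the full 7-digit IBGE code by computing the missing check digit, which follows a Luhn-style rule with alternating weights 1 and 2.

```python
def ibge_check_digit(code6: str) -> str:
    """Compute the 7th (check) digit of an IBGE municipality code from
    its first six digits: alternating weights 1 and 2, summing the
    digits of each product (Luhn-style), complement modulo 10."""
    total = 0
    for i, ch in enumerate(code6):
        p = int(ch) * (1 if i % 2 == 0 else 2)
        total += p - 9 if p > 9 else p
    return str((10 - total % 10) % 10)

def to_seven_digits(code: str) -> str:
    """Return the full 7-digit geocode for a 6- or 7-digit input."""
    return code if len(code) == 7 else code + ibge_check_digit(code)

to_seven_digits("355030")  # São Paulo -> "3550308"
```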
