

Processing Logs of Electronic Ballot Boxes

This repository contains Python + DuckDB scripts that process logs from Brazilian Electronic Ballot Boxes to compute several metrics: mean vote time, number of votes computed in 5 minutes, and the percentage of successful biometric identifications.
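As a rough illustration of the three metrics, the sketch below assumes the votes have already been isolated from the logs into records with a start time, an end time, and a flag saying whether biometric identification succeeded. The `Vote` record and all function names are hypothetical, and "votes computed in 5 minutes" is interpreted here as a count per fixed 5-minute window, which may differ from the repository's own definition.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Vote:
    """Hypothetical record for one vote already isolated from the logs."""
    start: datetime
    end: datetime
    biometric_ok: bool  # assumed flag: True if the voter was identified by biometrics


def mean_vote_seconds(votes: list[Vote]) -> float:
    """Average vote duration in seconds."""
    if not votes:
        raise ValueError("no votes to average")
    total = sum((v.end - v.start).total_seconds() for v in votes)
    return total / len(votes)


def votes_per_window(votes: list[Vote], window: timedelta = timedelta(minutes=5)) -> Counter:
    """Count votes by the fixed window (aligned to midnight) in which each vote ended."""
    counts: Counter = Counter()
    for v in votes:
        midnight = v.end.replace(hour=0, minute=0, second=0, microsecond=0)
        # timedelta // timedelta gives the number of whole windows since midnight
        bucket_start = midnight + ((v.end - midnight) // window) * window
        counts[bucket_start] += 1
    return counts


def biometric_success_pct(votes: list[Vote]) -> float:
    """Percentage of votes whose voter was identified by biometrics."""
    if not votes:
        raise ValueError("no votes")
    return 100.0 * sum(v.biometric_ok for v in votes) / len(votes)
```

Keying the window counts by the window's start time keeps the result easy to sort and plot over the voting day.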

The Data

The logs from the voting machines can be downloaded directly from the open data website of the TSE (Tribunal Superior Eleitoral). This repository contains Python scripts that download and extract the logs automatically.
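The README does not list the download URLs, so this sketch covers only the extraction step and assumes the downloaded file is an ordinary .zip archive. `extract_archive` is a hypothetical name, not necessarily a function from this repository; it relies only on Python's standard `zipfile` module.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: Path, dest_dir: Path) -> list[Path]:
    """Extract every member of a .zip archive into dest_dir; return the extracted file paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Skip directory entries; assumes ordinary relative member names.
        file_names = [name for name in archive.namelist() if not name.endswith("/")]
    return [dest_dir / name for name in file_names]
```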

What are the logs of the Electronic Ballot Boxes?

They are files that record every operation performed on the machine, from the initial setup to the end of voting in the second round (if applicable). The files are plain text, with each line representing one event, as in the example below:

21/09/2022 17:21:41	INFO	67305985	LOGD	Start of logd operations	                  FDE9B0FC7A079096
21/09/2022 17:21:41	INFO	67305985	LOGD	Machine turned on on 21/09/2022 at 17:20:16	B637C17E565B039B
21/09/2022 17:21:41	INFO	67305985	SCUE	Starting application - Official - 1st round	F82E007ACCAF93A5
21/09/2022 17:21:41	INFO	67305985	SCUE	Application version: 8.26.0.0 - Jaguar	    D499E9A173814A70
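Each line appears to hold six tab-separated fields: date and time, event level, a numeric identifier, the emitting component (e.g. LOGD, SCUE), the message, and a trailing hexadecimal code. The minimal parser below splits a line on tabs and trims the padding that precedes the last field. The field names (`machine_id`, `component`, `checksum`) are assumed labels, not official TSE terminology.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEvent:
    timestamp: datetime
    level: str       # e.g. INFO
    machine_id: str  # assumed meaning of the third field
    component: str   # e.g. LOGD, SCUE
    message: str
    checksum: str    # assumed meaning of the trailing hex code


def parse_line(line: str) -> LogEvent:
    """Split one tab-separated log line into its six fields."""
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}: {line!r}")
    raw_ts, level, machine_id, component, message, checksum = fields
    return LogEvent(
        timestamp=datetime.strptime(raw_ts, "%d/%m/%Y %H:%M:%S"),  # day/month/year
        level=level,
        machine_id=machine_id,
        component=component,
        message=message,
        checksum=checksum,
    )
```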

These logs make it possible to extract a great deal of information about the electoral process. Because they are so verbose, however, they are also very large: in their original format, the log files for a single Brazilian state range from 2GB to over 50GB, and all the files combined reach 450GB! Robust processing tools and optimized file formats are therefore indispensable.
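With files this large, reading each one line by line keeps memory use roughly constant regardless of file size. The sketch below is a stdlib-only illustration of that idea, counting events per component; it does not show the DuckDB processing or the optimized file formats mentioned above, the function names are hypothetical, and the UTF-8 default encoding is an assumption about the files.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def iter_log_lines(paths: Iterable[Path], encoding: str = "utf-8") -> Iterator[str]:
    """Yield lines from each file in turn, one at a time, without loading whole files."""
    for path in paths:
        # errors="replace" keeps going if a byte does not match the assumed encoding
        with path.open("r", encoding=encoding, errors="replace") as handle:
            yield from handle


def count_events_by_component(lines: Iterable[str]) -> Counter:
    """Count events per component (4th tab-separated field), skipping malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split("\t")
        if len(fields) >= 4:
            counts[fields[3].strip()] += 1
    return counts
```

Because both functions work on iterators, they can be chained over hundreds of files while holding only one line in memory at a time.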

Note on Approximations and Errors

Processing the logs of the voting machines is not simple. Although the logs are easy to read, defining a process that perfectly isolates each vote is difficult, because many different situations can occur during voting.
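To see why a simple rule can miss votes, consider a minimal state machine that pairs a start marker with the next end marker. The marker strings `VOTE_START` and `VOTE_END` are hypothetical placeholders rather than real messages from the logs, and this is not the repository's actual logic; it only shows how an interrupted vote or an unmatched end event gets dropped.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Hypothetical placeholder messages; the real log texts differ.
VOTE_START = "Voter enabled"
VOTE_END = "Vote computed"


def isolate_votes(
    events: Iterable[tuple[datetime, str]],
) -> Iterator[tuple[datetime, datetime]]:
    """Pair each start marker with the next end marker and yield (start, end)."""
    start: Optional[datetime] = None
    for timestamp, message in events:
        if message == VOTE_START:
            # A new start discards any vote that never reached its end marker.
            start = timestamp
        elif message == VOTE_END and start is not None:
            yield start, timestamp
            start = None
        # An end marker with no open start is ignored, so that vote is lost.
```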

The scripts here aim to be as generic and simple as possible, to make them easier to understand and maintain and to reduce the computational cost of processing. As a result, they may occasionally miss some votes. Compared with the official TSE count, the error rate (votes not captured) is about 3%, in an experiment conducted with data from RN (Rio Grande do Norte).
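The error rate can be read as the share of officially counted votes that the scripts fail to isolate. A small helper makes that definition explicit; the function name is hypothetical.

```python
def uncaptured_rate(captured_votes: int, official_votes: int) -> float:
    """Fraction of officially counted votes that the log processing did not capture."""
    if official_votes <= 0:
        raise ValueError("official vote count must be positive")
    if captured_votes < 0:
        raise ValueError("captured vote count cannot be negative")
    return (official_votes - captured_votes) / official_votes
```

For illustration only, 97 captured votes against an official count of 100 gives `uncaptured_rate(97, 100) == 0.03`, i.e. 3%.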


Contributors

jaumpedro214
