weat-wefat's Introduction

WEAT

Replication of "Implementation of WEAT (and WEFAT) (Caliskan et al., 2017)."

How to use:

First, visit http://nilc.icmc.usp.br/nilc/index.php/repositorio-de-word-embeddings-do-nilc to download the pre-trained Portuguese model. The embeddings in the data folder use 300-dimensional GloVe. Download the txt file and place it in the root directory.
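For orientation, here is a minimal sketch of reading a GloVe-style txt file into memory. It assumes each line holds a word followed by its space-separated vector values, and that the file may start with a "count dimension" header line. The function name load_glove_txt and the wanted parameter are hypothetical and are not taken from main.py.

```python
import io
import numpy as np

def load_glove_txt(lines, wanted=None):
    """Parse GloVe-style text lines ("word v1 v2 ... vd") into a dict of vectors.

    `lines` is any iterable of strings (an open file works).
    `wanted` optionally restricts loading to a set of words.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 3:
            continue  # skip blank lines and a possible "count dim" header
        word, values = parts[0], parts[1:]
        if wanted is not None and word not in wanted:
            continue
        vectors[word] = np.array([float(v) for v in values], dtype=np.float32)
    return vectors

# Tiny in-memory stand-in for the downloaded file (3 dimensions instead of 300).
sample = io.StringIO("2 3\nmulher 0.1 0.2 0.3\nhomem 0.4 0.5 0.6\n")
embeddings = load_glove_txt(sample, wanted={"mulher"})
```

With the real download, the same function could be called as load_glove_txt(open(path, encoding="utf-8"), wanted=test_words), where path and test_words are placeholders for your own file name and word set.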

Install the packages listed in requirements.txt with pip install.

Then run the command:

python main.py --data_file_name DATA_FILE_NAME --embedded_data_file_name EMBEDDED_DATA_FILE_NAME --glove_file_name GLOVE_FILE_NAME --wefat_association_file_name WEFAT_ASSOCIATION_FILE_NAME --test TEST --iterations N --distribution_type DISTRIBUTION_TYPE

Where:
DATA_FILE_NAME: name of the target/attribute word files. In this example: the data/weat_...json files to run the WEAT test, or data/wefat_1.json to run the WEFAT test.
EMBEDDED_DATA_FILE_NAME: name of the file where the embeddings of the test words are stored. Provide an identifying name; the program creates the file if it does not exist, so you do not need to create it beforehand.
GLOVE_FILE_NAME: name of the GloVe txt file; required if the EMBEDDED_DATA_FILE_NAME file does not exist.
WEFAT_ASSOCIATION_FILE_NAME: file used by the WEFAT test that maps occupations to the real-world data you want to analyze. In this example, data/wefat_1_percentage_women.json uses the proportions of women in the Brazilian labor market.
TEST: WEAT or WEFAT.
N: number of iterations used to compute the p-value.
DISTRIBUTION_TYPE: type of distribution used to compute the p-value (normal or empirical).
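To illustrate how N and DISTRIBUTION_TYPE could interact, the sketch below computes a one-sided p-value from N test statistics obtained on random re-partitions of the target words, in the spirit of the permutation test described by Caliskan et al. (2017). The function p_value and its arguments are hypothetical; the actual logic in main.py may differ.

```python
import numpy as np
from statistics import NormalDist

def p_value(observed, null_samples, distribution_type="normal"):
    """One-sided p-value of `observed` against permutation statistics."""
    null_samples = np.asarray(null_samples, dtype=float)
    if distribution_type == "empirical":
        # Fraction of permuted statistics strictly greater than the observed one.
        return float(np.mean(null_samples > observed))
    if distribution_type == "normal":
        # Fit a normal distribution to the permuted statistics, take the upper tail.
        mu = null_samples.mean()
        sigma = null_samples.std(ddof=1)
        return 1.0 - NormalDist(mu, sigma).cdf(observed)
    raise ValueError(f"unknown distribution type: {distribution_type}")

# Stand-in for N = 10,000 permutation statistics (synthetic, for illustration only).
rng = np.random.default_rng(0)
null = rng.normal(0.0, 1.0, size=10_000)
p_emp = p_value(2.0, null, "empirical")
p_norm = p_value(2.0, null, "normal")
```

The empirical variant cannot report a value below 1/N, so a larger N gives finer resolution; the normal variant can extrapolate into the tail but relies on the permuted statistics being roughly normal.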

WEAT

Shows the effect size and the p-value.
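As a reference for what these numbers mean, the following sketch follows the WEAT definitions in Caliskan et al. (2017): the association of a word w with attribute sets A and B is the mean cosine similarity to A minus the mean cosine similarity to B, and the effect size is the difference of the mean associations of the target sets X and Y, divided by the standard deviation over all target words. The function names are hypothetical, the toy 2-dimensional vectors are made up, and the use of the sample standard deviation (ddof=1) is an assumption.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_test_statistic(X, Y, A, B):
    """s(X, Y, A, B): summed association of X minus summed association of Y."""
    return sum(association(x, A, B) for x in X) - sum(association(y, A, B) for y in Y)

def weat_effect_size(X, Y, A, B):
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Sample standard deviation (ddof=1) is an assumption of this sketch.
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)

# Toy vectors: X lies close to A, Y lies close to B.
X = [np.array([1.0, 0.1]), np.array([1.0, -0.1])]
Y = [np.array([0.1, 1.0]), np.array([-0.1, 1.0])]
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]

effect = weat_effect_size(X, Y, A, B)
stat = weat_test_statistic(X, Y, A, B)
```

A positive effect size means X is more strongly associated with A than Y is; swapping X and Y flips the sign.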

WEFAT

If wefat_association_file_name was specified, shows a plot of the test statistic against the real-world statistics, together with the Pearson correlation coefficient. Otherwise, shows the effect size and the p-value for each attribute word.
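The sketch below illustrates the idea with the WEFAT statistic from Caliskan et al. (2017): for each word, the difference of mean cosine similarities to A and B is normalized by the standard deviation of its similarities to all attribute words, and the resulting scores are correlated with real-world values. All names, vectors, and percentages here are made up for illustration, and ddof=1 is an assumption; the repository's own code and data files may be organized differently.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def wefat_statistic(w, A, B):
    """Normalized association of word vector w with attribute sets A and B."""
    cos_a = [cosine(w, a) for a in A]
    cos_b = [cosine(w, b) for b in B]
    # Sample standard deviation (ddof=1) is an assumption of this sketch.
    return (np.mean(cos_a) - np.mean(cos_b)) / np.std(cos_a + cos_b, ddof=1)

# Toy 2-dimensional attribute sets (hypothetical).
A = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
B = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]

# Hypothetical occupation vectors and made-up real-world percentages.
occupations = {
    "occ_1": np.array([1.0, 0.2]),
    "occ_2": np.array([0.5, 0.5]),
    "occ_3": np.array([0.2, 1.0]),
}
percentage_women = {"occ_1": 80.0, "occ_2": 50.0, "occ_3": 20.0}

words = sorted(occupations)
stats = [wefat_statistic(occupations[w], A, B) for w in words]
real = [percentage_women[w] for w in words]
pearson_r = float(np.corrcoef(stats, real)[0, 1])
```

A Pearson coefficient near 1 or -1 indicates that the embedding associations track the real-world values closely; the toy data above is constructed to be almost perfectly linear.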

References

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186. https://doi.org/10.1126/science.aal4230

Pennington, J. (2014). GloVe: Global Vectors for Word Representation. Stanford.edu. https://nlp.stanford.edu/projects/glove/

Guide to Using Pre-trained Word Embeddings in NLP. (2021, June 1). Paperspace Blog. https://blog.paperspace.com/pre-trained-word-embeddings-natural-language-processing/#using-glove-word-embeddings

U.S. Bureau of Labor Statistics. (2019, January 18). Employed persons by detailed occupation, sex, race, and Hispanic or Latino ethnicity. Bls.gov. https://www.bls.gov/cps/cpsaat11.htm

Toney-Wails, A., & Caliskan, A. (2021). ValNorm Quantifies Semantics to Reveal Consistent Valence Biases Across Languages and Over Centuries. ArXiv:2006.03950 [Cs]. https://arxiv.org/abs/2006.03950

Caliskan, A. (2017). Replication Data for: WEFAT and WEAT(UNF:6:C9yaa+UeGfFmtuHz734iKw==, V2). [Data set]. Harvard Dataverse. https://doi.org/10.7910/DVN/DX4VWP

Contributors

e-mckinnie, nandayot
