shimo-lab / universal-geometry-with-ica

Discovering Universal Geometry in Embeddings with ICA

Home Page: https://aclanthology.org/2023.emnlp-main.283/

Dockerfile 0.35% Shell 0.62% Python 99.03%
cross-lingual embeddings ica independent-component-analysis interpretability pca principal-component-analysis whitening isotropy emnlp

Universal-Geometry-with-ICA

Discovering Universal Geometry in Embeddings with ICA
Hiroaki Yamagiwa*, Momose Oyama*, Hidetoshi Shimodaira
EMNLP 2023

English word embeddings

Figure: heatmap of ICA-transformed word embeddings.

Cross-lingual embeddings

Figure: heatmaps of ICA-transformed word embeddings.

Spiky shape of embedding distributions

Figure: spiky shape of the ICA-transformed embedding distributions.

Figure: scatter plots of ICA-transformed word embeddings, one panel per language: English, Spanish, Russian, Arabic, Hindi, Chinese, and Japanese.

Code and Data

  • The code for English embeddings is currently being prepared.
  • For cross-lingual embeddings, dynamic embeddings, and image model embeddings, please refer to the universal/ directory.

Citation

If you find our code or data useful in your research, please cite our paper:

@inproceedings{DBLP:conf/emnlp/YamagiwaOS23,
  author       = {Hiroaki Yamagiwa and
                  Momose Oyama and
                  Hidetoshi Shimodaira},
  editor       = {Houda Bouamor and
                  Juan Pino and
                  Kalika Bali},
  title        = {Discovering Universal Geometry in Embeddings with {ICA}},
  booktitle    = {Proceedings of the 2023 Conference on Empirical Methods in Natural
                  Language Processing, {EMNLP} 2023, Singapore, December 6-10, 2023},
  pages        = {4647--4675},
  publisher    = {Association for Computational Linguistics},
  year         = {2023},
  url          = {https://aclanthology.org/2023.emnlp-main.283},
  timestamp    = {Wed, 13 Dec 2023 17:20:20 +0100},
  biburl       = {https://dblp.org/rec/conf/emnlp/YamagiwaOS23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

Contributors

shimosan, sun-jacobi, ymgw55

Issues

Why ica.mixing_?

https://github.com/shimo-lab/Universal-Geometry-with-ICA/blob/92a1c4fd628f2c9457df710b461370fa1ecdcc65/universal/src/crosslingual_save_pca_and_ica_embeddings.py#L254C1-L257C30

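from sklearn.decomposition import FastICA

# ica_params and pca_embed (the PCA-transformed embeddings) are defined earlier in the script
ica = FastICA(**ica_params)
ica.fit(pca_embed)
R = ica.mixing_  # scikit-learn sets mixing_ to the pseudo-inverse of components_
ica_embed = pca_embed @ R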

Hi, authors. Thank you for your work. I would like to ask why "ica.mixing_" is used here rather than "ica.components_". I think pca_embed @ ica.components_ transforms the original data space into a more independent one and is equal to "ica.fit_transform", but I am not sure whether I have read the sklearn guidance correctly.
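The question turns on how scikit-learn's FastICA defines components_ (the unmixing matrix W), mixing_ (its pseudo-inverse), and transform (which computes X @ components_.T). The sketch below checks these relationships under stated assumptions: toy Laplace-distributed data stands in for real embeddings, PCA whitening is done by hand, and FastICA is run with whiten=False and algorithm="parallel" on the already-whitened input. The repository's actual ica_params are not shown in the issue, so whiten=False is an assumption, and the helper check_ica_mixing is hypothetical.

```python
import numpy as np
from sklearn.decomposition import FastICA


def check_ica_mixing(seed: int = 0) -> dict:
    """Compare pca_embed @ mixing_ with FastICA's own transform (toy data)."""
    rng = np.random.default_rng(seed)

    # Hypothetical toy data: 3 independent heavy-tailed (Laplace) sources
    # mixed by a random matrix, standing in for real embeddings.
    n_samples, dim = 2000, 3
    sources = rng.laplace(size=(n_samples, dim))
    X = sources @ rng.normal(size=(dim, dim)).T

    # PCA whitening: center, rotate onto the covariance eigenvectors, and
    # rescale each axis to unit variance (assumed analogue of pca_embed).
    X_centered = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X_centered, rowvar=False))
    pca_embed = X_centered @ eigvecs / np.sqrt(eigvals)

    # Input is already whitened, so FastICA's own whitening is disabled
    # (an assumption; the repository's ica_params are not shown here).
    ica = FastICA(whiten=False, algorithm="parallel", max_iter=1000,
                  tol=1e-6, random_state=seed)
    S = ica.fit_transform(pca_embed)

    W = ica.components_  # unmixing matrix, shape (n_components, n_features)
    R = ica.mixing_      # pseudo-inverse of W, shape (n_features, n_components)

    return {
        # With whiten=False, the sources are pca_embed @ components_.T.
        "sources_use_components_T": bool(np.allclose(S, pca_embed @ W.T)),
        # Symmetric decorrelation leaves W orthogonal, so pinv(W) == W.T ...
        "W_orthogonal": bool(np.allclose(W @ W.T, np.eye(dim), atol=1e-6)),
        "mixing_equals_components_T": bool(np.allclose(R, W.T, atol=1e-6)),
        # ... hence pca_embed @ mixing_ reproduces fit_transform's output.
        "mixing_reproduces_sources": bool(np.allclose(pca_embed @ R, S, atol=1e-6)),
        # Without the transpose, components_ generally gives something else.
        "components_without_T_matches": bool(np.allclose(pca_embed @ W, S, atol=1e-6)),
    }


if __name__ == "__main__":
    for name, ok in check_ica_mixing().items():
        print(f"{name}: {ok}")
```

Under these assumptions, mixing_ equals components_.T because the unmixing matrix is orthogonal on whitened input, so pca_embed @ ica.mixing_ matches ica.fit_transform(pca_embed), while pca_embed @ ica.components_ (without the transpose) generally does not. If FastICA performed its own whitening, components_ would fold in the whitening matrix and this equivalence would not hold in general.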
