xenoash / coen691-gender-neutralizer

A simple RESTful web service that converts gender-specific text into gender-neutral text

Python 67.68% HTML 24.76% CSS 5.64% JavaScript 1.92%
django python27 rest beautifulsoup nytimes-apis cloud

gender-neutralizer

Gender Neutralizer is a simple RESTful web service that converts gender-specific text into gender-neutral text. A more advanced version of the tool stores the gender dictionary in a NoSQL datastore instead of a local file; it was tested on the Google App Engine (GAE) standard environment with Python 2.7, using the django-nonrel package for the Google Datastore implementation.

The tool is mashed up with the NYT search API, which it uses to find NYT articles and convert their text into gender-neutral text. However, the built-in neutralize function works with essentially any web page whose address is inserted into the URL, following this pattern:

/view/[put url that starts with http://]/

The tool also provides RESTful API functionality: it takes user input from the URL and generates JSON data that includes the user input, the generated gender-neutral output, the gender-specific words found, and their gender-neutral equivalents.

User input can be supplied with the following URL patterns after the domain name, to receive HTML or JSON-formatted data respectively:
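As a rough illustration of the JSON data described above, the sketch below assembles a response from the four pieces of information the README lists. The function name `build_response` and the key names (`input`, `output`, `gender_specific`, `gender_neutral`) are hypothetical; the README does not state the actual field names. The sketch is written for Python 3, whereas the project itself targets Python 2.7.

```python
import json


def build_response(user_input, output, replacements):
    """Return a JSON string describing one neutralization.

    `replacements` is a list of (gender_specific, gender_neutral) pairs.
    All key names below are assumptions, not the service's real schema.
    """
    payload = {
        "input": user_input,
        "output": output,
        "gender_specific": [specific for specific, _ in replacements],
        "gender_neutral": [neutral for _, neutral in replacements],
    }
    # sort_keys gives a stable key order, which makes responses easy to compare.
    return json.dumps(payload, sort_keys=True)


print(build_response("the fireman", "the firefighter", [("fireman", "firefighter")]))
```

Keeping the two word lists in the same order lets a client pair each gender-specific word with its neutral equivalent by index.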

/view/[put userInput here with spaces]

/view/[put userInput here with spaces]/json
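The two URL patterns above could be built from a client along the following lines. This is a minimal sketch: `build_view_url` is a hypothetical helper, `http://localhost:8000` is a placeholder domain, and percent-encoding the spaces is an assumption (browsers do this automatically when a URL with spaces is typed in). It is written for Python 3, whereas the project itself targets Python 2.7.

```python
from urllib.parse import quote


def build_view_url(base_url, user_input, as_json=False):
    """Return the /view/ URL for `user_input`, optionally requesting JSON."""
    # Percent-encode spaces and other unsafe characters in the path segment.
    path = "/view/" + quote(user_input, safe="")
    if as_json:
        path += "/json"
    return base_url.rstrip("/") + path


print(build_view_url("http://localhost:8000", "the fireman arrived"))
# http://localhost:8000/view/the%20fireman%20arrived
print(build_view_url("http://localhost:8000", "the fireman arrived", as_json=True))
# http://localhost:8000/view/the%20fireman%20arrived/json
```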

Additionally, the service lets users upload text documents and converts their text into gender-neutral form.

Implementation:

The main app contains the following functions, which implement the main functionality of the web service.

extract function: This function extracts the text of the NYT article chosen by the user, or the file content in the case of an uploaded file. To extract the text from NYT articles, we used a Python package called Beautiful Soup. First, we collect the HTML content of the article's page by requesting its URL.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

Next, we decompose any returned tags that we do not need, such as headers, footers, advertisements, and scripts.

# Remove unneeded tags ("..." marks further tag names not listed here).
for tag in soup(['script', 'header', 'nav', 'span', ...]):
    tag.decompose()

After this step, mainly the <p> and <a> HTML tags that hold the article content remain, and we extract their raw text with Beautiful Soup's get_text() function.

text = soup.get_text()

neutralize function: This function contains the project's main algorithm. In this simple version of the tool, the neutralize algorithm uses a dictionary text file stored in the project directory (settings.PROJECT_ROOT) to gender-neutralize any text. The dictionary can be modified at any time by updating or deleting its entries. We read the dictionary file with this code snippet:

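import os
from django.conf import settings

# Build the gender-specific -> gender-neutral lookup from dictionary.txt,
# where each line has the form "specific,neutral".
gender_to_neutral = {}
with open(os.path.join(settings.PROJECT_ROOT, "dictionary.txt")) as f:
    for line in f:
        line = line.strip()  # drop the trailing newline and stray whitespace
        if not line:
            continue  # skip blank lines
        words = line.split(",")
        gender_to_neutral[words[0]] = words[1]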

Once we have this dictionary, we neutralize the text chosen from NYT or uploaded by the user:

ntext = replace_all(text, gender_to_neutral)
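The README does not show the body of replace_all. The sketch below is one plausible way to write it, assuming whole-word, case-insensitive matching that preserves a leading capital letter; the project's actual implementation may differ. The sample dictionary entries are illustrative only and are not taken from the project's dictionary.txt.

```python
import re


def replace_all(text, mapping):
    """Replace each whole-word key of `mapping` found in `text` with its value."""
    if not mapping:
        return text
    # Try longer keys first so they win over shorter keys at the same position.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in mapping.items()}

    def swap(match):
        word = match.group(0)
        neutral = lookup[word.lower()]
        # Preserve a leading capital, e.g. "Chairman" -> "Chairperson".
        if word[0].isupper():
            return neutral[:1].upper() + neutral[1:]
        return neutral

    return pattern.sub(swap, text)


# Illustrative entries only.
sample = {"chairman": "chairperson", "fireman": "firefighter"}
print(replace_all("The Chairman thanked the fireman.", sample))
# The Chairperson thanked the firefighter.
```

Matching on word boundaries keeps the replacement from touching longer words that merely contain a dictionary entry, which a plain str.replace loop would rewrite.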
