ICPSR

This repo supports the ICPSR NLP & Text Mining 2024 Summer Topic Workshop.

Software requirements

This course requires R and RStudio, both freely available.

Additionally, on day 3 we will explore programmatic requests to "local LLMs". This requires LM Studio. However, students with Intel Macs cannot use this software; only Apple Silicon Macs (M1, M2, M3), Windows and Linux machines are supported. Students with this issue should install Ollama as an alternative. If you use Ollama instead of LM Studio, you will have to adjust our class code based on the example below and possibly install a user interface such as webui.

Students with older computers, old GPUs or a small amount of RAM may not be able to execute this portion of the lesson.

For students needing Ollama (less preferred), download it then perform the tasks at the end of this readme.

Lessons

Day1:

  • R Setup & Logistics
  • What are NLP, git, R syntax and RStudio?
  • Preprocessing steps: string manipulation, term frequency, Chapter 1

Day2:

  • Bag of Words DTM/TDM
  • Visualizations: wordclouds, histograms, pyramid plots, word networks, dendrograms, associations
  • “Homework” HW1-Basics of R Coding

Day3:

  • Basic sentiment analysis with lexicons
  • Basic document clustering
  • LLM Basics, prompting & (time permitting) prompt chains/agentic workflows

Packages for R

R is customized for specific functions using libraries, also called packages. In this class we will use the packages listed below. Once you have R and RStudio installed, run the following commands in your console. Don't worry if you struggle: on day 1 we will set aside time to help, though we can't provide full technical support.

# Install library pacman
install.packages('pacman')

# Use pacman to install (if needed) and load the other libraries
pacman::p_load(dplyr, ggplot2, ggthemes, igraph, networkD3, qdapRegex, slam, stringi, stringr, tm)
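To confirm the installation worked, a quick check along these lines may help. This is only a sketch using base R's `requireNamespace()`; the variable names `pkgs`, `installed` and `missing_pkgs` are illustrative and not part of the class code.

```r
# Packages listed in this README
pkgs <- c("dplyr", "ggplot2", "ggthemes", "igraph", "networkD3",
          "qdapRegex", "slam", "stringi", "stringr", "tm")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs attention
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) == 0) {
  message("All packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as missing, rerunning the `pacman::p_load()` command above is a reasonable first step.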

Please install LM Studio. If you have to install Ollama instead, instructions are below.

For students unable to use LM Studio, here are some setup and testing instructions for Ollama.

  1. In terminal

run ollama

  2. Install a small LLM for testing. This takes a few minutes.

ollama run gemma:2b

  3. You will see a prompt in your terminal like this. You can ask a simple question, such as "What is the capital of France?", in the terminal.

>>> Send a message (/? for help)

>>> What is the capital of France?

  4. Next, let's perform a programmatic request while Ollama is running. Open R and try this code. If you get a response in R, your instance is working as intended. You will have to adjust our class code to fit this example API request, which is slightly different. For additional help, this is a great site to help convert cURL requests to multiple languages.

# Libraries
library(httr)
library(jsonlite)

# Inputs
prompt <- "What is the capital of France?" 

# API call inputs
headers <- c(`Content-Type` = "application/json") 
data    <- list(model = "gemma:2b", # Be sure to change to the model name you're using
                prompt = prompt)

# API Request
res <- httr::POST(
  url = "http://localhost:11434/api/generate", 
  httr::add_headers(.headers=headers), 
  body = jsonlite::toJSON(data, auto_unbox = TRUE), 
  encode = "json")

# Parse the streamed response (one JSON object per line)
llmResponse <- httr::content(res,as = "text", encoding = "UTF-8")
llmResponse <- strsplit(llmResponse, "\n")[[1]]
llmResponse <- lapply(llmResponse, fromJSON)
llmResponse <- paste(unlist(lapply(llmResponse, '[', 'response')), collapse = '')
llmResponse

  5. To exit the Ollama prompt in terminal, run this command. The API will still be running, so you can still execute step 4.

>>> /bye

  6. To stop Ollama, run this command in terminal. Then, in the upper toolbar, there is a llama icon. Click the icon and stop Ollama from running.

brew services stop ollama

If that doesn't work, try running this in terminal. It will return a number (the process ID).

pgrep ollama

To kill that process, take the number returned and use it in this command in terminal (74877 is only an example).

kill 74877

  7. To fully uninstall and remove Ollama, use this site, though commands for Mac are slightly different and are cited here.
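When adapting class code to Ollama, it can help to wrap the request from step 4 in small functions. The sketch below is one possible approach, not the class's own code: the helper names `build_ollama_body()`, `parse_ollama_text()` and `ask_ollama()` are hypothetical. It assumes Ollama is listening on the same `http://localhost:11434/api/generate` URL used above, and it sets `stream = FALSE` so that Ollama replies with a single JSON object rather than one object per line. The parser handles both reply styles.

```r
# Sketch only: the helper names below are hypothetical, not from the class code.
library(jsonlite)

# Build the JSON body for Ollama's /api/generate endpoint.
# stream = FALSE asks for one JSON object instead of one per line.
build_ollama_body <- function(prompt, model = "gemma:2b", stream = FALSE) {
  toJSON(list(model = model, prompt = prompt, stream = stream),
         auto_unbox = TRUE)
}

# Turn the raw response text into a single string.
# Works for streamed (newline-delimited) and non-streamed replies.
parse_ollama_text <- function(txt) {
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(trimws(lines))]  # drop blank lines
  chunks <- vapply(lines, function(x) {
    piece <- fromJSON(x)$response
    if (is.null(piece)) "" else piece
  }, character(1), USE.NAMES = FALSE)
  paste(chunks, collapse = "")
}

# Send a prompt to a locally running Ollama server (needs the httr package).
ask_ollama <- function(prompt, model = "gemma:2b",
                       url = "http://localhost:11434/api/generate") {
  res <- httr::POST(url,
                    httr::content_type_json(),
                    body = build_ollama_body(prompt, model))
  httr::stop_for_status(res)  # raise an R error on HTTP failures
  parse_ollama_text(httr::content(res, as = "text", encoding = "UTF-8"))
}
```

With Ollama running, `ask_ollama("What is the capital of France?")` should return the model's answer as one character string. Be sure to change `model` to the model name you're using.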

Contributors

kwartler