BALM makes it easy to process batches of documents with large language models. Features include:
- Accessible user interface without coding.
- Support for zero-shot and few-shot learning.
- Easy comparison of multiple models on the same task.
- Automatic result aggregation and visualization.
The following installation instructions have been tested with Python 3.10 on Ubuntu Server 22.04.
- Download this repository, e.g., by executing `git clone https://github.com/itrummer/balm`.
- Make sure that pip is installed by running `pip --version`. If you get an error message, install pip via `sudo apt-get update` followed by `sudo apt install python3-pip`.
- Change into the `balm` directory and install the requirements: `cd balm`, then `sudo pip install -r requirements.txt`.
From the `balm` directory, execute `./start.sh`.
You should see the message "You can now view your Streamlit app in your browser.", followed by a Network URL and an External URL. Enter the Network URL to access a local BALM installation and the External URL to access a remote BALM server. If using BALM remotely, make sure that port 8501 is reachable. E.g., when running BALM on an Amazon EC2 instance, change the Inbound Rules by adding a custom TCP rule for port 8501.
We will introduce the BALM interface with an example scenario. You can find the example data here: it is a .csv file containing 100 movie reviews in the first column. In the following, we will use language models to map each review to a sentiment (positive or negative). This example uses language models by OpenAI and requires a corresponding account (see here).
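Any input works as long as it is a .csv file whose first column holds the text to process. The sketch below (file name and reviews are invented for illustration, not part of the example data) creates a compatible input file:

```python
import csv

# Write a tiny input file in the format BALM expects:
# one document per row, text in the first column (index 0).
rows = [
    ["An absolute masterpiece, I loved every minute."],
    ["Two hours of my life I will never get back."],
]
with open("reviews.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Read it back to confirm the first column holds the review text.
with open("reviews.csv", newline="") as f:
    first_column = [row[0] for row in csv.reader(f)]
print(first_column[0])
```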
- Open the BALM interface in your Web browser (this example was tested on Google Chrome but most browsers will work).
- Click on the `Credentials` box and copy your OpenAI API key into the corresponding field (see here).
- Click on the `Models` box. Leave the default (1) for the number of models and select a model, e.g., gpt-3.5-turbo.
- Enter a task description in the prompt field. For instance: `Is the sentiment positive (Yes/No)?`
- Optionally, specify examples to increase output quality (few-shot learning). Click on the `Examples` box, choose the number of examples, then enter example input and output. E.g.:
  - Example input: `This movie was really bad.`
  - Example output: `No`
- Select CSV for the input type (the default), then click on `Browse files` and select the input file you previously downloaded.
- Movie reviews are stored in the first column, so use 0 (the default) for the column index (columns are counted starting from zero).
- Optionally, restrict the number of reviews to process by checking the `Limit rows` checkbox and setting a maximum number of rows.
- Click on the `Process Data` button to start processing.
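Conceptually, a few-shot request like the one configured above boils down to sending the task description, the example pairs, and the current document to a chat model. The sketch below illustrates that idea only; the function name and structure are assumptions, not BALM's actual code:

```python
def build_messages(task, examples, document):
    """Assemble a few-shot chat prompt: the task description,
    example input/output pairs, then the document to process."""
    messages = [{"role": "system", "content": task}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": document})
    return messages

msgs = build_messages(
    "Is the sentiment positive (Yes/No)?",
    [("This movie was really bad.", "No")],
    "I enjoyed this film a lot.",
)
print(len(msgs))  # 4: task, example input, example output, document
```

With zero examples, the same assembly yields a zero-shot prompt consisting only of the task description and the document.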
You will see results in the result table as they become available from the language model. After processing all input, BALM automatically generates several aggregate statistics. Click on the `Output Distribution` box to obtain a visualization showing how often the language model produced each specific output. The `Model Agreement` box is only interesting if multiple models are applied to the same data (see the following section). Finally, you can download the results as a .csv file by clicking on the `Download Results` button. Note that this will erase the current results and reset the interface.
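The output distribution is essentially a frequency count over the distinct outputs the model produced. A minimal sketch of that aggregation (the sample outputs are invented):

```python
from collections import Counter

# Hypothetical model outputs for five reviews.
outputs = ["Yes", "No", "Yes", "Yes", "No"]

# Count how often each distinct output was produced.
counts = Counter(outputs)
print(counts["Yes"], counts["No"])  # 3 2
```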
OpenAI and other providers offer language models in different sizes. Using smaller models is often hundreds of times cheaper (per token) than using large models like GPT-4. To avoid overpaying, it is good practice to compare the output of different models on a data sample before processing a large batch.
BALM makes comparing models easy:
- Continuing with the previous example, click on the `Models` box and increase the number of models to two. Select a cheap model like `Ada` in addition to `gpt-3.5-turbo`.
- Select the same .csv file as before and check the `Limit rows` checkbox to limit the number of rows, e.g., to ten. Then click on `Process Data` again to start processing.
The result table now contains one column for each of the selected models (the column header is the model name). In each column, you find the output generated by the corresponding model.
After processing, click on the `Model Agreement` box to see aggregate statistics on output consistency between models. The section contains a table with rows and columns labeled by model names. In each cell, you find the ratio of input documents for which the two models produced exactly the same output. If this ratio is close to one, both models can be used interchangeably; in that case, select the cheapest of the equivalent models.
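The agreement ratio in each cell amounts to the fraction of documents on which two models returned exactly the same output. A minimal sketch (the output lists are invented for illustration):

```python
def agreement(outputs_a, outputs_b):
    """Fraction of documents for which both models
    produced exactly the same output."""
    matches = sum(a == b for a, b in zip(outputs_a, outputs_b))
    return matches / len(outputs_a)

# Two models agree on three out of four documents.
ratio = agreement(["Yes", "No", "Yes", "Yes"], ["Yes", "No", "No", "Yes"])
print(ratio)  # 0.75
```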
You can add new models by changing the `models.json` file in the `configuration` folder. Each entry maps a model label (shown in the model selection drop-down menu) to a model description. This description is a dictionary with the following keys:
- `name`: the model name assigned by the provider (which may differ from the label).
- `provider`: currently, this has to be set to "OpenAI" (support for other providers is coming soon).
- `type`: either "chat" for chat models or "default".
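Putting these keys together, a hypothetical entry (the label is arbitrary; the model name must match what the provider expects) could look like:

```json
{
  "GPT-3.5 Turbo": {
    "name": "gpt-3.5-turbo",
    "provider": "OpenAI",
    "type": "chat"
  }
}
```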
By default, BALM restricts the size of input files to 10 MB. You can change that limit in the file `.streamlit/config.toml`.
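Assuming BALM relies on Streamlit's standard `server.maxUploadSize` option (specified in megabytes), raising the limit to, e.g., 50 MB would look like this in `.streamlit/config.toml`:

```toml
[server]
maxUploadSize = 50
```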
Please cite the following paper to refer to BALM:
@article{Trummer2023balm,
  author = {Trummer, Immanuel},
  journal = {CoRR},
  title = {{BALM: Batch Analysis with Language Models}},
  year = {2023}
}