spartypkp / legalai

LegalAI is a passion project which explores and simplifies the complexities of obtaining legal information using LLMs.

Python 17.68% Jupyter Notebook 55.21% HTML 26.53% CSS 0.58%
embeddings-similarity legal legal-analytics-and-data-science openai postgresql psycopg2 python3 scraping-python


legalAI (Name subject to change)

created by Will Diamond

1. What is legalAI?

LegalAI is a passion project that explores applications of novel generative AI techniques to simplify the complexities of obtaining accurate legal information. I have created an AI program called Abe that can answer complex legal questions from users. Abe will take in a user query, search for all relevant legal documentation, and return an in-depth answer to the user. Below is an example question answered by Abe.

Example Question: Can I smoke cannabis?

Abe's Answer:

Understanding the Legality of Smoking Cannabis

TLDR

According to Cal. HSC § 11362.1, individuals aged 21 and older can smoke cannabis under specific conditions. However, applicable laws place numerous restrictions on its use, discussed below.

Rights Granted To Users

Possessing, Processing, and Transporting Cannabis

General Possession:

According to Cal. HSC § 11362.1, you can possess, process, transport, purchase, or give away up to 28.5 grams of cannabis, and up to eight grams in its concentrated form under state and local law.

Growing Your Own Cannabis:

Cal. HSC § 11362.1 also permits possessing, planting, cultivating, harvesting, drying, or processing up to six living cannabis plants and possessing the product of these plants.

Using Cannabis Accessories:

Cal. HSC § 11362.1 also gives rights to individuals over 21 to possess, transport, purchase, obtain, use, manufacture, or give cannabis accessories to individuals 21 years of age or older.

Consuming Cannabis

Cannabis Consumption:

Cannabis and cannabis products can be consumed under Cal. HSC § 11362.1.

Nonmedical Consumption:

According to Cal. CIV § 1550.5, users over 21 years can consume nonmedical marijuana if they comply with California law, local standards, and regulations.

Cannabis Usage Restrictions

Restrictions on Public Consumption

Prohibited Public Spaces:

Cal. HSC § 11362.3 & § 11362.45 restrict smoking or ingesting cannabis or cannabis products in public places, in places where tobacco smoking is prohibited, within facilities or institutions, and on state or local government property.

Restricted Areas Near Children's Centers:

According to Cal. HSC § 11362.3, cannabis cannot be consumed within 1,000 feet of a school, day care center, or youth center while children are present.

Restrictions on Operating Vehicles

While Operating a Vehicle:

Cal. HSC § 11362.3 & Cal. HSC § 11362.45 prohibit smoking or ingesting cannabis or cannabis products while driving a vehicle, boat, vessel, or aircraft. A user also cannot possess an open container or package of cannabis or cannabis products while operating a vehicle.

Workplace Restrictions

At Work:

As per Cal. HSC § 11362.45 & Cal. GOV § 12954, employers have the right to maintain a drug and alcohol-free workplace, and they can prohibit the use of cannabis by employees and prospective employees.

On Job Sites:

According to Cal. HSC § 11362.785, a user cannot smoke cannabis on the job or on the premises of their place of employment during work hours.

Restrictions on Cultivation

Personal Cultivation:

Per Cal. HSC § 11362.2, restrictions apply to cultivating cannabis within a person's private residence. Such cultivation must be done in a locked space not visible from a public place.

Other Restrictions

Cannabis Use in Healthcare Facilities:

As per Cal. HSC § 1649.2, you cannot smoke or vape cannabis in a healthcare facility.

Underage Restrictions:

According to Cal. BPC § 26140, no one under 21 years of age can purchase or possess cannabis or cannabis products.

Exceptions to Restrictions

Medical Usage

Medical Exception:

Cal. HSC § 49414.1 allows a parent or guardian of a qualified patient to administer medicinal cannabis at a schoolsite, following certain conditions. Cal. HSC § 11362.765 lets qualified patients or their caregivers transport, process, administer, deliver, or give medicinal cannabis.

During Probation or Bail:

As per Cal. HSC § 11362.795, a user who qualifies for medicinal cannabis use can request the court to confirm their right to use cannabis while on probation, bail, or released on parole.

Local Jurisdictional Regulation

Local Rules

According to Cal. BPC § 26200 & Cal. HSC § 11362.83, local jurisdictions can enact ordinances regulating the location, operation, or establishment of a medicinal cannabis cooperative or collective. Local jurisdictions can also adopt rules to regulate businesses licensed under this division. According to Cal. BPC § 26322, local jurisdictions cannot prohibit the purchase and delivery of medicinal cannabis, but can enforce reasonable regulations related to zoning requirements, security, public health and safety, licensing, and taxes.

User Discrimination Protection

Employment Discrimination

Cal. GOV § 12954 protects users from discrimination in hiring, termination, or any term or condition of employment based on their cannabis use.

Cannabis Testing

Quality Control:

Under Cal. BPC § 26104, users may have cannabis products tested for quality control purposes by a licensed testing laboratory.

Accommodating Medical Needs

Medical Needs Exception:

Cal. HSC § 11362.77 states that if a physician recommends that the standard amount does not meet the patient's needs, the patient may possess an amount of cannabis consistent with those needs. Local counties may also have guidelines allowing patients or primary caregivers to exceed the state limits.

Project Goals

  1. Allow for smarter legal information search that goes beyond a simple answer to the question: "Is X legal?". Given a simple legal topic or question, legalAI will be able to provide all useful information about the topic, generated by GPT-4 in the format of a "Universal Question Answer". These universal questions are outlined in the universal answer document and will eliminate the need for follow-up queries for basic information. In one "run" of the program, a user should be able to ask a simple question and receive an answer that includes everything they want to know (even if they did not directly ask for it).
  2. Simplify the surprisingly difficult task of searching for legal information. Answers to complex legal questions about very specific topics can be hard to find on the internet, and combing through the legal code is no easy task (trust me). LegalAI strives to provide accurate legal information to anyone regardless of their legal knowledge or technological ability.
  3. Show exact source text and use citations when answering a complex legal question. It's important to me that our process for answering legal questions is transparent and that we show exactly where in official legislation our answers come from. Answers that incorporate multiple ideas from different sections will cite those sections in-line. After each answer, we will provide links and references to the exact text and legislation cited, so users can read it for themselves if they wish.
  4. Remove the barriers to accessing legal information. Whether or not they are aware of it, citizens are governed by a multitude of laws that can be difficult to comprehend. I believe it's important for everyone to be able to find out exactly which laws, statutes, regulations, or legislation apply to them, wherever they may be.

The ultimate goal of the current iteration of legalAI is to create an AI chatbot capable of conversing with a user, summarizing and extracting relevant text, and answering all legal questions with legal information to the best of its ability. All source material is scraped directly from the official California Legislature Legal Code. Exact text can be provided, along with citations and instructions on how to find the information in the official documentation itself. In uncertain times it's difficult to fully understand all your rights as a citizen of the United States. LegalAI's goal is to be a tool that gives everyday citizens accurate legal information easily and quickly, with the ability to answer questions about every official piece of legislation that affects them.

2. What is legalAI not?

LegalAI is NOT a replacement for a licensed legal professional.

LegalAI is NOT intended or designed to do the following:

  1. Provide any legal advice.
  2. Give recommendations or instructions on a legal course of action.
  3. Be used in a legal defense.
  4. Replace a human lawyer, attorney, or licensed legal professional.
  5. Give advice on ANYTHING to do or say in a legal court of law.

LegalAI is simply intended as a tool to provide legal information, nothing more. It is currently at the proof-of-concept stage, and I am excited to keep working on it and advance the project toward its goals.

3. Project Status

LegalAI is still in early development. So far, the entirety of the California Legal Code has been scraped from the official .gov website (https://leginfo.legislature.ca.gov/faces/codes.xhtml). Using Python, the data is read in and cleaned, and features are extracted from the raw text: definitions, addendums, and section titles are each extracted and stored in their own columns. Each row in the database corresponds to a distinct section of the California Legal Code.

Sections can be considered the "leaf nodes" of the legal code tree; as a general rule, they are the smallest divisible pieces of text. Codes sit at the top level of the tree, followed by divisions/titles/.../chapters, all of which are "parent" sections of a given section. Together, the values of all parent sections serve as a useful positional identifier for a given leaf-node section. After cleaning and extraction, rows are inserted into a PostgreSQL table with the Python package psycopg2. Below is an example row containing the first section of the California Constitution.
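As an illustration of the cleaning step, separating a section's body from its trailing history note (the addendum, e.g. "(added Nov. 5, 1974...") could look roughly like the sketch below. It assumes the addendum is the last top-level parenthetical in the section text and contains a word such as "added" or "amended"; the function name `split_addendum` and the keyword list are hypothetical, not the project's actual code.

```python
import re
from typing import Optional, Tuple

# Hypothetical keyword list: history notes typically say a section was
# added, amended, repealed, or renumbered.
_HISTORY_WORDS = re.compile(r"\b(added|amended|repealed|renumbered)\b", re.IGNORECASE)


def split_addendum(raw_text: str) -> Tuple[str, Optional[str]]:
    """Split raw section text into (body, addendum).

    Assumes the addendum is the final top-level parenthetical. Nested
    parentheses inside the note, such as a bill number "(SB 94)", are
    handled by tracking depth. Returns (text, None) if no note is found.
    """
    text = raw_text.rstrip()
    if not text.endswith(")"):
        return text, None

    depth = 0
    # Walk backwards from the final ")" to find its matching "(".
    for i in range(len(text) - 1, -1, -1):
        if text[i] == ")":
            depth += 1
        elif text[i] == "(":
            depth -= 1
            if depth == 0:
                candidate = text[i:]
                if _HISTORY_WORDS.search(candidate):
                    return text[:i].rstrip(), candidate
                # The final parenthetical is ordinary text, e.g. "subdivision (a)".
                return text, None
    # Unbalanced parentheses: leave the text untouched.
    return text, None
```

For example, `split_addendum("All people are by nature free. (added Nov. 5, 1974.)")` returns `("All people are by nature free.", "(added Nov. 5, 1974.)")`. Real section text will have more edge cases than this heuristic covers.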

ID | Code | Division | Title | Part | Chapter | Article | Section | Raw Text (excluding addendum/definitions) | Addendum (date added) | Link
1 | CONS | 0 | 0 | 0 | 0 | I | 1 | All people are by nature free...blahblah | (added Nov. 5, 1974.. | htt..

Headers for parent sections are given their own rows. After scraping and inserting into PostgreSQL, we have a table with 178,564 unique rows corresponding to actual sections in the California Legal Code.
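The insert step could be sketched as follows, assuming a schema modeled on the example row above. The table name `california_code`, the column names, and the helpers `to_row` and `insert_sections` are hypothetical; the only psycopg2 behavior relied on is `%s` parameter placeholders, `cursor.executemany`, cursors usable as context managers, and `connection.commit()`.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical schema modeled on the example row; the real table may differ.
HIERARCHY_COLUMNS = ("code", "division", "title", "part", "chapter", "article", "section")
CONTENT_COLUMNS = ("raw_text", "addendum", "link")
COLUMNS = HIERARCHY_COLUMNS + CONTENT_COLUMNS

# psycopg2 uses %s placeholders for every parameter type. Only the fixed column
# list is formatted into the string; values are passed separately so the
# driver quotes them.
INSERT_SQL = "INSERT INTO california_code ({}) VALUES ({})".format(
    ", ".join(COLUMNS), ", ".join(["%s"] * len(COLUMNS))
)


def to_row(section: Dict[str, Any]) -> Tuple[Any, ...]:
    """Order one cleaned section to match COLUMNS.

    Missing tree levels default to "0", as in the example row; missing
    content fields become NULL.
    """
    hierarchy = tuple(str(section.get(col, "0")) for col in HIERARCHY_COLUMNS)
    content = tuple(section.get(col) for col in CONTENT_COLUMNS)
    return hierarchy + content


def insert_sections(conn: Any, sections: Iterable[Dict[str, Any]]) -> int:
    """Insert sections through a psycopg2-style connection and commit."""
    rows = [to_row(s) for s in sections]
    with conn.cursor() as cur:  # psycopg2 cursors close on leaving the block
        cur.executemany(INSERT_SQL, rows)
    conn.commit()
    return len(rows)
```

With a real database this would be called as `insert_sections(psycopg2.connect(dsn), cleaned_sections)`, where `dsn` is your connection string. For roughly 178,000 rows, `psycopg2.extras.execute_values` is usually much faster than `executemany`, which runs the statement once per row.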

Some quick analysis of the California Legal Code shows two things:

  1. 99% of sections are short when measured in tokens. (Figure: log_token_distribution)
  2. Certain codes contain MUCH more content than others. (Figure: section_code_distribution)

This analysis helps us understand the "normal" structure and format of a single law. It will be useful later when designing GPT prompts at scale, i.e., focus on fine-tuning prompts for the formatting of the typical 99% of the data and worry about outliers later.
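Both observations can be reproduced with a few lines of standard-library Python. This is a sketch, not the project's analysis code: the helper names are hypothetical, and the default tokenizer is a crude whitespace count. For counts that match the embedding model, you could pass `lambda t: len(enc.encode(t))` with `enc = tiktoken.get_encoding("cl100k_base")`, the encoding used by text-embedding-ada-002.

```python
import statistics
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def whitespace_token_count(text: str) -> int:
    """Crude stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def token_length_summary(
    texts: Iterable[str],
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> Tuple[float, Dict[int, int]]:
    """Return (99th-percentile length, histogram by order of magnitude).

    Histogram keys: 0 -> 1-9 tokens, 1 -> 10-99, 2 -> 100-999, and so on;
    -1 counts empty sections. Intended for a large corpus of texts.
    """
    lengths = [count_tokens(t) for t in texts]
    # quantiles(n=100) returns 99 cut points; the last one is the 99th percentile.
    p99 = statistics.quantiles(lengths, n=100)[-1]
    # Digit count minus one gives the order of magnitude without float log10 error.
    buckets = Counter(len(str(n)) - 1 if n > 0 else -1 for n in lengths)
    return p99, dict(sorted(buckets.items()))


def sections_per_code(codes: Iterable[str]) -> List[Tuple[str, int]]:
    """Count sections per code (e.g. "BPC", "HSC"), largest first."""
    return Counter(codes).most_common()
```

Bucketing by order of magnitude mirrors a log-scale histogram, which is what makes a long tail of very long sections visible next to the bulk of short ones.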

The next step is preparing the dataset for more efficient search and retrieval. Using OpenAI's embedding model "text-embedding-ada-002", vector embeddings are automatically created for each section's:

  1. Raw section text
  2. Legal definitions applying specifically to this section and definitions applying to ALL parent sections.
  3. Titles of the section and its parent sections, as if you were traversing the tree top to bottom:
     a) For section X, the path would look like: "Code BPC, Division 10, Title 0, Part 0, Chapter 22, Article 7, Section X"
     b) The title path trace would be: "Business Professions Code, Cannabis, Cannabis Cooperative Associations, Powers, Section X"
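A minimal sketch of assembling the three inputs and embedding them in one request. It assumes the openai>=1.0 Python client (`client.embeddings.create(model=..., input=[...])`, whose results carry `.index` and `.embedding`); the helper names and the way definitions are concatenated are hypothetical, not the repository's code.

```python
from typing import Any, Dict, List, Mapping, Sequence

# Tree levels above a section, top to bottom, as in the path example above.
LEVELS = ("Code", "Division", "Title", "Part", "Chapter", "Article")


def positional_path(levels: Mapping[str, str], section: str) -> str:
    """e.g. "Code BPC, Division 10, ..., Article 7, Section X"; absent levels are "0"."""
    parts = [f"{name} {levels.get(name, '0')}" for name in LEVELS]
    parts.append(f"Section {section}")
    return ", ".join(parts)


def title_path(titles: Sequence[str], section: str) -> str:
    """Join parent titles top to bottom, skipping levels that have no title."""
    return ", ".join([t for t in titles if t] + [f"Section {section}"])


def embedding_inputs(raw_text: str, own_definitions: Sequence[str],
                     parent_definitions: Sequence[str], titles: Sequence[str],
                     section: str) -> Dict[str, str]:
    """The three texts embedded per section (items 1-3 above)."""
    return {
        "text": raw_text,
        "definitions": "\n".join(list(own_definitions) + list(parent_definitions)),
        "titles": title_path(titles, section),
    }


def embed_inputs(client: Any, inputs: Mapping[str, str],
                 model: str = "text-embedding-ada-002") -> Dict[str, List[float]]:
    """Embed all non-empty inputs in a single request.

    Empty fields, e.g. a section with no definitions, get no vector.
    """
    keys = [k for k, v in inputs.items() if v]
    if not keys:
        return {}
    response = client.embeddings.create(model=model, input=[inputs[k] for k in keys])
    # Each result carries the index of the input it belongs to.
    ordered = sorted(response.data, key=lambda item: item.index)
    return {k: item.embedding for k, item in zip(keys, ordered)}
```

`positional_path` is kept separate from the embedded texts because it works better as an exact identifier (and citation) than as something to search semantically.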

The flowchart below shows one "run" of the system. It assumes all data and embedding collection is already complete.

Flowchart
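A minimal sketch of the retrieval half of a run, assuming it follows the steps described in the introduction: embed the user query, find the most similar sections, and pass them to GPT with their citations so the answer can cite in-line. The `Section` tuple layout, the brute-force ranking, and the prompt wording are all assumptions; at 178,564 rows a vector index (for example the pgvector extension for PostgreSQL) would be the more practical choice.

```python
import math
from typing import List, Sequence, Tuple

# (citation, section text, embedding), e.g. ("Cal. HSC § 11362.1", "...", [...])
Section = Tuple[str, str, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_sections(query_vec: Sequence[float], sections: Sequence[Section],
                   k: int = 5) -> List[Section]:
    """Brute-force nearest neighbours: score every section, keep the k best."""
    ranked = sorted(sections, key=lambda s: cosine_similarity(query_vec, s[2]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, retrieved: Sequence[Section]) -> str:
    """Put each retrieved section behind its citation so the model can cite in-line."""
    context = "\n\n".join(f"[{citation}] {text}" for citation, text, _ in retrieved)
    return (
        "Answer the question using only the legal sections below. "
        "Cite each claim in-line with the bracketed citation it came from.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Since OpenAI embeddings are normalized to length 1, a plain dot product would give the same ranking; the full cosine keeps the sketch correct for any embedding source.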

I'm hoping to release a working prototype within the next few weeks. Check back at a later date, or feel free to reach out to me on LinkedIn.

4. A guide to installation and use

TO DO: The project is currently not in a fully working state.

5. A list of technology used and any links to further information related to this technology

Built with Python 3.8.9 in Visual Studio Code. Major Python packages: psycopg2, openai, tokenify, BeautifulSoup.

  • PostgreSQL: used to store the database after scraping and cleaning. TODO: more version information.
  • OpenAI: used to get text embeddings from the text-embedding-ada-002 model. GPT-3.5 Turbo (maybe GPT-4) will be used in the near future to incorporate embeddings into requests for legal information. TODO: more version information.

6. Open-source projects that the developers independently modify or expand should be contained in a section on “desired collaboration” in the readme.md file. How should problems be handled? How should developers advance the changes?

Hypothetical Document Embeddings (HyDE): https://arxiv.org/pdf/2212.10496.pdf
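HyDE searches with the embedding of an LLM-written hypothetical answer rather than the embedding of the question itself, since a fabricated passage often lies closer to real relevant passages than a short question does. A minimal sketch, assuming `generate` wraps a chat model prompted to write a passage that answers the question (for this project, something resembling a code section) and `embed` is the same embedding model used for the sections; both are hypothetical callables, and this is not code from the repository. Following the paper, it averages several hypothetical-document embeddings together with the query's own embedding.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def hyde_query_vector(
    question: str,
    generate: Callable[[str], str],
    embed: Callable[[str], Sequence[float]],
    n_docs: int = 4,
) -> Vector:
    """Build a HyDE-style search vector for `question`.

    1. Ask an LLM (`generate`) for n_docs hypothetical answers. They may contain
       invented details; only their embeddings are used, never their text.
    2. Embed each hypothetical document and the question itself.
    3. Average the vectors element-wise.
    """
    hypotheticals = [generate(question) for _ in range(n_docs)]
    vectors = [list(embed(doc)) for doc in hypotheticals] + [list(embed(question))]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

The result replaces the raw question embedding in the similarity search. It is not unit length, which does not matter for cosine similarity.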

7. Known bugs and any bug fixes

A lot. Will update later upon public release.

8. FAQ section with all previously asked questions

Will update later. Please reach out to me on LinkedIn if you have any questions, I would love to talk about the project! Please hire me.

9. Copyright and licensing information

Will update later.
