# Function tests for deduplicating repetitive LLM output
This repository demonstrates three techniques for removing repeated patterns from strings: LZ78, KMP (Knuth-Morris-Pratt), and Suffix Arrays.
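- LZ78: A dictionary-based compression algorithm that parses the input into phrases, each one being the longest previously seen phrase extended by one new character. Every new phrase is added to the dictionary, so repeated content is encoded as references to earlier entries.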
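- KMP: A substring-search algorithm that preprocesses the pattern into a failure function (also called the prefix function), which lets the search skip redundant comparisons and run in O(n + m) time for a text of length n and a pattern of length m.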
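- Suffix Arrays: An array of integers giving the starting positions of all suffixes of a string in lexicographic order. Any substring that occurs more than once is a common prefix of at least two suffixes, which makes suffix arrays useful for finding repeats.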
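To make the LZ78 description concrete, here is a minimal sketch of textbook LZ78 parsing. It is not the repository's implementation, and `lz78_parse` and `lz78_decode` are hypothetical names used only for illustration. Each output pair is (index of the longest previously seen phrase, next character), with index 0 standing for the empty phrase.

```python
def lz78_parse(text: str) -> list[tuple[int, str]]:
    """Factor text into LZ78 (prefix_index, next_char) pairs (hypothetical helper)."""
    dictionary: dict[str, int] = {}  # phrase -> 1-based dictionary index
    pairs: list[tuple[int, str]] = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending while the phrase is already in the dictionary.
            phrase = candidate
        else:
            # Emit (index of longest known prefix, new char); 0 means the empty phrase.
            pairs.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: reference it with no new character.
        pairs.append((dictionary[phrase], ""))
    return pairs


def lz78_decode(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the text; phrases[k] mirrors dictionary index k (hypothetical helper)."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


pairs = lz78_parse("abababab")
print(pairs)  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, '')]
print(lz78_decode(pairs))  # abababab
```

Repeated input yields progressively longer phrases (`"ab"`, then `"aba"`), which is why highly repetitive text compresses well. On its own, LZ78 encodes repeats as references rather than deleting them.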
## Usage

```python
import deduplication

text = "Your sample text here..."
dedup_text = deduplication.remove_with_suffix_array(text)
print(dedup_text)
```
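The internals of `remove_with_suffix_array` are not shown here. As a sketch of the general idea only, the longest repeated substring of a string equals the longest common prefix (LCP) of some pair of adjacent entries in its suffix array. The helpers below (`build_suffix_array`, `lcp_array`, `longest_repeated_substring`) are hypothetical: they build the suffix array naively by sorting, which costs O(n² log n) time and O(n²) memory for the slice keys in the worst case, and compute the LCP array with Kasai's algorithm.

```python
def build_suffix_array(s: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix text.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: list[int]) -> list[int]:
    # Kasai's algorithm: lcp[k] is the LCP of suffixes sa[k-1] and sa[k]; lcp[0] = 0.
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return lcp


def longest_repeated_substring(s: str) -> str:
    if not s:
        return ""
    sa = build_suffix_array(s)
    lcp = lcp_array(s, sa)
    best = max(range(len(s)), key=lambda k: lcp[k])
    return s[sa[best]:sa[best] + lcp[best]]


print(longest_repeated_substring("banana"))  # ana
```

Occurrences may overlap: for a string made of three copies of one phrase, the result is two copies of that phrase, so a deduplication routine built on this has to handle overlap explicitly. Production code would typically swap the naive sort for a faster construction such as SA-IS.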
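## Example

```python
from deduplication import remove_longest_repeating_substring  # assumed to be exported by the deduplication module

# The same phrase ("convert compressed data into short text") repeated three times
text = "圧縮データを短いテキストに変換する圧縮データを短いテキストに変換する圧縮データを短いテキストに変換する"
print(remove_longest_repeating_substring(text))
# Output: 圧縮データを短いテキストに変換する
```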
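The example above is a string that is an exact repetition of a shorter unit. One way to detect that case uses the KMP failure (prefix) function; this is a sketch under that assumption, not a description of how `remove_longest_repeating_substring` works, and `prefix_function` and `collapse_full_repetition` are hypothetical names. If `pi` is the prefix function of a string `s` of length n, the smallest period is p = n - pi[n-1], and `s` consists of whole copies of `s[:p]` exactly when n is divisible by p.

```python
def prefix_function(s: str) -> list[int]:
    # pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix.
    pi = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]  # fall back using the failure function
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def collapse_full_repetition(s: str) -> str:
    """Return the shortest unit u with s == u * m, or s unchanged if there is none."""
    if not s:
        return s
    n = len(s)
    period = n - prefix_function(s)[-1]  # smallest period of s
    return s[:period] if n % period == 0 else s


phrase = "圧縮データを短いテキストに変換する"
print(collapse_full_repetition(phrase * 3))  # 圧縮データを短いテキストに変換する
print(collapse_full_repetition("abcab"))  # abcab
```

This only handles text that is entirely a repetition of one unit; repeats embedded in surrounding text, or a trailing partial copy as in `"abcab"`, are left unchanged.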