Comments (7)

vwilson commented on July 29, 2024

I have a branch that uses RegexOptions.Compiled in several places. I expected this to improve performance for long-lived instances of the GptEncoding class, but it actually runs the existing unit tests slightly slower than the main branch. I'm still experimenting with it and have several MB of public-domain novels to crunch. A consumer-level option to enable RegexOptions.Compiled, at least in a couple of places, may be called for. I will let you know when I have finished.

from sharptoken.

Anurag-RTS commented on July 29, 2024

Ran a quick encode benchmark with a 5,500-word input:

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.3086/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.203
  [Host]     : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2
| Method           | Mean       | Error    | StdDev   |
|----------------- |-----------:|---------:|---------:|
| TokenizerEncode  |   450.5 us |  4.61 us |  4.31 us |
| TikTokenEncode   |   504.1 us |  3.82 us |  2.99 us |
| SharpTokenEncode | 2,704.0 us | 21.60 us | 20.20 us |

dmitry-brazhenko commented on July 29, 2024

Here are updated benchmarks that I ran. I will check what can be improved.

| Method                     | Categories | Data                | Mean           | Median         | Ratio |
|--------------------------- |----------- |-------------------- |---------------:|---------------:|------:|
| SharpTokenV1_0_28_Encode   | Encode     | 1. (...)57. [19866] | 3,374,121.7 ns | 3,227,934.8 ns |  1.00 |
| TiktokenSharpV1_0_5_Encode | Encode     | 1. (...)57. [19866] | 3,415,080.4 ns | 3,399,812.1 ns |  1.11 |
| TokenizerLibV1_3_2_Encode  | Encode     | 1. (...)57. [19866] | 2,060,565.1 ns | 2,027,039.6 ns |  0.63 |
| Tiktoken_Encode            | Encode     | 1. (...)57. [19866] |   921,540.6 ns |   895,316.2 ns |  0.28 |

dmitry-brazhenko commented on July 29, 2024

There is a new version with significant improvements, thanks to @r-Larch (086544d).

dmitry-brazhenko commented on July 29, 2024

Hi @lofcz !

Thanks for reaching out!

I plan to implement some performance improvements and compare SharpToken with other libraries, but I haven't done so yet. If you have any ideas for improvements, please let me know and I will use them :)

dmitry-brazhenko commented on July 29, 2024

Hey @UX-Ayoma !

Thanks a lot for sharing!
I will explore what can be improved in the library; there is a lot to be done.

I started with some easy wins and added an optimisation for decoding (#9).
I will explore and implement fixes for encoding as well.

dmitry-brazhenko commented on July 29, 2024

Fixed encoding here: PR #11.

Will update with benchmarks and more improvements later.