cross-posted from: https://programming.dev/post/51407459

Check what you can use and how many tokens per second you would get… It has examples for many models and quantization levels. Huge resource!