cross-posted from: https://programming.dev/post/51407459
Check which models you can run and at how many tokens per second… It has examples for many models and quantization levels. Huge resource!
I regret having made so many crossposts. It was an impulse, since I wasn't sure where to post… I was also interested in seeing what fellow lemmings' take was, and it has been enriching for me in that regard. I wondered how much people were really self-hosting LLMs… Apparently not that much.