Doing the Lord’s work in the Devil’s basement

  • 0 Posts
  • 22 Comments
Joined 1 year ago
Cake day: May 8th, 2024

  • When you read that stuff on Reddit, there’s one thing you need to keep in mind: these people are not really discussing Lemmy. They’re rationalizing and justifying why they are not on Lemmy. Totally different conversation.

    Nobody wants to come out and say “I know mainstream platforms are shit and destroying the fabric of reality, but I can’t bring myself to be on a platform unless it is the Hip Place to Be”. So they’ll invent stuff that paints them in a good light.

    You’ll still see people claiming that Mastodon is unusable because you have to select an instance - even though you don’t have to: you can just type Mastodon into Google, click the first link, and create an account in 2 clicks. That’s been the case for ages. But the people still using Twitter need the excuse, because otherwise what does that make them?


  • Yeah, I did some looking up in the meantime and indeed you’re gonna have a context-size issue. That’s why it’s only summarizing the last few thousand characters of the text: that’s all that fits in the model’s context window (its attention span).
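
    A quick back-of-envelope check, if you want to see how far a document overflows a given window (the ~4 characters per token figure is only a rule of thumb, and “article.txt” is a placeholder):

    ```python
    # Rough rule of thumb: ~4 characters per token for English prose (assumption).
    CHARS_PER_TOKEN = 4

    def rough_token_count(text: str) -> int:
        return len(text) // CHARS_PER_TOKEN

    doc = open("article.txt").read()   # placeholder input file
    tokens = rough_token_count(doc)
    print(f"~{tokens} tokens")
    # Reserve ~512 tokens for the prompt wrapper and the summary itself.
    print("fits in a 4K window: ", tokens <= 4096 - 512)
    print("fits in a 16K window:", tokens <= 16384 - 512)
    ```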

    There are some models fine-tuned for an 8K-token context window, some even for 16K, like this Mistral brew. If you have a GPU with 8 GB of VRAM you should be able to run it using one of the quantized versions (Q4 or Q5 should be fine), and summarization should still be reasonably good.
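
    Roughly what that looks like with llama-cpp-python, just as a sketch (one of several ways to run a quantized GGUF locally; the model file name is a placeholder for whichever 16K fine-tune you download):

    ```python
    from llama_cpp import Llama

    # Placeholder GGUF file - substitute the Q4/Q5 quant you actually downloaded.
    llm = Llama(
        model_path="mistral-7b-instruct-16k.Q4_K_M.gguf",
        n_ctx=16384,      # ask for the full 16K context window
        n_gpu_layers=-1,  # offload all layers to the GPU (a Q4/Q5 7B fits in ~8 GB VRAM)
    )

    text = open("article.txt").read()  # placeholder input file

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": f"Summarize the following text:\n\n{text}"}],
        max_tokens=512,
    )
    print(out["choices"][0]["message"]["content"])
    ```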

    If 16K isn’t enough for you, then that’s probably not something you can do locally. However, you can still run a larger model privately in the cloud. Hugging Face, for example, lets you rent GPUs by the minute and run inference on them; it should only cost you a few dollars. As far as I know this approach is still compatible with Open WebUI.
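
    A minimal sketch of that cloud route with the huggingface_hub client (the model id and token are just examples - use whatever larger model or endpoint you end up renting):

    ```python
    from huggingface_hub import InferenceClient

    # Example model id and placeholder token - swap in your own.
    client = InferenceClient(model="mistralai/Mixtral-8x7B-Instruct-v0.1", token="hf_xxx")

    text = open("article.txt").read()  # placeholder input file

    summary = client.text_generation(
        f"Summarize the following text:\n\n{text}\n\nSummary:",
        max_new_tokens=512,
    )
    print(summary)
    ```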