In the space of one week, a second open-source Chinese AI model has matched the best models that investors are pouring tens of billions of dollars into.

schizoidman@lemm.ee · 11 months ago

In the space of one week, a second open-source Chinese AI model has matched the best models that investors are pouring tens of billions of dollars into.

planish@sh.itjust.works · 11 months ago

Looks like it has 32B in the name, so you need enough RAM to hold 32 billion weights plus activations (the current values for the layer being run right now, which I think should come to less than a gigabyte). The weights are probably 16-bit floats to start with, so something like 64 gigabytes. If you quantize it to cram the weights into fewer bits, you can go down to around 4 bits per weight, or about 16 gigabytes of memory to run (a slightly worse version of) the model.
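For anyone who wants to redo that arithmetic for other sizes, here is a rough sketch in Python. It counts the weights only and assumes the "32B" in the name really means 32 billion parameters. The function name `weight_memory_gb` is made up for illustration and is not part of any library.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes.

    Ignores activations, context cache, and runtime overhead.
    """
    bytes_total = n_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


# Assumption: "32B" in the model name means 32 billion weights.
N_PARAMS = 32e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

Treat the result as a lower bound. Actual usage will be somewhat higher once activations, the context cache, and whatever extra data the quantization format stores are added on top.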

Avid Amoeba@lemmy.ca · 11 months ago

So you’re telling me there’s a chance.

planish@sh.itjust.works · 11 months ago

I think there are consumer-grade GPUs that can run this on a single card with enough quantization. Or, if you want to run it on CPU, you can buy and plug in enough DIMMs for a large but not outrageous amount of money.

Avid Amoeba@lemmy.ca · edited 11 months ago

Pulled whatever is available on Ollama by this name and it seems to just fit on a 3090. Takes 23GB VRAM.