• 5 Posts
  • 338 Comments
Joined 9 months ago
Cake day: March 22nd, 2024

  • Their reputation and past reporting are supposed to back up what they state as fact (like the assumption that the reviews they cite are real), for practicality and brevity. Imagine having to document every bit of background research in a presentable way.

    They could have included screenshots, though.

    And the skepticism is healthy. I do personally ‘trust’ Axios (which I read almost daily but regularly double-check).


  • Just that bursts of inference for a small model on a phone or even a desktop are less power-hungry than a huge model on A100/H100 servers. The hardware is already spun up anyway, and (even with the efficiency advantage of batching) Nvidia runs its cloud GPUs at crazy inefficient voltages/power bands just to get more raw performance per chip and eke out more interactive gains, while phones and such run at extremely efficient voltages.

    There are also lots of tricks that help “local” models, like speculative decoding or (theoretically) bitnet models, that aren’t as useful for cloud usage (there’s a rough sketch of speculative decoding below).

    Also… GPT-4 is very inefficient. Open 32B models nearly match it at a fraction of the power usage and cost, even on servers. OpenAI kind of sucks now, but the larger public hasn’t caught on yet.
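
    To make the speculative decoding point concrete, here’s a minimal greedy-verification sketch in Python. It assumes two Hugging Face-style causal LMs (a big `target` and a small `draft`) that return `.logits`; real implementations verify with rejection sampling over the probabilities rather than exact greedy matching, but the structure is the same: the cheap model proposes a few tokens, the expensive model checks them in one forward pass.

    ```python
    import torch

    @torch.no_grad()
    def speculative_generate(target, draft, ids, n_new=64, k=4):
        """Greedy speculative decoding sketch.

        `target` and `draft` are assumed to be causal LMs (e.g. Hugging Face
        transformers) whose forward pass returns an object with `.logits` of
        shape [batch, seq_len, vocab]. `ids` is a [1, prompt_len] token tensor.
        """
        ids = ids.clone()
        produced = 0
        while produced < n_new:
            # 1) The small draft model proposes k tokens autoregressively (cheap).
            draft_ids = ids
            for _ in range(k):
                next_tok = draft(draft_ids).logits[:, -1, :].argmax(-1, keepdim=True)
                draft_ids = torch.cat([draft_ids, next_tok], dim=-1)
            proposal = draft_ids[:, ids.shape[1]:]  # the k drafted tokens

            # 2) The big target model scores prompt + proposal in ONE forward
            #    pass instead of k separate passes -- that's where the savings are.
            logits = target(draft_ids).logits
            start = ids.shape[1] - 1  # logits at position i predict token i+1
            target_choice = logits[:, start:start + k + 1, :].argmax(-1)  # k verdicts + 1 bonus

            # 3) Keep the longest prefix of the proposal the target agrees with,
            #    then append the target's own next token (correction or bonus).
            accepted = 0
            while accepted < k and proposal[0, accepted] == target_choice[0, accepted]:
                accepted += 1
            ids = torch.cat(
                [ids, proposal[:, :accepted], target_choice[:, accepted:accepted + 1]],
                dim=-1,
            )
            produced += accepted + 1
        return ids
    ```

    Each loop iteration costs one forward pass of the big model but can emit up to k+1 tokens, so the trick pays off most when the big model dominates the cost per token, which is exactly the “local”, batch-size-one case.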