• 0 Posts
  • 50 Comments
Joined 2 years ago
Cake day: June 15th, 2023

  • I’m a researcher in ML, and LLMs absolutely fall under ML. The “learning” in “machine learning” just means fitting the parameters of a model, so it’s really just an optimization problem. In the case of an LLM, that means fitting the parameters of the transformer.

    A model doesn’t have to be intelligent to fall under the umbrella of ML. Linear least squares is considered ML; in fact, it’s probably the first thing you’ll do in a university ML course. Decision trees, nearest-neighbor classifiers, and linear models are all machine learning models, even though nobody would consider them intelligent.
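
    For a concrete picture of what “fitting parameters” means, here’s a minimal sketch (made-up data, NumPy’s built-in least squares):

    ```python
    # "Learning" as parameter fitting: ordinary least squares on a noisy line.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=100)
    y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)  # made-up data

    # Fit y ~ w*x + b by minimizing squared error: purely an optimization problem.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    print(w, b)  # recovers roughly 3.0 and 0.5
    ```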



  • “Useless” is a strong term. I do a fair amount of research on a single 4090. Lots of problems can fit in <32 GB of VRAM. Even my 3060 is good enough to run small-scale tests locally.

    I’m in CV, and even with enterprise-grade hardware, most folks I know are limited to 48 GB (A40 and L40S, substantially cheaper and more accessible than A100/H100/H200). My advisor would always say that you should try to set up a problem so you can iterate within a few days’ time on a single GPU, and lots of problems are still approachable that way. Of course you’re not going to make the next SOTA VLM on a 5090, but not every problem is that big.
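
    To put rough numbers on it, here’s the back-of-envelope VRAM math I’d use (a rule of thumb, not exact; activations come on top and depend on batch size):

    ```python
    # Approximate training memory with Adam in mixed precision, per parameter:
    # fp16 weights (2 B) + fp16 grads (2 B) + fp32 master weights, m, v (12 B).
    BYTES_PER_PARAM = 2 + 2 + 12  # ~16 bytes/param, before activations

    for params in (25e6, 300e6, 1e9, 7e9):  # ResNet-scale up to a small LLM
        gb = params * BYTES_PER_PARAM / 1e9
        print(f"{params / 1e6:>7.0f}M params -> ~{gb:5.1f} GB (weights + optimizer)")
    # A ~1B-parameter model needs ~16 GB before activations, so plenty of CV
    # problems fit on a 24-48 GB card; a 7B model already wants ~112 GB.
    ```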



  • Exactly: the assumption (known as the inductive hypothesis) is completely fine by itself and doesn’t represent circular reasoning. The issue in the “proof” arises in the logic that comes after it, where they assume they can form two overlapping sets by removing a different horse from the total set of horses. That fails when n = 1, because the two resulting sets each contain a single, distinct horse and don’t overlap at all.
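
    A quick way to see exactly where it breaks, as a toy example with sets:

    ```python
    # The inductive step removes two different horses from a set of n+1 horses
    # and relies on the two resulting n-horse sets overlapping.
    horses = {"a", "b"}                          # n + 1 = 2, i.e. the n = 1 case
    print((horses - {"a"}) & (horses - {"b"}))   # set(): empty, so the step fails

    horses = {"a", "b", "c"}                     # n + 1 = 3 and beyond work fine
    print((horses - {"a"}) & (horses - {"b"}))   # {'c'}: nonempty overlap
    ```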


  • I’m fairly certain blockchain GPUs have very different requirements from those used for ML, let alone LLMs. In particular, they don’t need anywhere near as much VRAM and generally don’t require floating-point math, nor do they need features like tensor cores. Those “blockchain GPUs” likely didn’t turn into ML GPUs.

    ML has been around for a long time. People have been using GPUs for ML since at least AlexNet in 2012, not just after the blockchain hype started to die down.
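
    To illustrate the difference, here’s a toy proof-of-work loop (a simplified Bitcoin-style double SHA-256 with a made-up header): it’s pure integer hashing, with no floats and essentially no memory footprint.

    ```python
    import hashlib

    def meets_target(header: bytes, nonce: int, difficulty_bits: int) -> bool:
        # Double SHA-256, then check for enough leading zero bits.
        data = header + nonce.to_bytes(8, "little")
        digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
        return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0

    header = b"example block header"  # made-up data
    nonce = 0
    while not meets_target(header, nonce, difficulty_bits=16):
        nonce += 1
    print("found nonce:", nonce)
    ```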



  • This was also one of my concerns with the hype surrounding low-cost SLS printers like Micronics, especially if they weren’t super well designed. The powder is incredibly dangerous to inhale, so I wouldn’t want a home hobbyist buying that type of machine without realizing how harmful it could be. My understanding is that even commercial powder-bed machines like HP’s MJF and Formlabs’ Fuse need substantial ventilation (HEPA filters, full-room ventilation, etc.) to be operated safely.

    Metal is of course even worse. It has all the same respiratory hazards (the fine particles will likely cause all sorts of long-term lung damage) but it also presents a massive fire and explosion risk.

    As a result, I unfortunately can’t see these technologies making it into the home hobbyist sphere anytime soon.



  • I have tried them, and to be honest I was not surprised. The hosted service was better at longer code snippets, and in particular it was consistently better at producing valid chain-of-thought reasoning (a lot of simpler models, including the distills, tend to produce shallow reasoning chains, even when they get the final answer right).

    I’m aware of how these models work; I work in this field and have been developing a benchmark for reasoning capabilities in LLMs. The distills are certainly still technically impressive and it’s nice that they exist, but the gap between them and the hosted version is unfortunately nontrivial.
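
    For context, such a harness is conceptually as simple as the sketch below; `query_model` is a stand-in for a real hosted or local inference call, and the dataset item is made up. The naive exact-match scoring is exactly why shallow chains can still score well.

    ```python
    def query_model(prompt: str) -> str:
        # Placeholder: swap in a real API call or local inference here.
        return "15 km/h for 2 hours gives 15 * 2 = 30 km. Final answer: 30"

    def final_answer(response: str) -> str:
        # Naive extraction of whatever follows the last "Final answer:".
        return response.rsplit("Final answer:", 1)[-1].strip()

    dataset = [  # made-up item; a real benchmark needs many more, much harder
        {"q": "A cyclist rides 15 km/h for 2 hours. How far? "
              "End with 'Final answer: <number>'.",
         "a": "30"},
    ]

    correct = sum(final_answer(query_model(item["q"])) == item["a"]
                  for item in dataset)
    print(f"accuracy: {correct}/{len(dataset)}")
    # Grading only the final answer says nothing about whether the chain of
    # thought was valid; judging the intermediate reasoning is the hard part.
    ```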


  • It might be trivial to a tech-savvy audience, but considering how popular ChatGPT itself is, and how DeepSeek ranks on the Play Store and iOS App Store, I’d honestly guess most people are using DeepSeek’s servers. Plus, you’d be surprised how many people naturally trust the service more after hearing that the company open-sourced the models. Accordingly, I don’t think it’s unreasonable for Proton to focus on the service rather than the local models here.

    I’d also note that people who want the highest-quality responses aren’t using a local model, as anything you can run locally is a distilled version that is significantly smaller (at a small but nontrivial overall performance cost).
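
    The weights-only arithmetic makes the gap clear (parameter counts are roughly those from the DeepSeek-R1 release; KV cache and activations come on top):

    ```python
    def weight_gb(params: float, bytes_per_param: float) -> float:
        # Memory for model weights alone, in GB.
        return params * bytes_per_param / 1e9

    print(weight_gb(671e9, 1))  # full R1 at ~8-bit: ~671 GB, beyond any consumer GPU
    print(weight_gb(70e9, 2))   # 70B distill at fp16: ~140 GB, multi-GPU territory
    print(weight_gb(7e9, 2))    # 7B distill at fp16: ~14 GB, fits on a single 4090
    ```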