I’d guess that the ‘realtime’ is a quote from StabilityAI, and of course they’re running that stuff on an A100. A couple of seconds is still an interactive rate, since generally speaking you want to think about the changes you’re making to your conditioning anyway.
Haven’t tried it yet, but if individual steps of XL Turbo take ballpark as much time as LCM steps then… well, it’s four to eight times faster. As the quality generally isn’t production-ready, we’re mostly talking about rough prompt prototyping, testing out an animation pipeline, that kind of thing. That comes with the caveat that increasing the step count often leads to markedly different results (a complete change of composition, not just of details), so the information you gain from those preview-quality images is limited.
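For anyone wondering where “four to eight times” comes from, here’s the back-of-the-envelope arithmetic as a small sketch. It assumes (unverified, as said above) that a Turbo step costs the same as an LCM step, that LCM needs 4 to 8 steps and that Turbo gets away with 1. The per-step time is a made-up placeholder, not a measurement.

```python
def speedup(baseline_steps: int, fast_steps: int = 1) -> float:
    """Ratio of sampling times, assuming every step costs the same."""
    if baseline_steps <= 0 or fast_steps <= 0:
        raise ValueError("step counts must be positive")
    return baseline_steps / fast_steps


def sampling_time(steps: int, seconds_per_step: float) -> float:
    """Total sampling time, ignoring VAE decoding and other overhead."""
    return steps * seconds_per_step


# Hypothetical per-step cost; substitute a number measured on your own GPU.
SECONDS_PER_STEP = 0.5

for lcm_steps in (4, 8):
    lcm_time = sampling_time(lcm_steps, SECONDS_PER_STEP)
    turbo_time = sampling_time(1, SECONDS_PER_STEP)
    print(
        f"LCM at {lcm_steps} steps: {lcm_time:.1f}s, "
        f"1-step Turbo: {turbo_time:.1f}s, "
        f"speedup: {speedup(lcm_steps):.0f}x"
    )
```

Note this only covers the sampling steps. VAE decoding is a fixed cost per image, so the end-to-end speedup will be smaller than the step ratio suggests.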
Oh, “production-ready quality”: image quality being roughly on par with 4-step LCM means that it’s nowhere near production grade. For the final render you still want to give the model more steps. OTOH I’ve found that some LCM-based merges do in 30 steps what other models need 80 steps for, so improvements are always welcome. But I’m also worried about these distilled models being less flexible, pruning the only lightly trodden paths that you might actually want the model to take.
EDIT: Addendum: I’m not seeing anything about using this stuff as a LoRA. The nice thing about LCM is that you can take any model you have on your disk and turn it pretty much instantly into a model that can generate fast previews. Also, VAE decoding can already be slower than generation with LCM, so, yeah. I guess having something in between the full VAE and TAESD would be nice. TAESD is fast but quite limited when it comes to details, so much so that you might not even be able to see what kind of texture SD generated. Oh, and it also tends to get colours wrong; at least in my experience the output tends to be oversaturated.
Well, it is technically as fast as you can type if you’re running a better GPU. The 3060 is pretty mid-tier at this point.
Low end card.
I’ll get crucified for saying that because people will interpret that as an attack on their PC or something daft like that. It’s not.
It’s Ampere, a GPU architecture from 3.5 years ago. And even then, here’s what the desktop stack was like:
- 3090 Ti (GA102)
- 3090 (GA102)
- 3080 Ti (GA102)
- 3080 12GB (GA102)
- 3080 (GA102)
- 3070 Ti (GA102/GA104)
- 3070 (GA104)
- 3060 Ti (GA104/GA103)
- 3060 (GA106/GA104)
- 3050 (GA106/GA107)
It was almost at the bottom of Nvidia’s stack 3 years ago. It was a low end card then (because, you know, it was at the bottom end of what they were offering). It’s an even more low end card now.
People are always fooled by Nvidia’s marketing into thinking they’re getting a mid-range card, when in reality Nvidia’s giving people the scraps and pretending it’s a great deal. People need to demand more from these companies.
Nvidia takes a low end card, slaps a $400 price tag on it, calls it mid range, and people lap it up every time.
I know it’s low-end when compared to the newer generations but if we call a 3060 low-end then what do we call people with older GPUs like a 1070?
Should we not compare the 3060 against its own generation/the current one? To me that makes more sense than including the 1000 series or 900 series or something. How far would we go back? Are all cards sold now high end because they’re faster than a GTX 960? Earlier?
Personally, my cutoff was cards still on sale either right now or very recently, say within the past year.
I’m on a 3060 and with 4x upscaling it takes about a second and a half.
This isn’t free BTW folks
I haven’t messed with any AI imaging stuff yet. Any free recommendations to just have some fun?
Bing and OpenAI still have free stuff. Bing’s is actually really good.
This is great news for people who make animations with deforum as the speed increase should make Rakile’s deforumation GUI much more usable for live composition and framing.
Great, even more online noise that I can look forward to.
And the resulting faces still all have lazy eyes, asymmetric features, and significantly uncanny issues.
Humans have asymmetric features. No one is symmetrical
These features are abnormally asymmetric to the point of being off-putting. General symmetry of features is a significant part of what attracts people to one another, and it’s why facial droops from things like Bell’s palsy or strokes can often be psychologically difficult for the patients who experience them.
General symmetry, not exact symmetry.
Anecdote: I think Denzel Washington is supposed to have one of the most symmetrical faces.
That’s impressive
Does it actually run any faster though? For instance, if I manually spun up a model with the diffusers library and ran it locally on dml, would there be any difference?
Edit: Assuming we’re normalizing the output to something reasonable, e.g. a recognizable picture of a dog.
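One way to answer that for your own setup is to time both models yourself, each at whatever step count gives you that recognizable dog. Below is a minimal, backend-agnostic timing sketch using only the standard library. `benchmark` and `fake_pipeline` are hypothetical helper names for illustration, and `fake_pipeline` just sleeps; it stands in for a real pipeline call, which you’d wrap in a zero-argument function yourself.

```python
import statistics
import time
from typing import Callable


def benchmark(generate: Callable[[], object], warmup: int = 1, runs: int = 5) -> float:
    """Return the median wall-clock seconds for one call to generate()."""
    # Warm-up calls absorb one-off costs (model loading, kernel compilation).
    for _ in range(warmup):
        generate()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def fake_pipeline(steps: int, seconds_per_step: float = 0.01) -> Callable[[], None]:
    """Hypothetical stand-in for a real pipeline: sleeps instead of sampling."""
    def run() -> None:
        time.sleep(steps * seconds_per_step)
    return run


if __name__ == "__main__":
    # Placeholder step counts; use the ones that give acceptable output for you.
    four_step = benchmark(fake_pipeline(steps=4))
    one_step = benchmark(fake_pipeline(steps=1))
    print(f"4-step stand-in: {four_step:.3f}s, 1-step stand-in: {one_step:.3f}s")
```

To use it for real, replace `fake_pipeline(...)` with a function that runs your actual pipeline end to end and returns the finished image, so the measurement includes VAE decoding and any pending GPU work has completed before the clock stops.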