Biophilic

Turd Ferg@sh.itjust.works · edit-2 3 months ago

Turd Ferg@sh.itjust.works · 3 months ago

I never got the dog.

rnercle@sh.itjust.works · 3 months ago

did you try inpainting the dog?

Turd Ferg@sh.itjust.works · 3 months ago

Can you do that with artbot? I’m using the free version.

tal@lemmy.today · edit-2 3 months ago

I’m not familiar with Artbot.

investigates

Yes, it looks like it supports inpainting:

https://tinybots.net/artbot/create

Look down in the bottom section, next to “Image-to-image”.

That being said, my experience is that inpainting is kind of time-consuming. I could see fine-tuning the specific look of a feature – like, maybe an image is fine except for a hand that’s mangled, and you want to just tweak that bit. But I don’t know if it’d be the best way to do this.

I don’t know if this is actually true, but I recall reading that prompt term order matters for Stable Diffusion (assuming that that is the model you are using; it looks like ArtBot lets you select from a variety of models). Earlier prompt terms tend to define the scene. While I’ve tended to do this, I haven’t actually tried to experiment enough to convince myself that this is the case. You might try sticking the “dog” bit earlier in the prompt.
If this is Stable Diffusion or an SD-derived model and not, say, Flux, prompt weighting is supported (or at least it is when running locally on Automatic1111, though as far as I know the parenthesis syntax is parsed by the frontend rather than the model, so other frontends may differ). So if you want more weight to be placed on a prompt term, you can indicate that. Adding additional parentheses will increase the weight of a term, or you can provide a numeric weight:

A cozy biophilic seaport village. In the distance there are tall building and plants. There are spaceships flying above. In the foreground there is a cute ((dog)) sitting on a bench.

or

A cozy biophilic seaport village. In the distance there are tall building and plants. There are spaceships flying above. In the foreground there is a cute (dog:3) sitting on a bench.

In general, my experience with Stable Diffusion XL is that it’s not nearly as good as Flux at taking in English-language descriptions of relationships between objects in a scene. That is, “dog on a bench” may result in a dog and a bench, but maybe not a dog on a bench. The images I tend to create with Stable Diffusion XL tend to be a list of keywords, rather than English-language sentences. The drawback with Flux is that it’s heavily weighted towards creating photographic images, and I’m guessing, from what you submitted, that you’re looking more for a “created by a graphic artist” look.
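
To make the weighting syntax above a bit more concrete, here is a rough Python sketch of how the two forms translate into numbers, assuming the Automatic1111 convention that each pair of parentheses multiplies a term's weight by 1.1. The function name `term_weights` is made up for illustration, this is not Automatic1111's actual parser, and I don't know whether ArtBot reads the syntax the same way.

```python
import re

def term_weights(prompt):
    """Return {term: weight} for emphasized terms in an A1111-style prompt.

    Sketch only: handles "(term:1.3)" and plain nested parentheses such as
    "((term))", where each pair of parentheses multiplies the weight by 1.1.
    """
    weights = {}
    # Explicit numeric weights, e.g. (dog:1.3)
    explicit = re.findall(r"\(([^():]+):([0-9]+(?:\.[0-9]+)?)\)", prompt)
    for term, value in explicit:
        weights[term.strip()] = float(value)
    # Nested parentheses without a number, e.g. ((dog))
    for opens, term, closes in re.findall(r"(\(+)([^():]+)(\)+)", prompt):
        depth = min(len(opens), len(closes))
        weights.setdefault(term.strip(), round(1.1 ** depth, 4))
    return weights

print(term_weights("there is a cute ((dog)) sitting on a bench"))  # {'dog': 1.21}
print(term_weights("there is a cute (dog:3) sitting on a bench"))  # {'dog': 3.0}
```

So under that convention `((dog))` is a fairly gentle nudge (1.21), while `(dog:3)` is a much stronger one.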

EDIT: Here’s the same prompt you used fed into stoiquoNewrealityFLUXSD35f1DAlphaTwo, which is derived from Flux, in ComfyUI:

Here it is fed into realmixXL, which is not derived from Flux, but just from SDXL:

The dog isn’t on the bench in the second image.

Turd Ferg@sh.itjust.works · 3 months ago

Thank you so much, that was incredibly educational. The stoiquoNewrealityFLUXSD35f1DAlphaTwo version is what I was picturing in my head.

tal@lemmy.today · edit-2 3 months ago

It does look like they have at least one Flux model in that ArtBot menu list of models, so you might try playing around with that and see if you’re happier with the output. I also normally use 25 steps with Flux rather than 20, and the Euler sampler, both of which it looks like it can do.

EDIT: Looks like for them, “Euler” is “k_euler”.

tal@lemmy.today · 3 months ago

Also replying here to a private message, in hopes that some of the information might be useful to others:

To be honest, I understood about half of it haha.

rubs chin

So, I’m not sure what bits aren’t clear, but if I had to guess as to terms in my comments, you can mostly just search for and get a straightforward explanation, but:

inpainting

Inpainting is when you basically “erase” part of an already-generated image that you’re mostly happy with, and then generate a new image, but only for that tiny bit. It’s a useful way to fine-tune an image that you’re basically happy with.
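
If it helps, the "only regenerate that bit" idea can be sketched in a few lines of Python. This is a toy under loud assumptions: the image is just a grid of numbers, and `regenerate` is a hypothetical stand-in for the model. Real inpainting runs the whole diffusion process on the masked region while taking the surrounding pixels into account.

```python
def inpaint(image, mask, regenerate):
    """Return a copy of image where only the masked pixels are replaced.

    image: 2D list of pixel values
    mask: 2D list of booleans, True where the image was "erased"
    regenerate: hypothetical stand-in for the model, called as regenerate(row, col)
    """
    return [
        [regenerate(r, c) if mask[r][c] else image[r][c]
         for c in range(len(image[r]))]
        for r in range(len(image))
    ]

image = [[1, 1, 1],
         [1, 9, 1],   # 9 is the "mangled hand" we want to redo
         [1, 1, 1]]
mask = [[False, False, False],
        [False, True, False],
        [False, False, False]]

fixed = inpaint(image, mask, lambda r, c: 5)
print(fixed)  # [[1, 1, 1], [1, 5, 1], [1, 1, 1]]
```

Everything outside the mask comes through untouched, which is why it works well for small touch-ups.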

“Image-to-image”.

That’s an Automatic1111 term, I think. Oh, Automatic1111 is a Web-based frontend to run local image generation, as opposed to ArtBot, which appears to be a Web-based frontend to AI Horde, which is a bunch of volunteers who donate their GPU time to people who want to do generation on someone else’s GPU. I’m guessing that ArtBot got it from there.

Automatic1111 was widely used, and IMHO is easier to start out with, but ComfyUI, which has a much steeper learning curve but is a lot more powerful, is displacing it as the big Web UI for local generation.

Basically, Automatic1111, as it ships without extensions, has two “tabs” where one does image generation. The first is “text-to-image”. You plug in a prompt, you get back an image. The second is “image-to-image”. You plug in an image and a prompt and process that image to get a new image. My bet is that ArtBot used that same terminology.

prompt

This is just the text that you’re feeding a generative image AI to get an image. A “prompt term” is one “word” in that.

Stable Diffusion

This is one model (well, a series of models). That’s what converts your text into an image. It was the first really popular one. Flux, which I referenced above, is a newer one. It’s possible for people who have enough hardware and compute time to create “derived models” — start from one of those and then train models on additional images and associated terms to “teach” them new concepts. Pony Diffusion is an influential model derived from Stable Diffusion, for example.

A popular place to download models — the ones that are freely distributable — for local use is civitai.com. That also has a ton of AI-generated images and shows the model and prompts used to generate them, which IMHO is a good way to come up to speed on what people are doing.

AI Horde, unfortunately but understandably, doesn’t let people upload their own models to the computers of the people volunteering their GPUs, so if you’re using that, you’re going to be limited to using the selection of models that the Horde has chosen to support.

Models have different syntax. Unfortunately, it looks like ArtBot doesn’t provide a “tutorial” for each or anything. There are guides for making prompts for various “base” models, like Stable Diffusion and Flux, and generally you want to follow the “base” model’s conventions.

SD

A common acronym for “Stable Diffusion”.

sampler

So, the basic way these generative AIs work is by starting with what amounts to being an image full of noise – think of a TV just showing static. That static is randomly-generated. On computers, random numbers are usually generated via pseudo-random number generators. These PRNGs start with a “seed” value, and that determines what sequence of random numbers they come up with. Lots of generative AI frontends will let you specify a “seed”. That will, thus, determine what static you’re starting out with. You can have a seed that changes each generation, which many of them do and I think that ArtBot does, looking at its Web UI, since it has a “seed” field that isn’t filled in by default. IMHO, this is a bad default, since if you do that, each image you generate will be totally different — you can’t “refine” one by slightly changing the prompt to get a slightly-different image.
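
Here's a tiny Python sketch of the seed idea, using the standard library's PRNG rather than anything a real image generator uses. The `make_static` name and the grid size are just made up for illustration.

```python
import random

def make_static(seed, width=4, height=2):
    """Build a small grid of 'static' from a seeded pseudo-random generator."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(width)] for _ in range(height)]

same_a = make_static(1234)
same_b = make_static(1234)
different = make_static(1235)

print(same_a == same_b)     # True: same seed, same static
print(same_a == different)  # False: new seed, new static
```

That's why pinning the seed lets you change the prompt a little and get a related image instead of a completely different one.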

Anyway, once they have that “static” image, then they perform “steps”. Each “step” takes the existing image and uses the model, the prompt, and the sampler to determine a new state of the image. You can think of this as “trying to see images in the static”. They just repeat this a number of times, however many steps you have them set to run. They’ll tend to wind up with an image that is associated with the prompt terms you specified.

An easy way to see what they’re doing is to fix the seed and run a generation with the step count set to 0, then another with 1 step, and so forth.
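
A toy version of that experiment in Python, with heavy caveats: `target` stands in for "what the model thinks the prompt looks like", and each step just nudges the static a fraction of the way there. Real samplers are a lot more involved (the model predicts noise to remove at each step), so treat this purely as an illustration of "fixed seed, more steps, less static". All the names and numbers here are invented.

```python
import random

def run_steps(seed, steps, target=0.8, size=5, strength=0.3):
    """Start from seeded 'static' and nudge it toward target once per step."""
    rng = random.Random(seed)
    image = [rng.random() for _ in range(size)]
    for _ in range(steps):
        image = [p + strength * (target - p) for p in image]
    return image

def error(image, target=0.8):
    """Average distance from the target: how much 'static' is left."""
    return sum(abs(p - target) for p in image) / len(image)

# Same seed each time, so only the step count changes between runs.
for steps in (0, 1, 5, 20):
    print(steps, round(error(run_steps(42, steps)), 4))
```

With the seed held fixed, the printed error shrinks as the step count goes up, which is the same thing you'd see visually when stepping through a real generation.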

You seem super knowledgeable on the topic, where did you learn so much?

I honestly am not, because for me, this is a part-time hobby. The most knowledgeable people I’ve seen that you can readily reach are probably on the subreddits dedicated to this stuff. I’m trying to bring some of it over to the Threadiverse.

Civitai.com is a good place to see how people are generating images, look at their prompt terms.
Here and related Threadiverse communities, though there’s not a lot of talk on here, mostly people showing off images (though I’m trying to improve that with this comment and some of my past ones!). [email protected] tends towards more the technical side. [email protected] has porn, but not a lot of discussion, though I remember once posting an introduction to use of the Regional Prompting extension for Automatic1111 there.
Reddit’s got a lot more discussion; last I looked, mostly on /r/StableDiffusion, though the stuff there isn’t all about Stable Diffusion.
There are lots of online tutorials talking about designing a prompt and such, and these are good for learning about a particular model’s features.

Some stuff is specific to one particular model or frontend, and some spans multiple, and while there’s overlap today, that information isn’t exactly nicely and neatly categorized. For example, “negative prompts”, which are prompt terms that the model tries to avoid rather than include, are a feature of Stable Diffusion and are invaluable there, but Flux doesn’t support them. DALL-E, a commercial service, doesn’t support negative prompts. Midjourney, another commercial service, does. Commercial services also aren’t gonna tell everyone exactly how everything they do works. Also, today this is a young and very fast-moving field, and information that’s a year old can be kind of obsolete. There isn’t a great fix for that, I’m afraid, though I imagine that it may slow down as the field matures.

rnercle@sh.itjust.works · 3 months ago

this should be a pinned post on [email protected]

rnercle@sh.itjust.works · 3 months ago

apparently yes ☞ https://tinybots.net/artbot/create?panel=inpainting