Hi there,
I'm curious to know other people's approach in working with Stable Diffusion. I'm just a hobbyist myself and work on creating images to illustrate the fictional worlds I'm building for fun.
However, I find that getting very specific images (that are still visually pleasing) is really difficult.
So, how do you approach it? Are you trying to "force" your imagined picture out by making use of ControlNet, inpainting and img2img? I find that this approach usually leads to the exact image composition I'm after but yields completely ugly pictures. Even after hours of inpainting, the best I can get to is "sorta ok-ish", surely far away from "stunning". I've played around with ControlNet for dozens of hours already, experimenting with multi-control, weighting, ControlNet only in parts of the image, different starting and ending steps, ... but it's only kinda getting there.
Now, as opposed to that, a few prompts can generate really stunning images, but they will usually only vaguely resemble what I had in mind (if it's anything other than a person in a generic pose). Composing an image with prompts alone is by no means easier or faster than the more direct approach mentioned above. And I seem to always arrive at a point where the "prompt breaks". I don't know how to describe this, but in my experience, when I get too specific in prompting, the resulting image suddenly becomes ugly (like architecture that is described too closely in the prompt suddenly having all the wrong angles).
So, how do you approach image generation? Do you give a few prompts and see what SD can spit out with that, taking delight in the unexpected results and exploring visual styles more than specific image compositions? Or are you stubborn like me and want to use it as a tool for illustrating your imagination - something it doesn't seem nearly as good at as it is at the former?
I usually let it do what it wants more, in the interest of good-looking outputs. For more complex things I use a combination of the things you describe - prompt and ControlNet tweaks - and img2img. You can let the original generation be ugly and overbaked; as long as it has the right composition, you can then send it through img2img with a reduced prompt based more around style than composition. Or if you're really having trouble getting the composition you want, you can even make a sketch or rough edit of it, then run that through img2img.
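A rough sketch of that two-pass idea, in case it helps. It only builds the two request bodies and sends nothing. The field names (`prompt`, `seed`, `cfg_scale`, `steps`, `init_images`, `denoising_strength`) are what I understand the AUTOMATIC1111 web UI API to accept on its txt2img and img2img endpoints, so treat them as an assumption and check the API docs page of your own install. The helper names and all the numbers are made up for illustration.

```python
import base64


def composition_pass(prompt, seed, cfg_scale=9.0, steps=30):
    """Pass 1: detailed prompt. Composition matters here, looks do not."""
    return {
        "prompt": prompt,
        "seed": seed,
        "cfg_scale": cfg_scale,
        "steps": steps,
    }


def style_pass(image_bytes, style_prompt, denoising_strength=0.5,
               cfg_scale=6.0, steps=30):
    """Pass 2: feed the pass-1 image (or a rough sketch) to img2img
    with a short, style-only prompt."""
    return {
        # Assumed format: the init image as a base64-encoded string.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": style_prompt,
        # Lower values stay closer to the input image's composition.
        "denoising_strength": denoising_strength,
        "cfg_scale": cfg_scale,
        "steps": steps,
    }


first = composition_pass(
    "ruined watchtower on a cliff, rope bridge to a second tower, overcast",
    seed=1234567,
)
# In practice image_bytes would be the PNG returned by pass 1;
# a placeholder stands in for it here.
second = style_pass(b"placeholder image bytes", "matte painting, soft light")
```

The point is just that the second prompt drops the composition details and keeps only style, while the denoising strength decides how much of the first image survives.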
You're probably aware, but it's worth mentioning: CFG scale and the model you use have a huge impact on overbaking (i.e. the ugly, over-contrasted look with weird artifacts that happens when the prompt is too detailed). Any model trained to do one thing in particular will be much more prone to this; Deliberate v2 is my preferred model because of how flexible it is, and it takes a lot to get overbaked outputs from it. Also, lowering the CFG reduces the overbaking risk a lot, and while it does add more 'randomness', it can sometimes be worth it. It's all about balancing it with your prompt.
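For anyone wondering why CFG behaves like this: with classifier-free guidance, each sampling step takes the model's prediction without the prompt and pushes it toward (and past) the prediction with the prompt, scaled by the CFG value. A toy sketch of just that arithmetic, with made-up two-element lists standing in for real noise predictions:

```python
def guided(uncond, cond, cfg_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]   # toy "no prompt" prediction
cond = [1.0, 0.5]     # toy "with prompt" prediction

low = guided(uncond, cond, 1.0)   # equals cond exactly
high = guided(uncond, cond, 7.0)  # pushed far beyond cond
```

At scale 1 you get the prompted prediction as is; at higher scales the difference is amplified, which is a common explanation for the harsh contrast and artifacts once the prompt is already pulling hard.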
Protip - If an image is good but not quite perfect, stick to the same seed and use the X/Y script to run the image lots of times at different CFG levels.
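If you'd rather script it than use the X/Y script, the same idea is just "hold everything fixed, vary only CFG". A minimal sketch that builds one request body per CFG value; the keys `seed` and `cfg_scale` are assumed to match the AUTOMATIC1111 web UI API, and `cfg_sweep` is a hypothetical helper name. Nothing is sent anywhere.

```python
def cfg_sweep(base_payload, cfg_values):
    """Return one copy of base_payload per CFG value, same seed in all."""
    return [{**base_payload, "cfg_scale": cfg} for cfg in cfg_values]


base = {
    "prompt": "ruined watchtower on a cliff, overcast",
    "seed": 1234567,   # fixed, so only CFG changes between images
    "steps": 30,
}
sweep = cfg_sweep(base, [4, 5.5, 7, 8.5, 10])
```

Each entry in `sweep` can then be run one after another, and the results compared side by side.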