What I dislike about lower quantization is the quality degradation. In my limited experience, I find 7B models dumb (I've only tested Q4_K_M GGUF); they needed to be given proper context before the conversation could move forward constructively (be it chat or instruct). If that issue can be circumvented at lower quantization levels, I'm all in.
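For intuition on where that degradation comes from, here's a toy round-trip through 4-bit quantization. This is my own illustration with made-up weights and a plain symmetric scheme, not GGUF's actual K-quant layout, but it shows the rounding error that low-bit formats bake into every weight:

```python
import numpy as np

# Fake fp32 weight tensor (values made up for illustration).
rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=4096).astype(np.float32)

# Symmetric 4-bit quantization: map weights onto integer codes in [-8, 7].
scale = np.abs(w).max() / 7
q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # 4-bit codes
w_hat = q.astype(np.float32) * scale                     # dequantized weights

err = np.abs(w - w_hat)
print(f"mean abs error: {err.mean():.6f} (vs mean |w| {np.abs(w).mean():.6f})")
```

Every weight gets snapped to one of 16 levels, and those small per-weight errors compound across billions of parameters and dozens of layers.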
In the context of SD, going below fp16 only makes things faster at the cost of quality, and I personally like to go in depth with my prompts. For simpler prompts, sure, even Lightning and Turbo are good in that regard.
You can't shrink a model to 1/8 the size and expect it to run at the same quality. Quantization lets me move from a cloud GPU to my laptop's crappy CPU/iGPU, so I'm OK with that tradeoff.
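To put that size claim in perspective, here's some back-of-the-envelope arithmetic for a 7B model. The bits-per-weight figures are rough effective values I'm assuming for illustration, not exact GGUF specs:

```python
# Approximate file sizes for a 7B-parameter model at various quantizations.
# Bits/weight are assumed rough values, not exact format specifications.
params = 7e9
for name, bits in [("fp16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85), ("Q2_K", 2.6)]:
    gib = params * bits / 8 / 2**30
    print(f"{name:>7}: ~{gib:4.1f} GiB  ({bits / 16:.2f}x fp16)")
```

So fp16 is around 13 GiB, while Q4_K_M lands near 4 GiB, which is what makes the jump from a cloud GPU to a laptop CPU/iGPU possible in the first place.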