ben_db t1_iyrmvzu wrote on December 3, 2022 at 5:03 PM

#852,859

I can forgive them not giving a comparison to other architectures, but why don't they give a reference to the timing before the optimisations? 18 seconds is meaningless.

hrkrx t1_iyrp1m8 wrote on December 3, 2022 at 5:17 PM

#852,966

Replying to ben_db (#852,859)

My not further defined calculation machine(TM) can generate an image of unknown size in less than a random amount of time.

[deleted] t1_iyrq0sy wrote on December 3, 2022 at 5:24 PM

#853,017

Replying to ben_db (#852,859)

[deleted]

ben_db t1_iyrq17d wrote on December 3, 2022 at 5:24 PM

#853,018

Replying to hrkrx (#852,966)

Wow, that's an amount of time different to the previous calculation machine!

ben_db t1_iyrqqfk wrote on December 3, 2022 at 5:29 PM

#853,047

Replying to [deleted] (#853,017)

The problem is, Stable Diffusion isn't a fixed-length operation. Yes, it's 50 iterations, but the time those iterations take will vary massively based on the input term, output resolution and channels, as well as about 10 other settings.

Avieshek OP t1_iyrsbe3 wrote on December 3, 2022 at 5:40 PM

#853,109

Replying to [deleted] (#853,017)

The M1 MacBook Air... is a fanless, ultra-lightweight laptop with no dedicated GPU and 20-hour battery life... I’d say that’s pretty impressive when we are yet to see a Mac Pro on Apple Silicon.

BlingyStratios t1_iyrsu45 wrote on December 3, 2022 at 5:43 PM

#853,134

Replying to [deleted] (#853,017)

What was it before? I tried it a couple months ago on an m2 air, an image would take me 15 minutes

juggarjew t1_iyrtq5p wrote on December 3, 2022 at 5:49 PM

#853,171

And I can generate an image in a few seconds on my Nvidia A4000. This is a meaningless statement, given that you can tweak so many settings that there is no apples to apples comparison going on.

Themasterofcomedy209 t1_iyruepz wrote on December 3, 2022 at 5:54 PM

#853,203

Replying to hrkrx (#852,966)

My digital electronic programmable machine consisted simply of six hydrocoptic marzelvanes, so fitted to the ambifacient lunar waneshaft that sidefumbling was effectively prevented.

Cindexxx t1_iyrwekl wrote on December 3, 2022 at 6:08 PM

#853,294

Replying to Avieshek (#853,109)

I've just been wondering if the Pro hasn't used Apple silicon because it doesn't scale up to it. Their chips are insanely impressive, but can that 20W thing scale up to 120W and actually have 5-6x the power? And if it can, why haven't they done it?

Avieshek OP t1_iyrxwzr wrote on December 3, 2022 at 6:19 PM

#853,369

Replying to Cindexxx (#853,294)

There’s already been benchmark leaks with 96GB of RAM, there’s a Covid-situation going on in China currently and likely the launch has been postponed to the end of the financial year.

AutoSlashS t1_iyryz50 wrote on December 3, 2022 at 6:26 PM

#853,427

Replying to ben_db (#853,018)

Well, it's impressive.

doremonhg t1_iys08wh wrote on December 3, 2022 at 6:35 PM

#853,474

Replying to ben_db (#853,018)

Definitely one of the calculation machine ever made

Stanley--Nickels t1_iys0iti wrote on December 3, 2022 at 6:37 PM

#853,487

Replying to Avieshek (#853,109)

.

whackwarrens t1_iys1b8e wrote on December 3, 2022 at 6:42 PM

#853,517

Replying to [deleted] (#853,017)

Chips become more power efficient over time so how old is that gpu? And on what node?

If you're comparing an old ass node on a desktop part to Apple's latest and greatest mobile chip the power difference would be insane. Comparable laptop apus from AMD would manage the same, although they use like 65w last I checked.

M2 is on like 4 nanometer. Clearly a desktop pc taking 42 seconds to do basic 50 iteration renders isn't remotely bleeding edge lol.

sambes06 t1_iys1dfx wrote on December 3, 2022 at 6:42 PM

#853,518

Would this work on M1 iPads?

AkirIkasu t1_iys400c wrote on December 3, 2022 at 7:01 PM

#853,618

Replying to sambes06 (#853,518)

From the article:

> This leads to some impressively speedy generators. Apple says a baseline M2 MacBook Air can generate an image using a 50-iteration StableDiffusion model in under 18 seconds. Even an M1 iPad Pro could do the same task in under 30 seconds.

AkirIkasu t1_iys4fxb wrote on December 3, 2022 at 7:04 PM

#853,638

The actual writeup by Apple, for those curious.

The actual code for those who want to actually try it out.

AkirIkasu t1_iys6fy9 wrote on December 3, 2022 at 7:18 PM

#853,714

Replying to Cindexxx (#853,294)

Perhaps? The M1 Ultra is basically two M1 chips glued together with a bunch of extra GPU cores.

There isn't an M2 Ultra right now, but it's probably only a matter of time until that gets released.

[deleted] t1_iysjusr wrote on December 3, 2022 at 8:51 PM

#854,200

[deleted]

[deleted] t1_iysk5qi wrote on December 3, 2022 at 8:53 PM

#854,212

Replying to ben_db (#852,859)

[deleted]

stealth_pandah t1_iyskm18 wrote on December 3, 2022 at 8:56 PM

#854,235

Replying to ben_db (#852,859)

for example, my XPS 17 11th gen i7 and 2060 generates one image in 10 sec on average. I'd say 18 sec is pretty good at this point. M silicon future looks brighter every day.

browndog03 t1_iysn522 wrote on December 3, 2022 at 9:13 PM

#854,312

Maybe it’s a time increase who knows?

Avieshek OP t1_iyso0h3 wrote on December 3, 2022 at 9:19 PM

#854,350

Replying to AkirIkasu (#853,714)

No, that’s M1 Max

dangil t1_iysr4ln wrote on December 3, 2022 at 9:42 PM

#854,461

Replying to ben_db (#852,859)

My 2010 12 core Mac Pro with a Radeon 7970 takes about 5 minutes

Spirit_of_Hogwash t1_iysr7mx wrote on December 3, 2022 at 9:42 PM

#854,464

Replying to ben_db (#853,047)

In the Ars Technica article they say that with an RTX 3060 it takes 8 seconds and with the M1 Ultra 9 seconds.

So once again Apple's "fastest in the world" claims are defeated by a mid-range GPU.

https://arstechnica.com/information-technology/2022/12/apple-slices-its-ai-image-synthesis-times-in-half-with-new-stable-diffusion-fix/

Eggsaladprincess t1_iysrcum wrote on December 3, 2022 at 9:43 PM

#854,470

Replying to AkirIkasu (#853,714)

I think M1 Max is basically 2 M1 chips and M1 Ultra is basically 4 M1 chips

ben_db t1_iysrvou wrote on December 3, 2022 at 9:47 PM

#854,483

Replying to dangil (#854,461)

You can't compare two different images with different settings
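
To make that point concrete, here is a minimal Python sketch that refuses to compare two timings unless the settings match. The `RunSettings` fields and the numbers are illustrative assumptions, not taken from any real benchmark harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSettings:
    """Settings that change how long a generation takes (hypothetical field names)."""
    steps: int
    width: int
    height: int


def speedup(time_a: float, settings_a: RunSettings,
            time_b: float, settings_b: RunSettings) -> float:
    """Return how many times faster run A is than run B; refuse if settings differ."""
    if settings_a != settings_b:
        raise ValueError("settings differ, timings are not comparable")
    return time_b / time_a


# Illustrative values only: two runs with identical settings.
same = RunSettings(steps=50, width=512, height=512)
print(speedup(9.0, same, 18.0, same))  # 2.0
```

With mismatched settings (say 20 steps against 50) the function raises instead of returning a misleading ratio.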

Cindexxx t1_iysxx32 wrote on December 3, 2022 at 10:32 PM

#854,668

Replying to Eggsaladprincess (#854,470)

Isn't that going to limit the single core to being not much higher than the original M1? Maybe with more power and cooling they can crank it up a bit, but it seems like that's the limit.

S1DC t1_iyszcja wrote on December 3, 2022 at 10:43 PM

#854,716

Funny how they don't mention the number of steps/method used. Big difference between 120 steps of Euler vs 20 steps of DDIM

Aozora404 t1_iyt1kl0 wrote on December 3, 2022 at 11:00 PM

#854,813

Replying to juggarjew (#853,171)

Hehe apples

dookiehat t1_iyt3gjg wrote on December 3, 2022 at 11:14 PM

#854,892

Replying to Spirit_of_Hogwash (#854,464)

I think it is a software or compiler (?) issue. Stable Diffusion was written for nvidia gpus w cuda cores. Idk what sort of translation happens but it probably leads to inefficiencies not experienced with nvidia.

Eggsaladprincess t1_iyt5x7v wrote on December 3, 2022 at 11:33 PM

#854,983

Replying to Cindexxx (#854,668)

Not really sure what you're saying. Single core is pretty consistent between M1 to M1 Ultra

Cindexxx t1_iyt65ou wrote on December 3, 2022 at 11:35 PM

#854,994

Replying to Eggsaladprincess (#854,983)

Yeah, talking about the pro line. If they're stuck at M1 single core speeds at desktop level it'll suck for certain applications.

dangil t1_iyt89xx wrote on December 3, 2022 at 11:51 PM

#855,071

Replying to ben_db (#854,483)

Every prompt takes the same amount of time

ben_db t1_iyt8la0 wrote on December 3, 2022 at 11:54 PM

#855,086

Replying to dangil (#855,071)

Prompt yes, anything else, no.

SD version, resolution, passes, channels etc, all massively affect performance.

"I take 25 minutes to drive to work and you take 30 so my car is faster"

sylfy t1_iytgr3x wrote on December 4, 2022 at 12:58 AM

#855,325

Replying to dookiehat (#854,892)

CUDA and the accompanying cuDNN libraries are highly specialised hardware and software libraries for machine learning tasks provided by Nvidia, which they have been working on over the past decade.

It’s the reason Nvidia has such a huge lead in the deep learning community, and the reason that their GPUs are able to command a premium over AMD. Basically all deep learning tools are now designed and benchmarked around Nvidia and CUDA, with some also supporting custom built hardware like Google’s TPUs. AMD is catching up, but the tooling for Nvidia “just works”. This is also the reason people buy those $2000 3090s and 4090s, not for gaming, but for actual work.

Frankly, the two chips are in completely different classes in terms of power draw and what they do (one is a dedicated GPU, the other is a whole SoC), it’s impressive that the M1/M2 even stays competitive.

maxhaton t1_iythn1q wrote on December 4, 2022 at 1:05 AM

#855,350

Replying to [deleted] (#853,017)

It can absolutely draw more than 20W, no?

Draiko t1_iythp1x wrote on December 4, 2022 at 1:06 AM

#855,353

Knowing Apple, this method and result has a ton of asterisks on it.

PBlove t1_iythyik wrote on December 4, 2022 at 1:08 AM

#855,360

Replying to Draiko (#855,353)

YEP!

Bet it was on a special rig, not a consumer computer.

PBlove t1_iyti2t1 wrote on December 4, 2022 at 1:09 AM

#855,366

Replying to ben_db (#855,086)

That last part is a great way to put it.

PBlove t1_iytidxo wrote on December 4, 2022 at 1:12 AM

#855,373

Replying to Avieshek (#853,109)

It's a tablet with a keyboard.

Mac airs are shit.

Half my office got those from IT.

I got a 4lb Asus work station with an A5000... ;p

(Basically I use it to run freaking CAD software, but only to review engineering. Hell, for fun I run Blender renders I set up at home and send over to render in the background while I work.)

svtscottie t1_iytiq15 wrote on December 4, 2022 at 1:14 AM

#855,392

Replying to AkirIkasu (#853,638)

You the real MVP. The github page contains most of the info everyone is complaining the article didn't have.

[deleted] t1_iytiutm wrote on December 4, 2022 at 1:15 AM

#855,400

Replying to maxhaton (#855,350)

[deleted]

Eggsaladprincess t1_iytmifw wrote on December 4, 2022 at 1:45 AM

#855,491

Replying to Cindexxx (#854,994)

Hm, I don't see it that way at all.

If we look at how Intel chips scale, we see that single core performance actually decreases on the largest chips. That's why historically the Xeon Mac Pro would actually have a lower single core performance than the similar generation i5 or i7.

Of course the Xeon would more than make up for it by having tons of cores, more PCIe lanes, support for ECC RAM, etc.

I think it would be fantastic if the M1 Supermega or whatever they end up calling the Mac Pro chip matches the M1 single core performance.

DiscoveryOV t1_iytqahn wrote on December 4, 2022 at 2:16 AM

#855,608

Replying to Spirit_of_Hogwash (#854,464)

Fastest in the world in their class.

I don’t see any ultrabooks with a 3060 in them, nor any even close to as powerful as a fanless 20w one.

BlazingShadowAU t1_iytrjlq wrote on December 4, 2022 at 2:26 AM

#855,652

Replying to ben_db (#852,859)

Ngl, as someone who has run stablediff on my own gpu, 18 seconds could either be god awful, average or good depending on the number of steps in the generation. A 15 step generation on my 2070 only takes like 4 seconds and produces perfectly fine results. Think ive gotta go up to like 50+ before reaching 18 seconds.
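
A rough back-of-the-envelope check of those numbers, as a Python sketch. It assumes generation time scales linearly with step count and ignores fixed costs such as model loading, text encoding and decoding, so treat the results as estimates only.

```python
def seconds_per_step(total_seconds: float, steps: int) -> float:
    """Average time per step, assuming time grows linearly with step count."""
    return total_seconds / steps


# 15 steps in roughly 4 seconds, as reported above for the 2070.
per_step = seconds_per_step(4.0, 15)

# Estimated time for a 50-step run at the same per-step cost.
print(round(per_step * 50, 1))  # 13.3

# Estimated number of steps that fit into 18 seconds.
print(round(18.0 / per_step, 1))  # 67.5
```

Under that linear assumption, 18 seconds on this card would correspond to somewhere in the high 60s of steps, which is consistent with the "50+" guess.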

Spirit_of_Hogwash t1_iytsbuk wrote on December 4, 2022 at 2:33 AM

#855,673

Replying to DiscoveryOV (#855,608)

I dont see any ultrabook or even 5kg laptop with a M1 ultra either.

Edit: you know what, actually you can buy many ultrabooks with the RTX 3060 (Asus ROG Zephyrus G14, Dell XPS, Razer Blade 14 and many more <20mm thick laptops), while Apple laptops' GPU is at best half an M1 Ultra.

So yeah talk about fanboys who cant even google.

Starold t1_iytyzdk wrote on December 4, 2022 at 3:30 AM

#855,889

Replying to ben_db (#852,859)

not meaningless for those that use the same sw

Ethario t1_iyu2im8 wrote on December 4, 2022 at 4:01 AM

#855,977

86400 seconds a day divided by 18 seconds per waifu. POG
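
For anyone who wants the number, the division works out as below. This is a trivial Python sketch that assumes back-to-back generation at a constant 18 seconds per image, which a fanless laptop may not sustain all day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400
SECONDS_PER_IMAGE = 18          # the figure quoted for the M2 MacBook Air

images_per_day = SECONDS_PER_DAY // SECONDS_PER_IMAGE
print(images_per_day)  # 4800
```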

AkirIkasu t1_iyu41du wrote on December 4, 2022 at 4:15 AM

#856,027

Replying to ben_db (#853,047)

If you go to the actual github project you can see the full benchmarks and settings.

AkirIkasu t1_iyu4g6q wrote on December 4, 2022 at 4:19 AM

#856,037

Replying to Spirit_of_Hogwash (#855,673)

You never will, given that ultrabook is a trademark of Intel.

wakka55 t1_iyu4ocd wrote on December 4, 2022 at 4:21 AM

#856,042

Replying to AkirIkasu (#853,638)

I am too stupid to actually try it.

>ERROR: Failed building wheel for tokenizers or error: can't find Rust compiler

WHAT

lol

AkirIkasu t1_iyu4v2k wrote on December 4, 2022 at 4:22 AM

#856,047

Replying to BlazingShadowAU (#855,652)

The benchmark they used is 50 steps on a 77-token input, outputting 512x512.

AkirIkasu t1_iyu4y2e wrote on December 4, 2022 at 4:23 AM

#856,048

Replying to juggarjew (#853,171)

From the github page:

> The image generation procedure follows the standard configuration: 50 inference steps, 512x512 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for unet).

kent2441 t1_iyu5dkz wrote on December 4, 2022 at 4:27 AM

#856,059

Replying to Spirit_of_Hogwash (#854,464)

Apple has never said their GPUs were the fastest in the world. Why are you lying?

Spirit_of_Hogwash t1_iyu5xnj wrote on December 4, 2022 at 4:32 AM

#856,074

Replying to kent2441 (#856,059)

https://birchtree.me/content/images/size/w960/2022/03/M1-Ultra-chart.jpeg

Dude, Apple is always claiming fastest in the world.

In this specific case Apple DID claim that they are faster than the "highest end discrete GPU", while in this and most real-world tests it is roughly equivalent to a midrange Nvidia GPU.

You should ask yourself why Apple is the one who lies and you believe them without checking the reality.

AkirIkasu t1_iyu60ra wrote on December 4, 2022 at 4:33 AM

#856,076

Replying to wakka55 (#856,042)

You need to have the nightly version of Rust installed. There's an issue linked in the FAQ of the README for the project that has instructions to install it.

wakka55 t1_iyu6n3o wrote on December 4, 2022 at 4:39 AM

#856,092

Replying to AkirIkasu (#856,076)

Maybe next year I'll give it another shot, for now I give up and go on with my dum dum life

Spirit_of_Hogwash t1_iyu6yj7 wrote on December 4, 2022 at 4:42 AM

#856,104

Replying to AkirIkasu (#856,037)

The previous fanboy said ultrabook when everyone else was comparing desktop to desktop.

But it turns out the rtx 3060 is available in many ultrabooks but the m1ultra is not available in any laptop format.

Tarkcanis t1_iyu9dio wrote on December 4, 2022 at 5:05 AM

#856,164

If the tech industry could stop using "sciencey" words for their products, that'd be greaate.

CatWeekends t1_iyufwml wrote on December 4, 2022 at 6:15 AM

#856,354

Replying to S1DC (#854,716)

50 steps of DDIM maybe?

https://www.reddit.com/r/StableDiffusion/comments/zajwqk/apple_stable_diffusion_with_core_ml_on_apple/iyobkyp/

Impossible_Wish_2675 t1_iyui8dc wrote on December 4, 2022 at 6:42 AM

#856,414

My Digital Abacus says a few seconds here and there, but no more than that.

[deleted] t1_iyuivy3 wrote on December 4, 2022 at 6:50 AM

#856,426

Replying to Spirit_of_Hogwash (#855,673)

[deleted]

ben_db t1_iyukygi wrote on December 4, 2022 at 7:17 AM

#856,465

Replying to AkirIkasu (#856,027)

They should give comparisons in the article, that's the point.

Are Apple users just fine with this? It seems to happen a lot for Apple products.

Always "30% better" or "twice the performance" but never any actual meaningful numbers.

S1DC t1_iyul16q wrote on December 4, 2022 at 7:18 AM

#856,468

Replying to CatWeekends (#856,354)

That's a reasonable amount on apple silicon in 18 seconds. I get 50 steps DDIM at 512x512 in about six seconds on a RTX 3080 10gb.

muffdivemcgruff t1_iyum5nx wrote on December 4, 2022 at 7:33 AM

#856,502

Replying to [deleted] (#854,212)

Cool, now put that into an iPad that barely sips wattage.

muffdivemcgruff t1_iyum8m7 wrote on December 4, 2022 at 7:34 AM

#856,508

Replying to AkirIkasu (#856,048)

Welp his GPU is fast, maybe not his brain so much.

headloser t1_iyumeqv wrote on December 4, 2022 at 7:37 AM

#856,514

And how does that compare to the Windows 10 and 11 versions?

Nicebutdimbo t1_iyux4n3 wrote on December 4, 2022 at 10:10 AM

#856,856

Replying to Cindexxx (#854,994)

Err the single core performance of the M1 chips is very high, I think when they were released they were the most powerful single cores available.

ryo4ever t1_iyuyo1m wrote on December 4, 2022 at 10:33 AM

#856,902

Why is it even called stable diffusion? This whole AI mumbo jumbo is confusing as hell…

HELPFUL_HULK t1_iyuyofg wrote on December 4, 2022 at 10:33 AM

#856,903

Replying to [deleted] (#853,017)

I'm using DiffusionBee on an M1 MacBook Air with 8GB of RAM and I'm getting similar time results to your friend, about 40-50 seconds with 50 steps on a 512x512 model.

This is without the optimizations in the article above.