Benchmarking the Apple M1 Max

  • 164 points • 7 hours ago • posted by @xrayarx • 152 comments


@matthewmacleod 5 hours

Replying to @xrayarx 🎙

Good to see a detailed benchmark. I’m pretty impressed by the performance in real-world applications as well - the machine is easily 2.5-3x as fast at running various builds and processing jobs as my 15” from 2018 was, and it’s cool and quiet while doing it.

The performance claims have been a bit overblown in some quarters - it’s not going to replace a 5950X with a big GPU, and some of the rhetoric is a bit silly. But it’s surprisingly close - watching a silent laptop rip through a build faster than the 125W TDP i9-10900K we have in the office is pretty cool!


@YetAnotherNick 1 hour

Replying to @xrayarx 🎙

Good post. One thing to note is that the claim that the 3090 is 8 times faster is not quite accurate. The author is comparing FP16 on the 3090 with FP32 on the M1. The difference between them is more like 3-4 times for FP32.

Even that is not true FP32 for the 3090, as TensorFlow uses Nvidia's TF32 by default for convolutions.
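
For anyone wondering what "not true FP32" means in practice: TF32, the reduced-precision format TensorFlow uses by default on Ampere GPUs, keeps FP32's 8-bit exponent but only 10 of its 23 mantissa bits. Here is a minimal sketch that emulates that on the CPU. The function name `to_tf32_trunc` is made up for illustration, and plain truncation is an assumption (the hardware's actual rounding may differ).

```python
import struct

def to_tf32_trunc(x):
    """Emulate TF32 precision by dropping the low 13 mantissa bits of a float32.

    Assumption: simple truncation; real hardware rounding may differ.
    """
    # Reinterpret the value as a 32-bit float bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign (1 bit), exponent (8 bits) and the top 10 mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# 2**-10 is the smallest step above 1.0 that survives; 2**-11 is lost.
kept = to_tf32_trunc(1.0 + 2.0 ** -10)
lost = to_tf32_trunc(1.0 + 2.0 ** -11)
```

So a benchmark labelled "FP32" on the 3090 may be doing its convolutions at this lower precision unless TF32 is explicitly turned off.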


@Jack000 5 hours

Replying to @xrayarx 🎙

One thing I don't see anyone mention is that the M1 Max is probably the cheapest GPU memory you can buy. The only other way to get >=64 GB of GPU memory is with the A100 (?), which is like 20k by itself.

So this would be great specifically for fine-tuning large transformer models like GPT-J, which requires a lot of memory but not a lot of compute. Just hoping for PyTorch support soon.
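
To put rough numbers on the memory side, here is a back-of-the-envelope sketch. It assumes GPT-J has roughly 6 billion parameters and counts only weights, gradients, and optimizer state (activations are ignored, so real usage is higher). The function name and the per-parameter byte counts are illustrative assumptions, not measurements.

```python
def finetune_memory_gb(n_params, bytes_weights=4, bytes_grads=4, bytes_optimizer=8):
    """Approximate memory (in GB, 1 GB = 1e9 bytes) for weights + grads + optimizer state.

    Defaults assume FP32 weights and gradients plus Adam, which keeps
    two FP32 moment estimates per parameter (8 bytes).
    """
    per_param = bytes_weights + bytes_grads + bytes_optimizer
    return n_params * per_param / 1e9

GPT_J_PARAMS = 6e9  # assumption: roughly 6 billion parameters

# Full FP32 fine-tuning with Adam: 16 bytes per parameter.
full_fp32_adam = finetune_memory_gb(GPT_J_PARAMS)

# FP16 weights and gradients with a stateless optimizer such as plain SGD.
fp16_sgd = finetune_memory_gb(GPT_J_PARAMS, bytes_weights=2, bytes_grads=2, bytes_optimizer=0)
```

Under these assumptions, whether a run fits in 64 GB depends heavily on precision and optimizer choice, which is why the amount of memory matters more than raw compute here.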


@newaccount74 6 hours

Replying to @xrayarx 🎙

I also got an M1 Max. The chip is amazing. Compile times are a lot faster than on the 6 core Intel Mac mini I had before.

But at this point it's really held back by Apple's software.

Anything related to Apple ID and iCloud regularly hangs 30-60 seconds, showing a spinner with no progress indicator whatsoever.

Apps randomly take 20 seconds to launch, maybe because of [1]?

The Open/Save dialog taking 30 seconds to show.

ControlCenter using 8GB of RAM to show a few sliders (I hope they fix that bug soon).

The scanning feature in Preview is so unreliable that I started using my Windows machine for scanning something on my HP all-in-one.

Some of those problems may be issues with 3rd party software (drivers), and others are just things that slipped through QA, and will hopefully be fixed in an update.

But some of the issues are structural, where Apple has made questionable design decisions that mean the issues can never be fixed.

E.g. designing a security architecture that requires synchronously checking a binary signature with a web service during app startup is bound to cause performance issues.

Or take the design of the XPC system: asynchronous message passing between services that are implicitly launched on demand sounds nice in theory, but it has been the source of so many bugs, causing temporary or permanent app hangs that are impossible to debug. The system was introduced in macOS 10.7 (!) and it still doesn't work reliably! At this point I've lost hope it will ever work properly.

[1]: https://sigpipe.macromates.com/2020/macos-catalina-slow-by-d...


@akdor1154 5 hours

Replying to @xrayarx 🎙

For all the (truly) amazing performance from the M1.*, how much of the benefit is just coming from Mac users not realising how laggy their OS is on non-extreme hardware? I used macOS for two years on a '15 i5 MBP and didn't realise how persistently sluggish it was until I blew everything away and chucked on Xubuntu. (Nothing magic about Linux here; Gnome and KDE were as bad as macOS.)

Is the incredible performance of the M1 just going to enable a whole new generation of inefficient software?


@bee_rider 4 hours

Replying to @xrayarx 🎙

I wonder if they'll somehow incorporate the AMX 'instruction' (or whatever it is) into BLIS kernels. GEMM isn't everything, but it is a pretty important building block in linear algebra. (I mean, that's the big observation behind these fancy tile-based BLAS implementations.)
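
For anyone unfamiliar with the tiling idea, here is a toy sketch in pure Python. It is not how BLIS or Accelerate actually implement GEMM (real kernels pack panels and use hand-tuned micro-kernels), and the function names and the `tile` parameter are made up for illustration. The point is only that the work is split into small blocks that could each be handed to a fast fixed-size kernel.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def matmul_tiled(a, b, tile=2):
    """Same product, computed block by block so each tile stays cache-friendly."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # tile rows of C
        for j0 in range(0, m, tile):        # tile columns of C
            for p0 in range(0, k, tile):    # tile along the shared dimension
                # This inner block is what a hardware tile kernel would replace.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c

a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
same = matmul_tiled(a, b) == matmul_naive(a, b)
```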


@rc_mob 17 minutes

Replying to @xrayarx 🎙

Did I get the worst M1 Max in the world? In my one month with this computer so far, it's been problematic. It has this fun issue where it freezes up for a couple of seconds randomly when watching YouTube. It's frozen for a few seconds in other situations also. One time it just rebooted completely right in the middle of using it. Add to that, I've never gotten more than 4 hours out of this battery.

Personally I'm kind of regretting the purchase. My 2015 MacBook Pro is faster than it.


@jeffbee 5 hours

Replying to @xrayarx 🎙

Didn't numpy remove Apple Accelerate support recently because it has numerical problems? Their docs are still warning against it.


@lincpa 1 hour

Replying to @xrayarx 🎙

Apple's M1 unified memory architecture (published on November 11, 2020) is my "warehouse/workshop model" (hardware architecture section published on February 06, 2019). It has not yet fully realized the warehouse/workshop model; it needs further improvements to the programming language, compiler, and OS to support and promote my programming methodology.

Reference

1. The Math-based Grand Unified Programming Theory: The Pure Function Pipeline Data Flow with Principle-based Warehouse/Workshop Model

https://github.com/linpengcheng/PurefunctionPipelineDataflow

Its mathematical prototype is the simple, classic, vivid "water input/output of the pool" problem from elementary school mathematics, which is widely used in social production practice. My theory rebuilt the theoretical foundation of the IT industry. It relates the computer theory system fully and perfectly to mathematics in a simple and unified way: from hardware integrated circuits and computer architecture to software programming methodology, architecture, programming language, and so on. It solves the most fundamental and core problem in the IT industry: the foundation and core of IT theory lack mathematical support.

2. Why my "warehouse/workshop model" can achieve high performance and low power consumption (take Apple M1 chip, Intel AVX-512, Qualcomm as examples)

https://github.com/linpengcheng/PurefunctionPipelineDataflow...

3. In the future, the OS will be a DB, and Clojure will be the best DML. The future OS kernel will be a data-oriented scheduler (with computer hardware and software integration architecture diagram)

https://github.com/linpengcheng/PurefunctionPipelineDataflow...

4. The FoxPro database-oriented programming paradigm is the development direction of future programming languages

https://github.com/linpengcheng/PurefunctionPipelineDataflow...


@musicale 5 hours

Replying to @xrayarx 🎙

The memory bandwidth result is impressive.


@jessriedel 3 hours

Replying to @xrayarx 🎙

> We already know that the M1 Max CPU should have really strong matrix multiplication performance due to Apple's "hidden"/undocumented AMX co-processor embedded in the CPU complex, and that it is leveraged when you use Apple's Accelerate framework

Does this hold for the M1 Pro?


@noveltyaccount 5 hours

Replying to @xrayarx 🎙

Finally some benchmarks beyond just encoding video (which is admittedly a huge use case for these CPUs). I've been on Windows for decades, and this step change in computing performance is a siren's call for me to switch. I've never wanted an Apple product as much as this before.


@m15i 6 hours

Replying to @xrayarx 🎙

Regarding training ResNet50, even though img/sec is lower than on the 3090, could a 64 GB M1 Max accommodate larger image sizes than the 24 GB 3090?


@brrrrrm 6 hours

Replying to @xrayarx 🎙

For those curious about running their own matmul benchmarks, I wrote a script a while back that works on both Linux and macOS and should make comparison easy.

https://jott.live/code/blas_test.cc

I saw ~1.2 TFLOPS on the regular M1.
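
For reference, this is roughly how a matmul throughput figure like that is usually derived. This is a hedged sketch and not the linked script: `gemm_flops`, `measure`, and the `fn` callable are hypothetical names, and it relies on the common convention of counting 2*n^3 floating-point operations for an n x n by n x n multiply.

```python
import time

def gemm_flops(n, seconds):
    """FLOP/s for an n x n by n x n matmul, counting one multiply and one add per term."""
    return 2 * n ** 3 / seconds

def measure(fn, n, repeats=3):
    """Time fn(n) a few times and report FLOP/s from the best run.

    fn is any callable that performs one n x n matmul (hypothetical here;
    in practice it would call into a BLAS library).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(n)
        best = min(best, time.perf_counter() - start)
    return gemm_flops(n, best)

# Sanity check of the formula: a 1000 x 1000 matmul finishing in 1 s is 2 GFLOP/s.
two_gflops = gemm_flops(1000, 1.0)
```

Taking the best of several runs helps filter out warm-up and scheduling noise, which matters a lot on a laptop.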


@PragmaticPulp 6 hours

Replying to @xrayarx 🎙

Great detailed benchmarking.

This mirrors my experience with my M1 Max: Absolutely amazing battery life and performance in a laptop. I’m thrilled to have it. Huge step up from last gen Apple laptops.

But at the same time, it feels like some of the rhetoric around the performance claims got a little out of hand in the wake of the launch. It’s fast, but it’s not actually crushing my AMD/nVidia desktop like a lot of news outlets were suggesting it would.

In fact, a lot of the GPU tests here show more or less what I’ve seen: That Apple has matched the power/performance of other leading-edge GPU hardware:

> Pretty much what we would expect, with the M1 Max having about 8x less performance, but at 8x less power, so performance per watt is surprisingly quite comparable between the two.

This is actually an impressive accomplishment out of Apple. I’m just afraid it might get overshadowed by the fact that it doesn’t live up to some of the fairly extreme performance claims that got tossed around in the days following the launch.
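
The arithmetic behind the quoted comparison is simple enough to spell out. The numbers below are normalised placeholders that only encode the "8x the performance at 8x the power" ratio from the quote; they are not measured throughput or wattage figures, and the function name is made up for illustration.

```python
def perf_per_watt(throughput, watts):
    """Efficiency as work per unit of power (units cancel in the ratio below)."""
    return throughput / watts

# Normalised placeholders: M1 Max = 1 unit of performance at 1 unit of power.
m1_max = perf_per_watt(1.0, 1.0)
# Per the quote, the 3090 delivers about 8x the performance at about 8x the power.
rtx_3090 = perf_per_watt(8.0, 8.0)

efficiency_ratio = rtx_3090 / m1_max  # 1.0 means equal performance per watt
```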


@willvarfar 5 hours

Replying to @xrayarx 🎙

Excellent investigation.

I have the first 13” M1 MacBook Pro and have done a lot of data stuff on it. My experience was also that Python flew, with CPython on the M1 being almost as fast as PyPy on my 2019 i7 laptop, and Java compilation was much faster too. The CPU is fast and the memory is really fast.

The pain points, though, were anything involving containers, random 10-30 sec stalls in boot and app startup (I think it’s corporate firewall stuff), and my general preference for a Linux desktop over OSX (yeah, I’m a programmer).

