AI’s voracious need for computing power is threatening to overwhelm energy sources, requiring the industry to change its approach to the technology, according to Arm Holdings Plc Chief Executive Officer Rene Haas.

  • frezik@midwest.social
    6 months ago

    Improving the models doesn’t seem to work: https://arxiv.org/abs/2404.04125

    We comprehensively investigate this question across 34 models and five standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting “zero-shot” generalization, multimodal models require exponentially more data to achieve linear improvements in downstream “zero-shot” performance, following a sample inefficient log-linear scaling trend.

    It’s taking exponentially more data to get better results, and therefore exponentially more energy. Even if something like analog training chips reduces energy usage tenfold, the exponential curve will catch up again very quickly, with results only marginally improved. Not only that, but you have to gather that much more data, and while the Internet is a vast datastore, the AI models have already absorbed much of it.
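
    To put rough numbers on that, here is a minimal sketch of a log-linear trend, performance = a + b * log10(data). The coefficient `gain_per_decade` (the b term) and both function names are made up for illustration, not taken from the paper, and it assumes energy scales in proportion to the amount of training data.

```python
import math


def data_multiplier_for_gain(target_gain, gain_per_decade=1.0):
    """How many times more data a log-linear trend needs for `target_gain`.

    Assumes performance = a + gain_per_decade * log10(data), so each extra
    `gain_per_decade` points of performance costs 10x the data.
    """
    return 10 ** (target_gain / gain_per_decade)


def gain_from_efficiency(efficiency_factor, gain_per_decade=1.0):
    """Performance gain bought by hardware that is `efficiency_factor` times
    more energy efficient, at a fixed energy budget.

    Assumes energy is proportional to data, so the savings are spent on
    `efficiency_factor` times more data.
    """
    return gain_per_decade * math.log10(efficiency_factor)


if __name__ == "__main__":
    # Linear steps in performance, exponential steps in data (and energy).
    for gain in range(1, 5):
        print(f"+{gain} points needs {data_multiplier_for_gain(gain):,.0f}x the data")

    # A tenfold efficiency improvement pays for exactly one step on the curve.
    print(f"10x efficiency buys +{gain_from_efficiency(10):.1f} points")
```

    Under those assumptions, a tenfold efficiency gain is used up by a single step of improvement, and the next step needs another tenfold gain on top of it.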

    The implication is that the models are about as good as they will be without more fundamental breakthroughs. The thing about breakthroughs like that is that they could happen tomorrow, they could happen in 10 years, they could happen in 1000 years, or they could happen never.

    Fermat’s Last Theorem remained an open problem for 358 years. Squaring the Circle remained open for over 2000 years. The Riemann Hypothesis has remained unsolved after more than 150 years. These things sometimes sit there for a long, long time, and not for lack of smart people trying to solve them.