• sexy_peach@beehaw.org
    5 months ago

    Researchers who work on transformer models understand how the algorithm works, but they don’t yet know how their simple programs can generalize as much as they do.

    They do!

    You can even train small networks by hand with pen and paper. You can also manually design small models without training them at all.
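    To make the "design it by hand" point concrete, here is a minimal sketch of a tiny network whose weights were picked by hand rather than learned. It assumes a two-layer network with step activations computing XOR; the function names (`step`, `xor_net`) and the specific weights are illustrative choices, not taken from any particular source.

    ```python
    # Hand-designed (untrained) two-layer network that computes XOR.
    # All weights and biases below were chosen by hand for illustration;
    # no training or gradient descent is involved.

    def step(x: float) -> int:
        """Heaviside step activation: 1 if x > 0, else 0."""
        return 1 if x > 0 else 0

    def xor_net(a: int, b: int) -> int:
        # Hidden unit 1 acts as OR: fires when at least one input is 1.
        h_or = step(1.0 * a + 1.0 * b - 0.5)
        # Hidden unit 2 acts as AND: fires only when both inputs are 1.
        h_and = step(1.0 * a + 1.0 * b - 1.5)
        # Output fires when OR is on but AND is off, which is exactly XOR.
        return step(1.0 * h_or - 1.0 * h_and - 0.5)

    if __name__ == "__main__":
        for a in (0, 1):
            for b in (0, 1):
                print(a, b, xor_net(a, b))
    ```

    Every number here can be checked with pen and paper, which is the point: the building blocks are simple, and the surprise is in what happens when you scale them up.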

    The interesting part is that this dated tech is producing such good results now that we throw our modern hardware at it.

    • archomrade [he/him]@midwest.social
      5 months ago

      an acknowledgement of how relatively uncomplicated their structure is compared to the complexity of their output.

      The interesting part is that this dated tech is producing such good results now that we throw our modern hardware at it.

      That’s exactly what I mean.

        • archomrade [he/him]@midwest.social
          5 months ago

          Maybe a less challenging way of looking at it would be:

          We are surprised at how much subjective human intuition can be replicated using simple predictive algorithms.

          instead of

          We don’t know how this model learned to code.

          Either way, the technique is yielding much better results than what could have been reasonably expected at the outset.