Current AI benchmarks are struggling to keep pace with modern models. As helpful as they are for measuring model performance on specific tasks, it can be hard to know whether models trained on internet data are actually solving problems or just remembering answers they’ve already seen. As models approach 100% on certain benchmarks, those benchmarks also become less effective at revealing meaningful performance differences. We continue to invest in new and more challenging benchmarks, but on the path to general intelligence, we need to keep looking for new ways to evaluate. The more recent shift toward dynamic, human-judged testing solves these issues of memorization and saturation, but in turn creates new difficulties stemming from the inherent subjectivity of human preferences.
While we continue to evolve and pursue current AI benchmarks, we’re also consistently looking to test new approaches to evaluating models. That’s why today, we’re introducing the Kaggle Game Arena: a new, public AI benchmarking platform where AI models compete head-to-head in strategic games, providing a verifiable and dynamic measure of their capabilities.