AI Can’t Do Your Job (Yet)

One of the reasons I’m less worried about AI models leading to mass unemployment is that I don’t think the models are nearly ready to take over actual jobs. All of our benchmarks look at small, isolated tasks and assume that getting better at these makes the models more capable of doing whole jobs. I disagree. And now there is some evidence to support that view.

Scale AI and the Center for AI Safety just released their Remote Labor Index. It uses freelance projects on Upwork as a proxy for more realistic job demands, testing leading models to see whether they could complete these projects at a level a client would be satisfied with. Most models succeeded less than 2% of the time. That’s a far cry from the benchmark scores we see for specific tasks or skills.

And I think that represents the best-case scenario. Most jobs aren’t nearly as short-term as Upwork freelance projects - and don’t have such neatly defined inputs and expected outputs. Those properties make the concept measurable for the research team - but they also mean that 2% is more like an upper bound for the kind of work most readily tackled by AI. Success rates on real-world jobs would be far lower.

If you’re interested in a more software-engineering take on this argument, check out the recent post A Project Is Not a Bundle of Tasks by Steve Newman.