The next GPT model will have PhD-level intelligence, and current video generation models are already helping businesses create realistic film content.
“Some creative jobs maybe will go away, but maybe they shouldn’t have been there in the first place.” This statement from OpenAI’s Chief Technology Officer, Mira Murati, addressing an audience at Dartmouth’s Thayer School of Engineering, has gotten genuine experts and a few self-professed ‘experts’ on social media all riled up.
A few are calling her insensitive; others are questioning Murati’s credentials to judge creative work. There are valid arguments on both sides of this sudden and rather acrimonious conversation. And while any debate may only serve to give the perception of delaying the inevitable, there’s little scope for changing the reality that awaits us.
There is a definite reason why Murati said what she said. It is also why I agree with her, having watched how artificial intelligence (AI) capabilities are shaping up, and quite rapidly at that. OpenAI CEO Sam Altman has often talked about AGI, or Artificial General Intelligence. The crucial difference between AI and AGI, as things are envisioned, spans scope, relevance, capabilities and nature.
Basically, AGI is expected to be capable of understanding and reasoning, and unlike the often narrow nature of generative AI, it will have a broader scope. The idea is that it will replicate human behaviour, along with an ability to learn context and reason, for everything from what AI chatbots can do now to more complex problem-solving.
The crucial difference: an ability to learn emotion and contextual awareness, much like humans. In fact, Altman isn’t the only one. Ilya Sutskever, OpenAI’s co-founder and former chief scientist who has now departed the company, has a start-up, Safe Superintelligence, which aims to create what the name suggests: a super-intelligent machine that surpasses anything we’ve seen thus far, the important caveat being that it shouldn’t be dangerous as a result of its capabilities.
What Sutskever’s team managed at OpenAI is nothing short of definitive. OpenAI’s ChatGPT set off a chain of events that brought generative AI into our lives: a talking chatbot and writing assistant that generates code, drafts email replies for us, takes notes of meetings, and creates an image or a music track exactly how we like it. It is only logical that I wouldn’t bet against Sutskever or Safe Superintelligence, given what they’ve set out to achieve.
Murati is hinting at something. I am not entirely sure whether AI will take away the jobs of the Creative Director, the Art Director, the Chief Creative Officer, or someone lower down in the hierarchy. But a change is coming, albeit a temporary one if at all, because organisations will most certainly experiment with powerful technology replacing a human being and their salary, the tech solution coming at a fraction of that cost. It is the developments in AI I mentioned at the start that lead me not to dismiss her comments.
AI company Anthropic’s Claude 3.5 model, released a few days ago, often matches and mostly surpasses OpenAI’s best thus far, GPT-4o, as well as Google’s Gemini 1.5 Pro and Meta’s Llama 3 (specifically, the 400B variant) in benchmarks. Any benchmark scores should be taken with a rather generous dose of salt, but in the real world, Claude 3.5 is an improvement at multi-layered workflows and at understanding data in the form of charts, and it adds a new functionality called Artifacts that lets you tweak generated results before moving them along to their intended destination or recipient.
When OpenAI introduced Sora earlier this year, the initial teasers of this text-to-video generation tool looked incredibly detailed and professional. While it is yet to be released for everyone to use, an example emerging from its closed-group testing leads us to Toys R Us, a brand well known for its toy-focused retail presence. The company has created a brand film, partially generated using Sora, that takes a trip back in time.
“Sora can create up to one-minute-long videos featuring realistic scenes and multiple characters, all generated from text instruction. Imagine the excitement of creating a young Charles Lazarus, the founder of Toys“R”Us, and envisioning his dreams for our iconic brand and beloved mascot Geoffrey the Giraffe in the early 1930s,” is how the retail chain explains this brand film. Does Murati have a point?
Sora isn’t the only one. Runway has unleashed Gen-3 Alpha, which puts advanced functionality at even a novice’s fingertips: Text to Video, Image to Video, Advanced Camera Control and something called the Multi-Motion Brush are all there. It is available only in the web browser for now; an iOS app will arrive sometime in the coming weeks.
I have a feeling the next big battle will pit tech and AI companies on one side against artists, creators, and everyone else whose data on the internet forms the datasets used to train AI models. That is the data which makes AI (it is, by definition, data hungry) what it is today, and what it will be in times to come. Some regulation there is the only hope against unchecked, predatory use of AI, slowing down the relentless juggernaut and bringing some balance, perhaps.
It may be good to underline this chapter of our tryst with tech with a quote I read a few days ago, from author Joanna Maciejewska. It simply reads, “I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes.” We’re going down a completely different path. Apparently, the next GPT model will have PhD-level intelligence.
Vishal Mathur is the technology editor for the Hindustan Times. Tech Tonic is a weekly column that looks at the impact of personal technology on the way we live, and vice-versa. The views expressed are personal.