What lies beneath...
Last week’s announcement at OpenAI’s inaugural keynote marked a turning point, in my opinion. The ability to easily create a custom AI assistant without code democratizes capabilities that software teams have been working on feverishly for the past few months.
What I am curious about is all of the people who will train a custom model on content that they are working on. Typically this would be content that lives in a company’s knowledge base, wiki, or intranet. This is an entirely different set of data, one that has been largely unavailable until now.
I started my career building the web for enterprises. I initially worked on a lot of corporate intranets. I even worked on a couple that won awards for best intranet of the year, back when that was a thing. I distinctly remember an infographic produced by an analyst like Gartner or Forrester stating that over 90% of the content out there is unseen, much like an iceberg has most of its mass underneath the water.
I see this “unseen” data now being gathered by OpenAI. It is still firewalled and not being used to train their general model, but that is only the case today. Who knows what means or incentives they will use to incorporate this content later on? One of the most critical elements for an AI to succeed is its training corpus. I think OpenAI is trying to amass the largest training corpus possible.
There’s still the question of whether training an AI model on data from the public internet is legal, that is, whether it falls under “fair use.” This may not even matter if we look at Spotify as an example. Spotify was streaming music that it didn’t have the rights to, and a class action suit was filed against it. The company settled the class action and used the settlement to create a vehicle to pay for the ability to stream this content. This seems like a likely course of action for OpenAI.
So, what is the next move for OpenAI? It will likely forge more ties to enterprises to get closer to the “content that lies beneath” in corporate intranets. Having Microsoft as a strategic partner/investor and also the distributor of SharePoint/OneDrive is quite convenient in this case. Who wouldn’t want a work assistant trained on all the content they draw from daily, one that speeds up their work and increases their productivity? That’s the carrot right there.
Monday, November 13, 2023