Not hotdog? Look how far we've come...
It looks like Apple Photos can now identify objects in photos using AI and let you search for them. Google Photos has had an early version of this for as far back as I can remember; a quick search turns up mentions as early as 2015. It’s a little funny hearing some podcasters get so excited and gush over this when the tech has been in my pocket for almost a decade already. Computer vision has advanced considerably since then, and we’ve come a long way since Jian-Yang’s “not a hotdog” app.
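Neither Apple nor Google publishes exactly how their photo search works, but searching photos by object is commonly built on a joint text–image embedding model. Here’s a minimal sketch using the openly available CLIP model through the Hugging Face transformers library; the photo path and the search terms are just illustrative assumptions, not anything from either product.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load an open text-image embedding model (CLIP) and its preprocessor.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder path to one of your photos
queries = ["a hot dog", "a dog", "a beach", "a birthday cake"]

# Embed the photo and the search terms together and compare them.
inputs = processor(text=queries, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # similarity of the photo to each query

for query, p in zip(queries, probs[0].tolist()):
    print(f"{query}: {p:.2f}")
```

Run over a whole photo library, the same idea (precompute image embeddings, then embed the search text and rank by similarity) gives you object search without any hand-labeled tags.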
[sidebar: on X/Twitter, why doesn’t Elon Musk use image recognition to detect questionable imagery and automatically suspend the account until a human reviews it? The Taylor Swift deepfakes incident could have been mitigated. The technology is available.]
Another technology that seems to be rolling out now, but was also nascent many years ago, is Google’s ability to have an assistant call a restaurant and make a dinner reservation for you over the phone, or handle other simple tasks like that, using natural language processing. This was demoed as Duplex at Google I/O 2018. Google was way ahead of everyone else here. The famous paper that enabled GPT, “Attention Is All You Need”, came out of Google in 2017; it introduced the Transformer, the “T” in GPT. It seems like they shelved that technology because it was not completely safe, something we can easily see in LLMs today, which often hallucinate (confabulate?) inaccuracies. I suspect they held off on advancing it because it would disrupt search ad revenue and because it was unreliable, dangerous even. Also, no one else seemed to have this technology, so back then holding off would have made perfect sense.
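For reference, the core idea of that 2017 paper is scaled dot-product attention: every token scores every other token and takes a weighted mix of their values. A toy NumPy sketch of just that equation (the shapes and random inputs are only illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, the core of the Transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how much each query attends to each key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of the values

# Toy example: 3 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```

Stack that operation (with multiple heads and feed-forward layers) and you have the architecture GPT is built on.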
My first exposure to GPT was in 2019, when you could type in a few words and have GPT-2 complete the thought with a couple of lines. Some people took this to the extreme and generated really ridiculous mini screenplays; maybe that was with GPT-3 a year later. It was a nifty trick at the time. Look at how much it has grown now.
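The original GPT-2 weights are still openly available, so that 2019 party trick is easy to reproduce today. A minimal sketch using the Hugging Face transformers library; the prompt and sampling settings are just for illustration:

```python
from transformers import pipeline, set_seed

# Load the original (small) GPT-2 model and complete a prompt, 2019-style.
generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make the sampled completion repeatable

prompt = "I walked up to the hot dog cart and said"
result = generator(prompt, max_new_tokens=40, do_sample=True, num_return_sequences=1)
print(result[0]["generated_text"])
```

The output is usually grammatical and usually nonsense, which is exactly what made those generated mini screenplays so entertaining at the time.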
Google is playing catch-up, but I think they’ll be a very strong contender. Once an AI model has been tweaked and optimized to its limit (if that’s ever possible), a significant factor in its capability is its training corpus; that is the “P” (pre-trained) in GPT. Who has a large data set of human-generated content that can be used to train a model? Hmmm… Just as Adobe Stock is a vast treasure trove of creative media, YouTube is an incredible resource of multimedia human content. We will likely see sources of really good content become very valuable. Sites like Quora, Stack Overflow, and Reddit, where experts in various niche topics gather, will increase in value if they can keep their audiences there to keep generating valuable content. I kinda feel like The Matrix was prescient here: instead of humans being batteries to power the machines, humans are content generators to feed the models.
It’s an ecosystem: humans are incentivized to produce content, the content is used to train models, the models are used to speed up human productivity, and humans go on to produce even more content. All this so I can get a hot dog at Gray’s Papaya for three dollars. Yes, you heard me right. It’s not seventy-five cents anymore…
Thursday February 29, 2024