-
so yeah, i deep-faked myself…
-
Agentic agents, agency and Her
majestic magenta magnets and err.. sorry, i haven’t had my coffee yet and my mind is wandering.
Agentic is a term that is coming up a lot these days. The vision is understood, but I haven’t seen much on the execution side. At least not much that is useful. I think Agentic functionality is where Apple Intelligence will be a game-changer if they can pull it off. I also think Gemini can also be culturally transformative in this way as well.
The idea is that an AI model can do things for you. Let’s say on your iPhone (i’m team android btw) you let Apple Intelligence or Google Gemini have access to your apps so that it can do things like read your email, read your calendar, browse the web, make appointments on your calendar, send text messages to your contacts, auto fill out forms for you. Wouldn’t it be nice if you can ask your AI Agent to be proactive and look through all the emails your kid’s school has sent and automatically identify forms that need to be filled out and start pre-populating those for you? and contacting any necessary parties for things like having a doctor’s office provide a letter for such and such? Then your job is to basically review and make minor corrections/revision and do the final submit!
Rabbit R1, while a cool idea and interesting design, failed to live up to the hype. I’ve played with a browser extension that somewhat could do some tasks, but it needed to be told very specific things. For example, I could ask it to look at my calendar, find the movie we’re going to see this week, search for family-friendly restaurants in the area, preferably with gluten-free options and provide some suggestions. If there was some memory built into that AI service, it could be useful, but it is still far from something like a personal assistant. And it wouldn’t have suggested out of nowhere to see if I wanted to look for dinner options for that night. I think once this form of AI gets refined, it can be a huge help to many people.
Another form of assistance that is interesting is the relational AI. I haven’t tried it out, but Replika is one that seems to provide chat services. Supposedly some people use it for emotional support like a virtual girlfriend. Some people have even felt like they’ve developed relationships with these chat bots. I believe Replika is also providing counselor-type AI chat bot services. This makes me think about how humans can form relationships with inanimate objects, like that favorite sweater or that trusty backpack. I think counseling can be transformative because of the relationship that is formed between the patient and the counselor, but it is a somewhat transactional relationship. The client pays for time and the counselor is required to spend that time. But it is also like a mentor relationship where there is an agreement for this time, and by nature of having access to a skilled individual, the need for compensation is understood.
In a typical human relationship, there is a bit more free will where one party can reject the relationship. They have agency to do that. In an AI-chatbot to human “relationship” the AI-chatbot is always there and does not leave. I think that produces a different dynamic with advantages and disadvantages. An ideal parent will never leave a child; an ideal friend will never leave you when you need them the most. So this AI-chatbot is nice in that way. But that level of trust is typically earned, no?
What I find interesting about the movie, Her, is that the AI has the freedom to disappear and leave. Was that intended to make the AI more human and have agency? Or was it because the story would be pretty boring if the AI couldn’t do these things? Will researchers eventually build an AI with autonomy and agency? If so, will the proper guard rails and safety systems be in place? Or is this the beginning of Skynet? And is that why so many safety and security-minded researchers are leaving OpenAI? Too much to think about… I should really get that coffee..
-
Hard things are easy. Easy things are hard
I’ve totally murdered the quote, but chatGPT helped me attribute that to Andrew Ng (my 2 minutes of fact checking via Google and ChatGPT haven’t been definitive). This idea is about how computers can be good at some things that are really hard, but some things that seem really easy to us are really hard for computers. For example, with your computer, you can now generate a full length essay within seconds on a given topic. That is something hard for humans to achieve (especially given the time frame), but now is easy with LLMs. Or AlphaGo beating humans at the game, Go (Baduk). Mastering Go is hard for most people, but the computer can now beat human grand masters. Something as simple as walking, is easy for people, but for a computer/robot to learn the subtle muscle movements and balance of a walking, running, or jumping biped, can be very difficult.
Likewise, I’m seeing GenAI following a similar pattern, but maybe for different reasons. In image generation, it seems easy these day to generate an amazing fantastical image that you would never have imagined before. Hard for human, simple for silicon. Using this as a raw technology for image generation de novo is great, but not always practical. We don’t always need images of panda bears playing saxophones while riding unicycles. Using this tech for image manipulation on the other hand is incredibly useful. GenFill and GenExpand are becoming a standard part of people’s workflow. Artists can now perform these tasks without having to learn clone stamp or perspective warp or spend countless hours getting the right select mask. Another example: using a raw image generation tool like midjourney to perform a GenExpand is a bit hacky. These instructions were provided by a redditor on how they would do it:
“but if you wanted to only use Midjourney, here’s what I would do:
Upload the original image straight into the chat so it gets a Midjourney URL, and upload a cropped version of the same image and also copy that url (trust me).
Then, do /describe instead of imagine and ask it to describe the original (largest) image.
Copy the text of whichever description you think fits the image best.
Do /imagine and paste the two URLs with a space between them [Midjourney will attempt to combine the two “styles”, in this case two basically identical images], and then paste the description Midjourney wrote for you and then a space and put —ar 21:9 or whatever at the end (just in case you didn’t know already, “—ar” and then an aspect ratio like “16:9” etc. will create an image in that aspect ratio :)”
What I see in GenFill is an exciting raw technology made into a useful tool.
I feel like we’re in another AI winter. Pre-covid, AI winter was when little attention was paid to AI, except for some sparks of spring like GPT 2. Then later, OpenAI blew everyone’s minds with their work on LLMs. This current “AI winter” is where we’re gotten used to the hype of the tech demos we’ve been seeing and now refinement and practical application has to happen. As well as safe guards, hopefully… This takes a lot of work, but it will be fun to see what comes next.
When I think of easy/hard/hard/easy, I keep coming back to a story a friend told me. She was an inner city math teacher and one of her students just couldn’t get multiplication right except for his 8 times table. This was really puzzling, so she dug in a bit more and the student revealed that he need to know how many clips to bring on a particular occasion and that his pistol had 8 bullets to a clip. Fascinating. But also a reminder that a lot of our knowledge is based on rote memorization. When you want to multiple 8 by 3, all your years of education tell you to rely on your memory that the answer is 24. But that path to the answer is pure memorization. Another method is to form objects in groups of 8 and form 3 groups of them and then count them all up. Isn’t rote memorization closer to what an LLM is doing as opposed to forming a strategy on how to solve something? But now, LLM can even strategize and break a complex task into simpler smaller parts. But even this function is advanced text/token retrieval and manipulation. it seems so close to human intelligence. and again, it makes me wonder what is human intelligence and what is silicon intelligence. as well as other concepts of sentience, consciousness, self-awareness, etc… Maybe I’ll ask chatGPT to help me sort through these things…
- Thursday June 27, 2024
-
Stochastic - so drastic? pro tactic? slow elastic?
In a recent interview, Mira Murati, OpenAI CTO, said that she herself doesn’t know exactly what in chatgpt 5 will be better than the current version. The word that has been creeping into my vocabulary is stochastic. It means inherently random and unpredictable. Very much like the responses you get from an LLM. You will often get a very helpful response, and it is amazing that it is generated in a matter of seconds. But sometimes you get unexpected results, sometimes harmful results. If you think about it, people aren’t that much different. It’s just that babies have had years of training on how to behave acceptably in society. Do you ever get random thoughts/impulses in your head? And then decide not to act on them? I think that’s kinda similar. Likewise, these LLMs need a bit more training for them to be acceptable and safe.
I think it’s fascinating that chatgpt 5 capabilities are very much unknown and yet there is a massive amount of engineering dedicated to it. My guess is that it is an improved algorithm or strategy of some sorts that showed promise in small scale prototyping. Or maybe it ends up not being a huge advance. If they can deliver on what they demo’ed for 4omni, that would be significant in itself. There’s still a lot of places where this raw technology has yet to be applied. And hopefully, some of that intelligence can find a more reasonable method of energy consumption . . .
I find it interesting that a new product version is being developed without very specific goals or success metrics, since the capabilities don’t seem to be fully defined. Product Management is a discipline where future capabilities are imagined and designed, feasibility is tested by research, design and user experience flow is refined and iterated, cost of goods are calculated, revenue impact is estimated, pricing models are structured and so forth. How can you do that if you don’t even know the new functionality of your product? So this chatgpt (5) is more like a raw technology than a product. Maybe one day in the future it will become like a utility, no different from water, electric and internet. Some day in the future – “Hey kids, Our AI tokens bill is really high this month. Can you stop generatively bio-engineering those crazy pets in the omniverse?”
-
On 4Omni, agents, rabbits, phone assistants, and sleep
So, by now you must have heard about OpenAI’s ChatGPT 4o(mni). If not you should definitely find a youtube video when they demonstrated it. The OpenAI demo was rather rushed tho. Almost as if they just found out Google was going to announce some new AI features and they wanted to steal their thunder the night before…
Nevertheless, it is an impressive demo. Heck, it got me to renew and fork over twenty bucks to try and get a chance at the new features earlier than general users. One of the podcasts I listen to commented that after seeing this demo, they declared the Rabbit R1 dead. But I don’t think there’s a strong relation between the capabilities OpenAI demo’ed and what the R1 represents. If I understand correctly, 4omni is a natively multi-modal LLM, and has been trained or more than just text, but rather images, music, video, documents and such. The Rabbit R1 is an agent which can take fairly independent action on your behalf. You give it a command to do something, it does some strategizing and planning of steps to follow and then begins to act on your behalf. I tried out another agent in the form of a browser plugin which was able to look into my email and calendar and maps and online accounts to perform tasks that I asked it to do. This was eye-opening for me. But it did not seem to correlate to what 4omni was demonstrating. 4o didn’t seem to take action on my behalf such as make dinner reservations based on certain criteria. As for an AI agent, the deal with Apple and OpenAI is really interesting to me. Everyone complains about Siri. What if Siri was replaced with ChatGPT (not Sky) and also had some guardrailed ability to perform actions on your behalf using the access it has to apps on your iPhone? This could be interesting.
The other player here is Google with Android and Google Assistant. Google Assistant was introduced almost a decade ago and when it first came out, I was a big fan. I could ask it questions from my watch or my earphones. I could receive and reply to text messages with my headphones without taking out my phone. It was connected to my home and I could turn on my air conditioner when I was a certain distance away from home.
But these days, Gemini has not been making a very good show of itself. The most recent gaffe is the Generative Search Experience telling people the daily amount of rocks to eat or pizza recipes with glue to prevent sauce falling off. The trust has been eroded. If we can’t trust Gemini to return safe responses via RAG (retrieval augmented generation) there’s no way people would trust giving it agent-capabilities and access to their phone apps and data. Apple on the other hand has a lot more trust from its users. (Let’s not talk about the commercial where they squished fine art and musical instruments…) So, I see Apple in a better position to release this type of agent-assistant.
This reminds me of what my teachers have always taught us since elementary school – it’s about quality not quantity. In the case of AI model training, it has to be both. Peter Norvig and others have emphasized the importance of large training data sets. Now it looks like it’s not just what amount of training data, but intelligently feeding it to the LLM and having it recognize sarcasm and trolling. Haven’t we learned anything from Microsoft’s Tay?
I think I need to take a break from podcasts. I find that every interim moment I have, I pop in my earbuds and listen to really interesting podcasts. it seems to take a way that boring space where i’m forced to just stare at the subway ceiling. But I’m starting to feel like that interim space of having nothing to consume is kinda like sleep. Some say that sleep is when your mind organizes the thoughts and experiences you’ve had during the day and helps make better sense and orientation and connections for them. I kinda feel like interim space might be like that as well. Some people listen to podcasts at 2x speed to consume and learn as much as possible. I think for a while I need my podcasts to go at O x speed.
Oh and speaking of sleep, here’s a pretty good podcast episode on it – open.spotify.com/episode/3…
-
Avengers v AI - IRL?
So it’s come to this. ScarJo (aka Avenger Black Widow) is in a fight against OpenAI over the alleged use of a voice similar to hers or possibly even trained on her voice. It is quite literally Avengers versus AI. So what questions does this bring up? The training corpus for AI LLMs has been under some scrutiny, but it is still unclear what is considered fair use. A human can mimic ScarJo’s voice. Does that make it unlawful for that person to perform and profit from that ability? Does that performer need permission to do so? Does being able to do it easily and at mass scale make a difference? Or was it that OpenAI seemed to want to intentionally use a voice similar to ScarJo’s and even when they could not obtain consent, went ahead with a voice similar to hers? Is that a violation? Who knows.
What I find interesting is that they are trying to create something inspired by the movie, “Her”. An AI that can form relationships with people. This raises so many questions. Is the AI sentient? If it thinks it is sentient is it really? How can we tell if it thinks it is sentient versus is it just repeating words that follow the human pattern that trick us into thinking it is sentient? If it is just repeating words, how is that different from a human growing up and learning words and behavior and reacting? What is it like for a human to have a relationship with an AI? How is it different from a human to human relationship?
In a HumanXHuman relationship, both people interact and grow and change, moreso in a close relationship such as family or close friends. In a HumanXAI relationship, does the AI change and grow? It is able to gain new input from the environment and from the human. Does that constitute growth? Is that AI able to come to conclusions and realizations about the relationship on its own? or with some level of guidance? Does the AI have an equivalent of human feelings or emotions? Does mimicking those emotions count? When a human has emotions and reacts, is the human mimicking learned behavior? Are these emotions based on environmental input triggering biological responses in the form of hormone release? Is there anything more to that? If that is it, then is there a AI or robot equivalent?
I do think there are things us that make us truly unique and distinct from AI machines. But the lines are blurring. Being human used to be determined by my ability to pick which pictures contained traffic lights in them. Now that AI can do this what’s left? :D
-
Not hotdog? Look how far we've come...
It looks like Apple Photos now has the ability to identify objects in photos using AI and let you search for them. Google Photos has had an early version of this since as far back as I can remember. A quick search shows 2015 as an early date where people started talking about it. It’s a little funny hearing some podcasters get so excited and gush over this, when this tech has been in my pocket for almost a decade already. Computer vision has advanced considerably since then, and we’ve come a long way since Jian-Yang’s “not a hotdog” app.
[sidebar: on X/Twitter, why doesn’t Elon Musk implement image recognition to detect questionable imagery and automatically put a temporary suspension on the account until it has been resolved by a human? The Taylor Swift fakes issue could have been mitigated. The technology is available]
Another technology that seems to be rolling out now, but was also nascent many years ago was Google’s ability to have an assistant make dinner reservations for you over the phone or other simple tasks like this using natural language processing. This was announced at Google I/O 2019. Google was way advanced here compared to others. The famous paper which enabled GPT, Attention is all you need, was produced by Google in 2017. That is the “T” in GPT. It seems like they shelved that technology since it was not completely safe. Which we can easily see in LLM’s today which often hallucinate (confabulate?) inaccuracies. I suspect they held off on advancing that technology because it would disrupt search ad revenue and that it was unreliable, dangerous even. Also, no one else seemed to have this technology. So back then it would have made perfect sense.
My first exposure to GPT was in 2019 when you were able to type in a few words and have GPT-2 complete the thought with a couple lines. Some people took this to the extreme and
wrotegenerated really ridiculous mini screen plays. Maybe that was with GPT-3 a year later. It was a nifty trick at the time. Look at how much it has grown now.Google is playing catch up, but I think they’ll be a very strong contender. After an AI model has been tweaked and optimized to its limit (if that’s ever possible) a significant factor in the capability of the model is its training corpus. That is the “P” in GPT. Who has a large data set of human-generated content that can be used to train a model? Hmmm… Just as Adobe Stock is a vast treasure trove of creative media, Youtube is an incredible resource of multimedia human content. We will likely see sources of really good content become very valuable. Sites like quora, stackoverflow and reddit where niche topics and experts in various areas gather will increase in value if they can keep there audiences there. To keep generating valuable content. I kinda feel like The Matrix was prescient in this. Instead of humans being batteries to power the computers, the humans are content generators to feed the models.
This ecosystem of humans being incentivized to produce content. Content being used to train a model. Models being used to speed up human productivity. So humans can further produce content. All this so I can get a hot dog at Gray’s Papaya for three dollars. Yes, you heard me right. It’s not seventy-five cents anymore…
-
I see yo (SEO) content. LLMAO?
Google’s search dominance is at an existential crossroads at the moment.
I remember before google, we had yahoo which attempted to manually categorize the internet according to its general taxonomy. It was human curated, so naturally, it could not scale. Back then, the internet was so small, they would highlight a “new site of the day” and it would be a random page about someone’s historical or culinary interests. Back then, it was a novelty to be able to put text and images on a digital page for the world to see. People were just starting to realize, “hey, maybe i can take orders for my business on this internet thing”. The next logical question was – But how would the world see my page?
Search engines were popping up like altavista, lycos, askjeeves, etc… But none of them got it right until google. They had a clever algorithm and they were able to scale as the internet grew. They offered free services like browser-based email. I’m not exactly sure when they realized how to monetize, probably when they hired Eric Schmidt, but once they did, there was no looking back and google had officially become a verb.
SEO, search engine optimization, became a fast-growing industry as companies consulted on how to help get your business website to the top of google’s organic search results. People started deciphering the algorithm that was constantly refined and tuned. They realized faster loading pages ranked better. They realized pages with 1st party and 3rd party links back to them did better. They realized page URLs with relevant human-understandable text did better. And so on.. And these are just basic strategies. SEO became an indispensable part of web content architecture and design.
These days we are starting to see people replace google search with various AI models - chatgpt, bard, pi, perplexity, etc.. there are just so many. I’m sure they all want to be the next “google”. In my brief experience tho, sometimes I google for a specific reason and I don’t want an engine to find the most relevant pages, and summarize and present the most common, banal answer to what I’m asking. Sometimes, I’m looking for a product to buy, sometimes it’s a video tutorial, sometimes I actually find what I want on the second page of google results! I don’t yet see the current state of LLM’s completely taking over google search.
I liken the LLM’s to bees' honey. It is produced by the worker bee consuming nectar and pollen and magically producing honey for us to consume. Am I a bit of a strange bird in that sometimes I might want raw nectar instead of the honey? Maybe the next step is for these LLM’s to do a better job of recognizing MY intent about what I’m searching for, or what goal I am trying to accomplish. Give me my nectar page! Not this sweet honey summary!
I recently suggested in an internal forum that SEO may become LLMEO. But maybe instead of an E, it should be an A for attribution? Maybe I should start a business called LLMAO Consulting? Who’s with me? Want to start the next wave of SEO/LLMAO strategy development? Want to work for a company with the coolest name ever? :D
-
Discontent with the intent of our content
Reading news today is increasingly becoming an exercise of determining whether it is worth tapping on a clickbait title and then deciding if it is worth it to try to skim through the article to eventually scroll down to the bottom where some conclusion related to the title is finally revealed. They force you to go through such inane, tangentially related commentary to fill up space, and force you to scroll through ads, only to end up with a moderately satisfying answer to why Olivia Rodrigo upset Taylor Swift or something like that.
The intent of that ‘journalism’ was not to provide information on a topic. The intent was not to provide commentary or critical analysis on some current event. The intent was to get your attention, fill up space, and force you to see ads of something you just bought on amazon 5 minutes ago. Is this what LLMs are being trained on?
And what about all the trolling that goes on in reddit or twitter/X or youtube, etc… The intent on those platforms is a mixed bag. Some are genuinely having meaning conversation and others are trolling. Are LLM’s being trained properly to be discriminating?
And what about these fake news sites that are being spun up overnight to affect SEO rankings? Again, the intent of these sites is not to provide meaningful information. Rather it is typically regurgitation of content on specific topic meant to tip the balances of SEO scoring, regardless of whether the content is true or helpful. The intent of that content is to be recognized by SEO monitors and skew scoring.
Someone jokingly said government should mandate that all AI-generated text must rhyme. I personally think that would be amazing. It’ll never happen, but it brings up a good point – it is hard to ‘watermark’ text, but it is something that is needed. These sites full of generated content can pop up quickly and with little effort, and they can have significant impact. Can LLM’s discern this kind of content from genuine human-generated content?
I’m sure there are lots of smart people already looking at these issues and developing solutions. But I think we need to be mindful of the potential negative impacts that generative AI can usher in.
Speaking of rhyming text, it’s crazy how easy it is to generate content today. Here’s a song created in seconds based on the text of this blog post. Will the chorus ever become an earworm? Will we ever have an “AI Top Ten”? who knows..
-
UX UI U-me U-you U-mami
I can tell when I haven’t eaten for a while. Food just creeps into my throughtstream. (would that happen to an AI, given certain motives and reward systems? …hmm interesting)
We’re seeing a lot of applications have AI assistance being added to them. Co-pilots, Twinkly stars, pop-up help boxes. Sometimes these can help guide the user through a complex sequence of steps to achieve their goal. What does this mean for application design? Will this lead to a trend where application UI designers can get away with being a little more lazy? I can envision the scenario – product MVP is set to launch; user feature testing is not quite hitting the mark; lightly supported research is showing that users can use AI assistance to use that feature and avoid friction that is naturally in the product. Product update ships without UI correction. UI issue never gets looked at again because “if it ain’t broke, don’t fix it” mentality.
Hopefully it doesn’t get like this, but I think it’s possible.
Should AI in UX aim to cover things up or make things disappear? To quote a notable fruit-named company leader
“Great technology is invisible” - Steve Jobs
For the past 50 years (or more?) humans have interacted with computers via punch card or keyboard or mouse. But now with advances in AI and LLM, the computer is learning “human”. To quote Andrej Karpathy, “The hottest new programming language is English”. This was just about a year ago and we’ve seen chatGPT explode because you interact with it in natural language.
Taken to the extreme, will we ever get to the point where the computer is invisible? or the application interface is invisible?
I’m excited for what kind of innovative product interfaces we’ll experience in the next couple years. I’m hoping designers will take advantage of AI more and use it as the foundation of their UX design. Extracting the user’s intent is key though. This can sometimes be challenging. And also, when the user is trying to perform precise actions, such as setting up an exact sequence of procedural variations on a material surface texture to be overlaid on a 3D mesh group. Some things will just require exact precision. Maybe Elon Musk’s Neuralink is on to something? What better way for a computer to understand intent than a direct connection to the brain?
There’s also a bit of serendipity with manual controls. You can set some wild parameters and get surprising results that an AI would probably consider outside the range of expected ‘normal’ results. So, there are pros and cons to manual UI and invisible UI.
Another thing that comes to mind is the UX of a restaurant. In the extreme “invisible” approach, I would sit down and tell the kitchen exactly what I want. In the normal UX, I get a menu and see what is available. Maybe I’ll try the scorched rice cube with spicy tuna and avocado? I wouldn’t have thought of that without a menu. Sometimes having open sky is not always a good thing.
So, kids, next time you’re designing an interface, remember to have AI in the UX, but don’t forget the Umami :)
-
Regulate Us!
It’s a little disturbing how easy it is to make “fake” content. I generated a video of myself delivering parts of JFK’s famous “Ask not…” speech, but in Korean.
I never said those words. My level of Korean language proficiency is barely enough to order food at a restaurant. I grabbed the text from a website and put it into a translator and pasted that into the transcript for what my avatar would say.
Seeing a video of myself say words that I never said is quite a jarring experience. The political implications of this are obvious, as witnessed in the recent Argentinian elections. Those AI-faked videos were of poor quality, but it was still impactful. How much moreso will high quality faked political videos affect unsuspecting, unaware masses? Takedowns can be issued, but the social networks haven’t had the best report card on moderating content that should be removed.
In a recent panel discussion with Yann LeCun and others, they talk about how the algorithm drives the ultimate output. For social media networks, the goal was attention. The algorithms were tuned to keep viewers attention and in turn sell more ads. It can be argued that this affected an unhealthy body image, since anorexic teen girl videos attracted a lot of attention and were thus repeated on people’s feeds. It can be argued that this resulted in higher levels of depression and negatively impacted mental health. It can be argued that because the goal of the algorithm was attention, when kept unchecked this resulted in a number of adverse societal issues.
There should have been better regulation. It can’t be left to a profit-driven entity to self-moderate. With AI we are seeing a more powerful technological force and the need for regulation is clear. But how can we achieve this? What is to stop someone from generating fake political videos and spamming targeted social feeds? The damage will be done before any regulation enforcement or takedowns can be enacted. So, what is the driving force behind the technological wonders we see every day? I believe it is a mix of profit and innovation in the name of profit. And again, there need to be guardrails so that the technology can grow properly and avoid misuse.
OpenAI seems to have tried to create a corporate structure that has non-profit and for-profit sides to it. The goal of the non-profit side is “to build artificial general intelligence (AGI) that is safe and benefits all of humanity”. Looking back at the fiasco that happened with Sam A’s firing and re-hiring, it is clear that the non-profit side lacks teeth. Well, maybe that fiasco is more telling of the mismanagement of the board, but Sam A driving innovation was clearly the winner.
And now I hear that fake nudes are on the rise :( There are bad actors out there who make it their life’s work to torture people online with content like this, fake or real. The most recent episode of Darknet Diaries podcast is a really insightful view of one person’s decades long struggle to combat this. One thing I learned from that podcast was that one of the most effective tools to combat this is DMCA takedown and that when the subject of the image owns the copyright, they can immediately request a takedown of the content. In the case of AI-generated, how does copyright apply? Victims might have to resort to less effective means if this route isn’t readily available to them.
Tech solution…?
Maybe we could “stamp” all AI model usage such that with everything that gets generated. Then, we can trace the content back to which specific instance of that model generated it and which specific user of that instance summoned the content. This is likely not possible with AI models being open-source. This allows anyone to run the code on their own and modify it as long as they understand what they’re doing.
Another approach would be to have each content generation call to a service that records the model, instance, and user information and tracks that content. Then enforce that generated images have this traceability whenever a platform allows images/video on it. Then we would also need to register regular images which can come from cameras or creative tools. This would be an immense task. And if this is all eventually accomplished, the remaining excluded set of content would be things illegitimately generated and thus more susceptible to being taken down from platforms. Maybe a news organization can implement this and receive acknowledgement as having trustworthy media.
We’re starting to see some companies explicitly self-regulate. Purple Llama from Meta is an attempt at this. It is interesting that a company who dropped the ball on social regulation now has an initiative on AI regulation. It makes sense though - if I can regulate myself sufficiently to satisfy the necessary parties, maybe the government won’t feel the urgency. Or if EU/US regulation is enacted, having a model that can easily adapt to this and comply with regulation will make them more nimble and quicker to win market share.
Maybe companies will wise up and realize the positive gains from doing the right thing. But there will always be bad actors out there, so there will still be need for government regulation.
-
Baby Baby Baby Oh!
I’m purposely avoiding the temptation to write about the Altman-OpenAI drama. It’s tiring, to be honest. Just wake me up when it’s all done.
onto AGI…
I think it’s interesting that in Korean, AGI can be read as the word for baby, pronounced “ah-gee” 아기
But I’m thinking about this because one of the goals of OpenAI is to produce AGI, artificial general intelligence. My first question is - how do they define AGI? Asking ChatGPT4 it gives 5 characteristics:
-
Adaptability: AGI can adapt to new tasks and environments without needing to be retrained or reprogrammed specifically for each new task.
-
Generalization: It can generalize knowledge and skills from one domain to another. For instance, learning from a game like chess might help it understand strategies in a completely different context.
-
Autonomous Learning: AGI has the ability to learn on its own, without human intervention, gathering information, and learning from its environment.
-
Understanding and Reasoning: Unlike narrow AI which follows predefined patterns, AGI can comprehend complex ideas, form abstractions and hypotheses, and use reasoning to solve problems.
-
Consciousness and Self-awareness: Some definitions of AGI also include elements of consciousness or self-awareness, although this remains a topic of debate and philosophical inquiry.
For me, I’d leave the last one out. And this is for the similar reasons I would ask myself, “why would I want to bring a new life form into existence?” A lot of people/couples do this all the time, but usually are committed to devoting incredible amounts of personal resources to the care and nurture of their 아기. Are computer scientists trying to create AGI because they are similarly motivated? I don’t know.
As for the first four characteristics? I’d be really curious to see what OpenAI’s path to AGI is. Some people claim an LLM like ChatGPT is AGI, but to me it is still a very advanced mimic. It is taking input it has received and it finds linguistic patterns and re-forms them according to the prompts and the patterns (weights, feature vectors, etc..).
I could see how school children are taught in the same way. They are given a prompt, they are given enough material to solve their prompt (training corpus), they are given a format to form their response in (one shot prompt example). At some point, something clicks and they are able to conduct this on their own and make connections to other areas and be curious and ask questions. On curiosity, did we train them somehow or are they innately programmed to be curious? Can we inject these qualities into an AGI? who knows!
Many see Langchain as a major step to AGI, and this could be a big part of it. I am not deep into the mechanics of it, but my understanding is that it allows for one step to follow the next. For example, if I had something trigger a thought and it led me to search on google and that provided more information on my original thought and that helped me set up subtasks to help me with a goal around my initial thought. and so forth…
I think we can probably get to something that looks very much like Intelligence, but ultimately it would still be a task-enabled super parrot. I can imagine the following being possible:
me: hey computer, can you do my homework for tonight?
computer: sure where can I find your homework?
me: go to google classroom
computer: okay, what’s your login and password?
me: [top secret whisperings]
computer: got it. i’m in. it looks like you have an assignment for algebra due tomorrow. should i work on that?
me: yes. thanks
computer: i’ve created an answer sheet that shows the work to solve the problems for tonight’s homework, shall I upload it to google classroom or will you review it first?
… and so forth
[sidebar: giving a computer system the autonomy to perform tasks like this should really be carefully thought out. i think I mentioned this in a previous post. But this is something that I believe should be regulated, somehow.]
It’s interesting how the voice interaction of ChatGPT4 makes you feel like you’re conversing with a person. That anthropomorphism is quite interesting and I wonder if it is part of OpenAI’s plan to get human interaction with it’s AI to help train it in a specific way.
As for consciousness and self-awareness. I am reminded of a philosophy class on personhood. Self-awareness seemingly is achieved by interacting with everything around you. A baby gets visual and tactile and audio input from the environment around it. It has all these informational signals coming in and eventually learns to make sense of it, very much by interacting and playing with it. It interacts and the environment responds. It pushes a ball and it see the ball roll away. It sees the world around it and it sees itself interact in the world. Maybe an AI needs to attain “self-awareness” with these baby steps? Maybe this is why Nvidia is creating its Omniverse. What better place to train something on a world than a safe virtual world that you can fully define and control? and hopefully firewall from the outside world.
It will be neat to have an assistant autonomously do things for you. I think this is as far as AGI should go. Trying to create a new sentient life form is a bit too much for me. People are going to try it for the sake of science, but I don’t know if it is achievable.
For one thing, I think the Turing test is now proven ineffective. I don’t think we’ve reached AGI with these LLM’s, even though they seem pretty darn human enough to have already fooled some humans into thinking they are “alive.”
Often, advances in science help us to ask ourselves questions. I think with AI, the questions are
- what does it mean for a thing to be intelligent?
- what does it mean for me to be intelligent, or for me to be human?
- am I just a wet-ware LLM? taking in input all around me and resynthesizing it into response patterns?
The answer to number 3 is emphatically, “No.” I, and you biological units reading this are human. You create, you learn, you are unique. You do things that no machine can do or ever will do. Creativity is at the core of being human and I do not believe that a silicon-based entity can have that kind of unique creativity. Or can it? Who knows..
-
-
What lies beneath...
Last week’s announcement at OpenAI’s inaugural keynote marked a turning point, in my opinion. The ability to easily create a custom AI assistant without code democratizes capabilities that software teams have been working on feverishly for the past few months.
What I am curious about is – all of the people who will train a custom model on content that they are working on. Typically this would be content that is in a knowledge base or wiki or intranet within a company. This is an entirely different set of data that has been largely unavailable.
I started my career building the web for enterprises. I initially worked on a lot of corporate intranets. I even worked on a couple that won awards for best intranet of the year, back when that was a thing. I distinctly remember an infographic produced by an analyst like Gartner or Forrester stating that over 90% of the content out there is unseen, much like an iceberg has most of its mass underneath the water.
I see this “unseen” data now being gathered by OpenAI. It is still firewalled and not being used to train their general model, but that is the case today. Who knows what means or incentivization they will use to incorporate this content later on? One of the most critical elements for an AI to succeed is its training corpus. I think OpenAI is trying to amass the largest training corpus possible.
There’s still the legality of all the data it is trained on that is on the public internet and whether or not using it to train an AI model falls under “fair use.” This may not even matter if we look at Spotify as an example. In the case of Spotify, they were streaming music that they didn’t have the rights to and a class action suit was filed against them. They settled the class action and used it to create a vehicle to pay for the ability to stream this content. This seems like a likely course of action for OpenAI.
So, what is the next move for OpenAI? It will likely forge more ties to enterprises to get closer to the “content that lies beneath” in corporate intranets. Having Microsoft as a strategic partner/investor and also the distributor of Sharepoint/OneDrive is quite convenient in this case. Who wouldn’t want a work assistant that is trained on all the content that I’m drawing from daily to speed up what I’m doing and increase my productivity? That’s the carrot right there.
-
Twinkle Twinkle AI Star
[this post, after the prompt, was entirely AI generated. Is this the future of content on the internet? I must admit, it is a pretty good write up]
prompt: the icon for AI assistant buttons in many applications is a twinkling star. tell me how the song twinkle, twinkle little star embodies the current ethos and sentiment towards AI assistance
response: The song “Twinkle, Twinkle, Little Star” is a classic English lullaby with lyrics from the 19th century. The melody is an older French tune, and the lyrics are from a poem by Jane Taylor titled “The Star.” This song has a sense of wonderment about the natural world, encapsulated in the opening lines:
Twinkle, twinkle, little star, How I wonder what you are!
This sense of wonder can be likened to the curiosity and amazement that AI assistance evokes in people today. Just as the star in the lullaby is up above the world so high, AI often seems like a distant, almost magical entity that’s both fascinating and a bit out of reach in terms of understanding how it works. Here’s how the song might embody the current ethos and sentiment towards AI assistance:
Wonder and Curiosity:
Like a child’s wonder at the twinkling of a star, people marvel at the capabilities of AI and are curious about its potential. There is a sense of fascination with how AI can process information and assist with tasks that range from the mundane to the complex.
Accessibility and Ubiquity:
Just as stars are visible everywhere around the world, AI assistance has become ubiquitous in modern life, integrated into smartphones, home automation devices, and various applications. It’s a constant, accessible presence that can be summoned to help at any time.
Guidance and Assistance:
Stars have been used for navigation and guidance for centuries. Similarly, AI assists and guides users through complex digital landscapes, helping to find information and complete tasks.
Simplicity and Clarity:
The simple and clear nature of the lullaby is echoed in the user experience that AI assistants strive to provide. They aim to simplify complex processes and provide clear answers to questions, much like the clear and bright twinkling of a star.
The Unknown and the Unreachable:
There is also a parallel in the mystery surrounding stars and AI. Just as a star’s true nature and the full extent of the universe are not completely understood by all, the workings of AI are often not fully comprehensible to the average user, adding a layer of mystique to its presence.
Continuous Presence and Dependability:
Stars are a constant in the night sky; similarly, AI assistance is designed to be always on and reliable, providing consistent support whenever it’s needed.
Bright Potential:
Stars symbolize hope and potential. The burgeoning field of AI promises many advances and positive changes in the way we live and work, akin to a future that seems bright when we look at the stars. The use of a star icon for AI assistants encapsulates these attributes—wonder, guidance, ubiquity, simplicity, and the promise of a better future—reflecting the sentiment and ethos that people hold towards AI as a helpful, constant, and somewhat enigmatic presence in their lives.
-
My precious (content)
[Originally, I had generated an image with Gollum hoarding books and pages (hence the title), but it was too dark. I decided to go with an Oprah theme instead. You get a GPT, You get a GPT. Everyone gets a GPT! ]
OpenAI just announced GPTs in GPT-4 Turbo. This truly is a turbo accelerator. It’s a very exciting announcement as well as properly formed JSON output and repeatable outcomes with seed parameters. With GPTs, it looks like you’ll be able to easily create assistants for almost anything. The limiting factors are your imagination and your training dataset. We’ve already seen companies taking their data private or adding paywall barriers to API, like Reddit. How much further will organizations close access to data?
On the one hand, ideas and conversation and progress is advanced with open data, but many organizations see the data as a valuable commodity. How will companies balance between letting consumers interact with data and services/API interact with data? Will more companies start paywalling their content? Is this the end of an open content/data era? Will organizations be able to get their content out to consumers to grow their endeavors, but still keep control of their content when it comes to training models?
How will this affect design? Will having a virtual assistant allow for application/product owners to be a bit less rigorous in their design and UX since a virtual assistant can now guide them through their tasks? This is another form of lowering the floor and raising the ceiling. More non-pros will be able to produce somewhat “pro-level” work with the help of these assistants. Pros will save time having assistants do things for them that were previously tedious.
I am still wary of the ominous statistical median. The content outputs of these models trend toward the most statistically average response to the prompt that is fed to it. Who wants the most statistically average campaign brief? The most statistically average outfit? Is this all a big race to the middle? Maybe, sometimes “just good enough” is good enough?
Again, rambling, I haven’t gotten to my outputting my thoughts on AGI and personhood. Maybe I’ll let that stew in my brain for a bit more. To quote Johnny Five, “Need more input.”
-
What are we feeding our (AI) kids?
This halloween kids, young and old, will be consuming a bit more candy than we typically do. It leads me to think about the adage, “you are what you eat.” While somewhat true, our bodies are incredibly adept at breaking things down into raw materials and re-synthesizing them into proteins and enzymes and cells and so forth. On a more metaphysical level, we are a product of nature and nurture, our genetic makeup and our environmental factors such as our upbringing and the neighborhood around us and all manner of inputs we receive everyday.
To me, the AI models today are trained on somewhat skewed inputs and we should keep that in mind. Several years ago, an AI chatbot was placed on Twitter and it “learned” to become quite representative or some of the more vocal parts of Twitter, namely trolls. While this experiment attracted a lot of trolls who purposely fed it hateful content, it is a reminder that a machine with a base corpus of training needs guidance and a way to properly process what it is taking in.
A lot of what is used to train these models seems to be - whatever you can get your hands on. A lot of the public text out there is actually very good. There is a lot of great expert discussion out there. This helps up generate really smart confident assertive cover letters and essays. But there is a certain difference between what people post publicly and what might be considered more normal language. There’s also a difference between what images are presented on the internet versus what is seen in our every day lives.
An AI model tends to present the most average result with some tweaks for variation. This is the goal of an LLM - to re-sequence tokens (words) into a statistically averaged order to seem to provide natural response. And the result is a well-thought carefully constructive five paragraph essay on a topic like “free speech and net neutrality.” But it is still a very sophisticated statistical average.
A friend once called me, “the master dot connector.” I really like that title. I take pride in being able to have a cross-disciplinary view of things and seeing things in a different light, making relevant connections where others may not have seen them. I wonder how much “dot-connecting” AI models can do and if they can generate novel insightful points of view.
LLM’s are a powerful tool, but I think it’s important to understand what they are and how they work and not just be amazed that it can produce conversations and images and code like a human can. It’s a bit sad that some humans have developed relationships with AI chatbots and even fallen in love with them and have committed illegal acts being spurred on by these chatbots. The line is very blurry, but these are still machines. Incredibly cool machines, but still machines, not people.
I’ve rambled, but I’m too lazy to edit and re-write this to be more cohesive. I’m human..
-
Always on AI-assist
I’m actually kinda intrigued with the Meta Ray-ban glasses. I had been watching Snap Spectacles and Meta’s previous collab, Ray-ban Stories. I think this latest attempt might be useful now given the acceleration in development of multimodal AI models. With Chat GPT, I find myself pulling out my phone more often to take pictures of things I want to ask about, such as rough calorie estimates or translations or price estimates. There’s a bit of friction in getting the phone out and taking the pic, sending it to the app, typing or speaking a question. I would much rather prefer a flow where I’m looking at something, say a trigger phrase and ask my question about what I’m seeing, and hear the response through the speakers. I’m not too interested in posting/streaming to Insta, that’s a bit niche, but I think if they can put together a smooth AI-assistant user experience, this can be a useful product.
-
A couple days ago, I noticed the new NYC subway robocop. It’s interesting how things are getting more automated. Robocalls, AI, self-driving cars, robopopo - who knows what things will be like 5, 10 years from now.
-
Test post
This is a test post. Things on my mind at the moment - free speech and moderation on social platforms, effect of work from home on real estate and work culture in general, with everyone rushing to AI what determines who will win and who will lose, what if all these generative ai models had been released while we were all in covid lockdown?