Hard things are easy. Easy things are hard
I’ve totally murdered the quote, but ChatGPT helped me attribute it to Andrew Ng (my 2 minutes of fact checking via Google and ChatGPT haven’t been definitive; the idea is more commonly known as Moravec’s paradox). The idea is that computers can be good at some things that are really hard for us, while some things that seem really easy to us are really hard for computers. For example, you can now generate a full-length essay on a given topic within seconds. That is hard for humans to achieve (especially in that time frame), but is now easy with LLMs. Or take AlphaGo beating humans at the game of Go (Baduk): mastering Go is hard for most people, but the computer can now beat top human professionals. Meanwhile, something as simple as walking is easy for people, but for a computer or robot, learning the subtle muscle movements and balance of a walking, running, or jumping biped can be very difficult.
Likewise, I’m seeing GenAI follow a similar pattern, but maybe for different reasons. In image generation, it seems easy these days to generate an amazing fantastical image that you would never have imagined before. Hard for humans, simple for silicon. Using this as a raw technology for generating images de novo is great, but not always practical. We don’t always need images of panda bears playing saxophones while riding unicycles. Using this tech for image manipulation, on the other hand, is incredibly useful. GenFill and GenExpand are becoming a standard part of people’s workflows. Artists can now perform these tasks without having to learn clone stamp or perspective warp, or spend countless hours getting the select mask right. By contrast, using a raw image generation tool like Midjourney to perform a GenExpand is a bit hacky. These instructions were provided by a redditor on how they would do it:
“but if you wanted to only use Midjourney, here’s what I would do:
Upload the original image straight into the chat so it gets a Midjourney URL, and upload a cropped version of the same image and also copy that url (trust me).
Then, do /describe instead of imagine and ask it to describe the original (largest) image.
Copy the text of whichever description you think fits the image best.
Do /imagine and paste the two URLs with a space between them [Midjourney will attempt to combine the two “styles”, in this case two basically identical images], and then paste the description Midjourney wrote for you and then a space and put --ar 21:9 or whatever at the end (just in case you didn’t know already, “--ar” and then an aspect ratio like “16:9” etc. will create an image in that aspect ratio :)”
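To make the redditor's last step concrete, here is a minimal sketch of how the text pasted after /imagine gets assembled. It only builds a string; it does not talk to Midjourney in any way. The function name, the example URLs, and the description are all made up for illustration.

```python
import re


def build_imagine_prompt(original_url: str, cropped_url: str,
                         description: str, aspect_ratio: str = "21:9") -> str:
    """Assemble the text to paste after /imagine (hypothetical helper).

    Order follows the quoted steps: two image URLs separated by a space,
    then the description from /describe, then the --ar parameter.
    """
    # Only accept ratios written like "16:9"; this check is our own assumption.
    if not re.fullmatch(r"\d+:\d+", aspect_ratio):
        raise ValueError(f"aspect ratio should look like '16:9', got {aspect_ratio!r}")
    parts = [original_url.strip(), cropped_url.strip(),
             description.strip(), f"--ar {aspect_ratio}"]
    return " ".join(parts)


# Placeholder values, not real Midjourney URLs.
prompt = build_imagine_prompt(
    "https://example.com/original.png",
    "https://example.com/cropped.png",
    "a wide mountain landscape at dusk",
)
print(prompt)
```

Even written out like this, it is still a workaround: the tool is being coaxed into an expand rather than offering one.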
What I see in GenFill is an exciting raw technology made into a useful tool.
I feel like we’re in another AI winter. Pre-COVID, AI winter was when little attention was paid to AI, except for some sparks of spring like GPT-2. Then later, OpenAI blew everyone’s minds with their work on LLMs. This current “AI winter” is where we’ve gotten used to the hype of the tech demos we’ve been seeing, and now refinement and practical application have to happen. As well as safeguards, hopefully… This takes a lot of work, but it will be fun to see what comes next.
When I think of easy/hard/hard/easy, I keep coming back to a story a friend told me. She was an inner city math teacher, and one of her students just couldn’t get multiplication right except for his 8 times table. This was really puzzling, so she dug in a bit more, and the student revealed that he needed to know how many clips to bring on a particular occasion and that his pistol had 8 bullets to a clip. Fascinating. But also a reminder that a lot of our knowledge is based on rote memorization. When you want to multiply 8 by 3, all your years of education tell you to rely on your memory that the answer is 24. But that path to the answer is pure memorization. Another method is to form objects into groups of 8, make 3 of those groups, and then count them all up. Isn’t rote memorization closer to what an LLM is doing, as opposed to forming a strategy for how to solve something? But now, LLMs can even strategize and break a complex task into simpler, smaller parts. Yet even this function is advanced text/token retrieval and manipulation. It seems so close to human intelligence. And again, it makes me wonder what is human intelligence and what is silicon intelligence, as well as other concepts like sentience, consciousness, self-awareness, etc… Maybe I’ll ask ChatGPT to help me sort through these things…
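The two routes to 8 times 3 from the story can be sketched in a few lines. This is just a toy illustration of recall versus counting, not a claim about how an LLM or a person actually works; the names are made up.

```python
# Route 1: rote memorization. The answers are simply stored and recalled.
MEMORIZED_EIGHTS = {1: 8, 2: 16, 3: 24, 4: 32, 5: 40,
                    6: 48, 7: 56, 8: 64, 9: 72, 10: 80}


def recall(groups: int) -> int:
    """Look the answer up; fails with KeyError for anything never memorized."""
    return MEMORIZED_EIGHTS[groups]


# Route 2: a strategy. Lay out the groups and count every object one by one.
def count_up(group_size: int, groups: int) -> int:
    total = 0
    for _ in range(groups):          # make each group
        for _ in range(group_size):  # count each object in that group
            total += 1
    return total


print(recall(3))       # 24
print(count_up(8, 3))  # 24
print(count_up(7, 6))  # 42; counting works for tables that were never memorized
```

The lookup is fast but only covers what was stored, much like the student and his 8 times table. The counting route is slow but generalizes.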
Wednesday July 31, 2024