Generative AI

Frank Stepanski
10 min read · Jul 18, 2023

Making machines feel like Picasso on a caffeine high.

Generative AI is changing how we create. For the first time, humans are supervising and machines are generating. Generative AI refers to deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on.

Deep learning is a field within machine learning that deals with building and using neural network models. Neural networks with more than three layers are typically categorized as deep learning networks. Neural networks loosely mimic the functioning of the human brain.

A neural network is like a stack of layers. Each layer has units that are connected to units in the previous layer using special values called weights.

When you give the network something to process, like an image, it goes through each layer one by one. This is called a forward pass. Each layer transforms the input a little bit until it finally reaches the output layer.

The output layer is where all the processing comes together. In the simplest case it has a single unit that outputs a probability, a chance percentage that the original input belongs to a specific category. For example, it might tell you how likely it is that the input image shows a smiling face.

The magic of deep neural networks lies in finding the set of weights for each layer that results in the most accurate predictions. The process of finding these weights is what we mean by training the network.
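
To make this concrete, here is a minimal sketch of a forward pass in plain NumPy. The layer sizes are arbitrary, the weights are random rather than trained, and the “smiling face” probability is purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Random stand-in weights; in a real network, training finds these values.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 16)), np.zeros(16)   # input layer -> hidden 1
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)     # hidden 1 -> hidden 2
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)      # hidden 2 -> output

def forward(x):
    """One forward pass: each layer transforms the input a little."""
    h1 = np.maximum(0, x @ W1 + b1)    # ReLU activation
    h2 = np.maximum(0, h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)       # a probability, e.g., "smiling face"

x = rng.normal(size=64)                # stand-in for a flattened image
print(forward(x))                      # e.g., [0.42], a 42% chance
```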

AI 101

AI can be described as performing tasks that humans typically do, such as picking out a salt shaker from among other objects on a table.

We can identify a salt shaker among other objects because our minds have been unconsciously trained on thousands, or even millions, of salt shakers.

With AI, you feed a system thousands, millions, or even billions of pieces of content, and then train an algorithm to generate outputs and solutions from them.

  • AI — represents the broad field of creating systems that can perform tasks that would normally require human intelligence, and that can interact with their environment.
  • Machine Learning — focuses on creating algorithms and models that enable those systems to learn and improve with time and training.
  • Deep Learning — encompasses deep ML models. These deep models are called neural networks and are particularly suited to domains such as computer vision and Natural Language Processing (NLP). When we talk about ML and DL models, we typically mean models whose aim is to make predictions or infer patterns from data.
  • Generative AI — uses those powerful Neural Network models to generate brand new content, from images to natural language, from music to video.

Other types of AI may generate content but as a side effect of their primary function.

What is Generative AI?

Whereas traditional AI involves pattern recognition or prediction, generative AI aims to produce new data that resembles the patterns and characteristics of the training data it has been exposed to.

Generative AI refers to machine learning models — such as ChatGPT, Bing AI, DALL-E, and Midjourney — that are trained on vast databases of text and images to generate new text and images in response to a prompt.

In order to build a generative model, we require a dataset consisting of many examples of the entity we are trying to generate. This is known as the training data, and one such data point is called an observation.

Each observation consists of many features.

For an image generation problem, the features are usually individual pixel values; for a text generation problem, the features could be individual words or groups of letters.
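
As a rough illustration of observations and their features (the image here is synthetic and the tokenizer is deliberately naive):

```python
import numpy as np

# An image observation: the features are individual pixel values.
image = np.random.default_rng(0).integers(0, 256, size=(64, 64))
pixel_features = image.flatten()        # one observation = 4,096 features

# A text observation: the features are words or groups of letters.
text = "generative models learn the patterns in their training data"
word_features = text.split()            # naive whitespace tokenization

print(pixel_features.shape, word_features[:4])
```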

A generative model must also be probabilistic rather than deterministic because we want to be able to sample many different variations of the output, rather than get the same output every time. If our model is merely a fixed calculation, such as taking the average value of each pixel in the training dataset, it is not generative. A generative model must include a random component that influences the individual samples generated by the model.
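
A toy contrast on a synthetic dataset: the fixed average-pixel calculation below is deterministic, so it is not generative, while adding a random component produces a different sample on every call. (Treating pixels as independent Gaussians is a deliberately crude assumption, used only to illustrate the point.)

```python
import numpy as np

rng = np.random.default_rng(42)
training_images = rng.integers(0, 256, size=(100, 8, 8)).astype(float)

# Deterministic: a fixed calculation gives the same output every time,
# so it is not a generative model.
mean_image = training_images.mean(axis=0)

# Probabilistic: a random component yields a new variation per sample.
# (Independent Gaussian pixels are a toy assumption, for illustration only.)
std_image = training_images.std(axis=0)

def sample():
    return mean_image + std_image * rng.standard_normal(size=(8, 8))

print(np.array_equal(sample(), sample()))  # False: each sample differs
```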

When performing discriminative modeling, each observation in the training data has a label. In contrast, generative modeling doesn’t require the dataset to be labeled because it concerns itself with generating entirely new images, rather than trying to predict a label of a given image.

There are a variety of different generative AI models. These models are written by teams of highly skilled computer vision specialists, machine learning experts, and mathematicians. They’re built on years of open-source machine learning research and are generally funded by companies and universities.

Some of the big players building these generative AI models and engines are OpenAI, NVIDIA, Google, Meta, and universities such as UC Berkeley and LMU Munich. They can either keep these models private or make them public (open source) so that others can benefit from their research.

The Rise of Generative Modeling

Until recently, discriminative modeling has been the driving force behind most progress in machine learning. This is because, for any discriminative problem, the corresponding generative modeling problem is typically much more difficult to tackle.

In the last 10 years, many of the most interesting advancements in the field have come through novel applications of machine learning to generative modeling tasks.

While algorithms for neural networks have existed for some time, advances in large-scale data processing, along with hardware such as GPUs for training and inference, have spurred their popularity in real-world applications.

For example, striking progress has already been made in facial image generation since 2014.

The development of generative AI has a rich and fascinating history marked by significant breakthroughs. Although it gained widespread attention in 2022, its evolution was built on decades of mathematical research, starting with autoencoder neural networks in 2006 and continuing through the mass adoption of generative AI models.

Description of Gen-AI landscape categories:

  • Text: Summarizing or automating content.
  • Images: Generating images.
  • Audio: Summarizing, generating, or converting text to audio.
  • Video: Generating or editing videos.
  • Code: Generating code.
  • Chatbots: Automating customer service and more.
  • ML platforms: Platforms for building and deploying ML applications.
  • Search: AI-powered insights.
  • Gaming: Gen-AI gaming studios or applications.
  • Data: Designing, collecting, or summarizing data.

Advantages and Disadvantages

As workers, generative AIs have several advantages.

First, they can exhibit deep knowledge of their training data, sometimes deeper than human professionals. Second, they compute responses many times faster than even teams of humans. Finally, they are inexpensive compared to human workers.

But they also have severe limitations. First, they are prone to making things up, known as hallucination, and to reproducing bias. Second, they don’t have common sense the way humans do, because they lack the experience of a human environment. Third, they pose implementation challenges.

Generative AI is not automatically fit for every workflow. Adaptations in user interface design or how teams work are necessary for integration.

Natural language models

Natural language generation is perhaps the best-known application of generative AI so far, with ChatGPT in the headlines.

GPT stands for Generative Pre-trained Transformer.

It’s a language model developed by OpenAI, a research organization focused on developing and promoting friendly AI. The idea of pre-training a language model and fine-tuning it on a task-specific dataset isn’t new. The concept predates GPT and has been used in several other models.

However, GPT has become notable for its large scale, its use of the transformer architecture, and its ability to generate human-like text, which has led to its widespread use and popularity in the field of natural language processing.

GPT-3 can take in a prompt, like a topic or a sentence, and can generate text based on that prompt. It can even continue a story or a conversation you started earlier.
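
A minimal sketch of prompting a GPT-3-family model, using the pre-1.0 openai Python package that was current when this was written; the API key is a placeholder and the prompt is just an example:

```python
import openai  # pip install "openai<1.0" for this interface

openai.api_key = "YOUR_API_KEY"  # placeholder

# Give the model a prompt and let it continue the text.
response = openai.Completion.create(
    model="text-davinci-003",    # a GPT-3-family model
    prompt="Once upon a time, a robot learned to paint.",
    max_tokens=50,
)
print(response.choices[0].text)
```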

Here are a few industrial applications.

  • GitHub Copilot — is a generative AI service provided by GitHub to its users. The service uses OpenAI Codex to suggest code and entire functions in real time, right from the code editor. It lets users search less for outside solutions and also helps them type less, thanks to smarter code completion.
  • Microsoft’s Bing — integrated ChatGPT into its search functionality, enabling users to reach concise information in a shorter amount of time.

Since OpenAI made ChatGPT available to the public on November 30th, 2022, it reached 1 million users in less than a week. It took Netflix 49 months to reach 1 million users. It took Twitter 24 months. It took Airbnb 30 months, Facebook 10 months, and Instagram 2 1/2 months to reach 1 million users.

However, GPT-3 has several limitations, such as a lack of common sense, creativity, and understanding of the text it generates, as well as biased datasets and the danger of normalizing mediocrity in creative writing.

Natural language models synthetically mimic human capabilities, but careful consideration is clearly required before developing generative AI tools.

GPT-4, released in March 2023, is, according to OpenAI, more advanced than ChatGPT in three key areas: creativity, visual input, and longer context. In terms of creativity, OpenAI says GPT-4 is much better at both creating and collaborating with users on creative projects. Examples include music, screenplays, technical writing, and even “learning a user’s writing style.”

Prompt Engineering

Prompt engineering refers to constructing inputs that help us get the most out of generative AI, and language models in particular.

A prompt is the input you provide, typically text, when interfacing with an AI model like ChatGPT or Midjourney.

This is a fast-changing ecosystem, and some of the technologies where prompt engineering really shines are the GPT-related ones: GPT-3 and GPT-4, the Jurassic models, and any other large language model.

It’s important to note that, an increasing amount of the time, state-of-the-art models will give you good-enough results on your first try. For any throwaway interaction with an AI, where you don’t plan to do the same task again, that naive approach is all you need.

However, if you plan to put this prompt into production, say you’re building a product name generator, there are some obvious issues you’d want to fix (see the sketch after this list):

  • Direction: You’re not briefing the AI on what types of names you want. Do you want a single word or a concatenation? Can the words be made up or is it important they’re in real English? What sort of audience are you hoping to attract?
  • Format: You’re getting back a newline-separated list of names of arbitrary length. When you run this prompt multiple times, you’ll see it sometimes comes back with a numbered list, and it often has text at the beginning, which makes it hard to parse programmatically.
  • Examples: You haven’t given the AI any examples of what good names look like. It’s autocompleting using an average of its training data, i.e. the entire internet, but is that what you want? Ideally, you’d feed it examples of successful names, common names in an industry, or even just other names you like.
  • Evaluation: You have no feedback loop here to identify which names are good or bad, or improve the quality of the name generator over time. If you can institute a rating system you can optimize the prompt to get better results and identify the times when it fails.
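
As promised above, here is a sketch of how the Direction, Format, and Examples fixes might land in a single revised prompt; the brand, audience, and example names are all invented for illustration:

```python
# Direction: brief the model on audience and style.
# Format: ask for machine-parseable output.
# Examples: show it what "good" looks like.
# (The brand, audience, and example names are invented for illustration.)
prompt = """You are naming products for a home-coffee brand aimed at
young professionals. Names should be real English words, two words max.

Names we like: Driftwood Roast, Morning Circuit, Kettle & Co.

Return exactly 5 new names as a JSON array of strings, and nothing else."""
```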

Prompt engineering is the process of discovering prompts that reliably yield useful or desired results.

The Five Principles of Prompting are as follows:

  • Giving Direction: Describe what you’re imagining, to get output that matches your vision.
  • Specifying Format: Define your required response format, to minimize time spent parsing errors.
  • Providing Examples: Integrate examples in your prompts, and improve the reliability of your output.
  • Evaluating Quality: Identify errors to iterate and improve the reliability of your responses.
  • Dividing Labor: Use the right model and the right prompt for the right job, then chain them together for sophisticated tasks, as sketched below.
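
For the last two principles, one pattern is to chain two calls: one generates candidates and another scores them, closing the feedback loop. The `generate` function below is a hypothetical stand-in for whichever model API you use, not a real library call:

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to your language model of choice."""
    return "[]"  # replace with a real model call

# Dividing Labor: one prompt is responsible only for generating names.
candidates = generate("Return 5 coffee-brand names as a JSON array.")

# Evaluating Quality: a second prompt (or a human rater) scores each
# candidate, giving you signal to improve the first prompt over time.
scores = generate(f"Rate each name in {candidates} from 1 to 10 for memorability.")
print(candidates, scores)
```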

The Future of Generative AI

Generative AI’s impacts will not be evenly distributed. Knowledge of how to use the technology, known as AI Literacy, is also required to thrive in the new world of generative AI. Until more people can access and learn how to use the tools, this AI divide means that more affluent individuals and nations will benefit disproportionately from generative AI in the near term.

AI’s impact is not simply replacement.

Just as economies adapted to ubiquitous personal computing, we will again need to adapt to generative AI. Education systems will also be reshaped around always-on access to AI. Instead of memorizing facts, students will learn how to better prompt generative AIs to get further in their studies.

Instead of completing complicated but redundant tasks alone, workers will seek the help of a partner AI so they can do more and better. This will require major adaptation. Governments and companies that wish to thrive in the new era will help their citizens or employees close the “AI divide”. They may also consider policies like basic income schemes to support those struggling with job loss.

As generative AI advances far enough, human values will also need to adjust to having a new intelligence around.


Written by Frank Stepanski

Engineer, instructor, mentor, amateur photographer, curious traveler, timid runner, and occasional race car driver.
