(originally posted to LinkedIn)

A lot of companies have AI models out there that do some pretty magical things. For example, OpenAI has ChatGPT which we all know and love. Meta has LLaMA 2 which… for our purposes… is similar to ChatGPT.

Yesterday, Google DeepMind released Gemini so I thought I’d write a quick summary about what it is/does to save you some time and clicks.

Gemini #

Gemini is an attempt to provide a universal AI model meaning it was built from the ground up to work with text, vision, and audio.

Here’s a… well… not so much a research paper, but more of a whitepaper describing it.

What it can do #

The first example in that paper is a scan of a hand-drawn physics problem with a hand-written answer.

screenshot from the whitepaper linked above

The model was able to understand the problem, identify where the student made a mistake, then work through the correct solution to the problem.

What it is and how it was trained #

It is a Transformer based model, so similar in architecture to what is used by ChatGPT, built using Google’s Tensorflow and trained using Google’s custom hardware chip, the TPU v5e AI chip.

Here’s some documentation on how to use those chips if you’re interested.

How good is it? #

It, of course, scores very high on all the AI benchmarks, but if you can believe this is a real demo… it is amazing:

An explanation of the prompts used to create that video is explained here.

Can you use it right now? Yes: #

You can try it yourself right now! Google Bard, an AI chatbot built by google is available here

https://bard.google.com/

I asked it, “can you write a javascript function that converts an RGB byte array to grayscale?” and within 20 seconds it returned all of the following text, including instructions and gotchas Jordan’s head explodes:

Certainly, here is a JavaScript function that converts an RGB byte array to grayscale:

function rgbToGrayscale(byteArray) {
  const grayscaleArray = [];
  // Loop through each byte triplet (RGB) in the array
  for (let i = 0; i < byteArray.length; i += 3) {
    // Get the red, green, and blue values
    const red = byteArray[i];
    const green = byteArray[i + 1];
    const blue = byteArray[i + 2];
    // Apply a weighted average formula to convert to grayscale
    const grayscale = (red  0.299 + green  0.587 + blue * 0.114);
    // Push the grayscale value to the new array
    grayscaleArray.push(grayscale);
  }
  return grayscaleArray;
}

This function takes a byte array representing RGB values and returns a new byte array with the corresponding grayscale values. It iterates through each byte triplet (RGB) in the array and calculates the weighted average of the red, green, and blue values using the formula:

Grayscale = 0.299 Red + 0.587 Green + 0.114 * Blue

This formula approximates the human eye’s perception of brightness and produces a more accurate grayscale conversion compared to simple averaging.

Here is an example of how to use the function:

const rgbByteArray = [20, 150, 240, 50, 100, 150, 255, 255, 255];
const grayscaleByteArray = rgbToGrayscale(rgbByteArray);
console.log(grayscaleByteArray); // Output: [136, 120, 255, 100, 150, 200]

Remember that this function only handles RGB byte arrays. If your byte array contains alpha values (RGBA), you need to modify the function to handle them appropriately.

Conclusion #

wow…