Google's "Gemini"
CEBU, Philippines — Google’s Gemini has recently caught the attention of techies as the firm’s expected next contender in the ongoing AI race.
Gemini was introduced during Google’s I/O developers conference. Tech industry insiders and rumor mills say it is set to debut in the closing months of 2023, and that it will likely dethrone OpenAI’s ChatGPT as the leading generative AI suite once it rolls out.
But what exactly is Gemini, and why are many saying that it is set to outperform ChatGPT? Read on and find out.
Gemini, in a Nutshell
Gemini is essentially a type of AI – specifically, a large language model or LLM that works seamlessly with text.
Its capabilities are similar to those of GPT-4 (the LLM behind ChatGPT), but what sets it apart is that it is said to be comparably as “smart” as AlphaGo, the AI program that defeated a Go world champion in 2016.
Demis Hassabis – the CEO of Google’s DeepMind subsidiary – characterizes Gemini this way: “At a high level, you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models.”
This association with AlphaGo is what sets Gemini apart from Bard (Google’s current competitor to ChatGPT). It is also largely why many are intrigued by what Gemini can do, given that its development draws on a program that managed to beat a human Go world champion.
Its Rationale and How It Works
More than just being “smarter”, Gemini is expected to power a number of Google’s existing offerings, such as Bard and enterprise apps like Google Slides and Docs, once it is rolled out.
Its development began after Google merged its artificial intelligence research unit Google Brain with DeepMind (the firm responsible for developing AlphaGo) in April this year, and its “learning capacities” are largely based on the combined AI work that the two entities have been pursuing.
Industry insiders note that Gemini uses a new architecture that merges a multimodal encoder (which converts different data types into a common language) and a decoder (which acts upon the converted data).
The combination means that the decoder can generate outputs in various modalities. Because of this, insiders say, Gemini – as an LLM – is not limited to what it has been trained on but can “evolve” beyond the information it was originally trained on.
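To make the encoder-and-decoder idea more concrete, here is a minimal Python sketch. It is a toy illustration only and is not Gemini's actual design: the names `encode`, `decode` and `EMBED_DIM`, and the simple bucketing used to build the shared vector, are all assumptions made for this example. It shows two kinds of input (text and image pixels) being converted into one common representation, which a decoder can then turn into either kind of output.

```python
from typing import Callable, Dict, List, Union

EMBED_DIM = 4  # hypothetical size of the shared representation


def _normalize(vec: List[float]) -> List[float]:
    """Scale a vector so its entries sum to 1 (left as zeros if empty)."""
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def _encode_text(text: str) -> List[float]:
    """Toy text encoder: bucket character codes into a fixed-length vector."""
    vec = [0.0] * EMBED_DIM
    for i, ch in enumerate(text):
        vec[i % EMBED_DIM] += ord(ch)
    return _normalize(vec)


def _encode_image(pixels: List[int]) -> List[float]:
    """Toy image encoder: bucket pixel intensities into the same vector shape."""
    vec = [0.0] * EMBED_DIM
    for i, p in enumerate(pixels):
        vec[i % EMBED_DIM] += p
    return _normalize(vec)


# One encoder per input modality, all producing the same "common language".
ENCODERS: Dict[str, Callable] = {"text": _encode_text, "image": _encode_image}


def encode(modality: str, payload: Union[str, List[int]]) -> List[float]:
    """Convert any supported input into the shared representation."""
    return ENCODERS[modality](payload)


def decode(shared: List[float], out_modality: str) -> Union[str, List[int]]:
    """Turn the shared representation into the requested output modality."""
    if out_modality == "text":
        return "vector: " + ", ".join(f"{v:.2f}" for v in shared)
    if out_modality == "image":
        return [round(v * 255) for v in shared]
    raise ValueError(f"unsupported output modality: {out_modality}")


if __name__ == "__main__":
    shared = encode("image", [10, 200, 30, 40, 50])
    print(decode(shared, "text"))   # an image in, text out
    print(decode(encode("text", "hello"), "image"))  # text in, pixels out
```

The point of the sketch is the shape of the pipeline rather than the math: because every input ends up in the same form, the decoder does not need to know which kind of data it started from.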
But beyond the AI’s touted “smartness”, Gemini’s strength really lies in the library of information it has access to.
As a Google-backed project, Gemini has access to a huge library of information which it can “study” and “learn” from, like YouTube videos, Google Books data, Google’s search index data, information from research and studies archived in Google Scholar and so much more. This library of information is essentially exclusive to Google and works to Gemini’s advantage.
Gemini is also expected to be the first multimodal model that can handle video as well as text and images, something GPT-4 currently cannot do.
Apart from information access, Gemini is being actively developed by a talent pool with years of experience in training large language models, one that also has access to several already-active platforms like Google’s cloud-based offerings.
Generative AI Basics
Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data.
The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.
It should be said that the technology is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs – a type of machine learning algorithm – that generative AI could create convincingly authentic images, videos and audio of real people.
ChatGPT, DALL-E and Bard are examples of popular generative AI interfaces that have largely become the face of AI tech.
The real-world applications of generative AI interfaces are quite diverse, given that they can be used in practically all industries and fields, from finance and law to media, medicine, design and gaming.
Google’s Gemini is expected to be the next big player in the arena of AI developments, though just how game-changing it will be remains to be seen until it is formally rolled out.
How It’ll Stack Up Against ChatGPT
This early on, experts already have a rough idea of how Gemini would stack up against ChatGPT. GPT-4 – the engine that powers ChatGPT – is a large language model reported to have between 1 trillion and 1.7 trillion parameters. It can write essays, translate languages and quickly answer queries, but is limited to mostly text-based data.
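For a sense of scale, the short Python sketch below works out how much memory parameter counts in that range would take up just to store. It assumes each parameter is kept as a 16-bit number (2 bytes), which is a common choice but not something confirmed for GPT-4; the function name `weight_memory_terabytes` is made up for this example.

```python
def weight_memory_terabytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate storage for model weights in terabytes (1 TB = 10**12 bytes).

    bytes_per_param defaults to 2, i.e. an assumed 16-bit number format.
    """
    return num_params * bytes_per_param / 1e12


if __name__ == "__main__":
    # The low and high ends of the parameter range cited in the article.
    for params in (1.0e12, 1.7e12):
        tb = weight_memory_terabytes(params)
        print(f"{params:.1e} parameters -> about {tb:.1f} TB of weights")
```

Under that assumption, the cited range works out to roughly 2 to 3.4 terabytes for the weights alone, before counting anything else needed to run the model.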
Gemini, on the other hand, is a multimodal intelligence network that’s capable of handling data-driven requirements, images, audio, videos, 3D models, and graphs.
Because Gemini is built from a network of models, it is said to be able to handle multiple requests simultaneously without limiting itself, and it is likely to outperform ChatGPT in this regard.
But though many experts are convinced that Gemini will make waves in the AI field, how it will fare in the AI race will have to wait until it is launched and, more importantly, will depend on how well it works for those who need its features and functions. — (FREEMAN)