Comparing Native Image Generation: ChatGPT vs. Gemini – Which Is Superior?

Google and OpenAI are competing closely in native image generation. Shortly after Google announced native image capabilities in its Gemini model, OpenAI rolled out native image generation in ChatGPT. This article compares the two models on character consistency, text rendering, and instruction following.

1. Transforming Yourself into an Anime Character

The first task I gave both models was to generate an anime-style image. ChatGPT delivered a striking result on the first attempt, capturing the iconic Studio Ghibli style. Gemini, by contrast, failed to produce a convincing anime-style image even after numerous prompts and never came close to ChatGPT's output. Below are the results:

2. Illustrating a Whiteboard Session

Next, I asked both models to create an image of a man explaining the theory of relativity on a whiteboard. ChatGPT, which appears to rely on a larger underlying model, produced an impressive image with clear handwritten text and even included the photographer's reflection. Gemini depicted the character convincingly but struggled to render legible text on the whiteboard.

3. Creating a Menu Card

In this exercise, both models were asked to design a menu card. ChatGPT produced a visually appealing card with mostly accurate text, although it left out one item. Gemini struggled badly with text, producing jumbled, meaningless words. The gap highlighted ChatGPT's stronger instruction following, especially with complex prompts.

4. Designing an Infographic

For the next task, both models were asked to create an infographic about gravity featuring Isaac Newton. ChatGPT excelled here, producing a well-designed, readable infographic, which suggests it could also handle other educational material such as comic strips or visual guides. Gemini's output lacked coherence and clarity and did not work as an infographic.

5. Restyling Existing Images

To test editing of existing images, I gave both models a photo of a cactus in a garden and asked them to add colorful flowers. ChatGPT tended to overinterpret the instruction and changed more of the image than requested, while Gemini preserved the original scene consistently across generations.

Some observers suggest that ChatGPT's native image generation uses a different architecture from Gemini's, which may explain why ChatGPT's output varies more from one generation to the next.

6. Blending Images Together

Given two input images, I prompted both models to combine them into a single image of a woman holding a mug. Both produced satisfactory results, but Gemini showed slightly more creativity by changing the figure's posture.

7. Changing the Point of View

When asked to change the perspective of a hallway image, both models produced similar outputs. However, ChatGPT's version stayed closer to the original image, while Gemini added unnecessary elements that reduced its accuracy.

8. Rendering a Wall Clock Showing 6:30

For the final test, both models were asked to draw a wall clock showing 6:30. Both defaulted to 10:10, a common failure in AI image generation that is likely a training-data bias, since clock and watch product photos are overwhelmingly set to 10:10. The result exposes a shared weakness in following precise instructions.