April 2, 2026 | Dan Rodney | 6 min read

Creating Images and Videos with ChatGPT

Master AI image generation with ChatGPT DALL-E

ChatGPT Version Requirements

Image generation requires GPT-4o or the older GPT-4 model. The GPT-4o mini versions cannot create images, perform data analysis, or access other advanced features.

ChatGPT Version Capabilities

Feature | GPT-4o | GPT-4o mini
Image Generation | Yes | No
Data Analysis | Yes | No
Advanced Features | Full access | Limited
Message Limits | Higher | Lower

Recommended: Use the full GPT-4o model for image generation.

Getting Started with AI Image Creation

1. Check Your Version

Make sure you're using GPT-4o or the older GPT-4, not a mini version.

2. Alternative Options

If you run out of messages, use bing.com/create as a backup option.

3. Craft Your Prompt

Be specific about what you want to create, including details about style and composition.

Free Alternative

When you run out of ChatGPT messages, bing.com/create provides similar DALL-E functionality with some limitations on daily image creation.

AI Image Generation Reality Check

Pros
Creates convincing images that never existed
Built-in editing tools for refinement
Can generate multiple variations quickly
No need for photography equipment or models
Cons
Close inspection reveals artificial elements
Struggles with hands and fine details
Text generation in images is poor
May not follow instructions precisely
AI Image Limitations

Look for telltale signs like missing fingers, incorrect number of teeth, floating objects, and weird background elements that don't make logical sense.

Prompt Styles: Illustration vs Photography

Feature | Illustration Prompts | Photography Prompts
Key Words | Realistic, natural | Photo, photograph
Technical Details | Style descriptions | Camera specs, lens
Lighting | Artistic terms | Photography terms
Result Quality | Artistic style | Photo-realistic

Recommended: Use photography terminology for realistic results.

Writing Effective Photography Prompts

1. Start with 'Photo of'

Always specify that you want a photograph, not just an image or a realistic illustration.

2. Add Camera Details

Include lens type (50mm, wide angle), aperture (f/4, f/22), and other technical specifications.

3. Specify Lighting

Use photography terms like golden hour, midday sunlight, or ceiling lights.

4. Be Specific

Include the exact number of subjects, colors, and detailed scene descriptions.

Photography Technical Terms That Improve Results

Aperture Settings

An aperture of f/4 creates a shallow depth of field with a blurred background. At f/22, more of the scene stays in sharp focus from front to back.

Lens Types

A 50mm lens provides a natural perspective. A wide-angle lens (28-32mm) captures more of the scene. Avoid extreme settings like 18mm, which can create strong, fisheye-like distortion.

Lighting Conditions

Golden hour creates warm, soft lighting. Midday sunlight provides bright, clear illumination. Ceiling lights offer controlled indoor lighting.
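The aperture settings above follow a simple piece of camera arithmetic: the diameter of the lens opening is the focal length divided by the f-number. The sketch below illustrates why f/4 is a much wider opening than f/22 on a 50mm lens. The function name `aperture_diameter_mm` is hypothetical, and the calculation describes a real camera; the image generator only imitates the look these terms are associated with, so treat this as background rather than as a description of how DALL-E works.

```python
def aperture_diameter_mm(focal_length_mm: float, f_number: float) -> float:
    """Return the lens opening diameter: focal length divided by f-number."""
    if f_number <= 0:
        raise ValueError("f_number must be positive")
    return focal_length_mm / f_number


# A 50mm lens at the two apertures mentioned above.
wide_open = aperture_diameter_mm(50, 4)      # large opening, shallow depth of field
stopped_down = aperture_diameter_mm(50, 22)  # small opening, deep depth of field

print(f"f/4: {wide_open:.2f} mm, f/22: {stopped_down:.2f} mm")
```

A smaller f-number means a larger opening, which is why f/4 is the setting associated with blurred backgrounds.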

Behind the Scenes Enhancement

ChatGPT automatically expands your simple prompts into detailed descriptions before sending them to DALL-E. Click the 'i' icon on a generated image to see the full prompt that was actually used.

Common AI Image Issues to Watch For

Persistent AI Limitations

Some stereotypes are hard to override. For example, the AI consistently adds glasses to 'geeks' even when specifically instructed not to, and attempts to remove them through editing often fail.

Iterating and Refining Your Images

1. Start Broad, Then Refine

Begin with a general prompt, then copy and modify the auto-generated detailed prompt for better control.

2. Use Selection Tools

Click the brush icon to select and edit specific areas, removing or replacing unwanted elements.

3. Experiment with Variations

Change one element at a time (lighting, lens, number of subjects) to see how it affects the result.

4. Save Your Favorites

Keep successful images in your account for reference and to show the quality that is possible.
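The "change one element at a time" advice above can be sketched as a small script that produces a set of prompts, each differing from a base prompt in exactly one setting. Everything here is illustrative: the function names, the field names, and the alternative values are assumptions made for this example, and the resulting prompts would still be pasted into ChatGPT by hand.

```python
def render(settings: dict) -> str:
    """Turn a settings dictionary into a photography-style prompt."""
    return (
        f"Photo of {settings['subject']}, shot with a {settings['lens']} lens "
        f"at {settings['aperture']}, lit by {settings['lighting']}."
    )


def one_change_variations(base: dict, alternatives: dict) -> list:
    """Return copies of base that each differ from it in exactly one field."""
    variations = []
    for field, options in alternatives.items():
        for option in options:
            if option == base[field]:
                continue  # skip values identical to the base setting
            variant = dict(base)
            variant[field] = option
            variations.append(variant)
    return variations


base = {
    "subject": "three skiers at a ski area",
    "lens": "50mm",
    "aperture": "f/4",
    "lighting": "midday sunlight",
}
# Hypothetical alternatives to try, one per setting.
alternatives = {
    "lens": ["28mm"],
    "aperture": ["f/22"],
    "lighting": ["golden hour"],
}

variations = one_change_variations(base, alternatives)
for variant in variations:
    print(render(variant))
```

Because each prompt changes a single setting, any difference in the resulting image can be attributed to that setting with more confidence, although the generator's randomness means results will still vary between runs.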

Video Generation Coming Soon

ChatGPT currently generates only static images. Video generation has been announced but not yet released. Keep in mind that video requires about 30 images per second, all consistent with one another, which makes it far more demanding than creating a single image.

This lesson is a preview from our AI with ChatGPT Course Online (includes software) and "MBA" Business Certificate (includes software). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Next, let's explore ChatGPT's image generation capabilities, powered by DALL-E technology. This feature represents a significant leap in AI-powered visual content creation, though it comes with specific requirements and limitations you need to understand.

Image generation requires GPT-4o or the earlier GPT-4 model; the previous GPT-3.5 lacks this capability entirely. If you exhaust your message quota, bing.com/create offers essentially the same DALL-E functionality as a fallback option. This fallback is particularly valuable because the streamlined GPT-4o mini versions cannot generate images at all.

The model hierarchy matters here. Mini versions sacrifice capabilities for speed and cost efficiency: they can't perform data analysis, create images, or handle complex multimodal tasks. You need a full model for image generation. When your GPT-4o messages are exhausted, you can fall back to the original GPT-4, though OpenAI's naming convention creates unnecessary confusion. "GPT-4" is the earlier model, while "GPT-4o" (the "o" stands for "omni") is the newer one, a counterintuitive scheme that trips up many users.

Let's demonstrate with a practical example. I'll generate an image of "puppies running in a backyard" to show you the process and potential results.

Here's something fascinating I keep saved in my account: an incredibly realistic image of a fluffy poodle with elaborate grooming, playing with a chew toy. The photorealism is remarkable—this puppy never existed. The image is entirely AI-generated, yet it captures emotional nuance and physical detail that rivals professional photography.

However, AI-generated images often contain subtle flaws. In this poodle image, there's an ambiguous brown object on the ground that could be interpreted as a toy or something less appealing. Fortunately, DALL-E includes editing capabilities. Using the brush tool, I can select problem areas and provide simple instructions like "remove the brown thing." The AI intelligently fills the space, maintaining visual coherence while eliminating unwanted elements.

Let's try something more complex: generating an image of skiers at a ski resort. This request reveals a critical aspect of how ChatGPT processes image prompts—it doesn't simply pass your words to DALL-E verbatim.

When I input "image of skiers at a ski area," ChatGPT dramatically expands this into: "A lively scene at a ski area with skiers of various ages and skill levels gliding down snowy slopes. The landscape includes tall, snow-covered pine trees, a clear blue sky, and a cozy ski lodge at the base of the mountain. Some skiers are wearing colorful winter gear. There are ski lifts in the background. The atmosphere is vibrant and energetic, capturing the joy of winter sports and a festive holiday vibe."


This automatic prompt enhancement can be helpful, but it may not align with your vision. Understanding this process allows you to craft more precise initial prompts or copy and modify the expanded version to better suit your needs.

The distinction between requesting "images" versus "photographs" proves crucial. When you use terms like "realistic" or "naturalistic," you're employing illustration vocabulary—language typically applied to paintings and drawings. Real photographs are never described as "realistic" because reality is their inherent nature. Instead, photographs are characterized by technical specifications: lens type, aperture settings, lighting conditions, and composition techniques.

Compare these approaches: requesting a "realistic dog playing with a chew toy" yields an illustration-style result. But specifying "photograph of a dog playing with a chew toy, shot with a 50mm lens at f/4, golden hour lighting" produces dramatically more photographic results. This likely works because AI models are trained on large numbers of actual photographs whose captions and descriptions often mention exactly these technical details.

Let me refine our ski scene using photographic terminology: "Photo of three skiers at a ski area, shot with a 50mm lens at f/4, during midday sunlight." The results improve significantly, though you may need several iterations to achieve your exact vision—a process that can quickly consume your image generation quota.
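The structure of that refined prompt (subject, lens, aperture, lighting) can be captured in a small template. This is a minimal sketch: `build_photo_prompt` is a hypothetical helper written for this example, not part of ChatGPT or any OpenAI tool, and its defaults simply mirror the values used in the ski prompt above.

```python
def build_photo_prompt(
    subject: str,
    lens: str = "50mm",
    aperture: str = "f/4",
    lighting: str = "midday sunlight",
) -> str:
    """Assemble a photography-style prompt from its four components."""
    return f"Photo of {subject}, shot with a {lens} lens at {aperture}, during {lighting}."


# Reproduces the ski prompt from the text.
ski_prompt = build_photo_prompt("three skiers at a ski area")
print(ski_prompt)

# Same template with a different subject and lighting.
dog_prompt = build_photo_prompt("a dog playing with a chew toy", lighting="golden hour")
print(dog_prompt)
```

Keeping the components separate makes it easy to swap one setting per iteration, which helps when each attempt counts against your image quota.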

Even improved images contain telltale AI artifacts when examined closely. Look for inconsistencies in mechanical objects, impossible geometries, or anatomical errors. Hands remain particularly challenging—missing or extra fingers are common giveaways. While casual viewers might not notice these flaws, they become apparent under scrutiny.

Certain concepts prove stubbornly resistant to modification. For instance, when generating images of "geeks" or "smart people," the AI invariably adds glasses, regardless of instructions to the contrary. I've attempted numerous approaches—specifying contact lenses, explicitly stating "no glasses," even trying reverse psychology—but the association remains unbreakable. This reflects deep-seated training data biases that current models cannot easily overcome.

Text generation within images remains problematic across all AI image generators, not just DALL-E. Most generated text appears as illegible gibberish rather than readable content. Until this limitation is resolved, avoid incorporating textual elements in your image requests.


Video generation represents the next frontier, requiring 30 coherent images per second. While companies are making progress, ChatGPT currently offers only static image generation. OpenAI has announced video capabilities but hasn't released them as of 2026.
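To put the 30 images per second figure in perspective, the arithmetic can be sketched in a few lines. The function name `frames_needed` is hypothetical, and the 30 frames per second default is the figure quoted above; actual video formats use a range of frame rates.

```python
def frames_needed(duration_seconds: float, frames_per_second: int = 30) -> int:
    """Return how many individual images a clip of this length requires."""
    return round(duration_seconds * frames_per_second)


ten_second_clip = frames_needed(10)
one_minute_clip = frames_needed(60)

print(f"10 seconds: {ten_second_clip} frames")
print(f"60 seconds: {one_minute_clip} frames")
```

Every one of those frames also has to stay consistent with its neighbors, which is a large part of what makes video harder than a single image.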

For optimal results, follow these professional guidelines: Replace illustration vocabulary ("realistic," "naturalistic") with photography terms ("photograph," "shot with," specific lighting conditions). Be extremely specific about your requirements—dog breed, colors, positioning, environment, and technical specifications. Describe lighting conditions precisely: golden hour creates warm, cinematic effects while ceiling lights produce even, professional illumination.

Consider this progression: "realistic dog with chew toy" produces an obvious illustration. "Photograph of a golden retriever puppy with a rope toy, shot with an 85mm lens at f/2.8, golden hour lighting, shallow depth of field" yields professional-quality photographic results. The difference lies entirely in prompt sophistication.
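One way to apply the vocabulary advice is to check a prompt for illustration terms before submitting it. The sketch below is illustrative only: the function name and the two-word term list are assumptions taken from the examples in this lesson, not an exhaustive or official list.

```python
import re

# Terms this lesson identifies as illustration vocabulary (assumed list).
ILLUSTRATION_TERMS = ("realistic", "naturalistic")


def find_illustration_terms(prompt: str) -> list:
    """Return the illustration terms that appear as whole words in the prompt."""
    lowered = prompt.lower()
    return [
        term
        for term in ILLUSTRATION_TERMS
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    ]


weak = "realistic dog with chew toy"
strong = (
    "Photograph of a golden retriever puppy with a rope toy, shot with an "
    "85mm lens at f/2.8, golden hour lighting, shallow depth of field"
)

print(find_illustration_terms(weak))
print(find_illustration_terms(strong))
```

A flagged term is a cue to rewrite the prompt around "photograph of" plus lens, aperture, and lighting details.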

Art style specifications matter equally for illustrated content. Instead of generic requests, specify "oil painting in the style of the Dutch masters" or "minimalist line drawing with spot color." For photographs, experiment with different lighting scenarios—golden hour for warmth, overcast conditions for even tones, or dramatic side lighting for artistic effect.

Remember that ChatGPT serves primarily as an image generator, not a photo editor. Unlike Photoshop with integrated Adobe Firefly, which can intelligently modify existing photographs, DALL-E creates entirely new images, and the brush tool described earlier works only on images it has generated. While you can upload reference images for style or color guidance, you cannot edit your own existing photos directly. For comprehensive image editing, professional tools remain necessary.

The key to mastering AI image generation lies in understanding how to communicate visually through text. The more precisely you can describe your vision using appropriate technical vocabulary, the closer your results will match your intentions. This skill becomes increasingly valuable as AI visual tools continue evolving throughout 2026 and beyond.

Key Takeaways

1. The full GPT-4o or GPT-4 model is required for image generation; mini versions lack this capability.
2. Use bing.com/create as a free alternative when you run out of ChatGPT messages.
3. Photography prompts work better than illustration prompts for realistic results.
4. Include technical camera details like lens type, aperture, and lighting conditions in prompts.
5. AI images often have telltale flaws like incorrect finger counts and floating objects.
6. ChatGPT automatically expands simple prompts into detailed descriptions before sending them to DALL-E.
7. Text generation within images is poor and usually appears as illegible gibberish.
8. Video generation has been announced but is not yet available in ChatGPT.
