![](https://cdn.prod.website-files.com/60ffbc3a211dee2c1a74db99/67ae323bea2c2f22956bb91a_Battle%20of%20AI-80.webp)
We’re all using AI in our workflows to various degrees, but not all LLMs are created equal. In this post, we evaluate four leading LLMs: ChatGPT 4.0, Claude 3.5 Sonnet, Google Gemini 2.0, and Meta Llama. Choosing the right tool depends on the job to be done, so we tested them against three key marketing-focused scenarios:
- Messaging Insights from Sales Call Transcripts
- Keyword Intent Analysis and Content Recommendations
- Campaign Data Analysis for Optimization Recommendations
Using identical prompts for each scenario, we tested and scored the LLMs for usability, accuracy, depth of insight, and relevance. To set an even baseline for this analysis, no specialized GPTs, pre-trained models, or additional context were provided before each prompt. Here’s how they performed and which one we recommend for each task!
Note: ChatGPT 4.0 and Claude 3.5 Sonnet are the paid versions of these tools. We used the free versions of Google Gemini 2.0 and Meta Llama for this analysis.
In this LLM performance comparison, Claude 3.5 Sonnet emerged as the top performer, excelling in sales call analysis and campaign data optimization with its nuanced insights and data-driven recommendations. ChatGPT 4.0 stood out for keyword intent and content strategy, offering comprehensive intent breakdowns and creative content suggestions. Gemini showed promise in content and data tasks, but occasional inaccuracies affected its reliability. Meta Llama lagged behind in advanced capabilities, making it suitable for simpler tasks.
The Scorecard
![](https://cdn.prod.website-files.com/60ffbc3a211dee2c1a74db99/67ae326fdee03e38801972c6_AD_4nXf52_Zxsxe0uv-tZ-EBt5fPU_cb2MFt3--JIchaGohtfF1f4F80JtW3o1SW3qnZ7GiFzB_uIXscY89DxchhcaTUEY5ZT1HHMiqHUPK3tGe0YAWiOG9USLFpapNpRupg2qXyiJIi.png)
Let’s break down how each LLM performed across the three scenarios.
Sales Call Analysis for Messaging Insights
Understanding the voice of the customer is the cornerstone of effective marketing. Sales calls are a goldmine of unfiltered customer insights - revealing pain points, objections, goals, and motivators in the customer’s own words. Whether you’re refining your current messaging or brainstorming new ideas, mining these calls is an invaluable way to create campaigns that truly connect with your audience. But extracting actionable insights from lengthy transcripts can be tedious and time-consuming.
This is where AI comes in. By analyzing these transcripts, LLMs can distill customer challenges and goals, while also suggesting tailored messaging ideas. This makes it easier for marketers to refine existing campaigns for better resonance or to generate fresh, personalized messaging that directly addresses customer needs. And with AI, you can do it - fast.
In this section, each LLM was evaluated on its ability to generate actionable messaging insights from a single sales call transcript. For each strategy, we’ll provide the prompt used, as well as detail on the data or file(s) uploaded for analysis.
For a deeper dive into analyzing sales calls, check out our recent blog post: Unlocking the Voice of the Customer using LLMs
Prompt
“Review this transcript and summarize how the customer describes their challenges, goals, and desired outcomes. Suggest ways these descriptions could be addressed in future marketing messaging for the audience/company/brand type.”
Data Uploaded
A text-based sales call transcript was used. For best results, you’ll want to do some basic maintenance on the transcript to identify each speaker before uploading to an LLM for analysis.
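The transcript cleanup mentioned above can be sketched in a few lines of Python. This is only an illustration, not the process we used: it assumes the raw export labels each turn with a generic prefix like "Speaker 1:", and the sample transcript, the `label_speakers` helper, and the role names are all made up for the example.

```python
import re

# Assumed raw format: each turn starts with a generic label such as "Speaker 1:".
SPEAKER_LINE = re.compile(r"^(Speaker \d+):\s*(.*)$")


def label_speakers(transcript: str, roles: dict[str, str]) -> str:
    """Swap generic speaker labels for role names and merge wrapped lines."""
    turns: list[list[str]] = []  # each item is [role, text]
    for raw_line in transcript.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        match = SPEAKER_LINE.match(line)
        if match:
            label, text = match.groups()
            # Unknown labels are kept as-is so nothing is silently dropped.
            turns.append([roles.get(label, label), text])
        elif turns:
            # A line without a label continues the previous speaker's turn.
            turns[-1][1] = (turns[-1][1] + " " + line).strip()
    return "\n".join(f"{role}: {text}" for role, text in turns)


# Invented sample, for illustration only.
raw = """Speaker 1: Thanks for joining. What's the biggest challenge right now?
Speaker 2: Reporting takes our team
hours every week.
"""

cleaned = label_speakers(raw, {"Speaker 1": "Sales Rep", "Speaker 2": "Customer"})
print(cleaned)
```

With every turn attributed to a named role, the LLM is less likely to mix up what the customer said with what the sales rep said.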
What We Evaluated
- Ability to extract meaningful insights (pain points, motivators, and goals).
- Quality and relevance of suggested marketing messaging.
- Speed to deliver actionable results.
Performance
Claude 3.5: 9/10
Scored highest here for its nuanced and concise insights. It used specific quotes from the transcript to validate its findings, adding credibility and context. The messaging suggestions were audience-specific and highly actionable, organized intuitively by customer segment.
ChatGPT 4.0: 8/10
Generated rich insights and offered creative marketing message examples across multiple channels (ad copy, email, and video scripts). However, it required a follow-up prompt to generate these examples (“Can you provide some ad copy and content messaging examples for multiple marketing channels?”).
Gemini: 4/10
Hallucinated details about the customer, including basics like their industry (the company was not a bakery, but an ad agency). After being called out for this error in a follow-up prompt, it produced detailed and creative outputs, but lacked consistency.
![](https://cdn.prod.website-files.com/60ffbc3a211dee2c1a74db99/67ae326f4df649055de51a6e_AD_4nXcIgDuHsRW8FK9FfYVPos4S9YhSWnIrxLAEVkSPOx3Ov6dnL1YaABOb9g9viXRwBG-GO9T7viyusuwIx8KeL2CkCtbgYQjrrGkgF32u8sed2B8SKFUiwwtihkNsshofLeN89p6V.png)
Llama: 7/10
Delivered accurate summaries but lacked depth in its messaging recommendations, which were broad and general. It also required additional input before generating content suggestions.
![](https://cdn.prod.website-files.com/60ffbc3a211dee2c1a74db99/67ae326eb2a1b6c4560982d5_AD_4nXdlk0OV9Z5Jc9iWGNtKm2nUH4rWz2tqpTqXxnFD1jGpOjg4D70P75_26qySJp0DrbkeQxlVyOn20dT-nM9_F5HpIc9g_9FNapkK3qlnni3pF62Vkm-XzOg7JuWPUtNE8eIzbRC-.png)
Best for Sales Call Analysis: Claude 3.5 Sonnet
Claude excelled at delivering accurate, audience-specific recommendations without requiring follow-ups. It’s ideal for marketers looking for quick and actionable messaging insights.
Keyword Intent & Content Recommendations
Understanding the search intent behind a specific keyword is critical for creating content that meets audience needs at each stage of the customer journey. Analyzing search intent helps marketers determine whether a user is seeking information, comparing solutions, or ready to purchase. This insight guides the development of targeted content - such as blog posts, product comparisons, or call-to-action pages. In this section, each LLM’s ability to interpret keyword intent and recommend content to engage customers at their journey stage was evaluated.
Prompt
“Analyze the search intent behind the keyword {marketing automation} and provide suggestions for the type of content that would best satisfy the user’s needs.”
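If you want to run the same prompt across a list of keywords, a small template keeps the wording identical from one keyword to the next. The sketch below is a hypothetical helper, not something we used in this test; the second keyword is invented for the example.

```python
# Prompt wording taken from this post; {keyword} is the slot to fill.
INTENT_PROMPT = (
    "Analyze the search intent behind the keyword {keyword} and provide "
    "suggestions for the type of content that would best satisfy the user's needs."
)


def build_intent_prompts(keywords: list[str]) -> dict[str, str]:
    """Return one ready-to-paste prompt per non-empty keyword."""
    prompts: dict[str, str] = {}
    for keyword in keywords:
        cleaned = keyword.strip()
        if cleaned:  # ignore blank entries
            prompts[cleaned] = INTENT_PROMPT.format(keyword=cleaned)
    return prompts


# "email marketing software" is a made-up second keyword for illustration.
prompts = build_intent_prompts(["marketing automation", "email marketing software"])
for keyword, prompt in prompts.items():
    print(f"[{keyword}] {prompt}")
```

Keeping the wording fixed matters when comparing models: any difference in output then comes from the model, not the prompt.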
Data Uploaded
None
What We Evaluated
- Depth of intent analysis (Informational, Navigational, Transactional).
- Quality and variety of content recommendations.
- Relevance of suggested formats (e.g., blogs, landing pages, case studies).
Performance
Claude 3.5: 8/10
Clear and concise in breaking down intent into categories with percentage weightings (e.g., 70% informational). Its content recommendations, while accurate, were a bit light compared to ChatGPT and Gemini.
ChatGPT 4.0: 9/10
Nailed this section with a robust breakdown of search intent across all stages (informational, navigational, transactional). It provided detailed content suggestions for each stage and included practical examples like blog titles, case study ideas, and landing page CTAs.
![](https://cdn.prod.website-files.com/60ffbc3a211dee2c1a74db99/67ae326e4df649055de51a57_AD_4nXd9o52oSTZjJibRp-ZCY1N2cG5qImuKOCU4AQFMZ0jL0c9vAZM_LaKHx0kKjzizG_4kRHA2YCXLny0uQWkDVtlOm4Y5-QHdCmAfykVU7TRz26jF0MnrduSY_xsYUb215RAj3cQ7gA.png)
Gemini: 9/10
Delivered a comprehensive intent analysis, leaning heavily on the "commercial investigation" stage, and gave strong content suggestions tailored to mid-funnel users. Its content format recommendations were creative and actionable.
Llama: 7/10
Strong on content suggestions, offering eight actionable ideas, but its intent analysis felt generic and lacked depth compared to the other models.
Best for Keyword Intent & Content Recommendations: ChatGPT 4.0
ChatGPT provided the most comprehensive and actionable response, making it an ideal choice for keyword intent analysis and SEO-driven content planning.
Campaign Data Analysis
Analyzing campaign data is a critical yet time-consuming task for marketers. It requires identifying patterns, pinpointing areas for improvement, and making actionable recommendations to optimize performance. This is where LLMs can step in as a “junior analyst”, processing large datasets and surfacing key insights on your behalf. By leveraging AI for daily, weekly, and monthly reporting analysis, marketers save time and gain valuable recommendations for budget allocation, creative testing, and audience targeting. In this section, each LLM’s ability to interpret marketing data and provide detailed optimization suggestions was evaluated.
Prompt
“Imagine you are a digital marketing analyst. Review this campaign data and provide a reporting summary that includes optimization recommendations and causality.”
Data Uploaded
The data uploaded included a month of campaign-level performance metrics across Paid Search, Paid Social, and Programmatic Display. The columns, in order, were Date, Publisher, Campaign Name, Spend, Impressions, Clicks, CTR (Click-Through Rate), Conversions, Avg CPC (Cost-per-Click), and Avg CPM (Cost-per-thousand Impressions).
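Before handing a file like this to an LLM, it helps to have your own baseline numbers to check its summary against. Here is a minimal Python sketch that rolls the data up by Publisher and recomputes CTR, Avg CPC, and Avg CPM from totals, using the standard definitions (clicks / impressions, spend / clicks, and spend per 1,000 impressions). The column names follow the list above; the sample rows and the `summarize_by_publisher` helper are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; only the column names come from the data described above.
SAMPLE_CSV = """Date,Publisher,Campaign Name,Spend,Impressions,Clicks,CTR,Conversions,Avg CPC,Avg CPM
2025-01-01,Paid Search,Brand,100.00,2000,100,5.00%,10,1.00,50.00
2025-01-02,Paid Search,Brand,50.00,1000,25,2.50%,5,2.00,50.00
2025-01-01,Paid Social,Prospecting,200.00,40000,200,0.50%,4,1.00,5.00
"""


def summarize_by_publisher(csv_text: str) -> dict[str, dict[str, float]]:
    """Total the raw counts per publisher, then derive rates from the totals."""
    totals = defaultdict(
        lambda: {"spend": 0.0, "impressions": 0, "clicks": 0, "conversions": 0}
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        t = totals[row["Publisher"]]
        t["spend"] += float(row["Spend"])
        t["impressions"] += int(row["Impressions"])
        t["clicks"] += int(row["Clicks"])
        t["conversions"] += int(row["Conversions"])

    summary = {}
    for publisher, t in totals.items():
        imps, clicks, spend = t["impressions"], t["clicks"], t["spend"]
        summary[publisher] = {
            "spend": spend,
            "conversions": t["conversions"],
            # Rates are recomputed from totals, not averaged from daily rows.
            "ctr": clicks / imps if imps else 0.0,
            "avg_cpc": spend / clicks if clicks else 0.0,
            "avg_cpm": spend * 1000 / imps if imps else 0.0,
        }
    return summary


summary = summarize_by_publisher(SAMPLE_CSV)
for publisher, metrics in summary.items():
    print(publisher, {name: round(value, 4) for name, value in metrics.items()})
```

Recomputing rates from totals is deliberate: averaging the daily CTR or CPC columns would give low-volume days the same weight as high-volume ones.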
What We Evaluated
- Ability to access and interpret raw data accurately.
- Quality of optimization recommendations (e.g., budget reallocation, targeting improvements).
- Identification of causality (e.g., linking campaign changes to performance trends).
Performance
Claude 3.5: 9/10
Dominated this task with clear, actionable recommendations and an intuitive understanding of the data. It accepted either a raw CSV or cloud-hosted data. It showed its analysis work and excelled in budget reallocation suggestions, creative testing ideas, and causality analysis. Its responses felt tailored to marketers, with recommendations grounded in real-world scenarios.
![](https://cdn.prod.website-files.com/60ffbc3a211dee2c1a74db99/67ae326ffd5722c02c543690_AD_4nXcRj4vXy_26fcQSr6VepG6Ew3yLYq7_mFSLJVmJLBXIyHm8vAJRBSvfClh7ppgbvMqHn9Te_tdKJCMolXG_FWEX8gFKO5vamj0yOekI1pjyv9umTTeX2eT2RdCGHzRvnp56b_fxmQ.png)
ChatGPT 4.0: 7/10
Easy to upload a CSV or cloud-hosted file link. ChatGPT cleaned and summarized the data quickly - showing its work - but required follow-up prompts to deliver actionable insights. Its optimization recommendations were broad and less specific compared to Claude.
Gemini: 7/10
Unable to upload a CSV file for analysis, but data was successfully accessed via a Google Cloud link. Analyzed the data effectively and provided actionable recommendations. It excelled in causality analysis, identifying patterns and linking fluctuations to likely external factors. However, it incorrectly assumed the data lacked channel identifiers, which had to be resolved with a follow-up guidance prompt.
Llama: 5/10
Unable to upload a CSV file, but successfully accessed a Google Cloud link. Took the longest to process the data and required multiple clarifications. Its recommendations were solid but lacked depth and causality analysis, making it less effective for advanced campaign optimization.
Best for Campaign Data Analysis: Claude 3.5 Sonnet
Claude’s ability to interpret complex data and provide actionable, campaign-specific recommendations set it apart for this use case. It analyzed the entire dataset, showed its work, and delivered quality results in under 3 minutes.
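To see how the per-task scores from the three sections add up, here is a quick Python tally. It assumes each task counts equally, which is our simplification for this sketch and not necessarily how the scorecard weighs them; the function names are illustrative.

```python
# Scores as reported in the three Performance sections of this post.
SCORES = {
    "Claude 3.5": {"sales_calls": 9, "keyword_intent": 8, "campaign_data": 9},
    "ChatGPT 4.0": {"sales_calls": 8, "keyword_intent": 9, "campaign_data": 7},
    "Gemini": {"sales_calls": 4, "keyword_intent": 9, "campaign_data": 7},
    "Llama": {"sales_calls": 7, "keyword_intent": 7, "campaign_data": 5},
}


def average_scores(scores: dict[str, dict[str, int]]) -> dict[str, float]:
    """Unweighted mean score per model (equal weighting is an assumption)."""
    return {
        model: round(sum(tasks.values()) / len(tasks), 2)
        for model, tasks in scores.items()
    }


def best_per_task(scores: dict[str, dict[str, int]]) -> dict[str, list[str]]:
    """Top-scoring model(s) per task; ties return every tied model."""
    task_names = next(iter(scores.values())).keys()
    winners = {}
    for task in task_names:
        top = max(model_scores[task] for model_scores in scores.values())
        winners[task] = [m for m, s in scores.items() if s[task] == top]
    return winners


averages = average_scores(SCORES)
winners = best_per_task(SCORES)
for model, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {avg}")
print(winners)
```

Note that keyword intent is a numeric tie between ChatGPT 4.0 and Gemini. Our pick of ChatGPT for that task rests on the qualitative differences described above, not on the score alone.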
Final Recommendations
For Most Digital Marketers
If you're a marketer looking for a single LLM to handle diverse tasks, Claude 3.5 Sonnet offers the most consistent performance across use cases.
For SEO & Content Marketers
Choose ChatGPT 4.0 for in-depth keyword analysis and content strategy planning. Its creative outputs make it a top choice for content creation.
For Advanced Data Analysts
Use Claude 3.5 for its superior data interpretation and optimization insights.
For Budget-Conscious Marketers
Google Gemini 2.0 is a decent option for straightforward tasks, though it currently lags in advanced capabilities. If you are looking to use an LLM without making a financial commitment, the free version(s) of ChatGPT and Claude may give you better results.
As LLM technology evolves, the gap between models is likely to shrink. For now, understanding your use case and selecting the right model can make all the difference in your marketing outcomes.
If you’re looking for additional AI solutions tailored to marketing, consider Marin’s very own AI tool, Advisor, designed to streamline campaign management, uncover actionable insights, and drive better outcomes based on your data and custom workflow.
![](https://cdn.prod.website-files.com/60ffbc3a211dee2c1a74db99/64bfd718707d2a50a6b515b0_gferris.avif)