Large Language Models (LLMs) have become powerful tools for quickly accessing information, generating content, and supporting decision-making across a wide range of tasks. As their capabilities grow, so does their presence in our daily workflows. Using LLMs effectively, however, isn’t just about knowing what they can do; it’s about understanding when and how to use them. In this blog post, I evaluate LLMs by testing them in specific contexts to find out how to get better outcomes. My goal is to uncover strategies for selecting and applying the right model in the right situation. Done well, this approach can significantly enhance productivity and performance. But if models are chosen at random or without context-specific consideration, results can be misleading or subpar. By exploring these nuances, I aim to help users make more informed, effective decisions when incorporating LLMs into their workflows.
To evaluate the success of a large language model (LLM), I consider several key criteria:
- Accuracy: The model’s ability to generate correct and factually consistent outputs…
- Clarity: How well the model communicates ideas…
- Creativity: The originality and flexibility of responses…
- Relevance: Whether the model stays on-topic…
- Consistency: The degree to which the model maintains…
- Efficiency: How quickly and effectively the model produces…
- Safety and Ethical Alignment: The model’s ability to avoid generating harmful content…
METHODOLOGY
Writing emails is a common task that needs to be done well. Below are tables showing each LLM’s output and how it scored on the prompt: “Write a professional email to a collaborator asking for access to raw sequencing data from our shared study.”
LLM | Output |
ChatGPT | Subject: Request for Access to Raw Sequencing Data
Dear [Collaborator’s Name], I hope you are doing well. As we continue our analysis for [Study Name], I would like to request access to the raw sequencing data from our shared study. Having access to this data will allow me to [briefly state purpose, e.g., validate our findings, conduct further analyses, etc.]. Could you please let me know the best way to access the files? If there are any specific protocols or agreements I should follow, I’d be happy to comply. Additionally, if you need any details from my end to facilitate the data transfer, please let me know. I appreciate your time and assistance. Looking forward to your response. Best regards, |
Claude | Dear [Collaborator’s Name],
I hope this email finds you well. I’m writing to request access to the raw sequencing data from our joint study on [brief study description]. As we move forward with our analysis, having access to the raw data files would be extremely valuable for [specific purpose – e.g., “conducting additional quality control analyses” or “exploring some promising patterns we’ve identified in the preliminary results”]. Could you please share the raw sequencing data files at your earliest convenience? If the files are too large for email attachment, perhaps we could arrange a secure file transfer through [specific method – e.g., “our institution’s secure file sharing service” or “a password-protected cloud storage link”]. I’m happy to discuss any questions or concerns you might have about this request. Thank you in advance for your assistance, and I look forward to continuing our productive collaboration. Best regards, [Your Name] [Your Position/Institution] [Your Contact Information] |
Gemini AI | Subject: Request for Raw Sequencing Data – [Study Name/Project Abbreviation]
Dear Dr. [Collaborator’s Last Name], I hope this email finds you well. Following up on our collaborative study, “[Study Name/Project Abbreviation]”, we are now moving into the [specific analysis stage, e.g., downstream bioinformatic analysis, variant calling, etc.] phase. To proceed effectively, we require access to the raw sequencing data generated by your lab. Specifically, we are requesting access to the [specify file types, e.g., FASTQ files, BAM files, etc.] for all samples related to [specify relevant samples or experimental conditions, e.g., all patient samples, control samples, etc.]. Could you please let us know the best method for accessing this data? We are comfortable with [suggest preferred methods, e.g., secure file transfer protocol (SFTP), cloud storage (Google Drive, AWS S3), etc.]. Please also let us know if there are any specific file naming conventions or metadata descriptions we should be aware of. We understand the importance of data security and confidentiality and will treat the data with the utmost care. We are happy to sign any necessary data transfer agreements or confidentiality agreements. Please let us know when we can expect to receive the data or if you require any further information from our end. Thank you for your cooperation and continued collaboration. Sincerely, [Your Name] [Your Title] [Your Institution] [Your Contact Information] |
Perplexity | Subject: Request for Access to Raw Sequencing Data from Our Shared Study
Dear [Collaborator’s Name], I hope this email finds you well. I am writing to follow up on our ongoing collaboration regarding [specific study/project name]. As part of the next steps in our analysis, I would like to request access to the raw sequencing data generated for this study. Having access to this data will enable us to [briefly explain purpose, e.g., perform additional analyses, validate findings, etc.]. Please let me know if there are any specific protocols or requirements I should follow to access the data. If needed, I am happy to provide further details about how the data will be used or discuss any concerns you might have. Please feel free to reach out if there is any additional documentation or approvals required from my side. Thank you for your support and collaboration. I look forward to hearing from you soon. Best regards, |
Grok | Subject: Request for Access to Raw Sequencing Data from Our Shared Study
Dear [Collaborator’s Name], I hope this email finds you well. I am writing to follow up on our ongoing collaboration for the [Study Name/Project Title] study. As we move forward with the next phase of analysis, I would greatly appreciate access to the raw sequencing data generated from our shared work. Having the raw data available would enable me to [briefly specify purpose, e.g., perform additional quality checks, run specific analyses, or integrate it with other datasets], which I believe will strengthen our findings and contribute to the overall success of the project. Please let me know if there are any specific protocols, permissions, or data-sharing agreements I should address to facilitate this request. If it’s convenient, I’d be happy to coordinate with you or your team to arrange secure access—whether through a shared drive, database, or another preferred method. Additionally, if there are any conditions or considerations you’d like me to keep in mind while working with the data, feel free to let me know. Thank you for your time and support on this. I’m excited about the progress we’re making together and look forward to your response. Please don’t hesitate to reach out if you have any questions or need further clarification. Best regards, [Your Full Name] [Your Job Title/Position] [Your Institution/Organization] [Your Email Address] [Your Phone Number, if applicable] |
LLM | Accuracy |
ChatGPT | 7.5 |
Claude | 9 |
Gemini AI | 10 |
Perplexity | 8 |
Grok | 9.5 |
ChatGPT provides a basic but functional response, correctly interpreting the request and generating a professional tone. While the message is clear and polite, its lack of detail reduces accuracy in fulfilling the full scope of the prompt, especially in a research context where precision matters.
Claude performs well, crafting an email that is both professional and complete in intent. It includes the purpose of the request, suggested transfer methods, and an offer to coordinate further — all relevant to the prompt. However, it still falls short of full accuracy due to its lack of technical depth, such as specifying file formats or analysis phases.
Gemini stands out as the most accurate model in this task, delivering a highly detailed and technically appropriate response. It specifies the type of files requested, mentions the exact stage of analysis, clearly states the intent, and even addresses data confidentiality — all while maintaining a formal and collaborative tone. The inclusion of a clear subject line and structured format enhances its accuracy in mirroring what one would expect from a real-world scientific email.
Perplexity offers a solid, if somewhat minimalist, version of the email. It captures the core idea accurately and maintains a professional tone. However, it lacks technical specificity, similar to ChatGPT, and doesn’t delve into details like file types, sharing protocols, or security measures. While it fulfills the basic requirement of the prompt, its lack of completeness affects its accuracy in fully representing a realistic, professional request in a genomics or bioinformatics context.
Grok delivers a well-rounded, highly accurate email that almost perfectly aligns with the prompt. It covers the need for raw data, asks about protocols, permissions, and data-sharing agreements, and shows flexibility in coordination, all signs of strong contextual understanding. While it avoids excessive formality, it retains enough professionalism and includes thoughtful touches that make it slightly more collaborative than Gemini, though marginally less precise in technical detail. This balance earns it a close second place in terms of accuracy.
LLM | Clarity |
ChatGPT | 8 |
Claude | 9 |
Gemini AI | 10 |
Perplexity | 7 |
Grok | 9 |
ChatGPT’s email is clear and straightforward, effectively communicating the intent to request access to raw sequencing data. The message is concise and avoids unnecessary jargon, making it accessible. However, its brevity comes at a cost: the lack of specific details such as file types, preferred transfer methods, etc. can leave room for confusion or follow-up questions. Despite this minor shortcoming, the core message is easy to understand.
Claude delivers a very clear and logically structured email. It clearly states the purpose of the message, explains why the data is needed, and suggests possible ways to receive the files. The flow from greeting to request to proposed solution is smooth and natural. While it doesn’t include technical specifics like file formats, this doesn’t hinder the clarity of the main request. The only slight drawback is the absence of a subject line, which would have helped frame the message more immediately.
Gemini excels in clarity due to its well-organized structure, precise language, and inclusion of relevant technical terms without being overly complex. Each paragraph serves a distinct purpose: introduction, specific request, method of delivery, and assurance of confidentiality. The subject line also adds immediate context. Everything is presented in a logical sequence that makes the request unambiguous and easy to act on.
Perplexity provides a generally clear email, but it feels slightly underdeveloped compared to others. While the central idea is conveyed, the lack of specific details such as file types or analysis stages leaves some ambiguity about exactly what data is needed. The message is polite and direct, but it lacks the nuanced structuring and completeness seen in higher-scoring models. This makes it functional, but not as effective in ensuring the recipient fully understands the scope of the request.
Grok offers a clear and collaborative tone, with a well-defined request and openness to coordination. The email walks through the need for the data, asks important follow-up questions, and invites discussion if needed. Its clarity is enhanced by thoughtful phrasing like “whether through a shared drive, database, or another preferred method.” The message flows naturally and covers all necessary points without being verbose. It scores just below Gemini only because Gemini is slightly more precise in its structure.
LLM | Creativity |
ChatGPT | 6 |
Claude | 7 |
Gemini AI | 8 |
Perplexity | 6.5 |
Grok | 9 |
ChatGPT delivers a functional but formulaic response. It sticks closely to a standard email template without much unique phrasing. While the message is clear and professional, it lacks any stylistic flair or creative variation that would make it stand out. This feels like a textbook example, but not one that shows much interpretive or expressive depth.
Claude offers a slightly more personable and fluid tone, with phrases like “I’m happy to discuss any questions or concerns” adding a touch of warmth and openness. It avoids overly robotic language and strikes a balance between professionalism and approachability. However, it still largely follows a conventional structure with limited deviation from standard templates. There’s some subtle creativity in tone, but nothing particularly innovative in structure or framing.
Gemini maintains a high level of professionalism while incorporating small but effective creative touches, such as specifying file types (FASTQ/BAM), analysis stages, and secure handling practices in a way that feels both precise and natural. The structured formatting — almost bullet-point-like in places — enhances readability and makes the email feel more tailored than generic. While not wildly inventive, Gemini’s response feels crafted rather than copied, showing thoughtful adaptation to the scientific context.
Perplexity provides a clear and polite email, but it doesn’t go beyond the expected structure or content. It uses standard language without notable variations or expressive elements. While it does include a brief invitation to ask for further clarification, this is presented in a fairly routine manner. The lack of technical specificity and absence of personalized touches result in a response that feels competent but uninspired.
Grok stands out for its engaging, collaborative tone and subtle innovations in phrasing. It invites dialogue and subtly personalizes the request. Grok’s version reads like it was written by someone who understands the human side of data sharing, making it creative in both tone and engagement strategy.
LLM | Relevance |
ChatGPT | 7.5 |
Claude | 8.5 |
Gemini AI | 10 |
Perplexity | 7 |
Grok | 9 |
ChatGPT provides a generally relevant response, correctly identifying the need to request access to data and framing it within a collaborative context. However, it lacks several key details that would make the email fully relevant to a scientific collaboration: no mention of file types (e.g., FASTQ, BAM), no reference to analysis stages, and no suggestion of secure transfer methods. While the core intent is clear, these omissions reduce its overall relevance to the specific research-based scenario implied by the prompt.
Claude’s response is highly relevant, clearly addressing the need to request raw sequencing data and providing a plausible reason for the request. It suggests practical solutions for receiving large files and invites dialogue — all appropriate to the scenario. The only slight drawback is the absence of technical specificity, such as file formats or stage of analysis, which could have made it even more contextually relevant in a bioinformatics or genomics setting.
Gemini delivers the most relevant response, hitting every key element required by the prompt. It specifically mentions file types (FASTQ/BAM), the stage of analysis, possible secure transfer methods, and even touches on data confidentiality — all highly relevant in a scientific collaboration. Its structured format ensures that nothing extraneous is included, and everything present serves the purpose of the email. This makes it a gold standard in relevance for this particular request.
Perplexity’s response is relevant in intent, but lacks the technical precision and contextual depth expected in a scientific data request. It states the basic need and offers flexibility, but omits important details like file types, analysis phases, or secure handling considerations. While it doesn’t go off-topic, it misses key markers of relevance in this specific academic/professional context.
Grok’s email is very relevant, clearly stating the intent to request raw data and explaining why it is needed. It also asks about protocols, permissions, data-sharing agreements, and preferred sharing methods, showing a strong understanding of real-world collaboration workflows. The only minor point where it falls slightly short of perfect relevance is that it doesn’t specify exact file types (like FASTQ or BAM), which would have made it even more tailored to the scientific domain.
LLM | Consistency |
ChatGPT | 7 |
Claude | 8.5 |
Gemini AI | 10 |
Perplexity | 7.5 |
Grok | 9 |
ChatGPT’s response is mostly consistent, with a clear request and polite tone from start to finish. However, it lacks some structural coherence — for example, it jumps straight into the request without a brief introduction or context. While there are no contradictions, the brevity and missing elements create a slight inconsistency in how smoothly the message flows from greeting to closing.
Claude delivers a well-structured and internally consistent email. The tone remains professional and courteous throughout, and the logic moves naturally from greeting to request to offer of discussion. There’s a smooth flow of ideas, and nothing contradicts itself or feels out of place. The only minor inconsistency is that it doesn’t always clearly define the urgency or timeframe for the request, which could have improved its overall cohesion.
Gemini offers the most consistent response, with a clear and stable tone (formal yet collaborative) maintained throughout. Each paragraph builds logically on the previous one, and every sentence supports the central purpose of the email. It starts with a strong subject line, sets context, makes a specific request, and ends with a professional sign-off — all while staying fully aligned in tone, intent, and structure. There are no jarring shifts or inconsistencies.
Perplexity’s email is generally consistent, with a polite tone and a straightforward structure. However, it feels slightly underdeveloped, making the message less cohesive than others. For instance, it states the need for data access but doesn’t clearly explain why or how soon it’s needed. While there are no contradictions, the lack of depth and direction creates a sense of incompleteness that affects the overall consistency of purpose and flow.
Grok provides a very consistent and cohesive email, with a warm but professional tone that remains steady throughout. The message progresses logically from greeting to request to coordination options. It even invites feedback or clarification in a way that enhances consistency by showing openness to collaboration. The only slight deviation is that it leans a bit more conversational at times, which may not match the strictest expectations of formal academic communication — but this doesn’t break consistency within its own tone framework.
LLM | Efficiency |
ChatGPT | 7 |
Claude | 8.5 |
Gemini AI | 10 |
Perplexity | 6.5 |
Grok | 9 |
ChatGPT’s email is brief and to the point, which gives it some efficiency. However, its lack of key details such as file types and transfer methods makes it less effective overall: the recipient might need to ask clarifying questions, reducing real-world efficiency. While it doesn’t waste words, it omits too much to be fully functional without follow-up.
Claude strikes a strong balance between brevity and completeness, delivering a clear request with practical suggestions for data transfer. It communicates the purpose efficiently without being overly verbose. The only slight inefficiencies are minor gaps in technical specificity. Overall, it’s a well-paced and effective message.
Gemini delivers the most efficient response: it’s concise, highly informative, and includes all necessary details without being wordy. Each sentence serves a clear purpose, from setting context to specifying file types, analysis stage, and secure handling. It anticipates potential concerns and addresses them preemptively, reducing the need for follow-up. This makes it the most time-efficient option for both sender and recipient.
Perplexity’s email is short, but not particularly efficient. While it states the request, it lacks sufficient detail to act on immediately — such as what kind of data, when it’s needed, or how to send it. As a result, the recipient may need to seek clarification, making this the least efficient response in practice despite its brevity.
Grok’s email is very efficient, clearly stating the need, offering collaboration, and asking relevant follow-up questions in a streamlined way. It adds a personal touch without sacrificing efficiency. It scores slightly below Gemini only because Gemini packs more precision into fewer words, making it just a bit more tightly optimized.
LLM | Safety and Ethical Alignment |
ChatGPT | 6.5 |
Claude | 9 |
Gemini AI | 10 |
Perplexity | 7 |
Grok | 8.5 |
ChatGPT’s email is the least aligned ethically: it makes no mention of data sensitivity or confidentiality, and its only nod to formal procedure is a generic offer to comply with “any specific protocols or agreements.” Beyond that, it asks for access without acknowledging the ethical requirements that may apply to raw sequencing data, especially in cases involving human subjects or protected information. This could be seen as overly casual or even inappropriate in regulated research environments.
Claude’s response shows strong awareness of collaborative norms and includes a polite tone that respects boundaries. It implies consent by asking for the data rather than assuming access. While it doesn’t explicitly mention confidentiality or formal agreements, its phrasing (“I’m happy to discuss any questions or concerns”) suggests openness to ethical considerations. It strikes a good balance between professionalism and respect for data ownership.
Gemini earns the top score here due to its explicit acknowledgment of data security and confidentiality. It directly states:
“We understand the importance of data security and confidentiality and will treat the data with the utmost care. We are happy to sign any necessary data transfer agreements or confidentiality agreements.”
This demonstrates clear ethical awareness, proactive responsibility, and alignment with best practices when handling potentially sensitive research data. The inclusion of willingness to follow formal protocols makes this the most ethically sound response.
Perplexity offers a polite request and asks about protocols, documentation, and approvals, but it lacks explicit references to data sensitivity or confidentiality. While it invites further clarification, it assumes a level of trust or access without addressing ethical safeguards. This could be problematic in contexts involving regulated or personal data, where assurances around usage and protection are essential.
Grok takes a respectful and collaborative approach, emphasizing coordination and openness to any conditions the collaborator sets. Phrases like “if there are any specific protocols…” and “if there are any conditions or considerations you’d like me to keep in mind…” show ethical mindfulness and deference to the collaborator’s control over the data. However, unlike Gemini, it does not explicitly commit to confidentiality or offer to sign formal agreements, which slightly lowers its alignment score.
Note: This is just one example email, and performance will vary with the type of email requested. Still, the example above gives an idea of how to evaluate LLMs on a specific task.
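One way to get an overall picture from the seven score tables is to average each model’s scores. The sketch below is only an illustration and not part of the original evaluation: it assumes every criterion carries equal weight (the post does not assign weights), and the names `SCORES` and `rank_models` are made up for this example.

```python
from statistics import mean

# Scores copied from the tables above, in this order:
# accuracy, clarity, creativity, relevance, consistency, efficiency, safety.
SCORES = {
    "ChatGPT":    [7.5, 8, 6, 7.5, 7, 7, 6.5],
    "Claude":     [9, 9, 7, 8.5, 8.5, 8.5, 9],
    "Gemini AI":  [10, 10, 8, 10, 10, 10, 10],
    "Perplexity": [8, 7, 6.5, 7, 7.5, 6.5, 7],
    "Grok":       [9.5, 9, 9, 9, 9, 9, 8.5],
}


def rank_models(scores):
    """Return (model, mean score) pairs, highest mean first.

    Assumption: all criteria are weighted equally.
    """
    averages = {model: mean(values) for model, values in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


for model, average in rank_models(SCORES):
    print(f"{model:<11} {average:.2f}")
```

With equal weights, Gemini AI comes out first (about 9.71), followed by Grok (9.00) and Claude (8.50), while ChatGPT and Perplexity tie at about 7.07. A different weighting, for example one that emphasizes safety for regulated data, could change the order.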
PROMPT ENGINEERING TIPS
The nice thing about LLMs is that they try to tailor their responses to each user. Even so, a few general tips help you get better results from any of them.
- It’s better to describe positive than negative…
- Good Example: “Write a professional email…”
- Bad Example: “Write an email… don’t make it sloppy…”
- Give a few examples so that the LLM can get an idea…
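The tips above (phrase the instruction positively, and include a few examples) can be combined in a small prompt template. This is a minimal sketch, not a required format: `build_prompt`, the `Request:`/`Response:` labels, and the sample pair are all made up for illustration.

```python
def build_prompt(instruction, examples, request):
    """Assemble a few-shot prompt as plain text.

    instruction: a positively phrased description of what you want.
    examples:    (request, response) pairs that show the desired style.
    request:     the new task the model should complete.
    """
    lines = [instruction, ""]
    for example_request, example_response in examples:
        lines += [f"Request: {example_request}",
                  f"Response: {example_response}",
                  ""]
    # End with an open "Response:" so the model continues the pattern.
    lines += [f"Request: {request}", "Response:"]
    return "\n".join(lines)


# Hypothetical example pair, invented for this sketch.
examples = [
    ("Ask a colleague to review a draft by Friday.",
     "Subject: Draft review by Friday\n\n"
     "Hi Sam, could you review the attached draft by Friday?"),
]

prompt = build_prompt(
    instruction="Write a professional, concise email.",
    examples=examples,
    request="Ask a collaborator for access to raw sequencing data "
            "from our shared study.",
)
print(prompt)
```

The resulting text can be pasted into any of the assistants discussed here; nothing in it depends on a particular provider.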
RESULTS
Wrapping Up: Utilizing Each AI’s Strength
As we explored, each major AI assistant has its own specialty—and understanding these differences can help you get more done, faster and smarter.
ChatGPT shines when it comes to writing. It strips out redundancy, tightens up language, and leaves you with clean, clear text. If you’re a writer, editor, or just someone who appreciates polished communication, ChatGPT can be your go-to tool for making content more readable and professional.
Claude, on the other hand, is a powerhouse for coding. It’s analytical, structured, and optimized for solving programming problems. Whether you’re debugging or generating new scripts, Claude handles code with precision.
Gemini AI stands out in multimodal tasks. Because it is built to work with different data types (like text, images, and audio), it’s well-suited for creative projects, research, or anything that involves more than just text.
Perplexity is your internet-savvy assistant. It’s built for real-time research, pulling in information from the web and citing sources. Think of it as a supercharged search engine that also helps you digest and summarize what it finds.
Grok offers deep reasoning and rich explanations. It tends to generate longer, more thoughtful responses—possibly influenced by its training environment on the X platform, where users engage in complex, high-level discussions.
So What Can You Do With This Information?
Knowing which AI tool is good at what gives you an edge. Here’s how to put this knowledge into action:
- Writing something? Use ChatGPT to clean it up…
- Working on code? Try Claude for structured help…
- Dealing with different media types? Gemini AI is great at juggling…
- Need up-to-date facts? Perplexity has you covered…
- Want deeper insights? Grok delivers thoughtful responses…
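If you want to keep these recommendations handy in a script, a simple lookup table is enough. The sketch below just restates the list above; the task labels and the `pick_model` helper are hypothetical names chosen for this example.

```python
# Task-to-model mapping taken from the list above. The task labels are
# shorthand invented for this sketch.
PREFERRED_MODEL = {
    "writing": "ChatGPT",
    "coding": "Claude",
    "multimodal": "Gemini AI",
    "research": "Perplexity",
    "reasoning": "Grok",
}


def pick_model(task_type, default=None):
    """Return the suggested model for a task type, or `default` if unknown."""
    return PREFERRED_MODEL.get(task_type.lower(), default)


print(pick_model("coding"))
print(pick_model("cooking", default="no recommendation"))
```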
What’s Next? Try, Mix, and Make It Work for You
Here are a few easy next steps to get more out of these tools:
- Try them out side by side. Give each model the same task…
- Mix and match. Use one AI to gather info, another to structure it…
- Watch for updates. These models are evolving fast…
- Tailor your workflow. Build your toolkit around what each AI does best…
- Give feedback. Most platforms value user input…
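The side-by-side step can be scripted so that every model receives exactly the same prompt. The sketch below is a rough outline under stated assumptions: `compare_models` is a made-up helper, and the two stand-in functions only imitate a model so the code runs offline. In practice you would replace them with calls to each provider’s own client.

```python
from typing import Callable, Dict


def compare_models(prompt: str,
                   models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same prompt to every model and collect replies by name."""
    return {name: ask(prompt) for name, ask in models.items()}


# Stand-in callables (assumptions for illustration, not real APIs).
def fake_model_a(prompt: str) -> str:
    return f"[model A reply to: {prompt}]"


def fake_model_b(prompt: str) -> str:
    return f"[model B reply to: {prompt}]"


results = compare_models(
    "Write a professional email to a collaborator asking for access "
    "to raw sequencing data from our shared study.",
    {"model-a": fake_model_a, "model-b": fake_model_b},
)

for name, reply in results.items():
    print(f"--- {name} ---")
    print(reply)
```

Keeping the prompt in one place makes the comparison fair, and the collected replies can then be scored against the criteria used earlier in this post.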
By understanding and leveraging the unique strengths of each assistant, you’re not just using AI—you’re working with it. And that’s where the real productivity (and creativity) boost happens.