How to get the best out of LLMs
Effective prompting for free and early-stage LLMs
N.B. This article does not cover best practice for reasoning models such as OpenAI's o1 and o3, which have step-by-step logic built in.
OpenAI, the developer of ChatGPT, has the goal of quadrupling its user base to over 1 billion in 2025. To do this, it must make its models easy to use and responsive to simple English prompts. Yet as of the beginning of 2025, the reasoning models that fulfil this aim are only starting to emerge and remain too expensive for the mass market. The average consumer will therefore still need to know how to get the most out of standard Large Language Models (LLMs), including ChatGPT and Google’s Gemini.
How LLMs Work
LLMs predict the next word in a sequence from the words that came before. The output depends on the volume and nature of the data used to train each model. OpenAI’s models often perform best on standard benchmarks because they are trained on the most data.
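To make the idea concrete, here is a toy next-word predictor. It is a minimal sketch, not how a real LLM is built: it simply counts which word follows each word in a tiny made-up corpus and predicts the most frequent follower. Real models learn far richer statistics from vastly more text.

```python
# Toy next-word prediction: count follower words in a tiny corpus,
# then predict the most frequent one. Illustrative only.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict(word: str) -> str:
    """Return the most common word seen after `word` in the corpus."""
    return followers[word].most_common(1)[0][0]

print(predict("the"))  # -> "cat", the most frequent follower of "the"
```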
The nature of AI means we do not know precisely how a model arrives at a particular response. Consequently, new research into how to improve results is published all the time. Commentators often assume that natural language generation favours fluent writers, but prompting is its own form of communication, one that takes into account how LLMs work. Users will improve with practice, but practice won’t make them great novelists.
LLMs have also been compared to an intelligent but naïve recruit. As such, training and instruction determine their progress as much as innate ability. Companies will increasingly face the choice of training humans or AI agents as new hires, and as the CEO of NVIDIA said on the Bg2 podcast,
“I’m hoping that Nvidia someday will be a 50,000 employee company with a 100 million, you know, AI assistants, in every single group,” – Jensen Huang.
AI agents do not replace humans but allow us to work at unprecedented scale. The ability to interact with them will therefore be an important skill in the near future.
Adding Context and Data
Picture a conversation with a friend or partner. Let’s face it, we don’t always listen intently, especially if distracted by something on our phone. Alternatively, the friend may refer to a topic that we don’t remember discussing. In the first case we lack context and in the second we lack data.
LLMs perform zero- and few-shot learning, which means they can handle concepts that are not familiar from their training data. Just as we might try to piece together what our friend is talking about, the model infers an answer by recognising similar situations. When we know our friends well, we become good at filling in the gaps in their speech. In much the same way, models gain an understanding from a vast history of training data.
You can improve your results with LLMs by providing examples of the output you want. One example might be a previous report you need the LLM to mirror. By adding data you are progressing from zero- to few-shot learning. You can also provide context, for instance by including limiting conditions in a query: the subject matter of a report, the time period of an analysis, or details of how to format the output.
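As a minimal sketch of this progression, the snippet below assembles a few-shot prompt: context setting out the subject, period and format, plus one example of the expected output. The example report and wording are illustrative assumptions, not a fixed recipe.

```python
# Build a few-shot prompt: context (limiting conditions) plus one example
# of the expected output, followed by the material to work on.
example = (
    "Example summary:\n"
    "Revenue: up 4% year on year. Costs: flat. Outlook: cautious.\n"
)

context = (
    "Summarise the report below in the same three-line format as the example.\n"
    "Subject: Q3 trading update. Period: July to September.\n"
)

report = "..."  # paste the report you want summarised here

prompt = context + example + "Report:\n" + report
print(prompt)
```

The same structure works pasted directly into a chat window; the code merely makes the parts explicit.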
If you are not sure of an answer you receive, challenge it. LLMs can often recognise when they have made a mistake and correct it. The initial error may be caused by unclear input from the user, or by the way that LLMs work.
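In a chat-style exchange, the challenge is simply another turn in the conversation. The message format below mirrors common chat APIs, but the question and wording are illustrative.

```python
# A follow-up turn that challenges an answer. Asking the model to check
# and explain its reasoning often surfaces and corrects an initial mistake.
messages = [
    {"role": "user", "content": "Which planet has the most moons?"},
    {"role": "assistant", "content": "Jupiter."},
    {"role": "user", "content": "Are you sure? Check your answer and explain your reasoning."},
]
```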
Statistical Patterns and Mixing Up Words
Machine learning involves pattern matching, so models do not understand data and context the way humans do. Even with context, hallucinations still occur: the model makes mistakes or fabricates responses.
Pattern recognition can be powerful, as when machines spot irregularities in X-rays and diagnose disease before doctors do. But because it is based on the most likely outcome, it will on occasion be wrong.
There are techniques to overcome these inaccuracies. One is to ask the model to respond in steps, or to enter a chain of prompts that breaks the request down. To see this at work, prompt an LLM with:
2x = 36-9y and 6y = x+3. What are x and y?
The response should show the working that derives the answer x = 9 and y = 2. If it does not, ask for the steps to be detailed. If we break down reasoning requests in the same fashion as algebra puzzles, a model is more likely to produce a correct answer. You can also add “Let’s think step-by-step” to queries of models lacking advanced reasoning capability; the advanced models think in steps automatically.
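A minimal sketch of this technique: append the step-by-step instruction to a query. The helper below is an illustrative assumption, and the comments show the working a model should produce for the puzzle above.

```python
# Nudge a model to show its working by appending a step-by-step instruction.
def step_by_step(query: str) -> str:
    """Wrap a query so the model is prompted to reason in steps."""
    return f"{query}\nLet's think step-by-step."

prompt = step_by_step("2x = 36 - 9y and 6y = x + 3. What are x and y?")
print(prompt)

# The working the model should produce:
#   from the second equation, x = 6y - 3
#   substitute: 2(6y - 3) = 36 - 9y  ->  12y - 6 = 36 - 9y
#   21y = 42  ->  y = 2, and x = 6*2 - 3 = 9
```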
LLMs translate words into strings of numbers. When two words’ numbers are highly correlated, the words become interchangeable, such as car and automobile. Yet names such as Paul and Paula are also correlated and may be swapped for one another in error.
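A toy illustration of why this happens: when two words’ vectors point in nearly the same direction, their cosine similarity approaches 1.0 and the model treats them as near-interchangeable. The three-number vectors below are invented for the example; real models use thousands of learned dimensions.

```python
import math

def cosine(a, b):
    """Similarity between two vectors: 1.0 means interchangeable."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up vectors for illustration only.
vectors = {
    "car":        [0.90, 0.10, 0.05],
    "automobile": [0.88, 0.12, 0.06],
    "Paul":       [0.10, 0.85, 0.40],
    "Paula":      [0.11, 0.83, 0.45],
}

print(cosine(vectors["car"], vectors["automobile"]))  # ~1.0: safe to swap
print(cosine(vectors["Paul"], vectors["Paula"]))      # also ~1.0: easy to confuse
```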
When source material contains similar noun-phrasing, the model may make mistakes. If, for instance, data describes the population, area and age of both London and Paris, the model may confuse the two cities. The solution is to provide it with fully formatted facts.
We are now moving beyond prompting into Retrieval Augmented Generation (RAG), in which a model is supplied with additional information. This information can be rewritten to make it easier for models to understand: simplify sentences, make each one a true statement on its own, and thereby separate potentially conflicting information. Fully formatted documents are a terrible read for humans, because they repeat nouns and add detail about them in every sentence.
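As a minimal sketch of what fully formatted facts look like in practice, the snippet below prepends standalone, noun-repeating sentences to a query, which is the basic RAG step. The figures are approximate and included only for illustration.

```python
# Each fact is a standalone sentence with the noun repeated, so no
# sentence depends on its neighbours. Figures are approximate.
facts = [
    "London has a population of about 8.9 million.",
    "London covers an area of about 1,572 square kilometres.",
    "Paris has a population of about 2.1 million.",
    "Paris covers an area of about 105 square kilometres.",
]

query = "Which city has the larger area, London or Paris?"

# Prepend the retrieved facts to the query.
prompt = "Answer using only these facts:\n" + "\n".join(facts) + "\n\n" + query
print(prompt)
```

Because each fact stands alone, the model cannot mistakenly attach Paris’s area to London.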
Converting RAG text into fully formatted form most likely requires outside assistance. In the meantime, avoid pronouns such as it, he and she in queries, and repeat the noun instead. LLMs interpret each sentence separately and need constant reminding of what they are working on. The best way to do this is to think through a problem and seek answers for each step along the way. In addition, any guidance you provide about your expected output should improve your satisfaction with the responses.
Thought Exercises:
- Have I asked one question for which I expect one answer?
- Have I included examples of the output I expect to receive?
- Have I explained my terms and defined any ambiguous words?