
Working with LLMs for a year: the lessons I learned along the way

How not to break your AI architecture (and where you actually can save time)

We are in the age of artificial intelligence. Every company wants to add some kind of AI to its products. LLMs (large language models) are emerging everywhere, and some companies are even building their own. While everyone jumps on the hype, I want to share a few things I learned after working with LLMs on different projects over the last year.

1. Choose the right model for the right job

A big mistake I see (and made myself) is assuming that all LLMs are equal. They are not.

Some models know more about certain locations or fields (Gemini is great in some areas where OpenAI models do not perform well). Some are good at reasoning but slow. Others are fast but weak at critical thinking.

Each model has its own strengths. Use OpenAI's GPT-4 for deep reasoning tasks. Use Claude or Gemini for other fields depending on how they were trained. Models such as Gemini Flash are optimized for speed, but they tend not to reason as deeply.

Bottom line: do not use one model for everything. Be deliberate. Experiment, test, and select the best model for your use case.
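The "right model for the right job" idea can be expressed as a small routing table in application code. This is a minimal sketch; the task names and model identifiers are illustrative assumptions, not benchmark results.

```python
# Sketch: route each task type to a model suited for it.
# The mapping below is an illustrative assumption, not a recommendation.
TASK_MODEL_MAP = {
    "deep_reasoning": "gpt-4",        # slower, stronger reasoning
    "fast_summary": "gemini-flash",   # optimized for speed
    "long_context": "claude",         # placement depends on your own tests
}

def pick_model(task_type: str, default: str = "gpt-4") -> str:
    """Return the model configured for this task type, or a default."""
    return TASK_MODEL_MAP.get(task_type, default)
```

Keeping the mapping in one place makes it easy to swap models after you run your own evaluations.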

2. Do not expect LLMs to do all the thinking

I used to believe you could hand a single prompt to an LLM and it would do all the heavy lifting.

For example, I was working on a project where users chose their favorite teams, and the application had to create a travel itinerary based on their matches. At first, I thought I could send the entire match list to the LLM and expect it to pick the best matches and build the route. It did not work.

It was slow, scattered, and unreliable.

So I changed the approach: the system first selects the right matches, then passes only the relevant ones to the LLM to create the route. This worked much better.

Lesson? Let your application handle the logic. Use LLMs to produce language, not to decide everything. They work wonders with language; at least for now, they are not always great at logic.
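The split described above can be sketched in a few lines: deterministic app code filters the matches, and the LLM only receives the shortlist to turn into prose. The field names and prompt wording are assumptions for illustration.

```python
def select_relevant_matches(matches, favorite_teams, start, end):
    """Deterministic app logic: keep only matches for the user's
    favorite teams inside the travel window (ISO date strings),
    before any LLM call is made."""
    return [
        m for m in matches
        if m["team"] in favorite_teams and start <= m["date"] <= end
    ]

def build_itinerary_prompt(selected):
    """The LLM's only job: turn the pre-selected matches into a route."""
    lines = [f'- {m["team"]} in {m["city"]} on {m["date"]}' for m in selected]
    return "Create a travel itinerary for these matches:\n" + "\n".join(lines)
```

The prompt the model sees is now short and unambiguous, which is exactly why this version was faster and more reliable than sending the whole list.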

3. Give each agent a single responsibility

Trying to make a single LLM do more than one job is a recipe for confusion.

In one of my projects, I had a supervisor agent that routed messages to different specialist agents based on user input. Initially, I piled a lot of logic into it: fetching context, detecting follow-ups, deciding on thread continuity, and so on. It eventually got confused and made the wrong calls.

So I split it up. I moved some of the logic out (such as thread continuity) and focused the supervisor only on routing. After that, things became much more stable.

Lesson: do not overload your agents. Keep one responsibility per agent. This helps reduce hallucinations and increases reliability.
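A single-responsibility supervisor can be as small as the sketch below. In practice the routing decision might itself be an LLM call; here it is plain keyword matching, purely to illustrate the shape. The agent names and keywords are hypothetical.

```python
def supervisor(message: str) -> str:
    """Single responsibility: decide which specialist agent handles
    the message. Thread continuity, follow-up detection, and context
    fetching live elsewhere in the application, not here."""
    text = message.lower()
    if any(word in text for word in ("invoice", "refund", "payment")):
        return "billing_agent"
    if any(word in text for word in ("bug", "error", "crash")):
        return "support_agent"
    return "general_agent"
```

Because the function does exactly one thing, it is trivial to test, and a wrong route is easy to diagnose, which is much harder when routing is entangled with five other jobs.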

4. Latency is inevitable – use streaming

LLMs that are good at reasoning are usually slow. That is just the reality right now. Some models, such as GPT-4 or Claude 2, take their time, especially with complex prompts. You cannot fully eliminate the latency, but you can make it feel better for the user.

One way to do this? Stream the output as it is generated. Most LLM APIs support text streaming, allowing you to start sending partial answers – sentence by sentence – while the rest is still being processed.

In my applications, I push everything that is ready to the client as soon as it is available. Even if the full result takes a little longer, it gives users a sense of progress.

Lesson: you cannot avoid latency, but you can hide it. Stream early, stream often – even partial output makes a major difference in perceived speed.
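The relay pattern looks roughly like this. The sketch uses a fake generator in place of a real streaming LLM API (real clients expose a similar chunk iterator), and `send` stands in for whatever pushes data to your client, such as an SSE or WebSocket write.

```python
def fake_llm_stream(answer: str, chunk_size: int = 12):
    """Stand-in for a streaming LLM API: yields partial text chunks
    as they become available."""
    for i in range(0, len(answer), chunk_size):
        yield answer[i:i + chunk_size]

def relay_to_client(stream, send):
    """Forward each chunk to the client the moment it arrives,
    instead of waiting for the full completion."""
    received = []
    for chunk in stream:
        send(chunk)            # e.g. write to an SSE/WebSocket connection
        received.append(chunk)
    return "".join(received)   # full text, for logging or caching
```

The total generation time is unchanged; what changes is that the user sees the first words almost immediately.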

5. Fine-tuning can save you time (and tokens)

People often avoid fine-tuning because it looks complex or expensive. However, in some cases, it actually saves a lot.

If your prompts need to include the same structure or context every time, and caching does not help, you will burn extra tokens and time. Instead, fine-tune the model on that structure. After that, you do not need to pass the same example every time – the model just knows what to do.

But be careful: do not fine-tune on frequently changing data, such as flight times or prices. You will end up teaching the model stale information. Fine-tuning works best for logic and format that stay fixed.

Lesson: fine-tune when things are consistent – not when they change constantly. It saves long-term effort and leads to faster, cheaper requests.
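Preparing fine-tuning data mostly means collecting stable input/output pairs into a training file. This is a sketch of the chat-style JSONL shape used by several fine-tuning APIs; the exact schema varies by provider, and the system message and example content are made up for illustration.

```python
import json

def build_finetune_examples(pairs):
    """Turn (user_message, ideal_reply) pairs into chat-format JSONL.
    Only stable structure and logic belong here -- never volatile data
    like prices or flight times, which would go stale in the weights."""
    records = []
    for user_msg, ideal_reply in pairs:
        records.append({"messages": [
            {"role": "system", "content": "Answer in the fixed report format."},
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": ideal_reply},
        ]})
    return "\n".join(json.dumps(record) for record in records)
```

Once the model is tuned on this format, the long system prompt and the worked examples no longer need to travel with every request, which is where the token savings come from.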

🎯 Final Thoughts

Working with LLMs is not just about prompts and APIs. It is about architecture, performance, clarity, and most importantly, knowing what to expect (and what not to expect) from these models.

I hope this helps someone building their next AI feature.
