A Scalable Zero-Shot Intent Classifier

Serj Smorodinsky
80–20-Hacking AI
Published in
3 min read · Apr 17, 2024


Flan-T5-Base as a fine-tuned general classifier

Please continue if you’re interested in very focused and applicable research into small language models. By my definition, small language models have < 2B parameters, and very small (ultra?) language models have < 500M parameters. Various Flan-T5 models fit that definition.

Content:
1. TL;DR
2. Small language models
3. Language models as hiring profiles
4. Main questions (and answers)
5. Next

TL;DR

You can use a fine-tuned flan-t5-base model for classification tasks. I trained it on synthetic data. The base model itself is not very performant on any complex classification task (more than ~5–8 classes), which is why I had to fine-tune it. The fine-tuned version supports 20 zero-shot classes in one prompt (maybe even more).

Hugging Face model link: https://huggingface.co/Serj/intent-classifier

You are welcome to join me in this research. There are a lot of tasks already planned ahead and contributions are more than welcome.

GitHub link: https://github.com/SerjSmor/intent_classification

The synthetic dataset is already uploaded to GitHub, but I intend to upload it to Hugging Face as well.

I have run a comparison of the base model vs. Bart vs. the fine-tuned model (leaderboard is below); OpenAI GPT models are next to be tested.

Small language models

Exploring small language models is a hobby of mine. Determining the optimal number of parameters for a specific task is a fascinating challenge. Why consider small language models at all?

Recent advancements have primarily occurred in data handling, rather than architectural design, demonstrating the effectiveness of small language models for various applications. From an industry perspective, smaller models are more cost-effective, quicker, and thus more scalable.

Here are some notable examples:
Phi (Microsoft) models are trained mainly on synthetic data (Phi-2 includes non-synthetic data as well).
Flan-T5 models broke NLP records when released in October 2022, and they are still a very solid option for NLP tasks.
On the other hand, for chatbot use cases and open QA tasks, I would tend to use the bigger models out there.

Language models as hiring profiles

In a way, researching small language models is akin to hiring a persona with the right amount of skills and experience. For a task like intent classification, how many parameters does the model need in order to perform well in a zero-shot setting?

Should we hire someone who is a polymath, is multilingual, has worked in all possible industries, has read most available books, and codes in all languages (GPT-4)? This persona is probably overqualified for the job.
Instead, we can hire someone very passionate who gets you results fast; that is sometimes more valuable than anything (the Flan-T5 family).

Also, from a cost-efficiency perspective, the overqualified hire will definitely cost you, but how much you will benefit from it is unknown, because it’s hard to estimate the baseline for this task. If it’s a 1% improvement, does it justify a 1000% cost increase?

Main Questions

Why zero shot?
Mainly because it reduces the complexity of training models per use case / client. The elegance is compelling: one model instead of many.

Why small language models?
Fewer parameters mean less computation and easier fine-tuning.
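As a rough back-of-envelope sketch of why this matters (the parameter counts below are approximate assumptions, not exact figures), memory footprint scales linearly with parameter count:

```python
def fp32_gigabytes(num_params: int) -> float:
    """Approximate fp32 memory footprint: 4 bytes per parameter."""
    return num_params * 4 / 1e9

# Approximate parameter counts (assumptions for illustration)
flan_t5_base = 250_000_000    # ~250M parameters
flan_t5_xxl = 11_000_000_000  # ~11B parameters

print(f"flan-t5-base: ~{fp32_gigabytes(flan_t5_base):.0f} GB")  # ~1 GB
print(f"flan-t5-xxl:  ~{fp32_gigabytes(flan_t5_xxl):.0f} GB")   # ~44 GB
```

The same linear scaling applies to inference FLOPs per token, which is where the cost and latency savings come from.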

Why Flan-T5-Base?
1. A proven model family
2. Flan-T5-Base Downloads last month: 751,054 (April 2024)
3. From 50M parameters to 11B parameters

Why intent classification?
Multiclass classification is our bread and butter. Within this context, intent classification tries to classify a customer’s message, which can depend heavily on the company’s use cases. That makes the task non-generic, which in turn makes generative solutions appealing.
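To make the prompt format concrete, here is a small helper (a sketch; `build_prompt` is my own hypothetical name, not part of the model’s API) that assembles the template the fine-tuned model expects: company context, the customer message, and a dynamic list of class options:

```python
def build_prompt(company: str, activity: str, message: str, options: list) -> str:
    """Assemble the intent-classification prompt: company context,
    customer message, and the candidate class names."""
    options_block = "\n".join(options)
    return (
        f"Company name: {company}, is doing: {activity}\n"
        f"Customer: {message}\n"
        "END MESSAGE\n"
        "Choose one topic that matches customer's issue.\n"
        "OPTIONS:\n"
        f"{options_block}\n"
        'Class name: "'
    )

prompt = build_prompt(
    "Company", "products and subscription",
    "Hey, after recent changes, I want to cancel subscription, please help.",
    ["refund", "cancel subscription", "damaged item", "return_item"],
)
print(prompt)
```

Because the class options are just lines in the prompt, swapping in a different client’s intents requires no retraining, which is the whole point of the zero-shot setup.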

How to use intent-classifier?

from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "serj/intent-classifier"
device = "cuda"  # use "cpu" if no GPU is available
model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

input_text = '''
Company name: Company, is doing: products and subscription
Customer: Hey, after recent changes, I want to cancel subscription, please help.
END MESSAGE
Choose one topic that matches customer's issue.
OPTIONS:
refund
cancel subscription
damaged item
return_item
Class name: "
'''

# Tokenize the prompt (truncated to the model's 512-token limit)
input_ids = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True).to(device)

# Generate the output
output = model.generate(input_ids)

# Decode the output tokens
decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded_output)
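Since the model is generative, the decoded string may not match one of the OPTIONS exactly (stray quotes, whitespace, or casing). A small normalization step can map the raw output back onto the candidate list; `match_option` below is a hypothetical helper of my own, not part of the model or library:

```python
import difflib
from typing import List, Optional

def match_option(raw_output: str, options: List[str]) -> Optional[str]:
    """Map a generated class name onto the closest candidate option."""
    cleaned = raw_output.strip().strip('"').strip().lower()
    matches = difflib.get_close_matches(
        cleaned, [o.lower() for o in options], n=1, cutoff=0.6
    )
    if not matches:
        return None
    # Return the option in its original casing
    lowered = {o.lower(): o for o in options}
    return lowered[matches[0]]

options = ["refund", "cancel subscription", "damaged item", "return_item"]
print(match_option(' "Cancel Subscription" ', options))  # cancel subscription
print(match_option("xyz", options))                      # None
```

Falling back to None rather than guessing lets downstream code route unmatched messages to a human or a catch-all intent.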

Next post

In the next post I’ll describe how I trained the model, mainly focusing on synthetic data creation, and go through the next algorithmic improvements worth investing in.

If you’re interested in this process, stay tuned.

GitHub link: https://github.com/SerjSmor/intent_classification
Hugging Face model link: https://huggingface.co/Serj/intent-classifier



NLP Team Leader at Loris.ai. NLP | Neuroscience | Special Education | Literature | Software Engineering. Let’s talk about it!