LiteLLM for Developers: Build a Universal LLM Interface in Minutes

Accessing over 100 LLMs using a single LiteLLM SDK

๐Ÿ›ฃ๏ธ Introduction

Since early 2024, the language model landscape has become increasingly complex, with multiple providers each exposing their own API. Developers often struggle to integrate different LLMs into their applications efficiently. LiteLLM simplifies this by providing a single, unified interface for all of them.

📙 What is LiteLLM?

LiteLLM is a powerful open-source toolkit that standardizes how developers interact with Large Language Models (LLMs). Think of it as a universal translator for LLMs: it lets your application communicate with any supported model through a single, consistent interface. 🌐

Why Choose LiteLLM? 🤔

  1. Universal Model Access 🌐

    • Seamlessly connect to 100+ LLMs from major providers

    • Use the same code for OpenAI, Anthropic, Ollama, and other providers

    • Switch between models without rewriting your application logic (see the sketch after this list)

  2. Simplified Development 👨‍💻

    • A "write once, deploy anywhere" approach

    • Consistent completion format across all models

    • Built-in error handling and retry mechanisms

    • Automatic model fallbacks for improved reliability

  3. Resource Management 📊

    • Monitor usage across different projects

    • Track costs for each model integration

    • Built-in load balancing capabilities

    • Optimize resource allocation automatically

  4. Enterprise-Ready Features ⚡

    • Support for multiple API keys

    • Customizable retry logic

    • Detailed logging and monitoring

    • Production-grade reliability
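
The payoff shows up directly in code. Here is a minimal sketch, assuming the API keys and running Ollama server from the setup guide below, in which swapping providers is just a matter of changing the model string; num_retries is LiteLLM's built-in retry setting:

from litellm import completion

messages = [{"role": "user", "content": "Say hello in one short sentence."}]

# The call shape never changes; only the model identifier does.
for model in ["gpt-4", "claude-3-opus-20240229", "ollama/llama3.2:3b-instruct-fp16"]:
    response = completion(model=model, messages=messages, num_retries=2)
    print(f"{model}: {response['choices'][0]['message']['content']}")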

🧱 Setup Guide

1. Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

2. Install CUDA Drivers

sudo apt-get update && sudo apt-get install -y cuda-drivers

This installs the NVIDIA CUDA drivers used for GPU acceleration (on Debian/Ubuntu systems; skip this step on CPU-only machines).
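
Once the drivers are in place, you can confirm the GPU is visible with NVIDIA's standard status tool:

nvidia-smi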

3. Configure Environment

echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections

This tells debconf to skip interactive prompts during apt installs, so unattended setup steps do not hang waiting for input.

4. Start Ollama Server

nohup ollama serve &

Note: This command:

  • Runs Ollama server in the background

  • Continues running after terminal closure (via nohup)

  • Remains active until explicitly terminated
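
To confirm the server is up, you can query Ollama's local REST API (it listens on port 11434 by default); /api/tags returns the models you have pulled:

curl http://localhost:11434/api/tags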

5. Install LiteLLM 🛠️

pip install litellm
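
To verify the installation, ask pip for the package metadata:

pip show litellm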

6. List Available Models

ollama list

Initially the output shows only the table header:

NAME    ID    SIZE    MODIFIED

Browse available models at ollama.com/library.

7. Pull a Model 🤖

For this tutorial, we'll use Llama 3.2:

ollama pull llama3.2:3b-instruct-fp16

Model Specifications:

  • 3B parameter model

  • Instruction-tuned variant

  • FP16 (half) precision: higher fidelity than Ollama's default 4-bit quantized tags, at the cost of more memory

  • Still small enough (roughly 6-7 GB) for many consumer GPUs
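
Before wiring the model into LiteLLM, you can sanity-check it directly from the CLI (the prompt here is just an example):

ollama run llama3.2:3b-instruct-fp16 "Reply with one short sentence."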

8. Configure API Keys 🔑

import os
from google.colab import userdata  # this tutorial runs in Google Colab

# Read keys from Colab's Secrets panel and expose them as environment
# variables, which is where LiteLLM looks for provider credentials.
os.environ["ANTHROPIC_API_KEY"] = userdata.get("ANTHROPIC_API_KEY")
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

9. Using Local Models (Ollama) 💻

from litellm import completion

response = completion(
    # the "ollama/" prefix routes the request to the local Ollama provider
    model="ollama/llama3.2:3b-instruct-fp16",
    messages=[{
        "content": "Write an API in ExpressJS to get the latest weather stats for a city",
        "role": "user"
    }],
    api_base="http://localhost:11434"  # the Ollama server started in step 4
)
print(response['choices'][0]['message']['content'])
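
Streaming works through the same interface; a minimal sketch (same local model as above) that prints tokens as they arrive:

response = completion(
    model="ollama/llama3.2:3b-instruct-fp16",
    messages=[{"role": "user", "content": "Explain REST in one paragraph"}],
    api_base="http://localhost:11434",
    stream=True  # yields OpenAI-style chunks instead of one final object
)
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)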

10. Using Cloud Models ☁️

# OpenAI GPT-4
response = completion(
    model="gpt-4",
    messages=[{
        "content": "Write an API in ExpressJS to get the latest weather stats for a city",
        "role": "user"
    }]
)
print(response['choices'][0]['message']['content'])

# Anthropic Claude -- same call shape, different model string
response = completion(
    model="claude-3-opus-20240229",
    messages=[{
        "content": "Write an API in ExpressJS to get the latest weather stats for a city",
        "role": "user"
    }]
)
print(response['choices'][0]['message']['content'])
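
To back the cost-tracking point from the Resource Management list above, LiteLLM ships a helper that estimates spend from its bundled per-model price map; a minimal sketch applied to the response object from the previous step:

import litellm

# Estimate the dollar cost of the response returned above,
# based on LiteLLM's built-in pricing table
cost = litellm.completion_cost(completion_response=response)
print(f"Estimated cost: ${cost:.6f}")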

📚 References

  1. Complete Demo Code

  2. LiteLLM Documentation: https://docs.litellm.ai/

  3. LiteLLM GitHub Repository: https://github.com/BerriAI/litellm
