The Inktelligence - February 3, 2025

How to run the DeepSeek R1 model locally

What a week it has been in the Clash of the LLMs.

DeepSeek R1 has been making waves thanks to its cost-effective development, impressive performance in specialized tasks, and open-source accessibility. Firstly, it is reported to have been developed on a budget of only $5.58 million, a fraction of what is spent on the leading American LLMs.

Secondly, the model was developed using less advanced Nvidia H800 GPUs (around 2,000 of them) instead of the industry-standard H100s. Despite the lower cost, DeepSeek R1 holds its ground in benchmarks, especially in math, coding, and logical reasoning tasks.

Thirdly, the API costs are far lower than those of OpenAI’s o1 model.

When comparing the API costs of DeepSeek R1 and OpenAI's o1, there's a notable difference in pricing structures:

API Pricing for DeepSeek R1:

  • Input Tokens:

    • Cache Hit: $0.14 per million tokens

    • Cache Miss: $0.55 per million tokens

  • Output Tokens: $2.19 per million tokens

A note on caching:

Cache Hit: The information (or response to a query) was already stored and readily available, making it cheaper and faster to access.

Cache Miss: The information wasn’t available in the cache, so the system had to perform a more expensive operation (like re-computing or retrieving data from the main model).
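In practice, cache hits matter most when your requests share a long repeated prefix, for example the same system prompt or the same reference document sent with every query; that repeated portion can typically be served from the cache and billed at the lower cache-hit rate on subsequent calls.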

API Pricing for OpenAI o1:

  • Input Tokens: $15 per million tokens

  • Output Tokens: $60 per million tokens

In terms of cost, that’s more than an order of magnitude of difference: $2.19 per million output tokens for DeepSeek R1 versus $60 per million output tokens for OpenAI’s o1.
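To put that in concrete terms: a job that sends one million input tokens (all cache misses) and generates one million output tokens would cost roughly $0.55 + $2.19 = $2.74 on DeepSeek R1, versus $15 + $60 = $75 on o1, more than 25 times as much.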

Lastly, DeepSeek R1 is released under the MIT open-source license, which makes it accessible to developers and organizations that want to innovate without the restrictions of proprietary models. So in this issue of The Inktelligence, I’ll run through a simple tutorial that will get you set up to run DeepSeek R1 or Qwen-2.5 locally on your computer.

Concerns Surrounding DeepSeek R1

Not everything is rosy, though, and there are some legitimate concerns, especially around privacy. As of this writing, there is no way to opt out of letting the company use your chats as further training data.

Additionally, try asking about topics that are politically sensitive in China and you’re unlikely to get a substantive answer.

Qwen-2.5-Max from Alibaba

Alibaba's Qwen-2.5-Max is the company's latest and most advanced AI model, designed to compete with top-tier models like OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and DeepSeek's V3.

Released on January 29, 2025, Qwen-2.5-Max has been trained on a dataset of over 20 trillion tokens spanning a wide range of topics and languages. That breadth of training lets the model generate nuanced, contextually rich language output. I have tested it for language translation, and it captures the nuances between languages quite well.

The model employs a Mixture-of-Experts (MoE) architecture, which activates only the most relevant parts of the network for a given task. This selective activation allows Qwen-2.5-Max to handle large-scale processing efficiently, reducing computational costs by approximately 30% compared to traditional dense models.
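Put simply, a small routing layer inside the model decides which of its specialized expert sub-networks should handle each piece of input, so only a fraction of the total parameters is doing work on any given token; the rest sit idle, which is where the compute savings come from.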

For those interested in exploring Qwen-2.5-Max, Alibaba offers access through the Qwen Chat platform, a web-based interface that allows direct interaction with the model. For developers, Qwen-2.5-Max can be integrated into their applications using the Alibaba Cloud Model Studio API, which is compatible with OpenAI's API format, facilitating straightforward integration for those familiar with OpenAI models.
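For developers who want to try the API from the command line, a minimal sketch might look like the one below. Note that the endpoint URL, the model name, and the DASHSCOPE_API_KEY environment variable are assumptions on my part; check the Alibaba Cloud Model Studio documentation for the exact values for your account and region.

    # Hypothetical example: the endpoint and model name may differ for your region.
    curl https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
      -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "qwen-max",
        "messages": [{"role": "user", "content": "Summarize the advantages of a Mixture-of-Experts model in two sentences."}]
      }'

Because the request format follows OpenAI’s conventions, any client or tool that already speaks the OpenAI API can usually be pointed at this endpoint with only the base URL and API key changed.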

The API costs are a few times higher than DeepSeek R1’s, with Qwen-2.5-Max coming in at $1.60 per million input tokens and $6.40 per million output tokens, though that is still well below o1’s pricing.

As of now, Alibaba has not said whether Qwen-2.5-Max will be released as open source (though older Qwen models have been open-sourced).

Tutorial: How to run open source LLMs locally

I know privacy is a major concern when using AI, and it should be. The last thing you want is to send sensitive data to a cloud-based model and worry about where it’s being stored or who might have access to it. That’s why I’m sharing this step-by-step guide with you.

You’ll learn how to run powerful models like DeepSeek R1 directly on your own computer, keeping everything local and secure. Once the model is downloaded, no internet connection is required and no third-party servers are involved: just you and the AI working together.

I’ve tried my best to make sure the instructions are simple, even if you don’t consider yourself a tech expert. By the end of this, you’ll have a fully functional local AI model ready to explore, all while keeping your data exactly where it belongs: with you.

Alibaba’s latest model, Qwen-2.5-Max, is making waves for its advanced reasoning and multilingual capabilities. While the Max version isn’t available as open source yet, you can still access the regular Qwen-2.5 model through Ollama. This means you get the best of both worlds—cutting-edge performance with the privacy benefits of running the model locally. In this guide, I’ll show you how to get started with DeepSeek R1, but keep in mind that adding models like Qwen-2.5 is just as simple.

Step 1: Install Ollama

  1. Visit Ollama's Official Website:
    Go to ollama.com/download and download the installer for your operating system (Windows/Mac/Linux).

  2. Install Ollama:
    Follow the installation instructions. Once completed, open your Terminal (Mac/Linux) or Command Prompt (Windows) and type:

    ollama
    

    You should see the list of commands, confirming the installation.
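Alternatively, a quick one-line sanity check is to print the installed version:

    ollama --version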

Step 2: Pull the DeepSeek-R1 Model

  1. Open the terminal (or command prompt) and type:

    ollama pull deepseek-r1:7b
    
  2. This will download the deepseek-r1:7b model for local use. It may take some time depending on your internet speed.

Note that there are different model sizes you can download, depending on the storage and RAM you have. On the deepseek-r1 page in the Ollama library, for example, you’ll see options for 1.5b, 7b, 8b, 14b, 32b, 70b and 671b. The numbers refer to the number of parameters in the model, in billions, and the smaller variants need less storage and memory to run. The 671b version is the full model, the same one you’d be using in the cloud, but at 404GB it’s unlikely you’ll have the storage or the RAM to run it locally.
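Pulling additional models works the same way. For example, to grab the 7-billion-parameter Qwen-2.5 model mentioned earlier and then check what you have installed locally, run:

    ollama pull qwen2.5:7b
    ollama list

(If that tag has changed, the model library at ollama.com/library lists the current names.) You can also chat with any pulled model straight from the terminal, without a web interface, by running ollama run deepseek-r1:7b and typing /bye to exit.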

Step 3: Install Open-WebUI Using pip

  1. Ensure you have a recent version of Python installed (Open-WebUI recommends Python 3.11). If not, download Python and install it.

  2. Install Open-WebUI:
    Open your terminal or command prompt and run:

    pip install open-webui
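This step is optional, but if you’d like to keep Open-WebUI’s dependencies separate from the rest of your Python packages, you can install it inside a virtual environment instead (the environment name below is arbitrary):

    python -m venv open-webui-env
    source open-webui-env/bin/activate    # On Windows: open-webui-env\Scripts\activate
    pip install open-webui

Everything else in this tutorial works the same either way; just remember to activate the environment before running the serve command in the next step.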
    

Step 4: Launch Open-WebUI

  1. Start the Web UI: Run the following command:

    open-webui serve
    
  2. The server will start, and by default, you can access it by opening your browser and navigating to:

    http://localhost:8080
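By default, Open-WebUI looks for a local Ollama server, which listens on http://localhost:11434, and lists whatever models it finds there. If the interface loads but shows no models, a quick way to confirm that Ollama is running is:

    curl http://localhost:11434/api/tags

which should return a JSON list of the models you have pulled.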
    

Step 5: Configure Open-WebUI to Use DeepSeek-R1

  1. In the Web UI, navigate to the model selection section.

  2. Select or configure it to point to the deepseek-r1 model you downloaded via Ollama.

When Open-WebUI loads in your browser, you’ll see a model drop-down menu at the top of the chat screen. Clicking it lists all the models you have already pulled using Ollama.

This setup will let you run local LLM models with ease through the Open-WebUI. Let me know if you encounter any issues! 😊
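One more tip: once a model has been pulled, you don’t strictly need the Web UI at all. Ollama exposes a local HTTP API on port 11434, so you can query the model straight from the terminal or from your own scripts. For example:

    curl http://localhost:11434/api/generate -d '{
      "model": "deepseek-r1:7b",
      "prompt": "Explain in one sentence what a cache hit is.",
      "stream": false
    }'

The response comes back as JSON, which makes it easy to wire the local model into other tools.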

That’s it for this time. If you enjoy this newsletter, please share it with a friend.

Till next time,

Hock