Cyber Defense Go
Saturday, June 14, 2025

Serving Qwen Models on Databricks

by Md Sazzad Hossain

Qwen models, developed by Alibaba, have shown strong performance in both code completion and instruction tasks. In this blog, we'll show how to register and deploy Qwen models on Databricks using an approach similar to that for Llama-based architectures. By following these steps, you can take advantage of Databricks' foundation model (Provisioned Throughput) endpoints, which benefit from low latency and high throughput.

Table of Contents

  1. Motivation: Why Serve Qwen Models on Databricks?
  2. The Core Idea
  3. Implementation: Annotated Code Walkthrough
  4. Performance and Limitations
  5. Summary and Next Steps

Motivation: Why Serve Qwen Models on Databricks?

For many enterprise workloads, Databricks is a one-stop platform to train, register, and serve large language models (LLMs). With Databricks Mosaic AI Model Serving, one can easily deploy fine-tuned or base models and use them for real-time or batch inference tasks.

The recently released Qwen 2.5 series of models shows strong performance on code completion and instruction tasks. At the time of their release, Qwen 2.5 models beat similarly sized models on standard benchmarks such as MMLU, ARC-C, MATH, and HumanEval, and on multilingual benchmarks such as Multi-Exam and Multi-Understanding. Qwen 2.5 Coder models show similar gains on coding benchmarks. This gives customers strong motivation for deploying these models in Databricks Model Serving to power their use cases.

Serving a Qwen model on Databricks involves four steps:

  1. Run a notebook to convert the Qwen model files to be compatible with the Llama architecture and Databricks model serving
  2. Register the Qwen model in Unity Catalog
  3. Deploy the registered model in Databricks Foundation Model Serving
  4. Conduct quality testing on the deployment, such as manual testing or running standard benchmarks directly against the endpoint

The Core Idea

Databricks foundation model serving provides optimized performance for models such as Meta's Llama models. Customers can deploy these models with provisioned throughput and achieve low latency and high throughput. While the Qwen models' underlying structure is very similar to that of the Llama models, certain modifications are required to take advantage of Databricks' model serving infrastructure. The steps below explain how to make the required modifications.

Implementation: Annotated Code Walkthrough

Part 1) Rewrite Qwen's weights and config to be consistent with Llama models.

The steps in modify_qwen.py take a Qwen2.5 model and rewrite it to be consistent with the Llama architecture that is optimized for provisioned throughput on Databricks. Here are the key steps in the code:

  1. Load Qwen State Dict: Obtain .safetensors from the original Qwen directory.
  2. Copy & Adjust Weights: Insert zero biases for attention outputs where Llama expects them.
  3. Rewrite the Config: Update fields like "architectures" and "model_type" to "llama", and remove Qwen-specific flags.
  4. Copy Tokenizer Files: Ensure we bring over tokenizer.json, merges.txt, and so on.
  5. Create Final Output Folder: The files in the new directory make it look like a standard Llama model.

At the end of this step, you will have a Llama-compatible Qwen model. You could load the model in vLLM, and it should treat it as a Llama model and be able to generate code or follow instructions, depending on which model you used.

Tip: You can use huggingface_hub.snapshot_download to fetch one of the Qwen models, such as Qwen/Qwen2.5-Coder-7B-Instruct, from Hugging Face to a directory before performing the conversion.
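For example (the local path is arbitrary, and the download is large, so point it at a disk with enough space):

```python
from huggingface_hub import snapshot_download


def fetch_qwen(repo_id: str = "Qwen/Qwen2.5-Coder-7B-Instruct",
               local_dir: str = "/local_disk0/qwen2.5-coder-7b-instruct") -> str:
    """Download the original Qwen checkpoint to a local directory
    so the conversion notebook can read it; returns the local path."""
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```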

Part 2) Register and Serve Qwen on Databricks

Next, we'll cover how to log and serve the "Qwen as Llama" model on Databricks. This is handled by register_qwen.py. The steps here ensure that the model has the configuration that model serving expects for a Llama model. The key steps:

  1. Specifying the path to the converted model from earlier.
  2. Modifying tokenizer configs (specifically, removing chat_template and setting tokenizer_class).
  3. Adjusting config.json to reflect Llama-compatible sequence lengths.
  4. Updating the model with Llama-like metadata before logging.
  5. Registering the model with MLflow so it can be served on a GPU endpoint.
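A sketch of what register_qwen.py does, under stated assumptions: the directory layout, the split into two helpers, and the registered model name are illustrative, and the log_model call is one plausible way to register a transformers checkpoint for a completions-style endpoint, not a copy of the actual notebook.

```python
import json
from pathlib import Path


def patch_configs(converted_dir: str, max_len: int = 16000) -> None:
    """Align the converted checkpoint's configs with what model serving
    expects for a Llama model."""
    d = Path(converted_dir)

    # Remove Qwen's chat template and pin a Llama-style tokenizer class.
    tok_path = d / "tokenizer_config.json"
    tok_cfg = json.loads(tok_path.read_text())
    tok_cfg.pop("chat_template", None)
    tok_cfg["tokenizer_class"] = "PreTrainedTokenizerFast"
    tok_path.write_text(json.dumps(tok_cfg, indent=2))

    # Cap the context length to fit serving constraints.
    cfg_path = d / "config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["max_position_embeddings"] = max_len
    cfg_path.write_text(json.dumps(cfg, indent=2))


def register_model(converted_dir: str, registered_name: str) -> None:
    """Log the patched checkpoint with MLflow and register it in Unity Catalog."""
    import mlflow
    import transformers

    model = transformers.AutoModelForCausalLM.from_pretrained(converted_dir)
    tokenizer = transformers.AutoTokenizer.from_pretrained(converted_dir)

    mlflow.set_registry_uri("databricks-uc")
    with mlflow.start_run():
        mlflow.transformers.log_model(
            transformers_model={"model": model, "tokenizer": tokenizer},
            artifact_path="model",
            task="llm/v1/completions",
            registered_model_name=registered_name,  # e.g. "main.default.qwen25_coder"
        )
```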

Once this notebook has run, the model will be registered in Unity Catalog. Navigate to the model and click "Serve this model" to set up the endpoint. You should see the option to set up the endpoint with provisioned throughput at different tokens/second rates.

Testing the Endpoint

Once the endpoint is ready, you can run some basic tests to verify that it's working properly. Suppose we have deployed the Qwen2.5-Coder-7B model after performing the above conversion and registration. This model is capable of either completing a piece of code or performing fill-in-the-middle. Let's use it to complete a simple sorting function. Under the "Use" dropdown, click "Query" and enter a completion request.
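Outside the UI, the same request can be sent programmatically. A minimal sketch, assuming a completions-style payload; the host, endpoint name, token, and prompt are placeholders:

```python
import requests


def build_completion_request(prompt: str, max_tokens: int = 200) -> dict:
    """Payload for a completions-style query against the endpoint."""
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.0}


def complete(host: str, endpoint: str, token: str, prompt: str) -> str:
    """POST the request to a Databricks serving endpoint and return the text."""
    resp = requests.post(
        f"{host}/serving-endpoints/{endpoint}/invocations",
        headers={"Authorization": f"Bearer {token}"},
        json=build_completion_request(prompt),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]


# e.g. complete("https://<workspace-url>", "qwen25-coder", "<pat>",
#               "def bubble_sort(arr):")
```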

The text in the response should contain the rest of the implementation.

For a more quantitative approach, you can generate completions for the HumanEval tasks, then run its evaluation to get the pass@1 metric and compare against the published results.

Performance and Limitations

  1. Manual Chat Formatting
    Since we remove Qwen's built-in chat template, you must manually format system/user/assistant messages in your client code. This ensures the model can still interpret conversation turns properly.
  2. Max Position Embeddings
    We set max_position_embeddings to 16000 tokens to fit within certain Databricks constraints. If Qwen originally supported a longer context, you might lose some maximum context length. However, you still gain provisioned throughput support.
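Qwen's removed chat template follows the ChatML convention, so the client-side formatting can be as simple as this sketch (the example messages are illustrative):

```python
def format_chatml(messages: list[dict[str, str]]) -> str:
    """Manually apply a ChatML-style chat format, standing in for the
    template removed during registration."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a function that reverses a string."},
])
```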

Summary and Next Steps

While Databricks doesn't support Qwen models directly on provisioned throughput model serving today, the above method allows you to register and serve these models successfully by aligning them with the Llama models' architecture. This workaround is particularly helpful if your organization needs Qwen's capabilities but also wants the convenience of Databricks model serving endpoints and provisioned throughput.

Key Takeaway

  • The Qwen and Llama models share enough architectural similarities that, with a few minor modifications (namely, to the tokenizer config and model metadata), Databricks' model serving infrastructure can readily serve the Qwen models using provisioned throughput.

Future Considerations

  • Keep an eye out for official Qwen support on Databricks model serving.
  • Evaluate the performance overhead from forcibly limiting the context size.
  • If you rely on chat prompting, remember to manually format your prompts on the client side.

Acknowledgments

  • hiyouga's llamafy_qwen.py for the initial example that provided the basis for the Qwen conversion.
  • The Databricks engineering team for clarifying the internal serving constraints.
  • All the team members who tested and refined the approach.