
Conversation

@qinyiqun (Contributor):

  1. Add a quantization option to the linear classes.
  2. Introduce the nlohmann json library.
  3. Add two top-level config categories, quantization and global config, to support multiple advanced feature configs (sketched below).
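
A minimal sketch of what this combination might look like. The field and type names here (`QuantConfig`, `GlobalConfig::from_file`, the `quantization` JSON key) are illustrative assumptions, not the PR's actual API:

```cpp
#include <fstream>
#include <string>

#include "nlohmann/json.hpp"

// Illustrative shapes only; the real classes in the PR may differ.
struct QuantConfig {
    std::string method;            // e.g. "w8a8"
    bool quantize_lm_head = false;
};

struct GlobalConfig {
    QuantConfig quantization;

    // Parse the advanced-feature section out of a JSON file on disk.
    static GlobalConfig from_file(const std::string &path) {
        std::ifstream in(path);
        nlohmann::json j = nlohmann::json::parse(in);
        GlobalConfig cfg;
        if (j.contains("quantization")) {
            const auto &q = j.at("quantization");
            cfg.quantization.method = q.value("method", std::string{});
            cfg.quantization.quantize_lm_head = q.value("quantize_lm_head", false);
        }
        return cfg;
    }
};
```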

@qinyiqun qinyiqun requested a review from a team January 21, 2026 08:50
@qinyiqun qinyiqun linked an issue Jan 21, 2026 that may be closed by this pull request

```diff
  // Create model using factory (may be expensive)
- model_ = InfinilmModelFactory::createModel(model_config_, rank_info_, pending_cache_config_ != nullptr ? pending_cache_config_.get() : nullptr);
+ model_ = InfinilmModelFactory::createModel(model_config_, rank_info_, pending_cache_config_ != nullptr ? pending_cache_config_.get() : nullptr, global_config_);
```
Collaborator:

Why is there both a model config and a global config?

Contributor Author:

model config is the original llama_config; global config is now only responsible for advanced features.
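
Roughly, that division of labor in the factory call above would look like this. The parameter types are guessed from the variable names in the diff (`rank_info_`, `pending_cache_config_`), not taken verbatim from the PR:

```cpp
// Hypothetical signature, approximated from the call site above.
std::shared_ptr<InfinilmModel> createModel(
    const InfinilmModel::Config &model_config, // original per-model config (the old llama_config)
    const RankInfo &rank_info,                 // type guessed from `rank_info_`
    const CacheConfig *cache_config,           // may be null; guessed from `pending_cache_config_`
    const GlobalConfig &global_config);        // advanced features only (e.g. quantization)
```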

Collaborator:

Replace the original llama config with a generic JSON object.

Contributor Author:

fixed

```cpp
}

infinicore::nn::Parameter QKVParallelLinear::get_q_weight_scale() const {
    return infinicore::nn::Parameter(
```
Collaborator:

Suggest returning an optional.

Contributor Author:

I don't think that's necessary. It's analogous to bias: bias is gated by a has_bias flag. get_xx_scale() is only used inside the macros, and in my code each macro is bound to a specific quantization method, so even if it returned an optional, the macro would still have to unwrap it.
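
For readers following this thread, a compilable sketch of the two shapes under discussion. `Parameter` here is a stub standing in for infinicore's class, and the method names are illustrative:

```cpp
#include <optional>
#include <string>

struct Parameter { std::string name; }; // stub for infinicore::nn::Parameter

class QKVLinearSketch {
public:
    // Option A (the author's current approach): a flag-guarded accessor,
    // mirroring the has_bias pattern; callers check the flag first.
    bool has_weight_scale() const { return has_scale_; }
    Parameter get_q_weight_scale() const { return q_scale_; }

    // Option B (the reviewer's suggestion): absence encoded in the return type.
    std::optional<Parameter> get_q_weight_scale_opt() const {
        return has_scale_ ? std::optional<Parameter>{q_scale_} : std::nullopt;
    }

private:
    bool has_scale_ = false;
    Parameter q_scale_{};
};

// As the author notes, a macro already bound to one quantization method would
// still have to unwrap option B, e.g. layer.get_q_weight_scale_opt().value().
```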

@qinyiqun force-pushed the dev branch 3 times, most recently from 85c2485 to 9dee06a on January 26, 2026 at 06:53
```cpp
#include <fstream>
#include <string>

namespace infinilm::config::global_config {
```
Collaborator:

The extra global_config namespace shouldn't be needed.

Contributor Author:

fixed

```cpp
#include <string>

namespace infinilm::config::global_config {
struct GlobalConfig {
```
Collaborator:

Just use a class here. Also, consider renaming it to something more intuitive, such as ModelConfig.

Contributor Author:

My original idea was that it could also absorb the distributed config and the kv cache config, which is why it's named global_config.
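
The umbrella layout being described might look like this; the member names are hypothetical and the config types are stubbed for illustration:

```cpp
struct QuantConfig {};  // stub
struct DistConfig {};   // stub
struct CacheConfig {};  // stub

// The aggregation the author had in mind: one top-level config that could
// eventually absorb the other per-feature configs.
struct GlobalConfig {
    QuantConfig quantization; // advanced features: the PR's current scope
    DistConfig distributed;   // envisioned future addition
    CacheConfig kv_cache;     // envisioned future addition
};
```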

Contributor Author:

fixed

#include "../quantization/quantization.hpp"
#include "nlohmann/json.hpp"

namespace infinilm::config::quantization {
Collaborator:

Likewise, the quantization namespace isn't needed; these configs are unlikely to collide on names.

Contributor Author:

fixed

#include "nlohmann/json.hpp"

namespace infinilm::quantization {
class BaseQuantization {
Collaborator:

What is this layer of wrapping for? It looks like it only passes along a quant scheme, and isn't that something QuantConfig can already do?

Contributor Author (@qinyiqun, Jan 27, 2026):

It only passes the config right now because there's very little logic yet; the class is reserved for future development. One upcoming requirement is model-level quantization, which needs a wrapper on top of the per-method quantization classes.
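
A sketch of the extension point being described, assuming hypothetical method names (`scheme`, `apply_model_level`) rather than the PR's actual interface:

```cpp
#include <string>
#include <utility>

#include "nlohmann/json.hpp"

class BaseQuantization {
public:
    explicit BaseQuantization(nlohmann::json config)
        : config_(std::move(config)) {}
    virtual ~BaseQuantization() = default;

    // Roughly what exists today: forward the quant scheme from the config.
    std::string scheme() const { return config_.value("method", "none"); }

    // Reserved hook: a model-level quantization pass (e.g. per-block mixed
    // precision or a layer skip-list) would override this in a subclass.
    virtual void apply_model_level(/* Model &model */) {}

protected:
    nlohmann::json config_;
};
```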

```cpp
// ========================= QKV Quantization ==================================
#define INFINILM_QKV_LINEAR_W8A8_INIT(name, q_name, k_name, v_name, ...)  \
    name##_ = std::make_shared<layers::QKVParallelLinear>(__VA_ARGS__);   \
    /* 注册 Q 权重 */                                                      \
```
Contributor:

Don't write comments in Chinese.

```cpp
std::shared_ptr<InfinilmModel> model;
if (const auto llama_config_ptr = dynamic_cast<const models::llama::LlamaConfig *>(&config)) {
    const auto &llama_config = *llama_config_ptr;
    //****************************NEED TO BE FIXED */
```
Contributor:

This comment format is very odd.

```cpp
//------------------------------------------------------
InferEngine::InferEngine(
    const InfinilmModel::Config &config,
    const distributed::DistConfig &distributed_config,
```
Contributor:

Why does this change touch this interface at all, and reorder its parameters on top of that? Changing a foundational interface like this is high-risk. Are you sure nothing will break somewhere?
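
One conventional way to de-risk such a change, sketched here rather than taken from the PR: keep a delegating overload with the old parameter order so existing call sites keep compiling. This assumes the new constructor appends a GlobalConfig parameter, which is a guess:

```cpp
// Hypothetical: the old-order constructor delegates to the new one with a
// default-constructed GlobalConfig, preserving source compatibility.
InferEngine::InferEngine(const InfinilmModel::Config &config,
                         const distributed::DistConfig &distributed_config)
    : InferEngine(config, distributed_config, GlobalConfig{}) {}
```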


Development

Successfully merging this pull request may close: [DEV] 量化功能添加 (Add quantization feature)

3 participants