issue/194: Support Quantization Config and Quanted Model Inference #195
base: main
Conversation
qinyiqun commented Jan 21, 2026
- Add a quantization option to the linear classes.
- Introduce the nlohmann JSON library.
- Add two top-level categories, quantization config and global config, to support multiple advanced feature configs.
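The config layering described above can be sketched as follows. This is a simplified illustration, not the PR's actual code: the type and field names (`QuantConfig`, `GlobalConfig`, `method`, `wants_quantization`) are assumptions, and the real PR parses these from JSON via nlohmann.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the PR's quantization config.
struct QuantConfig {
    std::string method;            // e.g. "w8a8"
    bool quantize_linear = false;  // whether linear layers are quantized
};

// Global config groups advanced-feature configs; quantization is optional,
// so an unquantized model simply leaves the field empty.
struct GlobalConfig {
    std::optional<QuantConfig> quantization;
};

// The model factory can then branch on the presence of a quantization config.
inline bool wants_quantization(const GlobalConfig &g) {
    return g.quantization.has_value();
}
```

This keeps the original model config (the llama config) untouched while the global config only carries feature toggles.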
csrc/engine/rank_worker.cpp
Outdated
      // Create model using factory (may be expensive)
-     model_ = InfinilmModelFactory::createModel(model_config_, rank_info_, pending_cache_config_ != nullptr ? pending_cache_config_.get() : nullptr);
+     model_ = InfinilmModelFactory::createModel(model_config_, rank_info_, pending_cache_config_ != nullptr ? pending_cache_config_.get() : nullptr, global_config_);
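The diff above appends a new argument to the factory call. A common low-risk way to extend such a signature, sketched here with hypothetical stand-in types rather than the PR's real ones, is to default the trailing parameter so existing call sites keep compiling:

```cpp
#include <memory>

// Hypothetical stand-ins for the real types in this PR.
struct GlobalConfig {};
struct Model {
    bool has_global = false;  // records whether a global config was supplied
};

struct ModelFactory {
    // The new trailing parameter defaults to nullptr, so pre-existing
    // two-argument call sites continue to compile unchanged.
    static std::unique_ptr<Model> createModel(int model_config, int rank,
                                              const GlobalConfig *global = nullptr) {
        auto m = std::make_unique<Model>();
        m->has_global = (global != nullptr);
        return m;
    }
};
```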
Why is there both a model config and a global config?
model config is the original llama_config; global config is now only responsible for advanced features.
Replace the original llama config with generic JSON.
fixed
}

infinicore::nn::Parameter QKVParallelLinear::get_q_weight_scale() const {
    return infinicore::nn::Parameter(
Suggest returning an optional.
I don't think it's necessary. It is similar to bias: bias is controlled by a has_bias flag, and get_xx_scale() is only used inside the macros. In the code, the macros are bound to the quantization method, so even if this returned an optional, the macro would still have to unwrap it.
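For reference, the two styles under discussion look roughly like this. The types are simplified stand-ins (a dummy `Parameter` instead of `infinicore::nn::Parameter`), so this is a sketch of the trade-off, not the PR's code:

```cpp
#include <optional>

struct Parameter {
    float scale = 1.0f;  // stand-in for infinicore::nn::Parameter
};

struct Linear {
    bool has_scale = false;  // flag-controlled, mirroring has_bias
    Parameter scale{};

    // Style 1 (current PR): flag plus unconditional getter; callers are
    // expected to check has_scale first, as with has_bias/bias.
    Parameter get_weight_scale() const { return scale; }

    // Style 2 (reviewer's suggestion): absence is encoded in the return
    // type, so the compiler forces callers to handle the missing case.
    std::optional<Parameter> get_weight_scale_opt() const {
        return has_scale ? std::optional<Parameter>(scale) : std::nullopt;
    }
};
```

The author's point is that since the getter is only invoked from quantization-specific macros that already guarantee the scale exists, the optional would be unwrapped immediately anyway.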
Force-pushed from 85c2485 to 9dee06a
csrc/config/global_config.hpp
Outdated
#include <fstream>
#include <string>

namespace infinilm::config::global_config {
There should be no need for an extra global_config namespace.
fixed
csrc/config/global_config.hpp
Outdated
#include <string>

namespace infinilm::config::global_config {
struct GlobalConfig {
Just use a class. Also, consider renaming it to something more intuitive, such as ModelConfig.
My original idea was that the distributed config and the kv cache config could both be folded into it, hence the name global_config.
fixed
csrc/config/quant_config.hpp
Outdated
#include "../quantization/quantization.hpp"
#include "nlohmann/json.hpp"

namespace infinilm::config::quantization {
Likewise, the quantization namespace isn't needed. These configs shouldn't run into name collisions.
fixed
#include "nlohmann/json.hpp"

namespace infinilm::quantization {
class BaseQuantization {
What is the point of this layer of wrapping? It looks like it only passes a quant scheme, but isn't that something QuantConfig can already do?
Right now only the config is passed because there is very little logic; the class is reserved for future development. One upcoming requirement is model-level quantization, which needs a wrapper layered on top of the quantization methods.
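The kind of extension point the author describes could look roughly like this. The names and the per-layer hook are illustrative assumptions; the actual class in the PR currently only carries the config:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the PR's QuantConfig.
struct QuantConfig {
    std::string scheme = "w8a8";
};

// Thin base class: today it only forwards the configured scheme, but it
// gives future model-level quantization (per-layer overrides, mixed
// schemes) a virtual hook without changing call sites.
class BaseQuantization {
public:
    explicit BaseQuantization(QuantConfig cfg) : cfg_(std::move(cfg)) {}
    virtual ~BaseQuantization() = default;

    // Default behavior: one scheme for the whole model. A subclass could
    // override this to return different schemes for different layers.
    virtual std::string scheme_for_layer(const std::string & /*layer*/) const {
        return cfg_.scheme;
    }

protected:
    QuantConfig cfg_;
};
```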
// ========================= QKV Quantization ==================================
#define INFINILM_QKV_LINEAR_W8A8_INIT(name, q_name, k_name, v_name, ...) \
    name##_ = std::make_shared<layers::QKVParallelLinear>(__VA_ARGS__);  \
    /* Register Q weight */                                              \
Don't write comments in Chinese.
std::shared_ptr<InfinilmModel> model;
if (const auto llama_config_ptr = dynamic_cast<const models::llama::LlamaConfig *>(&config)) {
    const auto &llama_config = *llama_config_ptr;
    //****************************NEED TO BE FIXED */
This comment format is very odd.
//------------------------------------------------------
InferEngine::InferEngine(
    const InfinilmModel::Config &config,
    const distributed::DistConfig &distributed_config,
Why does this change touch this interface at all, and even reorder its parameters? Modifying a fundamental interface like this is a high-risk change; are you sure nothing will break elsewhere?