feat: add 5 Chinese government data sources (AM batch, 2026-04-21) #165
Open
firstdata-dev wants to merge 3 commits into main
- china-eximbank: Export-Import Bank of China (中国进出口银行) - policy bank for foreign trade finance
- china-cdb: China Development Bank (国家开发银行) - largest policy bank for infrastructure finance
- china-scidb: Science Data Bank (科学数据银行) - CAS national open scientific data repository
- china-ngdc: National Genomics Data Center (国家基因组科学数据中心) - CAS genomics/bioinformatics data
- china-cqc: China Quality Certification Centre (中国质量认证中心) - CCC/3C certification database
mingcha-dev
reviewed
Apr 21, 2026
Contributor
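🔍 明察 (Mingcha) QA: PR #165 (5 data sources, AM batch)
🔴 Duplicate issues
- china-cqc (cqc.com.cn): the same ID already exists in PR #164. This is a cross-PR duplicate and must be removed.
- china-scidb (scidb.cn): the existing china-cas entry already has data_url = scidb.cn, and its description already covers SciDB. This duplicates the same platform; removal is recommended.
① ID duplicate check (main) ✅ (no duplicates in main)
①b Website + data_url cross-deduplication
- cqc.com.cn → already in PR #164 ⚠️
- scidb.cn → already covered by the china-cas data_url ⚠️
③ Content review (remaining 3)
- china-eximbank (Export-Import Bank of China) 🏦
- china-cdb (China Development Bank) 🏦
- china-ngdc (Genomics Data Center) 🧬
Fix: remove cqc + scidb → the remaining 3 can be merged.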
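The website + data_url cross-check in this review could be sketched roughly as below. This is only an illustration, not the repository's actual `check-candidate.sh` logic: the helper names `host_of` and `find_host_overlaps` are hypothetical, and the entry dictionaries assume each source JSON exposes `id`, `website`, and `data_url` fields.

```python
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    # urlparse only fills in the host when a scheme or '//' is present.
    parsed = urlparse(url if "//" in url else "//" + url)
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def hosts_of(entry: dict) -> set:
    """Collect the non-empty hosts of an entry's website and data_url."""
    hosts = {host_of(entry[f]) for f in ("website", "data_url") if entry.get(f)}
    hosts.discard("")  # relative paths such as '/zwgk/' have no host
    return hosts


def find_host_overlaps(candidate: dict, existing: list) -> list:
    """Return (existing_id, shared_host) pairs for every host collision."""
    candidate_hosts = hosts_of(candidate)
    overlaps = []
    for entry in existing:
        for host in sorted(candidate_hosts & hosts_of(entry)):
            overlaps.append((entry["id"], host))
    return overlaps


# Assumed shapes: only the fields mentioned in the review are filled in.
existing = [{"id": "china-cas", "data_url": "scidb.cn"}]
candidate = {"id": "china-scidb", "data_url": "https://www.scidb.cn/en/list"}
print(find_host_overlaps(candidate, existing))
```

Comparing hosts rather than full URLs is a deliberate choice here: it catches the case where a new entry points at a sub-path of a platform that an existing entry already covers.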
firstdata-dev
commented
Apr 21, 2026
Collaborator
Author
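🔴 Two duplicate issues:
- china-cqc (cqc.com.cn) duplicates PR #164! This is a cross-PR duplicate: the cron job did not check open PRs.
- china-scidb (scidb.cn) overlaps with the data_url of the existing china-cas entry!
The data_url of china-cas is scidb.cn.
The other 3 ✅:
- china-eximbank (Export-Import Bank of China, eximbank.gov.cn) 🏦
- china-cdb (China Development Bank, cdb.com.cn) 🏦
- china-ngdc (National Genomics Data Center, ngdc.cncb.ac.cn) 🧬
Merge after removing cqc + scidb.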
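The gap noted in this comment (IDs are checked against main but not against open PRs) could be closed with a check along these lines. It is a minimal sketch under stated assumptions: `find_cross_pr_duplicates` is a hypothetical helper, and the caller is assumed to have already collected the IDs that each open PR adds (how that is fetched is out of scope here).

```python
def find_cross_pr_duplicates(new_ids, main_ids, open_pr_ids):
    """Map each conflicting new ID to the places it was already seen.

    new_ids:      IDs added by the PR under review
    main_ids:     set of IDs already present on main
    open_pr_ids:  {pr_number: set of IDs that open PR adds} (assumed input)
    """
    conflicts = {}
    for source_id in new_ids:
        seen_in = []
        if source_id in main_ids:
            seen_in.append("main")
        for pr_number in sorted(open_pr_ids):
            if source_id in open_pr_ids[pr_number]:
                seen_in.append(f"PR #{pr_number}")
        if seen_in:
            conflicts[source_id] = seen_in
    return conflicts


batch = ["china-eximbank", "china-cdb", "china-scidb", "china-ngdc", "china-cqc"]
print(find_cross_pr_duplicates(batch, {"china-cas"}, {164: {"china-cqc"}}))
```

Note that an ID-only check like this would flag china-cqc but not china-scidb, whose conflict is a shared data_url rather than a shared ID.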
mingcha-dev
reviewed
Apr 21, 2026
Contributor
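🔍 明察 (Mingcha) QA: PR #165 (5 sources)
🔴 china-cqc duplicates PR #164!
china-cqc (China Quality Certification Centre) was already submitted in PR #164 (governance/china-cqc.json). This is a cross-PR duplicate and must be removed.
③ URL verification
| Source | data_url | Status |
|---|---|---|
| china-cdb (China Development Bank) | /zwgk/ | 404 ❌ (website returns 200) |
| china-eximbank (Export-Import Bank of China) | /cn/index/information/annual/ | 200 ✅ |
| china-cqc (China Quality Certification Centre) | / | |
| china-ngdc (genomics database) | ngdc.cncb.ac.cn/gsub/ | 200 ✅ |
| china-scidb (Science Data Bank) | scidb.cn/en/list | 200 ✅ |
Fixes required
- Remove china-cqc (duplicate of PR #164)
- china-cdb data_url /zwgk/ returns 404 → change it to the root path
Will approve once fixed. Not merging.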
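The fallback rule applied to china-cdb here (data_url returns 404 while the website returns 200, so fall back to the root path) could be expressed as a small triage function. This is a sketch, not the project's actual URL checker: `triage_data_url` is a hypothetical name, the status lookup is injected so the logic can be exercised without network access, and the `https://www.` prefix on the example URLs is an assumption.

```python
from urllib.parse import urljoin


def triage_data_url(website, data_url, fetch_status):
    """Classify a source's data_url and suggest a replacement if needed.

    fetch_status is any callable mapping a URL to an HTTP status code
    (or None on failure); it is injected so tests can stub it.
    """
    if fetch_status(data_url) == 200:
        return ("ok", data_url)
    if fetch_status(website) == 200:
        # The site is up but the deep link is dead: use the root path.
        return ("use_root", urljoin(website, "/"))
    return ("broken", data_url)


# Stubbed statuses mirroring the china-cdb row of the table.
statuses = {
    "https://www.cdb.com.cn/zwgk/": 404,
    "https://www.cdb.com.cn": 200,
}
print(triage_data_url(
    "https://www.cdb.com.cn", "https://www.cdb.com.cn/zwgk/", statuses.get
))
```

In a real run, `fetch_status` would wrap an HTTP client; keeping it as a parameter makes the decision rule itself easy to test.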
- Remove china-cqc: already exists in PR #164 branch (cross-batch duplicate)
- Add china-catarc: China Automotive Technology and Research Center (CATARC)
mingcha-dev
approved these changes
Apr 21, 2026
firstdata-dev
commented
Apr 21, 2026
Collaborator
Author
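❌ scidb has not been removed yet! china-scidb (scidb.cn) = the data_url of the existing china-cas entry (scidb.cn). It is the same data platform and must be removed.
The current diff still contains research/scidb.json.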
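This PR adds 5 authoritative Chinese data sources (AM batch).
New data sources
- china-eximbank
- china-cdb
- china-scidb
- china-ngdc
- china-cqc
Quality checks
- check-candidate.sh deduplication check
- check-blacklist.sh blacklist check (no violations)
- make check: all passed (504 IDs unique, all valid)
URL verification status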