Skip to content

Conversation

@ajaits
Copy link
Contributor

@ajaits ajaits commented Nov 3, 2025

No description provided.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @ajaits, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the statvar_importer tool by introducing new functionalities for Property-Value (PV) map comparison and testing. It enables the output of detailed PVs per input cell, provides a dedicated script for evaluating PV map generation quality, and establishes a comprehensive set of utilities for testing and validating statvar imports. These changes aim to improve the reliability, debuggability, and overall quality assurance process for data imports.

Highlights

  • New Flag for PV Output: A new flag, output_data_pvs, has been introduced in config_flags.py to allow specifying a CSV file where Property-Value (PV) maps per input cell can be saved.
  • PV Map Collection and Output: The stat_var_processor.py now includes logic to collect PVs per input cell and write them to the file specified by the new output_data_pvs flag, enhancing traceability and debugging capabilities for PV map generation.
  • New PV Map Generation Baseline Script: A new script, pvmap_generation_baseline.py, has been added to evaluate the quality of PV map generation, providing a dedicated tool for assessing the accuracy and completeness of generated PV maps.
  • Comprehensive Import Testing Utilities: New utility files, statvar_import_test.sh, statvar_imports_test_utils.py, and statvar_imports_utils.py, have been added to provide robust testing infrastructure for statvar imports, including functions for setting up Python environments, retrieving import-specific files, generating pvmaps using various methods (statvar processor, agentic importer), running tests on sample data, and comparing pvmaps.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a suite of utilities and test scripts for property-value (PV) map comparison and generation. The changes include adding a new flag to output PVs per data cell and implementing the logic in the stat_var_processor. Several new files are added for testing and evaluation, including a baseline script, a shell test script, and Python utility modules. My review identifies several critical issues in the new scripts, such as runtime errors due to missing imports and incorrect syntax, which need to be addressed. I also found a likely-to-be-obsolete file (statvar_imports_test_utils.py) that is riddled with bugs and should probably be removed. Additionally, I've provided suggestions to improve code quality, such as replacing os.system with the more secure subprocess module and fixing minor bugs and typos.

Comment on lines 55 to 59
process_http_server.run_http_server(http_port=_FLAGS.http_port,
script=__file__,
module=config_flags)
process_import(import_name=_FLAGS.statvar_import_name,
pvmap_generation_method=_FLAGS.pvmap_generation_method)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The functions process_http_server.run_http_server and process_import are called here, but they are not imported or defined in this file. Similarly, the config_flags module is used but not imported. This will cause NameError exceptions at runtime. Please add the necessary imports to resolve these references.

sys.path.append(_SCRIPT_DIR)
sys.path.append(os.path.dirname(_SCRIPT_DIR))
sys.path.append(os.path.dirname(os.path.dirname(_SCRIPT_DIR)))
sys.path.append(_DATA_DIR, 'util')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

sys.path.append() takes only one argument, but two were given. This will raise a TypeError at runtime. It should be sys.path.append(os.path.join(_DATA_DIR, 'util')).

Suggested change
sys.path.append(_DATA_DIR, 'util')
sys.path.append(os.path.join(_DATA_DIR, 'util'))

with open(file, 'r') as fp:
manifest_json = json.loads(fp.read())
return manifest_json
except e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The except block has invalid syntax. It should be except Exception as e: to catch exceptions and assign them to the variable e for logging.

Suggested change
except e:
except Exception as e:

-gm) shift; GEMINI_MODEL="$1";;
-gen_pvmap_s*) GENERATE_PVMAP_STATVAR="1";;
-q) QUIET="1";;
-m) shift; MANIFEST="$1";;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The -m option is defined twice in the parse_options function (lines 61 and 67). The second definition on line 67 is unreachable and uses a different variable MANIFEST which is not used elsewhere. This appears to be a copy-paste error and should be removed to avoid confusion.

# Get the manifest.josn file for an import
def get_import_manifest(import_name: str, root_dir: str = _DATA_DIR) -> str:
"""Returns the folder containing the manifest.json for the import name."""
files = glob.glob(os.path.join(root_dir, '**'), recurive=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a typo in the glob.glob call. The parameter recurive should be recursive.

Suggested change
files = glob.glob(os.path.join(root_dir, '**'), recurive=True)
files = glob.glob(os.path.join(root_dir, '**'), recursive=True)

Comment on lines 16 to 24
import os
import sys
import glob
import json
from typing import Union

from absl import app
from absl import flags
from absl import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file contains several unused imports: Union, app, flags, file_util, and ConfigMap. These should be removed to keep the code clean. Given the number of issues in this file and the presence of the more complete statvar_imports_utils.py, consider whether this file should be removed entirely.

import json
import re
import shlex
from typing import Union
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Union type from the typing module is imported but not used in this file. It can be removed to clean up the code.

cwd = os.getcwd()
if run_dir:
os.chdir(run_dir)
os.system(generate_pvmap_cmd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using os.system() can be a security risk if any part of the command comes from untrusted input. It is also less flexible than the subprocess module. Consider using subprocess.run() which provides more control over command execution, error handling, and capturing output. You will also need to import subprocess at the top of the file.

Suggested change
os.system(generate_pvmap_cmd)
subprocess.run(generate_pvmap_cmd, shell=True, check=True)

cwd = os.getcwd()
os.chdir(import_dir)
logging.info(f'Generating test output with command: {statvar_test_cmd}')
os.system(statvar_test_cmd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using os.system() can be a security risk and is less flexible than the subprocess module. It's recommended to use subprocess.run() for better error handling and control over the executed process. You will also need to import subprocess at the top of the file if you haven't already.

Suggested change
os.system(statvar_test_cmd)
subprocess.run(statvar_test_cmd, shell=True, check=True)

pvmap1 = load_pvmap_file(pvmap_file1)
pvmap2 = load_pvmap_file(pvmap_file2)
counters.add_counter(f'nodes-input1-{pvmap_file1}', len(pvmap1))
counters.add_counter(f'nodes-inpit2-{pvmap_file2}', len(pvmap2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a typo in the counter name: nodes-inpit2-... should be nodes-input2-....

Suggested change
counters.add_counter(f'nodes-inpit2-{pvmap_file2}', len(pvmap2))
counters.add_counter(f'nodes-input2-{pvmap_file2}', len(pvmap2))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant