This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Output of Label Column when applying ONNX model is not as expected #460

@antoniovs1029

Description

When an ONNX model for a classifier is created with NimbusML and then applied, either with OnnxRunner (aka OnnxTransformer from ML.NET) or directly through the ONNX Runtime (aka ORT) Python API, the Label column (i.e. the column that was used as Label when training the classifier) contains unexpected values.

The behavior differs somewhat depending on whether the input DataFrame's Label column is category, object (string), or float (as the repro below shows; I guess similar problems arise for other types). There are two main issues:
Issue 1. When running with ORT, the Label column output by the ONNX model contains 'keys' and not 'values', i.e. we get integer keys (1, 2, 3 in the Species output below) instead of whatever original values there were in Label. This happens regardless of the input Label column type.
Issue 2. When running with OnnxRunner, the Label column holds wrong values. If the input Label column was object (string), the value in that column is "4294967295" for every row; if the input was category or float, the value is "0".
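To make Issue 1 concrete, here is a minimal sketch of what a user would have to do by hand to turn the ORT key output back into label values. It does not use NimbusML or ORT; the `keys` series is a hypothetical stand-in for `result_ort["Species.output"]`, and both the 1-based key numbering and the class order (taken from the `Score.setosa`, `Score.versicolor`, `Score.virginica` columns in the OnnxRunner output below) are assumptions, not confirmed behavior of the exported model.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for result_ort["Species.output"]: uint32 keys.
keys = pd.Series(np.array([1, 1, 2, 3, 3], dtype=np.uint32), name="Species.output")

# Assumption: keys are 1-based and follow the class order seen in the
# OnnxRunner score columns (setosa, versicolor, virginica).
classes = ["setosa", "versicolor", "virginica"]
key_to_value = {i + 1: name for i, name in enumerate(classes)}

# Map each key back to its label value; unknown keys would become NaN.
values = keys.astype("int64").map(key_to_value)
print(values.tolist())
```

Ideally the exported ONNX model would perform this key-to-value mapping itself, so that the Label column matches the original values without any manual step.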

Repro

NOTE: the data_frame_tool module used is the one currently in the aml branch (link)

import os
import tempfile
from data_frame_tool import DataFrameTool as DFT
from nimbusml.datasets import get_dataset
from nimbusml.linear_model import FastLinearClassifier
from nimbusml.preprocessing import OnnxRunner
from nimbusml.preprocessing import FromKey, ToKey
from nimbusml import Pipeline

def get_tmp_file(suffix=None):
    fd, file_name = tempfile.mkstemp(suffix=suffix)
    fl = os.fdopen(fd, 'w')
    fl.close()
    return file_name

# Change the label column to see different behaviors:
LABEL_COLUMN_NAME = "Species" # Type: object (string)
#LABEL_COLUMN_NAME = "Setosa" # Type: float
#LABEL_COLUMN_NAME = "Label" # Type: category

iris_df = get_dataset("iris").as_df()
print("\n\nORIGINAL DATASET - using", LABEL_COLUMN_NAME, " as Label column")
print(iris_df)
print(iris_df.dtypes)

predictor = FastLinearClassifier(feature=["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"], label=LABEL_COLUMN_NAME)
predictor.fit(iris_df)

print("\n\nML.NET RESULT")
# Notice this outputs only "PredictedLabel", so the user can't get the Label column after applying the predictor.
# QUESTION: Is there a way for the user to get that column after the predictor?
original_result = predictor.predict(iris_df)
print(predictor.model_)
print(original_result)
print(original_result.dtypes)

# onnxpath = get_tmp_file()
onnxpath = get_tmp_file()
print()
print("Onnx model path:", onnxpath)
predictor.export_to_onnx(onnxpath, 'com.microsoft.ml')

print("\n\nORT RESULT")
df_tool = DFT(onnxpath)
result_ort = df_tool.execute(iris_df, [])
print(result_ort)
print("\nColumn:", LABEL_COLUMN_NAME, " - ORT RESULT") # Issue 1: It prints the "keys", instead of values for the Label column
print(result_ort[LABEL_COLUMN_NAME + ".output"])

print("\n\nONNX RUNNER RESULT")
onnxrunner = OnnxRunner(model_file=onnxpath)
result_onnx = onnxrunner.fit_transform(iris_df)
print(result_onnx)
print(result_onnx.dtypes)
print("\nColumn:", LABEL_COLUMN_NAME, " - ONNX RUNNER RESULT") # Issue 2: It prints "4294967295" when label column is "Species" (string), "0" when label column is "Label" (category) and "Setosa" (float), for every row
print(result_onnx[LABEL_COLUMN_NAME])

Output (for LABEL_COLUMN_NAME="Species")

ORIGINAL DATASET - using Species  as Label column
     Sepal_Length  Sepal_Width  Petal_Length  Petal_Width Label    Species  Setosa
0             5.1          3.5           1.4          0.2     0     setosa     1.0
1             4.9          3.0           1.4          0.2     0     setosa     1.0
2             4.7          3.2           1.3          0.2     0     setosa     1.0
3             4.6          3.1           1.5          0.2     0     setosa     1.0
4             5.0          3.6           1.4          0.2     0     setosa     1.0
..            ...          ...           ...          ...   ...        ...     ...
145           6.7          3.0           5.2          2.3     2  virginica     0.0
146           6.3          2.5           5.0          1.9     2  virginica     0.0
147           6.5          3.0           5.2          2.0     2  virginica     0.0
148           6.2          3.4           5.4          2.3     2  virginica     0.0
149           5.9          3.0           5.1          1.8     2  virginica     0.0

[150 rows x 7 columns]
Sepal_Length     float64
Sepal_Width      float64
Petal_Length     float64
Petal_Width      float64
Label           category
Species           object
Setosa           float64
dtype: object
Automatically adding a MinMax normalization transform, use 'norm=Warn' or 'norm=No' to turn this behavior off.
Using 6 threads to train.
Automatically choosing a check frequency of 6.
Auto-tuning parameters: maxIterations = 9996.
Auto-tuning parameters: L2 = 2.667734E-05.
Auto-tuning parameters: L1Threshold (L1/L2) = 0.
Using best model from iteration 948.
Not training a calibrator because it is not needed.
Elapsed time: 00:00:00.9079426


ML.NET RESULT
C:\Users\anvelazq\AppData\Local\Temp\tmp7b539j8w.model.bin
0         setosa
1         setosa
2         setosa
3         setosa
4         setosa
         ...
145    virginica
146    virginica
147    virginica
148    virginica
149    virginica
Name: PredictedLabel, Length: 150, dtype: object
object

Onnx model path: C:\Users\anvelazq\Desktop\is23repros\model-labelissue.onnx


ORT RESULT
     Sepal_Length.output  Sepal_Width.output  Petal_Length.output  ...  Score.output.0 Score.output.1  Score.output.2
0                    5.1                 3.5                  1.4  ...    9.979612e-01       0.002039    7.896303e-15
1                    4.9                 3.0                  1.4  ...    9.935742e-01       0.006426    1.243418e-13
2                    4.7                 3.2                  1.3  ...    9.969639e-01       0.003036    2.946764e-14
3                    4.6                 3.1                  1.5  ...    9.950643e-01       0.004936    1.473649e-13
4                    5.0                 3.6                  1.4  ...    9.984953e-01       0.001505    4.957718e-15
..                   ...                 ...                  ...  ...             ...            ...             ...
145                  6.7                 3.0                  5.2  ...    6.576003e-09       0.002802    9.971976e-01
146                  6.3                 2.5                  5.0  ...    3.143095e-07       0.031589    9.684103e-01
147                  6.5                 3.0                  5.2  ...    4.240965e-07       0.031176    9.688237e-01
148                  6.2                 3.4                  5.4  ...    1.435240e-08       0.002293    9.977069e-01
149                  5.9                 3.0                  5.1  ...    7.885213e-06       0.121532    8.784599e-01

[150 rows x 19 columns]

Column: Species  - ORT RESULT
0      1
1      1
2      1
3      1
4      1
      ..
145    3
146    3
147    3
148    3
149    3
Name: Species.output, Length: 150, dtype: uint32


ONNX RUNNER RESULT
     Sepal_Length  Sepal_Width  Petal_Length  ...  Score.setosa Score.versicolor  Score.virginica
0             5.1          3.5           1.4  ...  9.979612e-01         0.002039     7.896303e-15
1             4.9          3.0           1.4  ...  9.935742e-01         0.006426     1.243418e-13
2             4.7          3.2           1.3  ...  9.969639e-01         0.003036     2.946764e-14
3             4.6          3.1           1.5  ...  9.950643e-01         0.004936     1.473649e-13
4             5.0          3.6           1.4  ...  9.984953e-01         0.001505     4.957718e-15
..            ...          ...           ...  ...           ...              ...              ...
145           6.7          3.0           5.2  ...  6.576003e-09         0.002802     9.971976e-01
146           6.3          2.5           5.0  ...  3.143095e-07         0.031589     9.684103e-01
147           6.5          3.0           5.2  ...  4.240965e-07         0.031176     9.688237e-01
148           6.2          3.4           5.4  ...  1.435240e-08         0.002293     9.977069e-01
149           5.9          3.0           5.1  ...  7.885213e-06         0.121532     8.784599e-01

[150 rows x 19 columns]
Sepal_Length                        float64
Sepal_Width                         float64
Petal_Length                        float64
Petal_Width                         float64
Label                                object
Species                              uint32
Setosa                              float64
311418708f7545c0a2fd7f3db667a0cd    float32
5ab7f7a1e38348f4b66ed5e3a9c2416e    float32
776cb47f18c24a52a72e93f759808599    float32
17f5b772493b497fa3dfca2abffc6049    float32
Features.Sepal_Length               float32
Features.Sepal_Width                float32
Features.Petal_Length               float32
Features.Petal_Width                float32
PredictedLabel                       object
Score.setosa                        float32
Score.versicolor                    float32
Score.virginica                     float32
dtype: object

Column: Species  - ONNX RUNNER RESULT
0      4294967295
1      4294967295
2      4294967295
3      4294967295
4      4294967295
          ...
145    4294967295
146    4294967295
147    4294967295
148    4294967295
149    4294967295
Name: Species, Length: 150, dtype: uint32
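One observation about the value in Issue 2, offered only as a hint and not as a diagnosis: 4294967295 is the largest value a uint32 can hold, which is also what 0 - 1 produces under unsigned 32-bit wraparound. The short check below verifies the arithmetic only; whether such a wraparound actually happens inside OnnxRunner is an assumption that has not been confirmed.

```python
# Value printed for every row of the Species column by OnnxRunner.
observed = 4294967295

# Largest value representable in an unsigned 32-bit integer.
UINT32_MAX = 2**32 - 1

# Result of 0 - 1 under unsigned 32-bit wraparound (modulo 2**32).
wrapped = (0 - 1) % 2**32

print(observed == UINT32_MAX)
print(observed == wrapped)
```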
