UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 132693: invalid start byte #28

@leosh64

I get this error while generating embeddings:

Traceback (most recent call last):
  File "/home/user/.local/bin/sem", line 8, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.10/site-packages/semantic_code_search/cli.py", line 84, in main
    query_func(args)
  File "/home/user/.local/lib/python3.10/site-packages/semantic_code_search/cli.py", line 38, in query_func
    do_query(args, model)
  File "/home/user/.local/lib/python3.10/site-packages/semantic_code_search/query.py", line 51, in do_query
    do_embed(args, model)
  File "/home/user/.local/lib/python3.10/site-packages/semantic_code_search/embed.py", line 82, in do_embed
    functions = _get_repo_functions(
  File "/home/user/.local/lib/python3.10/site-packages/semantic_code_search/embed.py", line 71, in _get_repo_functions
    file_content = f.read()
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 132693: invalid start byte

It occurs after quite a few files have already been processed successfully:

 27%|████████████████████████▎                                                                 | 35036/130013 [00:33<01:30, 1047.23it/s]
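
The traceback points at `f.read()` in `_get_repo_functions`, and byte 0xb9 is never a valid first byte of a UTF-8 sequence, so at least one file in the repository is probably not UTF-8 encoded (for example Latin-1 text or a binary file). A possible workaround is sketched below. It assumes that `embed.py` opens each file in text mode with strict UTF-8 decoding, which I have not verified. The helper names `read_text_or_skip` and `read_text_lossy` are hypothetical and not part of semantic_code_search.

```python
from pathlib import Path
from typing import Optional


def read_text_or_skip(path: str) -> Optional[str]:
    """Return the file's text, or None if it is not valid UTF-8.

    Hypothetical helper: the caller would skip files that return None
    instead of aborting the whole embedding run.
    """
    try:
        return Path(path).read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return None


def read_text_lossy(path: str) -> str:
    """Return the file's text, replacing undecodable bytes with U+FFFD.

    Hypothetical helper: keeps the file in the index at the cost of
    losing the characters that could not be decoded.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()
```

Skipping is the safer choice when the offending files are likely to be binary. The lossy variant may be preferable when they are source files in a legacy encoding and only a few characters are affected.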
