Skip to content

Improvements#1

Open
adrienbrault wants to merge 1 commit intoCodeRevolutionPlugins:mainfrom
adrienbrault:improvements
Open

Improvements#1
adrienbrault wants to merge 1 commit intoCodeRevolutionPlugins:mainfrom
adrienbrault:improvements

Conversation

@adrienbrault
Copy link
Copy Markdown

@adrienbrault adrienbrault commented Feb 3, 2023

You should also add the package to https://packagist.org

@adrienbrault adrienbrault marked this pull request as ready for review February 3, 2023 15:25
Copy link
Copy Markdown

@cosmastech cosmastech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Thanks for doing it.

I have added some suggestions to make it a bit more extensible and add in some PHP features from 8.0+.

I'm of course not the package maintainer, just some things I saw that might be helpful.


public static function create(): self
{
$raw_chars = file_get_contents(dirname(__FILE__) . "/../data/characters.json");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have the filepaths passed as parameters?

public static function create(): self
{
$raw_chars = file_get_contents(dirname(__FILE__) . "/../data/characters.json");
$byte_encoder = json_decode($raw_chars, true);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be convenient to have JSON_THROW_ON_ERROR as the fourth parameter

"minimum-stability": "stable",
"require": {}
"require": {
"php": ">7.4"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to go with such an old PHP version? 7.4 is past its EOL. I would suggest just going with 8.0 here (security patches stopping in November of this year), if not 8.1 or 8.2

},
"minimum-stability": "stable",
"require": {}
"require": {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should ext-mbstring also be added?

}
return -1;
}
use CodeRevolutionPlugins\GPT3Encoder\Encoder;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole file would serve better as an example in a README


public function encode(string $text): array
{
$bpe_tokens = array();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$bpe_tokens = array();
$bpe_tokens = [];

preg_match_all("#'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+#u", $text, $matches);
if(!isset($matches[0]) || count($matches[0]) == 0)
{
error_log('Failed to match string: ' . $text);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to throw a custom RuntimeException here and let the caller determine how to handle it.

return $bpe_tokens;
}

$cache = array();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$cache = array();
$cache = [];

Comment on lines +88 to +89
$new_tokens = array();
$chars = array();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$new_tokens = array();
$chars = array();
$new_tokens = $chars = [];


private function split($str, $len = 1)
{
$arr = [];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should have consistent formatting.

@adrienbrault
Copy link
Copy Markdown
Author

This is awesome! Thanks for doing it.

I have added some suggestions to make it a bit more extensible and add in some PHP features from 8.0+.

I'm of course not the package maintainer, just some things I saw that might be helpful.

I am not planning on working on this MR anymore.

I am using https://github.com/Gioni06/GPT3Tokenizer now

@cosmastech
Copy link
Copy Markdown

This is awesome! Thanks for doing it.
I have added some suggestions to make it a bit more extensible and add in some PHP features from 8.0+.
I'm of course not the package maintainer, just some things I saw that might be helpful.

I am not planning on working on this MR anymore.

I am using https://github.com/Gioni06/GPT3Tokenizer now

Thanks for the tip! 👍

DrDub added a commit to DrDub/openai-cookbook that referenced this pull request May 24, 2023
The PHP alternative is discontinued, its author is using the project to which this patch points to, which is also available in the PHP package manager.

See: CodeRevolutionPlugins/GPT-3-Encoder-PHP#1 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants