Conversation
component.train(cui=cui, entity=mut_entity, doc=mut_doc,
                negative=negative, names=names)
Does this mean we're still doing unsupervised on a per entity basis? I can't think of a case where in an unsupervised manner you would need the entity.
This is supervised training still. This is within the add_and_train_concept method.
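To make the distinction in this thread concrete, here is a minimal sketch of a protocol with both a per-entity supervised method and an unsupervised one. Only the keyword names in `train` (`cui`, `entity`, `doc`, `negative`, `names`) come from the call quoted above. The class names, the default values, the `train_unsupervised` signature, and the placeholder CUI are all assumptions for illustration and are not MedCAT's actual definitions.

```python
from typing import Any, Iterable, List, Protocol, runtime_checkable


@runtime_checkable
class TrainableSketch(Protocol):
    """Hypothetical protocol; not MedCAT's real TrainableComponent."""

    def train(self, cui: str, entity: Any, doc: Any,
              negative: bool = False, names: Any = None) -> None:
        """Supervised: needs the specific entity being trained on."""
        ...

    def train_unsupervised(self, texts: Iterable[str]) -> None:
        """Unsupervised: only raw texts (assumed signature)."""
        ...


class CountingComponent:
    """Toy component that just records what it was trained on."""

    def __init__(self) -> None:
        self.supervised_calls: List[str] = []
        self.unsupervised_texts: List[str] = []

    def train(self, cui, entity, doc, negative=False, names=None) -> None:
        self.supervised_calls.append(cui)

    def train_unsupervised(self, texts) -> None:
        self.unsupervised_texts.extend(texts)


comp = CountingComponent()
# runtime_checkable only verifies that the methods exist, not their signatures.
is_trainable = isinstance(comp, TrainableSketch)
comp.train(cui="C0000001", entity="ent", doc="doc")  # placeholder values
comp.train_unsupervised(["list", "of", "texts"])
```

The point of the sketch is only that the supervised path takes an entity while the unsupervised path does not need one.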
A few queries, but I think it looks good. I might've missed these within the commits: if you have two trainable components, is it possible to turn off training for one of them when running the training methods? Do the dataset-aware components serve that purpose? And one more above ^^^
The description already had 2 examples for this :) The dataset-aware implementation can serve that purpose, because it replaces the specific component with another one (which isn't trainable, but that's kind of irrelevant since it's a different component) for the duration of the context manager. I think what makes it unclear is that in the example I've given it a dataset, but realistically you could provide an empty dataset for it, i.e. like this:
# supervised
with dataset_aware_component(cat, CoreComponentType.ner, {"projects": []}):
    trainer.train_supervised_raw(DATASET, nepochs=1)
# unsupervised
with dataset_aware_component(cat, CoreComponentType.ner, {"projects": []}):
    trainer.train_unsupervised(["list", "of", "texts"], nepochs=1)
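The swap-for-the-duration-of-the-context idea described above can be sketched in isolation. This is a toy illustration under stated assumptions, not how `dataset_aware_component` is implemented: `ToyComponent`, `ToyPipeline`, and `temporarily_replace` are hypothetical names invented for this example.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


class ToyComponent:
    """Hypothetical stand-in for a pipeline component."""

    def __init__(self, name: str, trainable: bool) -> None:
        self.name = name
        self.trainable = trainable
        self.seen: List[str] = []

    def train_unsupervised(self, text: str) -> None:
        self.seen.append(text)


class ToyPipeline:
    """Hypothetical pipeline that trains every trainable component."""

    def __init__(self, components: Dict[str, ToyComponent]) -> None:
        self.components = components

    def train_unsupervised(self, texts: List[str]) -> None:
        for text in texts:
            for comp in self.components.values():
                if comp.trainable:
                    comp.train_unsupervised(text)


@contextmanager
def temporarily_replace(pipeline: ToyPipeline, slot: str,
                        replacement: ToyComponent) -> Iterator[ToyPipeline]:
    """Swap one component in, and restore the original on exit."""
    original = pipeline.components[slot]
    pipeline.components[slot] = replacement
    try:
        yield pipeline
    finally:
        # Restore even if training raised an exception.
        pipeline.components[slot] = original


ner = ToyComponent("ner", trainable=True)
linker = ToyComponent("linker", trainable=True)
pipe = ToyPipeline({"ner": ner, "linker": linker})
frozen_ner = ToyComponent("frozen-ner", trainable=False)

with temporarily_replace(pipe, "ner", frozen_ner):
    # Only the linker is trained here; the real NER component is untouched.
    pipe.train_unsupervised(["list", "of", "texts"])
```

Because the replacement sits in the trainable component's slot while the block runs, training skips the original entirely, and the `finally` clause puts it back afterwards.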
This PR does an overhaul to the training setup of MedCAT:
- TrainableComponent protocol to also include a train_unsupervised method
- TrainableComponent protocol to be trained supervised

Example code snippets: