Feat: Enable celery workers and beat#724
Conversation
adwk67
left a comment
There was a problem hiding this comment.
Looks good - mainly minor things.
| host: superset-postgresql | ||
| database: superset | ||
| credentialsSecretName: superset-postgresql-credentials | ||
| celeryResultsBackend: |
There was a problem hiding this comment.
In Airflow, we put celeryResultsBackend and celeryBroker in the worker so that they are included where needed. Should we do the same here for consistency?
There was a problem hiding this comment.
Sebastian mentioned the same thing. I think this is wrong in Airflow.
It is used by ALL roles, so clearly belongs into the clusterConfig. I wont push it down to workers just to avoid some consistency checks.
There was a problem hiding this comment.
It's more to do with avoiding unintended misconfiguration: yes, it is used by all roles, but it is only relevant if the resource has a worker role defined in the first place. It's easier to overlook inconsistencies if the dependency is less obvious.
There was a problem hiding this comment.
By that logic, why is the metadata database clusterConfig (only the webserver needs it)? Or any authorization/authentication in many cases? We might need it in the future and this is clearly "clusterWide" configuration.
As said, i think it is just plain wrong in Airflow (or at least bad design). The clusterConfig is exactly there to avoid having to cross reference stuff between "siblings". It is a bad practice in programming and bad API design because you introduce in-transparency where not needed.
There was a problem hiding this comment.
Why is it bad design to have a section of config - that is only relevant to an optional role - defined as part of that role? Apart from opensearch (which is different in many ways), I can only think of Airflow (kubernetes vs. celery executors), Kafka (optional controller role) and Superset (optional worker/beat roles) where we have that in the platform. Do we have other operators where we include things under clusterconfig only for optional roles?
There was a problem hiding this comment.
By that logic, why is the metadata database clusterConfig (only the webserver needs it)
The difference here is that the webserver role is not optional.
There was a problem hiding this comment.
It is a bad design because the assumption that it is only relevant to an optional role is wrong.
Once the role is defined, it is relevant to ALL roles.
A webserver role should not be concerned with anything in the worker role. If it needs anything from the worker, this is already a clear sign that the API design is bad :-)
There was a problem hiding this comment.
By that logic, why is the metadata database clusterConfig (only the webserver needs it)
The difference here is that the webserver role is not optional.
Well, then even more points to put it into its own role...?
| /// | ||
| /// Ignored otherwise. | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub celery_results_backend: Option<CeleryResultsBackendConnection>, |
There was a problem hiding this comment.
See other comment: in airflow this is part of the worker
|
Openshift tests 🟢 https://testing.stackable.tech/view/02%20Operator%20Tests%20(custom)/job/superset-operator-it-custom/45/ We will bring up the CR open question in the planning meeting. |
Description
Added two new roles.
workerandbeat(max 1 replica).Currently, RESULTS_BACKEND only supports Redis, no S3 support.
Definition of Done Checklist
Author
Reviewer
Acceptance
type/deprecationlabel & add to the deprecation scheduletype/experimentallabel & add to the experimental features tracker