Skip to content

Conversation

@Fredi-raspall
Copy link
Contributor

@Fredi-raspall Fredi-raspall commented Jan 30, 2026

This goes on top of #1245
Summary:

  • simplify code for failover, which was excessively permissive so as it could be merged without fabric implementing some failover values.
  • make the validation stricter
  • Some internal types remain optional, nevertheless.
  • They could be made non-optional but we should simplify tests first.
  • Change the criteria to prefer gateways within a group.

@Fredi-raspall Fredi-raspall requested a review from a team as a code owner January 30, 2026 19:14
@Fredi-raspall Fredi-raspall requested review from daniel-noland and removed request for a team January 30, 2026 19:14
@Fredi-raspall
Copy link
Contributor Author

Fredi-raspall commented Jan 31, 2026

The CI tests fail in this pipeline because I made it mandatory to know the BGP communities to advertise a route. Before, I made dataplane more forgiving and let it advertise anyway routes, while the fabric part was in the making. However, for a correct behavior of the failover strategy, no gateway should advertise routes without them IMO.

No community seems to be set according to the table dumped by dataplane.

  ━━━━━━━ Community mappings ━━━━━━━
     rank community       

And, then the config is rejected:

2026-01-30T19:44:20.904031Z ERROR mgmt dataplane_mgmt::processor::k8s_client: 175: Failed to apply the config for generation 4: Failure applying config: Could not assign BGP community to VPC peering vpc-02--vpc-03

This is affected by githedgehog/gateway#305

@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/cleanup-and-validation branch from 5c4ea1f to 8196637 Compare January 31, 2026 17:50
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/failover-revisit branch from 45d5c74 to ebf1a2e Compare January 31, 2026 17:51
Base automatically changed from pr/fredi/cleanup-and-validation to main February 2, 2026 17:55
@Frostman
Copy link
Member

Frostman commented Feb 2, 2026

We do now, you'll have to wait for the gateway bump in the fabricator

Forbid peerings that are not mapped to a gateway group.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
When analyzing the priorities of a gateway for a VPC peering,
if a gateway is part of the group responsible for that peering,
fail if no community is available for its position in the
ranked list of gateways to service the peering. This can only
happen if the number of members in a group exceeds the numer of
communities available.

Also, simplify the logic, avoid unnecessary conversions between
integer types and improve the tests.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Let the internal config have a community table and gateway
group, and copy them from the external config.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Set the BGP communities of prefixes advertised in the context
of a VPC peering from the internal config gateway group table
and BGP community table, instead of the Peering object.
Also, do not advertise any prefix if we can't determine the
corresponding community.

This change provides a better separation between what
constitutes input configuration (external config) and its
processing (internal config).

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
We no longer store communities in the Peering object, as
we compute them from the community and gateway tables
stored in the internal configuration.

The validation will still be made on the External config,
which will be significantly simplified by not storing the
communities in the Peering.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
A VPC peering has to be mapped to a gateway group to be valid.
In the initial implementation, internally, a peeering kept
an optional reference to a group. The validation now requires a
peering to have it. So, fix the tests that failed because
of this stricter validatio by adding a constructor that
sets the reference to a "default" group.

We do not add this in the implementation of Peering::default(),
as we do not want to default a value that is not provided by
the configuration in this case, and we may make it non-optional
in the future, once the tests are cleaned up.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Within a group, we prefer (order as first) gateways with highest
priority value. In case of ties, we prefer gateways in alphabe-
tical order. We do this by sorting the members of each group and
defining Ord for members. The impl Ord compares also the ip
address of members if their name is Equal. This will never happen
within a gateway group as we require all members to have a
distinct name.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/failover-revisit branch from bdce861 to 56793a4 Compare February 2, 2026 22:32
@Fredi-raspall Fredi-raspall requested review from qmonnet and removed request for daniel-noland February 3, 2026 15:14
@Fredi-raspall Fredi-raspall reopened this Feb 3, 2026
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK

@Fredi-raspall Fredi-raspall added this pull request to the merge queue Feb 3, 2026
Merged via the queue into main with commit 6893dee Feb 3, 2026
26 of 32 checks passed
@Fredi-raspall Fredi-raspall deleted the pr/fredi/failover-revisit branch February 3, 2026 23:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants