Why kube-proxy add external-lb's address to node local iptables rule? #66607

Closed · BSWANG opened this issue Jul 25, 2018 · 58 comments · Fixed by #92312 · May be fixed by #97681
Labels: kind/bug kind/cleanup kind/feature needs-triage sig/network

@BSWANG (Contributor) commented Jul 25, 2018

/kind friction

What happened:
I have a LoadBalancer-type Service A whose load balancer address is 1.1.1.1. The external load balancer for Service A terminates TLS: it converts HTTPS requests to plain HTTP and forwards them to the host port / endpoints. But because kube-proxy adds the external LB's address to the node-local iptables rules, requests to https://1.1.1.1 are hijacked to the local HTTP endpoints, so the HTTPS requests fail.
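For illustration, this is roughly the kind of rule kube-proxy (iptables mode) programs for the load balancer ingress IP on every node; the chain hash, namespace, and comment below are made up, and the exact output varies by kube-proxy version:

```
# Hypothetical excerpt of iptables-save output on a node
-A KUBE-SERVICES -d 1.1.1.1/32 -p tcp -m comment --comment "default/service-a:https loadbalancer IP" -m tcp --dport 443 -j KUBE-SVC-XXXXXXXXXXXXXXXX
# KUBE-SVC-XXXXXXXXXXXXXXXX DNATs straight to the Service's pod endpoints,
# so traffic to 1.1.1.1 never reaches the external load balancer.
```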

What you expected to happen:
Kube-proxy should not add the external LB's address to the local iptables rules, so that requests to the LB address go out through the external load balancer.

Environment:

  • Kubernetes version (use kubectl version):
    1.10.4
  • Cloud provider or hardware configuration:
    Alibaba Cloud
  • OS (e.g. from /etc/os-release):
    Centos 7.4
  • Kernel (e.g. uname -a):
    3.10.0-693
  • Install tools:
    kubeadm
@k8s-ci-robot k8s-ci-robot added needs-sig kind/bug labels Jul 25, 2018
@BSWANG (Contributor, Author) commented Jul 25, 2018

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network kind/friction and removed needs-sig labels Jul 25, 2018
@BSWANG BSWANG changed the title from "[Question]Why kube-proxy add external-lb's address to node local iptables rule?" to "Why kube-proxy add external-lb's address to node local iptables rule?" Jul 25, 2018
@Lion-Wei commented Jul 26, 2018

I think the reason is that traffic originating inside the cluster has no need to go outside (to the LB) and then come back.
And arguably the LB should not be doing the TLS termination work.

@BSWANG (Contributor, Author) commented Jul 27, 2018

@Lion-Wei Could this behavior be made configurable? Functions like TLS termination, monitoring, and logging are more reasonable and more performant when done on the LB.
If kube-proxy adds such an option, I would like to submit a PR for it.

@k8s-ci-robot k8s-ci-robot added kind/cleanup and removed kind/friction labels Aug 23, 2018
@fejta-bot commented Nov 22, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Nov 22, 2018
@BSWANG (Contributor, Author) commented Nov 28, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Nov 28, 2018
@fejta-bot commented Feb 26, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Feb 26, 2019
@BSWANG (Contributor, Author) commented Mar 4, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Mar 4, 2019
@thockin thockin added the triage/unresolved label Mar 8, 2019
@jcodybaker commented Mar 19, 2019

We're also seeing this as an issue at DigitalOcean. It's a concern not just for load-balancer TLS termination, but also for supporting the PROXY protocol as encouraged in https://kubernetes.io/docs/tutorials/services/source-ip/. Traffic addressed to the LB IP from within the cluster never reaches the load balancer, so the required PROXY protocol header isn't applied, causing a protocol violation.

The in-tree AWS service-type load balancer supports the PROXY protocol and TLS termination, but because it populates status.loadBalancer.ingress.hostname rather than .ip, it avoids this bug/optimization.
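For example, the difference in the Service status looks roughly like this (the address and hostname below are made up); kube-proxy only programs node-local rules for the `ip` form:

```yaml
# LB reported by IP: kube-proxy short-circuits traffic to 203.0.113.10 on every node
status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10
---
# LB reported by hostname only (the AWS approach): kube-proxy creates no local rules
status:
  loadBalancer:
    ingress:
    - hostname: my-lb.example.elb.amazonaws.com
```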

We're willing to put together a PR to address this if there's interest from sig-network in accepting it. We've considered a kube-proxy flag to disable the optimization, or the more complex option of extending v1.LoadBalancerIngress to include feedback from the cloud provider.

@bowei (Member) commented Mar 25, 2019

It seems like what is being implemented here is not modeled all that well in the current kube-proxy construct; your LB is not transparent from an L3/4 perspective and lives somewhere other than the node (it is terminating TLS, adding a proxy protocol field). It seems possible for nothing to break if we remove the external LB IPs from the node as a provider-specific setting, but it will require some thought.

@snuxoll commented Apr 19, 2019

If I’m being quite honest, this behavior goes against the principle of least surprise. If I want kube-proxy to route my traffic, I would use a ClusterIP service or similar and configure the application to use it; if I am hitting a LoadBalancer provided by a cloud provider, there is likely a specific reason for that, and kube-proxy shouldn’t try to second-guess it.

@johnl commented Apr 23, 2019

We hit this same problem at Brightbox with our cloud controller manager. Our Load Balancers can terminate SSL and support the PROXY protocol. Confirmed with both ipvs and iptables kube-proxy modes. Any IPs we provide as LoadBalancerIngress are used to redirect outgoing connections and keep them within the cluster, breaking things.

To put it more clearly (IMO), the problem is that kube-proxy assumes all load balancer services are straight TCP/UDP and that it can optimize them internally, but that is not the case.

So either the assumption is wrong and kube-proxy needs fixing, or the assumption is right and external load balancers should not be doing these fancy things. I obviously think kube-proxy is wrong here :)

As @jcodybaker says above, AWS avoids this problem simply by not listing IP addresses in LoadBalancerIngress, only hostnames, which kube-proxy doesn't resolve.

We've changed our cloud controller to do the same, listing only hostnames in there, and it's fixed our problem too.
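In a cloud controller manager, that hostname-only workaround looks roughly like the following sketch, assuming the standard cloud-provider LoadBalancer interface; the provider type, unexported helper, and DNS name are hypothetical:

```go
package example

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// loadBalancers is a hypothetical provider's load balancer implementation.
type loadBalancers struct{}

// EnsureLoadBalancer provisions (or looks up) the external LB for the Service and
// reports only a hostname in the status, so kube-proxy has no ingress IP to program
// node-local rules for.
func (l *loadBalancers) EnsureLoadBalancer(ctx context.Context, clusterName string, service *v1.Service, nodes []*v1.Node) (*v1.LoadBalancerStatus, error) {
	// ... call the cloud API to create or find the load balancer (omitted) ...
	lbDNSName := "lb-12345.provider.example.com" // hypothetical DNS name from the cloud API

	return &v1.LoadBalancerStatus{
		Ingress: []v1.LoadBalancerIngress{
			{Hostname: lbDNSName}, // deliberately no IP, mirroring the in-tree AWS behaviour
		},
	}, nil
}
```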

This feels a bit like a hack, but maybe it could be argued that this is the way for a cloud controller to enable/disable the optimisation. It is a bit misleading, though.

We were also exposing the IPv6 addresses of our load balancers, which didn't look properly supported further down the stack at all, so things do seem to be a bit of a mess here generally.

So a kube-proxy flag to disable the optimization would be a great start, but I think this all needs a closer look!

Plus, it's worth noting that in ipvs mode, the IP address is actually added locally to each node so it's not possible to selectively support this per-port. As soon as a load balancer lists the IP, it's completely unreachable directly.
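For reference, that local binding in ipvs mode is visible on the node: kube-proxy attaches every service and load balancer ingress IP to the kube-ipvs0 dummy interface. A minimal sketch of what that looks like (illustrative output; the addresses are examples, with 1.1.1.1 standing in for the LB ingress IP):

```
$ ip addr show dev kube-ipvs0
    ...
    inet 10.96.0.1/32 scope global kube-ipvs0
    inet 1.1.1.1/32 scope global kube-ipvs0
```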

@freehan freehan added kind/feature and removed triage/unresolved labels May 2, 2019
@whereisaaron commented May 5, 2019

If I create an external loadbalancer I don't expect kube-proxy to hijack traffic addressed to that external loadbalancer.

That completely disregards any functionality the load balancer provides, including TLS termination, PROXY protocol, logging, metrics, DPI firewalling, and... the actual load balancing algorithm!

I'm perplexed that this is default behavior? It only seems useful if your external load balancer doesn't actually do anything useful 😄

As @jcodybaker says above, AWS avoids this problem simply by not listing IP addresses in LoadBalancerIngress, only hostnames, which kube-proxy doesn't resolve. I'd like to be able to lock this out on kube-proxy in general for cluster deployments. This behavior is only going to break things.

Thanks for raising this @BSWANG. This submerged iceberg could have spoiled my day one day, but now I know to watch out for this 👀

@mosheavni commented Jan 12, 2021

The hostname workaround doesn't work if you're creating DNS records with external-dns.
external-dns picks up the load balancer's DNS name instead of the IP, and since that isn't an A record, it just doesn't create the record.
Any workaround for this case?

@reesericci commented Jan 23, 2021

I'm using CoreDNS; what's the solution to this?

@reesericci commented Jan 23, 2021

and MetalLB + Traefik, with Pi-hole to store my DNS records

@mkreidenweis-schulmngr commented Mar 12, 2021

Just to let everybody know who's looking at this issue and might think it got fixed because the issue is marked "Closed": the PR that fixed this was actually reverted later: #96454

The problem is still there in Kubernetes v1.20.
https://github.com/compumike/hairpin-proxy can help as a workaround.
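Roughly, hairpin-proxy's approach is to rewrite DNS inside the cluster so that a site's public hostname resolves to an in-cluster proxy (which re-adds the PROXY protocol header before forwarding to the ingress controller) instead of the load balancer address. A sketch of the kind of CoreDNS rewrite rule involved (the hostname and target service name are hypothetical):

```
# added inside the cluster's CoreDNS Corefile server block
rewrite name example.mycompany.com hairpin-proxy-haproxy.hairpin-proxy.svc.cluster.local
```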

@AlbinoDrought commented Mar 12, 2021

hairpin-proxy works but had unexpected consequences in my use case: cert-manager/cert-manager#3749

@mkreidenweis-schulmngr commented Mar 12, 2021

hairpin-proxy works but had unexpected consequences in my use case: jetstack/cert-manager#3749

That's only an issue if you need both dns01 and http01 challenges to work, right?
As I understand it, dns01 challenges should not need the hairpin-proxy workaround at all.

@BSWANG (Contributor, Author) commented Apr 28, 2021

/reopen

@k8s-ci-robot (Contributor) commented Apr 28, 2021

@BSWANG: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Apr 28, 2021
@k8s-ci-robot (Contributor) commented Apr 28, 2021

@BSWANG: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage label Apr 28, 2021
@BSWANG (Contributor, Author) commented Apr 28, 2021

Reopening because PR #96454 was reverted.

@aojea (Member) commented Apr 30, 2021

Reopening because PR #96454 was reverted.

The revert happened because that PR was merged accidentally. To avoid misunderstandings: it's not that the project doesn't want to implement this; there is a process for adding new features, and that PR didn't follow it.
The KEP to implement the change has been approved https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1860-kube-proxy-IP-node-binding
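For reference, the KEP's idea is to let the cloud provider mark a load balancer ingress IP so that kube-proxy skips the node-local short-circuit for it. A sketch of the Service status shape it proposes (not in any released API at the time of this comment; the field name and values are taken from the KEP, the address is made up):

```yaml
status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10
      ipMode: Proxy   # proposed: send traffic for this IP through the LB; "VIP" keeps today's behavior
```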

@Sh4d1 are you still working on this (#97681)? If you want to include this in 1.22, the process has changed; please see https://groups.google.com/g/kubernetes-sig-network/c/3NYVEgvE1Uc/m/o68JAIt0DAAJ for more details.

@Sh4d1 (Member) commented Apr 30, 2021

@aojea yep! Laid the groundwork in #101027, but I was waiting on Tim's review, though I understand he's been a bit busy! (You're welcome to review it if you want!)
Once that one is merged, I can rebase #97681 and it should be good.

Thanks, I'll read that 😄

Miniland1333 added a commit to kreut/libretext that referenced this issue May 2, 2021
Bugfix for OAuth redirect_url not being http. This is due to an upstream proxy issue with kubernetes/kubernetes#66607
@ryanlelek commented May 23, 2021

Commenting for support/+1: being able to reach the cluster's own load balancer address from within the cluster would be useful.

@mkreidenweis-schulmngr's hairpin-proxy solution (kind of) worked, but it didn't pick up the correct name of the pod(s) to reroute to; it probably would have worked with further digging.
Setting up a separate cluster solved the issue in the meantime.

Thanks for your work on this

@thockin (Member) commented Jul 8, 2021

#101027 was not assigned to me, so I missed it.

@thockin (Member) commented Aug 5, 2021

I'm going to close this as a dup, since we do have the KEP open

@thockin thockin closed this as completed Aug 5, 2021
@liudongbo123 commented Nov 12, 2021

How is the progress on this?
