
Downstream service using a direct connection has no reconnect strategy and is forcibly shut down; please help #1784

@jessesimpson


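Problem description:
Kitex v0.13.1
The downstream service connects to the upstream service directly by address instead of going through ETCD service registration and discovery (the ETCD setup was tested and works fine). When the upstream service is shut down, the downstream service is forcibly terminated as well. Reconnection also follows no strategy: it retried 6 times within one second. The log is as follows: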
2025/06/03 14:00:17.542890 middlewares.go:137: [Warn] KITEX: auto retry retryable error, retry=1 error=get connection error: dial tcp6 127.0.0.1:7081: connection has been closed by peer
2025/06/03 14:00:17.544487 middlewares.go:137: [Warn] KITEX: auto retry retryable error, retry=2 error=get connection error: dial tcp6 127.0.0.1:7081: connection has been closed by peer
2025/06/03 14:00:17.545828 middlewares.go:137: [Warn] KITEX: auto retry retryable error, retry=3 error=get connection error: dial tcp6 127.0.0.1:7081: connection has been closed by peer
2025/06/03 14:00:17.547422 middlewares.go:137: [Warn] KITEX: auto retry retryable error, retry=4 error=get connection error: dial tcp6 127.0.0.1:7081: connection has been closed by peer
2025/06/03 14:00:17.549004 middlewares.go:137: [Warn] KITEX: auto retry retryable error, retry=5 error=get connection error: dial tcp6 127.0.0.1:7081: connection has been closed by peer
2025/06/03 14:00:17.550093 middlewares.go:137: [Warn] KITEX: auto retry retryable error, retry=6 error=get connection error: dial tcp6 127.0.0.1:7081: connection has been closed by peer
2025/06/03 14:00:17 get connection error: dial tcp6 127.0.0.1:7081: connection has been closed by peer

I found the source code responsible for the 6 retries and noticed this comment: // TODO: generalize retry strategy

func newResolveMWBuilder(lbf *lbcache.BalancerFactory) endpoint.MiddlewareBuilder {
	......
	var lastErr error
	for i := 0; i < maxRetry; i++ {
		picker := lb.GetPicker()
		ins := picker.Next(ctx, request)
		if ins == nil {
			err = kerrors.ErrNoMoreInstance.WithCause(fmt.Errorf("last error: %w", lastErr))
		} else {
			remote.SetInstance(ins)
			// TODO: generalize retry strategy
			err = next(ctx, request, response)
		}
		......
		if retryable(err) {
			lastErr = err
			klog.CtxWarnf(ctx, "KITEX: auto retry retryable error, retry=%d error=%s", i+1, err.Error())
			continue
		}
		return err
	}
	return lastErr
}

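Request: provide a complete retry strategy, and do not forcibly terminate the process. Even trying to catch the failure as shown below does not help: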
func main() {
	defer func() {
		if r := recover(); r != nil {
			klog.Errorf("happen panic: %v", r)
		}
	}()
	......
	c, err := rpcservice.NewClient("rpc.service",
		......
		client.WithErrorHandler(func(ctx context.Context, err error) error {
			klog.Errorf("Kitex client error: %v", err)
			return err
		}),
	)

}
Any help would be greatly appreciated. Many thanks!
