
oceanbase / obdiag


obdiag (OceanBase Diagnostic Tool) is designed to help OceanBase users quickly gather necessary information and analyze the root cause of the problem.

Home Page: https://www.oceanbase.com/docs/obdiag-cn

License: Mulan Permissive Software License, Version 2

Languages: Python 99.22%, Shell 0.78%
Topics: oceanbase, toolkit, python, obdiag

obdiag's Introduction


OceanBase Database is a distributed relational database. It is developed entirely by Ant Group. The OceanBase Database is built on a common server cluster. Based on the Paxos protocol and its distributed structure, the OceanBase Database provides high availability and linear scalability. The OceanBase Database is not dependent on specific hardware architectures.

Key features

  • Transparent Scalability: 1,500 nodes, PB data and a trillion rows of records in one cluster.
  • Ultra-fast Performance: TPC-C 707 million tpmC and TPC-H 15.26 million QphH @30000GB.
  • Cost Efficiency: saves 70%–90% of storage costs.
  • Real-time Analytics: supports HTAP without additional cost.
  • Continuous Availability: RPO = 0 (zero data loss) and RTO < 8s (recovery time).
  • MySQL Compatible: easily migrated from MySQL database.

See also key features for more details.

Quick start

See also Quick experience or Quick Start (Simplified Chinese) for more details.

🔥 Start with all-in-one

You can quickly deploy a stand-alone OceanBase Database to try it out with the following commands:

Note: Linux Only

# download and install all-in-one package (internet connection is required)
bash -c "$(curl -s https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/download-center/opensource/oceanbase-all-in-one/installer.sh)"
source ~/.oceanbase-all-in-one/bin/env.sh

# quickly deploy OceanBase database
obd demo

🐳 Start with docker

Note: We provide images on dockerhub, quay.io and ghcr.io. If you have problems pulling images from dockerhub, please try the other two registries.

  1. Start an OceanBase Database instance:

    # Deploy a mini standalone instance.
    docker run -p 2881:2881 --name oceanbase-ce -e MODE=mini -d oceanbase/oceanbase-ce
    
    # Deploy a mini standalone instance using image from quay.io.
    # docker run -p 2881:2881 --name oceanbase-ce -e MODE=mini -d quay.io/oceanbase/oceanbase-ce
    
    # Deploy a mini standalone instance using image from ghcr.io.
    # docker run -p 2881:2881 --name oceanbase-ce -e MODE=mini -d ghcr.io/oceanbase/oceanbase-ce
  2. Connect to the OceanBase Database instance:

    docker exec -it oceanbase-ce obclient -h127.0.0.1 -P2881 -uroot # Connect to the root user of the sys tenant.

See also Docker Readme for more details.

☸️ Start with Kubernetes

You can quickly deploy and manage an OceanBase Database instance in a Kubernetes cluster with ob-operator. Refer to the document Quick Start for ob-operator for details.

👨‍💻 Start developing

See OceanBase Developer Document to learn how to compile and deploy a manually compiled observer.

Roadmap

For future plans, see Product Iteration Progress. See also OceanBase Roadmap for more details.

Case study

OceanBase serves more than 1,000 customers across industries including Financial Services, Telecom, Retail, Internet, and more, and has helped them upgrade their databases.

See also success stories and Who is using OceanBase for more details.

System architecture

Introduction to system architecture

Contributing

Contributions are highly appreciated. Read the development guide to get started.

License

OceanBase Database is licensed under the Mulan Public License, Version 2. See the LICENSE file for more info.

Community

Join the OceanBase community via:

obdiag's People

Contributors

chyff, dependabot[bot], duzp111, ob-robot, teingi, wayyoungboy


obdiag's Issues

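[Feature]: Generate a visual flame graph directly from the data obdiag collects

Describe your use case

The data currently collected by the obdiag gather perf command is raw perf data, which is inconvenient for users to view directly.

Describe the solution you'd like

We hope the data collected by obdiag gather perf can be processed further so that users can view the flame graph easily.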

Describe alternatives you've considered

No response

Additional context

No response

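[Feature]: Package the project as a binary to avoid depending on the Python in the environment

Describe your use case

The tool depends on python2 or python3 being present in the environment. Imposing this requirement on the environment is not suitable.

Describe the solution you'd like

Package the project as a binary to remove the dependency on the installation environment.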

Describe alternatives you've considered

No response

Additional context

No response

[Feature]: add more tasks about observer_env

Describe your use case

add more tasks about observer_env

Describe the solution you'd like

add more tasks about observer_env

Describe alternatives you've considered

No response

Additional context

No response

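[Feature]: Diagnostic scenario 4: troubleshooting OceanBase cluster major compaction (merge) problems

Describe your use case

Problems such as a stuck merge or a merge timeout take a lot of experience to troubleshoot, and it is hard to know where to start.

Describe the solution you'd like

We hope obdiag can support troubleshooting in merge scenarios.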

Describe alternatives you've considered

No response

Additional context

No response

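[Feature]: Support version inspection for OceanBase and its surrounding products (such as OBPROXY) and report version risks

Describe your use case

OceanBase and its surrounding products (such as OBPROXY) keep evolving, and some versions have known bugs. We hope the inspection tool can detect these versions and warn the user.

Describe the solution you'd like

We hope obdiag can support version inspection for OceanBase and its surrounding products (such as OBPROXY) and report version risks.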
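
A version inspection of this kind could be sketched as follows. This is only an illustration under stated assumptions: the `RISKY_RANGES` table, its version ranges, and its notes are placeholders invented for the example, not real known-bug data, and `check_version` is a hypothetical helper rather than an obdiag function.

```python
def parse_version(text):
    """Turn a dotted version such as '4.1.0.1' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_version(component, version, risky_ranges):
    """Return the notes of every risky range (inclusive) that contains `version`."""
    current = parse_version(version)
    return [
        note
        for low, high, note in risky_ranges.get(component, [])
        if parse_version(low) <= current <= parse_version(high)
    ]


# Placeholder data for illustration only; these ranges are NOT real advisories.
RISKY_RANGES = {
    "observer": [("4.0.0.0", "4.0.0.9", "placeholder note A")],
    "obproxy": [("4.1.0.0", "4.1.0.5", "placeholder note B")],
}

print(check_version("obproxy", "4.1.0.1", RISKY_RANGES))
print(check_version("observer", "4.2.0.0", RISKY_RANGES))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "4.10" would sort before "4.9".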

Describe alternatives you've considered

No response

Additional context

No response

[Feature]: Support check the cluster of oceanbase

Describe your use case

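The goal of the inspection module is to analyze the user's cluster with multiple check items built from existing cases, find the causes of problems that already exist or that may lead to cluster anomalies, and provide operations advice.

Describe the solution you'd like

Support writing inspection items in YAML, and inspect the cluster using the rules written in YAML.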

Describe alternatives you've considered

No response

Additional context

No response

[Doc]: Who is using Oceanbase Diagnostic Tool?

Check Before Asking

  • Please check the issue list and confirm this issue is encountered for the first time.

Description

Who is using OceanBase Diagnostic Tool

We'd like to thank everyone in this community for your constant support of OceanBase. We also sincerely hope that the OceanBase Diagnostic Tool can help community partners locate the root cause of problems encountered when using OceanBase. We're confident that, with our effort and your support, the OceanBase Diagnostic Tool can grow more prosperous and serve a greater number of users.

Our Intentions

  1. Help users of OceanBase to gather required information when OceanBase fails;

  2. Help users of OceanBase quickly analyze the cause of failure;

  3. We hope that while OceanBase Diagnostic Tool helps you, you can also participate and contribute together.

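Who is using OceanBase Diagnostic Tool

Thank you to every partner in the community who follows and uses OceanBase, for your trust in OceanBase and for your support. Every step forward the community takes depends on that support, and we sincerely hope OceanBase Diagnostic Tool helps community partners locate the problems they run into while using OceanBase. We will keep working to make the OceanBase community and ecosystem more prosperous and to grow together with more users and partners.

Purpose of this issue

  1. Help OceanBase users gather the required information when OceanBase fails;
  2. Help OceanBase users quickly analyze the cause of a failure;
  3. We hope that while OceanBase Diagnostic Tool helps you, you will also join in and contribute.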

Documentation Links

No response

Are you willing to submit a pull request?

  • Yes I am willing to submit a pull request.

[Bug]: python3.x obdiag analyze log : "can't decode byte 0xb7 in position"

Describe the bug

obdiag analyze log --since 10m

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 4018: invalid start byte

Environment

python 3.8

Fast reproduce steps

obdiag analyze log --since 10m

2023-11-20 13:58:12,739 [INFO] start parse log analyze_pack_20231120135753/11_162_218_126/observer.log
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/python3/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jingshun.tq/project/oceanbase-diagnostic-tool/handler/analyzer/analyze_log.py", line 81, in handle_from_node
    resp, node_results = self.__handle_from_node(args, ip, user, password, port, private_key,
  File "/home/jingshun.tq/project/oceanbase-diagnostic-tool/handler/analyzer/analyze_log.py", line 160, in __handle_from_node
    file_result = self.__parse_log_lines(analyze_log_full_path)
  File "/home/jingshun.tq/project/oceanbase-diagnostic-tool/handler/analyzer/analyze_log.py", line 309, in __parse_log_lines
    for line in file:
  File "/usr/local/python3/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 4018: invalid start byte
2023-11-20 13:58:12,746 [INFO] [chan 12] sftp session closed.

Analyze OceanBase Online Log Summary:

+--------+----------+------------+-------------+-----------+---------+---------+------------+--------------------+-------------------+-------------+
| Node | Status | FileName | ErrorCode | Message | Count | Cause | Solution | First Found Time | Last Found Time | Trace_IDS |
+========+==========+============+=============+===========+=========+=========+============+====================+===================+=============+
+--------+----------+------------+-------------+-----------+---------+---------+------------+--------------------+-------------------+-------------+
For more details, please run cmd 'cat analyze_pack_20231120135753/result_details.txt'
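
The traceback shows the failure happening while iterating over a log file opened as strict UTF-8. One possible way to tolerate stray bytes, sketched here under the assumption that replacing undecodable bytes is acceptable for log analysis, is to open the file with `errors="replace"`. The helper name `iter_log_lines` and the demo file are hypothetical; this is not the fix actually adopted in obdiag.

```python
import os
import tempfile


def iter_log_lines(path, encoding="utf-8"):
    """Yield lines from a log file without raising on undecodable bytes."""
    # errors="replace" turns each invalid byte into U+FFFD instead of raising
    # UnicodeDecodeError, so one bad byte does not abort the whole analysis.
    with open(path, "r", encoding=encoding, errors="replace") as log_file:
        for line in log_file:
            yield line.rstrip("\n")


# Demo file (made up) containing the byte from the report, 0xb7.
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(b"[INFO] ok line\n[WARN] bad byte \xb7 here\n")
    demo_path = tmp.name

demo_lines = list(iter_log_lines(demo_path))
os.remove(demo_path)
print(demo_lines)
```

The trade-off is that the replaced characters are lost; if the exact bytes matter, reading in binary mode and decoding per line would be an alternative.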

Expected behavior

No response

Actual behavior

No response

Additional context

No response

[Bug]: fix task about swap

Describe the bug

The swap check behaves abnormally.

Environment

1.5.1

Fast reproduce steps

Run the check directly to reproduce.

Expected behavior

No response

Actual behavior

No response

Additional context

No response

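[Feature]: Support end-to-end diagnostics based on trace.log in OceanBase 4.x

Describe your use case

OceanBase Database is a distributed database, so its call chains are complex. When a timeout occurs, it is often impossible to tell quickly whether the cause lies in an internal OceanBase component or in the network, and operations staff can only analyze the problem from experience and the observer logs. The OB kernel added the trace.log log in 4.x, which can be used for end-to-end diagnostics.

Describe the solution you'd like

trace.log entries are scattered across the nodes. We hope obdiag can collect the log entries for a given trace_id, aggregate them, and generate the span tree for end-to-end diagnostics.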
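
The aggregation step requested here could look roughly like the sketch below. It assumes the trace.log records for one trace_id have already been collected from every node and parsed into dictionaries; the field names (`span_id`, `parent_id`, `name`, `node`, `start_ts`) and the sample span names are hypothetical and do not reflect the real trace.log format.

```python
from collections import defaultdict


def build_span_tree(spans):
    """Split spans into roots and a parent_id -> children mapping."""
    by_id = {span["span_id"]: span for span in spans}
    children = defaultdict(list)
    roots = []
    for span in spans:
        parent = span.get("parent_id")
        if parent in by_id:
            children[parent].append(span)
        else:
            # No parent, or the parent record was not collected: treat as root.
            roots.append(span)
    return roots, children


def render(roots, children, depth=0):
    """Return indented text lines, ordering siblings by start time."""
    lines = []
    for span in sorted(roots, key=lambda s: s["start_ts"]):
        lines.append("{}{} ({})".format("  " * depth, span["name"], span["node"]))
        lines.extend(render(children[span["span_id"]], children, depth + 1))
    return lines


# Made-up spans gathered from two nodes for a single trace_id.
spans = [
    {"span_id": "b", "parent_id": "a", "name": "remote_task", "node": "10.0.0.2", "start_ts": 2},
    {"span_id": "a", "parent_id": None, "name": "query", "node": "10.0.0.1", "start_ts": 1},
    {"span_id": "c", "parent_id": "a", "name": "local_task", "node": "10.0.0.1", "start_ts": 3},
]
print("\n".join(render(*build_span_tree(spans))))
```

Spans whose parent was not collected are kept as extra roots, so a partially gathered trace still renders instead of silently dropping records.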

Describe alternatives you've considered

No response

Additional context

No response

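[Feature]: When disk usage is high, support one-click diagnosis that suggests reasonable migration statements

Describe your use case

Failure scenario: when disk usage on some nodes of an OceanBase cluster is too high, for example above 85%, continuing to write without handling it in time causes a cluster failure.

Describe the solution you'd like

When disk usage on some nodes of an OceanBase cluster is too high, for example above 85%, continuing to write without handling it in time causes a cluster failure. If other nodes in the cluster have low disk usage, an emergency measure is to migrate units so that data moves from the nodes with high disk usage to the nodes with low disk usage. However, deciding which units can be migrated where is tedious. We hope the diagnostic tool can turn this unit migration decision logic into tooling and generate the migration statements directly.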

Describe alternatives you've considered

No response

Additional context

No response

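[Feature]: Adjust the alert level and error message for the kernel parameter max_map_count

Describe your use case

When kernel parameter settings cause cluster problems, it is hard to locate the cause quickly with the one-click inspection. Most alerts are at the warning level, which makes it hard to pick out the critical configuration items.

Describe the solution you'd like

For critical kernel parameters that are misconfigured, set a higher alert level so the problem can be located quickly. The alert message could describe how the configuration item affects the cluster.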

Describe alternatives you've considered

No response

Additional context

No response

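[Feature]: After a problem is diagnosed, let the tool generate recovery SQL through a script, or run a script that recovers automatically

Describe your use case

We sometimes run into defects that require running certain commands to restore the environment once they are found. Capturing this kind of experience in the tool would save operations staff a lot of time and reduce the chance of operator error.

Describe the solution you'd like

The tool could either perform the recovery automatically or generate SQL to help the user recover. Which approach is more suitable is open for discussion.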

Describe alternatives you've considered

No response

Additional context

No response

[Feature]: change the way of check to get system parameters

Describe your use case

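Kernel parameters are currently read with sysctl -n. Some hosts disable remote execution of sysctl, so the implementation needs to change to reading with cat /proc/sys/.

Describe the solution you'd like

  1. Replace each "." in the input kernel parameter name with "/"
  2. Change the underlying implementation to cat /proc/sys/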
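
The two steps above can be sketched as a small helper. This is a minimal local illustration, not obdiag's implementation: the function names are hypothetical, and a real check would run the read on the remote host over SSH. Keys whose components themselves contain a dot (for example some VLAN interface names) would need special handling that this sketch omits.

```python
import posixpath


def sysctl_to_proc_path(name, root="/proc/sys"):
    """Map a sysctl key such as 'vm.max_map_count' to its /proc/sys path."""
    # Step 1: replace every "." in the parameter name with "/".
    return posixpath.join(root, name.strip().replace(".", "/"))


def read_kernel_parameter(name, root="/proc/sys"):
    """Step 2: read the value from the file; return None if it is unreadable."""
    try:
        with open(sysctl_to_proc_path(name, root), "r") as handle:
            return handle.read().strip()
    except OSError:
        return None


print(sysctl_to_proc_path("vm.max_map_count"))
print(read_kernel_parameter("vm.max_map_count"))
```

Returning None on failure lets the caller report "parameter unavailable" instead of crashing when a host lacks the file or denies access.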

Describe alternatives you've considered

No response

Additional context

No response

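[Feature]: Clog and merge anomalies when the RAID controller write policy is set to Write back

Describe your use case

After migrating to a device whose RAID controller uses the Write back setting with INTEL SSDSC2BB01 SSDs, clog replay on OB 4.1.0.1 becomes abnormal. This leads to abnormal log synchronization on all nodes, abnormal merges, abnormal backups, and finally nodes that cannot be operated.

Describe the solution you'd like

For now, the only option appears to be simulating consumption to test whether the problem exists and, if it does, prompting the user to switch to write-through or to JOB storage.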

Describe alternatives you've considered

No response

Additional context

No response

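[Enhancement]: Checks fail when run as a regular user

Description

Environment: a standalone deployment of obdiag 1.5. SSH to the hosts uses the admin user, with passwordless SSH configured.
Anomaly 1: none of the sysctl parameters can be read, and all of them report errors. This does not happen with root.
Anomaly 2: commands cannot be executed, although they can actually be run on the host as the admin user.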

[Feat.]: The host does not have the zip tool installed, obdiag gather failed.

Check Before Asking

  • Please check the issue list and confirm this feature is encountered for the first time.
  • Please try full text in English and attach precise description.

Description

The host does not have the zip tool installed. We hope obdiag can support other compression tools.
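
One way to support other compression tools is to pick the archive command from whatever is available on the host. The sketch below assumes the set of available tools has already been detected (for example by probing the host over SSH); `build_pack_command` and its parameters are hypothetical names, not part of obdiag.

```python
import shlex


def build_pack_command(available_tools, src_dir, out_base):
    """Return a shell command that archives src_dir with the first usable tool.

    available_tools: names of tools found on the host (assumed input).
    out_base: archive path without extension.
    """
    src = shlex.quote(src_dir)
    if "zip" in available_tools:
        return "zip -r {} {}".format(shlex.quote(out_base + ".zip"), src)
    if "tar" in available_tools:
        # Fall back to a gzip-compressed tarball when zip is missing.
        return "tar -czf {} {}".format(shlex.quote(out_base + ".tar.gz"), src)
    raise RuntimeError("neither zip nor tar is available on the host")


print(build_pack_command({"tar"}, "gather_pack", "gather_pack"))
```

Quoting the paths with `shlex.quote` keeps the command safe when directory names contain spaces or shell metacharacters.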

Other Information

No response

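[Feature]: Integrate inspection metrics with Prometheus

Describe your use case

When inspecting a cluster, the metrics could be exported to Prometheus to monitor the cluster state. This would also help locate problems quickly when the cluster behaves abnormally.

Describe the solution you'd like

1. Scenario 1: the database suddenly slows down and an inspection is run immediately. The inspection must not make the database even slower, and the report should clearly highlight the affected metrics. If a particular SQL statement has a performance problem, identify that statement; if one of several servers has a problem, identify the problematic server and metric.
2. Scenario 2: monitoring shows that a SQL statement runs very slowly during one time period but quickly at other times. Running the inspection with the slow period as a condition should reveal why the SQL was slow during that period.
3. Scenario 3: during a database failure the database cannot be connected to at all. Can the inspection still run? It should report the metrics that can be inspected without a database connection.
4. Scenario 4: routine operations inspection, covering cluster resources, performance, the engine, and the availability of every small component in the cluster architecture.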

Describe alternatives you've considered

No response

Additional context

No response

[Feature]: fix some bug about step model

Describe your use case

fix: some bug about step model
add: some tasks about sysbench_check

Describe the solution you'd like

add Exception for execution
add tasks

Describe alternatives you've considered

No response

Additional context

No response

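[Feature]: Root cause analysis for a full clog disk on an observer node

Describe your use case

In clusters with many partitions and heavy write pressure, if minor compaction is slow (stuck) or tenant unit specifications are heterogeneous, clog for some replicas may not be reclaimed in time and disk usage can reach 95%. At that point observer automatically stops writes, leaving a large number of replicas on that machine out of sync.

Analyzing this scenario requires a lot of specialized knowledge.

Describe the solution you'd like

We hope obdiag can support one-click root cause analysis for a full clog disk, giving the cause and the fix.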

Describe alternatives you've considered

No response

Additional context

No response

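[Feature]: Support DDL problem diagnosis and diagnostic information collection in obdiag

Describe your use case

DDL troubleshooting currently relies mainly on R&D engineers manually running queries or searching logs, and its efficiency can be improved.

Error scenarios
● Index creation
● table redefinition path
● Dropping columns, table-level restore
● Adding constraints and foreign keys
● Error 4103
● Hang scenarios
● Slow performance scenarios
● Insufficient space

Describe the solution you'd like

We hope to use obdiag to capture the troubleshooting methods for some classic problems as scripts, so that everyone can troubleshoot on their own later.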

Describe alternatives you've considered

No response

Additional context

No response

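[Feature]: Support collecting diagnostic information for a specific SQL execution

Describe your use case

Troubleshooting a SQL problem usually requires the extended plan, the trace log of the execution plan, the sql audit information of the execution, the execution trace log, and similar information.

Describe the solution you'd like

Troubleshooting a SQL problem usually requires the extended plan, the trace log of the execution plan, the sql audit information of the execution, the execution trace log, and similar information. We hope a single command can return all of this diagnostic information.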

Describe alternatives you've considered

No response

Additional context

No response

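[Feature]: obdiag information collection should not be blocked when the cluster is unavailable

Describe your use case

When the cluster is unavailable, obdiag information collection reports an error because it cannot connect to the cluster.

Describe the solution you'd like

Information collection in obdiag should not be blocked when the cluster is unavailable.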

Describe alternatives you've considered

No response

Additional context

No response

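[Feature]: Support information collection, log analysis, and inspection for OB deployed in Docker

Describe your use case

obdiag currently supports information collection, log analysis, and inspection only for OB clusters deployed on physical machines. It does not support OB deployed in Docker.

Describe the solution you'd like

We hope obdiag can support information collection, log analysis, and inspection for OB deployed in Docker.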

Describe alternatives you've considered

No response

Additional context

No response

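[Feature]: Precise collection of diagnostic information based on failure scenarios

Describe your use case

obdiag information collection currently supports collecting each item separately or all items together, but it does not combine items into targeted sets for specific scenarios.

For example:

  • Application errors
  • Full clog disk
  • Backup and restore problems
  • Transaction problems
    ...

Describe the solution you'd like

We hope that for each scenario obdiag can fetch the relevant logs and internal view data, so that troubleshooting does not waste time on back-and-forth exchanges and the collected information is more precise once the scenario is clear.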

Describe alternatives you've considered

No response

Additional context

No response

[Bug]: build.sh failed

Check Before Asking

  • Please check the issue list and confirm this bug is encountered for the first time.
  • Please try full text in English and attach precise description.

Environment

No response

Fast Reproduce Steps

command "cd oceanbase-diagnostic-tool && sh ./build/build.sh" failed

Actual Behavior

command "cd oceanbase-diagnostic-tool && sh ./build/build.sh" failed

Expected Behavior

No response

Other Information

No response

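[Question]: Troubleshooting observer memory exhaustion

Symptoms of memory exhaustion

Memory exhaustion generally shows up in two ways:
1. Requests return -4013/-4030.
2. The log contains text such as oops, alloc failed.

Common diagnostic tables

  • __all_virtual_memory_info
    This table exists in all versions and works at the mod level. A mod is an observer concept used for monitoring, and it is the fastest way to locate the module, or even the code, that a problem belongs to.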
OceanBase(root@oceanbase)>desc __all_virtual_memory_info;
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| tenant_id   | bigint(20)   | NO   | PRI | NULL    |       |
| svr_ip      | varchar(46)  | NO   | PRI | NULL    |       |
| svr_port    | bigint(20)   | NO   | PRI | NULL    |       |
| ctx_id      | bigint(20)   | NO   | PRI | NULL    |       |
| label       | varchar(256) | NO   | PRI | NULL    |       |
| ctx_name    | varchar(256) | NO   |     | NULL    |       |
| mod_type    | varchar(256) | NO   |     | NULL    |       |
| mod_id      | bigint(20)   | NO   |     | NULL    |       |
| mod_name    | varchar(256) | NO   |     | NULL    |       |
| zone        | varchar(256) | NO   |     | NULL    |       |
| hold        | bigint(20)   | NO   |     | NULL    |       |
| used        | bigint(20)   | NO   |     | NULL    |       |
| count       | bigint(20)   | NO   |     | NULL    |       |
| alloc_count | bigint(20)   | NO   |     | NULL    |       |
| free_count  | bigint(20)   | NO   |     | NULL    |       |
+-------------+--------------+------+-----+---------+-------+

This is the most commonly used table, and it can generally locate most problems.

  • __all_virtual_mem_leak_checker_info
OceanBase(root@oceanbase)>desc __all_virtual_mem_leak_checker_info;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| svr_ip      | varchar(46)   | NO   |     | NULL    |       |
| svr_port    | bigint(20)    | NO   |     | NULL    |       |
| mod_name    | varchar(256)  | NO   |     | NULL    |       |
| mod_type    | varchar(256)  | NO   |     | NULL    |       |
| alloc_count | bigint(20)    | NO   |     | NULL    |       |
| alloc_size  | bigint(20)    | NO   |     | NULL    |       |
| back_trace  | varchar(4096) | NO   |     | NULL    |       |
+-------------+---------------+------+-----+---------+-------+

A virtual table dedicated to the mem leak feature. It is empty when the feature is not enabled.

  • __all_virtual_tenant_ctx_memory_info
OceanBase(root@oceanbase)>desc __all_virtual_tenant_ctx_memory_info;
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| tenant_id | bigint(20)   | NO   | PRI | NULL    |       |
| svr_ip    | varchar(46)  | NO   | PRI | NULL    |       |
| svr_port  | bigint(20)   | NO   | PRI | NULL    |       |
| ctx_id    | bigint(20)   | NO   | PRI | NULL    |       |
| ctx_name  | varchar(256) | NO   |     | NULL    |       |
| hold      | bigint(20)   | NO   |     | NULL    |       |
| used      | bigint(20)   | NO   |     | NULL    |       |
| limit     | bigint(20)   | NO   |     | NULL    |       |
+-----------+--------------+------+-----+---------+-------+

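A mod is only a monitoring-level concept, and observer does not actually manage memory with the mod as its basic unit, so the sum of the mod hold values may not match the tenant's memory (for example, because of memory fragmentation or leaks).
The level directly below a tenant is the ctx. This table shows the memory usage of every ctx under each tenant. It takes effect from version 3.2 onward; on earlier versions you can only check the logs.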
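
The mismatch described above can be made visible by comparing the two tables. The sketch below assumes the rows have already been fetched as `(tenant_id, ctx_name, hold)` tuples from __all_virtual_memory_info (one row per mod) and __all_virtual_tenant_ctx_memory_info (one row per ctx); the function name and the sample numbers are made up for illustration.

```python
from collections import defaultdict


def hold_gap(mod_rows, ctx_rows):
    """Return ctx hold minus the summed mod hold for each (tenant_id, ctx_name)."""
    mod_sum = defaultdict(int)
    for tenant_id, ctx_name, hold in mod_rows:
        mod_sum[(tenant_id, ctx_name)] += hold
    return {
        (tenant_id, ctx_name): hold - mod_sum[(tenant_id, ctx_name)]
        for tenant_id, ctx_name, hold in ctx_rows
    }


# Made-up sample rows, in bytes.
mod_rows = [(1002, "DEFAULT_CTX_ID", 300), (1002, "DEFAULT_CTX_ID", 500)]
ctx_rows = [(1002, "DEFAULT_CTX_ID", 1000)]
print(hold_gap(mod_rows, ctx_rows))
```

A large positive gap for a ctx suggests memory that no mod accounts for, which is the fragmentation or leak case the text mentions.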

Logs

Grepping for this tag finds the log entries with tenant, ctx, and mod memory information, printed every 10 seconds.

grep "\[MEMORY\]"

Grepping for this tag finds the log entries with observer process-level memory information, printed every 10 seconds.

grep "\[CHUNK_MGR\]"

Memory exhaustion type | Log message (keyword: [OOPS])
SINGLE_ALLOC_SIZE_OVERFLOW | single alloc size large than 4G is not allowed(alloc_size: %ld)
CTX_HOLD_REACH_LIMIT | ctx memory has reached the upper limit(ctx_name: %s, ctx_hold: %ld, ctx_limit: %ld, alloc_size: %ld)
TENANT_HOLD_REACH_LIMIT | tenant memory has reached the upper limit(tenant_id: %lu, tenant_hold: %ld, tenant_limit: %ld, alloc_size: %ld)
SERVER_HOLD_REACH_LIMIT | server memory has reached the upper limit(server_hold: %ld, server_limit: %ld, alloc_size: %ld)
PHYSICAL_MEMORY_EXHAUST | physical memory exhausted(os_total: %ld, os_available: %ld, server_hold: %ld, errno: %d, alloc_size: %ld)

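At the time of the memory exhaustion, search the log for OOPS to determine which type it is. The brackets are escaped so that grep matches the literal tag rather than a character class:
grep '\[OOPS\]' observer.log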
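
The same classification can be scripted. The sketch below matches the message texts listed above against lines that carry the [OOPS] keyword; the sample lines are made up for the demonstration and are not real observer output, and `classify_oops` is a hypothetical helper.

```python
from collections import Counter

# Message texts copied from the list above, keyed by exhaustion type.
OOPS_MESSAGES = {
    "SINGLE_ALLOC_SIZE_OVERFLOW": "single alloc size large than 4G is not allowed",
    "CTX_HOLD_REACH_LIMIT": "ctx memory has reached the upper limit",
    "TENANT_HOLD_REACH_LIMIT": "tenant memory has reached the upper limit",
    "SERVER_HOLD_REACH_LIMIT": "server memory has reached the upper limit",
    "PHYSICAL_MEMORY_EXHAUST": "physical memory exhausted",
}


def classify_oops(lines):
    """Count [OOPS] lines per exhaustion type; unmatched ones go to UNKNOWN."""
    counts = Counter()
    for line in lines:
        if "[OOPS]" not in line:
            continue
        for kind, text in OOPS_MESSAGES.items():
            if text in line:
                counts[kind] += 1
                break
        else:
            counts["UNKNOWN"] += 1
    return counts


# Made-up sample lines for illustration only.
sample = [
    "[OOPS] tenant memory has reached the upper limit(tenant_id: 1002)",
    "an unrelated log line",
    "[OOPS] physical memory exhausted(os_total: 0)",
    "[OOPS] some other message",
]
print(classify_oops(sample))
```

In practice the lines would come from observer.log restricted to the time window of the incident.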

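[Feature]: In distributed scenarios, find the machine that first raised the error and bring back its packed logs

Describe your use case

During distributed plan execution, one cluster usually spans multiple observer machines. When one observer machine hits an error, the other machines in the cluster, as victims, also log some error messages, which creates noise and makes troubleshooting harder for users.
We would like a new combined command in obdiag that, given a Trace ID, diagnoses the logs of the cluster machines in one step, extracts the address (IP + port) of the observer that reported the error first, and then collects and packs the logs of that machine for the user.

Describe the solution you'd like

Based on the log information in the distributed interaction logic.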
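
A simple starting point for this request, assuming node clocks are synchronized, is to pick the earliest error entry for the given Trace ID across all nodes. The record fields (`ts`, `node`, `level`, `trace_id`) and the sample values below are hypothetical; real observer logs would first need to be parsed into this shape.

```python
def first_error_node(records, trace_id):
    """Return the 'ip:port' of the node with the earliest ERROR for trace_id."""
    errors = [
        record
        for record in records
        if record["trace_id"] == trace_id and record["level"] == "ERROR"
    ]
    if not errors:
        return None
    # Timestamps share one fixed-width format, so string order is time order.
    return min(errors, key=lambda record: record["ts"])["node"]


# Made-up parsed records from three observer nodes.
records = [
    {"ts": "2023-11-20 13:58:12.300", "node": "10.0.0.1:2882", "level": "ERROR", "trace_id": "T1"},
    {"ts": "2023-11-20 13:58:12.100", "node": "10.0.0.2:2882", "level": "ERROR", "trace_id": "T1"},
    {"ts": "2023-11-20 13:58:12.050", "node": "10.0.0.3:2882", "level": "INFO", "trace_id": "T1"},
    {"ts": "2023-11-20 13:58:11.000", "node": "10.0.0.3:2882", "level": "ERROR", "trace_id": "T2"},
]
print(first_error_node(records, "T1"))
```

If clocks can drift between machines, timestamp order alone is not reliable, and the causal order from the distributed interaction logs mentioned above would be needed instead.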

Describe alternatives you've considered

No response

Additional context

No response

[Bug]: get ocp conf error

Describe the bug

get ocp conf error

Environment

1.4.0

Fast reproduce steps

get true data

Expected behavior

No response

Actual behavior

No response

Additional context

No response

[Feature]: [check tool]Support the info_msg field of report in task yaml file

Describe your use case

Support the info_msg field of the report in the task yaml file, so that info-level information can be exposed in the inspection report

Describe the solution you'd like

Can be used to reveal the corresponding value when the step is executed successfully.

Describe alternatives you've considered

No response

Additional context

No response

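[Feature]: Support diagnostics for sysbench benchmark scenarios

Describe your use case

sysbench benchmarking is a common way to stress-test databases during database selection, and it is also a scenario where many problems come up. When a problem occurs (for example, QPS does not meet expectations), the specific cause is unclear. We hope obdiag can support this.

Describe the solution you'd like

We hope obdiag can add an inspection for sysbench benchmark scenarios to the inspection module, to help users find problems with the host, the cluster, or sysbench itself during a sysbench run and so support better sysbench benchmarking.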

Describe alternatives you've considered

No response

Additional context

No response

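[Feature]: Support fault diagnosis for OMS, including information collection, information analysis, and cluster inspection

Describe your use case

OceanBase Migration Service (OMS) is a service provided by OceanBase that supports data exchange between homogeneous or heterogeneous data sources and OceanBase Database. It can migrate existing data online and synchronize incremental data in real time.

We hope obdiag can support analysis in OMS failure scenarios.

Describe the solution you'd like

We hope obdiag can support fault diagnosis for OMS, including information collection, information analysis, and cluster inspection.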

Describe alternatives you've considered

No response

Additional context

No response

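[Feature]: Diagnostic scenario 2: lock conflicts in an OceanBase cluster

Describe your use case

Lock conflicts come up often when troubleshooting business environments. These problems are time-consuming and laborious to investigate, and locating them requires correlating logs with internal virtual tables.

Describe the solution you'd like

We hope obdiag can support detection for lock conflict scenarios, combining logs and internal virtual views to help users find the point of the lock conflict with one command.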

Describe alternatives you've considered

No response

Additional context

No response

[Feat.]: Support running without filling in the host login user name and password

Check Before Asking

  • Please check the issue list and confirm this feature is encountered for the first time.
  • Please try full text in English and attach precise description.

Description

Passwordless SSH trust has already been set up between the hosts. Could obdiag work without requiring the host login user name and password to be filled in?

Other Information

No response
