Toward Database Mesh — the Evolution and the Future of Sharding-Sphere

Apache ShardingSphere
11 min readJun 19, 2018

--

The SOA has always been a focused topic going along with the trends of micro service and cloud native development, but the indispensable data-access-layer supporting SOA is rarely discussed. Although the present relational database cannot support cloud native and distributed database very well, it is undeniable that relational database still plays a very important role on current application system. In consideration of the maturity of relational database itself and surrounding ecosystems, the flexibility of data query, the ability of developer and DBA to master it and the ease of recruiting suitable staff, it cannot be completely replaced by NoSQL or NewSQL in the near future.

In that case, is there an effective solution to manage more and more database vertical sharding in the micro-service architecture and database horizontal sharding for the rapid expansion of data? Can the popular Service Mesh bring some enlightenment to database orchestration?

1 Database Mesh

Database Mesh, a new vocabulary derived from Service Mesh, uses a mesh layer to orchestrate the databases distributed throughout the system. The interaction networks among applications and databases, which is concentrated through mesh layer, is as complex and orderly as a cobweb. At this point, the concept of Database Mesh is not different from Service Mesh. It is called a “Database Mesh”, not a “Data Mesh”, because its primary goal is to mesh the interaction among applications and databases rather than to mesh data stored in databases. Database Mesh is centered on how to connect the distributed data access applications and databases together. It focuses more on the interaction, namely organizing the messy interaction among applications and databases effectively. By using Database Mesh, applications and databases will form a large grid architecture, in which they just need to be put into the right position, for they are all be orchestrated by mesh layer.

(1) Introduction of Service Mesh

Service Orchestration mainly focuses on non-functional requirements such as service discovery, load balancing, dynamic routing, service downgrading and circuit-breaker, APM and SLA collection. Generally, the solutions are proxy model and client model.

Proxy model is based on gateway. The application servers that provide service are hidden behind the gateway, through which all requests will be routed to the backend applications after service orchestration. Nginx, Kong, Kubernetes Ingress are classic cases for this solution. As for client model, service orchestration is provided by class library deployed in the application sand service provider could be visited point-to-point. Example cases are Dubbo and Spring Cloud.

Both the proxy model and the client model have its own advantages and disadvantages.

The advantage of proxy model lies in the gateway address is enough for the application, with complex backend deployment architecture totally hidden. The disadvantage is that the performance and availability of proxy itself is the bottleneck of the whole system, with service crash bringing serious consequence. Moreover, centralized architecture is not compatible with cloud native.

The third model, Sidecar, seems to much more meet the requirements of both zero cost and centerless of cloud native. Sidecar starts with an independent process that can be either shared by all the applications in the same physical server or employed by single application. All service orchestration functions are provided by Sidecar, and the RPC among applications just go through Sidecar. Obviously, Sidecar based on Service Mesh is a better way to implement cloud native; zero cost and centerless make Service Mesh more and more popular.

It is especially powerful combining Sidecar together with Mesos or Kubernetes, which could ensure that Sidecar starts in each host and takes advantage of its dynamic scheduling capability for containers. Kubernetes (Mesos) + Service Mesh = elastic scaling + zero cost + centerless; they work together to set up a cloud infrastructure.

(2) The difference between Database Mesh and Service Mesh

Both similarities and differences exist in the goals of database orchestration and service orchestration. Compared with Service, Database is stateful, which cannot be routed to peer node as Service can be, so data sharding is an important capability for database orchestration solution. Relatively speaking, the automatical discovery ability of database instance is less important, which is also because database is stateful; to start or stop a new database instance often signifies data migration. Certainly, it can be further processed by using replicas, read-write splitting, multi-writing masters and so on. Other functions such as load balancing for multiple slaves, circuit breaker and APM collection can also be used in database orchestration.

As with service orchestration, these three solutions can be also applied to the database orchestration.

The proxy model is to use a proxy server that implements database communication protocols, such as MySQL. Cobar, MyCAT, kingshard and forthcoming Sharding-proxy (part of Sharding-Sphere) are developed in this way. The client model has to be strongly tied to the development language, such as Java, which can be implemented by JDBC or an ORM framework, e.g. TDDL or Sharding-JDBC.

Similarly, for service orchestration, both proxy model and client model have their own advantages and disadvantages. The advantage of proxy model is that it supports heterogeneous language, while its disadvantage is centralized architecture. The advantage of client model is centerless architecture, while the disadvantage is that heterogeneous languages cannot be supported; therefore, it cannot support the CLI and GUI of databases.

Sidecar model can also effectively combine the advantages of proxy model and client model and remove their disadvantages. But is Sidecar based on Service Mesh the same as Sidecar based on Database Mesh? Of course not. The main difference lies in data sharding, a complicated process. If you want transparent sharding action to applications, the general process is to parse SQL, route it to the corresponding databases and tables and eventually merge the result sets to ensure the process logic is still correct during data sharding. It can be seen that the core process of data sharding is SQL parsing → SQL routing → SQL rewriting → SQL executing → results merging. In order to meet the requirement of zero cost into legacy codes, it also needs to encapsulate the execution protocol of SQL. For example, in proxy model, it is necessary to simulate the communication protocol of MySQL or other databases; in client model, it is necessary to implement JDBC interface if using Java.

So, is there a Database Mesh product currently? Unfortunately, not yet. Even Service Mesh, which is popular now, still has a long way to go to ripen its products. Although Linkerd and Envoy are available in production environment, a new generation product — Istio has attracted industry attention with a better architecture (unfortunately it is not yet available in production environment). As the extension of Service Mesh, Database Mesh is still in its early stage of development.

2 Sharding-JDBC & Sharding-Sphere

Sharding-JDBC was open sourced by DangDang company in 2016. Initially, it was a database middleware that implements data sharding on JDBC layer of Java. This year, JDF (JingDong Finance) Cloud has decided to promote Sharding-JDBC as a main product. Therefore, Sharding-JDBC needs to have more new features. For the cloud, Database Mesh is undoubtedly a possible trend. However, service supported only by JDBC cannot meet the requirement of user scenario diversity, and it is necessary to create a distributed database middleware ecosystem to support various scenarios. Thus Sharding-Sphere was born. Sharding-JDBC is also incorporated into the ecosystem, and Sharding-Proxy, which is a proxy product with the same core as Sharding-JDBC’s, is developed. Meanwhile, Sharding-Sidecar that is expected to be a pure cloud native database middleware is being incubated.

(1) The Objective of Sharding-JDBC

The ultimate goal of Sharding-Sphere is to make user use databases scattered along various systems as simply as using one database. We hope developers and DBAs could migrate their work to the cloud native environment based on Sharding-Sphere as smoothly as possible. Also,we want to provide a centerless, zero cost and cross-language cloud native solution through Sharding-Sphere.

(2) The Eevolution Pprocess of Sharding-JDBC

Data sharding in JDBC layer has always been the core philosophy of Sharding-JDBC. Its architecture is as follows:

It consists of data sharding, B.A.S.E transaction and database orchestration modules. As the core module, data sharding has implemented SQL parsing, routing, rewriting, executing and result merging completely. However, as a product that aims to serve the cloud native, providing services only in the JDBC layer is far from enough.

The three subprojects of Sharding-Sphere, namely Sharding-JDBC, Sharding-Proxy and Sharding-Sidecar, form a sharding ecosystem and provide more targeted and differentiated services for different requirements and environments. At present, Sharding-JDBC has been stable, Sharding-Proxy will be released soon, and Sharding-Sidecar has been on the developing schedule. It is not complicated to adjust the Sharding-Proxy architecture because the core function of data sharding has been implemented; data sharding logic of Sharding-JDBC is used in the core code of Sharding-Proxy, implementing MySQL protocol externally, and compatible protocol of other databases will be also supported afterwards. The architecture diagram is as follows:

The birth of Sharding-Proxy makes it possible that DBAs could operate databases through Sharding-JDBC. Sharding-JDBC does not need to be redirected through proxy, so its performance is better. Preferably, Sharding-Sphere ecosystem can be used with the following mixed scheme:

Sharding-JDBC is used by application on production environment to directly connect database to get the best performance, and users could use MySQL CLI or GUI to connect Sharding-Proxy to simply query data or to execute various DDL statements. Sharding-JDBC and Sharding-Proxy share the same registry center, which is managed by Sharding-Administrator and can automatically push configuration changes to JDBC and Proxy. If the number of connections increases sharply due to too much sharding databases, you can directly use Sharding-Proxy to effectively control the number of connections.

In the near future, Sharding-Sidecar will also be available, and its architecture is:

Database Mesh based on Sharding-Sidecar and Service Mesh complement each other. The interaction among services is taken over by Service Mesh Sidecar, and the SQL-based database access is handled by Sharding-Sidecar. For business applications, neither RPC nor database access needs to care about its actual physical deployment structure to make zero cost come true. Sharding-Sidecar lives with the physical server; therefore, it is not a static IP, but completely dynamic and elastic, without a center node in the whole system. You can also use Sharding-Proxy acting as a static entry to do some data operations through various CLI or GUI.

(3)Unravel the Secrets of Sharding-JDBC

Since the birth of Sharding-Sphere, its user manual has been relatively perfect; as it has always been in a state of rapid development, we rarely reveal its internal implementation details. Although Sharding-Sphere is open source, it is obviously difficult for end users to read all the details when determine a third-party project. Due to limited space, we cannot analyze all source code of Sharding-Sphere; therefore, we’d like to answer some common and important questions for our users here.

① 1. Q: How does Sharding-Sphere parse SQL? Is there any performance problem?

A: Sharding-Sphere uses lexer + parser to parse SQL: split SQL into token list firstly and parse them according to SQL syntax. SQL parsing of Sharding-Sphere extracts parsing context for sharding directly, such as Tables, Select Items, Conditions, Order Items, Group By Items, Limit and so on, instead of producing an AST (Abstract Syntax Tree). The parsing context is used by route engine directly, which avoids a second traversal of AST and further improving the performance. A relatively complex SQL parsing of Sharding-Sphere requires about 10ms, several times or even more than 10 times faster than a JavaCC-based SQL parser like JSqlParser.

② Q: Does Sharding-Sphere need to take all the all of data into memory for some operations such as paging, sorting, and grouping queries? If it needs to, will it to cause out of memory?

A: Sharding-Sphere uses stream merger and memory merger to merge data. Stream merger does not consume memory; every call for next() method will make the cursor move down one position, of which the principle is the same as ResultSet of JDBC. Memory merger consumes memory; when memory merger is used, all the data in ResultSet should be loaded into memory to merge. Sharding-Sphere uses memory merger only in one case that both ORDER BY and GROUP BY existing in one SQL but in a different order. When LIMIT, ORDER BY only, GROUP BY only or ORDER BY coexisting with GROUP BY in the same order appears in one SQL, stream merger is selected which will not occupy extra memory. It is difficult to explain all the detailed implementation in this article; we will provide more articles for further analysis. In short, we have done a lot to optimize the kernel of Sharding-Sphere.

According to incomplete statistics, scores or hundreds of companies and organizations are using Sharding-Sphere, including some well-known enterprises. In fact, we hope that Sharding-Sphere could bring sufficient confidence to users who choose us.

(4) Roadmap

We are confident that Sharding-Sphere will continue growing rapidly in 2018; and the future planning focuses on the following four aspects.

① Cloud native. Sharding-JDBC has been pretty mature, Sharding-Proxy is going to be released recently, and Sharding-Sidecar will be also put on the schedule in the near future. Embracing all the three models, Sharing-Sphere will shine in all directions of cloud native.

② SQL compatibility. At present, the SQL kernel of Sharding-Sphere supports most of the statements, such as DQL, DML, DDL, DCL and MySQL admin, except for subquery and OR statement. As the core module of a database product, we will improve the SQL compatibility of Sharding-Sphere as much as possible to achieve the best compatibility of legacy code.

③ B.A.S.E Transaction & Data Orchestration. Database middleware should focus on data sharding, distributed transaction and database orchestration. However, Sharding-Sphere mainly emphasizes on data sharding with little effort on the other two aspects. Thus, we will gradually pay more attention to B.A.S.E transaction and database orchestration while improving data sharding. Database Mesh is to orchestrate the data access layer and database. Currently, Sharding-Sphere has a lot to do to orchestrate database. In addition, we will also improve B.A.S.E transactions of Sharding-Sphere, so that it can automatically roll back data through parsing BINLOG.

④ APM collection. At present, Sharding-Sphere has supported open tracing protocol and got official approval. It has been also integrated with another well-known APM project, SkyWalking; by using SkyWalking, you can directly view and analyze the APM of Sharding-Sphere.

If you are interested in Sharding-Sphere, welcome to the official website:
http://shardingsphere.io/

or github:
https://github.com/sharding-sphere/sharding-sphere/

Sharding-Sphere, consisting of three independent products, namely Sharding-JDBC, Sharding-Proxy, and Sharding-Sidecar, is an ecosystem of open source distributed database middleware solutions. They all provide standardized data sharding, read-write splitting, flexible transactions and data management functions, which can be applied to various application scenarios such as Java isomorphic and heterogeneous languages, containers and cloud natives.

For more information, you could visit our new website:

http://shardingsphere.io/

or Twitter: @ShardingSphere

If you like it, please give us a star on Github as encouragement.

The project address:

https://github.com/sharding-sphere/sharding-sphere/

https://gitee.com/sharding-sphere/sharding-sphere/

Thank you very much.

--

--

Apache ShardingSphere
Apache ShardingSphere

Written by Apache ShardingSphere

Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database. https://linktr.ee/ApacheShardingSphere

No responses yet