Sunday, April 15, 2018

MyBatis Paging, Sorting and Filtering




When considering which database persistence framework to use to complete the goal of development, we must take in consideration several factors:
  • Knowledge of your team and ability of the team leads to help out with problems
  • Available documentation
  • Maturity of the framework
  • Availability of the helper classes or supporting frameworks (e.g. how much custom functionality we need to create)
  • Underlying structure of the database and it's complexity (if exists)
  • Portability (if required)
  • "Top down" vs "bottom up" approach driven by either business requirement or existing technology
  • Whether we want or have to write SQL statements and how complex they need to be to fulfill the business goals
  • Do we "own" the data model or is this vendor maintained data model
As you can see, reaching the decision of what to use can be based on the  experience or trial and run errors.

In one of my recent projects, we reached the decision to use MyBatis as the framework that will enable us to fulfill most of the goals set upon us by the business and existing applications and database topology. In the enterprise environment, you may be faced with the decisions that span not only to your immediate application, but that may involve several others and their data models. We needed to do just that. Read data from the various data sources, integrate with several different web services (both REST and SOAP) and provide unified DB and API interface (facade). This interface, needed to bring web services and various databases together to work as one and with improved performance. Introducing ESB layer was needed but also we needed to create uniform data model (that we can model our facade objects on). MyBatis was perfect tool for this. The only problem we faced was the lack of dynamic paging, sorting and filtering (PSF) functionality, given that we needed to combine results from the different databases (where some of them had awkward design, to say the least). Hibernate was dead in the water here. We ended up using PL/SQL and SQL from various sources, using pipelines and extensive logic to bring order into the data models. This solution worked very well, and in the end, we only needed to implement and expose the Search + PSF to the API and clients. We chose this supporting framework to help us build dynamic additions to the query: squiggle-sql. In their own words:
Squiggle is a little Java library for dynamically generating SQL SELECT statements. It's sweet spot is for applications that need to build up complicated queries with criteria that changes at runtime. Ordinarily it can be quite painful to figure out how to build this string. Squiggle takes much of this pain away.
 This worked perfectly as we could expose REST API JSON with PSF parameters and reuse this through various interfaces. To avoid SQL Injection (as we were building these custom), we needed to use enumerations to match exact constructs, avoid certain risky operations like OR 1=1 or comments in filters, limit the field types and lengths. Overall, we achieved a good mix of security and usability with flexible interface.

We started first by defining the interface in JSON that must have Paging (or streaming), Sorting and Filtering functionality. Again, it is important to limit any functionality with Constants or Enums to make sure all constructs can exactly be matched to operations or underlying supporting Beans. This is important security feature.

Executing Paging in database is relatively straight forward in Oracle by adding this OFFSET <x> ROWS FETCH FIRST <y> ROWS ONLY. The other thing that is needed is to get total number of rows returned from database in order for GUI to calculate how many rows are present. This generally requires executing the statement second time without the row limiting clause or we could write a WITH statement in Oracle and execute once and link into broader query that can execute count on first occurrence and limit rows on second. Important constraint is that the page can start with 0 and up and you should not allow returning of huge pages. If you require a single result that has all records, this may be achieved by secondary API that is limited in use and users that can access it. The reason is that if parallel multiple request are executed on a huge underlying data set, it may lead on DoS type of attack.

Sorting can have multiple columns, so this means that we must match any columns for the sort with ascending or descending parameters and participating Bean properties. Sorting is added as ORDER BY clause. Important feature is to match ORDER BY clause by Enum and columns by underlying Bean. Bean that is exposed to the API should only carry fields that are necessary for API to function properly and to satisfy a business need. Any other database functionality regarding the tables should be hidden behind a services and transfer objects (ref). Order clause can be applied to the complex queries, e.g. multiple set joined by creating an encapsulating select statement around original request.

Filtering may be the trickiest one to implement due to wide range of criteria that can be applied. Same as before, using Enums to define operations and limiting search capability to the filter to use only single fields names and only AND operation is important for the aspect of security and speed. Allowing OR or free statement entry may be dangerous option for execution.

Overall, what we achieved is that now MyBatis has a potential to achieve some of the things that Hibernate has out-of-the-box.