Over the past 30 years, Java has significantly changed the way applications interact with databases. In the early days, Java applications typically used a client/server model or a three-tier architecture, where SQL statements were often embedded as string literals within the application. These statements were sent to the database through straightforward interfaces such as JDBC.
As programming paradigms shifted towards Object-Oriented Programming (OOP), Java developers began to integrate business logic more seamlessly into data structures. This move represented a shift from purely database-centric models to application-first designs. This transition enabled data abstraction through application programming interfaces (APIs), allowing SQL database tables and rows to be represented as classes and objects within the application. As a result, data sharing and persistence also evolved, moving from strictly normalized relational tables to document-oriented databases that align more closely with object-oriented structures and specific use cases.
The role of databases has evolved from being the central repository for enterprise information systems to becoming a component of a wider application ecosystem. Nowadays, databases are designed with specific services and access patterns in mind. They often function within more complex architectures, where data is transferred to other databases to support various distinct services.
Pre-Java Era: Interacting with Databases
Before Java’s emergence, database interactions were mainly handled through SQL statements that were deeply integrated into applications. In this respect, the SQL standard has not changed since SQL-92, which outlines four primary methods for executing SQL statements.
Two methods focused on end users interacting through a command-line interface (CLI), the kind of users we would today call data analysts or data scientists:
- Direct Invocation: SQL statements are executed directly via command-line interfaces with literal values. This method allows for instant command execution without additional code.
- Dynamic SQL Statements: These are prepared with placeholders (bind variables or parameters), allowing specific values to be bound before execution. This approach enables reusability and efficiency by preparing a statement once and executing it multiple times with different values but the same compiled execution plan.
Two methods were developed to program and execute SQL queries from a procedural language; today, we might refer to them as a persistence-layer API:
- Stored Procedures: These are predefined SQL statements packaged into procedures within the database module, which can be deployed alongside database tables. Stored procedures are optimal for encapsulating complex SQL logic and are executed within the database server environment.
- Embedded SQL: An extension of a host programming language that requires preprocessing by a precompiler to translate it into stored procedures or low-level database calls. This method was typical in database-centric deployments using a limited set of programming languages but fell out of favor with more modern architectures.
By the late 1980s and early 1990s, the shift towards client-server architectures broadened development possibilities, introducing graphical user interfaces (GUIs) and more flexible runtime environments. This change encouraged developers to select their preferred programming languages, rendering stored procedures’ proprietary languages and embedded SQL precompilers less practical. Applications increasingly relied on dynamic execution of SQL text sent via lightweight APIs like Open Database Connectivity (ODBC).
ODBC standardized the use of SQL in applications by sending SQL statements as character strings through a limited set of methods to connect, prepare, bind, execute, and fetch. This abstraction enabled rapid development but introduced risks, such as inefficiencies and vulnerabilities like SQL injection when user inputs were mishandled. Hence, prepared statements, which separate SQL syntax from data inputs through placeholders (e.g., question marks), became the best practice. This approach safeguards against injection attacks and enhances performance by parsing and compiling SQL statements only once upon preparation.
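The injection risk can be made concrete with a minimal sketch. The `users` table, the `naiveQuery` helper, and the attack value are all hypothetical; only the string handling is shown, with the prepared-statement alternative indicated in comments:

```java
// Sketch: why concatenating user input into SQL text is unsafe. The users
// table and the search value are hypothetical; only string handling is shown.
public class InjectionDemo {

    // Naive approach: splice the user input straight into the SQL text.
    public static String naiveQuery(String name) {
        return "SELECT * FROM users WHERE name = '" + name + "'";
    }

    public static void main(String[] args) {
        // A crafted input turns the filter into a tautology that matches all rows.
        String attack = "x' OR '1'='1";
        System.out.println(naiveQuery(attack));
        // → SELECT * FROM users WHERE name = 'x' OR '1'='1'
        //
        // A prepared statement keeps the SQL text fixed and binds the value:
        //   SELECT * FROM users WHERE name = ?
        // The input is then passed as data, never parsed as SQL.
    }
}
```

With the placeholder form, the statement is parsed and compiled once at preparation, and every execution only supplies values, which is exactly the performance benefit described above.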
ODBC thus employed the dynamic methods designed for user interaction rather than those meant for programming languages, gaining agility in exchange. The database client became the application runtime instead of a user, yet the exchange followed the same chatty protocol: an interactive sequence of statements and responses. ODBC does not present the database as a data-persistence service; instead, it defines a protocol for program communication, similar to what a command-line interface would send following user interactions.
Embedding SQL as character strings in application code challenges development and evolution. The practice prevents compile-time error detection and requires keeping the application code synchronized with any modification to the database structure. The schema is defined in the database using Data Definition Language (DDL) but used in the application through Data Manipulation Language (DML), a process involving different tools and teams. Unlike stored procedures or precompilers, no syntax or type checking occurs before runtime.
Java and the Advent of JDBC
JDBC, introduced in 1997 as part of JDK 1.1, defined how Java applications connect with database systems. It brought a standardized mechanism similar to ODBC, providing Java with a versatile toolset for accessing different kinds of databases using SQL. This innovation was key in abstracting the complexities of database protocols, thereby maintaining Java’s principle of cross-platform compatibility in data access.
One of JDBC’s fundamental contributions was providing a consistent set of methods to establish connections, execute SQL statements, retrieve results, and manage transactions. This standardization enabled application developers to focus more on logic and less on the underlying protocols of various database systems. Vendors subsequently created specific JDBC drivers, which were crucial to this ecosystem. These drivers varied significantly, ranging from layers on top of ODBC to dedicated Java implementations of the database’s client interface. Java was multi-platform, and JDBC introduced a client/server API standard for all databases. However, like many such abstractions, it had to be confined to the lowest common denominator of all databases, resulting in a limited interface for navigation through cursors or mapping data types between the application runtime and the database engine. It maintained the illusion of an SQL standard that allowed for building database-agnostic applications, but all SQL dialects are different, and JDBC transmits SQL as-is.
Java’s orientation toward object classes representing business entities conflicted with the relational database model of tables and rows. This paradigm difference, known as the “impedance mismatch,” underscored the challenges of communicating with the database using dynamic SQL through JDBC. Developers often had to craft SQL queries as hardcoded strings with placeholders, bind scalar values to execute them, and process a flat result set where rows are fetched and columns are indexed. This method lacked semantic consistency between the object and relational models and provided no compile-time checking, increasing the likelihood of runtime errors and SQL injection vulnerabilities. The tedious process of mapping two-dimensional result sets to complex object graphs further exposed these limitations. To obtain results from multiple tables in a relational database, a developer had to choose between joining tables into a single two-dimensional result set, with additional application code to break it into application objects, or executing several queries, which increased database round trips and hurt performance and scalability.
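The first option, joining and then regrouping, can be sketched in plain Java. The `Order`/`OrderLine` model and the hard-coded `Object[]` rows are hypothetical stand-ins for a joined JDBC result set:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: regrouping a joined, flat result set into an object graph.
// The Order/OrderLine model and the hard-coded rows are hypothetical;
// a real application would fetch the rows through JDBC.
public class FlatToGraph {

    public record OrderLine(String product, int qty) {}
    public record Order(int id, List<OrderLine> lines) {}

    // Each Object[] stands for one row of:
    //   SELECT o.id, l.product, l.qty FROM orders o JOIN order_lines l ...
    public static List<Order> regroup(List<Object[]> rows) {
        Map<Integer, Order> byId = new LinkedHashMap<>();
        for (Object[] row : rows) {
            byId.computeIfAbsent((Integer) row[0],
                                 id -> new Order(id, new ArrayList<>()))
                .lines()
                .add(new OrderLine((String) row[1], (Integer) row[2]));
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Object[]> rows = List.of(
            new Object[] {1, "pen", 2},
            new Object[] {1, "ink", 1},
            new Object[] {2, "pad", 5});
        // Three flat rows collapse into two Order objects.
        System.out.println(regroup(rows));
    }
}
```

Every entity and every join in the application needs a variation of this boilerplate, which is precisely the tedium ORMs later set out to remove.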
Database vendors, cautious about preserving established SQL paradigms, emphasized stored procedures and embedded SQL as efficient ways to co-locate logic and data within databases. However, these approaches conflicted with modern development practices that advocate a clearer separation between business logic and data persistence. Consequently, vendors did not adapt SQL and the RDBMS to keep pace with advances in Java and object-oriented analysis and programming. They opted to adhere to the two methods outlined in the SQL standard: in-database stored procedures and embedded SQL. They argued that the data logic must be deployed to the database. The impedance mismatch was not only between two technologies using different data types and structures. It also strained communication between database administrators, who regarded only their traditional practices as good practices, and application builders, who viewed the SQL database as a barrier to their development agility.
Efforts like SQLJ, developed in the late 1990s, aimed to integrate embedded SQL with Java, as embedded SQL had done with COBOL and C, to improve database interactions within Java applications. SQLJ offered several benefits, including compile-time verification, which helped catch errors early in the development cycle, and performance tuning, which sought to optimize database interactions. However, SQLJ precompilers struggled to gain widespread adoption, as they relied on older procedural programming patterns that often conflicted with Java’s more modern object-oriented approach and design patterns.
As databases advanced, they increasingly adopted features to handle complex data types, such as Object-Relational Database Management Systems (ORDBMS), XML, and JSON support. ORDBMS systems aimed to incorporate elements of object-oriented programming, such as inheritance, polymorphism, and encapsulation, into traditional relational databases. Oracle and PostgreSQL extended their SQL capabilities to support user-defined types, complex data structures, and custom functions. They allowed the schema to be denormalized by adding new datatypes for nested tables, collections, arrays, or documents.
Despite these advancements, ORDBMS systems struggled with scalability and complexity. Maintaining a database-centric approach often resulted in monolithic systems that were challenging to scale and upgrade without significant downtime. This complexity sometimes conflicted with the evolving requirements of application developers, who needed systems that could adapt to changing business needs. In another effort to reintegrate application code into the database, database vendors such as Oracle added a JVM to the database instance to execute stored procedures written in Java. This further intensified the monolithic nature of the RDBMS and made database upgrades more challenging.
Instead of evolving to better integrate with modern applications, SQL databases maintained a database-centric approach. They became more monolithic and challenging to scale and upgrade. They failed to meet the demands of application developers and became more cumbersome to operate. Software development has advanced with the capability to run CI/CD and scale horizontally. However, this didn’t apply to the database, which became the primary problem for application builders.
The ORM Revolution: Hibernate and JPA
The early 2000s marked a significant transition in how Java applications managed database interactions, driven by the emergence of Object-Relational Mapping (ORM) frameworks. At the forefront of this revolution was Hibernate, which significantly eased the complexities of integrating Java objects with database tables. By automating the translation of JDBC result sets into domain-specific objects, Hibernate reduced the tedious manual coding efforts associated with traditional database operations. It also addressed the long-standing impedance mismatch between object-oriented and relational paradigms.
Hibernate’s standout feature was its ability to automatically translate actions performed on Java objects into corresponding database operations through a Hibernate session that synchronizes the state of objects and table rows. This allowed developers to work directly with domain models, as Hibernate tracked object changes and generated SQL statements to persist updates as needed. Its design was flexible enough to accommodate different database capabilities, reordering statements to avoid deadlocks and offering configuration options to adjust prepared-statement use or optimize dynamic updates.
Building on the promise of efficient ORM, the Criteria API and Hibernate Query Language (HQL) provide robust query capabilities without the need to write SQL queries. The Criteria API enables developers to construct queries programmatically using Java criteria, offering type safety, flexibility, and reduced syntax errors. Meanwhile, HQL allows developers to write SQL-like queries operating over the Java object model rather than directly on database tables and views. This allows for different fetching strategies without defining another mapping model. Use cases that know in advance all tables and columns to read can do so in one call to the database, avoiding multiple round trips that hinder performance.
For the numerous use cases that do not require further optimization, Hibernate also introduced features like lazy loading, which deferred data loading until it was explicitly accessed. Although this technique could lead to the notorious N+1 problem by causing multiple database roundtrips, it provided developers with flexibility and simplicity in coding scenarios that lacked critical performance requirements.
In addition to Hibernate, alternatives like iBATIS (now MyBatis) were available. These options enabled developers to retain precise control over SQL while facilitating object mapping of result sets, empowering them to tune query performance and use database-specific features: a practical balance between control and convenience. Nonetheless, Hibernate offered a further benefit: it reduced the number of SQL statements developers had to write as additional use cases and scenarios emerged over the application lifecycle.
Later, jOOQ (Java Object-Oriented Querying) offered another approach to using the data model declared in the database from Java. It follows the SQL semantics by generating type-safe SQL queries directly from Java code. jOOQ bridges the gap between ORM and direct SQL, offering a fluent API that provides compile-time verification of SQL syntax and enhanced control over database interactions.
Other programming languages have implemented similar approaches, but solutions have often been driven by Java first. Meanwhile, Hibernate became so popular that a need for a standardized ORM framework emerged. The Java Persistence API (JPA) was introduced, providing a set of standard annotations and interfaces that allowed for ORM integration without a commitment to a specific implementation. While Hibernate quickly became a leading JPA implementation, others like EclipseLink and OpenJPA provided developers with diverse choices.
Data Modeling in ORM
With the Java Persistence API (JPA), the paradigm has fundamentally shifted from database-centric to application-first. Traditionally, the database-centric or bottom-up approach concentrated on defining the data model within the database catalog, where applications would query directly using SQL commands on catalog views or cursor results. This method positioned the database at the core of schema management and application interaction. It represented a continuation of the conventional data modeling approach for monolithic information systems: the data model (Entity-Relationship Diagram) was created first, static, modeling the business entities before analyzing the dynamic access patterns of the application use cases.
However, with the advent of JPA, the location of schema definition shifted to the application itself. Developers utilize Java objects annotated with JPA-specific metadata as the authoritative source for schema generation. This ensures that the Java classes define the database schema, enhancing synchronization between application logic and data structures. With this approach, validation can occur during the application code compilation and deployment of the database objects. There is no risk that a change in the database will break the application at runtime.
The database must still understand the data it stores to validate data integrity and optimize data access. However, the Data Definition Language (DDL) is generated from the application using a top-down, or JPA-first, approach. The previously used bottom-up, or database-first, approach is now rarely employed, typically only to merge with the JPA-generated DDL to create migration scripts. You can read more about it in an article by Andrey Belyaev: https://jpa-buddy.com/blog/db-first-vs-jpa-first/
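As a toy illustration of this top-down direction (not how a real JPA provider’s schema generation works), the sketch below derives a CREATE TABLE statement from a hypothetical Java record by reflection; real providers read JPA annotations and handle many more cases:

```java
import java.lang.reflect.RecordComponent;
import java.util.StringJoiner;

// Toy sketch of the top-down ("JPA-first") direction: derive DDL from the
// Java model. Real providers read JPA annotations and handle far more cases;
// this hypothetical version just reflects over a record's components.
public class DdlSketch {

    public record Customer(long id, String name, boolean active) {}

    public static String createTable(Class<? extends Record> entity) {
        StringJoiner cols = new StringJoiner(", ",
            "CREATE TABLE " + entity.getSimpleName().toLowerCase() + " (", ")");
        for (RecordComponent c : entity.getRecordComponents()) {
            // Map Java types to SQL types (deliberately simplistic).
            String sqlType = switch (c.getType().getSimpleName()) {
                case "long", "int" -> "bigint";
                case "boolean"     -> "boolean";
                default            -> "varchar(255)";
            };
            cols.add(c.getName() + " " + sqlType);
        }
        return cols.toString();
    }

    public static void main(String[] args) {
        System.out.println(createTable(Customer.class));
        // → CREATE TABLE customer (id bigint, name varchar(255), active boolean)
    }
}
```

The point is the direction of the dependency: the Java class is the source of truth, and the DDL is an artifact generated from it.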
Data modeling is essential when using ORM frameworks. Developers must carefully design their entity classes to accurately map the underlying database schema while taking advantage of ORM features. Implementing JPA annotations for a relational database involves establishing one-to-many, many-to-one, and many-to-many relationships and applying relational model concepts to linked or embedded objects. It also requires evaluating access patterns and deciding which data will be retrieved together through eager or lazy loading.

There are two types of developers. One group uses the ORM as an abstraction of the database and never considers what it generates. This allows rapid application development by teams that do not need to be experts in each component of the stack, and it works for applications without critical performance requirements, even though the generated queries may not be optimal. The other group logs the generated SQL queries and analyzes their execution plans to ensure performance and scalability. Those in this latter group may wonder why they must do the work twice: design and query an object model while validating what is executed on a relational model. Once the data is fully described in a model with linked or embedded objects, why not use a database that can store and retrieve it using the same model?
The Rise of NoSQL and Java’s Adaptation
While ORM frameworks provide many benefits, they also have notable drawbacks. Mapping Java objects to relational tables, whether to avoid the complications of native SQL or to avoid one mapping per use case, can lead to significant performance issues, especially when managing object relationships (e.g., one-to-many or many-to-one) in large applications with complex object graphs. A common concern is the N+1 query problem, where retrieving related entities results in excess database queries, severely affecting application performance. To counter this inefficiency, developers often need to implement optimization strategies such as batch fetching, join fetching, and caching. They must avoid transactions that involve multiple interactions with the database, which incur network latency, context switches, and memory to hold transaction state. They should adopt a document-centric approach when designing their code, querying related objects in one SQL or JPQL statement to fetch them together instead of navigating on a per-object or per-table basis while holding a transaction open across those round trips.
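The N+1 effect can be made concrete with a sketch that merely counts round trips; the SQL text and the orders/lines model are hypothetical, and the "database" here is just a counter:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: counting round trips to illustrate the N+1 problem. The SQL text
// and the orders/lines model are hypothetical; the "database" is a counter.
public class NPlusOneDemo {

    private static final AtomicInteger roundTrips = new AtomicInteger();

    private static void query(String sql) {
        roundTrips.incrementAndGet(); // each call stands for one network round trip
    }

    // Lazy loading: one query for the parents, then one per parent.
    public static int lazyLoad(int parents) {
        roundTrips.set(0);
        query("SELECT * FROM orders");
        for (int i = 0; i < parents; i++) {
            query("SELECT * FROM order_lines WHERE order_id = " + i);
        }
        return roundTrips.get(); // 1 + N
    }

    // Join fetching: parents and children retrieved in a single statement.
    public static int joinFetch(int parents) {
        roundTrips.set(0);
        query("SELECT * FROM orders o JOIN order_lines l ON l.order_id = o.id");
        return roundTrips.get(); // always 1
    }

    public static void main(String[] args) {
        System.out.println(lazyLoad(100));  // → 101
        System.out.println(joinFetch(100)); // → 1
    }
}
```

At 100 parents the lazy strategy pays 101 round trips of network latency where one would do, which is why the fetch strategy matters more than the individual query plans.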
A database-centric approach also presents scalability, flexibility, and adaptability limitations, primarily due to the central catalog containing metadata about the application schema. Although SQL and relational databases allow agile schema modifications using data definition language (DDL), and many support transactional or online DDL, this agility diminishes when synchronization with the application is required, and adapting to rapidly changing application needs can become challenging. Some applications adopted an Entity-Attribute-Value (EAV) model to avoid any schema in the database, but this is disastrous for performance because the database has no information with which to optimize access patterns. Furthermore, as web applications grow to accommodate millions of users, relational databases often struggle with horizontal scalability, necessitating complex sharding techniques to distribute data across multiple nodes. While distributed SQL databases have solved this problem for the data, the metadata holding the SQL catalog or dictionary must still be shared as a single point of truth.
To resolve these issues, NoSQL databases have emerged, offering alternative data storage models tailored for applications where the schema is application-defined. In contrast to SQL databases, they retain minimal information, such as index definitions and schema validations, while storing most application data in documents defined by the application. Some NoSQL databases, like key-value stores, only need to know the sharding key, delegating all data access optimizations to the application that explicitly defines and queries the indexes, which is opposite to the data independence principle of relational databases. Other document databases, such as MongoDB, retain more information for transparent index usage, supported by a query planner optimizer. The significant advantage over Object-Relational Mappers lies in the absence of mapping and transformation from objects to multiple tables and matching the transaction bounds to one document.
With an API closer to their objects, Java developers are not required to understand the relational version of the application schema nor examine execution plans, index definitions, or join method algorithms to grasp the complexities behind their query access patterns. A Java object embedding another object is stored as a single document, with subdocuments, all colocated physically and fetched together without requiring multiple random I/O operations.
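A sketch of that colocation, assuming a hypothetical Customer with an embedded Address; the toy `toJson` serializer stands in for a document database's storage format:

```java
import java.util.List;

// Sketch: an object graph persisted as one nested document instead of rows
// spread across normalized tables. Customer/Address are hypothetical, and
// the toy toJson serializer stands in for the database's document format.
public class DocumentDemo {

    public record Address(String city, String zip) {}
    public record Customer(String name, Address address, List<String> tags) {}

    public static String toJson(Customer c) {
        String tags = String.join(",",
            c.tags().stream().map(t -> "\"" + t + "\"").toList());
        return "{\"name\":\"" + c.name() + "\","
             + "\"address\":{\"city\":\"" + c.address().city()
             + "\",\"zip\":\"" + c.address().zip() + "\"},"
             + "\"tags\":[" + tags + "]}";
    }

    public static void main(String[] args) {
        Customer c = new Customer("Ada", new Address("Paris", "75001"),
                                  List.of("vip"));
        // One document, subdocument embedded: one read fetches everything.
        System.out.println(toJson(c));
        // → {"name":"Ada","address":{"city":"Paris","zip":"75001"},"tags":["vip"]}
    }
}
```

In the relational version, the same graph would span a customers table, an addresses table, and a tags table, reassembled by joins on every read.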
Additionally, the database uses the same syntax and semantics as the application. For instance, handling null values in Java differs significantly from SQL, which applies three-valued logic to nullable datatypes. In summary, Java developers can use the database much as they would an ORM, but without the complications of mapping persistence to a different storage paradigm.
As document databases became popular as a persistence layer for business objects defined in Java, the Java ecosystem adapted and proposed ORMs for NoSQL databases, such as Hibernate OGM. However, ORMs were designed for relational databases and made little sense with a document database. The main motivation was to unify the definition of persistence, as JPA had done, but another framework became popular for simplifying interactions with all kinds of databases.
Spring Data: Simplifying Data Access in Java
Spring Data, introduced in 2011, is an integral part of the Spring Framework that addresses data access complexities in Java applications. It provides a unified programming model for interacting with various data sources, including relational databases (via JPA, JDBC) and NoSQL databases such as MongoDB, Cassandra, and Redis. By abstracting the details of database operations, Spring Data allows developers to concentrate on business logic rather than the intricacies of persistence mechanics.
The launch of Spring Data JPA revolutionized ORM persistence by automatically generating repository implementations. This automation significantly reduces boilerplate code and simplifies CRUD operations, thus improving the development speed and maintainability of simple use cases. Over time, Spring Data has expanded to support various NoSQL databases, permitting flexible data modeling across document, key-value, and graph database paradigms.
Key features like pagination, sorting, customizable query methods, and integration with QueryDSL enhance the framework’s ability to manage data retrieval and manipulation efficiently. It implements the best design patterns to avoid inefficient interactions with the database. The 2017 introduction of Spring Data Reactive marked a transition to non-blocking, event-driven architectures, addressing the requirements of modern microservices and cloud-native applications. By promoting scalable application development and aligning with Java’s evolution, Spring Data is crucial in bridging the gap between relational and NoSQL data storage and supports best practices in domain-driven design within contemporary application architectures.
Conclusion
Over the years, Java has dramatically improved the way applications work with databases, shifting from focusing mainly on databases to being more about the applications themselves. At first, Java programs relied heavily on SQL commands directly in their code. The arrival of ORM frameworks like Hibernate and JPA has changed how applications use databases. These frameworks have helped reduce the problems that arise from the differences between relational databases and object-oriented Java programs. They allow developers to work directly with the structure of objects related to their applications and their business logic. One of Hibernate’s key advantages is its ability to optimize database interactions by reducing the number of queries to write, especially for CRUD operations, through caching, lazy loading, and batching.
JDBC alone failed to provide a convenient programming model, which gave rise to ORMs, but Java has also evolved for developers who prefer finer control and write native SQL statements. They can use JDBC templates, introduced in Spring Framework 1.0 (2004), and Spring’s JdbcClient, introduced in Spring Framework 6.1 (2023), to map their results to Java objects. This approach sits between JDBC and ORM. They can also fetch query results into Java records, standardized in Java 16 in 2021, instead of mutable objects, mirroring the rows of tables and views.
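In that spirit, a sketch of a query row landing directly in a Java record; the `Map` row stands in for a JDBC `ResultSet` row, and the `Country` model is hypothetical:

```java
import java.util.Map;

// Sketch: a query row mapped into a Java record, in the spirit of Spring's
// JdbcClient row mapping. The Map row stands in for a JDBC ResultSet row,
// and the Country model is hypothetical.
public class RecordRowDemo {

    public record Country(String code, String name) {}

    // One line per column: the record mirrors the shape of the row.
    public static Country mapRow(Map<String, Object> row) {
        return new Country((String) row.get("code"), (String) row.get("name"));
    }

    public static void main(String[] args) {
        Country c = mapRow(Map.of("code", "FR", "name", "France"));
        System.out.println(c); // → Country[code=FR, name=France]
    }
}
```

Records fit this role well because they are immutable, concise, and, like a row from a table or view, defined entirely by their components.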
As the need for more complex data handling grew, especially regarding scalability and flexibility, NoSQL databases emerged to address the limitations of traditional database systems and their monolithic schema definition. Document-oriented databases, in particular, better matched application-focused structures for Java developers, making it easier to manage data without complicated transactions. Java’s strong ecosystem has continuously adapted to these changes. Frameworks like Spring Data provide a unified way to access data, allowing the smooth integration of SQL and NoSQL systems within one programming style. These changes have helped developers focus more on business rules with greater efficiency while maintaining the ability to change and grow their systems with modern technology.
Java’s important role in this change highlights its ability to adapt and stay in sync with the fast-changing tech world. It effectively links application logic and various types of data storage. As a result, Java has significantly influenced the industry’s move towards more application-centered ways of interacting with data, addressing the growing complexity and variety of modern software development demands. From a database perspective, the evolution of Java over the past 30 years parallels the transition from database-centric monolithic applications through client-server models and finally to more flexible software design and deployment where the data schema is integrated within the application services.