Database Management Systems: Concepts and Techniques

Query Processing:-
1.Parsing and Translation:- Parsing is the first step in query processing. The query is checked for syntax errors and then converted into a parse tree, a representation of the query that is easy for the DBMS to work with; the parse tree is used in the later steps of query processing.
2.Optimization:- After parsing, the DBMS looks for the most efficient way to execute the given query. The optimizer weighs factors such as available indexes, join methods, and other optimization mechanisms to determine the most efficient query execution plan.
3.Evaluation:- Once the best execution plan is found, the DBMS executes the optimized query and returns the results from the database. In this step the DBMS performs the requested operations on the data, such as selecting, inserting, or updating rows.
Query Optimization:- Distributed query optimization is the process of producing a plan for processing a query in a distributed database system; the plan is called a query execution plan. In a distributed database system, schemas and queries refer to logical units of data. In a distributed relational database system, for instance, the logical units of data are relations. These units may be fragmented at the underlying physical level. The fragments, which can be redundant and replicated, are allocated to different database servers in the distributed system.
Accessing Databases:-
1.Client Request:- A user requests a webpage by typing a URL into their web browser or clicking a link.
2.Server Processing:- The web server receives the request and forwards it to the appropriate server-side script (e.g., a PHP file).
3.Database Interaction:- The server-side script connects to the database using credentials stored in its code or configuration files.
4.Query Execution:- Once connected, the script executes SQL queries to retrieve, insert, update, or delete data as needed.
5.Data Processing:- The retrieved data is processed by the server-side script.
6.Response Generation:- The script generates a dynamic webpage from the processed data.
7.Response Delivery:- The generated webpage is sent back to the user's web browser as an HTTP response.
8.Client Rendering:- The browser receives the response and renders the webpage for the user to interact with.
Entity Integrity Constraints:- Entity integrity constraints are database constraints that apply rules to the data within a single table, ensuring its validity and uniqueness.
1.Primary Key Constraint:- A primary key constraint dictates that a specific column, or combination of columns, can contain no duplicate values and no nulls. The column (or columns) acts as a unique identifier for each row in the table.
2.Unique Constraint:- A unique constraint is similar to a primary key constraint, but it allows null values in the designated column(s): no two rows may share the same non-null value in that column.
3.NOT NULL Constraint:- A NOT NULL constraint enforces that a specific column cannot contain null values, ensuring that critical data points are always filled in for each row.
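The three constraints above can be demonstrated with a small sketch using SQLite; the `employee` table and its columns are hypothetical examples:

```python
import sqlite3

# Hypothetical "employee" table illustrating all three constraints.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,  -- primary key: unique row identifier
        email   TEXT UNIQUE,          -- unique: no duplicates, NULLs allowed
        name    TEXT NOT NULL         -- not null: must always be filled in
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'a@x.com', 'Alice')")

# A duplicate primary key is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'b@x.com', 'Bob')")
    pk_ok = True
except sqlite3.IntegrityError:
    pk_ok = False

# NULL in a NOT NULL column is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'c@x.com', NULL)")
    nn_ok = True
except sqlite3.IntegrityError:
    nn_ok = False

# Two NULL emails are fine under UNIQUE (nulls are exempt).
conn.execute("INSERT INTO employee VALUES (3, NULL, 'Carol')")
conn.execute("INSERT INTO employee VALUES (4, NULL, 'Dave')")
```

Both failed inserts raise `IntegrityError`, while the two NULL emails are accepted, matching the exception for nulls noted above.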


Transaction:- A transaction is a set of logically related operations; it contains a group of tasks.
Properties of Transactions:-
1.Atomicity:- The transaction is completed entirely or not at all.
2.Consistency:- The transaction takes the database from one consistent state to another.
3.Isolation:- Each transaction is carried out as if it were separate from other transactions.
4.Durability:- If a committed transaction brings about a change, that change is durable in the database and not lost in case of any failure.
Nested Transaction:- A nested transaction is used to provide a transactional guarantee for a subset of operations performed within the scope of a larger transaction. It is a database transaction started by an instruction within the scope of an already started transaction.
Multilevel Transaction:-
1.The parent initiates a single subtransaction at a time and waits for its completion.
2.All leaf subtransactions in the tree are at the same level.
3.Only leaf transactions access the database.
Transaction Processing Monitors:- Transaction Processing Monitors, usually known as TP monitors, provide functionality for developing, deploying, and managing transactional distributed information systems. A TP monitor controls programs that manage a transaction as it passes from one stage of a process to another in an organized, transaction-oriented manner. TP monitors can be used with various system components, such as communication systems and operating systems, for transaction-protected applications. A TP monitor effectively provides an operating system on top of the existing operating system, connecting thousands of computers to a pool of shared server processes in real time.
Strong Consistency:- Strong consistency means that the data on the primary node, its replicas, and all related nodes satisfies the validation rules and is always in the same state.
1.All nodes observe the same view of the data at any given time.
2.Updates on one node are immediately visible to all other nodes.
3.Read operations return only up-to-date values.
4.Usually higher latency and lower availability due to synchronous operations.
Weak Consistency:- Weak consistency does not guarantee that the data is always the same on the primary, replicas, and other nodes.
1.Different nodes may see different views of the data.
2.Updates may not be seen by all nodes immediately.
3.Read operations can return obsolete or stale data.
4.Typically better performance, scalability, and availability.
Compensating Transactions:- A compensating transaction is a list of database operations that undo the changes of an incomplete or inconsistent transaction, returning the database to its previous consistent state. For a simple example, suppose a user accidentally deletes an important record: the compensating transaction is simply the re-insertion of that record into the database.
Oracle 21c:- Oracle Database 21c, released in 2021, is an innovation release designed to provide cutting-edge features and functionality for a variety of database workloads, with a focus on improving the performance, security, and manageability of your data. Key highlights include enhanced JSON support for faster data processing, in-memory capabilities for real-time analytics, and improved machine learning integration for data-driven insights. However, unlike long-term support releases, 21c has a shorter support window, so it is best suited to organizations seeking to leverage the latest advancements in database technology.
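The compensating-transaction example described above (an accidental deletion undone by re-insertion) can be sketched with SQLite; the `account` table is a hypothetical illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

# The original (already committed) transaction: an accidental deletion.
deleted = conn.execute("SELECT * FROM account WHERE id = 1").fetchone()
conn.execute("DELETE FROM account WHERE id = 1")
conn.commit()

# The compensating transaction: logically undo the committed delete
# by re-inserting the saved row.
conn.execute("INSERT INTO account VALUES (?, ?)", deleted)
conn.commit()

restored = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

Note that a compensating transaction is a new, forward-running transaction; it does not roll back the committed delete, it logically reverses it.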

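The atomicity property described earlier can be sketched as well: if any step of a transaction fails, every completed step is rolled back. The `transfer_log` table and the simulated failure are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfer_log (id INTEGER, amount INTEGER)")
conn.commit()

def transfer(conn, fail):
    conn.execute("INSERT INTO transfer_log VALUES (1, 50)")   # debit step
    if fail:
        raise RuntimeError("simulated failure mid-transaction")
    conn.execute("INSERT INTO transfer_log VALUES (2, -50)")  # credit step

try:
    transfer(conn, fail=True)
    conn.commit()
except RuntimeError:
    conn.rollback()   # atomicity: the partial debit insert is undone

rows = conn.execute("SELECT COUNT(*) FROM transfer_log").fetchone()[0]
```

After the rollback the table is empty: the transaction happened "not at all" rather than partially.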

Generalization:- Generalization is the process of extracting common properties from a set of entities and creating a generalized entity from them. It is a bottom-up approach in which two or more entities can be generalized to a higher-level entity if they have some attributes in common. For example, STUDENT and FACULTY can be generalized to a higher-level entity called PERSON, as shown in Figure 1. Here, common attributes like P_NAME and P_ADD become part of the higher-level entity (PERSON), and specialized attributes like S_FEE remain part of the specialized entity (STUDENT).
Specialization:- In specialization, an entity is divided into sub-entities based on its characteristics. It is a top-down approach in which a higher-level entity is specialized into two or more lower-level entities. For example, an EMPLOYEE entity in an employee management system can be specialized into DEVELOPER, TESTER, etc., as shown in Figure 2. Here, common attributes like E_NAME and E_SAL become part of the higher-level entity (EMPLOYEE), and specialized attributes like TES_TYPE become part of the specialized entity (TESTER).
Aggregation:- Aggregation in DBMS is the concept in which a relationship between two entities is treated as a single entity. In aggregation, a relationship together with its adjacent entities is aggregated into a parent or higher-level entity. For example, an employee working on a project requires some machinery, so a REQUIRE relationship is needed between the WORKS_FOR relationship and the MACHINERY entity. Using aggregation, the WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is aggregated into a single entity, and the REQUIRE relationship is created between the aggregated entity and MACHINERY.
ODBMS:- The structure of an object refers to the properties that the object is made up of; these properties are referred to as attributes. Thus, an object is a real-world entity with certain attributes that make up its structure.
1.Messages:- A message provides an interface, acting as a communication medium between an object and the outside world.
2.Methods:- The body of code executed when a message is passed is known as a method. Whenever a method is executed, it returns a value as output.
3.Variables:- Variables store the data of an object. The data stored in the variables makes one object distinguishable from another.
Object Classes:- An object, being a real-world entity, is an instance of a class. Hence we first define a class, and then objects are created that differ in the values they store but share the same class definition. The objects in turn respond to messages and hold their own variables.
Difference between Aggregation and Association:-
Aggregation:-
1.Aggregation describes a special type of association which specifies a whole-part relationship.
2.It is flexible in nature.
3.It is a special kind of association in which there is a whole-part relation between two objects.
4.A diamond shape is drawn next to the assembly class.
5.It is represented by a "has a" + "whole-part" relationship.
Association:-
1.Association is a relationship between two classes in which one class uses another.
2.It is inflexible in nature.
3.It means there is almost always a link between objects.
4.A line segment is drawn between the components or classes.
5.It is represented by a "has a" relationship.
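Generalization as described above maps naturally onto class inheritance: the common attributes P_NAME and P_ADD are lifted into PERSON, while specialized attributes stay in the sub-entities. A minimal sketch (the FACULTY attribute `f_salary` is a hypothetical example):

```python
# PERSON holds the common attributes extracted from STUDENT and FACULTY.
class Person:
    def __init__(self, p_name, p_add):
        self.p_name = p_name   # common attribute
        self.p_add = p_add     # common attribute

class Student(Person):
    def __init__(self, p_name, p_add, s_fee):
        super().__init__(p_name, p_add)
        self.s_fee = s_fee     # specialized attribute (from the text)

class Faculty(Person):
    def __init__(self, p_name, p_add, f_salary):
        super().__init__(p_name, p_add)
        self.f_salary = f_salary  # hypothetical specialized attribute

s = Student("Asha", "12 Main St", 5000)
f = Faculty("Ravi", "9 Park Rd", 90000)
```

Read bottom-up, this is generalization (Student/Faculty share Person); read top-down, the same hierarchy is specialization.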


Multiversion Two-Phase Locking:- Multiversion Two-Phase Locking (MV2PL) is a concurrency control mechanism used in database management systems to improve concurrency and prevent data inconsistencies when multiple transactions access the same data. It combines two techniques:
1.Multiversion Concurrency Control (MVCC):- Maintains multiple versions of each data item (row), one per update. When a transaction reads data, it reads the version committed before the transaction started, which allows concurrent read operations to proceed without blocking each other.
2.Two-Phase Locking (2PL):- The two-phase locking protocol divides a transaction's execution into a growing phase, in which it acquires locks, and a shrinking phase, in which it releases them; the point at which the transaction acquires its last lock is called the lock point.
Strict Two-Phase Locking:-
1.The transaction can release shared locks after the lock point.
2.The transaction cannot release any exclusive lock until it commits.
Rigorous Two-Phase Locking:-
1.The transaction cannot release either kind of lock, neither shared nor exclusive, until it commits.
2.Serializability is guaranteed under the rigorous two-phase locking protocol.
Savepoint:-
1.SAVEPOINT is a command in SQL used together with the ROLLBACK command.
2.It is a Transaction Control Language command used to mark a point within a transaction.
3.If you are building a very long transaction and want to roll back only to a certain point, this can be achieved using a savepoint.
4.Savepoints are helpful when we want to roll back only a small part of the work rather than the whole transaction; in simple words, a savepoint is a bookmark in SQL.
Data Partitioning:- Data partitioning, also known as database partitioning, is a technique that divides large datasets into smaller, more manageable pieces called partitions. These partitions can be stored, accessed, and managed separately, and are usually spread across multiple database tables. The goal of data partitioning is to improve the database's performance, scalability, and availability: by dividing the data into smaller pieces, managing and processing large amounts of data becomes simpler.
Cost Estimation:- In query processing, cost estimation is the process of estimating the cost of executing the different strategies available for a query, with the goal of selecting the strategy with the lowest cost and most efficient execution; this method is called cost-based query optimization. The cost of a query is the time it takes to process it and return results from the database, including the time to parse and translate the query, optimize it, evaluate it, execute it, and return the results to the user. The cost of a strategy is an estimate based on how many CPU and I/O resources the query will consume.
Timestamp Ordering Protocol:-
1.The timestamp ordering protocol orders transactions by their timestamps; the order of the transactions is simply the ascending order of their creation times.
2.Older transactions have higher priority, which is why they execute first. To determine a transaction's timestamp, the protocol uses the system time or a logical counter.
3.Lock-based protocols manage the order between conflicting pairs of transactions at execution time, whereas timestamp-based protocols start working as soon as a transaction is created.
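The timestamp-ordering rules can be sketched as read/write checks against the largest read and write timestamps recorded on each data item. This is a simplified model: a real DBMS would also roll back and restart the rejected transaction with a new timestamp.

```python
# Basic timestamp-ordering sketch: each item tracks the largest read and
# write timestamps that have touched it; an operation arriving "too late"
# (from an older transaction) is rejected.
class Item:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def read(item, ts):
    # A transaction may not read a value already overwritten by a
    # younger (larger-timestamp) transaction.
    if ts < item.write_ts:
        return False                      # reject: transaction rolls back
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    # A transaction may not overwrite a value already read or written
    # by a younger transaction.
    if ts < item.read_ts or ts < item.write_ts:
        return False                      # reject: transaction rolls back
    item.write_ts = ts
    return True

x = Item()
ok1 = read(x, ts=2)    # T2 reads X: allowed
ok2 = write(x, ts=1)   # older T1 writes after T2's read: rejected
ok3 = write(x, ts=3)   # younger T3 writes: allowed
```

The checks enforce exactly the "older transactions first" ordering described above, without any locks.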

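The SAVEPOINT command described earlier can be demonstrated with SQLite; the savepoint name `sp1` and the table `t` are arbitrary examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")

conn.execute("SAVEPOINT sp1")          # bookmark inside the transaction
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("INSERT INTO t VALUES (3)")
conn.execute("ROLLBACK TO sp1")        # undo only the work after the bookmark
conn.commit()

values = [r[0] for r in conn.execute("SELECT v FROM t")]
```

Only the inserts made after `SAVEPOINT sp1` are rolled back; the earlier insert of 1 survives and is committed.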

Data Model:- A data model gives us an idea of how the final system will look after it has been fully implemented. It specifies the data items as well as the relationships between them. In a database management system, data models are often used to show how data is connected, stored, accessed, and changed.
Types of Data Models:-
1.Hierarchical Model:- This model organizes the data in a hierarchical tree structure. The hierarchy begins at the root, which holds the root data, and grows into a tree as child nodes are added to parent nodes.
2.Network Model:- The main difference from the hierarchical model is that in the network model any record can have several parents; it uses a graph instead of a hierarchical tree.
3.Entity-Relationship Model:- The real-world problem is depicted visually in this model to make it easier for stakeholders to comprehend, and the ER diagram also makes the system very simple for developers to understand.
Object Reference:- An object reference acts like a bridge between two tables. It creates a connection, typically using a foreign key, that links data in one table to another, which is useful for representing real-world relationships. Imagine an Orders table: a foreign key referencing the Customers table creates a link between specific orders and the customers who placed them.
Saga:- A saga is a transaction management pattern that uses a sequence of local transactions to update a database and trigger the next local transaction. Each local transaction is an atomic unit of work performed by a saga participant; when a local transaction completes successfully, it publishes a message or event to trigger the next transaction.
Pipelining:- Pipelining is a technique used in distributed database management systems to improve query processing. It involves segmenting the work of a query into smaller, more manageable stages that can run concurrently, allowing the database system to handle more queries, process them faster, and use resources more efficiently. Pipelining works by arranging query operators into a pipeline so that rows flow from one operator to the next without materializing intermediate results.
Persistent Programming Language:- A persistent programming language is a programming language that supports the concept of persistence directly in its design. Persistence refers to the ability to retain data across different program executions or system reboots. In a DBMS, persistent programming languages are often used to interact with the database and perform operations such as querying, inserting, updating, and deleting data. These languages are designed to integrate seamlessly with the database system, allowing developers to work with persistent data structures and manage data storage efficiently.
Content-Based Retrieval (CBR):- Content-based retrieval in a DBMS refers specifically to retrieving multimedia data (images, audio, video) based on the actual content of the data itself, rather than relying on keywords or textual annotations.
Benefits:-
1.Efficient Search for Multimedia Data:- Enables searching for multimedia content based on its actual characteristics, even without keywords or annotations.
2.Improved User Experience:- Allows users to search using familiar visual, audio, or video examples.
3.Scalability:- Can handle large databases of multimedia data efficiently by relying on feature comparisons.
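Pipelined evaluation as described above can be sketched with generators: each operator pulls rows from its input one at a time, so no intermediate result is materialized between the scan, select, and project stages. The `employees` rows are hypothetical:

```python
# Each stage is a generator; rows stream through scan -> select -> project.
def scan(table):
    for row in table:          # produce rows one by one
        yield row

def select(rows, predicate):
    for row in rows:           # filter rows as they stream past
        if predicate(row):
            yield row

def project(rows, columns):
    for row in rows:           # keep only the requested columns
        yield {c: row[c] for c in columns}

employees = [
    {"name": "Asha",  "dept": "dev", "salary": 70},
    {"name": "Ravi",  "dept": "qa",  "salary": 60},
    {"name": "Meena", "dept": "dev", "salary": 80},
]

pipeline = project(select(scan(employees), lambda r: r["dept"] == "dev"),
                   ["name"])
result = list(pipeline)
```

Because the stages are lazy, a row produced by `scan` passes through `select` and `project` before the next row is even read, which is the essence of pipelined query execution.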

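The saga pattern described earlier can be sketched as a list of (local transaction, compensating action) pairs: when a step fails, the steps already completed are compensated in reverse order. The booking steps here are hypothetical:

```python
# Minimal saga runner: each step is an (action, compensate) pair.
def run_saga(steps):
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for comp in reversed(done):   # undo completed steps in reverse
                comp()
            return False
    return True

log = []

def fail():
    raise RuntimeError("shipping service down")

steps = [
    (lambda: log.append("reserve seat"), lambda: log.append("release seat")),
    (lambda: log.append("charge card"),  lambda: log.append("refund card")),
    (fail,                               lambda: None),
]
ok = run_saga(steps)
```

The third step fails, so the first two local transactions are undone by their compensations ("refund card", then "release seat"), restoring consistency without a global distributed transaction.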

Replication:- Data replication is the process of storing data at more than one site or node. It is useful in improving the availability of data: data is simply copied from a database on one server to another server so that all users can share the same data without any inconsistency.
Types of Replication:-
1.Transactional Replication:- In transactional replication, users receive full initial copies of the database and then receive updates as the data changes. Data is copied in real time from the publisher to the receiving database (the subscriber) in the same order as the changes occur at the publisher, so transactional consistency is guaranteed.
2.Snapshot Replication:- Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for subsequent updates; the entire snapshot is generated and sent to users. It is generally used when data changes are infrequent.
3.Merge Replication:- Data from two or more databases is combined into a single database. Merge replication is the most complex type of replication because it allows both publisher and subscriber to make changes to the database independently. It is typically used in server-to-client environments.
Replication Schemes:-
1.Full Replication:- The most extreme case is replication of the whole database at every site in the distributed system.
2.No Replication:- Each fragment is stored at exactly one site.
Fragmentation:- Fragmentation is the process of dividing the whole database into various sub-tables or sub-relations so that the data can be stored in different systems. The small pieces, sub-relations, or sub-tables are called fragments.
Advantages:-
1.As the data is stored close to the site where it is used, the efficiency of the database system increases.
2.Local query optimization techniques are sufficient for some queries, as the data is available locally.
Disadvantages:-
1.Access times may be very high if data from different fragments is needed.
2.If recursive fragmentation is used, it can be very expensive.
Methods of fragmenting a table:-
1.Horizontal Fragmentation:- Horizontal fragmentation divides a table horizontally by assigning each row (or group of rows) of the relation to one or more fragments.
2.Vertical Fragmentation:- Vertical fragmentation decomposes a table vertically by attributes or columns.
3.Mixed Fragmentation:- The combination of vertical fragmentation of a table followed by further horizontal fragmentation of some fragments is called mixed or hybrid fragmentation.
Web Server:- A web server is a program that processes users' network requests and serves them the files that make up web pages; this exchange takes place using the Hypertext Transfer Protocol (HTTP). Basically, web servers are computers used to store the files that make up a website, and when a client requests a certain website, the server delivers it to the client. For example, suppose you want to open Facebook on your laptop and enter the URL in the browser's address bar. The laptop sends an HTTP request for the Facebook webpage to another computer known as the web server. This computer contains all the files that make up the website, such as text, images, and GIF files. After processing the request, the web server sends the requested files back to your computer, and you can reach the website.
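Horizontal and vertical fragmentation as described above can be sketched on an in-memory relation; the EMPLOYEE rows and the split criteria (by city, and by column groups) are hypothetical:

```python
# Hypothetical EMPLOYEE relation stored as a list of dicts.
employees = [
    {"id": 1, "name": "Asha",  "dept": "dev", "city": "Pune"},
    {"id": 2, "name": "Ravi",  "dept": "qa",  "city": "Delhi"},
    {"id": 3, "name": "Meena", "dept": "dev", "city": "Delhi"},
]

# Horizontal fragmentation: split by rows (here, by city) so each
# fragment can be stored at the site that uses it most.
def horizontal(rows, predicate):
    return [r for r in rows if predicate(r)]

pune_frag = horizontal(employees, lambda r: r["city"] == "Pune")
delhi_frag = horizontal(employees, lambda r: r["city"] == "Delhi")

# Vertical fragmentation: split by columns; the key ("id") is repeated
# in every fragment so the relation can be rebuilt by a join on it.
def vertical(rows, columns):
    return [{c: r[c] for c in ["id"] + columns} for r in rows]

name_frag = vertical(employees, ["name"])
dept_frag = vertical(employees, ["dept", "city"])
```

Applying `vertical` to the output of `horizontal` (or vice versa) gives the mixed fragmentation described above.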


R-Tree:- An R-tree is a tree data structure used for storing spatial data indexes efficiently. R-trees are highly useful for spatial data queries and storage.
1.An R-tree consists of a single root, internal nodes, and leaf nodes.
2.The root contains the pointer to the largest region in the spatial domain.
3.Parent nodes contain pointers to their child nodes, where the region of each child node is completely contained in the region of its parent node.
4.Leaf nodes contain data about the minimum bounding rectangles (MBRs) of the actual objects.
Quadtrees:- Quadtrees are trees used to efficiently store data about points in a two-dimensional space. In a quadtree, each node has at most four children. We can construct a quadtree from a two-dimensional area using the following steps:
1.Divide the current two-dimensional space into four boxes.
2.If a box contains one or more points, create a child object storing the two-dimensional space of that box.
3.If a box does not contain any points, do not create a child for it.
4.Recurse for each of the children.
K-D Tree:- A k-d tree (also called a k-dimensional tree) is a binary search tree in which the data in each node is a k-dimensional point in space. In short, it is a space-partitioning data structure for organizing points in a k-dimensional space. A non-leaf node in a k-d tree divides the space into two parts, called half-spaces: points to the left of the splitting plane are represented by the left subtree of that node, and points to the right are represented by the right subtree.
Difference between k-d trees, R-trees, and quadtrees:-
1.Data Structure:- A k-d tree is a binary tree that splits on alternating dimensions (e.g., x then y) and handles any number of dimensions. An R-tree is a balanced tree that stores data in minimum bounding rectangles (MBRs) enclosing points, and also handles any number of dimensions. A quadtree is designed for 2D data and recursively divides space into quadrants (four equal squares).
2.Data Storage:- A k-d tree stores data points at all nodes in the tree. An R-tree stores data points only at leaf nodes, with MBRs at all nodes. A quadtree stores data points only at leaf nodes.
3.Weaknesses:- k-d trees are slow for updates and deletions (which may require rebuilding), and their performance degrades in high dimensions. R-trees are more complex to implement than k-d trees and may require more storage because of the MBRs. Quadtrees are limited to two dimensions and may require tuning for optimal performance.
Primary Index:- The index table created using primary keys is known as a primary index. It is defined on ordered data. As the index is composed of primary keys, the entries are unique, non-null, and have a one-to-one relationship with the data blocks.
Secondary Index:- A secondary index is a two-level indexing technique used to reduce the mapping size of the primary index. The secondary index points to the location where the data is to be found, but the actual data is not sorted as it is with a primary index. Secondary indexing is also known as non-clustered indexing.
Commit Protocols:- The concept of a commit protocol was developed in the context of database systems. Commit protocols are algorithms used in distributed systems to ensure that a transaction is completed entirely or not at all. They help maintain data integrity, atomicity, and consistency, and allow us to build robust, efficient, and reliable systems.
1.One-Phase Commit:- The simplest commit protocol. There is a controlling site and a number of slave sites where the transaction is performed.
2.Two-Phase Commit:- The second type of commit protocol, introduced to reduce the vulnerabilities of the one-phase commit protocol. There are two phases in the two-phase commit protocol.
3.Three-Phase Commit:- The third type of commit protocol, introduced to address the blocking problem of two-phase commit.
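The two-phase commit protocol can be sketched as a coordinator that first collects prepare votes from every participant and then broadcasts a unanimous decision. This is a simplified model: real participants also write their votes and the decision to a log for crash recovery.

```python
# Two-phase commit sketch. Phase 1: coordinator asks each participant
# to prepare (vote). Phase 2: COMMIT only if all voted yes, else ABORT.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # simulated local outcome
        self.state = "active"

    def prepare(self):
        # Vote yes only if the local transaction can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        self.state = decision

def two_phase_commit(participants):
    # Phase 1: collect votes from every site.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes.
    decision = "committed" if all(votes) else "aborted"
    for p in participants:
        p.finish(decision)
    return decision

ok = two_phase_commit([Participant("A"), Participant("B")])
bad = two_phase_commit([Participant("A"), Participant("B", can_commit=False)])
```

A single "no" vote aborts every site, which is exactly the all-or-nothing guarantee commit protocols exist to provide; three-phase commit adds an extra pre-commit round so participants are not blocked if the coordinator fails.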