Principles Of Distributed Database Systems Exercise | Solutions

You must be able to rebuild the original relation using a relational operator (typically a UNION for horizontal fragments, or a JOIN on the key for vertical fragments).

Fragmentation is the process of breaking a large relation into smaller fragments, each of which can be stored at a different site.

You are given a global relation (e.g., EMPLOYEE(EmpID, Name, DeptID, Salary, ManagerID) ) and a set of applications/queries. Your task is to propose horizontal, vertical, or hybrid fragments.
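As a minimal sketch of what a horizontal fragmentation answer might look like, the Python snippet below splits a toy EMPLOYEE relation by DeptID and checks that the union of the fragments rebuilds the original. The sample rows and the fragmentation predicates (DeptID = 10 versus DeptID ≠ 10) are assumptions made for illustration; they are not part of the exercise.

```python
# Toy EMPLOYEE(EmpID, Name, DeptID, Salary, ManagerID) relation; rows are made up.
EMPLOYEE = {
    (1, "Ana", 10, 5000, None),
    (2, "Ben", 10, 4200, 1),
    (3, "Cho", 20, 6100, None),
    (4, "Dev", 30, 3900, 3),
}

DEPT_ID = 2  # position of DeptID within each tuple


def horizontal_fragment(relation, predicate):
    """Select the tuples of `relation` that satisfy `predicate`."""
    return {t for t in relation if predicate(t)}


# Hypothetical fragmentation predicates, one fragment per site.
emp1 = horizontal_fragment(EMPLOYEE, lambda t: t[DEPT_ID] == 10)
emp2 = horizontal_fragment(EMPLOYEE, lambda t: t[DEPT_ID] != 10)

# Reconstruction: the union of the fragments must equal the original relation.
reconstructed = emp1 | emp2
# Disjointness: no tuple may appear in more than one fragment.
disjoint = emp1.isdisjoint(emp2)
```

Here `reconstructed == EMPLOYEE` and `disjoint` are both true, which is the kind of check a written solution would argue for in words.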

"Can a system be CA (Consistent and Available) during a network partition?"

R = 10,000 tuples, S = 50,000 tuples. A hash function partitions the data into 10 buckets. Each site sends its bucket to a single join site. Network cost = 1 per tuple. Local join cost is negligible. Question: Compute the total network cost.
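One way to reason about this exercise is sketched below. It assumes that every tuple of R and S crosses the network exactly once on its way to a join site, in which case the number of buckets does not change the total. The `local_tuples` parameter is a hypothetical knob for the case where some tuples already reside at their join site; the exercise does not say whether that applies.

```python
def repartition_cost(r_tuples, s_tuples, cost_per_tuple=1, local_tuples=0):
    """Network cost of shipping both relations to their join sites.

    local_tuples is the (assumed) number of tuples that already sit at the
    site where their bucket is joined and therefore are never shipped.
    """
    shipped = r_tuples + s_tuples - local_tuples
    return shipped * cost_per_tuple


# Worst case under the stated assumption: all 60,000 tuples are shipped.
worst_case = repartition_cost(10_000, 50_000)
```

Under that assumption the total is 10,000 + 50,000 = 60,000 cost units; any tuples that are already local reduce the figure accordingly.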

Project the join attribute and send it to Site 1: we transmit only the unique values of the join attribute from Site 2 to Site 1.

Querying a distributed system is expensive because of communication costs. Exercises often ask you to calculate the cost of a join operation across two different sites.

Key Concept: Semijoins
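A semijoin reduces communication by shipping only the join-attribute values first, then shipping just the tuples that will actually join. The sketch below walks through that idea on two toy relations; the relation contents, the site placement (R at Site 1, S at Site 2), and the function name `semijoin_join` are assumptions for illustration.

```python
# Toy relations; the join attribute is the first field of each tuple.
R = [(1, "r1"), (2, "r2"), (3, "r3"), (4, "r4")]  # assumed to live at Site 1
S = [(2, "s1"), (2, "s2"), (5, "s3")]             # assumed to live at Site 2


def semijoin_join(r, s):
    """Join r and s via a semijoin and report what crossed the network."""
    # Step 1 (Site 2 -> Site 1): ship only the distinct join values of S.
    s_keys = {t[0] for t in s}
    # Step 2 (at Site 1): keep only the R tuples that have a match in S.
    r_reduced = [t for t in r if t[0] in s_keys]
    # Step 3 (Site 1 -> Site 2): ship the reduced R and join it with S locally.
    result = [(a, x, y) for (a, x) in r_reduced for (b, y) in s if a == b]
    shipped = {"join_values": len(s_keys), "r_tuples": len(r_reduced)}
    return result, shipped


result, shipped = semijoin_join(R, S)
```

In this toy case the semijoin ships 2 join values and 1 tuple of R, whereas shipping R whole would move 4 tuples. Whether the semijoin wins in an exercise depends on the attribute sizes and selectivity given there.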

In conclusion, distributed database systems are designed to store and manage data across multiple sites or nodes. The principles of distributed database systems include fragmentation, replication, distribution, autonomy, and transparency. By understanding these principles and how they are applied, we can design and implement effective distributed database systems that provide a unified view of the data, while ensuring that the data is consistent, reliable, and easily accessible.

Yes, atomicity is guaranteed.

Developers and students often post personal notes and summaries of textbook exercises. For example, tech-notes

Standard problems ask you to simulate the Wait-Die (non-preemptive) or Wound-Wait (preemptive) schemes, which use transaction timestamps to prevent deadlocks entirely.

4. Distributed Commit Protocols

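Under Wait-Die, an older transaction that requests a lock held by a younger one is allowed to wait. A younger transaction that requests a lock held by an older one is immediately aborted ("dies") and restarts with its original timestamp.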
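The two timestamp rules can be sketched as a pair of small decision functions. This is an illustrative sketch, assuming that a smaller timestamp means an older transaction; the function names and return strings are made up for the example.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive rule: an older requester waits, a younger one dies."""
    if requester_ts < holder_ts:   # requester is older than the lock holder
        return "wait"
    return "abort requester"       # younger requester dies and restarts later


def wound_wait(requester_ts, holder_ts):
    """Preemptive rule: an older requester wounds the holder, a younger one waits."""
    if requester_ts < holder_ts:   # requester is older than the lock holder
        return "abort holder"      # the younger holder is wounded (aborted)
    return "wait"


# T1 (timestamp 5) is older than T2 (timestamp 9).
older_requests = (wait_die(5, 9), wound_wait(5, 9))
younger_requests = (wait_die(9, 5), wound_wait(9, 5))
```

In both schemes only the younger transaction is ever aborted, so no cycle of waiting transactions can form.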

Deadlocks in distributed environments can be managed via centralized, distributed, or hierarchical detection methods.

Path-Pushing Algorithm

Coordinator                             Participants
    |                                        |
    |--- (Phase 1: Prepare) ---------------->|
    |                                        | (Vote Commit / Abort)
    |<-- (Phase 1: Vote Response) -----------|
    |                                        |
    |--- (Phase 2: Global Commit/Abort) ---->|
    |                                        | (Log & Acknowledge)
    |<-- (Phase 2: ACK) ---------------------|
    v                                        v
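The message flow above can be sketched as a small failure-free simulation. This is a simplified illustration rather than a full protocol implementation: it omits logging, timeouts, and recovery, and the message names and the `two_phase_commit` function are assumptions chosen for the example.

```python
def two_phase_commit(votes):
    """Simulate one failure-free 2PC round.

    votes maps a participant name to its Phase 1 vote ("commit" or "abort").
    Returns the global decision and the ordered list of messages as
    (sender, receiver, message) triples.
    """
    log = []
    # Phase 1: the coordinator asks every participant to prepare.
    for p in votes:
        log.append(("coordinator", p, "PREPARE"))
    # Phase 1: each participant replies with its vote.
    for p, vote in votes.items():
        log.append((p, "coordinator", "VOTE-" + vote.upper()))
    # The global decision is COMMIT only if every vote is "commit".
    decision = "COMMIT" if all(v == "commit" for v in votes.values()) else "ABORT"
    # Phase 2: the coordinator broadcasts the decision.
    for p in votes:
        log.append(("coordinator", p, "GLOBAL-" + decision))
    # Phase 2: each participant acknowledges.
    for p in votes:
        log.append((p, "coordinator", "ACK"))
    return decision, log


decision, log = two_phase_commit({"P1": "commit", "P2": "commit"})
```

A single "abort" vote is enough to turn the global decision into ABORT, which is the unanimity rule exercises usually ask you to trace.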

Exercise Write-up: Principles of Distributed Database Systems

$$\text{Availability (replicated database)} = 1 - (1 - p)^n$$

where $p$ is the single-site availability and $n$ is the number of replicas.
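As a quick numeric check of the formula, the sketch below evaluates it for a few replica counts. It assumes that sites fail independently and that the data counts as available when at least one replica is up; the function name is made up for the example.

```python
def replicated_availability(p, n):
    """Probability that at least one of n independent replicas is up."""
    return 1 - (1 - p) ** n


# Single-site availability of 0.9 with one, two, and three replicas.
availabilities = [replicated_availability(0.9, n) for n in (1, 2, 3)]
```

With p = 0.9 the values are approximately 0.9, 0.99, and 0.999, so each extra replica removes roughly one more "nine" of unavailability under these assumptions.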

Primary horizontal fragmentation (PHF) partitions a relation based on predicates defined on that relation.

Given: a relation and a set of simple predicates.
Task: Determine the set of minterm predicates and check for completeness and correctness.
Solution Methodology:
Generate Minterms: Minterms are conjunctions (∧) of every simple predicate or its negation.
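The minterm generation step can be sketched in a few lines. The simple predicates on Salary, the sample rows, and the `minterms` helper below are hypothetical and only illustrate the mechanics; a real exercise supplies its own predicates.

```python
from itertools import product


def minterms(simple_predicates):
    """Build every conjunction of each simple predicate or its negation.

    simple_predicates maps a label to a function row -> bool.
    Returns a list of (label, test_function) pairs, one per minterm.
    """
    names = list(simple_predicates)
    out = []
    for signs in product([True, False], repeat=len(names)):
        label = " AND ".join(n if s else f"NOT({n})" for n, s in zip(names, signs))

        def test(row, signs=signs):
            # A row satisfies the minterm if each predicate has the required truth value.
            return all(simple_predicates[n](row) == s for n, s in zip(names, signs))

        out.append((label, test))
    return out


# Hypothetical simple predicates on a toy relation with a Salary attribute.
preds = {
    "Salary<=4000": lambda row: row["Salary"] <= 4000,
    "Salary>4000": lambda row: row["Salary"] > 4000,
}
rows = [{"EmpID": 1, "Salary": 3500}, {"EmpID": 2, "Salary": 5200}]

fragments = {label: [r for r in rows if test(r)] for label, test in minterms(preds)}
# Minterms that select no rows in this sample (here the contradictory ones) are dropped.
non_empty = {label: frag for label, frag in fragments.items() if frag}
```

Two simple predicates give four minterms; two of them are contradictory and select nothing, and the remaining two place every sample row in exactly one fragment.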

Coordinator C, Participants P1, P2. All vote YES. Coordinator sends COMMIT, fails after writing COMMIT log but before sending to P2. P1 receives COMMIT, P2 still in READY state.