Having prided myself on my full stack skills, I was tested of my bonafides with a rather interesting and critical problem at work, which involved me to understand Oracle SQL database writes, GraphQL mutations, Java collections, run time exception handling, and RxJS - pretty much the entire full stack.
System architecture -
My system writes updates into an Oracle SQL DB, via GraphQL.This GraphQL mutation logic is called from a Java service, driven by Command and Builder patterns, and if it fails, should be handled appropriately through rollback mechanisms.
Mutations(write operations) are batched together to ensure readability and consistency.
An Angular MVC takes care of the UI end.
Context :
My system works as a group of parties. There's a root party A, and subordinate parties A1, A2 and so on. Changes to A, propagate to all its subordinates. --> Java principle of hybrid inheritance.
The problem -
The user updated a single subordinate party (A4), and all the rest of the parties started messing up. In fact, my framework was built to handle corruption oso well that data corruption with one party won't affect the rest. However, bang's the problem
The UX -
The user saw a big exception thrown to the screen in the form of an popup, and the user was transfixed, as he hadn't even touched anything except to load the application. How can something break, when you haven't even started working at it.
The analysis -
Common sense would've had me check the origination of the request and see what went missing. However, software engineers often don't go by common sense. So, I ended up checking logs for the approval of the last request (A4).
Why do that? Since ALL requests on other parties were corrupted, I figured there'd a common mess up between them. This mess up can not happen at an initiation - which is an operation on each party, but on updation - where common attributes might get changed.
Next up - what could've changed? And how did that happen?
The code intricacy -
We implemented the command, builder and factory patterns in our Spring Boot - GraphQL codebase, to follow a highly modularized and extensible approach. We first build a structure, run what we 'premutation commands' followed by the actual update of the party (the actual mutation), and finally 'post mutation commands'
Codewise,
PartyUpdateBuilder = party.addPartyId()
.addPartyName()
.addPartyStatus()
.build();
Followed by
PartyUpdateCommandBuilder.execute();
which actually writes the data into the tables.
Mutation is GraphQL version of an update query - it updates the database. It looks something like this
mutation ($params: Params!, $partyId: PartyId!) {
updateData(params: $params, partyId: $partyId) {
status
log
created
}
}
{
variables: {
params: {
"partyId": 1234,
"partyName": "ABC"
}
}
}
Implications of the structure
The framework we created ensured that even if multiple tables were being updated via our mutation, we remained fully ACID compliant by batching the entire update in a single mutation that was run centrally
Services written by different developers only need to call this central source, and all fields would be updated.
While the mutation was running, there is an inbuilt system of preemptive locking, to avoid stale data being overwritten in the microseconds it takes to update the data.
Addtionally, the entire operation is maintained as a business change log in a central database, for tracking.
Our update strategy was inspired from this
Now, in such an apparent 'fail safe' framework, there was corruption happening en masse. The question was how?
Stay tuned for part 2