Tuesday, January 31, 2023

GraphQL: what REST should have been?

Since the dawn of distributed computing, we've gone through many different mechanisms and standards for programmatically invoking remotely hosted functionality. CORBA, SOAP-based web services and REST have all played an important role in this evolution. The most recent entrant into this mix is GraphQL, a technology I've had the chance to play around with lately.

A pattern I've noticed in the evolution of the IT industry is "throwing the baby out with the bathwater". While SOAP-based web services were feature-rich and promoted a well-defined and formal API pattern, they were considered too heavy, with XML-based schema definitions and payloads as well as complex WS-* standards that needed to be supported by vendors and developer tooling. REST was meant to be lightweight, more efficient and flexible. REST reduced how we "see" a system to resources and four verbs: GET, POST, PUT and DELETE, an artifact of the decision to implement REST over HTTP. Instead of fixing just these challenges, REST introduced an entirely new paradigm that was very closely coupled to HTTP. While REST solved some of the complexity challenges of SOAP by focusing on HTTP and JSON, we also lost a few important features: the ability to have a self-explanatory and self-documenting schema, and the flexibility to think in terms of well-defined "types" and "operations", which is undoubtedly a more natural way to think about a remote system's capabilities. Subsequent standards and solutions such as Swagger and OpenAPI mitigated some of these issues, but not all.

To me, REST was an example of the "impedance mismatch" that can occur when we try to map a domain (in this case, the definition of a service or capability) from its most natural form into an adjacent paradigm that does not support the same semantics. Another popular example of such a mismatch is the mapping between object and relational paradigms, commonly referred to as the object-relational impedance mismatch. An inevitable consequence of any impedance mismatch is additional complexity that has nothing to do with the end outcome and is purely a consequence of the mismatch itself. This should be self-evident whether you are trying to map object inheritance to relational tables using one of many strategies, each with its own trade-offs, or trying to force-fit a set of natural system capabilities or a service definition into resources and HTTP verbs.

When I started looking at GraphQL, I immediately fell in love with it as it felt like what REST should have been in the first place.  Following are the key advantages that struck me as game-changing:

1. Machine and Human Readable Schema

GraphQL provides a way of defining a schema in terms of Types, Queries, Mutations and Subscriptions in a manner that is both human and machine readable. To me this is a very natural representation of a service definition, similar in concept to a WSDL schema in traditional web services but much more lightweight and simple. There is no impedance mismatch here. While a WSDL definition was machine readable, GraphQL improves on it by being both machine and human readable and VERY self-explanatory to anyone with some programming background. GraphQL types should not be mistaken for database entities. These types are domain entities that define part of the common "domain language" shared between the API provider and its clients, or in a single system, between the front-end engineers and the back-end.

Consider the following example:

#------------begin GraphQL definition----------------

# Domain Types

type Book {
  # Unique ID for the book
  id: ID!
  title: String!
  isbn: String
  # Author of the book - notice that this is a reference to the Author type
  author: Author
}

type Author {
  id: ID!
  name: String!
  # List of books written by the author
  books: [Book]
}

# Queries, Mutations and Subscriptions - for some reason they are considered special types

type Query {
  # Retrieves all books
  books: [Book]

  # Retrieves an author by ID
  author(id: ID!): Author
}

type Mutation {
  # Adds a new author
  addAuthor(name: String!): Author

  # Adds a new book
  addBook(title: String!, authorId: ID!): Book
}

type Subscription {
  # Notifies when a new book is added - clients can subscribe to be notified of these events
  bookAdded: Book
}

#--------------end GraphQL definition---------------

You might notice that this is simply a service contract and has nothing to do with implementation details. If you find yourself donning your JPA or ORM hat and asking how the books field of the Author type and the author field of the Book type are linked without a reverse reference, you've lost the plot! 
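For completeness, here is a sketch of how a client might invoke the addBook mutation and subscribe to the bookAdded event defined above; the operation names, the book title and the author ID are purely illustrative values:

#------------begin example operations----------------

# Adds a book and asks for a few fields of the newly created Book back
mutation AddBook {
  addBook(title: "Domain-Driven Design", authorId: "42") {
    id
    title
    author {
      name
    }
  }
}

# Subscribes to be notified whenever a new book is added
subscription OnBookAdded {
  bookAdded {
    title
    author {
      name
    }
  }
}

#--------------end example operations---------------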

2. The API user gets to specify what sub-graph it needs

While the API shown above defines the logical return (or response) type, GraphQL allows (actually requires) the API client to specify exactly which fields of that type should be returned, including related entities and their fields, with no limit on the depth of the traversal. Following is an example of a query sent by the client:

query {
  books {
    id
    title
    isbn
    author {
      id
      name
    }
  }
}


The above query invokes the books query and specifies exactly which of its own fields, as well as which fields of its related entities, must be returned. You can imagine how powerful this is. We derive several significant benefits from this approach:
  • One of the challenges with both web services and REST is that the response from an API call is fixed. So for mobile vs desktop vs third-party integrators, you may end up defining different APIs since the payloads would be different. This reality is captured in the Backend For Frontend (BFF) architecture pattern. With GraphQL, the client can specify the exact subset of the response that it needs, so you don't need multiple APIs for different clients (see the sketch after this list). This is a significant simplification in both the definition and the implementation of APIs.
  • A second challenge of REST APIs is chattiness, and this is especially true for REST services exposed to third parties (which may be one of the BFFs mentioned above). In order to define APIs that are reusable across many use cases, designers have no choice but to resort to a high level of granularity. So if you were an outside integrator connecting to the above service, you might end up sending multiple REST calls to the books resource and the author resource, resulting in a high level of chattiness. Can you overcome this? Yes, by including authors in your books response, but this may involve accessing and returning an unnecessary amount of data for the use cases that don't need it. Can you avoid this again? Yes, by having two different resource definitions for books with authors and books without authors. You can see how the so-called "solutions" quickly become a series of complex decisions and trade-offs, introducing an unnecessary amount of complexity to the design and implementation of the APIs!
  • Another obvious benefit is the resilience of the API to change, but I've covered this in the separate versioning section below.
The bottom line is that GraphQL avoids over-fetching and under-fetching of data, and is much more efficient in terms of network chattiness and payload size. This results in higher performance and a bigger bang for the buck on the same infrastructure, as well as simpler API design and implementation.
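As a rough illustration of the point about different clients, here is how a mobile client and a desktop client might call the same books query against the same API; the operation names and field selections are illustrative:

#------------begin example client queries----------------

# A mobile client asking only for what it renders
query MobileBookList {
  books {
    id
    title
  }
}

# A desktop client asking for a richer payload from the same API
query DesktopBookList {
  books {
    id
    title
    isbn
    author {
      id
      name
    }
  }
}

#--------------end example client queries---------------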

3. A built-in playground/UI to explore and invoke the API

While the GraphQL schema is human readable, GraphQL also comes with a UI called GraphiQL that can be used in the development environment to explore and invoke the API. This can be a powerful tool to improve collaboration and reduce the back-and-forth between front-end and back-end engineers, as well as across different teams accessing the API. Even before designing and implementing their user interfaces, front-end engineers can explore and test the backend APIs they need using this interface: a very powerful collaboration mechanism for the team.



4. Provides for federation - the benefits of Microservices with the convenience of a Monolith!

All of us have a good understanding of the advantages of the microservices paradigm compared to a monolithic service, so I'm not going to get into that. One of the downsides of microservices compared to a monolith is the complexity of orchestrating across microservices and navigating relationships that span multiple microservices. Say you have a products microservice and an inventory microservice, and the inventory microservice needs to return a response that includes product details. There are multiple ways of achieving this, but all with different trade-offs and no seemingly ideal solution. GraphQL enables a smart solution to this problem in the form of GraphQL federation. Federation allows each GraphQL schema to externalize some of its relationships without worrying about which GraphQL microservice will be the provider for that reference. A gateway (or proxy) is responsible for combining the schemas from these disparate microservices and exposing a "super-graph" to clients (see the sketch below), an extremely elegant solution to the problem! Finally, the elusive "best of both worlds" type of solution where you get all the benefits of separately deployable microservices with monolith-like simplicity for the clients.
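Here is a rough sketch of what that might look like using Apollo Federation style directives; the Product entity and its fields are illustrative, and the exact directive syntax varies between federation versions and frameworks:

#------------begin federation sketch----------------

# products subgraph - owns the Product entity
type Product @key(fields: "id") {
  id: ID!
  name: String!
  price: Float
}

# inventory subgraph - extends Product with inventory fields,
# without knowing anything about the products service's implementation
extend type Product @key(fields: "id") {
  id: ID! @external
  inStock: Boolean
  warehouseLocation: String
}

# The gateway composes the two subgraphs into a single super-graph,
# so a client can ask for product name and stock status in one query.

#--------------end federation sketch---------------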

5. No versioning required! GraphQL APIs are much more resilient to change

In GraphQL, the client specifies exactly which fields of the response entity and its related entities it needs. When the schema evolves with additional fields, the API provider does not need to introduce new versions and existing clients do not need to modify their code, even though the underlying types have changed shape. So GraphQL eliminates the need for versioning and facilitates API evolution without the massive overhead associated with REST or even SOAP web services. It also supports deprecation in case fields need to be dropped, so there can be a smooth transition of clients to a new version of the schema without having to maintain multiple versions.
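For example, the built-in @deprecated directive lets the schema evolve while signalling which fields are on their way out; the isbn13 field below is purely illustrative:

#------------begin deprecation sketch----------------

type Book {
  id: ID!
  title: String!
  # Existing clients keep working; tools such as GraphiQL surface the deprecation notice
  isbn: String @deprecated(reason: "Use isbn13 instead")
  isbn13: String
  author: Author
}

#--------------end deprecation sketch---------------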

I would be remiss if I didn't mention some of the drawbacks of GraphQL:

  • GraphQL has not reached the level of popularity of REST, especially for external/third-party integrations, so many organizations end up having to expose their GraphQL APIs as REST endpoints to their external clients.
  • REST allows security to be configured through URI filtering, which is very simple to understand and implement, even through proxies. In GraphQL, since the client can specify which fields it needs, security needs to be handled both at the operation (query/mutation/subscription) level and at the schema level. This needs to be facilitated in the implementation and cannot be done through a proxy. However, with the right effort to create a framework that facilitates this, it can be as simple as adding annotations to the operations and types/fields (see the sketch after this list).
  • Caching in REST is simple, as it can be just a matter of caching results against the URL plus parameters. GraphQL caching, however, is complex, as different API calls can retrieve different parts of the same sub-graph given that the client has the flexibility to request only what it needs. Client frameworks (e.g. Apollo Client and Relay) have addressed this by maintaining the cache in a normalized form, keyed by entity.
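As a sketch of the annotation idea, a custom schema directive could mark which operations and fields require which roles. The @auth directive below is hypothetical; real frameworks (e.g. Apollo Server or the DGS framework) provide their own mechanisms for wiring such a directive to enforcement logic:

#------------begin security sketch----------------

# Hypothetical custom directive - the enforcement logic lives in the server implementation
directive @auth(requires: String!) on OBJECT | FIELD_DEFINITION

type Query {
  # Anyone may browse books
  books: [Book]
  # Only admins may look up authors directly
  author(id: ID!): Author @auth(requires: "ADMIN")
}

type Author {
  id: ID!
  name: String!
  # Field-level protection on sensitive data (email is an illustrative field)
  email: String @auth(requires: "ADMIN")
}

#--------------end security sketch---------------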

Some important frameworks and references:

  • Apollo GraphQL server (Node.js)
  • Apollo Federation
  • DGS framework for Spring Boot (supports federation)


 

Friday, July 10, 2015

Using mini-specs to drive better quality in Agile projects

Agile projects are often prone to issues related to requirements and design clarity, especially when distributed teams are involved. One of the biggest quality levers, in my opinion, is "design" - both functional and technical. Putting thought upfront into the requirement and the system visualisation of the functionality and all its dependencies is one of the best ways of improving quality. I've found the concept of a "mini-spec" to be the right balance between zero documentation and over-documentation, both of which can kill an agile project, especially for projects that already have a base architecture defined and where the work is mostly in implementing user stories. User stories are broken down into chunks that can fit within a sprint and defined with sufficient detail to allow upfront thought and debate as well as the clarity required for implementation and testing. A mini-spec captures the points below explicitly. A simple mini-spec review with the technical and domain SMEs (incl. QA) provides the opportunity to capture many potential issues upfront. Mini-specs have the benefit of forcing developers to think through many areas that will help their effort lead to ultimate success. Completed mini-specs enable both developers and QA engineers with implementation, test design and testing, and facilitate an efficient development workflow at the user-story level.

<<begin mini-spec>>

Need - Why is the feature required. What problem does it solve. Business objective(s) it is trying to achieve.

Feature Overview - A couple of lines explaining the feature.

Validation/Key success criteria - How do we know the feature will meet the stated need when rolled into production? If possible, some measurable criteria that can prove the feature meets its intended need once it is implemented and put into production. This derives from lean-startup principles and forces the developer to think through the end objective.

Operational Requirements, if any
These are requirements that may not be obvious when thinking about the end-user functionality but are needed to enable operational folks to ensure smooth running in production. E.g. this may involve configuration options, admin reports, alerts and monitoring requirements, etc. Having this placeholder helps developers put themselves in the shoes of the operational folks upfront rather than when it's too late.

Approach and Design
- Functional components and work involved. E.g. what happens in the UI (add a new screen, modify an existing screen...), the business layer (add these new APIs...), the data layer (new data tables)...
- Where this user story treads into new cross-cutting concerns that are not part of the existing base architecture, these designs need to be elaborated in full. Even better if skeleton code/architecture POCs are done for these elements (e.g. the first time long polling is used, the first time asynchronous jobs are required in the application, etc. - the first time anything architecturally significant is needed to implement the user story). Ideally such work should be identified upfront and allocated to a seasoned developer/lead.

UI/UX (if applicable)
- UI/UX. Level of detail will depend on the state of the team.  If everyone understands the base UI standards and expectations, keep this simple and descriptive. Otherwise, this can even be wireframes or mockups.

Dependencies - Components, requirements etc. affected by this feature. Especially upstream and downstream systems. Requirements/features pushed out to future sprints that will depend on this feature.

Assumptions - Assumptions we're making that are critical for success in terms of environmental dependencies, sequence of work order etc.

Positive test cases - What should happen when the expected conditions/input are provided

Negative test cases - What should happen when unexpected conditions/input are provided

<<end mini-spec>>

In the design factory model, user stories are elaborated into mini-specs by an independent team whose only job is to interact with clients, elaborate these features and act as a proxy between clients and the development team. This creates a backlog of "implementable" user stories that can be pulled into implementation sprints. The advantage of this approach is that it eliminates a significant amount of time wasted by developers waiting for feedback.

Post-implementation reviews of user stories can also be aligned to the mini-specs, where the developer showcases the user story and demonstrates the functionality as well as the unit tests working.

Developers should also be involved in evaluating the success of the implemented feature in production in terms of the validation criteria. This increases their sensitivity towards the business outcomes of their efforts and helps build a lean-startup mentality across the entire team.

Tuesday, March 18, 2014

The Best Designs are the Simplest Designs but Carry the Most Thought Behind Them

One of my favorite quotes is from Steve Jobs: "When you first start off trying to solve a problem, the first solutions you come up with are very complex, and most people stop there. But if you keep going, and live with the problem and peel more layers of the onion off, you can often times arrive at some very elegant and simple solutions".

I've experienced this over and over again in my software engineering career. Most people stop at the first incarnation of the solution. The key to getting to the optimal solution is to rapidly iterate on your visualization of the solution and to really listen to feedback from critics as well as to that little voice in your head that says something is not quite in order. Your critics can be your biggest strength in getting to the ideal solution - because the right solution will balance different perspectives while providing simplicity. This means you need to have an open mind that can absorb the best parts of as many solution options as possible. But an open mind does not mean continuous wavering or indecision on the approach. How do you move from the listening or feedback mode to homing in on and clamping down on the right solution? You do that by boiling the problem-solving process down to key principles and conscious trade-offs. This is one of the most important aspects of the architecture practice.

Tuesday, July 19, 2011

More on Code Quality

Is code quality, like beauty, in the eye of the beholder? Perhaps not as much, but there is a significant amount of subjectivity in any assessment of code quality. Technical leaders need to be aware of the many layers of code quality:



  1. Coding conventions and standards

  2. Adherence to the best practices of the programming language and paradigm

  3. Optimal algorithms and design

  4. Architectural and cross-cutting concerns

Static analysis provides a convenient way to improve layers (1) and (2) in a scalable manner. These tools are meant to reduce the noise and help technical leaders focus on the more important layers (3) and (4), which invariably require a deep-dive/manual review. I've covered the dangers of over-reliance on these tools in my previous blog post.

So what should you do if the static analysis tools come out with a bad report on your code? The most common mistake in this scenario is to focus the team on immediately fixing the layer (1) and (2) issues. It's easy to fall into the trap of focusing on just those areas that are immediately visible to managers. Instead, a bad report should be read as a warning sign of much worse things in layers (3) and (4), and the first order of activity should be a deep dive to uncover the more critical issues in those layers. Fixing should prioritize the most impactful areas across the four layers rather than just those immediately visible to the "managers". For example, it's far more important to fix connection leaks and exception and logging issues than it is to fix a coding style issue. This is not to say the style issue should not be fixed - that should be a given. Setting a design right after the fact, however, is a complicated and risky proposition, and that is where the technical leader's good judgement comes into play. I've seen technical leaders either shy away from this difficult decision, or become a cowboy and plunge right in without the safeguards in place. Unit testing is your best friend when engaging in such a refactoring job.


Of course, as always, an ounce of prevention is far better than a pound of cure. A deep dive into the first 100-500 lines of code written by the team, and regular reviews thereafter, is the best safeguard against code quality going astray later on. The paradox of code quality is that although the architecture and design are usually done by senior technical leaders on the project, the entire team including the junior-most engineer needs to understand them in order for the code to come out well. Strong leadership and communication traits are essential to drive this common vision and understanding among the team and prevent the disasters caused by poor code quality.

Thursday, March 25, 2010

Code Quality bridge collapse

Henry Petroski popularized an interesting theory that there is a spectacular bridge collapse every 30 years. There are some very convincing data points behind this observation: the Tay Bridge in 1879, the Quebec Bridge in 1907, the Tacoma Narrows Bridge in 1940, etc. The reasoning is that engineers tend to get overconfident over time, start stretching the boundaries of a new technology and start making unreasonable trade-offs between cost and safety until it gives way at some point. In the case of civil engineering, the results of the compromises come back to bite much later.

The general principle seems to apply to Software Engineering as well, only the collapse will come much earlier than 30 years! Take the following classic example from a simple process improvement initiative:

Take a team with a partially effective manual code review process that is starting to leverage automated tools for static analysis. Initial results will be very positive. The automated process/tool will improve coverage and will be used as a mechanism to optimize the manual reviews. Rather than selecting code randomly, manual reviews will be done on the code blocks that show the most amount of complexity, duplications, best practice violations etc. The result will be more effective code reviews leading to improved code quality.

Slowly but surely, overconfidence will set in as no code quality failures are observed over a period of time. Manual code reviews themselves will start getting compromised as long as the automated metrics are within a "reasonable" range - effectively compromising quality to reduce cost. The ingredients are thus in place for a code quality bridge collapse.

The worst case scenario is that you blame such failures on the automated process or tool and go back to square one. The ideal solution is to make sure you combine the best of automated and manual processes to get the right balance of quality and cost.

Yet another important lesson for Software Engineers from the classic civil engineering world.

Tuesday, February 9, 2010

Losing the plot on SOA?

Many product development engagements seem to be losing the plot on SOA - concepts of Service Oriented Architecture are abused to create systems that are unnecessarily complex and cumbersome to maintain, not to mention perform badly. The major issue is service granularity and the common mistake of equating "logical services" in an SOA context with "web services".

It's important to first be clear about what I mean by a "logical service". A "logical service" is a set of functionality that needs to be managed, or may need to be provisioned, autonomously. By this definition, a "logical service" will most often equate to a "system", since a system is typically an autonomous unit with a cohesive development/release cycle, persistence store, etc. But the important exception to this is in relation to expected integration points. If there is a set of functionality in your system that you feel needs to be optionally supported through some third-party integration, that is a good separation point where a single system should be split into multiple logical services, even though they currently (in the default state) access the same system. All linkages between logical services should ONLY be via the service interfaces, and any linkages and joins (also called service orchestration logic) between these logical services should reside above the logical services. Whether or not you need a full-fledged BPEL orchestration layer to host this orchestration logic depends on the complexity of the linkages between your logical services and the flexibility required to change that orchestration logic. If you're designing a new system and want to leverage SOA principles to improve the extensibility and integratability of your application, then this may not even be necessary. If you're trying to orchestrate end-to-end business processes across diverse logical services, or you're designing a massive ERP with the requirement to sell different pieces separately (with support for integration with other systems), then this makes a lot of sense.

So what's the relationship between a "logical service" and a "web service"? A logical service in an SOA context could be organized as a collection of web services to make it easier to manage and access. This does not mean that any dependencies between the functionality of those web services should be handled only by making web-service calls! What a total waste that would be! As long as these web services form a single "logical service", their implementations should be free to use a common business logic layer and even a common persistence store, making it possible to write performant web services that use the full power of good old SQL. This will also ensure that you expose as web services only those services that cross the logical service boundary - making the "logical service" itself coarse-grained and lightweight from a contract perspective. When trying to carve out logical services, think at the level of current or expected system integrations.

Here are some of the indicators that you have lost the plot on SOA:

4) A lot of your business logic is starting to appear in the orchestration layer, or your orchestration layer seems to be getting bulkier.
3) You're confused as to why you need a rules engine or a workflow engine when you have BPEL.
2) You are asking yourself why "WS-Transactions" is not better supported in major SOA frameworks, or writing a lot of compensating transactions for calls that are eventually going into the same system.
1) You are pulling your hair out trying to essentially do what looks like "joins" in your services/orchestration layer, between services that are underneath going to the same database!
