GraphQL at Shopify

– Alright, good morning everybody. That was a really inspiring keynote, and I'm really excited about GraphQL Summit. We've seen so many talks about why GraphQL is great and why people are using it, but we haven't seen that many talks on how people use it, the problems they've had, and the actual solutions they found. I've already talked to a few people here, and we've shared the problems we encountered and our solutions. This talk is going to be in two parts: in the first part I'm going to talk about how we built tools and solutions for some of the problems we had on the server side for GraphQL, and then Dylan is going to talk about the same kind of thing on the mobile side, how we built smarter clients for mobile.

My name is Marc-Andre, I'm from cold Montreal, Canada, and I work, obviously, at Shopify. Shopify is an ecommerce company; we basically make it easy for anybody to sell things on the internet. Like many companies, we've had, and still have, REST APIs, and we had the same pain points that maybe a lot of you have had too: our clients were making way too many round trips to get the data they needed, it was hard to customize what each client was getting, and the way we did field filtering was a little awkward. So we wanted to explore something else.
A few months ago we played with React Native a little bit at Shopify, and although it wasn't perfect for our use case, we kind of discovered GraphQL through it. Fast forward to today, and we've just released our mobile app, which is powered entirely by our GraphQL API. Dylan and I both work on the GraphQL core team, and our goal is really to make it easy for any team to build their own schema or extend existing schemas. We're getting pretty large, we have a lot of developers working on the platform, and we have many teams that might want to create a new schema or extend existing ones. We also want to make it easy for mobile developers to extend the schemas too.

So the first thing we tackled was how we would define our schemas and execute GraphQL queries for the Shopify API. The first thing we did was turn to the open source community, and we found a great GraphQL gem that maybe some of you use. We especially found a great maintainer, Robert, so shout out to him; he's been really helpful and really responsive. That gem, like many other libraries, lets you define your types, fields, and arguments with the nice little DSL you can see here.
But keep in mind that our goal was to make it easy for really anybody to extend the schema: we have a gigantic, monolithic Rails app, our developers are really used to a certain way of coding, and we thought this was maybe a little too far from what we were used to. So what we actually did was build two different abstractions on top of that gem to make it even easier. We have the GraphQL gem, which we use to execute queries and define the internal GraphQL types; on top of that, what we call GraphModel, which helps us define these types with another syntax; and on top of that, a layer that contains more of the Shopify specifics.

This is what our GraphModel layer looks like. It's quite similar, and it's class based; types are defined using simple symbols instead, and we have explicit keyword arguments for nulls, so it's pretty straightforward too. We're a Rails shop, and people are used to ActiveRecord; you might have noticed the earlier syntax kind of looked like ActiveRecord too. We wanted to push that further and really make it easy for anybody, so we have helper methods like belongs_to, which in the end just defines a normal field, but means you don't have to know how to define that in the underlying GraphQL syntax. We have many other helpers, like cache_has_many, which takes care of fetching the association from the cache for example, or a paginated helper, which defines a Relay connection for you.
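A rough sketch of what a helper layer like that can look like, using graphql-ruby's class-based API (the module and type names here are made up for illustration, not Shopify's actual GraphModel code):

```ruby
require "graphql"

# Illustrative only: a tiny "GraphModel"-style macro layer on top of
# graphql-ruby. The helper just expands into an ordinary field definition,
# so developers don't need to know the underlying field syntax.
module GraphModelHelpers
  def belongs_to(name, type:, null: true)
    field(name, type, null: null)
  end
end

class ProductType < GraphQL::Schema::Object
  extend GraphModelHelpers

  field :title, String, null: false
  belongs_to :shop, type: "Types::ShopType" # hypothetical ShopType, resolved lazily
end
```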
So it's been great for helping people really extend our schemas, but naively building a schema with these helpers can still result in some problems. Let's take a look at this query, for example; I'm going to go through its execution and look at what kind of calls are made to fulfill it. Here, the shop field is going to select the shop from MySQL. We then fetch products, which is also pretty simple: we're just fetching products where shop_id is 1, right? But if you go down to the image resolver here, it turns out it's going to be called three times, and it's going to run three queries against MySQL. That's a little awkward and not necessarily what we want.

The reason it does that is that each resolver is called in a different context. If this were in the context of one controller, we would probably have preloaded that data so we only make one call, but our resolvers are called in different contexts, so each resolver doesn't know how many other images are going to be loaded. We could have gone with a solution that analyzes the query and tries to guess what calls are going to be made so we can preload beforehand, but that's really complex, and maybe we didn't need something like that.
So we went with the usual solution of batching, and we've actually released a gem for it called graphql-batch, which Dylan is the main maintainer of. It's really great, and its principles are quite simple. You might be familiar with the principle if you've used GraphQL on a JavaScript server: the famous DataLoader library that a lot of people use. It's pretty much the same idea: instead of returning the value right away, a resolver can return a Promise for that value. A loader, for example, takes the record class, maybe Product here, and an association, which is the image we want. When we load for a particular record, we don't actually fetch the image right away; we just indicate that we're interested in that image at some point. So for the same images as before, instead of returning the values in each resolver, we're just telling our loader that we're interested in each of these three products' images.

Inside an actual loader it's pretty simple: we have the load method we just talked about, which caches, remembers, and returns the Promise, and we have a perform method that's called only once and does the hard work of batching the query. The example here is our Memcache AssociationLoader, which prefetches the images for a set of products and fulfills every Promise at once.
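A minimal loader in that spirit, essentially the RecordLoader example from the graphql-batch README (the Image model and field names here are placeholders):

```ruby
require "graphql/batch"

# load(id) only records interest and returns a Promise; perform(ids) is
# called once with every requested id and fulfills all Promises in one query.
class RecordLoader < GraphQL::Batch::Loader
  def initialize(model)
    @model = model
  end

  def perform(ids)
    @model.where(id: ids).each { |record| fulfill(record.id, record) }
    ids.each { |id| fulfill(id, nil) unless fulfilled?(id) }
  end
end

# In a resolver, instead of returning the value directly:
#   RecordLoader.for(Image).load(product.image_id)
# (The schema also needs `use GraphQL::Batch` so the promises get resolved.)
```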
The hard thing here is knowing when to call perform. If you've used DataLoader, you might know the little trick you can do in JS: it waits for process.nextTick to run its perform and resolve its Promises. We're in Ruby, so we couldn't really do that. What ended up happening is that we use promise.rb, a Promises/A+ implementation in Ruby, to return these Promises, along with a custom executor: when we get the result from the GraphQL gem, we look for Promises and resolve those, so we know that every possible Promise has been enqueued.

We have a nice API for that too; all of that logic is hidden behind a helper, so you can just say, for that field :image, please preload it (that should be field :product, actually, preloading images). What that does is basically wrap the field's normal resolve with a prefetch step, so before hitting the resolve, we wait for that Promise to be fulfilled. That's been really helpful for hiding the complexity behind batching and for stopping people from shooting themselves in the foot with these queries. If we take a look at the same query, but with batching this time, we get the same queries, except this time the image lookups are actually batched. So this is great.

Authorization and authentication are another kind of pain point people often hit when using GraphQL, and one of the solutions often
proposed is to put the checks in the resolver itself, checking what the user is able to do. We went with something a little different. Our API clients' access tokens already have the concept of scopes, the resources they're allowed to access, and we tied that directly to our type definitions. So for example, an order type requires access to orders, and we can say whether that requires write access or is read only. We can even specify, for a particular object, whether the API client is allowed to access this type at all. We compare all of that against an AccessControl object that we pass in the context, and that AccessControl object contains everything about the API client: its capabilities and what types it can access.
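Here is a hedged sketch of what tying scopes to type definitions can look like with graphql-ruby's authorized? hook; the AccessControl class, scope strings, and OrderType shown here are illustrative, not Shopify's actual implementation:

```ruby
require "graphql"

# Illustrative only: an access-control object built from the token's scopes,
# checked in a type-level hook.
class AccessControl
  def initialize(scopes)
    @scopes = scopes # e.g. ["read_orders", "write_products"]
  end

  def can_access?(resource, access: :read)
    @scopes.include?("#{access}_#{resource}")
  end
end

class OrderType < GraphQL::Schema::Object
  # Seeing this type requires read access to orders.
  def self.authorized?(object, context)
    context[:access_control].can_access?(:orders, access: :read) && super
  end

  field :id, GraphQL::Types::ID, null: false
end
```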
But there's still stuff we need to protect ourselves against, like evil clients that might build really complex queries. There are many solutions for that, like query complexity analysis that calculates a score for each field, or maybe a max depth, but we went with the simple solution of a timeout, using Ruby's Timeout class, which raises after a certain amount of time. Another thing is that clients might send a really large number of queries, and for that we reused the same thing we had for our REST APIs, which is a simple throttle. It's a leaky bucket throttle that will raise for an API client on a certain key, and the key might be a shop ID plus API client. That's helped us mitigate some of those bad clients.
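A minimal sketch of the timeout approach (MySchema and the 10-second budget are placeholders, and Shopify's actual setup surely has more nuance):

```ruby
require "timeout"

QUERY_TIME_BUDGET = 10 # seconds

def execute_with_timeout(query_string, context:)
  # Wrap query execution and give up if it runs too long.
  Timeout.timeout(QUERY_TIME_BUDGET) do
    MySchema.execute(query_string, context: context)
  end
rescue Timeout::Error
  { "errors" => [{ "message" => "Query timed out" }] }
end
```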
But really, the cool thing that I like most about our GraphQL setup at Shopify is the developer experience around building and extending schemas. We always check the IDL of the schema into version control, so we can easily see what's been changed on a schema, including breaking changes, and we enforce that: if we compare the new schema against the old schema and you've changed a type or changed a field, we're going to ask you to rebuild that schema dump and check it in. We pay really careful attention not to break the schema for no reason, so when you try to generate the new dump, if something has changed and it creates a breaking change, we can detect that and ask you to either confirm that you really want to break it, or rethink what you just did.
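A minimal sketch of enforcing a checked-in schema dump in CI (the path and MySchema are placeholders; this is not Shopify's actual tooling):

```ruby
# Regenerate the IDL from the Ruby schema and fail if it differs from what
# is committed, so every schema change shows up as a reviewable diff.
SCHEMA_DUMP_PATH = "schema.graphql"

current_idl   = MySchema.to_definition # graphql-ruby renders the schema as IDL
committed_idl = File.read(SCHEMA_DUMP_PATH)

if current_idl != committed_idl
  abort "GraphQL schema changed. Regenerate #{SCHEMA_DUMP_PATH}, review the " \
        "diff for breaking changes, and check it in."
end
```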
We also have a bot on GitHub that alerts our GraphQL team whenever the schema changes. That helps us help someone who is new to GraphQL and might have just added a field. And we have a strong set of conventions; these are just a few examples. Our mutation names are always prefixed with the object name instead of the action, which makes it a lot easier for us to find mutations, since we have a lot of them. Our mutations return userErrors for errors that should be shown to the client, and we always support the Relay spec, so we actually have tests that check for these things. For example, did you forget the clientMutationId on a mutation? We might instruct you to rename a mutation, and there are a few other checks like that.
Instrumentation is really important for us at Shopify, and GraphQL was no different. We currently track any errors that we get, query times, and clients that were throttled or timed out. We really want more precise instrumentation, so we're working with Robert, the author of the GraphQL gem, so we can have nicer hooks to instrument actual resolver times and things like that. We also have deprecated field usage tracking: every time a deprecated field is executed, we log it, and with a tool a little bit like Logstash that we use internally, we're able to query for those deprecated fields, which makes it really easy to know whether you can safely remove a deprecated field or not.
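One way to track that kind of usage, sketched here with graphql-ruby's AST analyzer API (an assumption on my part; the talk doesn't show Shopify's actual mechanism):

```ruby
require "graphql"

# Walk each executed query and log every deprecated field it touches.
class DeprecatedFieldUsageAnalyzer < GraphQL::Analysis::AST::Analyzer
  def initialize(query_or_multiplex)
    super
    @used = []
  end

  def on_enter_field(_node, _parent, visitor)
    field = visitor.field_definition
    @used << field.path if field && field.deprecation_reason
  end

  def result
    # Ship these to whatever log pipeline you query later (a Logstash-like tool, etc.)
    @used.each { |path| warn("deprecated_field_used=#{path}") }
  end
end

# Attached on the schema class with: query_analyzer(DeprecatedFieldUsageAnalyzer)
```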
So that was a subset of what we have currently on the server side, and there's so much more, so feel free to come talk to us afterwards. But for now, I'm going to let Dylan talk about what we did with our mobile clients. Please welcome Dylan.

(audience applause)

– Alright. First I'm going to give some background on what we're actually building, to help you understand why we built what we built and how we got there. I'll talk about how we use code generation to build the clients we built, and also about how we do caching and, more importantly, how we keep data consistent across the app when it changes.

We were rebuilding the mobile Shopify app. We were told we needed to rebuild it so it could provide a full Shopify Admin experience rather than a limited companion app. We decided to make sure it was also fast on slower networks, so we changed the tech stack as well, and we looked at GraphQL as a way of reducing response sizes. We managed to ship it in time for our deadline, which mattered because we wanted our merchants to have it before they were busy with the holiday season.

We started by building it in React Native, and we found it very quick to build out the app, but we had some trouble making sure we could polish it and get it to the quality we wanted. Since this was a mission-critical app, and we weren't sure we could get the quality we wanted in the time we wanted, we ended up going with native mobile apps.
That meant writing Java and Swift, and it also meant we had to leave Relay behind, which is JavaScript, unfortunately. It was a hard decision, but let's look at the goals we had in building the new GraphQL clients.

We wanted to keep it simple. At this point I'd already worked on building out the server side, so we had something there, but we needed something on the client to query it, and we didn't just want to specify literal strings, which would be ugly, or deal with the raw JSON directly. We also wanted it to be easy to understand for our mobile developers; they're the target, they're the ones actually using this. So we wanted to embrace what they use too, which is a static type system and IDEs, and fit in well with that. And we wanted to think about what's going to happen after we actually ship the app: we need to make sure we can evolve the schema, and we want to be able to add to this over time, because this initial release is just the point where we can replace the companion app we had before. We also wanted to decouple the code into components, so we can easily add components to it.

To do all this, we realised that the GraphQL schema was the killer feature. I think this is what Facebook really developed the schema for; they wanted it to integrate well with a static type system. So we use it for code generation: we generate code for building a query and for the response classes. We wanted it to be easy to use, so we have a script that doesn't require any arguments and only has to be rerun in order to take advantage of anything added to the server.
in order to take advantage of anything added to the server. So let’s look at what the code
generator produces for us, starting with the query builders. On the left you can see the code, for building the query, and on the right, the
query it actually produces. Notice the similarity between that. That’s on purpose. We use the builder
pattern, to implement this, where we add fields by calling methods, which are (mumbles) by the returning the object they’re called on. And we use closures, to specify, subfields on those. We use literal values, in order to specify any arguments, which are provided when
we’re building the query. So the query is actually
built (mumbles) time. We have a similar API for Java, and at the time, we thought it was going to be hard to provide this because, Java 8 had lambda expressions, but there was no support
for that on Android. But we found out we could use retrolambda, to backport this to
older versions of Java. Now there’s an Android toolchain, which might adopt when it’s
more mature, called Jack, which also provides support for this. Normally, it seems like
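The general shape of those generated builders is just the builder pattern. Here is a language-agnostic illustration in Ruby (the real clients are Swift and Java, and these class and field names are invented):

```ruby
# Field methods append to the query and return self so calls chain;
# blocks play the role of the closures used for subfields.
class QueryBuilder
  def initialize(name)
    @name = name
    @fields = []
  end

  def field(name, args = {})
    rendered_args = args.empty? ? "" : "(#{args.map { |k, v| "#{k}: #{v.inspect}" }.join(', ')})"
    if block_given?
      child = QueryBuilder.new("#{name}#{rendered_args}")
      yield child
      @fields << child.to_query
    else
      @fields << "#{name}#{rendered_args}"
    end
    self
  end

  def to_query
    "#{@name} { #{@fields.join(' ')} }"
  end
end

query = QueryBuilder.new("query")
          .field(:shop) { |s| s.field(:name).field(:products, first: 3) { |p| p.field(:title) } }
puts query.to_query
# => query { shop { name products(first: 3) { title } } }
```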
Normally, it seems like fragments are used for composing queries, letting components specify what they want so they can keep their data dependencies close to their data use. We took a much simpler approach: we just have a function that takes in a query builder object, we can put that on the component, and it can specify what it wants. We don't have to add any extra support for it or explain anything to our developers, because they know how to do function calls. We do use inline fragments to specify what fields we want on a specific object type when we're dealing with an interface. In this case, an event is an interface, and __typename is automatically queried on it, which we'll use in the response classes. We also automatically query id, which you'll see we use in the data consistency layer.

By generating these type-specific query builder classes, with methods for each field, we integrate well with the IDE: we can actually check queries at compile time just by compiling the code, and since the IDE can check for compiler errors, it can highlight any errors for us. We also get auto-completion for free, by generating documentation we can surface it in the IDE, and we flag deprecated field use by using annotations. So we kept things simple by leaving out some GraphQL features, and by integrating with the type system we were able to get all the IDE features without having to implement an IDE plugin just to provide them. We were still able to do the loose coupling we wanted even without those features, and as the schema changes, we just run a script and adapt to it. Notably, in order to remove fields as we evolve the schema, we can just mark them as deprecated, and when people run that script they'll get warnings
for deprecated field use and can easily address them.

Let's look at the response classes, because we don't want to deal with raw JSON. Instead, what we get to deal with is something much more pleasant. These response objects are already validated when they're constructed; they deserialize all the fields and provide accessor methods, so we can leverage the type system. Notably, dealing with nulls is a common pain point, so we use Swift optional types, which force us to check for null where necessary, for nullable fields. For non-null fields we don't have to be concerned with whether the thing can be null or not; that's all encoded in the schema for us. We found this so convenient that we decided to replace the model layer with it, and for this app that made sense. We wanted to keep as much logic as we could on the server, and that way we didn't have to worry about avoiding code duplication, and notably we can leverage all the code that's already written on the server and allow it to change over time.
Since we wanted our response objects to be convenient, that also meant looking at custom scalars. We didn't want to deal with the string as it's encoded in the JSON response for money or datetime types. So instead we took the approach of keeping our GraphQL code generator generic, so it works with any schema, but providing a way to pass it something that specifies how to generate code for custom scalars. In this case, we specify what native types to actually use for them, and how to serialize and deserialize them. Our update-schema script is what adapts to our specific case for this application, whereas the code generator is separate from that, and ideally we can open source it in the future.
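To make the idea concrete, here is a hedged Ruby sketch of that kind of scalar configuration; the real configuration feeds Swift/Java code generation, and the type choices and names here are illustrative:

```ruby
require "bigdecimal"
require "time"

# Map GraphQL custom scalar names to a native type plus (de)serialization
# logic; the generator consults this and falls back to String for scalars
# it doesn't know about.
ScalarConverter = Struct.new(:native_type, :deserialize, :serialize)

CUSTOM_SCALARS = {
  "Money"    => ScalarConverter.new(BigDecimal,
                                    ->(raw) { BigDecimal(raw) },
                                    ->(val) { val.to_s("F") }),
  "DateTime" => ScalarConverter.new(Time,
                                    ->(raw) { Time.iso8601(raw) },
                                    ->(val) { val.utc.iso8601 }),
}
```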
Another thing we have to address is the fact that we don't know all the types that are ever going to exist. We generate code, we ship the app, and people run all the versions, so we have to deal with values that we don't know about when the code is generated. To address that, we generate an unknown value and add it to the generated enum type for the language, like Swift or Java. In the case of exhaustive switch statements, we can be forced to have a case for that unknown value, as well as being forced to handle all the known cases, if we choose to use an exhaustive switch statement.
For interface fields, we return the object type for the actual object and make sure it implements the interface. But that means using the type name and making sure we can actually match it with a corresponding response class, so we have to deal with the unknown case again. We generate an unknown class for each interface type; in this example, for events, we have an unknown-event class that implements the event interface, and that way we can fall back to using the common methods for the common fields on that interface.

So, in summary, for the response classes we simplified things by avoiding a hand-written model layer, we integrated well with the type system, and we left room for schema evolution. In addition to that, we left room for the business logic to change without the client getting out of date, just by being able to add computed fields to the schema, and we use that liberally to keep code that we've classified as business logic out of the client.
Let's end by talking about how we cache things and keep data consistent across the app. For caching, we cache whole responses, keyed on the query string, and persist them to disk. That way, when the app is reopened, we're able to quickly render something, and if that data has changed, we still actually send the query, so we can get back the response and update the view with the actual data we've just gotten.
So we have to handle both of those responses, and we have a handleRelayQuery method that's defined on a relayable object, which is an interface, and we pass that relayable object to the method that queries for the data, so that handler method can get called multiple times. We also keep that relayable object around in a list of active queries, which represents all the parts of the app that will need to be updated when data changes, and we hold it with a weak reference so that the list of active queries gets cleaned up when anything in it is no longer there to update, because the view has gone away.

When a mutation happens, we want to update the whole app, so we can see that change represented elsewhere, even as you navigate through the app. What we do is find all the nodes on the mutation payload and match them against nodes in the active query objects, in order to find everything that needs to be updated, and we match those just by comparing the IDs. We deep-merge the fields from the payload node onto the active query node, and then call that handleRelayQuery method to update the view.
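Here is a rough Ruby illustration of that consistency step: collect the nodes from the mutation payload, match them to nodes held by active queries by id, and deep-merge the payload fields over the cached ones (the real implementation is in Swift/Java):

```ruby
# Find every hash that carries an "id" anywhere inside a response payload.
def collect_nodes(data, found = [])
  case data
  when Hash
    found << data if data["id"]
    data.each_value { |value| collect_nodes(value, found) }
  when Array
    data.each { |value| collect_nodes(value, found) }
  end
  found
end

# Recursively copy updated fields onto the cached node.
def deep_merge!(target, updates)
  updates.each do |key, value|
    if value.is_a?(Hash) && target[key].is_a?(Hash)
      deep_merge!(target[key], value)
    else
      target[key] = value
    end
  end
  target
end

def apply_mutation_payload(payload, active_query_data)
  payload_nodes = collect_nodes(payload)
  collect_nodes(active_query_data).each do |node|
    update = payload_nodes.find { |candidate| candidate["id"] == node["id"] }
    deep_merge!(node, update) if update
  end
  # ...then each affected active query's handler is called to re-render.
end
```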
Optimistic updates are similar in that they target a specific node and specify what action to perform on it, like deleting it, or we might do an update. For updates, we use the response class to specify the change, with setter methods so we can specify what changes on those objects. When the response actually comes back, we want to roll back the optimistic update, because we don't want to assume that it actually happened; it just represents what we think will happen before the actual response comes back.
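A rough sketch of that snapshot-and-rollback idea (again in Ruby for illustration; the class and field names are invented):

```ruby
# Snapshot the cached node, apply the change we expect the mutation to make,
# and restore the snapshot once the real response arrives (which then gets
# merged through the normal consistency path).
class OptimisticUpdate
  def initialize(node)
    @node = node
    @snapshot = Marshal.load(Marshal.dump(node)) # deep copy of the current state
  end

  def apply(changes)
    @node.merge!(changes) # e.g. { "status" => "fulfilled" }
  end

  def rollback
    @node.replace(@snapshot)
  end
end
```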
So we kept things simple: we just have whole-response caching, and we use the type-checked response classes for optimistic updates, and I think we've built something for the long term by making sure we have data consistency without coupling. That means when we do a mutation, we don't say "here are all the things that need to update"; we just do the mutation and let the updates propagate.

If this talk has been interesting to you, you can follow our work on our engineering blog, and as was pointed out, some people here are hiring GraphQL devs, and I could certainly use some help. Hopefully I can open source some of the things we've been working on here, at least the generic parts of it. Thank you. (audience applause)
