My Grocery Shopping Experiences and ... Snowflake
Carrie Ballinger makes an apt analogy between grocery shopping today and Snowflake's mandatory entrance queueing.
In response to the current virus pandemic, most of the grocery stores here in LA have limited how many shoppers they allow into the store at any one time. They position a gatekeeper outside the store entrance who directs the next shopper inside only after another shopper has exited.
I was lined up outside my nearby Trader Joe’s store last week for about an hour, cart in hand, gloved and masked, waiting to pick up food supplies. I’m not the most patient person when I have to wait, but I was delighted once I got inside and saw that the aisles were relatively void of people and easy to navigate. I didn’t have to dodge other shoppers trying to get to items on the shelf, there was no line at the meat counter, and when I got to the checkout there was no wait there either. It was a strange experience, one big queue on the outside, then absolutely no waits and plenty of elbow room once you got inside.
While I was waiting outside, I began thinking about similarities between my grocery store experience and the Snowflake product. Snowflake operates using a similar approach of one big queue outside, but no lines or additional queues once you get inside. Currently, each Snowflake virtual warehouse has a default concurrency limit of 8 queries at a time. If a ninth query wants to start, it will either queue up, or a new cluster waiting in the wings will be brought in to run it.
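To make the entrance-queue behavior concrete, here is a minimal Python sketch of a fixed concurrency limit. It is a toy model, not Snowflake code: it assumes every query arrives at the same moment, runs for a known duration, and that no additional cluster is brought in. The function name `entrance_queue_waits` is made up for illustration.

```python
import heapq

def entrance_queue_waits(durations, limit=8):
    """Return how long each query waits outside before it starts.

    Toy model: all queries arrive at time 0 and are admitted in order,
    with at most `limit` of them running at once.
    """
    running = []  # min-heap of finish times for queries currently running
    waits = []
    for duration in durations:
        if len(running) < limit:
            start = 0.0  # a slot is free, walk straight in
        else:
            # Store is full: wait until the earliest running query exits.
            start = heapq.heappop(running)
        heapq.heappush(running, start + duration)
        waits.append(start)
    return waits

# Nine identical 10-second queries against a limit of 8:
# the first eight start at once, the ninth waits for a slot.
print(entrance_queue_waits([10] * 9))
```

In this model the ninth query waits the full 10 seconds regardless of how small it is, which is the point of the analogy.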
If you compare that to Teradata, the Advanced SQL Engine offers workload management throttles to limit how many queries can be run concurrently, similar to the Snowflake limits. But with Teradata, these throttles are completely optional and are easy to customize for your particular needs. There is no mandatory queue like the one outside the supermarket doors these days.
One reason Teradata does not enforce mandatory entrance queueing is that it doesn’t need to. Several internal queueing mechanisms have been architected into the Advanced SQL Engine at various points. These internal queues, such as the message queue that each AMP has available to temporarily queue up messages when an AMP is overloaded, allow Teradata to support high volumes of intensive work running at the same time. This approach of multiple levels of queues and small push-back points inside the database makes it possible for all available resources to be fully utilized without the system becoming overwhelmed.
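The idea of a small internal queue with a push-back point can be sketched in a few lines of Python. This is only an illustration of the general back-pressure pattern, not how Teradata implements AMP message queues; the `Worker` class, its `offer` method, and the `max_pending` setting are all hypothetical names.

```python
import queue

class Worker:
    """A toy parallel unit with a small bounded inbox."""

    def __init__(self, max_pending=3):
        # Bounded queue: once it is full, new messages are pushed back.
        self.inbox = queue.Queue(maxsize=max_pending)

    def offer(self, message):
        """Try to enqueue a message; return False if the worker is overloaded."""
        try:
            self.inbox.put_nowait(message)
            return True
        except queue.Full:
            return False  # push back: the sender retries later

worker = Worker(max_pending=2)
print(worker.offer("step-1"))  # accepted
print(worker.offer("step-2"))  # accepted
print(worker.offer("step-3"))  # pushed back, inbox is full
worker.inbox.get_nowait()      # the worker finishes one message
print(worker.offer("step-3"))  # accepted on retry
```

The design choice here is that only the overloaded unit pushes back, and only for as long as it is overloaded, rather than a single limit at the front door holding everything up.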
Snowflake doesn’t have internal queues or other embedded techniques to manage the flow of work internally, so it has to use a brute-force approach of putting a one-size-fits-all fixed limit on what it allows to start up. The side effect of this approach is the same as with a grocery store: usable capacity often goes to waste, and customers end up waiting longer than they actually need to before they get inside.
One thing that is particularly frustrating when you are in a grocery store line is when you only need one or two items, like a single bottle of wine or a carton of eggs. You still have to wait outside the same amount of time even though the items you need may be few in number.
Snowflake doesn’t let short work bypass the entrance queue and offers limited shortcuts when you only want a few pieces of data. Snowflake does have techniques such as clustering and per-column statistics kept for each data block, but it doesn’t support indexes that can directly access one or a few rows. Snowflake typically will have to read volumes of data in situations where Teradata can do single-AMP access based on a primary index.
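The difference between going straight to one row and reading through everything can be shown with a toy Python model. It assumes rows are spread across a handful of parallel units by hashing a key; the CRC32 hash, the `N_UNITS` value, and the function names are stand-ins chosen for illustration and are not Teradata's actual hashing scheme.

```python
import zlib

N_UNITS = 4  # hypothetical number of parallel units

def unit_for(key):
    """Map a key to one unit using a deterministic hash."""
    return zlib.crc32(str(key).encode()) % N_UNITS

def load(rows):
    """Distribute (key, value) rows across the units by hashed key."""
    units = [dict() for _ in range(N_UNITS)]
    for key, value in rows:
        units[unit_for(key)][key] = value
    return units

def lookup_by_key(units, key):
    """Hash the key, visit one unit, read one row."""
    return units[unit_for(key)].get(key), 1

def lookup_by_scan(units, key):
    """Read every row in every unit and keep the one that matches."""
    found, rows_read = None, 0
    for unit in units:
        for k, value in unit.items():
            rows_read += 1
            if k == key:
                found = value
    return found, rows_read

units = load([(i, f"item-{i}") for i in range(100)])
print(lookup_by_key(units, 42))   # one row read
print(lookup_by_scan(units, 42))  # all 100 rows read
```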
Grocery stores, on the other hand, provide signs at the front of their aisles, so the shopper can easily locate the wine section or the dairy section. Following these signs means shoppers don’t have to drag their carts up and down all the store aisles looking for what they need, slowing everyone down, and blocking other carts along the way.
More in line with grocery stores, Teradata offers not just primary index access, but a rich set of secondary indexing options which, like aisle markers, direct you to just the item you are looking for in an efficient manner. Additionally, Teradata offers hybrid row and column partitioning, and multi-level partitioning that can significantly reduce the I/O required to access very granular subsets of data.
Most grocery stores have another technique for speeding up the experience for single- or few-item shoppers: the 15-items-or-less checkout line. This gives favored treatment to small market baskets once you get inside the store, similar to Teradata’s ability to give a higher priority to short queries over longer-running queries. When a grocery store (or a database) is heavily populated, using those fast checkouts can get you in and out quickly. Snowflake lacks those capabilities, underscoring the challenges it faces in supporting applications with short SLAs.
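The express-checkout idea can be sketched as a simple priority queue. This is a generic illustration, assuming each query comes with an estimated runtime and that anything under a cutoff counts as "short"; the `SHORT_QUERY_SECONDS` threshold and the `run_order` function are hypothetical and do not describe Teradata's workload management rules.

```python
import heapq
from itertools import count

SHORT_QUERY_SECONDS = 5  # hypothetical cutoff for the "express lane"

def run_order(queries):
    """Return query names in dispatch order.

    `queries` is a list of (name, estimated_seconds). Short queries go
    first; within each tier, arrival order is preserved.
    """
    arrival = count()
    heap = []
    for name, estimated_seconds in queries:
        tier = 0 if estimated_seconds <= SHORT_QUERY_SECONDS else 1
        heapq.heappush(heap, (tier, next(arrival), name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

print(run_order([
    ("report", 600),
    ("lookup", 1),
    ("etl", 3000),
    ("dashboard", 2),
]))
```

Here the two short queries are dispatched ahead of the long-running ones even though they arrived later.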
And when it comes to the outside-the-store queues, grocery stores have started to identify some shoppers for preferential treatment and shorter lines. Here in LA, most grocery stores have instituted “seniors-only” shopping hours first thing in the morning, where waiting in line has been practically eliminated for the more mature segment of the population.
Another way in which the grocery store analogy can be applied to Snowflake has to do with location. Think of Snowflake as a grocery store located in a big mall and you have to go to that big mall every time you do your food shopping. This big mall contains a lot of empty stores waiting to open up and service customers based on possible future demand. When one of those empty stores opens up because the initial store is busy, that new store will be provided with the same inventory as your initial store, even when only a subset of the items is needed. It takes a lot of space and hardware to keep all those empty stores in readiness for unexpected growth, and this can be costly.
You may need to go to a similar big mall when provisioning a Teradata Cloud platform. But with Teradata you also have the choice of doing your shopping closer to home. You can choose to set up a local Teradata system, like a local grocery store right in your neighborhood. Then you have the option to decide where to do your shopping, based on inventory, convenience of location, and differences in service (and, of course, based on which location has the shortest lines out front).
There are many lessons we can learn from our new and rapidly changing experiences. One important lesson is that there are tradeoffs behind most conventions, whether it’s how the flow of customers coming into a grocery store is managed, or how a database regulates and maximizes the resources it has available. Look below the surface and beyond super-marketing chants to get a grip on what is actually taking place, both in your database and at your grocery store.