Rethinking data gravity

At some point I remember having a short chat with Werner Vogels about taking spot instances to the extreme, in a genuine market in which compute power can be traded. His response was "what about data gravity?", to which my counter was: by making data transfer into S3 free (and, later, by making true the old adage about never underestimating the bandwidth of a truck full of tape), you acknowledge the gravity idea while also creating incentives to make it a non-issue. As in: why don't I just make things redundant? Why don't I push data to multiple S3 regions and have my compute follow the sun in terms of cost? Sure, it doesn't work at huge scale, but it may work perfectly fine at some medium scale, and that is what we used to implement our DMP at OpenDSP.
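To make the follow-the-sun idea a bit more concrete, here is a minimal sketch of the compute-placement half, assuming the data has already been pushed ahead of time into one bucket per region. This is purely my illustration, not the actual OpenDSP DMP; the region list, instance type, and bucket naming scheme are all hypothetical. It uses boto3 to compare recent spot prices across regions and picks the cheapest one to run in.

```python
import boto3
from datetime import datetime, timedelta, timezone

# Hypothetical choices for illustration; not from the original post.
REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]
INSTANCE_TYPE = "m5.xlarge"

def bucket_for(region: str) -> str:
    # Assumes data was replicated ahead of time to one bucket per region,
    # e.g. via S3 cross-region replication. The naming scheme is made up.
    return f"dmp-data-{region}"

def cheapest_region() -> tuple[str, float]:
    """Return the region with the lowest recent spot price for INSTANCE_TYPE."""
    best_region, best_price = REGIONS[0], float("inf")
    since = datetime.now(timezone.utc) - timedelta(hours=1)
    for region in REGIONS:
        ec2 = boto3.client("ec2", region_name=region)
        history = ec2.describe_spot_price_history(
            InstanceTypes=[INSTANCE_TYPE],
            ProductDescriptions=["Linux/UNIX"],
            StartTime=since,
        )["SpotPriceHistory"]
        prices = [float(h["SpotPrice"]) for h in history]
        if prices and min(prices) < best_price:
            best_region, best_price = region, min(prices)
    return best_region, best_price

if __name__ == "__main__":
    region, price = cheapest_region()
    # Compute follows the sun: launch the job in the cheapest region,
    # reading from that region's local replica instead of paying gravity.
    print(f"run in {region} at ${price}/hr against s3://{bucket_for(region)}")
```

The point is not the dozen lines of boto3; it's that once the replicas exist, region choice degenerates into a simple price comparison.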

Later on, I dabbled a bit in the cost-arbitrage space. I still think compute cost arbitrage will be a thing: 6fusion did some interesting work there; ClusterK got acquired by Amazon for its ability to cut costs even when running data-gravity-heavy workloads such as EMR; and ultimately, isn't compute arbitrage just an arbitrage of electricity? But I digress. Or do I? Oh yes.

In a way, this is not really anything new; it is just another way to surface the same idea as Hadoop. Hadoop's premise was that shipping compute to the data is cheaper than shipping data to the compute; pre-replicating the data simply pays the transfer cost once, up front, leaving the compute free to chase the cheapest price.
