New ask Hacker News story: Redshift Research Project: Amazon Redshift Serverless [pdf]
2 points by Max-Ganz-II | 1 comment on Hacker News.
Redshift Serverless is not serverless. A workgroup is a normal, ordinary Redshift cluster. All workgroups are initially created as a 16-node cluster with 8 slices per node (which is the default 128 RPU workgroup) and then elastic resized to the size specified by the user. This is why the original RPU range is 32 to 512 in units of 8 and the default is 128 RPU: the default is the mid-point of a 4x elastic resize range, and a single node, the smallest possible change in cluster size, is 8 RPU/slices. 1 RPU is 1 slice.

With elastic resize the number of nodes changes but the number of data slices never changes; rather, the data slices are redistributed over the new nodes. If the cluster becomes larger, the slice capacity of each node is filled up with compute slices, which are much less capable than data slices; if the cluster becomes smaller, the original set of 128 data slices for 16 nodes is crammed into the remaining nodes. Both outcomes are inefficient for compute and storage: a 512 RPU workgroup has 128 data slices and 384 compute slices, rather than 512 data slices, and a 32 RPU workgroup in the worst case for disk use overhead, small tables (~150k rows), consumes 256 MB per column, compared to the 64 MB of a provisioned 32-slice cluster.

The more recently introduced smaller workgroups, 8 to 24 RPU (both ends inclusive), use a 4-slice node and have two nodes for every 8 RPU. In this case, the 8 RPU workgroup is initially a 16-node cluster with 8 slices per node, which is resized to a 2-node cluster with 4 slices per node: a staggering 16x elastic resize, where the largest resize permitted to normal users is 4x. An 8 RPU workgroup with small tables uses 256 MB per column rather than 16 MB per column. Workgroups have a fixed number of RPUs and require a resize to change this; workgroups do not dynamically auto-scale RPUs.
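The slice arithmetic above can be sketched as follows. Helper names are my own; the figures (16 nodes x 8 slices, and a worst-case 2 MB per slice per column, inferred from the text's 64 MB / 32 slices) come from the abstract, not from AWS documentation.

```python
# Every workgroup begins life as a 16-node, 8-slice-per-node cluster:
# 128 data slices, fixed thereafter by elastic resize.
INITIAL_DATA_SLICES = 16 * 8

# Worst-case small-table (~150k rows) disk use, inferred from 64 MB / 32 slices.
WORST_CASE_MB_PER_SLICE_PER_COLUMN = 2

def slice_layout(rpu: int) -> tuple[int, int]:
    """(data_slices, compute_slices) after elastic resize from the initial cluster.

    Data slices stay at 128: growing fills the extra node capacity with
    compute slices; shrinking crams all 128 data slices into fewer nodes.
    """
    data = INITIAL_DATA_SLICES
    compute = max(0, rpu - INITIAL_DATA_SLICES)
    return data, compute

def small_table_overhead_mb(data_slices: int) -> int:
    """Worst-case per-column disk use for a small table."""
    return data_slices * WORST_CASE_MB_PER_SLICE_PER_COLUMN

# A 512 RPU workgroup: 128 data slices + 384 compute slices, not 512 data slices.
print(slice_layout(512))            # (128, 384)
# Any workgroup carries 128 data slices, so small tables cost 256 MB per column,
# versus 64 MB on a provisioned 32-slice cluster or 16 MB on a true 8-slice one.
print(small_table_overhead_mb(128), small_table_overhead_mb(32), small_table_overhead_mb(8))
```

The point the sketch makes concrete: whatever RPU you pick, the data-slice count is pinned at 128 by the initial cluster, and everything else is redistribution overhead.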
I was unable to prove it, because Serverless is too much of a black box, but I am categorically of the view that the claims made for Serverless dynamic auto-scaling rest on the well-known and long-established mechanisms of AutoWLM and Concurrency Scaling Clusters.

Finally, it is possible to confidently extrapolate from the ra3.4xlarge and ra3.16xlarge node types the price the 8-slice node type would carry in a provisioned cluster: 6.52 USD per hour. Provisioned clusters charge per node-second; Serverless workgroups charge per node-query-second, and so go to zero cost with zero use. On the default Serverless workgroup of 128 RPU/16 nodes (avoiding the need to account for the inefficiencies introduced by elastic resize), 10 queries running constantly for one hour (avoiding the need to account for the Serverless minimum query charge of 60 seconds of run-time) cost 460.80 USD. A provisioned cluster composed of the same nodes costs 104.32 USD. The break-even point is 2.26 queries running for one hour.

Serverless introduces zero-usage, zero-cost billing, which allows for novel use cases, but this could perfectly well have been obtained by introducing a zero-zero billing model for Provisioned Redshift, without the duplicity, considerable added complexity, end-user confusion, cost in developer time, and induced cluster inefficiency involved in the pretence that Serverless is serverless.

https://ift.tt/PH2pf5w https://ift.tt/cC6Lp04
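The billing comparison works out as below. The 6.52 USD/hour node price is the author's extrapolation, and the per-RPU-hour rate is derived here from the text's own 460.80 USD figure rather than quoted from AWS pricing.

```python
# Default Serverless workgroup: 128 RPU backed by 16 nodes.
NODES, RPU = 16, 128
NODE_USD_PER_HOUR = 6.52  # author's extrapolated 8-slice node price

# Provisioned: charged per node-second, regardless of query load.
provisioned_usd_per_hour = NODES * NODE_USD_PER_HOUR

# Implied Serverless rate, derived from the text's 460.80 USD for
# 10 queries running constantly for one hour on 128 RPU.
rpu_usd_per_hour = 460.80 / (10 * RPU)

def serverless_cost(concurrent_queries: int, hours: float) -> float:
    # Per node-query-second: each concurrently running query
    # pays the full per-RPU rate for its run-time.
    return concurrent_queries * RPU * rpu_usd_per_hour * hours

print(round(provisioned_usd_per_hour, 2))   # 104.32
print(round(serverless_cost(10, 1), 2))     # 460.8
# Break-even: concurrent queries at which Serverless matches provisioned.
print(round(provisioned_usd_per_hour / serverless_cost(1, 1), 2))  # 2.26
```

Below roughly 2.26 queries running at any given moment, Serverless is cheaper; above it, a provisioned cluster of the same nodes wins, and the gap grows linearly with concurrency.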