We have recently seen some convergence of different database technologies. Many customers are evaluating heterogeneous migrations as their database needs have evolved or changed. Evaluating the best database to use for a job isn’t as clear as it was ten years ago. We’ll discuss the ideal use cases for relational and nonrelational data services, including Amazon ElastiCache for Redis, Amazon DynamoDB, Amazon Aurora, Amazon Neptune, and Amazon Redshift. This session digs into how to evaluate a new workload for the best managed database option. Please join us for a speaker meet-and-greet following this session at the Speaker Lounge (ARIA East, Level 1, Willow Lounge). The meet-and-greet starts 15 minutes after the session and runs for half an hour.
- database workload classifications
- traditional approaches to rdbms
- how nosql databases compare
- the flavours of nosql on aws
- which database to use
choose the database based on what it was purpose-built for.
- operations: oltp
- common business process
- regular and repeatable
- same thing happens each time data is processed
- analytics: olap
    - more ad hoc access patterns
    - we do not know what questions users will ask when they come to the application.
- decision support system. data lakes and data warehouses
- sizing a database
    - we usually oversize to make sure we can handle spikes, but that wastes money: we pay for larger instances all the time to cover short spikes in the workload.
- scaling rdbs
    - start with a small box, then slowly scale up to a bigger box and a bigger box until we run out of bigger boxes.
- then we have to start sharding the db and that gets complicated.
    - leverage a denormalized model
    - sharding provides horizontal scaling and unbounded storage capacity.
- uses partition keys to decide which node data should be distributed to.
        - hash(1) => 7b
        - hash(2) => 48
        - hash(3) => cd
    - shard key / partition key: every nosql db needs one to know which node the data should go to.
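The routing described above can be sketched in a few lines. This is an illustrative model, not any particular database's implementation: the node count and key names are made up, and real systems typically use consistent hashing rather than a plain modulo.

```python
import hashlib

# Hypothetical cluster size for illustration.
NUM_NODES = 4

def node_for(partition_key: str) -> int:
    """Hash the partition key and map the digest onto one node."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_NODES

# The same key always hashes to the same node, so reads find the data
# that writes placed there.
assert node_for("user#42") == node_for("user#42")
```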
- CAP theorem
- C: consistency - consistent view of data. as soon as the write happens the read matches.
    - A: availability - always be able to read and write. if you can read but not write then it is not available.
    - P: partition tolerance - the system keeps working when the network between nodes starts to fail.
RDBMS: CA
NoSQL:
- MongoDB: CP -> document
- Redis: CP -> key/value
- memcached: CP -> key/value
- Cassandra: AP -> wide column
- DynamoDB: AP -> wide column
- Riak: AP -> key/value
- early adopters
- early majority
- late majority
- denormalized data modelling is going to become commonplace.
- dynamodb - wide column/document
- elasticache - indexed key value
- qldb - ledger
- neptune - graph
- timestream - tsdb
    - elasticsearch - search
    - sql server - relational
    - nosql: requires knowing the access patterns before storing the data.
    - sql: reshapes data on the way out, so you do not need to know the access patterns up front.
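The contrast can be made concrete with a small sketch. The schema and item shapes below are made up for illustration: the relational side stores normalized rows and reshapes them with an ad hoc JOIN at read time, while the nosql side stores the answer pre-joined, which is why the access pattern must be known before writing.

```python
import sqlite3

# Relational: normalized rows, reshaped "on the way out" with a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'alice');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 17.5);
""")
rows = conn.execute("""
    SELECT c.name, o.id, o.total
    FROM customers c JOIN orders o ON o.customer_id = c.id
""").fetchall()

# NoSQL equivalent: one denormalized item per access pattern, stored
# pre-joined, so the shape is decided before the data is written.
denormalized_item = {
    "pk": "CUSTOMER#1",
    "name": "alice",
    "orders": [{"id": 10, "total": 25.0}, {"id": 11, "total": 17.5}],
}
```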
- wide column / document
- items have attributes. attributes do not need to be the same.
- partition key
- sort key
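A minimal sketch of that item model, using a plain dict in place of a real wide-column store (the key names and attributes are invented for illustration): items under the same partition key can carry completely different attributes, and a query returns them ordered by sort key.

```python
# In-memory stand-in for a wide-column/document table.
table = {}

def put_item(pk, sk, **attrs):
    """Store an item under (partition key, sort key); attrs vary per item."""
    table[(pk, sk)] = {"pk": pk, "sk": sk, **attrs}

put_item("CUSTOMER#1", "PROFILE", name="alice", tier="gold")
put_item("CUSTOMER#1", "ORDER#10", total=25.0)  # different attributes

# Query by partition key: all items for one customer, ordered by sort key.
items = sorted(
    (item for (pk, _), item in table.items() if pk == "CUSTOMER#1"),
    key=lambda i: i["sk"],
)
```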
graph query types
- node query (primary key) - rdbms can do this
- edge query (index) - rdbms can do this
- hybrid query (traversal) this is where graph databases shine.
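The three query types above can be sketched over a toy adjacency list (the names are made up): a node query is a key lookup, an edge query is an index lookup, but a multi-hop traversal would need one self-join per hop in an rdbms, while a graph store just walks the edges.

```python
from collections import deque

# Toy graph as an adjacency list.
edges = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"], "dave": []}

def reachable(start):
    """Hybrid (traversal) query: every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```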
- data warehousing
- fully scalable backend
sql / nosql / graph
sql: optimized for storage
nosql: optimized for compute; pre-created denormalized views
graph: ad hoc entity/relationship aggregations
- P: pattern flexibility: supports random access
- I: infinite scale: can gracefully increase in size and throughput without practical limits
- E: efficiency: how fast do the results need to come back
Amazon DynamoDB: IE
Amazon RDS: PE
Elasticsearch: PE
Neptune: PE
Redshift: PI
Athena: PI
Purpose built db solutions
zero unplanned downtime
- 99.999% for global tables on Amazon DynamoDB (seconds of downtime per year)
- 99.99% for single-region Amazon DynamoDB