What can you do with Apache TinkerPop and Gremlin or RDF and SPARQL? How does Neptune provide multi-Availability Zone high availability? Learn about the features and details of Amazon’s fully managed graph database service.
- useful for traversing relationships
- kinds of data
- rigid highly connected data
- heterogeneous schemas and formats
- highly connected data (hr system, recommendation, social graph)
- value comes from thinking about the relationships
- social networking
- knowledge graphs
- fraud detection
- life sciences
- network and IT operations
examples: whom might i know? what product should i buy?
we can see shared interests. we can identify new edges that create new triangles in the graph.
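a minimal sketch of the "whom might i know?" idea, assuming a tiny made-up friendship graph held in a plain dict (the names and the `suggest_friends` helper are hypothetical, not from the talk). each mutual friend counts as one triangle that a new edge would close:

```python
from collections import Counter

# Hypothetical undirected friendship graph: person -> set of friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def suggest_friends(graph, person):
    """Rank non-friends of `person` by how many triangles a new edge would close."""
    counts = Counter()
    for friend in graph[person]:
        for candidate in graph[friend]:
            # Skip the person themselves and anyone they already know.
            if candidate != person and candidate not in graph[person]:
                counts[candidate] += 1
    return counts.most_common()


print(suggest_friends(friends, "alice"))  # [('dave', 2)]
```

adding the edge alice-dave would close two triangles (through bob and through carol), which is why dave ranks first.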
drawbacks of using a relational db (rds)
- query patterns are difficult: they require lots of joins, get complex, and are hard to make efficient.
- join queries are slow and need indexes
- indexes slow down write perf.
- what is the write perf for this graph db?
- replication/sharding a thing? HA?
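the join problem in the list above can be sketched with sqlite from the python standard library. the `knows` table and its rows are made up for illustration; the point is only that every extra hop in the relationship adds another self-join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO knows VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave")],
)

# Two hops (friends of friends): one self-join.
two_hops = conn.execute(
    """
    SELECT k2.dst
    FROM knows AS k1
    JOIN knows AS k2 ON k2.src = k1.dst
    WHERE k1.src = ?
    """,
    ("alice",),
).fetchall()

# Three hops: two self-joins. Each additional hop adds one more join.
three_hops = conn.execute(
    """
    SELECT k3.dst
    FROM knows AS k1
    JOIN knows AS k2 ON k2.src = k1.dst
    JOIN knows AS k3 ON k3.src = k2.dst
    WHERE k1.src = ?
    """,
    ("alice",),
).fetchall()

print(two_hops)    # [('carol',)]
print(three_hops)  # [('dave',)]
```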
relationships are first-class objects. this provides power for querying. graph dbs are optimized for highly connected data.
- storage and retrieval
2 graph models/frameworks
- property graph (apache tinkerpop)
- gremlin query language
- resource description framework (rdf)
- SPARQL query language from W3C
nodes, edges, and properties instead of rows and tables
- gremlin is an imperative language.
RDF is used to describe resources on the web.
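a rough sketch of how the two models differ in shape, using plain python data structures rather than any real tinkerpop or rdf library. the ids, labels, and the `to_triples` helper are all made up for illustration:

```python
# Property graph: nodes and edges are objects that carry key/value properties.
nodes = {
    "p1": {"label": "person", "name": "alice"},
    "u1": {"label": "university", "name": "example university"},
}
edges = [
    {"from": "p1", "to": "u1", "label": "memberOf", "since": 2017},
]


def to_triples(nodes, edges):
    """Flatten the property graph into (subject, predicate, object) triples."""
    triples = []
    for node_id, props in nodes.items():
        for key, value in props.items():
            triples.append((node_id, key, value))
    for edge in edges:
        # A plain triple has no slot for edge properties such as "since",
        # so this naive conversion drops them.
        triples.append((edge["from"], edge["label"], edge["to"]))
    return triples


for triple in to_triples(nodes, edges):
    print(triple)
```

in the property graph the edge itself holds `since`; in the triple form everything is a statement about a subject, so the edge property has nowhere to go in this simple conversion.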
example graph query:
- find all grad students who received an undergrad degree from the same university
the gremlin example is kind of gross so i won't write it. java-like builder syntax.
sparql has nicer looking syntax:
select ?student where {
  ?student rdf:type ub:gradstudent .
  ?univ rdf:type ub:uni .
  ?dept rdf:type ub:department .
  ?student ub:memberof ?univ .
  ?student ub:memberof ?dept
}
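to make the query above less abstract, here is a toy triple-pattern matcher in plain python. it is not how neptune or any sparql engine is implemented. the data is invented, and the pattern list is one reading of the abbreviated query in these notes (`?univ` and `?dept` as variables):

```python
# Hypothetical data, using the abbreviated names from the notes.
triples = {
    ("s1", "rdf:type", "ub:gradstudent"),
    ("s2", "rdf:type", "ub:gradstudent"),
    ("u1", "rdf:type", "ub:uni"),
    ("d1", "rdf:type", "ub:department"),
    ("s1", "ub:memberof", "u1"),
    ("s1", "ub:memberof", "d1"),
    ("s2", "ub:memberof", "d1"),
}

# Terms starting with "?" are variables; everything else must match exactly.
patterns = [
    ("?student", "rdf:type", "ub:gradstudent"),
    ("?univ", "rdf:type", "ub:uni"),
    ("?dept", "rdf:type", "ub:department"),
    ("?student", "ub:memberof", "?univ"),
    ("?student", "ub:memberof", "?dept"),
]


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


students = sorted({b["?student"] for b in match(patterns, triples)})
print(students)  # ['s1']
```

`s2` is dropped because no triple makes it a member of a `ub:uni`, so the fourth pattern never matches for it.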
challenges with existing graph dbs:
- difficult to scale (high ops workload)
- difficult to maintain HA (high ops workload)
- too expensive
- limited support for open standards (needs enterprise support and licensing)
neptune: high throughput/low latency
- reliable: 6 replicas of the data across 3 AZs
- OLTP queries
- OLAP queries -> 100 per server/second (high latency) -> depends on the shape of the data
durable and ACID. supports both tinkerpop and rdf/sparql. bulk load import from s3. json documents via rest interface.
- multi-az ha
- read replicas
- encryption at rest
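the bulk load from s3 mentioned above can be sketched as a json request to the cluster's loader endpoint. this only builds the request and does not send it. the endpoint host, bucket, and role arn are placeholders, the `build_load_request` helper is hypothetical, and the field names follow my understanding of the neptune loader api, so check them against the current docs before relying on them:

```python
import json
import urllib.request


def build_load_request(endpoint, source, data_format, iam_role_arn, region):
    """Build (but do not send) a POST request for a bulk load job."""
    payload = {
        "source": source,            # s3 uri of the data to load
        "format": data_format,       # e.g. "csv" for gremlin, "ntriples" for rdf
        "iamRoleArn": iam_role_arn,  # role that lets the cluster read the bucket
        "region": region,
        "failOnError": "FALSE",
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",  # 8182 is assumed as the port
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# All values below are placeholders, not real resources.
request = build_load_request(
    endpoint="my-neptune.example.com",
    source="s3://my-example-bucket/graph-data/",
    data_format="ntriples",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleNeptuneLoadRole",
    region="us-east-1",
)
print(request.get_method(), request.full_url)
```

sending it would be a matter of passing `request` to `urllib.request.urlopen` from a machine that can reach the cluster.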