schema design
10gen mongodb
jared rosoff
@forjared
topics
introduction
• working with documents
• evolving a schema
• queries and indexes
• rich documents
common patterns
• single table inheritance
• one-to-many & many-to-many
• trees
• queues
how familiar are you with mongodb?
ways to model data
normalized
denormalized
terminology
rdbms     | mongodb
table     | collection
row(s)    | json document
index     | index
join      | embedding & linking
how can we manipulate this data?
• dynamic queries
• secondary indexes
• atomic updates
• map reduce
access patterns?
• read / write ratio
• types of updates
• types of queries
• data life-cycle
considerations
• no joins
• document writes are atomic
schema-design criteria
a simple start:
map the documents to your application.
> book = {author: "hergé", date: new Date(), text: "destination moon", tags: ["comic", "adventure"]}
> db.books.save(book)
find the document
> db.books.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "hergé", date: "sat jul 24 2010 19:47:11 gmt-0700 (pdt)", text: "destination moon", tags: [ "comic", "adventure" ] }
notes:
• _id must be unique, but can be anything you’d like
• defaults to a bson objectid if one is not supplied
add an index, find via index
// secondary index on "author"
> db.books.ensureIndex({author: 1})
> db.books.find({author: "hergé"})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), date: "sat jul 24 2010 19:47:11 gmt-0700 (pdt)", author: "hergé", ... }
examine the query plan
> db.books.find({author: "hergé"}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  ...
  "millis" : 5,
  "indexBounds" : {
    "author" : [
      [
        "hergé",
        "hergé"
      ]
    ]
  }
}
multi-key indexes
// build an index on the "tags" array
> db.books.ensureIndex({tags: 1})
// find posts with a specific tag
// (this will use an index!)
> db.books.find({tags: "comic"})
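To make the idea of a multi-key index concrete, here is a minimal sketch in plain JavaScript. It is an in-memory model, not MongoDB's implementation: the helper `buildTagIndex` and the second sample book are hypothetical, introduced only for illustration. The point it shows is that an array field contributes one index entry per element.

```javascript
// Plain JavaScript model of a multi-key index (an illustration only,
// not how MongoDB stores indexes internally).
const booksForTagDemo = [
  { _id: 1, text: "destination moon", tags: ["comic", "adventure"] },
  { _id: 2, text: "hypothetical cookbook", tags: ["food"] }, // made-up sample
];

// Build one index entry per array element: tag -> list of matching _id values.
function buildTagIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const tag of doc.tags || []) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildTagIndex(booksForTagDemo);
console.log(tagIndex.get("comic")); // ids of the books tagged "comic"
```

A lookup by tag then touches only the entries for that tag instead of scanning every book, which is why the `find({tags: "comic"})` query above can use the index.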
query operators
conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte, $ne
update operators:
$set, $inc, $push, $pop, $pull, $pushAll, $pullAll
extending the schema
> new_comment = {author: "kyle", date: new Date(), text: "great book", votes: 5}
> db.books.update(
    {text: "destination moon"},
    { "$push": {comments: new_comment},
      "$inc": {comments_count: 1}})
extending the schema
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "hergé", date : "sat jul 24 2010 19:47:11 gmt-0700 (pdt)", text : "destination moon",
  tags : [ "comic", "adventure" ],
  comments : [
    {
      author : "kyle",
      date : "sat jul 24 2010 20:51:03 gmt-0700 (pdt)",
      text : "great book",
      votes : 5
    }
  ],
  comments_count : 1
}
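The effect of the `$push` and `$inc` operators on a document can be sketched in plain JavaScript. This is an in-memory model under stated assumptions, not the MongoDB update engine: `applyUpdate` is a hypothetical helper that handles only these two operators.

```javascript
// Hypothetical helper: applies only $push and $inc, mutating the document.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (!Array.isArray(doc[field])) doc[field] = []; // $push creates a missing array
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // $inc treats a missing field as 0
  }
  return doc;
}

const bookToUpdate = { text: "destination moon", tags: ["comic", "adventure"] };
const newComment = { author: "kyle", text: "great book", votes: 5 };

applyUpdate(bookToUpdate, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});
console.log(bookToUpdate.comments.length, bookToUpdate.comments_count);
```

Neither `comments` nor `comments_count` existed before the update, so the schema was extended by the write itself, with no migration step.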
the ‘dot’ operator
// create index on nested documents:
> db.books.ensureIndex({"comments.author": 1})
> db.books.find({"comments.author": "kyle"})
// create index on comment votes:
> db.books.ensureIndex({"comments.votes": 1})
// find all books with any comments with more than
// 50 votes
> db.books.find({"comments.votes": {$gt: 50}})
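How a dotted path such as `comments.votes` reaches into an array of embedded documents can be sketched in plain JavaScript. This is a simplified model, not MongoDB's query matcher: `valuesAtPath` and `matchesGt` are hypothetical helpers, and the sample documents are made up.

```javascript
// Hypothetical helper: collect every value reachable through a dotted path,
// fanning out across arrays the way "comments.votes" does.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      const candidates = Array.isArray(item) ? item : [item];
      for (const c of candidates) {
        if (c !== null && typeof c === "object" && key in c) next.push(c[key]);
      }
    }
    current = next;
  }
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// A document matches {path: {$gt: threshold}} if ANY reachable value is greater.
function matchesGt(doc, path, threshold) {
  return valuesAtPath(doc, path).some((v) => v > threshold);
}

const popularBook = { comments: [{ votes: 5 }, { votes: 60 }] };
const quietBook = { comments: [{ votes: 5 }] };
console.log(matchesGt(popularBook, "comments.votes", 50));
console.log(matchesGt(quietBook, "comments.votes", 50));
```

The "any element" semantics is the reason the query finds books with at least one highly voted comment rather than books where every comment qualifies.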
rich documents
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  line_items : [ { sku: "tt-123",
                   name: "tintin a la lune" },
                 { sku: "tt-457",
                   name: "coke en stock" } ],
  address : { name: "banker",
              street: "111 main",
              zip: 10010 },
  payment : { cc: 4567,
              exp: new Date(2011, 7, 7) },
  subtotal : 2355 }
inheritance
single table inheritance - rdbms
shapes table
id | type   | area | radius | d | length | width
1  | circle | 3.14 | 1      |   |        |
2  | square | 4    |        | 2 |        |
3  | rect   | 10   |        |   | 5      | 2
single table inheritance - mongodb
> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1 }
{ _id: "2", type: "square", area: 4, d: 2 }
{ _id: "3", type: "rect", area: 10, length: 5, width: 2 }
// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})
// create sparse index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})
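What "sparse" means for the index on `radius` can be sketched in plain JavaScript. This is an in-memory model, not MongoDB's index structure: `buildSparseIndex` is a hypothetical helper, and the documents are the three shapes shown above.

```javascript
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Hypothetical helper: a sparse index holds entries only for documents
// that actually contain the indexed field; the others are skipped.
function buildSparseIndex(docs, field) {
  return docs
    .filter((doc) => field in doc)
    .map((doc) => ({ key: doc[field], _id: doc._id }))
    .sort((a, b) => a.key - b.key);
}

const radiusIndex = buildSparseIndex(shapes, "radius");
console.log(radiusIndex.length); // only the circle has a radius
```

With many shape types in one collection, a sparse index stays proportional to the number of circles rather than to the size of the whole collection.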
one to many
one to many relationships can specify
(diagram: author, blog entry)
one to many
- embedded array / array keys
  - $slice operator to return subset of array
  - some queries hard
    e.g. find latest comments across all documents
- embedded tree
  - single document
  - natural
  - hard to query
- normalized (2 collections)
  - most flexible
  - more queries
one to many - patterns
- embedded array / array keys
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "hergé", date : "sat jul 24 2010 19:47:11 gmt-0700 (pdt)", text : "destination moon", tags : [ "comic", "adventure" ],
  comments : [ {
    author : "kyle",
    date : "sat jul 24 2010 20:51:03 gmt-0700 (pdt)",
    text : "great book",
    votes : 5
  } ],
  comments_count : 1
}
- normalized
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "hergé", date : "sat jul 24 2010 19:47:11 gmt-0700 (pdt)", text : "destination moon", tags : [ "comic", "adventure" ] }
{ book_id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle",
  date : "sat jul 24 2010 20:51:03 gmt-0700 (pdt)",
  text : "great book",
  votes : 5 }
referencing vs. embedding
- embed when the "many" objects always appear with their parent.
- reference when you need more flexibility.
many to many
example:
- product can be in many categories
- category can have many products
product
- product_id
product_category
- product_id
- category_id
category
- category_id
many to many
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"), name: "destination moon",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af") ] }
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"), name: "adventure",
  product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"),
                 ObjectId("4c4ca30433fb5941681b913a") ] }
// multi-key index on array fields
> db.products.ensureIndex({category_ids: 1});
> db.categories.ensureIndex({product_ids: 1});
alternative
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"), name: "destination moon",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af") ] }
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"), name: "adventure" }
// all products for a given category
> db.products.find({category_ids:
    ObjectId("4c4ca25433fb5941681b912f")})
// all categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
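Both directions of the many-to-many lookup can be sketched in plain JavaScript when only the product side stores the links. This is an in-memory model, not the MongoDB API: the short string ids, the second product, the second category, and the two helper functions are hypothetical and exist only for illustration.

```javascript
// Only products carry the relationship; categories stay small.
const products = [
  { _id: "p1", name: "destination moon", category_ids: ["c1", "c2"] },
  { _id: "p2", name: "hypothetical second product", category_ids: ["c2"] },
];
const categories = [
  { _id: "c1", name: "adventure" },
  { _id: "c2", name: "hypothetical second category" },
];

// All products for a given category: match on the array field.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// All categories for a given product: fetch one product, then an $in-style lookup.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory("c2").map((p) => p._id));
console.log(categoriesOfProduct("p1").map((c) => c.name));
```

The second direction costs two lookups instead of one, which is the trade made for not having to keep two arrays of ids in sync.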
trees
full tree in document
{ comments: [
    { author: "kyle", text: "...",
      replies: [
        { author: "fred", text: "...",
          replies: [] }
      ] }
] }
pros: single document, performance, intuitive
cons: hard to search, partial results, 16mb limit
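The "hard to search" point can be illustrated with a short plain JavaScript sketch. It assumes the nested shape shown above (each comment has an `author`, a `text`, and a `replies` array); `findByAuthor` is a hypothetical helper that runs in the application, since a reply can sit at any depth.

```javascript
const post = {
  comments: [
    {
      author: "kyle",
      text: "...",
      replies: [{ author: "fred", text: "...", replies: [] }],
    },
  ],
};

// Hypothetical helper: walk the whole tree recursively to find
// every comment by one author, whatever its nesting depth.
function findByAuthor(comments, author, found = []) {
  for (const comment of comments) {
    if (comment.author === author) found.push(comment);
    findByAuthor(comment.replies || [], author, found);
  }
  return found;
}

console.log(findByAuthor(post.comments, "fred").length);
```

Reading the whole thread is a single document fetch, but any question about individual comments means loading the full tree and walking it like this.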
parent links
- each node is stored as a document
- contains the id of the parent
child links
- each node contains the ids of the children
- can support graphs (multiple parents / child)
array of ancestors
- store all ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
// find all descendants of b:
> db.tree2.find({ancestors: "b"})
// find all direct descendants of b:
> db.tree2.find({parent: "b"})
// find all ancestors of f:
> ancestors = db.tree2.findOne({_id: "f"}).ancestors
> db.tree2.find({_id: {$in: ancestors}})
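The three tree queries above can be checked against the sample nodes with a small plain JavaScript sketch. It is an in-memory model using array filters in place of `find`, and the helper names are hypothetical; the data is the tree from the slide.

```javascript
const tree2 = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
];

// All descendants: nodes whose ancestors array contains the id.
const descendantsOf = (id) =>
  tree2.filter((n) => (n.ancestors || []).includes(id)).map((n) => n._id);

// Direct descendants: nodes whose parent is the id.
const childrenOf = (id) =>
  tree2.filter((n) => n.parent === id).map((n) => n._id);

// All ancestors: read the array stored on the node itself.
const ancestorsOf = (id) => {
  const node = tree2.find((n) => n._id === id);
  return node && node.ancestors ? node.ancestors : [];
};

console.log(descendantsOf("b"), childrenOf("b"), ancestorsOf("f"));
```

Each question is answered by one pass over flat documents with no recursion, at the cost of rewriting the `ancestors` arrays whenever a subtree moves.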
queues
requirements
• see jobs waiting, jobs in progress
• ensure that each job is started once and only once
{ inprogress: false,
  priority: 1,
  message: "rich documents ftw!"
}
// find highest priority job and mark as in-progress
> job = db.jobs.findAndModify({
    query: {inprogress: false},
    sort: {priority: -1},
    update: {$set: {inprogress: true,
                    started: new Date()}}})
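The "started once and only once" requirement can be sketched in plain JavaScript. This is an in-memory model, not the `findAndModify` command itself: `claimNextJob` is a hypothetical helper, the second job is made up, and the single-threaded run of the function stands in for the atomicity the database provides. Unlike `findAndModify`, which by default returns the document as it was before the update, this helper returns the job after it has been marked.

```javascript
const jobs = [
  { _id: 1, inprogress: false, priority: 1, message: "rich documents ftw!" },
  { _id: 2, inprogress: false, priority: 3, message: "hypothetical urgent job" },
];

// Hypothetical helper: select the highest-priority waiting job and mark it
// as in progress in one step, so no other caller can select it again.
function claimNextJob(queue, now = new Date()) {
  const waiting = queue
    .filter((job) => !job.inprogress)
    .sort((a, b) => b.priority - a.priority);
  if (waiting.length === 0) return null; // nothing left to claim
  const job = waiting[0];
  job.inprogress = true;
  job.started = now;
  return job;
}

const firstClaim = claimNextJob(jobs);
const secondClaim = claimNextJob(jobs);
const thirdClaim = claimNextJob(jobs);
console.log(firstClaim._id, secondClaim._id, thirdClaim);
```

Because selecting and marking happen as one operation, two workers polling the queue cannot both receive the same job; doing a separate find followed by an update would leave a window where they could.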
summary
schema design is different in mongodb
basic data design principles stay the same
focus on how the app manipulates data
rapidly evolve schema to meet your requirements
enjoy your new freedom, use it wisely :-)
download at mongodb.org
we’re hiring!
info@10gen.com
conferences, appearances, and meetups
http://www.10gen.com/events
facebook | twitter | linkedin
http://bit.ly/mongofb @mongodb http://linkd.in/joinmongo