WiredTiger File Forensics Part 3: Viewing all the MongoDB Data

wiredtiger file forensics

This article continues on from 1: Building “wt” and “Part 2: wt dump” to show how to extract any of your MongoDB documents directly from WiredTiger’s raw data files. It’ll also show how to take a peek into the index files. Lastly, it’ll show how to also look in the WT transaction log to see updates made since the latest checkpoint.

[porto_content_box align=”left”]⚠ Warning: the wt dump tool usually opens files in read-write mode, even for commands you’d think would be read-only. It will automatically step through its normal recovery process most of the time, so it may change files.

Until you know its effects on data files do not use it on your only copy of precious data – make a copy of the data directory and learn with the copy first.[/porto_content_box]

List up the Collections and Indexes

WiredTiger doesn’t name any of its data files according to the MongoDB object names, so as a first step you’ll have to extract a table of WT idents (=identifiers) vs. collection and index names.

The _mdb_catalog.wt file is not the top table in the WiredTiger storage engine’s own hierarchy of data sources. To the MongoDB layer of code though it is the complete definition of collection and index objects in the database. This includes both MongoDB system and user-made collections and indexes.

Dump WT ident vs Collections

As explained in the previous Part 2: wt dump blog post, you should reuse this command to save a tab-delimited file of WiredTiger ident(ifier)s vs collection names. I’ll call this file wt_ident_vs_collection_ns.tsv in this article.

$ #cd to a copy of a MongoDB data directory
$ wt dump -x table:_mdb_catalog | tail -n +7 | awk 'NR%2 == 0 { print }' | xxd -r -p | bsondump --quiet | jq -r 'select(. | has("md")) | [.ident, .ns] | @tsv' | sort > wt_ident_vs_collection_ns.tsv
$
$ head -n 5 wt_ident_vs_collection_ns.tsv
collection-0--4131298130356306083	config.cache.chunks.test.bar
collection-0-5834121039240263510	local.replset.initialSyncId
collection-0-5841128870485063129	local.startup_log
collection-0--6422702119521843596	config.system.sessions
collection-10--4131298130356306083	config.cache.chunks.test.foo
        

Eg. if an ident is “collection-4-5841128870485063129” then there will be a file collection-4-5841128870485063129.wt that has the data of a MongoDB collection.

In case you have a vague memory of seeing strings like these somewhere whilst using the mongo shell, you probably have. These idents are the same as the ones shown in the “wiredTiger.uri” field of the db.collection.stats() command output.

Optional: Dump WT ident vs Indexes

The index WT idents can be dumped as well. The example below will save them to wt_vs_index_ns.tsv in three columns [WT ident, collection name, index name].

$ wt dump -x table:_mdb_catalog | tail -n +7 | awk 'NR%2 == 0 { print }' | xxd -r -p | bsondump --quiet | jq -r 'select(. | has("idxIdent")) | .ns as $nsT | .idxIdent | to_entries[] | [.value, $nsT, .key] | @tsv' | sort > wt_ident_vs_index_ns.tsv
$
$ head -n 5 wt_vs_index_ns.tsv
index-11--4131298130356306083	config.cache.chunks.test.foo	_id_
index-12--4131298130356306083	config.cache.chunks.test.foo	lastmod_1
index-12-5841128870485063129	local.system.replset	_id_
index-1--4131298130356306083	config.cache.chunks.test.bar	_id_
index-14-5841128870485063129	admin.system.version	_id_
        

Looking at the Application (= MongoDB) Table Data

collection-*.wt and index-*.wt files contain the data of the MongoDB collections and indexes that you observe as a client connected to a MongoDB instance. Once you’ve identified the WT ident value for the collection or index you want to inspect you can use that in the URI argument to the wt dump command.

wt dump collections to *.bson files

The wt dump output of collection-* tables has a basic numeric type key and BSON data as the values.

Use wt dump -x in your terminal on the WT file for a collection and you will see, after the header section, records of the WT table in alternating lines of key and value.

Eg. in the example below the first key is numeric/binary value 0x81 and the first value is the one starting with binary bytes 0xce000000025f6964001e. The second key is 0x82, and its value starts with 0xce000000025f69640021. The values are BSON which were much longer in reality, I only trimmed it for readability at the moment.

$ wt dump -x collection-X-XXXXXXXXXX
WiredTiger Dump (WiredTiger Version 10.0.0)
Format=hex
Header
file:WiredTiger.wt
access_pattern_hint=none,allocation_size=4KB,app.....
Data
81
68000000025f6964001e000000424d50524f442d425a41342d31...
82
6b000000025f69640021000000424d50524f442d574f524b4552...
...
        

FYI Using the alternative URI argument syntax “wt dump -x table:collection-X-XXXXXXXXXX” or “wt dump -x file:collection-X-XXXXXXXXXX.wt” will produce the same.

The key value in the collection-* tables isn’t needed to see the document content so print only every second line for the values using the following command:

$ wt dump -x collection-X-XXXXXXXXXX | tail -n +7 | awk 'NR%2 == 0 { print }'
68000000025f6964001e000000424d50524f442d425a41342d31...
6b000000025f69640021000000424d50524f442d574f524b4552....
66000000025f6964001c000000424d50524f442d574f524b4552...
6b000000025f69640021000000424d50524f442d574f524b4552...
64000000025f6964001a000000424d50524f442d44415441312d...
        

The command above prints binary BSON in a hex string format with a newline separating each record. We can translate that hexadecimal back to the original binary using the xxd command utility with the “-r” and “-p” flags. (Note: Don’t combine as one “-rp” flag. It doesn’t work like most unix command’s short options.)

$ #Look for the WT ident of my test.customerOrder collection
$ grep customerOrder wt_ident_vs_collection_ns.tsv
collection-14--3398103177079662761	test.customerOrder
$
$ ls -lh collection-14--3398103177079662761.wt
-rw------- 1 akira akira 40K Jun 10 15:12 collection-14--3398103177079662761.wt
$
$ #dump and convert its values to a plain BSON file:
$ wt dump -x collection-14--3398103177079662761 | tail -n +7 | awk 'NR%2 == 0 { print }' | xxd -r -p > test.customerOrder.bson
$
$ #Confirm the content using bsondump
$ bsondump --quiet test.customerOrder.bson
{"_id":"123456","_class":"model.customer_order.CustomerOrder","orderReference":{"businessInteractionId":"str ... "orderAttributeValue":"string"}]}]}]}]}}
{"_id":"ORN-billingissue01","_class":"com.ctl.bm.servi ... startDate":"2018-02-02T00:00:00.000Z"}],"existingTN":[]}
2021-06-10T15:12:16.627+0900	2 objects found
        

“wt read” a single record?

Sorry, you can’t read your MongoDB collection data with the “ wt read” shell command. The blocker is trivial – as of WT v10.0.0 the “wt read” command only accepts plain text or its own “r” recordid numeric value as the lookup key value. The keys for the mongodb collections and indexes however are ‘q’ and ‘u’ types respectively. (Documentation: WT Schema Format types.)

In MongoDB, you might know you can use a showRecordId cursor option so it’s tempting to think that this is the same “r” type that wt read can currently accept, but unfortunately it is not. See the key_format=X value in the header of wt dump output samples to confirm.

If wt read (code = utilities/util_read.c) was modified to accept an -x argument so we could pass the hex strings we already see in wt dump output this issue would be solved. But as it isn’t, for now, you have to dump all records even if you just want one.

This is only a limitation in the shell. If you use the API from within a programming language, including the Python SWIG binding available, you should be able to read just a single record.

Looking at MongoDB’s Index Files

wt dump index-*.wt files

index-* WiredTiger tables have two different formats that are easy to see when using wt dump to look inside them – one with just keys, and one with both keys and values. The WT keys are binary values generated from MongoDB’s KeyString structure.

Below is an example of an index on {state: 1, process: 1} on a collection called config.locks. This is a case where there are no values in the WT table for the index.

$ grep index-15--749209957533832251 wt_ident_vs_index_ns.tsv
index-15--7492099575338322516	config.locks	state_1_process_1
$
$ wt dump -x index-15--7492099575338322516 | tail -n +7
293c436f6e66696753657276657200040008    <keys on odd-numbered line>
   <these even-numbered lines are where the WT value would be>
293c436f6e66696753657276657200040040
 
293c7072642d6d6f6e2d7065722d73686172642d6130333a32373031383a313538363938343138383a3139393331363932353337323937373139383300040028
 
....
 
293c70726f642d6d6f6e676f2d73686172642d6d6f6e676f732d302e6d6f6e676f2e706572636f6e612e636f6d3a32373031373a313535343633323838333a38363534383239393034393534383037333200040078
 
2b043c436f6e66696753657276657200040060
 
$
        

Below is an example of an {_id: 1} index on the collection test.rlru. This is a case when there are both WT keys and values.

$ grep index-1-2871579003788456567 wt_ident_vs_index_ns.tsv
index-1-2871579003788456567	test.rlru	_id_
$
$ wt dump -x index-1-287157900378845656 | tail -n +7
2904
0008
2b0204
0010
2b0404
0018
...
...
2c0faa04
203eb1
2c0fac04
203eb9
...
        

Given the point of an index record lookup is to have a value that points to a record in the collection-X-XXXX WT table, you should be asking “How can that first index above be useful without values?”

The answer is the recordid is packed on the end of the key. You’ll notice in the first example they all have 0x04 as the third-last byte. This is how MongoDB packs a recordId when the value is between 0 and 2^10 – 1 I believe. See KeyString appendRecordId() if you want to get into it further.

By the way, an index’s field names are constant and thus irrelevant to sort order, so they’re not part of the keystrings.

Writing about the KeyString format even in just the shortest summary would take a whole blog post. So I’ll just punch out some translations of the binary above as a teaser and stop there.

Keystring 293c436f6e66696753657276657200040008 =>

Keystring 2b043c436f6e66696753657276657200040060 =>

Keystring 2b0404 =>

Value 001001 =>

If looking for matching records in the collection-X-XXX table look for key (0x80 + the recordid) value from the index file. Eg. 2 -> 0x82; 12 -> 0x9c. For some reason 0x0 – 0x80 seems to be reserved and so key values in the collection-X-XXX.wt files are all incremented by 0x80 higher than the recordId value in the index-X-XXX.wt records.

In the end though, as we don’t have an ability to use wt read, all this poking around in the indexes from a shell command can only be for satisfying curiosity. Not for making single-record access fast.

Looking at the WT Transaction Log

A WiredTiger checkpoint saves a copy of documents in all collections and indexes as they were at one exact point in time (the time the checkpoint is started). MongoDB will call for a checkpoint to be made once per minute by default.

Without something else being saved to disk, a sudden crash would mean that restores/restarts could only revert to the last checkpoint. The classic database concept of write-ahead log is the solution to this of course. In WiredTiger this is provided by the transaction log, often just called “log” in its own documentation. Or as the documentation also says it adds “commit-level” durability to checkpoint durability.

At restart, whether it is after a perfectly normal shutdown or a crash, WiredTiger will read and replay the writes it finds in its transaction log onto the tables, in memory. In time, when the MongoDB layer requests a new checkpoint be created, the in-memory restored data will be saved to disk.

When you use wt dump it’s not easy to say if you’re looking at the collection (or index) restored as of the last checkpoint or with the transaction log “recovered” (read and applied) as well. If the “-R” global option of the wt command is used then yes log is recovered; if the opposite “-r” option is used then no. But which is in effect if you specify neither is unclear. Also, I’ve seen comments or diagnostic messages that suggest the -r option isn’t perfect.

Two Layers of Recovery

WiredTiger would, by default, keep no transaction log for the tables it manages for the application embedding it. It’s only the request of the application that engages it.

When a mongod node restarts with an existing data directory the WiredTiger library will run recovery. It can’t be stopped (at least not by the MongoDB user). This is not the end of the story though. When replication was used it means only the “local” db collections, in particular the oplog, is restored. The replication system code has a ReplicationRecovery class that is used next, and this will apply updates made since the last durable checkpoint time from the oplog to the user and system collections they’re supposed to be in. Only after that occurs will recovery be complete and the db server will make the data available to clients.

ReplicationRecovery allows MongoDB to do rollbacks (at least to the checkpoint time) and also to trim off ops in the last replication batch application. It’s an ugly point but MongoDB applies replication ops in small batches in parallel to improve performance; this only happens on secondaries usually of course, but also after a restart. If all the writes in the batch finish that is fine, but if they don’t some writes may be missing which is a clear consistency violation. So the unfinished batch’s writes should all be cleared, back to the last saved oplogTruncateAfterPoint value. repl.oplogTruncateAfterPoint is a single-document collection in the local db. It exists only for this purpose as far as I know.

Diagram for Typical Restart Recovery

“Typical” = restarting a node that is a replica. Doesn’t matter if it was a primary or secondary, or even the only node in the replica set.

                                                WiredTiger’s recovery
                                                        ⇓

                                        WT table for local.oplog.rs updated
                                                        ⇓

                                                ReplicationRecovery
                                                        ⇓

                                    Trim oplog docs where “ts” > oplogTruncateAfterPoint
                                                        ⇓

                                                oplog application
                        Pseudo code:
                        db.oplog.rs.find({ts > durable checkpoint time}).forEach(function(d) {
                            //re-apply write to intended collection in “test”, “mydb”, “admin”, etc. db
                            applyOps(opdoc);
                        }
                                                        ⇓

                                                      Ready!
        

wt printlog

You can use the wt printlog command to see what is in the log currently in the WiredTigerLog. files in the journal/ subdirectory. If you decode the write ops in there you can understand what document versions will be restored by MongoDB when it restarts.

Preparation to Using wt printlog – Map WT idents to fileid

Until now you’ve learned that every MongoDB collection and index is in its own WT table file, which means needing to learn what the WT ident(ifier) is to find the right raw WT file to look into.

There’s another small integer id value, a.k.a. fileid, in WT metadata / WT table config strings for each file too. I suppose it saves space; at any rate, the WT transaction log references files only by this number. This means you’ll have to build a handy mapping or list of which WT ident is which fileid. Use the following command to create a file I call wt_ident_vs_fileid.tsv.

$ wt dump file:WiredTiger.wt | grep -B 1 ',id=' | sed 's/^file:(.*).wt0/1t/; s/.*,id=([0-9][0-9]*),.*/1/' | paste -d" " - - > wt_ident_vs_fileid.tsv
 
$ cat wt_ident_vs_fileid.tsv
WiredTigerHS	 268
_mdb_catalog	 3
collection-0--7000593926523175565	 269
collection-0--7468664482515188034	 196
collection-0--9200920937255444113	 79
...
collection-8-2787176720526139723	 12
collection-83--6694766911154494072	 194
index-1--7000593926523175565	 270
index-1--7468664482515188034	 197
index-1-2787176720526139723	 5
...
index-9--7468664482515188034	 205
index-9-2787176720526139723	 13
sizeStorer	 2
        

WT idents don’t suggest to the human reader which MongoDB collection or index is involved though, so after that use this shell command to join wt_ident_vs_fileid.tsv to the wt_ident_vs_collection_ns.tsv and wt_ident_vs_index_ns.tsv files made earlier.

$ join <(sort wt_ident_vs_fileid.tsv) <(cat wt_ident_vs_collection_ns.tsv wt_ident_vs_index_ns.tsv | sort) | sort -n -k 2 | sed 's/ /t/g' > wt_ident_vs_sorted_fileid_and_mdb_name.tsv
$ cat wt_ident_vs_sorted_fileid_and_mdb_name.tsv
collection-0-2787176720526139723	4	local.startup_log
index-1-2787176720526139723	5	local.startup_log	_id_
collection-2-2787176720526139723	6	local.replset.oplogTruncateAfterPoint
index-3-2787176720526139723	7	local.replset.oplogTruncateAfterPoint	_id_
collection-4-2787176720526139723	8	local.replset.minvalid
index-5-2787176720526139723	9	local.replset.minvalid	_id_
...
collection-16-2787176720526139723	20	local.oplog.rs
collection-17-2787176720526139723	21	admin.system.version
index-18-2787176720526139723	22	admin.system.version	_id_
...
        

Not shown: WiredTiger’s own metadata table WiredTiger.wt is fileid 0. I presume this is hardcoded.

Transaction Log Files

In a brutally trimmed summary, the content of the transaction log is something like this:

{ checkpoint marker }
...
...
{ lsn : [42, 7424], ops: [ {fileid: i, key: x1, value: y1 } ] }
{ lsn : [42, 8448], ops: [ {fileid: i, key: x2, value: y2 }, {...}, {...} ] }
{ lsn : [42, 8960], ops: [ {fileid: j, key: x6, value: y6 } ] }
{ lsn : [43,    0], ops: [ {fileid: i, key: x7, value: y7 }, {...} ] }
{ lsn : [42,  256], ops: [ {fileid: i, key: x9, value: y9 } ] }
...
        

Going into details:

With that explained, let’s look at a real sample now:

$ #To use wt printlog the WiredTiger config string definitely needs to be set.
$ #  See the part 1 of this blog series for details. In this example I'll do it by
$ #  the -C option, but there are two other ways as well
$
$ wt -C "log=(compressor=snappy,path=journal)" printlog -xu | head -n 100
[
  { "lsn" : [55,128],
    "hdr_flags" : "",
    "rec_len" : 128,
    "mem_len" : 128,
    "type" : "system",
    "ops": [
      { "optype": "prev_lsn",
        "prev_lsn": [54, 98566272]
      }
    ]
  },
  { "lsn" : [55,256],
    "hdr_flags" : "",
    "rec_len" : 384,
    "mem_len" : 384,
    "type" : "commit",
    "txnid" : 812443,
    "ops": [
      { "optype": "row_put",
        "fileid": 20 0x14,
        "key": "u00e8`u0088u000bu00c2u00ffu00ffu00e0u00e7",
        "key-hex": "e860880bc2ffffe0e7",
        "value": "u00f3u0000u0000u0000u0002opu0000u0002u0000u0000u0000du0000u0002nsu0000 u0000u0000u0000config.transaction_coordinatorsu0000u0005uiu0000u0010u0000u0000u0000u0004u00f7zu001082u00c6Aru0089nxu00cb(u00edDu0080u0003ou0000pu0000u0000u0000u0003_idu0000fu0000u0000u0000u0003lsidu0000Hu0000u0000u0000u0005idu0000u0010u0000u0000u0000u0004u00d4u00dcu00acu00efu00e7u0087@u00a7u009du00dfu00c6.u00d6u00f4u00f0u00ecu0005uidu0000 u0000u0000u0000u0000u00e3u00b0u00c4Bu0098u00fcu001cu0014u009au00fbu00f4u00c8u0099ou00b9$'u00aeAu00e4du009bu0093Lu00a4u0095u0099u001bxRu00b8Uu0000u0012txnNumberu0000su0004u0000u0000u0000u0000u0000u0000u0000u0000u0011tsu0000'u0001u0000u0000u00c3u000bu0088`u0012tu0000iu0000u0000u0000u0000u0000u0000u0000twallu0000(u00f5mu0013yu0001u0000u0000u0012vu0000u0002u0000u0000u0000u0000u0000u0000u0000u0000",
        "value-hex": "f3000000026f7000020000006400026e730020000000636f6e6669672e7472616e73616374696f6e5f636f6f7264696e61746f727300057569001000000004f77a103832c6410d890a78cb28ed4480036f0070000000035f69640066000000036c7369640048000000056964001000000004d4dcacefe78740a79ddfc62ed6f4f0ec05756964002000000000e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855001274786e4e756d62657200730400000000000000001174730027010000c30b886012740069000000000000000977616c6c0028f56d1379010000127600020000000000000000"
      }
    ]
  },
  { ....
        

The plain “key” and “value” fields are a mixture of printable chars and hex encoding for non-printable chars, unfortunately, encoded as UTF-16 rather than ascii or UTF-8. This is too hard to work with, so I add the -x option to also print the hex strings the same as wt dump -x does.

In this sample fileid 20 is for local.oplog.rs and this typical content for the WT log with MongoDB. Using xxd -r -p we can see what is going into the value:

$ echo  "f3000000026f7000020000006400026e730020000000636f6e6669672e7472616e73616374696f6e5f636f6f7264696e61746f727300057569001000000004f77a103832c6410d890a78cb28ed4480036f0070000000035f69640066000000036c7369640048000000056964001000000004d4dcacefe78740a79ddfc62ed6f4f0ec05756964002000000000e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855001274786e4e756d62657200730400000000000000001174730027010000c30b886012740069000000000000000977616c6c0028f56d1379010000127600020000000000000000" | xxd -r -p | bsondump --quiet | jq  .
{
  "op": "d",
  "ns": "config.transaction_coordinators",
  "ui": {
    "$binary": {  "base64": "93oQODLGQQ2JCnjLKO1EgA==", "subType": "04" }
  },
  "o": {
    "_id": {
      "lsid": {
        "id": {  "$binary": { "base64": "1Nys7+eHQKed38Yu1vTw7A==", "subType": "04" } },
        "uid": {
          "$binary": { "base64": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=", "subType": "00" }
        }
      },
      "txnNumber": {  "$numberLong": "1139" }
    }
  },
  "ts": {  "$timestamp": {"t": 1619528643, "i": 295 } },
  "t": { "$numberLong": "105" },
  "wall": { "$date": { "$numberLong": "1619528643880" } },
  "v": { "$numberLong": "2" }
}
        

An insert to the oplog that is for a d(elete) of the document in “config.transaction_coordinators” with “id”: { “$binary”: { “base64”: “1Nys7+eHQKed38Yu1vTw7A==”, “subType”: “04” } }.

Summary

To look at the MongoDB collections, indexes, and the transaction log (a.k.a. the mongodb journal) you can use the wt dump and wt printlog commands. Before doing any of that you’ll have to make a mapping of WT table ident(ifier)s vs. MongoDB collection and index names, and also a list of fileids for the log. This information can be extracted using wt dump on WiredTiger.wt and _mdb_catalog.wt tables. Well, that and a medium amount of grep, sed, awk, etc.

The collection-*.wt files have key-value pairs of [recordId, BSON binary]. The index-*.wt files can also be read, where you can see the binary of the MongoDB-defined “keystrings”. Due to limitations with wt read described above in the article, there is no practical way currently with the “wt” util command to do index lookups.

Looking at the transaction log with wt printlog command will show the operations that will be applied during restart to progress the data from the state it was at the last durable checkpoint to the moment the last journal commit happened before shutdown.

This article is cited from Percona.