Igor Canadi ae25742af9 Fix race condition in manifest roll
Summary:
When the manifest is getting rolled the following happens:
1) manifest_file_number_ is assigned to a new manifest number (even though the old one is still current)
2) mutex is unlocked
3) SetCurrentFile() creates temporary file manifest_file_number_.dbtmp
4) SetCurrentFile() renames manifest_file_number_.dbtmp to CURRENT
5) mutex is locked

If FindObsoleteFiles happens between (3) and (4) it will:
1) Delete manifest_file_number_.dbtmp (because it's not in pending_outputs_)
2) Delete old manifest (because the manifest_file_number_ already points to a new one)

I introduce the concept of prev_manifest_file_number_ that will avoid the race condition.
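
To make the idea concrete, here is a minimal sketch of the keep/delete decision for MANIFEST files once a previous manifest number is tracked. It is not the code from this patch: the ManifestState struct and the KeepManifest function are hypothetical names, and the rule is a simplified model of the description above.

```cpp
#include <cstdint>

// Hypothetical, simplified model of the bookkeeping described above.
struct ManifestState {
  uint64_t manifest_file_number;       // manifest being rolled to
  uint64_t prev_manifest_file_number;  // manifest that CURRENT may still name
};

// Returns true if the MANIFEST file with the given number must be kept.
// Without the second condition, the old manifest would be deleted while
// CURRENT still points at it, which is the race described above.
bool KeepManifest(const ManifestState& s, uint64_t number) {
  return number >= s.manifest_file_number ||
         number == s.prev_manifest_file_number;
}
```

With manifest_file_number = 8 and prev_manifest_file_number = 5, manifest 5 is kept during the roll, while an older manifest 3 is still treated as obsolete.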

However, we should discuss the future of MANIFEST file rolling. We found some race conditions with it last week, and who knows how many more there are. Nobody is using it in production because we don't trust the implementation. Should we even support it?

Test Plan: make check

Reviewers: ljin, dhruba, haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16929
2014-03-17 21:50:15 -07:00
build_tools Merge pull request #74 from alberts/lz4 2014-02-10 15:46:56 -08:00
coverage Disable the html-based coverage report by default 2014-02-06 12:58:13 -08:00
db Fix race condition in manifest roll 2014-03-17 21:50:15 -07:00
doc doc: table_stats_collectors -> table_properties_collectors. 2014-02-07 12:19:25 -08:00
hdfs Env to add a function to allow users to query waiting queue length 2014-03-11 10:19:02 -07:00
helpers/memenv Fsync directory after we create a new file 2014-01-27 11:02:21 -08:00
include keep_log_files option in BackupableDB 2014-03-17 15:39:23 -07:00
linters allow lambda function syntax in cpplint 2014-02-20 12:47:05 -08:00
port cache SuperVersion in thread local storage to avoid mutex lock 2014-02-27 11:38:55 -08:00
table Consolidate SliceTransform object ownership 2014-03-10 12:56:46 -07:00
tools Check starts_with(prefix) in MultiPrefixIterate 2014-03-17 17:02:34 -07:00
util Breaking line 2014-03-14 23:56:58 +00:00
utilities keep_log_files option in BackupableDB 2014-03-17 15:39:23 -07:00
.arcconfig Improve/fix bugs for the cpp linter 2014-02-13 17:48:11 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Put *.out to the ignore list (for MacOS) 2014-02-13 14:15:02 -08:00
CONTRIBUTING.md Update to CONTRIBUTING.md 2014-02-20 10:55:54 -08:00
HISTORY.md Env to add a function to allow users to query waiting queue length 2014-03-11 10:19:02 -07:00
INSTALL.md Update the instruction to build shared library 2014-02-24 12:29:26 -08:00
LICENSE Fix copyright year 2014-03-12 12:06:58 -07:00
Makefile Don't care about signed/unsigned compare 2014-03-17 09:41:41 -07:00
PATENTS Fix the patent format 2013-10-16 15:37:32 -07:00
README Add a pointer to the engineering design discussion forum. 2013-12-23 12:19:18 -08:00
README.fb update the latest version in README.fb to 2.7 2013-12-30 16:16:24 -08:00

README

rocksdb: A persistent key-value store for flash storage
Authors: * The Facebook Database Engineering Team
         * Built on earlier work on leveldb by Sanjay Ghemawat
           (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast
key value server, especially suited for storing data on flash drives.
It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
making it especially suitable for storing multiple terabytes of data in a
single database.

The core of this code has been derived from open-source leveldb.

The code under this directory implements a system for maintaining a
persistent key/value store.

See doc/index.html and the GitHub wiki (https://github.com/facebook/rocksdb/wiki)
for more explanation.

The public interface is in include/*.  Callers should not include or
rely on the details of any other header files in this package.  Those
internal APIs may be changed without warning.

Guide to header files:

include/rocksdb/db.h
    Main interface to the DB: Start here

include/rocksdb/options.h
    Control over the behavior of an entire database, and also
    control over the behavior of individual reads and writes.

include/rocksdb/comparator.h
    Abstraction for user-specified comparison function.  If you want
    just bytewise comparison of keys, you can use the default comparator,
    but clients can write their own comparator implementations if they
    want custom ordering (e.g. to handle different character
    encodings, etc.)
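
    As an illustration of a custom ordering, the sketch below is a
    stand-alone three-way comparison that ignores ASCII case. The
    function name CompareIgnoringCase is hypothetical and it works on
    std::string rather than on the library's own types; a real
    comparator would wrap logic like this in the interface declared
    in the header above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-alone helper, not part of the RocksDB API.
// Returns a negative value, zero, or a positive value when a sorts
// before, equal to, or after b, ignoring ASCII case.
int CompareIgnoringCase(const std::string& a, const std::string& b) {
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  for (std::size_t i = 0; i < n; ++i) {
    // Cast to unsigned char first: std::tolower is undefined for
    // negative values other than EOF.
    const int ca = std::tolower(static_cast<unsigned char>(a[i]));
    const int cb = std::tolower(static_cast<unsigned char>(b[i]));
    if (ca != cb) return ca < cb ? -1 : 1;
  }
  // All shared bytes match, so the shorter key sorts first.
  if (a.size() == b.size()) return 0;
  return a.size() < b.size() ? -1 : 1;
}
```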

include/rocksdb/iterator.h
    Interface for iterating over data. You can get an iterator
    from a DB object.

include/rocksdb/write_batch.h
    Interface for atomically applying multiple updates to a database.
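
    The idea of applying a group of updates as one unit can be
    sketched with a toy in-memory store. Everything here (Update,
    Store, ApplyBatch) is a hypothetical model for illustration and
    does not use or mirror the actual class in the header above.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model; not the RocksDB write batch class.
struct Update {
  bool is_delete;     // true: remove key, false: set key to value
  std::string key;
  std::string value;  // ignored for deletes
};

using Store = std::map<std::string, std::string>;

// Applies every update to a private copy, then swaps the copy in.
// If building the copy throws, the original store is left unchanged,
// so callers see either all of the updates or none of them.
void ApplyBatch(Store& store, const std::vector<Update>& batch) {
  Store next = store;
  for (const Update& u : batch) {
    if (u.is_delete) {
      next.erase(u.key);
    } else {
      next[u.key] = u.value;
    }
  }
  store.swap(next);
}
```

    Copying the whole store is only reasonable in a toy like this; it
    is used here because it makes the all-or-nothing behaviour easy
    to see.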

include/rocksdb/slice.h
    A simple module for maintaining a pointer and a length into some
    other byte array.
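
    A pointer-plus-length view can be sketched in a few lines. The
    names ByteView, MakeView, ToString and StartsWith are
    hypothetical and chosen for this example only; see the header
    above for the library's own class.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical minimal view type: it refers to bytes owned elsewhere
// and never copies or frees them.
struct ByteView {
  const char* data;
  std::size_t size;
};

// Builds a view over a std::string without copying.
// The string must outlive the view.
inline ByteView MakeView(const std::string& s) {
  return ByteView{s.data(), s.size()};
}

// Copies the viewed bytes into an owned string.
inline std::string ToString(const ByteView& v) {
  return std::string(v.data, v.size);
}

// True if v begins with the bytes of prefix.
inline bool StartsWith(const ByteView& v, const ByteView& prefix) {
  return v.size >= prefix.size &&
         std::memcmp(v.data, prefix.data, prefix.size) == 0;
}
```

    Because the view does not own its bytes, the caller is
    responsible for keeping the underlying buffer alive while the
    view is in use.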

include/rocksdb/status.h
    Status is returned from many of the public interfaces and is used
    to report success and various kinds of errors.
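
    The pattern of reporting errors through a returned object, rather
    than through exceptions, can be sketched as follows. Result and
    ParseDigits are hypothetical names for this example; the actual
    class and its error kinds are declared in the header above.

```cpp
#include <string>
#include <utility>

// Hypothetical result type illustrating the pattern only.
class Result {
 public:
  static Result OK() { return Result(true, ""); }
  static Result Error(std::string msg) {
    return Result(false, std::move(msg));
  }
  bool ok() const { return ok_; }
  const std::string& message() const { return message_; }

 private:
  Result(bool ok, std::string msg)
      : ok_(ok), message_(std::move(msg)) {}
  bool ok_;
  std::string message_;
};

// Parses a string of decimal digits into *out. On failure *out is
// left untouched and the returned Result carries a message.
// Overflow is not checked in this sketch.
Result ParseDigits(const std::string& text, int* out) {
  if (text.empty()) return Result::Error("empty input");
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') return Result::Error("not a number: " + text);
    value = value * 10 + (c - '0');
  }
  *out = value;
  return Result::OK();
}
```

    Callers check ok() before using the output and can log message()
    otherwise.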

include/rocksdb/env.h
    Abstraction of the OS environment.  A posix implementation of
    this interface is in util/env_posix.cc

include/rocksdb/table_builder.h
    A lower-level module that most clients probably won't use directly.

include/rocksdb/cache.h
    An API for the block cache.

include/rocksdb/compaction_filter.h
    An API for an application filter invoked on every compaction.

include/rocksdb/filter_policy.h
    An API for configuring a bloom filter.

include/rocksdb/memtablerep.h
    An API for implementing a memtable.

include/rocksdb/statistics.h
    An API to retrieve various database statistics.

include/rocksdb/transaction_log.h
    An API to retrieve transaction logs from a database.

Design discussions are conducted at https://www.facebook.com/groups/rocksdb.dev/