rocksdb

mirror of https://github.com/facebook/rocksdb.git synced 2024-11-30 04:41:49 +00:00

Author	SHA1	Message	Date
Raphael Bost	468ca61105	Break large file writes into 1GB chunks (#5213 ) Summary: This is a workaround for the issue described in #5169. It has been tested on a database with very large values, but not dedicated test has been added to the code base. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5213 Differential Revision: D15243116 Pulled By: siying fbshipit-source-id: e0c226a6cd71a60924dcd7ce7af74abcb4054484	2019-05-15 14:20:24 -07:00
Andrew Kryczka	8272a6de57	Optionally wait on bytes_per_sync to smooth I/O (#5183 ) Summary: The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyways. My understanding of the existing behavior is we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes. Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE \| SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it. There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically). The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183 Differential Revision: D14953553 Pulled By: siying fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9	2019-04-22 11:51:39 -07:00
Fosco Marotto	6c2bf9e916	Add copyright headers per FB open-source checkup tool. (#5199 ) Summary: internal task: T35568575 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199 Differential Revision: D14962794 Pulled By: gfosco fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe	2019-04-18 10:55:01 -07:00
Siying Dong	ed9f5e21aa	Change OptimizeForPointLookup() and OptimizeForSmallDb() (#5165 ) Summary: Change the behavior of OptimizeForSmallDb() so that it is less likely to go out of memory. Change the behavior of OptimizeForPointLookup() to take advantage of the new memtable whole key filter, and move away from prefix extractor as well as hash-based indexing, as they are prone to misuse. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5165 Differential Revision: D14880709 Pulled By: siying fbshipit-source-id: 9af30e3c9e151eceea6d6b38701a58f1f9fb692d	2019-04-11 10:45:36 -07:00
jsteemann	313e877285	fix reading encrypted files beyond file boundaries (#5160 ) Summary: This fix should help reading from encrypted files if the file-to-be-read is smaller than expected. For example, when using the encrypted env and making it read a journal file of exactly 0 bytes size, the encrypted env code crashes with SIGSEGV in its Decrypt function, as there is no check if the read attempts to read over the file's boundaries (as specified originally by the `dataSize` parameter). The most important problem this patch addresses is however that there is no size underlow check in `CTREncryptionProvider::CreateCipherStream`: The stream to be read will be initialized to a size of always `prefix.size() - (2 * blockSize)`. If the prefix however is smaller than twice the block size, this will obviously assume a _very_ large stream and read over the bounds. The patch adds a check here as follows: // If the prefix is smaller than twice the block size, we would below read a // very large chunk of the file (and very likely read over the bounds) assert(prefix.size() >= 2 * blockSize); if (prefix.size() < 2 * blockSize) { return Status::Corruption("Unable to read from file " + fname + ": read attempt would read beyond file bounds"); } so embedders can catch the error in their release builds. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5160 Differential Revision: D14834633 Pulled By: sagar0 fbshipit-source-id: 47aa39a6db8977252cede054c7eb9a663b9a3484	2019-04-08 14:57:25 -07:00
Yanqin Jin	d77476ef55	Fix db_stress for custom env (#5122 ) Summary: Fix some hdfs-related code so that it can compile and run 'db_stress' Pull Request resolved: https://github.com/facebook/rocksdb/pull/5122 Differential Revision: D14675495 Pulled By: riversand963 fbshipit-source-id: cac280479efcf5451982558947eac1732e8bc45a	2019-03-28 19:20:27 -07:00
Yanqin Jin	9358178edc	Support for single-primary, multi-secondary instances (#4899 ) Summary: This PR allows RocksDB to run in single-primary, multi-secondary process mode. The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary. Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`. This PR has several components: 1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary. 2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue. 3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`. 3.1 Tail the primary's MANIFEST during recovery. 3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`. 3.3 Tailing WAL will be in a future PR. 4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899 Differential Revision: D14510945 Pulled By: riversand963 fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886	2019-03-26 16:45:31 -07:00
Zhongyi Xie	3c5eed5ebe	remove incorrect assert in `GetUniqueIdFromFile` (#5102 ) Summary: User report has shown that sometimes `BlockBasedTable::SetupCacheKeyPrefix` would assert when trying to generate an id from the file. The actual cause seems to be hardware related but we might be better off without the incorrect assertion See T42178927 for more information Pull Request resolved: https://github.com/facebook/rocksdb/pull/5102 Differential Revision: D14604677 Pulled By: miasantreble fbshipit-source-id: fcb09207ebdc4fa66e941afbc0523d84797e7ad7	2019-03-25 23:28:29 -07:00
Rashmi Sharma	a4396f9218	Make it easier for users to load options from option file and set shared block cache. (#5063 ) Summary: [RocksDB] Make it easier for users to load options from option file and set shared block cache. Right now, it requires several dynamic casting for users to set the shared block cache to their option struct cast from the option file. If people don't do that, every CF of every DB will generate its own 8MB block cache. It's not a usable setting. So we are dragging every user who loads options from the file into such a mess. Instead, we should allow them to pass their cache object to LoadLatestOptions() and LoadOptionsFromFile(), so that those loaded option structs will have the shared block cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5063 Differential Revision: D14518584 Pulled By: rashmishrm fbshipit-source-id: c91430ff9425a0e67d76fc67931d755f491ca5aa	2019-03-21 16:25:28 -07:00
Zhongyi Xie	a291f3a1e5	Collect compaction stats by priority and dump to info LOG (#5050 ) Summary: In order to better understand compaction done by different priority thread pool, we now collect compaction stats by priority and also print them to info LOG through stats dump. ``` Compaction Stats [default] Priority Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Low 0/0 0.00 KB 0.0 16.8 11.3 5.5 5.6 0.1 0.0 0.0 406.4 136.1 42.24 34.96 45 0.939 13M 8865K High 0/0 0.00 KB 0.0 0.0 0.0 0.0 11.4 11.4 0.0 0.0 0.0 76.2 153.00 35.74 12185 0.013 0 0 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5050 Differential Revision: D14408583 Pulled By: miasantreble fbshipit-source-id: e53746586ea27cb8abc9fec35805bd80ed30f608	2019-03-19 17:28:19 -07:00
Andrew Kryczka	186b3afaa8	Use `fallocate` even if hole-punching unsupported (#5023 ) Summary: The compiler flag `-DROCKSDB_FALLOCATE_PRESENT` was only set when `fallocate`, `FALLOC_FL_KEEP_SIZE`, and `FALLOC_FL_PUNCH_HOLE` were all present. However, the last of the three is not really necessary for the primary `fallocate` use case; furthermore, it was introduced only in later Linux kernel versions (2.6.38+). This PR changes the flag `-DROCKSDB_FALLOCATE_PRESENT` to only require `fallocate` and `FALLOC_FL_KEEP_SIZE` to be present. There is a separate check for `FALLOC_FL_PUNCH_HOLE` only in the place where it is used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5023 Differential Revision: D14248487 Pulled By: siying fbshipit-source-id: a10ed0b902fa755988e957bd2dcec9081ec0502e	2019-03-04 15:43:17 -08:00
Michael Liu	3c5d1b16b1	Apply modernize-use-override (3) Summary: Use C++11’s override and remove virtual where applicable. Change are automatically generated. bypass-lint drop-conflicts Reviewed By: igorsugak Differential Revision: D14131816 fbshipit-source-id: f20e7f7cecf2e699d70f5fa036f72c0e3f59b50e	2019-02-19 13:39:49 -08:00
Siying Dong	c2affccc18	Header logger should call LogHeader() (#4980 ) Summary: The info log header feature never worked well, because log level Header was not translated to Logger::LogHeader() call. Fix it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4980 Differential Revision: D14087283 Pulled By: siying fbshipit-source-id: 7e7d03ce35fa8d13d4ee549f46f7326f7bc0006d	2019-02-15 16:59:36 -08:00
Young Tack Jin	4091597c67	fix for nvme device path (#4866 ) Summary: nvme device path doesn't have "block" as like "nvme/nvme0/nvme0n1" or "nvme/nvme0/nvme0n1/nvme0n1p1". the last directory such as "nvme0n1p1" should be removed if nvme drive is partitioned. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4866 Differential Revision: D13627824 Pulled By: riversand963 fbshipit-source-id: 09ab968f349f3dbb890beea20193f1359b17d317	2019-01-31 19:08:37 -08:00
Alexander Zinoviev	32a6dd9a41	Add a new CPU time counter to compaction report (#4889 ) Summary: Measure CPU time consumed for a compaction and report it in the stats report Enable NowCPUNanos() to work for MacOS Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889 Differential Revision: D13701276 Pulled By: zinoale fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055	2019-01-29 17:24:00 -08:00
Siying Dong	da1c64b6e7	Introduce a CPU time counter in perf_context (#4741 ) Summary: Introduce the first CPU timing counter, perf_context.get_cpu_nanos. This opens a door to more CPU counters in the future. Only Posix Env has it implemented using clock_gettime() with CLOCK_THREAD_CPUTIME_ID. How accurate the counter is depends on the platform. Make PerfStepTimer to take an Env as an argument, and sometimes pass it in. The direct reason is to make the unit tests to use SpecialEnv where we can ingest logic there. But in long term, this is a good change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4741 Differential Revision: D13287798 Pulled By: siying fbshipit-source-id: 090361049d9d5095d1d1a369fe1338d2e2e1c73f	2018-12-20 12:03:44 -08:00
Sagar Vemuri	dc3528077a	Update all unique/shared_ptr instances to be qualified with namespace std (#4638 ) Summary: Ran the following commands to recursively change all the files under RocksDB: ``` find . -type f -name ".cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} + ``` Running `make format` updated some formatting on the files touched. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638 Differential Revision: D12934992 Pulled By: sagar0 fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8	2018-11-09 11:19:58 -08:00
Sagar Vemuri	abb8ecb4cd	Add missing methods to WritableFileWrapper (#4584 ) Summary: `WritableFileWrapper` was missing some newer methods that were added to `WritableFile`. Without these functions, the missing wrapper methods would fallback to using the default implementations in WritableFile instead of using the corresponding implementations in, say, `PosixWritableFile` or `WinWritableFile`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4584 Differential Revision: D10559199 Pulled By: sagar0 fbshipit-source-id: 0d0f18a486aee727d5b8eebd3110a41988e27391	2018-10-24 12:19:54 -07:00
Zhongyi Xie	cac87fcf57	move dump stats to a separate thread (#4382 ) Summary: Currently statistics are supposed to be dumped to info log at intervals of `options.stats_dump_period_sec`. However the implementation choice was to bind it with compaction thread, meaning if the database has been serving very light traffic, the stats may not get dumped at all. We decided to separate stats dumping into a new timed thread using `TimerQueue`, which is already used in blob_db. This will allow us schedule new timed tasks with more deterministic behavior. Tested with db_bench using `--stats_dump_period_sec=20` in command line: > LOG:2018/09/17-14:07:45.575025 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS ------- LOG:2018/09/17-14:08:05.643286 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS ------- LOG:2018/09/17-14:08:25.691325 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS ------- LOG:2018/09/17-14:08:45.740989 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS ------- LOG content: > 2018/09/17-14:07:45.575025 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS ------- 2018/09/17-14:07:45.575080 7fe99fbfe700 [WARN] [db/db_impl.cc:606] DB Stats Uptime(secs): 20.0 total, 20.0 interval Cumulative writes: 4447K writes, 4447K keys, 4447K commit groups, 1.0 writes per commit group, ingest: 5.57 GB, 285.01 MB/s Cumulative WAL: 4447K writes, 0 syncs, 4447638.00 writes per sync, written: 5.57 GB, 285.01 MB/s Cumulative stall: 00:00:0.012 H:M:S, 0.1 percent Interval writes: 4447K writes, 4447K keys, 4447K commit groups, 1.0 writes per commit group, ingest: 5700.71 MB, 285.01 MB/s Interval WAL: 4447K writes, 0 syncs, 4447638.00 writes per sync, written: 5.57 MB, 285.01 MB/s Interval stall: 00:00:0.012 H:M:S, 0.1 percent Compaction Stats [default] Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Pull Request resolved: https://github.com/facebook/rocksdb/pull/4382 Differential Revision: D9933051 Pulled By: miasantreble fbshipit-source-id: 6d12bb1e4977674eea4bf2d2ac6d486b814bb2fa	2018-10-08 22:54:43 -07:00
Sagar Vemuri	b1dad4cfcc	assert in PosixEnv::FileExists should be based on errno (#4427 ) Summary: The assert in PosixEnv::FileExists is currently based on the return value of `access` syscall. Instead it should be based on errno. Initially I wanted to remove this assert as [`access`](https://linux.die.net/man/2/access) can error out in a few other cases (like EROFS). But on thinking more it feels like the assert is doing the right thing ... its good to crash on EROFS, EFAULT, EINVAL, and other major filesystem related problems so that the user is immediately aware of the problems while testing. (I think it might be ok to crash on EIO as well, but there might be a specific reason why it was decided not to crash for EIO, and I don't have that context. So letting the letting the assert checks remain as is for now). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4427 Differential Revision: D10037200 Pulled By: sagar0 fbshipit-source-id: 5cc96116a2e53cef701f444a8b5290576f311e51	2018-09-26 13:25:15 -07:00
Anand Ananthabhotla	a27fce408e	Auto recovery from out of space errors (#4164 ) Summary: This commit implements automatic recovery from a Status::NoSpace() error during background operations such as write callback, flush and compaction. The broad design is as follows - 1. Compaction errors are treated as soft errors and don't put the database in read-only mode. A compaction is delayed until enough free disk space is available to accomodate the compaction outputs, which is estimated based on the input size. This means that users can continue to write, and we rely on the WriteController to delay or stop writes if the compaction debt becomes too high due to persistent low disk space condition 2. Errors during write callback and flush are treated as hard errors, i.e the database is put in read-only mode and goes back to read-write only fater certain recovery actions are taken. 3. Both types of recovery rely on the SstFileManagerImpl to poll for sufficient disk space. We assume that there is a 1-1 mapping between an SFM and the underlying OS storage container. For cases where multiple DBs are hosted on a single storage container, the user is expected to allocate a single SFM instance and use the same one for all the DBs. If no SFM is specified by the user, DBImpl::Open() will allocate one, but this will be one per DB and each DB will recover independently. The recovery implemented by SFM is as follows - a) On the first occurance of an out of space error during compaction, subsequent compactions will be delayed until the disk free space check indicates enough available space. The required space is computed as the sum of input sizes. b) The free space check requirement will be removed once the amount of free space is greater than the size reserved by in progress compactions when the first error occured c) If the out of space error is a hard error, a background thread in SFM will poll for sufficient headroom before triggering the recovery of the database and putting it in write-only mode. The headroom is calculated as the sum of the write_buffer_size of all the DB instances associated with the SFM 4. EventListener callbacks will be called at the start and completion of automatic recovery. Users can disable the auto recov ery in the start callback, and later initiate it manually by calling DB::Resume() Todo: 1. More extensive testing 2. Add disk full condition to db_stress (follow-on PR) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164 Differential Revision: D9846378 Pulled By: anand1976 fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a	2018-09-15 13:43:04 -07:00
cngzhnp	64324e329e	Support pragma once in all header files and cleanup some warnings (#4339 ) Summary: As you know, almost all compilers support "pragma once" keyword instead of using include guards. To be keep consistency between header files, all header files are edited. Besides this, try to fix some warnings about loss of data. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339 Differential Revision: D9654990 Pulled By: ajkr fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848	2018-09-05 18:13:31 -07:00
Wez Furlong	d00e5de7fc	use atomic O_CLOEXEC when available (#4328 ) Summary: In our application we spawn helper child processes concurrently with opening rocksdb. In one situation I observed that the child process had inherited the rocksdb lock file as well as directory handles to the rocksdb storage location. The code in env_posix takes care to set CLOEXEC but doesn't use `O_CLOEXEC` at the time that the files are opened which means that there is a window of opportunity to leak the descriptors across a fork/exec boundary. This diff introduces a helper that can conditionally set the `O_CLOEXEC` bit for the open call using the same logic as that in the existing helper for setting that flag post-open. I've preserved the post-open logic for systems that don't have `O_CLOEXEC`. I've introduced setting `O_CLOEXEC` for what appears to be a number of temporary or transient files and directory handles; I suspect that none of the files opened by Rocks are intended to be inherited by a forked child process. In one case, `fopen` is used to open a file. I've added the use of the glibc-specific `e` mode to turn on `O_CLOEXEC` for this case. While this doesn't cover all posix systems, it is an improvement for our common deployment system. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4328 Reviewed By: ajkr Differential Revision: D9553046 Pulled By: wez fbshipit-source-id: acdb89f7a85ca649b22fe3c3bd76f82142bec2bf	2018-08-29 20:27:43 -07:00
Jean-Marc Le Roux	bbf30330b4	Fix the build failure with OS_ANDROID (#4232 ) Summary: sysmacros.h should be included in OS_ANDROID build as well otherwise the compile would complain: error: use of undeclared identifier 'major'. Fixes https://github.com/facebook/rocksdb/issues/4231 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4232 Differential Revision: D9217350 Pulled By: maysamyabandeh fbshipit-source-id: 21f4b62dbbda3163120ac0b38b95d95d35d67dce	2018-08-08 08:12:02 -07:00
Maysam Yabandeh	8581a93a6b	Per-thread unique test db names (#4135 ) Summary: The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel. Example: ``` ~/gtest-parallel/gtest-parallel ./table_test``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135 Differential Revision: D8846653 Pulled By: maysamyabandeh fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84	2018-07-13 17:27:39 -07:00
Yanqin Jin	520bbb1774	Disable EnvPosixTest.RunImmediately, add EnvPosixTest.RunEventually. (#4126 ) Summary: The original `EnvPosixTest.RunImmediately` assumes that after scheduling a background thread, the thread is guaranteed to complete after 0.1 second. I do not know about any non-real-time OS/runtime providing this guarantee. Nor does C++11 standard say anything about this in the documentation of `std::thread`. In fact, we have observed this test failure multiple times on appveyor, and we haven't been able to reproduce the failure deterministically. Therefore, I disable this test for now until we know for sure how it used to fail. Instead, I add another test `EnvPosixTest.RunEventually` that checks that a thread will be scheduled eventually. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4126 Differential Revision: D8827086 Pulled By: riversand963 fbshipit-source-id: abc5cb655f90d50b791493da5eeb3716885dfe93	2018-07-12 18:27:15 -07:00
Siying Dong	926f3a78a6	In delete scheduler, before ftruncate file for slow delete, check whether there is other hard links (#4093 ) Summary: Right now slow deletion with ftruncate doesn't work well with checkpoints because it ruin hard linked files in checkpoints. To fix it, check the file has no other hard link before ftruncate it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4093 Differential Revision: D8730360 Pulled By: siying fbshipit-source-id: 756eea5bce8a87b9a2ea3a5bfa190b2cab6f75df	2018-07-09 15:28:12 -07:00
Sagar Vemuri	645e57c22d	Assert for Direct IO at the beginning in PositionedRead (#3891 ) Summary: Moved the direct-IO assertion to the top in `PosixSequentialFile::PositionedRead`, as it doesn't make sense to check for sector alignments before checking for direct IO. Closes https://github.com/facebook/rocksdb/pull/3891 Differential Revision: D8267972 Pulled By: sagar0 fbshipit-source-id: 0ecf77c0fb5c35747a4ddbc15e278918c0849af7	2018-06-21 14:58:01 -07:00
Tomas Kolda	906a602c2c	Build and tests fixes for Solaris Sparc (#4000 ) Summary: Here are some fixes for build on Solaris Sparc. It is also fixing CRC test on BigEndian platforms. Closes https://github.com/facebook/rocksdb/pull/4000 Differential Revision: D8455394 Pulled By: ajkr fbshipit-source-id: c9289a7b541a5628139c6b77e84368e14dc3d174	2018-06-15 12:42:53 -07:00
Andrew Kryczka	1f32dc7d2b	Check with PosixEnv before opening LOCK file (#3993 ) Summary: Rebased and resubmitting #1831 on behalf of stevelittle. The problem is when a single process attempts to open the same DB twice, the second attempt fails due to LOCK file held. If the second attempt had opened the LOCK file, it'll now need to close it, and closing causes the file to be unlocked. Then, any subsequent attempt to open the DB will succeed, which is the wrong behavior. The solution was to track which files a process has locked in PosixEnv, and check those before opening a LOCK file. Fixes #1780. Closes https://github.com/facebook/rocksdb/pull/3993 Differential Revision: D8398984 Pulled By: ajkr fbshipit-source-id: 2755fe66950a0c9de63075f932f9e15768041918	2018-06-13 17:32:04 -07:00
Zhongyi Xie	f1592a06c2	run make format for PR 3838 (#3954 ) Summary: PR https://github.com/facebook/rocksdb/pull/3838 made some changes that triggers lint warnings. Run `make format` to fix formatting as suggested by siying . Also piggyback two changes: 1) fix singleton destruction order for windows and posix env 2) fix two clang warnings Closes https://github.com/facebook/rocksdb/pull/3954 Differential Revision: D8272041 Pulled By: miasantreble fbshipit-source-id: 7c4fd12bd17aac13534520de0c733328aa3c6c9f	2018-06-05 12:58:02 -07:00
Andrew Kryczka	2210152947	Fix singleton destruction order of PosixEnv and SyncPoint (#3951 ) Summary: Ensure the PosixEnv singleton is destroyed first since its destructor waits for background threads to all complete. This ensures background threads cannot hit sync points after the SyncPoint singleton is destroyed, which was previously possible. Closes https://github.com/facebook/rocksdb/pull/3951 Differential Revision: D8265295 Pulled By: ajkr fbshipit-source-id: 7738dd458c5d993a78377dd0420e82badada81ab	2018-06-04 15:58:46 -07:00
奏之章	6e08916eb3	Fix Fadvise on closed file when reads use mmap Summary: ```PosixMmapReadableFile::fd_``` is closed after created, but needs to remain open for the lifetime of `PosixMmapReadableFile` since it is used whenever `InvalidateCache` is called. Closes https://github.com/facebook/rocksdb/pull/2764 Differential Revision: D8152515 Pulled By: ajkr fbshipit-source-id: b738a6a55ba4e392f9b0f374ff396a1e61c64f65	2018-05-25 10:57:57 -07:00
Dmitri Smirnov	3db8504cde	Catchup with posix features Summary: Catch up with Posix features NewWritableRWFile must fail when file does not exists Implement Env::Truncate() Adjust Env options optimization functions Implement MemoryMappedBuffer on Windows. Closes https://github.com/facebook/rocksdb/pull/3857 Differential Revision: D8053610 Pulled By: ajkr fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e	2018-05-24 15:13:04 -07:00
Andrew Kryczka	072ae671a7	Apply use_direct_io_for_flush_and_compaction to writes only Summary: Previously `DBOptions::use_direct_io_for_flush_and_compaction=true` combined with `DBOptions::use_direct_reads=false` could cause RocksDB to simultaneously read from two file descriptors for the same file, where background reads used direct I/O and foreground reads used buffered I/O. Our measurements found this mixed-mode I/O negatively impacted foreground read perf, compared to when only buffered I/O was used. This PR makes the mixed-mode I/O situation impossible by repurposing `DBOptions::use_direct_io_for_flush_and_compaction` to only apply to background writes, and `DBOptions::use_direct_reads` to apply to all reads. There is no risk of direct background direct writes happening simultaneously with buffered reads since we never read from and write to the same file simultaneously. Closes https://github.com/facebook/rocksdb/pull/3829 Differential Revision: D7915443 Pulled By: ajkr fbshipit-source-id: 78bcbf276449b7e7766ab6b0db246f789fb1b279	2018-05-09 19:42:58 -07:00
Siying Dong	3690276e74	Disallow to open RandomRW file if the file doesn't exist Summary: The only use of RandomRW is to change seqno when bulkloading, and in this use case, the file should exist. We should fail the file opening in this case. Closes https://github.com/facebook/rocksdb/pull/3827 Differential Revision: D7913719 Pulled By: siying fbshipit-source-id: 62cf6734f1a6acb9e14f715b927da388131c3492	2018-05-09 10:27:26 -07:00
Andrew Kryczka	19fde54841	initialize local variable for UBSAN in PosixEnv function Summary: this is a repeat commit of `a8a28da215`, which got reverted together with `6afe22db2e`, but forgotten about when that commit was un-reverted in `46152d53bf`. Closes https://github.com/facebook/rocksdb/pull/3796 Differential Revision: D7826077 Pulled By: ajkr fbshipit-source-id: edb22375da56e2feda50c5b35f942f4d2d52b19c	2018-05-01 13:27:05 -07:00
Andrew Kryczka	46152d53bf	Second attempt at db_stress crash-recovery verification Summary: - Original commit: `a4fb1f8c04` - Revert commit (we reverted as a quick fix to get crash tests passing): `6afe22db2e` This PR includes the contents of the original commit plus two bug fixes, which are: - In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs. - Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported. Closes https://github.com/facebook/rocksdb/pull/3793 Differential Revision: D7811671 Pulled By: ajkr fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30	2018-04-30 12:27:34 -07:00
Andrew Kryczka	6afe22db2e	revert db_stress crash-recovery verification Summary: crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures. Closes https://github.com/facebook/rocksdb/pull/3786 Differential Revision: D7794516 Pulled By: ajkr fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33	2018-04-27 12:57:01 -07:00
Nathan VanBenschoten	37cd617b6b	Add virtual Truncate method to Env Summary: This change adds a virtual `Truncate` method to `Env`, which truncates the named file to the specified size. At the moment, this is only supported for `MockEnv`, but other `Env's` could be extended to override the method too. This is the same approach that methods like `LinkFile` and `AreSameFile` have taken. This is useful for any user of the in-memory `Env`. The implementation's header is not exported, so before this change, it was impossible to access it's already existing `Truncate` method. Closes https://github.com/facebook/rocksdb/pull/3779 Differential Revision: D7785789 Pulled By: ajkr fbshipit-source-id: 3bcdaeea7b7180529f7d9b496dc67b791a00bbf0	2018-04-26 21:12:51 -07:00
Andrew Kryczka	dfc61e7c24	initialize local variable for UBSAN in PosixEnv function Summary: It seems clear to me that the variable is initialized before line 492, but it wasn't clear to UBSAN. The failure was: ``` In file included from ./env/io_posix.h:14:0, from env/env_posix.cc:44: ./include/rocksdb/env.h: In member function ‘virtual rocksdb::Status rocksdb::{anonymous}::PosixEnv::NewMemoryMappedFileBuffer(const string&, std::unique_ptr<rocksdb::MemoryMappedFileBuffer>)’: ./include/rocksdb/env.h:822:36: error: ‘base’ may be used uninitialized in this function [-Werror=maybe-uninitialized] : base(_base), length(_length) {} ^ env/env_posix.cc:482:11: note: ‘base’ was declared here void base; ``` We can just initialize to nullptr to keep UBSAN happy. Closes https://github.com/facebook/rocksdb/pull/3770 Differential Revision: D7756287 Pulled By: ajkr fbshipit-source-id: 0f2efb9594e2d3a30706a4ca7e1d4a6328031bf2	2018-04-25 13:42:02 -07:00
Andrew Kryczka	a4fb1f8c04	Add crash-recovery correctness check to db_stress Summary: Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values. In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`. For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class. `db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup. Closes https://github.com/facebook/rocksdb/pull/3629 Differential Revision: D7463144 Pulled By: ajkr fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b	2018-04-24 15:58:22 -07:00
Gabriel Wicke	090c78a0d7	Support lowering CPU priority of background threads Summary: Background activities like compaction can negatively affect latency of higher-priority tasks like request processing. To avoid this, rocksdb already lowers the IO priority of background threads on Linux systems. While this takes care of typical IO-bound systems, it does not help much when CPU (temporarily) becomes the bottleneck. This is especially likely when using more expensive compression settings. This patch adds an API to allow for lowering the CPU priority of background threads, modeled on the IO priority API. Benchmarks (see below) show significant latency and throughput improvements when CPU bound. As a result, workloads with some CPU usage bursts should benefit from lower latencies at a given utilization, or should be able to push utilization higher at a given request latency target. A useful side effect is that compaction CPU usage is now easily visible in common tools, allowing for an easier estimation of the contribution of compaction vs. request processing threads. As with IO priority, the implementation is limited to Linux, degrading to a no-op on other systems. Closes https://github.com/facebook/rocksdb/pull/3763 Differential Revision: D7740096 Pulled By: gwicke fbshipit-source-id: e5d32373e8dc403a7b0c2227023f9ce4f22b413c	2018-04-24 08:41:51 -07:00
Yi Wu	2e72a5899b	Disable EnvPosixTest::FilePermission Summary: The test is flaky in our CI but could not be reproduce manually on the same CI host. Disabling it. Closes https://github.com/facebook/rocksdb/pull/3753 Differential Revision: D7716320 Pulled By: yiwu-arbug fbshipit-source-id: 6bed3b05880c1d24e8dc86bc970e5181bc98fb45	2018-04-20 15:42:42 -07:00
Andrew Kryczka	3cea61392f	include thread-pool priority in thread names Summary: Previously threads were named "rocksdb:bg\<index in thread pool\>", so the first thread in all thread pools would be named "rocksdb:bg0". Users want to be able to distinguish threads used for flush (high-pri) vs regular compaction (low-pri) vs compaction to bottom-level (bottom-pri). So I changed the thread naming convention to include the thread-pool priority. Closes https://github.com/facebook/rocksdb/pull/3702 Differential Revision: D7581415 Pulled By: ajkr fbshipit-source-id: ce04482b6acd956a401ef22dc168b84f76f7d7c1	2018-04-18 17:27:56 -07:00
Zhongyi Xie	954b496b3f	fix memory leak in two_level_iterator Summary: this PR fixes a few failed contbuild: 1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in https://github.com/facebook/rocksdb/pull/3406 2. various unused param errors introduced by https://github.com/facebook/rocksdb/pull/3662 3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag. Closes https://github.com/facebook/rocksdb/pull/3718 Reviewed By: maysamyabandeh Differential Revision: D7621192 Pulled By: miasantreble fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146	2018-04-15 17:26:26 -07:00
Xiaofei Du	a0102aa6d7	Make database files' permissions configurable Summary: Closes https://github.com/facebook/rocksdb/pull/3709 Differential Revision: D7610227 Pulled By: xiaofeidu008 fbshipit-source-id: 88a52f0f9f96e2195fccde995cf9760b785e9f07	2018-04-13 13:13:04 -07:00
Steven Fackler	1f5457ef21	Merge raw and shared pointer log method impls Summary: Calling rocksdb::Log, rocksdb::Info, etc with a `shared_ptr<Logger>` should behave the same as calling those functions with a `Logger `. This PR achieves it by making the `shared_ptr<Logger>` versions delegate to the `Logger ` versions. Closes #3689 Closes https://github.com/facebook/rocksdb/pull/3710 Differential Revision: D7595557 Pulled By: ajkr fbshipit-source-id: 64dd7f20fd42dc821bac7b8032705c35b483e00d	2018-04-13 11:12:54 -07:00
David Lai	3be9b36453	comment unused parameters to turn on -Wunused-parameter flag Summary: This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557. Closes https://github.com/facebook/rocksdb/pull/3662 Differential Revision: D7426121 Pulled By: Dayvedde fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352	2018-04-12 17:59:16 -07:00
Bruce Mitchener	a3a3f5497c	Fix some typos in comments and docs. Summary: Closes https://github.com/facebook/rocksdb/pull/3568 Differential Revision: D7170953 Pulled By: siying fbshipit-source-id: 9cfb8dd88b7266da920c0e0c1e10fb2c5af0641c	2018-03-08 10:27:25 -08:00
Bruce Mitchener	0de710f5b8	Use nullptr instead of NULL / 0 more consistently. Summary: Closes https://github.com/facebook/rocksdb/pull/3569 Differential Revision: D7170968 Pulled By: yiwu-arbug fbshipit-source-id: 308a6b7dd358a04fd9a7de3d927bfd8abd57d348	2018-03-07 12:42:12 -08:00
amytai	0a3db28d98	Disallow compactions if there isn't enough free space Summary: This diff handles cases where compaction causes an ENOSPC error. This does not handle corner cases where another background job is started while compaction is running, and the other background job triggers ENOSPC, although we do allow the user to provision for these background jobs with SstFileManager::SetCompactionBufferSize. It also does not handle the case where compaction has finished and some other background job independently triggers ENOSPC. Usage: Functionality is inside SstFileManager. In particular, users should set SstFileManager::SetMaxAllowedSpaceUsage, which is the reference highwatermark for determining whether to cancel compactions. Closes https://github.com/facebook/rocksdb/pull/3449 Differential Revision: D7016941 Pulled By: amytai fbshipit-source-id: 8965ab8dd8b00972e771637a41b4e6c645450445	2018-03-06 16:27:54 -08:00
Fosco Marotto	d518fe1da6	uint64_t and size_t changes to compile for iOS Summary: In attempting to build a static lib for use in iOS, I ran in to lots of type errors between uint64_t and size_t. This PR contains the changes I made to get `TARGET_OS=IOS make static_lib` to succeed while also getting Xcode to build successfully with the resulting `librocksdb.a` library imported. This also compiles for me on macOS and tests fine, but I'm really not sure if I made the correct decisions about where to `static_cast` and where to change types. Also up for discussion: is iOS worth supporting? Getting the static lib is just part one, we aren't providing any bridging headers or wrappers like the ObjectiveRocks project, it won't be a great experience. Closes https://github.com/facebook/rocksdb/pull/3503 Differential Revision: D7106457 Pulled By: gfosco fbshipit-source-id: 82ac2073de7e1f09b91f6b4faea91d18bd311f8e	2018-03-06 12:43:51 -08:00
Dmitri Smirnov	c364eb42b5	Windows cumulative patch Summary: This patch addressed several issues. Portability including db_test std::thread -> port::Thread Cc: @ and %z to ROCKSDB portable macro. Cc: maysamyabandeh Implement Env::AreFilesSame Make the implementation of file unique number more robust Get rid of C-runtime and go directly to Windows API when dealing with file primitives. Implement GetSectorSize() and aling unbuffered read on the value if available. Adjust Windows Logger for the new interface, implement CloseImpl() Cc: anand1976 Fix test running script issue where $status var was of incorrect scope so the failures were swallowed and not reported. DestroyDB() creates a logger and opens a LOG file in the directory being cleaned up. This holds a lock on the folder and the cleanup is prevented. This fails one of the checkpoin tests. We observe the same in production. We close the log file in this change. Fix DBTest2.ReadAmpBitmapLiveInCacheAfterDBClose failure where the test attempts to open a directory with NewRandomAccessFile which does not work on Windows. Fix DBTest.SoftLimit as it is dependent on thread timing. CC: yiwu-arbug Closes https://github.com/facebook/rocksdb/pull/3552 Differential Revision: D7156304 Pulled By: siying fbshipit-source-id: 43db0a757f1dfceffeb2b7988043156639173f5b	2018-03-06 11:57:43 -08:00
Andrew Kryczka	5d68243e61	Comment out unused variables Summary: Submitting on behalf of another employee. Closes https://github.com/facebook/rocksdb/pull/3557 Differential Revision: D7146025 Pulled By: ajkr fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e	2018-03-05 13:13:41 -08:00
Zhongyi Xie	92b1a68d87	fix FreeBSD build Summary: Currently FreeBSD build is broken in master and possibly some previous releases due to unrecognized symbol `O_DIRECT`. This PR will fix the build on FreeBSD Closes https://github.com/facebook/rocksdb/pull/3560 Differential Revision: D7148646 Pulled By: miasantreble fbshipit-source-id: 95b6c3d310fa531267c086b2cd40a5ab1c042b5a	2018-03-05 11:12:28 -08:00
Anand Ananthabhotla	dfbe52e099	Fix the Logger::Close() and DBImpl::Close() design pattern Summary: The recent Logger::Close() and DBImpl::Close() implementation rely on calling the CloseImpl() virtual function from the destructor, which will not work. Refactor the implementation to have a private close helper function in derived classes that can be called by both CloseImpl() and the destructor. Closes https://github.com/facebook/rocksdb/pull/3528 Reviewed By: gfosco Differential Revision: D7049303 Pulled By: anand1976 fbshipit-source-id: 76a64cbf403209216dfe4864ecf96b5d7f3db9f4	2018-02-23 13:57:26 -08:00
Igor Sugak	aba3409740	Back out "[codemod] - comment out unused parameters" Reviewed By: igorsugak fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37	2018-02-22 12:43:17 -08:00
David Lai	f4a030ce81	- comment out unused parameters Reviewed By: everiq, igorsugak Differential Revision: D7046710 fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461	2018-02-22 09:44:23 -08:00
jsteemann	4e7a182d09	Several small "fixes" Summary: - removed a few unneeded variables - fused some variable declarations and their assignments - fixed right-trimming code in string_util.cc to not underflow - simplifed an assertion - move non-nullptr check assertion before dereferencing of that pointer - pass an std::string function parameter by const reference instead of by value (avoiding potential copy) Closes https://github.com/facebook/rocksdb/pull/3507 Differential Revision: D7004679 Pulled By: sagar0 fbshipit-source-id: 52944952d9b56dfcac3bea3cd7878e315bb563c4	2018-02-15 16:57:37 -08:00
Tamir Duberstein	cd5092e168	Suppress unused warnings Summary: - Use `__unused__` everywhere - Suppress unused warnings in Release mode + This currently affects non-MSVC builds (e.g. mingw64). Closes https://github.com/facebook/rocksdb/pull/3448 Differential Revision: D6885496 Pulled By: miasantreble fbshipit-source-id: f2f6adacec940cc3851a9eee328fafbf61aad211	2018-02-02 12:27:07 -08:00
Anand Ananthabhotla	d0f1b49ab6	Add a Close() method to DB to return status when closing a db Summary: Currently, the only way to close an open DB is to destroy the DB object. There is no way for the caller to know the status. In one instance, the destructor encountered an error due to failure to close a log file on HDFS. In order to prevent silent failures, we add DB::Close() that calls CloseImpl() which must be implemented by its descendants. The main failure point in the destructor is closing the log file. This patch also adds a Close() entry point to Logger in order to get status. When DBOptions::info_log is allocated and owned by the DBImpl, it is explicitly closed by DBImpl::CloseImpl(). Closes https://github.com/facebook/rocksdb/pull/3348 Differential Revision: D6698158 Pulled By: anand1976 fbshipit-source-id: 9468e2892553eb09c4c41b8723f590c0dbd8ab7d	2018-01-16 11:08:57 -08:00
Yi Wu	bbcd3b0bd2	Suppress valgrind "unimplemented functionality" error Summary: Add ROCKSDB_VALGRIND_RUN macro and suppress false-positive "unimplemented functionality" throw by valgrind for steam hints. Another approach would be add a valgrind suppress file. Valgrind is suppose to print the suppression when given "--gen-suppressions=all" param, which is suppose to be the content for the suppression file. But it doesn't print. Closes https://github.com/facebook/rocksdb/pull/3174 Differential Revision: D6338786 Pulled By: yiwu-arbug fbshipit-source-id: 3559efa5f3b92d40d09ad6ac82bc7b59f86c75aa	2017-11-15 14:28:34 -08:00
Dmitri Smirnov	f8e2db0717	Fix crashes, address test issues and adjust windows test script Summary: Add per-exe execution capability Add fix parsing of groups/tests Add timer test exclusion Fix unit tests Ifdef threadpool specific tests that do not pass on Vista threadpool. Remove spurious outout from prefix_test so test case listing works properly. Fix not using standard test directories results in file creation errors in sst_dump_test. BlobDb fixes: In C++ end() iterators can not be dereferenced. They are not valid. When deleting blob_db_ set it to nullptr before any other code executes. Not fixed:. On Windows you can not delete a file while it is open. [ RUN ] BlobDBTest.ReadWhileGC d:\dev\rocksdb\rocksdb\utilities\blob_db\blob_db_test.cc(75): error: DestroyBlobDB(dbname_, options, bdb_options) IO error: Failed to delete: d:/mnt/db\testrocksdb-17444/blob_db_test/blob_dir/000001.blob: Permission denied d:\dev\rocksdb\rocksdb\utilities\blob_db\blob_db_test.cc(75): error: DestroyBlobDB(dbname_, options, bdb_options) IO error: Failed to delete: d:/mnt/db\testrocksdb-17444/blob_db_test/blob_dir/000001.blob: Permission denied write_batch Should not call front() if there is a chance the container is empty Closes https://github.com/facebook/rocksdb/pull/3152 Differential Revision: D6293274 Pulled By: sagar0 fbshipit-source-id: 318c3717c22087fae13b18715dffb24565dbd956	2017-11-10 10:41:57 -08:00
Shaohua Li	eefd75a228	Stream Summary: Add a simple policy for NVMe write time life hint Closes https://github.com/facebook/rocksdb/pull/3095 Differential Revision: D6298030 Pulled By: shligit fbshipit-source-id: 9a72a42e32e92193af11599eb71f0cf77448e24d	2017-11-10 09:26:24 -08:00
Shaohua Li	33c7d4ccd9	Make writable_file_max_buffer_size dynamic Summary: The DBOptions::writable_file_max_buffer_size can be changed dynamically. Closes https://github.com/facebook/rocksdb/pull/3053 Differential Revision: D6152720 Pulled By: shligit fbshipit-source-id: aa0c0cfcfae6a54eb17faadb148d904797c68681	2017-10-31 13:56:35 -07:00
Prashant D	c1be8d86c6	Fix removed structurally dead return statement Summary: There seems to be a typo mistake in env ReuseWritableFile func where status is being returned twice. Closes https://github.com/facebook/rocksdb/pull/3099 Differential Revision: D6196204 Pulled By: ajkr fbshipit-source-id: abb6e3e1c1e772dd485fc39e7f1b9d502fa188fe	2017-10-31 01:26:13 -07:00
zach shipko	386a57e6ef	Fix build on OpenBSD Summary: A few simple changes to allow RocksDB to be built on OpenBSD. Let me know if any further changes are needed. Closes https://github.com/facebook/rocksdb/pull/3061 Differential Revision: D6138800 Pulled By: ajkr fbshipit-source-id: a13a17b5dc051e6518bd56a8c5efd1d24dd81b0c	2017-10-24 13:27:38 -07:00
Dmitri Smirnov	d2a65c59e1	Fix unused var warnings in Release mode Summary: MSVC does not support unused attribute at this time. A separate assignment line fixes the issue probably by being counted as usage for MSVC and it no longer complains about unused var. Closes https://github.com/facebook/rocksdb/pull/3048 Differential Revision: D6126272 Pulled By: maysamyabandeh fbshipit-source-id: 4907865db45fd75a39a15725c0695aaa17509c1f	2017-10-23 14:27:04 -07:00
Dmitri Smirnov	ebab2e2d42	Enable MSVC W4 with a few exceptions. Fix warnings and bugs Summary: Closes https://github.com/facebook/rocksdb/pull/3018 Differential Revision: D6079011 Pulled By: yiwu-arbug fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57	2017-10-19 10:57:12 -07:00
Andrew Kryczka	4708a6875c	Repair DBs with trailing slash in name Summary: Problem: - `DB::SanitizeOptions` strips trailing slash from `wal_dir` but not `dbname` - We check whether `wal_dir` and `dbname` refer to the same directory using string equality: https://github.com/facebook/rocksdb/blob/master/db/repair.cc#L258 - Providing `dbname` with trailing slash causes default `wal_dir` to be misidentified as a separate directory. - Then the repair tries to add all SST files to the `VersionEdit` twice (once for `dbname` dir, once for `wal_dir`) and fails with coredump. Solution: - Add a new `Env` function, `AreFilesSame`, which uses device and inode number to check whether files are the same. It's currently only implemented in `PosixEnv`. - Migrate repair to use `AreFilesSame` to check whether `dbname` and `wal_dir` are same. If unsupported, falls back to string comparison. Closes https://github.com/facebook/rocksdb/pull/2827 Differential Revision: D5761349 Pulled By: ajkr fbshipit-source-id: c839d548678b742af1166d60b09abd94e5476238	2017-09-22 12:42:22 -07:00
Andrew Kryczka	cc01985db0	Introduce bottom-pri thread pool for large universal compactions Summary: When we had a single thread pool for compactions, a thread could be busy for a long time (minutes) executing a compaction involving the bottom level. In multi-instance setups, the entire thread pool could be consumed by such bottom-level compactions. Then, top-level compactions (e.g., a few L0 files) would be blocked for a long time ("head-of-line blocking"). Such top-level compactions are critical to prevent compaction stalls as they can quickly reduce number of L0 files / sorted runs. This diff introduces a bottom-priority queue for universal compactions including the bottom level. This alleviates the head-of-line blocking situation for fast, top-level compactions. - Added `Env::Priority::BOTTOM` thread pool. This feature is only enabled if user explicitly configures it to have a positive number of threads. - Changed `ThreadPoolImpl`'s default thread limit from one to zero. This change is invisible to users as we call `IncBackgroundThreadsIfNeeded` on the low-pri/high-pri pools during `DB::Open` with values of at least one. It is necessary, though, for bottom-pri to start with zero threads so the feature is disabled by default. - Separated `ManualCompaction` into two parts in `PrepickedCompaction`. `PrepickedCompaction` is used for any compaction that's picked outside of its execution thread, either manual or automatic. - Forward universal compactions involving last level to the bottom pool (worker thread's entry point is `BGWorkBottomCompaction`). - Track `bg_bottom_compaction_scheduled_` so we can wait for bottom-level compactions to finish. We don't count them against the background jobs limits. So users of this feature will get an extra compaction for free. Closes https://github.com/facebook/rocksdb/pull/2580 Differential Revision: D5422916 Pulled By: ajkr fbshipit-source-id: a74bd11f1ea4933df3739b16808bb21fcd512333	2017-08-03 15:43:29 -07:00
Siying Dong	21696ba502	Replace dynamic_cast<> Summary: Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object, and get tiny more memory available. Some nontrivial changes: 1. Add Comparator::GetRootComparator() to get around the internal comparator hack 2. Add the two experiemental functions to DB 3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string 4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric. Closes https://github.com/facebook/rocksdb/pull/2645 Differential Revision: D5502723 Pulled By: siying fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c	2017-07-28 16:27:16 -07:00
Sagar Vemuri	72502cf227	Revert "comment out unused parameters" Summary: This reverts the previous commit `1d7048c598`, which broke the build. Did a `git revert 1d7048c`. Closes https://github.com/facebook/rocksdb/pull/2627 Differential Revision: D5476473 Pulled By: sagar0 fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06	2017-07-21 18:26:26 -07:00
Victor Gao	1d7048c598	comment out unused parameters Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually. Reviewed By: igorsugak Differential Revision: D5454343 fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2	2017-07-21 14:57:44 -07:00
Siying Dong	3c327ac2d0	Change RocksDB License Summary: Closes https://github.com/facebook/rocksdb/pull/2589 Differential Revision: D5431502 Pulled By: siying fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75	2017-07-15 16:11:23 -07:00
Aaron Gao	43e4eef77f	remove unnecessary fadvise Summary: We has to remove this line because previously it is only called when use_os_buffer = false. But now we have direct io to replace it. Closes https://github.com/facebook/rocksdb/pull/2573 Differential Revision: D5412824 Pulled By: yiwu-arbug fbshipit-source-id: 81f3f0cdf94566bfc09ef2ff123e40cddbe36b36	2017-07-12 18:14:41 -07:00
Andrew Kryczka	33042573db	Fix GetCurrentTime() initialization for valgrind Summary: Valgrind had false positive complaints about the initialization pattern for `GetCurrentTime()`'s argument in #2480. We can instead have the client initialize the time variable before calling `GetCurrentTime()`, and have `GetCurrentTime()` promise to only overwrite it in success case. Closes https://github.com/facebook/rocksdb/pull/2526 Differential Revision: D5358689 Pulled By: ajkr fbshipit-source-id: 857b189f24c19196f6bb299216f3e23e7bc4be42	2017-07-05 12:12:00 -07:00
Ewout Prangsma	51778612c9	Encryption at rest support Summary: This PR adds support for encrypting data stored by RocksDB when written to disk. It adds an `EncryptedEnv` override of the `Env` class with matching overrides for sequential&random access files. The encryption itself is done through a configurable `EncryptionProvider`. This class creates is asked to create `BlockAccessCipherStream` for a file. This is where the actual encryption/decryption is being done. Currently there is a Counter mode implementation of `BlockAccessCipherStream` with a `ROT13` block cipher (NOTE the `ROT13` is for demo purposes only!!). The Counter operation mode uses an initial counter & random initialization vector (IV). Both are created randomly for each file and stored in a 4K (default size) block that is prefixed to that file. The `EncryptedEnv` implementation is such that clients of the `Env` class do not see this prefix (nor data, nor in filesize). The largest part of the prefix block is also encrypted, and there is room left for implementation specific settings/values/keys in there. To test the encryption, the `DBTestBase` class has been extended to consider a new environment variable called `ENCRYPTED_ENV`. If set, the test will setup a encrypted instance of the `Env` class to use for all tests. Typically you would run it like this: ``` ENCRYPTED_ENV=1 make check_some ``` There is also an added test that checks that some data inserted into the database is or is not "visible" on disk. With `ENCRYPTED_ENV` active it must not find plain text strings, with `ENCRYPTED_ENV` unset, it must find the plain text strings. Closes https://github.com/facebook/rocksdb/pull/2424 Differential Revision: D5322178 Pulled By: sdwilsh fbshipit-source-id: 253b0a9c2c498cc98f580df7f2623cbf7678a27f	2017-06-26 16:56:24 -07:00
Siying Dong	857e9960be	Improve the error message for I/O related errors. Summary: Force people to write something other than file name while returning status for IOError. Closes https://github.com/facebook/rocksdb/pull/2493 Differential Revision: D5321309 Pulled By: siying fbshipit-source-id: 38bcf6c19e80831cd3e300a047e975cbb131d822	2017-06-26 12:57:01 -07:00
Anirban Rahut	7a270069b3	GNU C library for struct tm has 2 additional fields. Summary: initialize 2 additional fields tm_gmtoff and tm_zone, otherwise under strict warnings for initialization, we get errors in myrocks. Closes https://github.com/facebook/rocksdb/pull/2439 Differential Revision: D5229013 Pulled By: yiwu-arbug fbshipit-source-id: 9fc1615a1919656f36064791706ed41e10e9db84	2017-06-13 04:41:35 -07:00
Mike Kolupaev	bc09c8a0db	Fix crash in PosixWritableFile::Close() when fstat() fails Summary: We had a crash in this code: `fstat()` failed; `file_stats` contained garbage, in particular `file_stats.st_blksize == 6`; the expression `file_stats.st_blocks / (file_stats.st_blksize / 512)` divided by zero. Closes https://github.com/facebook/rocksdb/pull/2420 Differential Revision: D5216110 Pulled By: al13n321 fbshipit-source-id: 6d8fc5e7c4f98c1139e68c7829ebdbac68b0fce0	2017-06-08 19:56:22 -07:00
Yi Wu	6d0f22e42f	Fix mock_env.cc uninitialized variable Summary: Mingw is complaining about uninitialized variable in mock_env.cc. e.g. https://travis-ci.org/facebook/rocksdb/jobs/240132276 The fix is to initialize the variable. Closes https://github.com/facebook/rocksdb/pull/2428 Differential Revision: D5211306 Pulled By: yiwu-arbug fbshipit-source-id: ee02bf0327dcea8590a2aa087f0176fecaf8621c	2017-06-08 17:41:59 -07:00
Maysam Yabandeh	d4f7731b61	fix travis error with init time in mockenv Summary: /home/travis/build/facebook/rocksdb/env/mock_env.cc: In member function ‘virtual void rocksdb::{anonymous}::TestMemLogger::Logv(const char*, va_list)’: /home/travis/build/facebook/rocksdb/env/mock_env.cc:391:53: error: ‘t.tm::tm_year’ may be used uninitialized in this function [-Werror=maybe-uninitialized] static_cast<int>(now_tv.tv_usec)); Closes https://github.com/facebook/rocksdb/pull/2418 Differential Revision: D5193597 Pulled By: maysamyabandeh fbshipit-source-id: 8801a3ef27f33eb419d534f7de747702cdf504a0	2017-06-08 10:41:18 -07:00
Maysam Yabandeh	550a1df72c	Fix clang errors by asserting the precondition Summary: USE_CLANG=1 make -j32 analyze The two errors would disappear after the assertion. Closes https://github.com/facebook/rocksdb/pull/2416 Differential Revision: D5193526 Pulled By: maysamyabandeh fbshipit-source-id: 16a21f18f68023f862764dd3ab9e00ca60b0eefa	2017-06-06 12:56:52 -07:00
Maysam Yabandeh	5a9b4d7435	Retire memenv https://github.com/facebook/rocksdb/pull/2082 Summary: This is a manual commit of this PR: Retire InMemoryEnv in favor of MockEnv #2082 With MockEnv doing the same yet being more mature, InMemoryEnv is redundant. Reviewed By: IslamAbdelRahman Differential Revision: D5162323 fbshipit-source-id: 59fd0082a891dc99cc531e4da9d68bf891eae3f5	2017-06-01 15:41:20 -07:00
Andrew Kryczka	6cc9aef162	New API for background work in single thread pool Summary: Previously users could set `max_background_flushes=0` to force rocksdb to use a single thread pool for both background flushes and compactions. That'll no longer be possible since I'm going to deprecate `max_background_flushes` and `max_background_compactions` in favor of a single option. This diff introduces a new way to force a single thread pool: when high-pri pool has zero threads, all background jobs will be submitted to low-pri pool. Note the majority of the code change is adding `Env::GetBackgroundThreads()`, which is necessary to check whether the user has provided a zero-sized thread pool. Closes https://github.com/facebook/rocksdb/pull/2204 Differential Revision: D4936256 Pulled By: ajkr fbshipit-source-id: 929a07a0c0705f7766f5339cd013ff74e90d6e01	2017-05-23 11:12:27 -07:00
Aaron Gao	3e86c0f07c	disable direct reads for log and manifest and add direct io to tests Summary: Disable direct reads for log and manifest. Direct reads should not affect sequential_file Also add kDirectIO for option_config_ in db_test_util Closes https://github.com/facebook/rocksdb/pull/2337 Differential Revision: D5100261 Pulled By: lightmark fbshipit-source-id: 0ebfd13b93fa1b8f9acae514ac44f8125a05868b	2017-05-22 18:41:28 -07:00
Anirban Rahut	d85ff4953c	Blob storage pr Summary: The final pull request for Blob Storage. Closes https://github.com/facebook/rocksdb/pull/2269 Differential Revision: D5033189 Pulled By: yiwu-arbug fbshipit-source-id: 6356b683ccd58cbf38a1dc55e2ea400feecd5d06	2017-05-10 15:14:44 -07:00
Aaron Gao	6b99dbe049	fix memory alignment with logical sector size Summary: we align the buffer with logical sector size and should not test it with page size, which is usually 4k. Closes https://github.com/facebook/rocksdb/pull/2245 Differential Revision: D5001842 Pulled By: lightmark fbshipit-source-id: a7135fcf6351c6db363e8908956b1e193a4a6291	2017-05-04 01:30:13 -07:00
Siying Dong	a2b05210ef	Make PosixLogger::flush_pending_ atomic Summary: TSAN sometimes complaints data race of PosixLogger::flush_pending_. Make it atomic. Closes https://github.com/facebook/rocksdb/pull/2231 Differential Revision: D4973397 Pulled By: siying fbshipit-source-id: 571e886e3eca3231705919d573e250c1c1ec3764	2017-04-28 17:07:56 -07:00
Siying Dong	d616ebea23	Add GPLv2 as an alternative license. Summary: Closes https://github.com/facebook/rocksdb/pull/2226 Differential Revision: D4967547 Pulled By: siying fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4	2017-04-27 18:06:12 -07:00
Aaron Gao	6616e4d629	add prefetch to PosixRandomAccessFile in buffered io Summary: Every time after a compaction/flush finish, we issue user reads to put the table into block cache which includes a couple of IO that read footer, index blocks, meta block, etc. So we implement Prefetch here to reduce IO. Closes https://github.com/facebook/rocksdb/pull/2196 Differential Revision: D4931782 Pulled By: lightmark fbshipit-source-id: 5a13d58dcab209964352322217193bbf7ff78149	2017-04-26 14:47:23 -07:00
Aaron Gao	3b4d1b7a44	add <sys/sysmacros.h> to avoid warning with glibc 2.25 Summary: https://github.com/facebook/rocksdb/issues/2152 Closes https://github.com/facebook/rocksdb/pull/2208 Differential Revision: D4945577 Pulled By: lightmark fbshipit-source-id: 4e679150f2c9443d3be0b6008b26b65fabbda75a	2017-04-26 01:26:55 -07:00
Tomas Kolda	04d58970cb	AIX and Solaris Sparc Support Summary: Replacement of #2147 The change was squashed due to a lot of conflicts. Closes https://github.com/facebook/rocksdb/pull/2194 Differential Revision: D4929799 Pulled By: siying fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581	2017-04-21 20:48:04 -07:00
Aaron Gao	0716527341	remove warning Summary: st_blocks shows 16 though the right value is 8. This happens occasionally which seems a bug. Closes https://github.com/facebook/rocksdb/pull/2160 Differential Revision: D4893542 Pulled By: lightmark fbshipit-source-id: 68e832586b58bbc6162efbe83ce273f1570d5be3	2017-04-14 18:56:14 -07:00
Aaron Gao	44fa8ece9b	change use_direct_writes to use_direct_io_for_flush_and_compaction Summary: Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get(). Closes https://github.com/facebook/rocksdb/pull/2117 Differential Revision: D4860912 Pulled By: lightmark fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19	2017-04-13 16:12:04 -07:00
Aaron Gao	13b50358fb	add space for buggy kernel warning Summary: add the missing space Closes https://github.com/facebook/rocksdb/pull/2150 Differential Revision: D4880696 Pulled By: lightmark fbshipit-source-id: a4e0ad6a8ea45d6469d3f6c8514fdeb4cf10aaf5	2017-04-13 16:12:04 -07:00
Aaron Gao	ba7da434ae	fix db_stress crash caused by buggy kernel warning Summary: filter the warning out and only print it once. Closes https://github.com/facebook/rocksdb/pull/2137 Differential Revision: D4870925 Pulled By: lightmark fbshipit-source-id: 91b363ce7f70bce88b0780337f408fc4649139b8	2017-04-11 16:56:59 -07:00
Nikhil Benesch	72fc1e9d07	avoid non-existent O_DIRECT on OpenBSD Summary: OpenBSD doesn't have `O_DIRECT`, so avoid it. (RocksDB compiles successfully on OpenBSD with this patch.) Closes https://github.com/facebook/rocksdb/pull/2106 Differential Revision: D4847833 Pulled By: siying fbshipit-source-id: 214b785	2017-04-07 10:39:15 -07:00

1 2 3 4

152 commits