Commit Graph

192 Commits

Author SHA1 Message Date
Cheng Chang 4fc216649d Support direct IO in RandomAccessFileReader::MultiRead (#6446)
Summary:
By supporting direct IO in RandomAccessFileReader::MultiRead, the benefits of parallel IO (IO uring) and direct IO can be combined.

In direct IO mode, read requests are aligned and merged together before being issued to RandomAccessFile::MultiRead, so blocks in the original requests might share the same underlying buffer, the shared buffers are returned in `aligned_bufs`, which is a new parameter of the `MultiRead` API.

For example, suppose alignment requirement for direct IO is 4KB, one request is (offset: 1KB, len: 1KB), another request is (offset: 3KB, len: 1KB), then since they all belong to page (offset: 0, len: 4KB), `MultiRead` only reads the page with direct IO into a buffer on heap, and returns 2 Slices referencing regions in that same buffer. See `random_access_file_reader_test.cc` for more examples.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6446

Test Plan: Added a new test `random_access_file_reader_test.cc`.

Reviewed By: anand1976

Differential Revision: D20097518

Pulled By: cheng-chang

fbshipit-source-id: ca48a8faf9c3af146465c102ef6b266a363e78d1
2020-03-20 16:33:26 -07:00
Peter Dillinger 43dde332cb Share kPageSize (and other small tweaks) (#6443)
Summary:
Make kPageSize extern const size_t (used in draft https://github.com/facebook/rocksdb/issues/6427)
Make kLitteEndian constexpr bool
Clarify a couple of comments
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6443

Test Plan: make check, CI

Differential Revision: D20044558

Pulled By: pdillinger

fbshipit-source-id: e0c5cc13229c82726280dc0ddcba4078346b8418
2020-02-22 08:01:36 -08:00
sdong fdf882ded2 Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433)
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433

Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.

Differential Revision: D19977691

fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
2020-02-20 12:09:57 -08:00
Huisheng Liu 2fdd8087ce Implement getfreespace for WinEnv (#6265)
Summary:
A new interface method Env::GetFreeSpace was added in https://github.com/facebook/rocksdb/issues/4164. It needs to be implemented for Windows port. Some error_handler_test cases fail on Windows because recovery cannot succeed without free space being reported.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6265

Differential Revision: D19303065

fbshipit-source-id: 1f1a83e53f334284781cf61feabc996e87b945ca
2020-01-07 13:56:13 -08:00
John Ericson c16b087427 Work around weird unused errors with Mingw (#6075)
Summary:
From the reset of the code, it looks this this maybe can be unconditionally given the attribute? But I couldn't test with MSVC so I defensively put under CPP.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6075

Differential Revision: D18723749

fbshipit-source-id: 45fc8732c28dd29aab1644225d68f3c6f39bd69b
2019-11-26 21:42:29 -08:00
Andrew Kryczka b00761eea6 Fix block cache ID uniqueness for Windows builds (#5844)
Summary:
Since we do not evict a file's blocks from block cache before that file
is deleted, we require a file's cache ID prefix is both unique and
non-reusable. However, the Windows functionality we were relying on only
guaranteed uniqueness. That meant a newly created file could be assigned
the same cache ID prefix as a deleted file. If the newly created file
had block offsets matching the deleted file, full cache keys could be
exactly the same, resulting in obsolete data blocks returned from cache
when trying to read from the new file.

We noticed this when running on FAT32 where compaction was writing out
of order keys due to reading obsolete blocks from its input files. The
functionality is documented as behaving the same on NTFS, although I
wasn't able to repro it there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5844

Test Plan:
we had a reliable repro of out-of-order keys on FAT32 that
was fixed by this change

Differential Revision: D17752442

fbshipit-source-id: 95d983f9196cf415f269e19293b97341edbf7e00
2019-10-11 18:19:31 -07:00
sdong e8263dbdaa Apply formatter to recent 200+ commits. (#5830)
Summary:
Further apply formatter to more recent commits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830

Test Plan: Run all existing tests.

Differential Revision: D17488031

fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab
2019-09-20 12:04:26 -07:00
Shylock Hg 9eb3e1f77d Use delete to disable automatic generated methods. (#5009)
Summary:
Use delete to disable automatic generated methods instead of private, and put the constructor together for more clear.This modification cause the unused field warning, so add unused attribute to disable this warning.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5009

Differential Revision: D17288733

fbshipit-source-id: 8a767ce096f185f1db01bd28fc88fef1cdd921f3
2019-09-11 18:09:00 -07:00
Tomas Kolda 6e8a1354a7 Fix regression - 100% CPU - Regression for Windows 7 (#5557)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/5552
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5557

Differential Revision: D16266329

fbshipit-source-id: a8f6b50298a6f7c8d6c7e172bb26dd7eb6bd8a4d
2019-07-15 12:19:49 -07:00
Siying Dong 000b9ec217 Move some logging related files to logging/ (#5387)
Summary:
Many logging related source files are under util/. It will be more structured if they are together.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387

Differential Revision: D15579036

Pulled By: siying

fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f
2019-05-31 17:23:59 -07:00
Siying Dong 8843129ece Move some memory related files from util/ to memory/ (#5382)
Summary:
Move arena, allocator, and memory tools under util to a separate memory/ directory.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382

Differential Revision: D15564655

Pulled By: siying

fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5
2019-05-30 17:44:09 -07:00
Siying Dong e9e0101ca4 Move test related files under util/ to test_util/ (#5377)
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377

Differential Revision: D15551366

Pulled By: siying

fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
2019-05-30 11:25:51 -07:00
Sagar Vemuri dda474399a Remove PATENTS text from a few straggler files (#5326)
Summary:
Remove PATENTS related wording from a few stragglers which still reference the old PATENTS file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5326

Differential Revision: D15423297

Pulled By: sagar0

fbshipit-source-id: 4babcddfc120b7d2fed6eb3898287cf8012bf8ea
2019-05-21 16:22:35 -07:00
Andrew Kryczka 8272a6de57 Optionally wait on bytes_per_sync to smooth I/O (#5183)
Summary:
The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyways.

My understanding of the existing behavior is we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes.

Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it.

There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically).

The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183

Differential Revision: D14953553

Pulled By: siying

fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9
2019-04-22 11:51:39 -07:00
datonli f0edf9d575 #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152)
Summary:
mv port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5152

Differential Revision: D14779409

Pulled By: siying

fbshipit-source-id: d4162c47c979c6e8cc6a9e601802864ab3768ecb
2019-04-04 11:38:19 -07:00
Burton Li d1edf4eced Format env_win coding style by google c++ style guide (#5096)
Summary:
The existing code for env_win src and header file doesn't fully followed the recommended code style (https://google.github.io/styleguide/cppguide.html#Functions). Fix it for better readability.
anand1976 siying
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5096

Differential Revision: D14585358

Pulled By: anand1976

fbshipit-source-id: 7ce35ffe9e922f5c1421b0bbaa5fce7abad57617
2019-03-27 16:38:20 -07:00
Yanqin Jin 9358178edc Support for single-primary, multi-secondary instances (#4899)
Summary:
This PR allows RocksDB to run in single-primary, multi-secondary process mode.
The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary.
Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`.

This PR has several components:
1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.

2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue.

3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`.
3.1 Tail the primary's MANIFEST during recovery.
3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
3.3 Tailing WAL will be in a future PR.

4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899

Differential Revision: D14510945

Pulled By: riversand963

fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
2019-03-26 16:45:31 -07:00
Burton Li 88d85b6820 fix NowNanos overflow (#5062)
Summary:
The original implementation of WinEnvIO::NowNanos() has a constant data overflow by:
li.QuadPart *= std::nano::den;
As a result, the api provides a incorrect result.
e.g.:
li.QuadPart=13477844301545
std::nano::den=1e9

The fix uses pre-computed nano_seconds_per_period_ to present the nano seconds per performance counter period, in the case if nano::den is divisible by perf_counter_frequency_. Otherwise it falls back to use high_resolution_clock.
siying ajkr
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5062

Differential Revision: D14426842

Pulled By: anand1976

fbshipit-source-id: 127f1daf423dd4b30edd0dcf8ea0466f468bec12
2019-03-21 15:18:36 -07:00
Levi Tamasi 34f8ac0c99 Make adaptivity of LRU cache mutexes configurable (#5054)
Summary:
The patch adds a new config option to LRUCacheOptions that enables
users to choose whether to use an adaptive mutex for the LRU block
cache (on platforms where adaptive mutexes are supported). The default
is true if RocksDB is compiled with -DROCKSDB_DEFAULT_TO_ADAPTIVE_MUTEX,
false otherwise.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5054

Differential Revision: D14542749

Pulled By: ltamasi

fbshipit-source-id: 0065715ab6cf91f10444b737fed8c8aee6a8a0d2
2019-03-20 12:33:44 -07:00
Zhongyi Xie fdc72a5c5d add OptionType kInt32T and kInt64T
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5061

Differential Revision: D14418581

Pulled By: miasantreble

fbshipit-source-id: be7f90e16586666ddd0cce36971e403782ab0892
2019-03-12 13:49:52 -07:00
Sagar Vemuri dc3528077a Update all unique/shared_ptr instances to be qualified with namespace std (#4638)
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638

Differential Revision: D12934992

Pulled By: sagar0

fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
2018-11-09 11:19:58 -08:00
Wilfried Goesgens 5d809ecef7 Add compile time option to work with utf8 filename strings (#4469)
Summary:
The default behaviour of rocksdb is to use the `*A(` windows API functions.
These accept filenames in the currently configured system encoding,
be it Latin 1, utf8 or whatever.
If the Application intends to completely work with utf8 strings internally,
converting these to that codepage properly isn't even always possible.
Thus this patch adds a switch to use the `*W(` functions, which accept
UTF-16 filenames, and uses C++11 features to translate the
UTF8 containing std::string to an UTF16 containing std::wstring.

This feature is a compile time options, that can be enabled by setting `WITH_WINDOWS_UTF8_FILENAMES` to true.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4469

Differential Revision: D10356011

Pulled By: yiwu-arbug

fbshipit-source-id: 27b6ae9171f209085894cdf80069e8a896642044
2018-10-11 23:24:28 -07:00
Dmitry Alimov e13d8dcbbb Fix typos in comments (#4456)
Summary:
Fix some typos in the comments
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4456

Differential Revision: D10209214

Pulled By: miasantreble

fbshipit-source-id: dff857ba60396bc95126e635db96d7dc8330d2cb
2018-10-04 20:46:50 -07:00
Constantin Belyaev 2353c5c821 Fix cross-filesystem checkpoint on Windows (#4365)
Summary:
Now port/win_env.cc do check error for cross device link creation.
Fixes #4364
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4365

Differential Revision: D9833144

Pulled By: ajkr

fbshipit-source-id: be7555e510f4b8d2196d843841606a6cfada7644
2018-09-14 10:28:39 -07:00
cngzhnp 64324e329e Support pragma once in all header files and cleanup some warnings (#4339)
Summary:
As you know, almost all compilers support "pragma once" keyword instead of using include guards. To be keep consistency between header files, all header files are edited.

Besides this, try to fix some warnings about loss of data.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339

Differential Revision: D9654990

Pulled By: ajkr

fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848
2018-09-05 18:13:31 -07:00
Yanqin Jin bb5dcea98e Add path to WritableFileWriter. (#4039)
Summary:
We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039

Differential Revision: D8670178

Pulled By: riversand963

fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
2018-08-23 10:12:58 -07:00
Dmitri Smirnov ab22cf349e Implement Env::NumFileLinks (#4221)
Summary:
Although delete scheduler implementation allows for the interface not to be supported, the delete_scheduler_test does not allow for that.
Address compiler warnings
Make sst_dump_test use test directory structure as the current execution directory may not be writiable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4221

Differential Revision: D9210152

Pulled By: siying

fbshipit-source-id: 381a74511e969ecb8089d5c4b4df87dc30c8df63
2018-08-09 14:29:11 -07:00
Siying Dong a5e851e113 Reformatting some recent changes (#4161)
Summary:
Lint is not happy with some new code recently committed. Format them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4161

Differential Revision: D8940582

Pulled By: siying

fbshipit-source-id: c9b43b1ef8c88b5e923911058b44eb77234b36b7
2018-07-20 14:43:38 -07:00
Dmitri Smirnov 78ab11cd71 Return new operator for Status allocations for Windows (#4128)
Summary: Windows requires new/delete for memory allocations to be overriden. Refactor to be less intrusive.

Differential Revision: D8878047

Pulled By: siying

fbshipit-source-id: 35f2b5fec2f88ea48c9be926539c6469060aab36
2018-07-19 15:09:06 -07:00
Yanqin Jin 524c6e6b72 Add file name info to SequentialFileReader. (#4026)
Summary:
We potentially need this information for tracing, profiling and diagnosis.
Closes https://github.com/facebook/rocksdb/pull/4026

Differential Revision: D8555214

Pulled By: riversand963

fbshipit-source-id: 4263e06c00b6d5410b46aa46eb4e358ff2161dd2
2018-06-21 08:42:24 -07:00
Dmitri Smirnov 61d69d450d Hide jemalloc aligned allocation functions into .cc (#4025)
Summary:
so they could be overriden
Closes https://github.com/facebook/rocksdb/pull/4025

Differential Revision: D8526287

Pulled By: siying

fbshipit-source-id: 9537b299dc907b4d1eeaf77a8784b13cb058280d
2018-06-19 17:12:06 -07:00
shpala 9c7da963bc Fix a crash in WinEnvIO::GetSectorSize (#3975)
Summary:
Fix a crash in `WinEnvIO::GetSectorSize` that happens on old Windows systems (e.g Windows 7).
On old Windows systems that don't support querying StorageAccessAlignmentProperty using IOCTL_STORAGE_QUERY_PROPERTY, the flow calls a different DeviceIoControl with nullptr as lpBytesReturned.
When the code reaches this point, we get an access violation.
Closes https://github.com/facebook/rocksdb/pull/3975

Differential Revision: D8385186

Pulled By: ajkr

fbshipit-source-id: fae4c9b4b0a52c8a10182e1b35bcaa30dc393bbb
2018-06-12 13:45:18 -07:00
Zhongyi Xie f1592a06c2 run make format for PR 3838 (#3954)
Summary:
PR https://github.com/facebook/rocksdb/pull/3838 made some changes that triggers lint warnings.
Run `make format` to fix formatting as suggested by siying .
Also piggyback two changes:
1) fix singleton destruction order for windows and posix env
2) fix two clang warnings
Closes https://github.com/facebook/rocksdb/pull/3954

Differential Revision: D8272041

Pulled By: miasantreble

fbshipit-source-id: 7c4fd12bd17aac13534520de0c733328aa3c6c9f
2018-06-05 12:58:02 -07:00
Dmitri Smirnov f4b72d7056 Provide a way to override windows memory allocator with jemalloc for ZSTD
Summary:
Windows does not have LD_PRELOAD mechanism to override all memory allocation functions and ZSTD makes use of C-tuntime calloc. During flushes and compactions default system allocator fragments and the system slows down considerably.

For builds with jemalloc we employ an advanced ZSTD context creation API that re-directs memory allocation to jemalloc. To reduce the cost of context creation on each block we cache ZSTD context within the block based table builder while a new SST file is being built, this will help all platform builds including those w/o jemalloc. This avoids system allocator fragmentation and improves the performance.

The change does not address random reads and currently on Windows reads with ZSTD regress as compared with SNAPPY compression.
Closes https://github.com/facebook/rocksdb/pull/3838

Differential Revision: D8229794

Pulled By: miasantreble

fbshipit-source-id: 719b622ab7bf4109819bc44f45ec66f0dd3ee80d
2018-06-04 12:12:48 -07:00
Dmitri Smirnov 3db8504cde Catchup with posix features
Summary:
Catch up with Posix features
  NewWritableRWFile must fail when file does not exists
  Implement Env::Truncate()
  Adjust Env options optimization functions
  Implement MemoryMappedBuffer on Windows.
Closes https://github.com/facebook/rocksdb/pull/3857

Differential Revision: D8053610

Pulled By: ajkr

fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e
2018-05-24 15:13:04 -07:00
Dmitri Smirnov 934f96de27 Better destroydb
Summary:
Delete archive directory before WAL folder
  since archive may be contained as a subfolder.
  Also improve loop readability.
Closes https://github.com/facebook/rocksdb/pull/3797

Differential Revision: D7866378

Pulled By: riversand963

fbshipit-source-id: 0c45d97677ce6fbefa3f8d602ef5e2a2a925e6f5
2018-05-03 16:13:09 -07:00
Dmitri Smirnov acb61b7a52 Adjust pread/pwrite to return Status
Summary:
Returning bytes_read causes the caller to call GetLastError()
  to report failure but the lasterror may be overwritten by then
  so we lose the error code.
  Fix up CMake file to include xpress source code only when needed.
  Fix warning for the uninitialized var.
Closes https://github.com/facebook/rocksdb/pull/3795

Differential Revision: D7832935

Pulled By: anand1976

fbshipit-source-id: 4be21affb9b85d361b96244f4ef459f492b7cb2b
2018-05-01 13:42:46 -07:00
Dmitri Smirnov 61785c73ed CloseHandle docs says that the return is non-zero, does not say TRUE(1)
Summary:
say it is TRUE(1). Add assert.
Closes https://github.com/facebook/rocksdb/pull/3630

Differential Revision: D7346895

Pulled By: ajkr

fbshipit-source-id: a46075aa4dd89f32520230606adecccecc874cdf
2018-03-20 18:43:02 -07:00
Dmitri Smirnov c364eb42b5 Windows cumulative patch
Summary:
This patch addressed several issues.
  Portability including db_test std::thread -> port::Thread Cc: @
  and %z to ROCKSDB portable macro. Cc: maysamyabandeh

  Implement Env::AreFilesSame

  Make the implementation of file unique number more robust

  Get rid of C-runtime and go directly to Windows API when dealing
  with file primitives.

  Implement GetSectorSize() and aling unbuffered read on the value if
  available.

  Adjust Windows Logger for the new interface, implement CloseImpl() Cc: anand1976

  Fix test running script issue where $status var was of incorrect scope
  so the failures were swallowed and not reported.

  DestroyDB() creates a logger and opens a LOG file in the directory
  being cleaned up. This holds a lock on the folder and the cleanup is
  prevented. This fails one of the checkpoin tests. We observe the same in production.
  We close the log file in this change.

 Fix DBTest2.ReadAmpBitmapLiveInCacheAfterDBClose failure where the test
 attempts to open a directory with NewRandomAccessFile which does not
 work on Windows.
  Fix DBTest.SoftLimit as it is dependent on thread timing. CC: yiwu-arbug
Closes https://github.com/facebook/rocksdb/pull/3552

Differential Revision: D7156304

Pulled By: siying

fbshipit-source-id: 43db0a757f1dfceffeb2b7988043156639173f5b
2018-03-06 11:57:43 -08:00
Dmitri Smirnov 7eb292da14 Fix a memory leak in WindowsThread
Summary:
_endthreadex does not return and thus objects
  for stack destructors do not run. This creates a memory leak.
  We remove the calls since _enthreadex called automatically after the
  threadproc returns i.e. thread exits.
Closes https://github.com/facebook/rocksdb/pull/3542

Differential Revision: D7088713

Pulled By: ajkr

fbshipit-source-id: 749ecafc6a9572f587f76e516547e07734349a54
2018-02-26 13:46:12 -08:00
Siying Dong ef29d2a234 Explictly fail writes if key or value is not smaller than 4GB
Summary:
Right now, users will encounter unexpected bahavior if they use key or value larger than 4GB. We should explicitly fail the queriers.
Closes https://github.com/facebook/rocksdb/pull/3484

Differential Revision: D6953895

Pulled By: siying

fbshipit-source-id: b60491e1af064fc5d52971956661f6c18ceac24f
2018-02-09 14:57:54 -08:00
Tamir Duberstein cd5092e168 Suppress unused warnings
Summary:
- Use `__unused__` everywhere
- Suppress unused warnings in Release mode
    + This currently affects non-MSVC builds (e.g. mingw64).
Closes https://github.com/facebook/rocksdb/pull/3448

Differential Revision: D6885496

Pulled By: miasantreble

fbshipit-source-id: f2f6adacec940cc3851a9eee328fafbf61aad211
2018-02-02 12:27:07 -08:00
Siying Dong ccc095a016 Speed up BlockTest.BlockReadAmpBitmap
Summary:
BlockTest.BlockReadAmpBitmap is too slow and times out in some environments. Speed it up by:
(1) improve the way the verification is done. With this it is 5 times faster
(2) run fewer tests for large blocks. This cut it down by another 10 times.
Now it can finish in similar time as other tests.
Closes https://github.com/facebook/rocksdb/pull/3313

Differential Revision: D6643711

Pulled By: siying

fbshipit-source-id: c2397d666eab5421a78ca87e1e45491e0f832a6d
2018-01-02 10:41:28 -08:00
burtonli b5c99cc908 Disable onboard cache for compaction output
Summary:
FILE_FLAG_WRITE_THROUGH is for disabling device on-board cache in windows API, which should be disabled if user doesn't need system cache.
There was a perf issue related with this, we found during memtable flush, the high percentile latency jumps significantly. During profiling, we found those high latency (P99.9) read requests got queue-jumped by write requests from memtable flush and takes 80ms or even more time to wait, even when SSD overall IO throughput is relatively low.

After enabling FILE_FLAG_WRITE_THROUGH, we rerun the test found high percentile latency drops a lot without observable impact on writes.

Scenario 1: 40MB/s + 40MB/s  R/W compaction throughput

 Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
---------------------------------------------------------------
P99.9 | 56.897 ms | 35.593 ms | -37.4%
P99 | 3.905 ms | 3.896 ms | -2.8%

Scenario 2:  14MB/s + 14MB/s R/W compaction throughput, cohosted with 100+ other rocksdb instances have manually triggered memtable flush operations (memtable is tiny), creating a lot of randomized the small file writes operations during test.

Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
---------------------------------------------------------------
P99.9 | 86.227   ms | 50.436 ms | -41.5%
P99 | 8.415   ms | 3.356 ms | -60.1%
Closes https://github.com/facebook/rocksdb/pull/3225

Differential Revision: D6624174

Pulled By: miasantreble

fbshipit-source-id: 321b86aee9d74470840c70e5d0d4fa9880660a91
2017-12-21 18:41:34 -08:00
Dmitri Smirnov fe608e32ab Fix a race condition in WindowsThread (port::Thread)
Summary:
Fix a race condition when we create a thread and immediately destroy
 This case should be supported.
  What happens is that the thread function needs the Data instance
  to actually run but has no shared ownership and must rely on the
  WindowsThread instance to continue existing.
  To address this we change unique_ptr to shared_ptr and then
  acquire an additional refcount for the threadproc which destroys it
  just before the thread exit.
  We choose to allocate shared_ptr instance on the heap as this allows
  the original thread to continue w/o waiting for the new thread to start
  running.
Closes https://github.com/facebook/rocksdb/pull/3240

Differential Revision: D6511324

Pulled By: yiwu-arbug

fbshipit-source-id: 4633ff7996daf4d287a9fe34f60c1dd28cf4ff36
2017-12-07 13:42:53 -08:00
Shaohua Li 33c7d4ccd9 Make writable_file_max_buffer_size dynamic
Summary:
The DBOptions::writable_file_max_buffer_size can be changed dynamically.
Closes https://github.com/facebook/rocksdb/pull/3053

Differential Revision: D6152720

Pulled By: shligit

fbshipit-source-id: aa0c0cfcfae6a54eb17faadb148d904797c68681
2017-10-31 13:56:35 -07:00
Dmitri Smirnov 682db81385 Enable cacheline_aligned_alloc() to allocate from jemalloc if enabled.
Summary:
Reuse WITH_JEMALLOC option in preparation for module search unification.
  Move jemalloc overrides into a separate .cc
  Remote obsolete JEMALLOC_NOINIT option.
Closes https://github.com/facebook/rocksdb/pull/3078

Differential Revision: D6174826

Pulled By: yiwu-arbug

fbshipit-source-id: 9970a0289b4490272d15853920d9d7531af91140
2017-10-27 13:27:12 -07:00
Dmitri Smirnov d2a65c59e1 Fix unused var warnings in Release mode
Summary:
MSVC does not support unused attribute at this time. A separate assignment line fixes the issue probably by being counted as usage for MSVC and it no longer complains about unused var.
Closes https://github.com/facebook/rocksdb/pull/3048

Differential Revision: D6126272

Pulled By: maysamyabandeh

fbshipit-source-id: 4907865db45fd75a39a15725c0695aaa17509c1f
2017-10-23 14:27:04 -07:00
Dmitri Smirnov ebab2e2d42 Enable MSVC W4 with a few exceptions. Fix warnings and bugs
Summary: Closes https://github.com/facebook/rocksdb/pull/3018

Differential Revision: D6079011

Pulled By: yiwu-arbug

fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57
2017-10-19 10:57:12 -07:00
Orgad Shaneh 34ebadf930 Fix MinGW build
Summary:
snprintf is defined as _snprintf, which doesn't exist in the std
namespace.
Closes https://github.com/facebook/rocksdb/pull/2298

Differential Revision: D5070457

Pulled By: yiwu-arbug

fbshipit-source-id: 6e1659ac3e86170653b174578da5a8ed16812cbb
2017-09-19 10:28:26 -07:00
Dmitri Smirnov 0ec90a7cc2 Add -DPORTABLE=1 to MSVC CI build
Summary:
Add -DPORTABLE=1
  port::cacheline_aligned_alloc() has arguments swapped which prevents every single test from running.
Closes https://github.com/facebook/rocksdb/pull/2815

Differential Revision: D5751661

Pulled By: siying

fbshipit-source-id: e0857d6e138ec46035b3c23d7c3c751901a0a4a0
2017-08-31 16:42:48 -07:00
Andrew Kryczka 47ed3bfc3b fix WinEnv assertions
Summary: Closes https://github.com/facebook/rocksdb/pull/2702

Differential Revision: D5585389

Pulled By: ajkr

fbshipit-source-id: cb54041eb481d0d759c440f82a8a2c5b34534173
2017-08-08 17:20:52 -07:00
Daniel Black 16e0388205 LRUCacheShard cache line size alignment
Summary:
combining #2568 and #2612.
Closes https://github.com/facebook/rocksdb/pull/2620

Differential Revision: D5464394

Pulled By: IslamAbdelRahman

fbshipit-source-id: 9f71d3058dd6adaf02ce3b2de3a81a1228009778
2017-07-24 10:54:37 -07:00
Siying Dong 3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Dmitri Smirnov a21db161c9 Implement ReopenWritibaleFile on Windows and other fixes
Summary:
Make default impl return NoSupported so the db_blob
  tests exist in a meaningful manner.
  Replace std::thread to port::Thread
Closes https://github.com/facebook/rocksdb/pull/2465

Differential Revision: D5275563

Pulled By: yiwu-arbug

fbshipit-source-id: cedf1a18a2c05e20d768c1308b3f3224dbd70ab6
2017-06-20 10:31:13 -07:00
Orgad Shaneh 9bb91e9328 Dedup release
Summary:
cc tamird sagar0
Closes https://github.com/facebook/rocksdb/pull/2325

Differential Revision: D5098302

Pulled By: sagar0

fbshipit-source-id: 297c5506b5d9b2ed1d7719c8caf0b96cffe503b8
2017-06-12 13:13:06 -07:00
Andrew Kryczka 6cc9aef162 New API for background work in single thread pool
Summary:
Previously users could set `max_background_flushes=0` to force rocksdb to use a single thread pool for both background flushes and compactions. That'll no longer be possible since I'm going to deprecate `max_background_flushes` and `max_background_compactions` in favor of a single option. This diff introduces a new way to force a single thread pool: when high-pri pool has zero threads, all background jobs will be submitted to low-pri pool.

Note the majority of the code change is adding `Env::GetBackgroundThreads()`, which is necessary to check whether the user has provided a zero-sized thread pool.
Closes https://github.com/facebook/rocksdb/pull/2204

Differential Revision: D4936256

Pulled By: ajkr

fbshipit-source-id: 929a07a0c0705f7766f5339cd013ff74e90d6e01
2017-05-23 11:12:27 -07:00
Tamir Duberstein 146b7718f0 Fix mingw compilation with -DNDEBUG
Summary:
This was exposed by a48a62d, which made NDEBUG the default for cmake
builds.
Closes https://github.com/facebook/rocksdb/pull/2315

Differential Revision: D5079583

Pulled By: sagar0

fbshipit-source-id: c614e96a40df016a834a62b6236852265e7ee4db
2017-05-17 22:56:48 -07:00
Gunnar Kudrjavets 0b69e50791 Define CACHE_LINE_SIZE only when it's not defined
Summary:
RocksDB is compiled as part of MyRocks (MySQL storage engine) build.
MySQL already defines `CACHE_LINE_SIZE` and therefore we're getting a
conflict. Change RocksDB definition to be more cognizant of this.
Closes https://github.com/facebook/rocksdb/pull/2257

Differential Revision: D5013188

Pulled By: gunnarku

fbshipit-source-id: cfa76fe99f90dcd82aa09204e2f1f35e07a82b41
2017-05-08 16:12:28 -07:00
Tamir Duberstein fdaefa0309 travis: add Windows cross-compilation
Summary:
- downcase includes for case-sensitive filesystems
- give targets the same name (librocksdb) on all platforms

With this patch it is possible to cross-compile RocksDB for Windows
from a Linux host using mingw.

cc yuslepukhin orgads
Closes https://github.com/facebook/rocksdb/pull/2107

Differential Revision: D4849784

Pulled By: siying

fbshipit-source-id: ad26ed6b4d393851aa6551e6aa4201faba82ef60
2017-05-05 23:20:01 -07:00
Siying Dong d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Dmitri Smirnov cdad04b051 Remove double buffering on RandomRead on Windows.
Summary:
Remove double buffering on RandomRead on Windows.
  With more logic appear in file reader/write Read no longer
  obeys forwarding calls to Windows implementation.
  Previously direct_io (unbuffered) was only available on Windows
  but now is supported as generic.
  We remove intermediate buffering on Windows.
  Remove random_access_max_buffer_size option which was windows specific.
  Non-zero values for that opton introduced unnecessary lock contention.
  Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are
  no longer necessary.
  Add aligned buffer reads for cases when requested reads exceed read ahead size.
Closes https://github.com/facebook/rocksdb/pull/2105

Differential Revision: D4847770

Pulled By: siying

fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f
2017-04-27 12:30:05 -07:00
Siying Dong d2dce5611a Move some files under util/ to separate dirs
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090

Differential Revision: D4833681

Pulled By: siying

fbshipit-source-id: 2fd8bef
2017-04-05 19:09:16 -07:00
Orgad Shaneh 6401a8b76b Fix build with MinGW
Summary:
There still are many warnings (most of them about invalid printf format
for long long), but it builds if FAIL_ON_WARNINGS is disabled.
Closes https://github.com/facebook/rocksdb/pull/2052

Differential Revision: D4807355

Pulled By: siying

fbshipit-source-id: ef03786
2017-03-30 16:54:52 -07:00
Dmitri Smirnov c9df05d1e4 Fix random access alignment
Summary:
This fixes an issue when the most recent readers assume that alignment is always set even if direct io is off.
Also adjust slightly appveyor script to run db_basic_test cases concurrently.
Closes https://github.com/facebook/rocksdb/pull/1959

Differential Revision: D4671972

Pulled By: IslamAbdelRahman

fbshipit-source-id: 1886620
2017-03-08 17:09:11 -08:00
Dmitri Smirnov 0a4cdde50a Windows thread
Summary:
introduce new methods into a public threadpool interface,
- allow submission of std::functions as they allow greater flexibility.
- add Joining methods to the implementation to join scheduled and submitted jobs with
  an option to cancel jobs that did not start executing.
- Remove ugly `#ifdefs` between pthread and std implementation, make it uniform.
- introduce pimpl for a drop in replacement of the implementation
- Introduce rocksdb::port::Thread typedef which is a replacement for std::thread.  On Posix Thread defaults as before std::thread.
- Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation.
- should be no functionality changes.
Closes https://github.com/facebook/rocksdb/pull/1823

Differential Revision: D4492902

Pulled By: siying

fbshipit-source-id: c74cb11
2017-02-06 14:54:18 -08:00
Dmitri Smirnov 324a0f988e Follow up for DirectIO refactor
Summary: Windows follow up for  dc2584eea0

Differential Revision: D4420337

Pulled By: IslamAbdelRahman

fbshipit-source-id: fedc5b5
2017-01-15 13:24:16 -08:00
Aaron Gao 3e6899d116 change UseDirectIO() to use_direct_io()
Summary:
also change variable name `direct_io_` to `use_direct_io_` in WritableFile to make it consistent with read path.
Closes https://github.com/facebook/rocksdb/pull/1770

Differential Revision: D4416435

Pulled By: lightmark

fbshipit-source-id: 4143c53
2017-01-13 12:09:15 -08:00
Dmitri Smirnov 3c233ca4ea Fix Windows environment issues
Summary:
Enable directIO on WritableFileImpl::Append
     with offset being current length of the file.
     Enable UniqueID tests on Windows, disable others but
     leeting them to compile. Unique tests are valuable to
     detect failures on different filesystems and upcoming
     ReFS.
     Clear output in WinEnv Getchildren.This is different from
     previous strategy, do not touch output on failure.
     Make sure DBTest.OpenWhenOpen works with windows error message
Closes https://github.com/facebook/rocksdb/pull/1746

Differential Revision: D4385681

Pulled By: IslamAbdelRahman

fbshipit-source-id: c07b702
2017-01-09 15:54:12 -08:00
Aaron Gao 972f96b3fb direct io write support
Summary:
rocksdb direct io support

```
[gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 5.0
Date:       Wed Nov 23 13:17:43 2016
CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
CPUCache:   25600 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       4.393 micros/op 227639 ops/sec;   25.2 MB/s

[gzh@dev11575.prn2 ~/roc
Closes https://github.com/facebook/rocksdb/pull/1564

Differential Revision: D4241093

Pulled By: lightmark

fbshipit-source-id: 98c29e3
2016-12-22 13:09:19 -08:00
Andrew Kryczka f0c509e2c8 Return finer-granularity status from Env::GetChildren*
Summary:
It'd be nice to use the error status type to distinguish
between user error and system error. For example, GetChildren can fail
listing a backup directory's contents either because a bad path was provided
(user error) or because an operation failed, e.g., a remote storage service
call failed (system error). In the former case, we want to continue and treat
the backup directory as empty; in the latter case, we want to immediately
propagate the error to the caller.

This diff uses NotFound to indicate user error and IOError to indicate
system error. Previously IOError indicated both.
Closes https://github.com/facebook/rocksdb/pull/1644

Differential Revision: D4312157

Pulled By: ajkr

fbshipit-source-id: 51b4f24
2016-12-12 12:54:13 -08:00
Edouard A 99c052a34f Fix integer overflow in GetL0ThresholdSpeedupCompaction (#1378) 2016-10-23 18:43:29 -07:00
Dmitri Smirnov b9311aa65c Implement WinRandomRW file and improve code reuse (#1388) 2016-10-13 16:36:34 -07:00
Edouard A 66a91e2607 Add NoSpace subcode to IOError (#1320)
Add a sub code to distinguish "out of space" errors from regular I/O errors
2016-09-07 12:37:45 -07:00
Islam AbdelRahman e9b2af87f8 Expose ThreadPool under include/rocksdb/threadpool.h
Summary:
This diff split ThreadPool to
-ThreadPool (abstract interface exposed in include/rocksdb/threadpool.h)
-ThreadPoolImpl (actual implementation in util/threadpool_imp.h)

This allow us to expose ThreadPool to the user so we can use it as an option later

Test Plan: existing unit tests

Reviewers: andrewkr, yiwu, yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62085
2016-08-26 10:41:35 -07:00
sdong f62fbd2c85 Handle overflow case of rate limiter's paramters
Summary: When rate_bytes_per_sec * refill_period_us_ overflows, the actual limited rate is very low. Handle this case so the rate will be large.

Test Plan: Add a unit test for it.

Reviewers: IslamAbdelRahman, andrewkr

Reviewed By: andrewkr

Subscribers: yiwu, lightmark, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58929
2016-05-27 16:15:28 -07:00
Aaron Orenstein 2073cf3775 Eliminate use of 'using namespace std'. Also remove a number of ADL references to std functions.
Summary: Reduce use of argument-dependent name lookup in RocksDB.

Test Plan: 'make check' passed.

Reviewers: andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58203
2016-05-20 07:42:18 -07:00
Dmitri Smirnov 26adaad438 Split WinEnv into separate classes. (#1128)
For ease of reuse and customization as a library
  without wrapping.
  WinEnvThreads is a class for replacement.
  WintEnvIO is a class for reuse and behavior override.
  Added private virtual functions for custom override
  of fallocate pread for io classes.
2016-05-19 16:40:54 -07:00
Dmitri Smirnov bac3be7c46 Fix build issue. (#1123)
Implement GetUniqueIdFromFile to support new tests and the feature.
2016-05-16 17:01:00 -07:00
Dmitri Smirnov aab91b8d8f Use generic threadpool for Windows environment (#1120)
Conditionally retrofit thread_posix for use with std::thread
  and reuse the same logic. Posix users continue using Posix interfaces.
  Enable XPRESS compression in test runs.
  Fix master introduced signed/unsigned mismatch.
2016-05-12 18:34:04 -07:00
Dmitri Smirnov 4ea6e051ee Fix multiple issues with WinMmapFile fo sequential writing (#1108)
make preallocation inline with other writable files
  make sure that we map no more than pre-allocated size.
2016-04-29 16:43:13 -07:00
PraveenSinghRao e8115cea45 Revert "Use async file handle for better parallelism (#1049)" (#1105)
This reverts commit b54c347424.

Revert async file handle change as it causes failures with appveyor
2016-04-28 22:50:26 -07:00
Li Peng 6d4832a998 Merge pull request #1101 from flyd1005/wip-fix-typo
fix typos and remove duplicated words
2016-04-28 02:30:44 -07:00
PraveenSinghRao b54c347424 Use async file handle for better parallelism (#1049) 2016-04-22 13:27:33 -07:00
Dmitri Smirnov ee221d2de0 Introduce XPRESS compresssion on Windows. (#1081)
Comparable with Snappy on comp ratio.
  Implemented using Windows API, does not require external package.
  Avaiable since Windows 8 and server 2012.
  Use -DXPRESS=1 with CMake to enable.
2016-04-19 22:54:24 -07:00
Yueh-Hsuan Chiang a558830f8f Fixed compile warnings in posix_logger.h and coding.h
Summary:
Fixed the following compile warnings:

/Users/yhchiang/rocksdb/util/posix_logger.h:32:11: error: unused variable 'kDebugLogChunkSize' [-Werror,-Wunused-const-variable]
const int kDebugLogChunkSize = 128 * 1024;
          ^
/Users/yhchiang/rocksdb/util/coding.h:24:20: error: unused variable 'kMaxVarint32Length' [-Werror,-Wunused-const-variable]
const unsigned int kMaxVarint32Length = 5;
                   ^
2 errors generated.

Test Plan: make clean rocksdb

Reviewers: igor, sdong, anthony, IslamAbdelRahman, rven, kradhakrishnan, adamretter

Reviewed By: adamretter

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56223
2016-03-31 16:01:47 -07:00
Dmitri Smirnov 2ca0994cf7 Latest versions of Jemalloc library do not require je_init()/je_unint()
calls. #ifdef in the source code and make this a default build option.
2016-03-17 11:25:20 -07:00
Dmitri Smirnov 9ea2968d26 Implement ConsistentChildrenAttribute
by using default implementation for now as it works.
2016-02-19 14:20:34 -08:00
Andrew Kryczka d733dd5728 [build] Fix env_win.cc compiler errors
Summary: I broke it in D53781.

Test Plan: tried the same code in util/env_posix.cc and it compiled successfully

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54303
2016-02-17 11:57:04 -08:00
Baraa Hamodi 21e95811d1 Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
Andrew Kryczka 59b3ee658f Env function for bulk metadata retrieval
Summary:
Added this new function, which returns filename, size, and modified
timestamp for each file in the provided directory. The default implementation
retrieves the metadata sequentially using existing functions. In the next diff
I'll make HdfsEnv override this function to use libhdfs's bulk get function.

This won't work on windows due to the path separator.

Test Plan:
new unit test

  $ ./env_test --gtest_filter=EnvPosixTest.ConsistentChildrenMetadata

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: IslamAbdelRahman, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D53781
2016-02-09 14:54:32 -08:00
Tomas Kolda 57a95a7001 Making use of GetSystemTimePreciseAsFileTime dynamic - code review fixes 2016-02-02 10:23:56 +01:00
Tomas Kolda 502d41f150 Making use of GetSystemTimePreciseAsFileTime dynamic to not
break compatibility with Windows 7. The issue with rotated logs
was fixed other way.
2016-02-02 10:23:56 +01:00
Dmitri Smirnov 36300fbbe3 Enable per-request buffer allocation in RandomAccessFile
This change impacts only non-buffered I/O on Windows.
 Currently, there is a buffer per RandomAccessFile
 instance that is protected by a lock. The reason we
 maintain the buffer is non-buffered I/O requires an aligned
 buffer to work.
 XPerf traces demonstrate that we accumulate a considerable
 wait time while waiting for that lock.
 This change enables to set random access buffer size to zero
 which would indicate a per request allocation.
 We are expecting that allocation expense would be much less than
 I/O costs plus wait time due to the fact that the memory heap
 would tend to re-use page aligned allocations especially with the
 use of Jemalloc.
 This change does not affect buffer use as a read_ahead_buffer for
 compaction purposes.
2016-02-01 13:14:37 -08:00
Dmitri Smirnov ac50fd3a71 Align statistics
Use Yield macro to make it a little more portable between platforms.
2016-01-13 14:53:23 -08:00
Marek Kurdej 92d0850f1c Fix failing assertion in logger on Windows when the disk is full. 2016-01-05 13:35:14 +01:00
Nathan Bronson 7d87f02799 support for concurrent adds to memtable
Summary:
This diff adds support for concurrent adds to the skiplist memtable
implementations.  Memory allocation is made thread-safe by the addition of
a spinlock, with small per-core buffers to avoid contention.  Concurrent
memtable writes are made via an additional method and don't impose a
performance overhead on the non-concurrent case, so parallelism can be
selected on a per-batch basis.

Write thread synchronization is an increasing bottleneck for higher levels
of concurrency, so this diff adds --enable_write_thread_adaptive_yield
(default off).  This feature causes threads joining a write batch
group to spin for a short time (default 100 usec) using sched_yield,
rather than going to sleep on a mutex.  If the timing of the yield calls
indicates that another thread has actually run during the yield then
spinning is avoided.  This option improves performance for concurrent
situations even without parallel adds, although it has the potential to
increase CPU usage (and the heuristic adaptation is not yet mature).

Parallel writes are not currently compatible with
inplace updates, update callbacks, or delete filtering.
Enable it with --allow_concurrent_memtable_write (and
--enable_write_thread_adaptive_yield).  Parallel memtable writes
are performance neutral when there is no actual parallelism, and in
my experiments (SSD server-class Linux and varying contention and key
sizes for fillrandom) they are always a performance win when there is
more than one thread.

Statistics are updated earlier in the write path, dropping the number
of DB mutex acquisitions from 2 to 1 for almost all cases.

This diff was motivated and inspired by Yahoo's cLSM work.  It is more
conservative than cLSM: RocksDB's write batch group leader role is
preserved (along with all of the existing flush and write throttling
logic) and concurrent writers are blocked until all memtable insertions
have completed and the sequence number has been advanced, to preserve
linearizability.

My test config is "db_bench -benchmarks=fillrandom -threads=$T
-batch_size=1 -memtablerep=skip_list -value_size=100 --num=1000000/$T
-level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
-disable_auto_compactions --max_write_buffer_number=8
-max_background_flushes=8 --disable_wal --write_buffer_size=160000000
--block_size=16384 --allow_concurrent_memtable_write" on a two-socket
Xeon E5-2660 @ 2.2Ghz with lots of memory and an SSD hard drive.  With 1
thread I get ~440Kops/sec.  Peak performance for 1 socket (numactl
-N1) is slightly more than 1Mops/sec, at 16 threads.  Peak performance
across both sockets happens at 30 threads, and is ~900Kops/sec, although
with fewer threads there is less performance loss when the system has
background work.

Test Plan:
1. concurrent stress tests for InlineSkipList and DynamicBloom
2. make clean; make check
3. make clean; DISABLE_JEMALLOC=1 make valgrind_check; valgrind db_bench
4. make clean; COMPILE_WITH_TSAN=1 make all check; db_bench
5. make clean; COMPILE_WITH_ASAN=1 make all check; db_bench
6. make clean; OPT=-DROCKSDB_LITE make check
7. verify no perf regressions when disabled

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, IslamAbdelRahman, anthony, yhchiang, rven, sdong, guyg8, kradhakrishnan, dhruba

Differential Revision: https://reviews.facebook.net/D50589
2015-12-25 11:03:40 -08:00
Siying Dong 298ba27ae2 Merge pull request #846 from yuslepukhin/enble_c4244_lossofdata
Enable MS compiler warning c4244.
2015-12-23 22:59:42 -08:00
Venkatesh Radhakrishnan 030215bf01 Running manual compactions in parallel with other automatic or manual compactions in restricted cases
Summary:
This diff provides a framework for doing manual
compactions in parallel with other compactions. We now have a deque of manual compactions. We also pass manual compactions as an argument from RunManualCompactions down to
BackgroundCompactions, so that RunManualCompactions can be reentrant.
Parallelism is controlled by the two routines
ConflictingManualCompaction to allow/disallow new parallel/manual
compactions based on already existing ManualCompactions. In this diff, by default manual compactions still have to run exclusive of other compactions. However, by setting the compaction option, exclusive_manual_compaction to false, it is possible to run other compactions in parallel with a manual compaction. However, we are still restricted to one manual compaction per column family at a time. All of these restrictions will be relaxed in future diffs.
I will be adding more tests later.

Test Plan: Rocksdb regression + new tests + valgrind

Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47973
2015-12-14 11:20:34 -08:00
Dmitri Smirnov 236fe21c92 Enable MS compiler warning c4244.
Mostly due to the fact that there are differences in sizes of int,long
  on 64 bit systems vs GNU.
2015-12-11 16:47:34 -08:00