extend docs for benchbase chbenchmark #178
base: master
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -3,10 +3,10 @@ | |||||||||
| linkTitle: "CH-benCHmark" | ||||||||||
| weight: 20 | ||||||||||
| --- | ||||||||||
| The [CH-benCHmark](https://db.in.tum.de/research/projects/CHbenCHmark/?lang=en) bridges the gap between TPC-C, an OLTP (i.e., transactional), and TPC-H, an OLAP (i.e., analytical benchmark). | ||||||||||
| The [CH-benCHmark](https://db.in.tum.de/research/projects/CHbenCHmark/?lang=en) bridges the gap between TPC-C, an OLTP (i.e., transactional) benchmark, and TPC-H, an OLAP (i.e., analytical) benchmark. | ||||||||||
|
Check failure on line 6 in content/example_datasets/chbenchmark.md
|
||||||||||
|
|
||||||||||
| In contrast to many other benchmarks for hybrid workloads, CH-benCHmark runs its analytical queries on the same tables that are updated by the transactional queries. | ||||||||||
| This especially stresses the database's transaction subsystem as it has to ensure that all queries see a consistent state of the heavily write-contested tables. | ||||||||||
| This especially stresses the database's transaction subsystem as it has to ensure that all queries see a consistent state of the heavily write-contended tables. | ||||||||||
|
|
||||||||||
| ## The Dataset | ||||||||||
|
|
||||||||||
|
|
@@ -15,4 +15,159 @@ | |||||||||
|
|
||||||||||
| ## Executing the benchmark | ||||||||||
|
|
||||||||||
| CMU's benchmarking tool [benchbase](https://github.com/cmu-db/benchbase/) comes with a CH-benCHmark configuration and is compatible to Postgres. | ||||||||||
| CMU's benchmarking tool [benchbase](https://github.com/cmu-db/benchbase/) comes with a CH-benCHmark configuration and is compatible with PostgreSQL. | ||||||||||
|
Check failure on line 18 in content/example_datasets/chbenchmark.md
|
||||||||||
|
|
||||||||||
| {{% steps %}} | ||||||||||
|
|
||||||||||
| ### Prepare a CedarDB instance | ||||||||||
|
|
||||||||||
| First install CedarDB [locally](../get_started/install_locally) or alternatively [via Docker](../get_started/install_with_docker). | ||||||||||
|
|
||||||||||
| ```shell | ||||||||||
| curl https://get.cedardb.com | bash | ||||||||||
| ./cedar/cedardb | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ### Set up benchbase | ||||||||||
|
Check failure on line 31 in content/example_datasets/chbenchmark.md
|
||||||||||
|
|
||||||||||
| CedarDB requires a slight modification to the upstream benchbase project, which is maintained [in a fork](https://github.com/cedardb/benchbase/). | ||||||||||
|
Check failure on line 33 in content/example_datasets/chbenchmark.md
|
||||||||||
| Following the [quickstart instructions](https://github.com/cedardb/benchbase/#quickstart), set up benchbase: | ||||||||||
|
Check failure on line 34 in content/example_datasets/chbenchmark.md
|
||||||||||
|
|
||||||||||
| ```shell | ||||||||||
| git clone --depth 1 https://github.com/cedardb/benchbase.git | ||||||||||
| cd benchbase | ||||||||||
| ./mvnw clean package -P postgres | ||||||||||
| cd target | ||||||||||
| tar xvzf benchbase-postgres.tgz | ||||||||||
| cd benchbase-postgres | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| The benchmark config file specifies the workload parameters. | ||||||||||
| The following file specifies the CH workload with scale factor 100, which is about 10 GB of data. | ||||||||||
|
|
||||||||||
| ```xml {filename="config.xml"} | ||||||||||
| <parameters> | ||||||||||
| <!-- Connection details --> | ||||||||||
| <type>POSTGRES</type> | ||||||||||
| <driver>org.postgresql.Driver</driver> | ||||||||||
| <url>jdbc:postgresql://localhost:5432/postgres?sslmode=disable&amp;ApplicationName=chbenchmark&amp;reWriteBatchedInserts=true</url> | ||||||||||
| <username>postgres</username> | ||||||||||
| <password>postgres</password> | ||||||||||
| <reconnectOnConnectionFailure>true</reconnectOnConnectionFailure> | ||||||||||
| <batchsize>128</batchsize> | ||||||||||
| <scalefactor>100</scalefactor> | ||||||||||
| <terminals>101</terminals> | ||||||||||
| <works> | ||||||||||
| <work> | ||||||||||
| <warmup>60</warmup> | ||||||||||
| <time>120</time> | ||||||||||
|
Comment on lines +62 to +63
Contributor
If we don't expect a clean create -> load -> execute run, we should use a longer warmup so that slow disks can actually read all data during warmup.
Suggested change
|
||||||||||
| <rate bench="tpcc">unlimited</rate> | ||||||||||
| <rate bench="chbenchmark">unlimited</rate> | ||||||||||
| <weights bench="tpcc">45,43,4,4,4</weights> | ||||||||||
| <weights bench="chbenchmark">3, 2, 3, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5</weights> | ||||||||||
| <active_terminals bench="tpcc">100</active_terminals> | ||||||||||
| <active_terminals bench="chbenchmark">1</active_terminals> | ||||||||||
| </work> | ||||||||||
| </works> | ||||||||||
| <transactiontypes bench="chbenchmark"> | ||||||||||
| <transactiontype><name>Q1</name></transactiontype> | ||||||||||
| <transactiontype><name>Q2</name></transactiontype> | ||||||||||
| <transactiontype><name>Q3</name></transactiontype> | ||||||||||
| <transactiontype><name>Q4</name></transactiontype> | ||||||||||
| <transactiontype><name>Q5</name></transactiontype> | ||||||||||
| <transactiontype><name>Q6</name></transactiontype> | ||||||||||
| <transactiontype><name>Q7</name></transactiontype> | ||||||||||
| <transactiontype><name>Q8</name></transactiontype> | ||||||||||
| <transactiontype><name>Q9</name></transactiontype> | ||||||||||
| <transactiontype><name>Q10</name></transactiontype> | ||||||||||
| <transactiontype><name>Q11</name></transactiontype> | ||||||||||
| <transactiontype><name>Q12</name></transactiontype> | ||||||||||
| <transactiontype><name>Q13</name></transactiontype> | ||||||||||
| <transactiontype><name>Q14</name></transactiontype> | ||||||||||
| <transactiontype><name>Q15</name></transactiontype> | ||||||||||
| <transactiontype><name>Q16</name></transactiontype> | ||||||||||
| <transactiontype><name>Q17</name></transactiontype> | ||||||||||
| <transactiontype><name>Q18</name></transactiontype> | ||||||||||
| <transactiontype><name>Q19</name></transactiontype> | ||||||||||
| <transactiontype><name>Q20</name></transactiontype> | ||||||||||
| <transactiontype><name>Q21</name></transactiontype> | ||||||||||
| <transactiontype><name>Q22</name></transactiontype> | ||||||||||
| </transactiontypes> | ||||||||||
| <transactiontypes bench="tpcc"> | ||||||||||
| <transactiontype><name>NewOrder</name></transactiontype> | ||||||||||
| <transactiontype><name>Payment</name></transactiontype> | ||||||||||
| <transactiontype><name>OrderStatus</name></transactiontype> | ||||||||||
| <transactiontype><name>Delivery</name></transactiontype> | ||||||||||
| <transactiontype><name>StockLevel</name></transactiontype> | ||||||||||
| </transactiontypes> | ||||||||||
| </parameters> | ||||||||||
| ``` | ||||||||||
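The two `<weights>` lists above are easy to get wrong when editing by hand. The following is a minimal sketch of a sanity check, assuming each weight is a percentage paired positionally with a transaction type of the same `bench`. The helper name `check_weights` is hypothetical and not part of benchbase; it prints the number of weights and their sum.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of benchbase): print "<count> <sum>"
# for a comma-separated weights list copied from config.xml.
check_weights() {
  echo "$1" | tr -d ' ' | tr ',' '\n' | awk '{ n++; s += $1 } END { print n, s }'
}

# tpcc: 5 transaction types (NewOrder ... StockLevel)
check_weights "45,43,4,4,4"

# chbenchmark: 22 query types (Q1 ... Q22)
check_weights "3, 2, 3, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5"
```

For the config above, the counts are 5 and 22, matching the number of `<transactiontype>` entries per benchmark, and both lists sum to 100.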
|
|
||||||||||
| ### Run the benchmark | ||||||||||
|
|
||||||||||
| With CedarDB running and benchbase set up in the `benchbase-postgres` directory, the benchmark is ready to run. | ||||||||||
|
Check failure on line 108 in content/example_datasets/chbenchmark.md
|
||||||||||
|
|
||||||||||
| ```shell | ||||||||||
| # Set up the benchmark user for CedarDB using the same values as in config.xml | ||||||||||
| psql -h /tmp -U postgres -c "alter user postgres with password 'postgres';" | ||||||||||
|
|
||||||||||
| # Run the benchmark | ||||||||||
| java -jar benchbase.jar -b tpcc,chbenchmark -c config.xml --create=true --load=true --execute=true | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Benchbase now generates and loads the initial dataset before running the workload query stream. | ||||||||||
|
Check failure on line 118 in content/example_datasets/chbenchmark.md
|
||||||||||
| The initial data load takes about 5 minutes. After the first run, it can be skipped with `--create=false --load=false`. | ||||||||||
|
|
||||||||||
| The config specifies a warm-up time of 60 seconds to ensure data is cached, and afterward runs the benchmark for 120 seconds. | ||||||||||
|
|
||||||||||
| ### Results | ||||||||||
|
|
||||||||||
| After the benchmark run, benchbase prints a detailed report of the workload. | ||||||||||
|
Check failure on line 125 in content/example_datasets/chbenchmark.md
|
||||||||||
| The following is an example run on AWS with EBS. TODO: @ChrisWint please add the specifics. | ||||||||||
|
Contributor
Measurements below are on an m7a.16xlarge (https://instances.vantage.sh/aws/ec2/m7a.16xlarge) with a 1 GB/s, 16k IOPS gp3 volume, but for scale factor 1000 with 100 terminals. We might want to remeasure according to the XML above to give accurate results.
Contributor
We might want a (rough) sizing guide per scale factor. For best performance, instance memory should be 300 MB per warehouse to be on the safe side regarding memory handling. With respect to CPU cores, I don't know exactly when we start to suffer from the Linux scheduler under oversubscription, but I think 10 connections/terminals per core should still be safe. Lower doesn't hurt either.
|
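The rule of thumb from this comment can be turned into a quick estimate. This is only a sketch under stated assumptions: one warehouse per unit of scale factor, 300 MB of memory per warehouse, and at most 10 terminals per core. The helper name `size_for` is hypothetical, and the numbers are the reviewer's rough guidance, not measured limits.

```shell
#!/usr/bin/env bash
# Hypothetical sizing helper based on the reviewer's rule of thumb.
# Assumes scale factor == number of warehouses.
size_for() {
  local warehouses=$1 terminals=$2
  local mem_mb=$(( warehouses * 300 ))      # 300 MB per warehouse
  local cores=$(( (terminals + 9) / 10 ))   # 10 terminals per core, rounded up
  echo "memory_mb=$mem_mb cores=$cores"
}

# Values from config.xml: scalefactor 100, terminals 101
size_for 100 101
```

For the config in this PR, this suggests roughly 30 GB of memory and 11 cores.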
||||||||||
|
|
||||||||||
| ```text | ||||||||||
| Completed Transactions: | ||||||||||
| com.oltpbenchmark.benchmarks.tpcc.procedures.NewOrder/01 [2257376] ******************************************************************************** | ||||||||||
| com.oltpbenchmark.benchmarks.tpcc.procedures.Payment/02 [2178734] ***************************************************************************** | ||||||||||
| com.oltpbenchmark.benchmarks.tpcc.procedures.OrderStatus/03 [ 202457] ******* | ||||||||||
| com.oltpbenchmark.benchmarks.tpcc.procedures.Delivery/04 [ 202717] ******* | ||||||||||
| com.oltpbenchmark.benchmarks.tpcc.procedures.StockLevel/05 [ 202556] ******* | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q1/06 [ 1] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q2/07 [ 2] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q3/08 [ 2] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q4/09 [ 1] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q5/10 [ 0] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q6/11 [ 4] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q7/12 [ 0] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q8/13 [ 3] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q9/14 [ 6] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q10/15 [ 4] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q11/16 [ 1] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q12/17 [ 4] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q13/18 [ 5] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q14/19 [ 2] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q15/20 [ 4] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q16/21 [ 1] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q17/22 [ 3] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q18/23 [ 4] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q19/24 [ 7] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q20/25 [ 7] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q21/26 [ 2] | ||||||||||
| com.oltpbenchmark.benchmarks.chbenchmark.queries.Q22/27 [ 4] | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| The two key CH-benCHmark metrics can be derived from these numbers: | ||||||||||
|
|
||||||||||
| **tpmC** (new-order transactions per minute, the standard TPC-C throughput metric): | ||||||||||
|
|
||||||||||
| ```text | ||||||||||
| 2,257,376 NewOrder tx / 120 s * 60 = 1,128,688 tpmC | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| **QphH** (analytical queries per hour, summing all 22 CH query completions): | ||||||||||
|
|
||||||||||
| ```text | ||||||||||
| 67 queries / 120 s * 3600 = 2,010 QphH | ||||||||||
| ``` | ||||||||||
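The same arithmetic can be scripted for other runs. The following is a minimal sketch, assuming the counts are read manually from the "Completed Transactions" report and the run time matches `<time>` in `config.xml`. The function names `tpmc` and `qphh` are hypothetical and not part of benchbase.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of benchbase): derive both metrics from
# completed-transaction counts and the measured run time in seconds.
# Shell arithmetic is integer-only, so results are truncated.

# tpmC: completed NewOrder transactions per minute
tpmc() {
  local new_orders=$1 seconds=$2
  echo $(( new_orders * 60 / seconds ))
}

# QphH: completed analytical queries (Q1..Q22 summed) per hour
qphh() {
  local queries=$1 seconds=$2
  echo $(( queries * 3600 / seconds ))
}

tpmc 2257376 120   # NewOrder count from the example report
qphh 67 120        # sum of the Q1..Q22 counts from the example report
```

With the example report above, this yields 1128688 tpmC and 2010 QphH.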
|
Contributor
Do we want to include our chbench tool here as well? I don't think it is necessary for this PR; just raising the possibility.
|
||||||||||
|
|
||||||||||
| {{% /steps %}} | ||||||||||
We could include Docker setup information here or above, especially the command-line arguments to set the postgres user password accordingly.
We should also decide whether we want to enable async mode (it was enabled for the measurements below). This does not matter for local machines, but it does for EBS, with fsync latency around 4 ms and write-through reporting.