In the previous post, I tried to use some algorithms to speed up simulation. Now in this post, I will introduce a pretty powerful package `Rcpp`

, which has the potential to fundementally change the way you do simulation if you want to employ a time-consuming simulation in R. Let’s finish this project!

### Rcpp

`Rcpp`

is basically a package that incorporates C++ code in R. For me, using `Rcpp`

means my code can work faster while I can still enjoy the convenience of many versatile R packages. A simple way to use `Rcpp`

is the function `cppFunction`

. C++ code comes in as text in your R script. In this example, I define the function `simulateC`

, which simulates and calculate the probility of a birthday shared by at least *k* persons in a room size of *room* (sorry for inconsistent notation, but I prefer i as index in a loop). I am totally not an expert in C++, so I am not goint through the code (this script is my first C++ script, to be honest). This is just to give a taste of how fast a function created by `Rcpp`

can be.

```
library(Rcpp)
cppFunction("
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
float simulateC (int k, int room, int replicate) {
IntegerVector result (replicate);
for (int r=0; r<replicate; r++){
IntegerVector counts(365);
for (int i=0; i<room; i++) {
int pos = rand() % 365;
if (pos < 365 && pos >= 0) counts[pos] ++;
}
if (max(counts) >= k) {
result[r] = TRUE;
} else result[r] = FALSE;
}
float prob = float(sum(result))/float(replicate);
return prob;
}
")
```

Let’s benchmark the time spent by either `tabulate`

or `Rcpp`

method. `rbenchmark`

is a package that we can measure the computer time consumed.

```
library(rbenchmark)
k <- 2
room <- 23
r <- 10000
set.seed(1)
b <- benchmark(Rcpp = {
simulateC(k = 2, room = room, replicate = r)
}, Tabulate = {
birth <- matrix(sample(1:365, r * room, replace = TRUE),
nrow = r, ncol = room)
sum(apply(birth, 1, function(x) max(tabulate(x)) >= k))/r
}, replications = 1000, columns = c("test", "elapsed", "replications",
"relative"))
b
```

```
## test elapsed replications relative
## 1 Rcpp 7.643 1000 1.000
## 2 Tabulate 65.545 1000 8.576
```

Bascially, `benchmark`

runs each methods for 1000 times, and compare the time consumed. It takes 7.643/1000 = 0.007643 seconds for a run of `simulateC(k=2, room=23, replicate=1e4)`

and 0.065545 seconds for a run of `tabulate`

method. So on average, the `Rcpp`

method is 8.576 times faster than `tabulate`

methods.

### Generalized birthday problem

This is the finall verison of function in generalized birthday problem project. Simply replace the code for simulation of one room size by `simulateC`

. Now we can try large room size and large replicates that are unbelievably time-consuming if we do it withou RCpp.

```
library(Rcpp)
gbp_rcpp <- function(k = 2, p = 0.5, B_l = 1000, B_u = 20000,
threshold = 0.001) {
i <- k # Tested room size; start from k persons in a room
x <- 1 # Counter for while loop
L <- 0 # Lower bound for binary search
R <- k^4 + 100 + sample(-k:k, 1) # Upper bound for binary search
# Create an empty data frame for result
result <- setNames(data.frame(matrix(0, 100, 3)), c("room",
"p", "replicate"))
while (abs(result$room[x] - p) >= threshold) {
# Increase replicate when approaching target room size
r <- floor(-B_u/(1 + exp(-0.03 * ((R - L) - 200)))) +
(B_u + B_l)
result$replicate[x] <- r
# Test room size i
i <- ceiling((L + R)/2)
result$room[x] <- i
# Calculate probability by running simulation in C++
result$p[x] <- simulateC(k = k, room = i, replicate = r)
if (result$p[x] >= p)
R <- i
if (result$p[x] < p)
L <- i
# Break the loop if trapped
if (sum(duplicated(tail(result$room[1:x], 2))) == 1 &
result$room[x] != 0)
break
# Print out room sizes tested
cat(paste0("Room size tested:", i, "\tprobability:",
round(result$p[x], 3), "\treplicate size: ", r, "\n"))
# Counter for steps
x <- x + 1
}
ans <- R
return(list(answer = ans, result = result[1:x, ]))
}
```

Let’s try test the ideal room size for *k=8* and *p=0.5*. When approaching the ideal room size, increase the replicate size up to one million.

```
set.seed(1)
system.time(y_rcpp <- gbp_rcpp(k = 8, p = 0.5, B_u = 1e+06))
```

```
## Room size tested:2096 probability:1 replicate size: 1000
## Room size tested:1048 probability:0.964 replicate size: 1000
## Room size tested:524 probability:0.037 replicate size: 1000
## Room size tested:786 probability:0.49 replicate size: 1060
## Room size tested:917 probability:0.804 replicate size: 135703
## Room size tested:852 probability:0.646 replicate size: 888952
## Room size tested:819 probability:0.557 replicate size: 983363
## Room size tested:803 probability:0.514 replicate size: 994373
## Room size tested:795 probability:0.492 replicate size: 996889
## Room size tested:799 probability:0.503 replicate size: 997858
## Room size tested:797 probability:0.497 replicate size: 998213
## Room size tested:798 probability:0.5 replicate size: 998374
```

```
## user system elapsed
## 47.737 0.537 57.407
```

`y_rcpp`

```
## $answer
## [1] 798
##
## $result
## room p replicate
## 1 2096 1.0000000 1000
## 2 1048 0.9640000 1000
## 3 524 0.0370000 1000
## 4 786 0.4896227 1060
## 5 917 0.8037553 135703
## 6 852 0.6463184 888952
## 7 819 0.5572042 983363
## 8 803 0.5140676 994373
## 9 795 0.4922083 996889
## 10 799 0.5027779 997858
## 11 797 0.4972315 998213
## 12 798 0.5001422 998374
## 13 798 0.5006149 998452
```

Now we have confident to say the ideal room size for *k=8* is 798.

### GeneralBirthdayProblem package

In case you want to play with generalized birthday problem and the source code, or don’t want to write from the very beginning. I uploaded the package to my github. You can simply dowload the R package by the following command in your R interface.

`devtools::install_github("Chang-Yu-Chang/GeneralBirthdayProblem")`

This package only has two functions: `simulateC`

and `gbp`

. These two functions are basically the same as what I used throughout this post. `simulateC`

simulates and reports the *probability* given *k*, room size, and the replicate size.

`simulateC(k = 2, room = 23, replicate = 1e+05)`

`## [1] 0.50724`

`gbp`

incorporates the `simulateC`

function and finds the ideal room size given *k* and *p*. I set the printing flexible so that if you don’t want the result to be printed every time.

`gbp(k = 2, p = 0.5) # Default is print=FALSE`

`## [1] 23`