# Generalized birthday problem Rcpp

In the previous post, I tried to use some algorithms to speed up simulation. Now in this post, I will introduce a pretty powerful package `Rcpp`

, which has the potential to fundementally change the way you do simulation if you want to employ a time-consuming simulation in R. Let’s finish this project!

### Rcpp

`Rcpp`

is basically a package that incorporates C++ code in R. For me, using `Rcpp`

means my code can work faster while I can still enjoy the convenience of many versatile R packages. A simple way to use `Rcpp`

is the function `cppFunction`

. C++ code comes in as text in your R script. In this example, I define the function `simulateC`

, which simulates and calculate the probility of a birthday shared by at least *k* persons in a room size of *room* (sorry for inconsistent notation, but I prefer i as index in a loop). I am totally not an expert in C++, so I am not goint through the code (this script is my first C++ script, to be honest). This is just to give a taste of how fast a function created by `Rcpp`

can be.

```
library(Rcpp)
cppFunction('
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
float simulateC (int k, int room, int replicate) {
IntegerVector result (replicate);
for (int r=0; r<replicate; r++){
IntegerVector counts(365);
for (int i=0; i<room; i++) {
int pos = rand() % 365;
if (pos < 365 && pos >= 0) counts[pos] ++;
}
if (max(counts) >= k) {
result[r] = TRUE;
} else result[r] = FALSE;
}
float prob = float(sum(result))/float(replicate);
return prob;
}
')
```

Let’s benchmark the time spent by either `tabulate`

or `Rcpp`

method. `rbenchmark`

is a package that we can measure the computer time consumed.

```
library(rbenchmark)
k <- 2
room <- 23
r <- 1e4
set.seed(1)
b <- benchmark("Rcpp" = {simulateC(k=2, room=room, replicate=r)},
'Tabulate' = {
birth <- matrix(sample(1:365, r*room, replace=TRUE), nrow=r, ncol=room)
sum(apply(birth, 1, function(x) max(tabulate(x))>=k)) /r},
replications = 1000,
columns=c('test', 'elapsed', 'replications', 'relative')
)
b
```

Bascially, `benchmark`

runs each methods for 1000 times, and compare the time consumed. It takes 7.643/1000 = 0.007643 seconds for a run of `simulateC(k=2, room=23, replicate=1e4)`

and 0.065545 seconds for a run of `tabulate`

method. So on average, the `Rcpp`

method is 8.576 times faster than `tabulate`

methods.

### Generalized birthday problem

This is the finall verison of function in generalized birthday problem project. Simply replace the code for simulation of one room size by `simulateC`

. Now we can try large room size and large replicates that are unbelievably time-consuming if we do it withou RCpp.

```
library(Rcpp)
gbp_rcpp <- function(k=2, p=.5, B_l=1000, B_u=20000, threshold=.001) {
i <- k # Tested room size; start from k persons in a room
x <- 1 # Counter for while loop
L <- 0 # Lower bound for binary search
R <- k^4+100+sample(-k:k, 1) # Upper bound for binary search
# Create an empty data frame for result
result <- setNames(data.frame(matrix(0, 100, 3)), c("room", "p", "replicate"))
while (abs(result$room[x] - p) >= threshold) {
# Increase replicate when approaching target room size
r <- floor(-B_u/(1+exp(-.03* ((R-L)-200))))+(B_u+B_l)
result$replicate[x] <- r
# Test room size i
i <- ceiling((L+R)/2)
result$room[x] <- i
# Calculate probability by running simulation in C++
result$p[x] <- simulateC(k=k, room=i, replicate = r)
if (result$p[x] >= p) R <- i
if (result$p[x] < p) L <- i
# Break the loop if trapped
if(sum(duplicated(tail(result$room[1:x], 2))) == 1 & result$room[x] != 0) break
# Print out room sizes tested
cat(paste0('Room size tested:', i, '\tprobability:', round(result$p[x], 3), '\treplicate size: ', r, '\n'))
# Counter for steps
x <- x + 1
}
ans <- R
return(list(answer=ans,
result=result[1:x,]))
}
```

Let’s try test the ideal room size for *k=8* and *p=0.5*. When approaching the ideal room size, increase the replicate size up to one million.

```
set.seed(1)
system.time(y_rcpp <- gbp_rcpp(k=8, p=.5, B_u=1000000))
y_rcpp
```

Now we have confident to say the ideal room size for *k=8* is 798.

### GeneralBirthdayProblem package

In case you want to play with generalized birthday problem and the source code, or don’t want to write from the very beginning. I uploaded the package to my github. You can simply dowload the R package by the following command in your R interface.

`devtools::install_github("Chang-Yu-Chang/GeneralBirthdayProblem")`

This package only has two functions: `simulateC`

and `gbp`

. These two functions are basically the same as what I used throughout this post. `simulateC`

simulates and reports the *probability* given *k*, room size, and the replicate size.

`simulateC(k=2, room=23, replicate=1e5)`

`gbp`

incorporates the `simulateC`

function and finds the ideal room size given *k* and *p*. I set the printing flexible so that if you don’t want the result to be printed every time.

`gbp(k=2, p=0.5) # Default is print=FALSE`