During the Christmas holidays, I was taking some (many) trains to get to Hamburg. And yes, I took only the regional ones because I didn’t want to spend more money, instead I preferred to sacrifice my time and my health, now it all seems like a trade-off. Anyway, time on trains are a good excuse to do some project euler.
I had many troubles trying to figure out why my code runs smoothly up to $10^{13}$ but crashes and takes hours for $10^{14}$, honestly, I still don’t know what the reason is overflow. Since, I’m no longer a Wolfram Mathematica employee, I don’t have access to the Wolfram Notebook without a license which I don’t plan to buy, however a good thing about working previously in Wolfram Research, is that you know more about their products and solutions. There is actually an official Wolfram Kernel software that can be used without a paid license, it only works via CLI but it’s enough for me, I think it’s possible to connect to something like Jupyter.
So, my current solution is kinda cheat because I used Wolfram Mathematica to calculate some values that my algorithm take long time. Anyway, I think the solution is correct, but the implementation is not the most clever one.
I have also other pending problems, because it was a really long trip thanks DB. However, I’ll see if I have time to post something else later.
Statement
A positive integer, $n$, is factorised into prime factors. We define $f(n)$ to be the product when each prime factor is replaced with $2$. In addition we define $f(1) = 1$.
For example, $90 = 2 \times 3 \times 3 \times 5$, then replacing the primes, $2 \times 2 \times 2 \times 2 = 16$, hence $f(90) = 16$ .
Let $S(N) = \sum_{n=1}^{N} f(n)$. You are given $F(10^{8}) = 9613563919$.
Find $S(10^{14})$.
Solution
Initial solution
At first sight, it’s possible to notice that if $n = \prod_{i=1}^{m} p_{i}^{\alpha_{i}}$, then:
\begin{aligned} f(n) = 2^{\sum_{i=1}^{m} \alpha_{i}} \end{aligned}
Let $g(n, k)$ the amount of numbers less or equal than $n$ with exactly $k$ prime factors, then:
\begin{aligned} S(n) = \sum_{k=1}^{\log_2(n)} 2^{k} g(n, k) \end{aligned}
Since the integer value of $\log_{2}(10^{14})$ is $46$, the only problem is find an efficient way to calculate $g(n, k)$ for a big value of $n$.
Fortunately, there is already a definition for the value of $g(n, i)$, which is called k-almost prime.
\begin{aligned} g(n, k) = \sum_{i=1}^{\pi(n^{1/k})} \sum_{j=i}^{\pi((n/p_{i})^{1/(k-1)})} \cdots \sum_{z=y}^{\pi((n/(p_{i}p_{j}\cdots p_{y}))^{1/2})} \pi\bigg(\frac{n}{p_{i}p_{j}\cdots p_{z}}\bigg) - z + 1 \end{aligned}
As you can notice, it has a fancy recursive definition. Of course, the base case is when $k=2$, because the answer for $k=1$ is just $\pi(n)$.
\begin{aligned} g(n, 2) = \sum_{i=1}^{\pi(n^{1/2})} \pi(n/p_{i}) - i + 1 \end{aligned}
However, to define an easy recursive formula, it’s neccessary to add a new parameter it, which represent from which prime number starts:
\begin{aligned} g(n, k, it) = \sum_{i=it}^{\pi(n^{1/k})} \sum_{j=i}^{\pi((n/p_{i})^{1/(k-1)})} \cdots \sum_{z=y}^{\pi((n/(p_{i}p_{j}\cdots p_{y}))^{1/2})} \pi\bigg(\frac{n}{p_{i}p_{j}\cdots p_{z}}\bigg) - z + 1 \end{aligned}
\begin{aligned} g(n, 2, it) = \sum_{i=it}^{\pi(n^{1/2})} \pi(n/p_{i}) - i + 1 \end{aligned}
So, we can define a new simpler formula for $g(n, k, it)$:
\begin{aligned} g(n, k, it) = \sum_{i=it}^{\pi(n^{1/k})} g(n/p_{i}, k-1, i) \end{aligned}
The last step, is to see how fast it is possible to find the base case. Since $g(n, 2, 1)$ is just the iterations over the prime numbers less or equal than $\sqrt{n}$, the time complexity is:
\begin{aligned} O(g(n, 2, i)) = \sum_{p_{i} \leq \sqrt{n}} O(\pi(n/p_{i})) \end{aligned}
It means, the whole time complexity depends about how fast it is possible to calculate $\pi(n)$. A simpler method could be using a sieve, however since the values are greater that $\sqrt{n}$, we can’t sieve to the value of $n$. Other method, is trying to factorize it with the prime numbers less than $\sqrt{n}$. However, this approach can potentially take $O(\sqrt{n}/\log(n))$ per number which is too slow.
Meissel-Lehmer is an algorithm which can efficiently calculate $\pi(n)$ for big values of $n$.
ll g(ll n, int k, int it){
ll ans = 0;
for(int i=it; i<pi(pow(n, 1./m)); i++){
if(m == 2) ans += pi(n/primes[i]) - i;
else ans += g(n/primes[i], k-1, i);
}
return ans;
}
Using my Meissel-Lehmer implementation is enough to solve the problem for $10^{13}$ in a few minutes. However, for $10^{14}$ it’s having some troubles and it takes hours.
Adding pre-processing
I’ve used Wolfram Mathematica to find the most expensive runnings of my Meissel-Lehmer algorithm, which are the values of:
\begin{aligned} {\pi(10^{14}/1), \pi(10^{14}/2) \cdots \pi(10^{14}/\sqrt{10^{14}})} \end{aligned}
PrimePi[Table[Quotient[10^14, i], {i, 1, 10^7}]]
I stored it in a .txt
file and read it in my c++
code just using pipes operators.
Also, adding some memoization to the recursive formula is useful. Now this solution looks more like a dynamic programming approach:
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
#define FIFO ios_base::sync_with_stdio(0);cin.tie(0);cout.tie(0)
#define N 100000000000000
const long long limit = 10000005;
const long long cache_limit = 50000000;
map <ll, ll> cache[48][670000];
map <ll, ll> pi_div;
vector <int> pi_sieve(limit);
int ceil_power(ll n, ll m){
double p = pow(n, 1.0/m);
int x = floor(p);
if(pow(x+1, m) <= n) return x+1;
else return x;
}
vector <int> prime_sieve(){
bitset <limit> sieve;
vector <int> ans;
for(int i=2; i<limit; i++){
if(!sieve[i]){
pi_sieve[i] = pi_sieve[i-1] + 1;
ans.push_back(i);
for(ll j=1LL*i*i; j<limit; j+=i) sieve[j] = 1;
}
else pi_sieve[i] = pi_sieve[i-1];
}
return ans;
}
vector <int> primes;
map <pair <ll,ll>, ll> phi_cache;
ll phi(ll x, ll a){
pair <ll,ll> values = {x,a};
if(phi_cache.find(values) != phi_cache.end()) return phi_cache[values];
if(a == 1) return (x+1)/2;
ll ans = phi(x, a-1) - phi(x/primes[a-1], a-1);
if(phi_cache.size() < cache_limit) phi_cache[values] = ans;
return ans;
}
map <ll,ll> pi_cache;
ll pi(ll x){
if(x < limit) return pi_sieve[x];
if(pi_div.find(x) != pi_div.end()) return pi_div[x];
if(pi_cache.find(x) != pi_cache.end()) return pi_cache[x];
int a = pi((int)(pow(x,1.0/4)));
int b = pi((int)(sqrt(x)));
int c = pi((int)(pow(x,1.0/3)));
ll ans = phi(x,a) + 1LL*(b+a-2)*(b-a+1)/2;
for(int i=a+1; i<b+1; i++){
ll w = x/primes[i-1];
ans -= pi(w);
if(i <= c){
ll b_i = pi((int)(sqrt(w)));
for(int j=i; j<b_i+1; j++)
ans = ans - pi(w/primes[j-1]) + j - 1;
}
}
if(pi_cache.size() < cache_limit) pi_cache[x] = ans;
return ans;
}
ll g(ll n, int k, int it){
ll ans = 0;
if(cache[k][it].find(n) != cache[k][it].end()){
ans = cache[k][it][n];
return ans;
}
for(int i=it; i<pi(ceil_power(n, k)); i++){
if(m == 2) ans += pi(n/primes[i]) - i;
else ans += Pi(n/primes[i], m-1, i);
}
cache[k][it][n] = ans;
return ans;
}
ll solve(ll n){
int m = log2(n);
ll ans = 2*pi(n);
for(int i=2; i<=m; i++){
ans += g(n, i, 0) * (1LL<<i);
}
return ans;
}
int main(){
FIFO;
primes = prime_sieve();
ll n = N;
for(int i=0; i<10000000; i++){
ll x; cin >> x;
pi_div[N/(i+1)] = x;
}
cout << 1 + solve(n) << '\n';
}
it runs in less than 5 min.