### PROBLEM LINK:

**Setter-** Kasra Mazaheri

**Tester-** Encho Mishinev

**Editorialist-** Abhishek Pandey

### DIFFICULTY:

HARD

### PRE-REQUISITES:

Concept of Modular Exponentiation and Fermat’s Little Theorem, Applications of FFT (All Possible Sums), Basic Combinatorics (The link might be more than what you need), General concepts of Modular set - like Generator of the Modular Set and Primitive Root (It just sound scary - its a trivial thing).

### PROBLEM:

Given an array A with N elements, we make a brute force generator to generate sequences of length K to crack password. Our brute force generator generates all N^K sequences. The password is product of the K chosen elements modulo P. We need to find how many times each integer from 0 to P-1 is tried by the sequences generated by our generator modulo 998244353.

### QUICK-EXPLANATION:

**Key to AC-** All possible sum (FFT) can be used if we change multiplication to addition!

We need to handle the case of 0 separately. Lets use a frequency array F to store frequency of elements. Let F[0] denote the frequency of 0. The solution for case of 0 would be N^K-(N-F[0])^K.

Now we need to find answer for all elements from 1 to P-1. For this, we will first transform the problem. Let G be the primitive root of the set. Hence, all numbers from 1 to P-1 can be written as some power of G.

Thus, transform the array A such that A_i is replaced by y_i such that G^{y_i}=A_i. Hence the product A_{i1}*A_{i2}*...*A_{ik} \% P now becomes \large G^{(y1+y2+...+yk)\% (P-1)} \%P.

Given that any power y (0 \leq y \leq P-2) of G (i.e. G^y) corresponds to a unique element from 1 to P-1, now the problem reduces to modular version of “All possible sums” problem, i.e. out of given y_i (via array A), in how many ways can we chose K of them to get a (modular) sum of r, for each r from 0 to P-2.

To solve above, we go by method of All Possible Sums. We make a polynomial P(x) such that:

- The zero’eth coefficient (we are using this term for convenience) corresponds to power of x^0. Similarly first coefficient corresponds to power x^1. Hence, i'th coefficient corresponds to power x^i.
- The value of i'th coefficient is nothing but frequency of i in array A (after transformation).

Now, since we can chose K elements, our answer lies in the K'th power of this polynomial. To find it, we do the following:

- K is large, so we cannot do normal multiplication. Instead, we will use the same algorithm of Fast Exponentiation to calculate it - just this time on polynomials instead of numbers! In other words, we calculate P(x), P(x)^2, P(x)^4… and so on and use them to find P(x)^K in O(logK) time factor.
- The terms in the polynomial will be too many in higher powers! But hey, we are interested in modular sum, not normal sum. What I mean is, take any power, say x^i where i \geq P-1. Note that the coefficient of x^i represents in how many ways we can get i by summing. But we are only interested in values from 1 to P-1 (we calculated for 0 already), and hence i's from 0 to P-2. So instead of keeping a separate term, and then manipulating result at end, we add coefficient of x^i to x^{i \% (P-1)} and discard x^i. Hence number of terms are always \leq P-1.

The only thing that’s left then trivial manipulation to see answer for which y_i corresponds to which A_i and print the result.

### EXPLANATION:

The editorial will have following sections-

- Transforming the problem from Product to Sum.
- Bringing in Polynomial - Their interpretation, how they can help and the challenges.
- Fast Exponentiation on above polynomial to get the answer

Mostly, the detailed explanation is aimed towards not-the-top-star-coders because the top coders already got everything they needed in the Quick Explanation Section. In case any doubt remains, they are requested to just skim through relevant sections.

There are some concept-checks in editorial - there solution is also in the bonus corner.

## Transforming the problem from Product to Sum

This was the first step to solve this problem. And there are some things we need to realize.

For any problem related to product or divisions, the case of 0 has to be - almost always - either handled separately or seen with care.

So going by that norm, we will handle it separately. Number of times 0 is used as a password can be easily calculated as we only need count of sequences with at least 1 zero. This can be easily found as-

Count Sequences with at least one zero = Total sequences - Count of Sequences with no zeroes at all.

Let F[0] be the frequency of 0 in A. Hence, the answer for 0 becomes N^k-(N-F[0])^k. If you do not get how, its given in bonus corner.

**Concept Check - Why are we not using the nCk formula here?**

Now, we need the solution for passwords from 1 to P-1.

Now, obviously the product of K terms is a difficult thing to handle. Not only it overflows the standard data type range - even using Big Integer etc is going to be very costly, especially for K \approx 10^9.

The most common trick we apply for these cases is to somehow change multiplication to addition. For different questions, its done in different ways, like taking logs etc. For this question, taking logs is not desirable as we are doing modular arithmetic, hence we use the other concept - Generator or Primitive Root. (refer to pre-requisites and bonus corner for more)

Using the methods given in pre-requisite links, we can find the Primitive Root/Generator for the set, i.e. G. Note that when I say set, I mean the set of elements from 1 to P-1. Now, every element in A can be represented as some power of G. In other words, we replace A_i with y_i such that we have G^{y_i}=A_i.

Now, the product of A_{i1}*A_{i2}*...*A_{ik} \% P becomes sum (of powers) of G. That is, A_{i1}*A_{i2}*...*A_{ik} \% P now becomes \large G^{(y1+y2+...+yk)\% (P-1)} \%P.

**Concept check - Why are the powers modulo P-1 instead of P?**

Now, our problem transforms from "Finding how many times product of K elements equals to i for all i from [1,P-1]" to "Finding in how many ways sum of y_1+y_2+...+y_k \%P yields c such that G^c \% P \in [1,P-1] for all c \in [0,P-2] (Note modulo of power is P-1 and each power from 0 to P-2 represents a distinct integer in range [1,P-1])

## Bringing in the Polynomial - Their interpretation and how they can help.

Our problem is now reduced to finding in how many ways we can chose K elements from our transformed A such that their modular sum is c for all 0 \leq c \leq P-2. This can be solved by using All Possible Sums algorithm.

The original problem says that, given 2 arrays, find what are the possible sums and how many ways are possible to generate them. Seems pretty similar! The only changes are this:

- All possible sums (given in FFT pre-requisite link) does it for 2 arrays, and assumes we want all possible sums if we chose one element from each.
- It can be easily generalized. We know that all our elements are to be chosen from A. This is similar to having K copies of array A and having to choose one element from each. This is why you will see we go for K'th power of frequency polynomial.

To begin with solving the problem, we shall make a frequency polynomial. This is a polynomial where i'th coefficient (corresponding to power x^i) has value F[i] where F[i] is frequency of i in A. Obviously, our polynomial has 0-based indexing (zeroth coefficient corresponds to x^0). Note that this is done in the modified/transformed array A where A_i is replaced by y_i such that G^{y_i}=A_i. (Concept check - Why does this polynomial work? This also deals with how it helps and is at bonus corner)

As already justified above, our answer shall lie in the K'th power of this polynomial. But there are 2 challenges we face here:

- K is very large. Using normal FFT will take a LOT of time for K \approx 10^9.
- The number of terms is going to kill us as well. For K \approx 10^9, you can imagine how many terms the K'th power will have.

We will resolve these difficulties in the next section.

## Overcoming Challenges using our old and friendly Fast Exponentiation Trick

We had 2 challenges in previous section. One pertaining to K being very high - hence it being very time consuming to calculate K'th power of the polynomial. And the other pertained to the K'th power having too many terms to cause memory and time limits to get exceeded.

You will see that both of these are actually handled by very basic fundamentals.

Firstly, lets deal with high number of terms first. It is a legit question, that how can we solve the problem with this method if the number of terms are \approx 10^9 ? The answer is simple - we are interested in modular sum and not normal sum! All Possible Sums solves for normal sum, but it is faster if it has to be done modulo P. Actually, since the sum we are finding are powers of G, they are modulo is P-1, but you get the flow!

Now, what happens when terms increase? Terms increase because we got a new power of x, say x^i where i \geq (P-1). Note that, we are interested in finding how many ways we can select (y_1,y_2,...,y_k) and get a value c where 0 \leq c \leq P-2. So we know that ultimately the number of ways to get i (which is the coefficient of x^i) are to be added to number of ways to get i \% (P-1) to get the final answer (because we are interested in modular sum and not normal sum). So why not do that now?

All that I’m saying is, whenever we get a term x^i where i \geq P-1, we simply add its contribution to x^{i \% (P-1)} and discard the term x^i. This keeps number of terms at any time O(P) and the concept behind the polynomial stays intact!

Now all that we need to handle is finding the K'th power. This can be easily done using fast exponentiation trick - except this time on polynomials instead of numbers! Let P(x) be our frequency polynomial. We will calculate P(x), P(x)^2, P(x)^4, P(x)^8, … and so on to finally obtain P(x)^K using their combination. The multiplication happens as described earlier (to keep number of terms in check).

So if you’ve understood till here, pat yourself on the back because you are now well equipped to solve this problem. You will find referring to setter’s code very much easy - just know that he has used some algebra library (which you should as well) instead of writing the code from scratch. So just trust those function names and read the code with a pinch of abstraction.

### SOLUTION

## Setter

```
// In The Name Of The Queen
#include<bits/stdc++.h>
using namespace std;
const int LG = 18, N = 1 << LG, Mod = 998244353, LGK = 30;
inline int Power(int a, int b)
{
int ret = 1;
for (; b; b >>= 1, a = 1LL * a * a % Mod)
if (b & 1) ret = 1LL * ret * a % Mod;
return (ret);
}
// ================================================================
const int MOD = 998244353;
struct mod_int {
int val;
mod_int(long long v = 0) {
if (v < 0) v = v % MOD + MOD;
if (v >= MOD) v %= MOD;
val = v;
}
static int mod_inv(int a, int m = MOD) {
// https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm#Example
int g = m, r = a, x = 0, y = 1;
while (r != 0) {
int q = g / r;
g %= r; swap(g, r);
x -= q * y; swap(x, y);
}
return x < 0 ? x + m : x;
}
explicit operator int() const {
return val;
}
mod_int& operator+=(const mod_int &other) {
val += other.val;
if (val >= MOD) val -= MOD;
return *this;
}
mod_int& operator-=(const mod_int &other) {
val -= other.val;
if (val < 0) val += MOD;
return *this;
}
static unsigned fast_mod(uint64_t x, unsigned m = MOD) {
#if !defined(_WIN32) || defined(_WIN64)
return x % m;
#endif
// Optimized mod for Codeforces 32-bit machines.
// x must be less than 2^32 * m for this to work, so that x / m fits in a 32-bit integer.
unsigned x_high = x >> 32, x_low = (unsigned) x;
unsigned quot, rem;
asm("divl %4\n"
: "=a" (quot), "=d" (rem)
: "d" (x_high), "a" (x_low), "r" (m));
return rem;
}
mod_int& operator*=(const mod_int &other) {
val = fast_mod((uint64_t) val * other.val);
return *this;
}
mod_int& operator/=(const mod_int &other) {
return *this *= other.inv();
}
friend mod_int operator+(const mod_int &a, const mod_int &b) { return mod_int(a) += b; }
friend mod_int operator-(const mod_int &a, const mod_int &b) { return mod_int(a) -= b; }
friend mod_int operator*(const mod_int &a, const mod_int &b) { return mod_int(a) *= b; }
friend mod_int operator/(const mod_int &a, const mod_int &b) { return mod_int(a) /= b; }
mod_int& operator++() {
val = val == MOD - 1 ? 0 : val + 1;
return *this;
}
mod_int& operator--() {
val = val == 0 ? MOD - 1 : val - 1;
return *this;
}
mod_int operator++(int) { mod_int before = *this; ++*this; return before; }
mod_int operator--(int) { mod_int before = *this; --*this; return before; }
mod_int operator-() const {
return val == 0 ? 0 : MOD - val;
}
bool operator==(const mod_int &other) const { return val == other.val; }
bool operator!=(const mod_int &other) const { return val != other.val; }
mod_int inv() const {
return mod_inv(val);
}
mod_int pow(long long p) const {
assert(p >= 0);
mod_int a = *this, result = 1;
while (p > 0) {
if (p & 1)
result *= a;
a *= a;
p >>= 1;
}
return result;
}
friend ostream& operator<<(ostream &stream, const mod_int &m) {
return stream << m.val;
}
};
namespace NTT {
vector<mod_int> roots = {0, 1};
vector<int> bit_reverse;
int max_size = -1;
mod_int root;
bool is_power_of_two(int n) {
return (n & (n - 1)) == 0;
}
int round_up_power_two(int n) {
while (n & (n - 1))
n = (n | (n - 1)) + 1;
return max(n, 1);
}
// Given n (a power of two), finds k such that n == 1 << k.
int get_length(int n) {
assert(is_power_of_two(n));
return __builtin_ctz(n);
}
// Rearranges the indices to be sorted by lowest bit first, then second lowest, etc., rather than highest bit first.
// This makes even-odd div-conquer much easier.
void bit_reorder(int n, vector<mod_int> &values) {
if ((int) bit_reverse.size() != n) {
bit_reverse.assign(n, 0);
int length = get_length(n);
for (int i = 0; i < n; i++)
bit_reverse[i] = (bit_reverse[i >> 1] >> 1) + ((i & 1) << (length - 1));
}
for (int i = 0; i < n; i++)
if (i < bit_reverse[i])
swap(values[i], values[bit_reverse[i]]);
}
void find_root() {
max_size = 1 << __builtin_ctz(MOD - 1);
root = 2;
// Find a max_size-th primitive root of MOD.
while (!(root.pow(max_size) == 1 && root.pow(max_size / 2) != 1))
root++;
}
void prepare_roots(int n) {
if (max_size < 0)
find_root();
assert(n <= max_size);
if ((int) roots.size() >= n)
return;
int length = get_length(roots.size());
roots.resize(n);
// The roots array is set up such that for a given power of two n >= 2, roots[n / 2] through roots[n - 1] are
// the first half of the n-th primitive roots of MOD.
while (1 << length < n) {
// z is a 2^(length + 1)-th primitive root of MOD.
mod_int z = root.pow(max_size >> (length + 1));
for (int i = 1 << (length - 1); i < 1 << length; i++) {
roots[2 * i] = roots[i];
roots[2 * i + 1] = roots[i] * z;
}
length++;
}
}
void fft_iterative(int N, vector<mod_int> &values) {
assert(is_power_of_two(N));
prepare_roots(N);
bit_reorder(N, values);
for (int n = 1; n < N; n *= 2)
for (int start = 0; start < N; start += 2 * n)
for (int i = 0; i < n; i++) {
mod_int even = values[start + i];
mod_int odd = values[start + n + i] * roots[n + i];
values[start + n + i] = even - odd;
values[start + i] = even + odd;
}
}
const int FFT_CUTOFF = 150;
vector<mod_int> mod_multiply(vector<mod_int> left, vector<mod_int> right) {
int n = left.size();
int m = right.size();
// Brute force when either n or m is small enough.
if (min(n, m) < FFT_CUTOFF) {
const uint64_t ULL_BOUND = numeric_limits<uint64_t>::max() - (uint64_t) MOD * MOD;
vector<uint64_t> result(n + m - 1, 0);
for (int i = 0; i < n; i++)
for (int j = 0; j < m; j++) {
result[i + j] += (uint64_t) ((int) left[i]) * ((int) right[j]);
if (result[i + j] > ULL_BOUND)
result[i + j] %= MOD;
}
for (uint64_t &x : result)
if (x >= MOD)
x %= MOD;
return vector<mod_int>(result.begin(), result.end());
}
int N = round_up_power_two(n + m - 1);
left.resize(N);
right.resize(N);
bool equal = left == right;
fft_iterative(N, left);
if (equal)
right = left;
else
fft_iterative(N, right);
mod_int inv_N = mod_int(N).inv();
for (int i = 0; i < N; i++)
left[i] *= right[i] * inv_N;
reverse(left.begin() + 1, left.end());
fft_iterative(N, left);
left.resize(n + m - 1);
return left;
}
vector<mod_int> mod_power(const vector<mod_int> &v, int exponent) {
assert(exponent >= 0);
vector<mod_int> result = {1};
if (exponent == 0)
return result;
for (int k = 31 - __builtin_clz(exponent); k >= 0; k--) {
result = mod_multiply(result, result);
if (exponent >> k & 1)
result = mod_multiply(result, v);
}
return result;
}
vector<mod_int> mod_multiply_all(const vector<vector<mod_int>> &polynomials) {
if (polynomials.empty())
return {1};
struct compare_size {
bool operator()(const vector<mod_int> &x, const vector<mod_int> &y) {
return x.size() > y.size();
}
};
priority_queue<vector<mod_int>, vector<vector<mod_int>>, compare_size> pq(polynomials.begin(), polynomials.end());
while (pq.size() > 1) {
vector<mod_int> a = pq.top(); pq.pop();
vector<mod_int> b = pq.top(); pq.pop();
pq.push(mod_multiply(a, b));
}
return pq.top();
}
}
// ================================================================
int n, k, P, A[N], Res[N], MP[N];
int main()
{
scanf("%d%d%d", &n, &P, &k);
for (int i = 0; i < n; i ++)
scanf("%d", &A[i]);
sort(A, A + n);
reverse(A, A + n);
int Zero = 0;
while (n && !A[n - 1])
Zero ++, n --;
Res[0] = (Power(n + Zero, k) - Power(n, k) + Mod) % Mod;
if (P == 2)
return !printf("%d %d\n", Res[0], Power(n, k));
int G = 3;
while(P > 3)
{
int c = 1;
for (int a = G; a != 1; a = a * G % P)
c ++;
if (c == P - 1)
break;
G += 2;
}
if (P == 3)
G = 2;
int c = 1;
for (int a = G; a != 1; a = a * G % P)
MP[a] = c, c = (c + 1) % P;
int lenp = 1;
while (lenp < P * 2 - 3)
lenp <<= 1;
mod_int rlenp = Power(lenp, Mod - 2);
vector < mod_int > F(lenp, 0), R(lenp, 0);
for (int i = 0; i < n; i ++)
F[MP[A[i]]] ++;
R[0] = 1;
for (int i = 0; i <= LGK; i ++, k >>= 1)
{
NTT::fft_iterative(lenp, F);
if (k & 1)
{
NTT::fft_iterative(lenp, R);
for (int j = 0; j < lenp; j ++)
R[j] *= F[j] * rlenp;
reverse(R.begin() + 1, R.end());
NTT::fft_iterative(lenp, R);
for (int j = 0; j < P - 1; j ++)
R[j] += R[j + P - 1];
for (int j = P - 1; j < lenp; j ++)
R[j] = 0;
}
if (i == LGK)
break;
for (int j = 0; j < lenp; j ++)
F[j] *= F[j] * rlenp;
reverse(F.begin() + 1, F.end());
NTT::fft_iterative(lenp, F);
for (int j = 0; j < P - 1; j ++)
F[j] += F[j + P - 1];
for (int j = P - 1; j < lenp; j ++)
F[j] = 0;
}
for (int i = 0, a = 1; i < P - 1; i ++, a = a * G % P)
Res[a] = R[i].val;
for (int i = 0; i < P; i ++)
printf("%d ", Res[i]);
return 0;
}
```

Time Complexity=O(P * logP *logK) (because frequency polynomial is of size P not N)

Space Complexity=O(P*logK)

### CHEF VIJJU’S CORNER

## More on Case of 0

The total number of sequences being N^k is obvious. For the other term - the sequences with no 0's, realize that we have N-F[0] choices for each chosen element. Hence, for k chosen elements, our choices are (N-F[0])^k

## Why are we not using nCk formulas

This is very basic of counting. Generally, nCk formulas implicitly mean a given object can be chosen only once. So thats out.

But what about the nCk formulas where an object can be chosen multiple times? Even if they did work, thats a lot more work than using the basic fundamental “We have N-F[0] choices for each index, hence (N-F[0])^k choices for k elements.” which is known to be correct.

## Basic Funda of Generator/Primitive Root

Say we have a set containing numbers from 1 to P-1 , and that P is prime. We call a number G generator of the set, if each of its power from 0 to P-2 modulo P correspond to a different number in the set. In other words, all the numbers in range 1 to P-1 can be written as powers of G.

And no, this is NOT going to be decimal or involve floating points. We can easily prove G is an integer, because G^1 is a power of G and it must represent one element of the set - hence G^1 is going to be one element in range 1 to P-1.

The main error people do while understanding this is that they forget this is modular arithmetic - even if G^k exceeds P, we are only interested in G^k \%P.

## Why powers are done modulo P-1 instead of P

To know the answer - ask yourself, why are you thinking of doing it modulo P in first place? Because everything happens modulo P everywhere? Note that I am assuming P to be prime here.

You’d be surprised to know how basic and yet tricky it is. We, in fact, do powers modulo P-1 because everything happens modulo P.

By Fermat’s little theorem, we know that K^{P-1} \%P=1. Now, say we want to find K^X \%P, where X is very large, in fact, larger than what you can store in 64-bit integer. For instance, say you know from problem that X=6^N for N upto 10^9. So you have to find \large K^{6^N} \%P.

How do we do this? Fermat’s little theorem to rescue! We know that K^{P-1}=1. Hence, we can rewrite X as X=A*(P-1)+B, and hence K^X=K^{A*(P-1)}*K^B=(K^{P-1})^A*K^B=1^A*K^B=K^B.

Hence we do modulo P-1 for powers.

- Say you have to calculate something like \large K^{3^{3^{N}}} \% P where P is prime. How will you proceed?

## On Polynomial used in All Possible Sums

The concept is actually simple and can be shown with example. Say you have an array A=[3,3,3] and another array B=[1,1]. What are the possible sums and number of ways to get them? The possible sum is 4, and can be obtained in 3*2=6 ways. Note that 2 ways are same if elements from same indices (for both arrays) are picked in both the ways. Hence its easy to see that number of ways is nothing but product of frequency.

Now, when we do a multiplication of these frequency polynomial - the powers of x add. This represents the sum which is possible. The coefficients of these powers (which are just the frequency of elements) multiply - which represent the number of ways to achieve them desired sum.

## How to go about these questions involving FFT

Usually I see many people struggling to learn FFT. Why they struggle is because, they try to go directly into the math behind it. Well, its really intimidating.

Do you know how pros deal with FFT? Its simple - they dont. Most of people just make themselves aware on where to use it. Instead of equipping themselves with why it is correct (where many people get lost and give up), they equip themselves with where they can use it if given a problem.

As you see in setter’s solution - many people generally use libraries and templates to solve these kind of questions, which is kind of good because you do not suffer from any bugs in implementation of those parts. That also stresses on the fact that they tackled the conept by knowing how to use it rather than why is it correct.

So if you are learning FFT, just know what it does, and where to use it. Skip those proof using complex numbers for now.

## Setter's Notes

**Expected Solution:** We can rewrite each A[i] as A[i] \% P. Now we have to handle the cases where the product becomes zero.

If there’s at least one zero among the K numbers, the product becomes zero. So we can calculate this.

Now we delete all the zeros. In order to solve the problem we have to realize that multiplication is hard and we have to transform it into something like addition. We can find a primitive root G for P and rewrite A[i] = X[i] such that G^X[i] = A[i]. Now multiplication in the previous problem becomes addition in this one. We add their powers when we multiply them.

So we are actually given a knapsack problem. We want to know in how many ways we can choose K numbers such that their sum modulo (P-1) equals R for all Rs. Define a polynomial F such that F[val] equals the number of vals in our new array. Now we can write F in point value form and raise each number to the Kth power and then rewrite F in polynomial form. This gives us the answer.

The only problem is that F should have a size of N*P. We can however fix this with a trick. Compute T[1] = F. Now compute T[2] = T[1] * T[1]. T[2] should have twice the size of T[1]. When we computed T[2], for each i \geq P-1 we can add the value of T[2][i] to T[2][i \% (P-1)] and only keep the first P-1 values. This way the size of T[2] remains P-1. We can then compute T[3], T[4], ..., T[log(K)]. This requires O(P*log(P)*log(K)).

Now to get the answer we should multiply some of this polynomials (there are at most log(K) polynomials) with the same trick described above (reducing by half). This gives us a O(P*log(P)*log(K)) solution.

## Related Problems

- Chef and Subsequences - More on Multiplication to Addition trick
- Three Tower Coloring -Fast Expo and Fermat’s little theorem.
- POLYEVAL - FFT
- BINOSUM - FFT and Combinatorics
- CLGSUM - FFT.