Hi everybody!
Multiplying large matrices is extremely slow in my code, and I am looking for strategies to speed it up.
The classic triple for loop is extremely slow:
C(m, n) = A(m, k) * B(k, n)
for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        for (int p = 0; p < k; p++) {
            C(i, j) += A(i, p) * B(p, j);
        }
    }
}
This version is about 2.5 times faster, but unfortunately that is still not enough for large matrices:
float* dstPtr = output.m_data;
const float* leftPtr = m_data;  // start of the current row of the left matrix
for (size_t i = 0; i < m_rows; ++i) {
    for (size_t j = 0; j < other.m_cols; ++j) {
        // Top of column j of the right matrix.
        const float* rightPtr = other.m_data + j;
        float sum = leftPtr[0] * rightPtr[0];
        for (size_t n = 1; n < m_cols; ++n) {
            rightPtr += other.m_cols;  // step down one row in column j
            sum += leftPtr[n] * rightPtr[0];
        }
        *dstPtr++ = sum;
    }
    leftPtr += m_cols;  // advance to the next row of the left matrix
}
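In the version above, the inner loop still walks the right-hand matrix with a stride of other.m_cols, so consecutive iterations land far apart in memory. One commonly suggested change is to swap the two inner loops (i, p, j order) so that the innermost loop reads a row of B and writes a row of C contiguously. Here is a minimal sketch of that idea, assuming row-major float storage. The function name matmulIkj and its parameters are hypothetical and not part of my matrix class, and I have not benchmarked it:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the matrix class above):
// computes C (m x n) = A (m x k) * B (k x n), all stored row-major.
std::vector<float> matmulIkj(const std::vector<float>& A,
                             const std::vector<float>& B,
                             std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> C(m * n, 0.0f);  // accumulator must start at zero
    for (std::size_t i = 0; i < m; ++i) {
        float* cRow = C.data() + i * n;            // row i of C
        for (std::size_t p = 0; p < k; ++p) {
            const float a = A[i * k + p];          // A(i, p), reused for the whole row
            const float* bRow = B.data() + p * n;  // row p of B
            // Innermost loop touches cRow and bRow sequentially.
            for (std::size_t j = 0; j < n; ++j) {
                cRow[j] += a * bRow[j];
            }
        }
    }
    return C;
}
```

The arithmetic is the same as in the triple loop; only the order of the memory accesses changes, which also tends to make the inner loop easier for the compiler to vectorize.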
Does anyone have experience with this problem, or ideas on how to speed it up further?
Thanks a lot!