Effective 2 GB limit on blend input
Bug #373398 reported by Rob Speer
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Divisi | Fix Committed | Medium | Rob Speer |
Bug Description
The blending code currently multiplies all the input data and puts it into a sparse matrix before running the blend SVD.
There may in fact be multiple copies of all the data: the original input tensors, the blend tensor, and the CSCMatrix.
This quickly hits the 2 GB memory limit in 32-bit Python (or, equivalently, quickly eats up 4 GB or more of RAM in 64-bit Python). We need a way to conserve memory. Some possibilities:
* Incremental approaches (perhaps using Jayant's 'hit all the zeros at once' idea to make incremental SVD spiky like Lanczos SVD is)
* SVD of SVDs (add the svd.u's together, not the input matrices, and svd again; sigma and v need to be reconstructed in other ways)
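As a rough illustration of the second idea, here is a minimal, hypothetical sketch in plain NumPy (not the actual Divisi API, with dense arrays standing in for the sparse tensors): each input gets its own truncated SVD, the small per-input factors are combined, and only that combination is decomposed again, so the full blended matrix never has to exist in memory.

```python
import numpy as np

def svd_of_svds(inputs, k=50):
    """Hypothetical sketch of the 'SVD of SVDs' idea, using dense NumPy
    arrays assumed to share the same row labeling and to have rank at
    least k.  Each input is reduced to a k-column summary (u * sigma)
    by its own truncated SVD; the summaries are combined and decomposed
    again, so the full blended matrix is never materialized."""
    summaries = []
    for a in inputs:
        u, s, _ = np.linalg.svd(a, full_matrices=False)
        summaries.append(u[:, :k] * s[:k])   # k-column summary of this input
    combined = sum(summaries)                # "add the svd.u's together"
    u2, s2, _ = np.linalg.svd(combined, full_matrices=False)
    # sigma and v for the blend would still have to be reconstructed in
    # some other way, as the bullet above notes.
    return u2[:, :k], s2[:k]
```

Each summary is only rows-by-k, so the second SVD is cheap; whether the blend's sigma and v can be recovered well enough from it is the open part of the idea.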
Changed in divisi:
assignee: nobody → Rob Speer (rspeer)
importance: Undecided → Medium
status: New → Confirmed
The blend tensor only has to keep the input tensors around if you want
to adjust the blending factors. It could throw them out otherwise.
Likewise, the conversion to a CSCMatrix could be made destructive. Or
SVDLIBC could be ported to work on a Tensor directly.
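For what a destructive conversion could look like, here is a small, hypothetical sketch (plain Python dicts and lists, not the real DictTensor/CSCMatrix classes): entries are popped out of the source as the column-ordered arrays are built, so peak memory stays near one copy of the data.

```python
from collections import defaultdict

def destructive_to_csc(entries, n_cols):
    """entries: dict mapping (row, col) -> value, with integer indices.
    The dict is drained as the CSC-style arrays are built, so the memory
    freed from the dict roughly offsets the memory the new arrays take."""
    by_col = defaultdict(list)
    while entries:
        (row, col), value = entries.popitem()   # frees the dict entry as we go
        by_col[col].append((row, value))
    data, row_indices, col_pointers = [], [], [0]
    for col in range(n_cols):
        for row, value in sorted(by_col.pop(col, [])):
            data.append(value)
            row_indices.append(row)
        col_pointers.append(len(data))
    return data, row_indices, col_pointers
```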
Another option is actually storing the biggest tensors on disk, using
(gasp!) ZODB. This is actually efficient. Sorta.
We also have some low-hanging fruit: DictTensor is storing Python
objects, not raw integers.
-Ken
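To put a rough number on that last point, here is a small, hypothetical comparison (standard library only, nothing Divisi-specific): a million values boxed as Python objects in a dict versus the same million values packed into a raw double array.

```python
import sys
from array import array

n = 1000000
boxed = {i: float(i) for i in range(n)}             # Python int keys, float objects
packed = array('d', (float(i) for i in range(n)))   # raw 8-byte doubles

print(sys.getsizeof(packed))   # about 8 MB for the packed doubles
print(sys.getsizeof(boxed))    # the dict's hash table alone, not counting the
                               # ~24-byte float object (and int key) per entry
```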