Sending 12000 byte buffers produces an obscure error.

Bug #1331897 reported by Dmitry Kiselev
This bug affects 1 person
Affects   Status          Importance   Assigned to    Milestone
Accelio   Fix Committed   High         Eyal Salomon

Bug Description

When I send 12000-byte buffers, an RDMA read is triggered within Accelio, but I get a CONNECTION_ERROR event instead of something meaningful.

The following block of code in xio_rdma_datapath.c (function xio_sched_rdma_rd_req) obscures the error.
In my opinion, EINVAL does not describe the case where no RDMA read is possible:

 if (user_assign_flag) {
  /* if user does not have buffers ignore */
  if (task->imsg.in.data_iovlen == 0) {
   WARN_LOG("application has not provided buffers\n");
   WARN_LOG("rdma read is ignored\n");
   task->imsg.status = XIO_E_PARTIAL_MSG;
   return -1;
  }
  for (i = 0; i < task->imsg.in.data_iovlen; i++) {
   if (task->imsg.in.data_iov[i].mr == NULL) {
    ERROR_LOG("application has not provided mr\n");
    ERROR_LOG("rdma read is ignored\n");
    task->imsg.status = EINVAL;
    return -1;
   }
   llen += task->imsg.in.data_iov[i].iov_len;
  }
  if (rlen > llen) {
   ERROR_LOG("application provided too small iovec\n");
   ERROR_LOG("remote peer want to write %zd bytes while" \
      "local peer provided buffer size %zd bytes\n",
      rlen, llen);
   ERROR_LOG("rdma read is ignored\n");
   task->imsg.status = EINVAL;
   return -1;
  }

If I am misreading this, please let us know.

Also, at first glance it looks inconsistent: you can send messages without registering a Memory Region up to a certain number of bytes, and then you add one byte and get a CONNECTION_ERROR.
If there is a limit there, the error reporting should be very precise, like ERR_MAX_LENGTH_WITH_NO_MR_IS_REACHED, or similar.
(Ideally it should be able to send all message sizes (even sliced), or nothing at all.)
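
To illustrate the kind of precise reporting I mean, here is a minimal, self-contained sketch. It is not Accelio code: the rd_status enum, struct rd_iov, and rd_check_user_bufs are hypothetical names made up for this example. It only mirrors the three checks from the snippet above and gives each failure its own status instead of EINVAL.

```c
#include <stddef.h>

/* Hypothetical status codes for illustration; not Accelio's constants. */
enum rd_status {
	RD_OK = 0,
	RD_E_NO_USER_BUFS,       /* application supplied no receive buffers */
	RD_E_NO_USER_MR,         /* a buffer has no registered memory region */
	RD_E_USER_BUF_OVERFLOW   /* buffers are smaller than the remote data */
};

/* Hypothetical, simplified stand-in for one receive iovec entry. */
struct rd_iov {
	void   *mr;       /* memory region handle, NULL if not registered */
	size_t  iov_len;  /* length of this buffer in bytes */
};

/*
 * Validate the application's receive buffers against the number of
 * bytes the remote peer wants to transfer (rlen). Each failure mode
 * maps to a distinct status so the caller can report it precisely.
 */
static enum rd_status rd_check_user_bufs(const struct rd_iov *iov,
					 size_t iovlen, size_t rlen)
{
	size_t llen = 0;
	size_t i;

	if (iovlen == 0)
		return RD_E_NO_USER_BUFS;

	for (i = 0; i < iovlen; i++) {
		if (iov[i].mr == NULL)
			return RD_E_NO_USER_MR;
		llen += iov[i].iov_len;
	}

	if (rlen > llen)
		return RD_E_USER_BUF_OVERFLOW;

	return RD_OK;
}
```

With something like this, a 12000-byte transfer into an 8192-byte registered buffer would report RD_E_USER_BUF_OVERFLOW, and a buffer with no MR would report RD_E_NO_USER_MR, rather than both collapsing into EINVAL.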

Please let me know if you had some other usage pattern in mind.

Thanks!
Dmitry

Eyal Salomon (esalomon)
Changed in accelio:
importance: Undecided → High
status: New → Fix Committed
assignee: nobody → Eyal Salomon (esalomon)
Eyal Salomon (esalomon) wrote :

The actions taken after this error was raised were invalid.
The application should not get a connection error; it should continue and be notified with the invalid message.
The message will carry the error in its status field.
Since there are various reasons for an error in that operation, a detailed error is printed in the log.
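
On the application side, this behavior could be handled roughly as in the sketch below. It is a simplified, self-contained illustration, not the Accelio API: struct app_msg, enum app_action, and app_on_msg are hypothetical names, and a zero status is assumed to mean success.

```c
#include <stdio.h>

/* Hypothetical, simplified view of a delivered message. */
struct app_msg {
	int    status;    /* assumed: 0 on success, nonzero error code otherwise */
	size_t data_len;  /* payload length in bytes */
};

enum app_action {
	APP_PROCESS,  /* payload is valid, hand it to the application logic */
	APP_DISCARD   /* message arrived with an error, skip the payload */
};

/*
 * Decide what to do with a delivered message. An error status does not
 * tear down the connection; the message is simply not processed.
 */
static enum app_action app_on_msg(const struct app_msg *msg)
{
	if (msg->status != 0) {
		fprintf(stderr,
			"message delivered with error status %d, discarding\n",
			msg->status);
		return APP_DISCARD;
	}
	return APP_PROCESS;
}
```

The point of the sketch is only that the error travels with the message and the session stays usable for later messages.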

Eyal Salomon (esalomon) wrote :

fixed in commit 7c49e07c4a324cc4c29177b204f0393a00885354 in the "for_next" branch

Dmitry Kiselev (dmitri-kiselev) wrote :

I checked your last change in the for_next branch, with the new error constants - and it looks good to me.
Thanks!
