contrail-vrouter-agent:Agent introspect does not respond to request sometimes

Bug #1447937 reported by Sandip Dey
This bug affects 2 people
Affects              Status          Importance  Assigned to  Milestone
Juniper Openstack    Status tracked in Trunk
  R2.20              Fix Committed   High        Megh Bhatt
  Trunk              Fix Committed   High        Megh Bhatt

Bug Description

Requests to the agent introspect at <agent>:8085 get stuck.

It is happening in the sanity runs.

Reproduced the problem and showed it to Manish/Hari.

Prabhjot Singh Sethi (prabhjot) wrote :

Copying the debug info.

We took a gcore of vrouter-agent, and I have narrowed the issue down to the following check-in:
https://github.com/Juniper/contrail-controller/commit/902a5872d9c069a5beb9d236e899f67a8605351b#diff-949e5ca38dfdab9ce14ec60e30fe211c

No task is scheduled to process the entries in the concurrent queue request_queue_ of the HTTP session, so the session just hangs.

This happens because checking empty() and then calling push()/try_pop() is not an atomic operation. The enqueuing side checks the queue and finds it not empty, so it does not start a new task; but by the time it enqueues its entry, the parallel task has already drained the queue. After this entry is enqueued, no new task is started to process it.
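
To make the interleaving concrete, here is a deterministic, single-threaded replay of it. This is a sketch under stated assumptions: RequestQueue is a hypothetical mutex-based stand-in for tbb::concurrent_queue (each call is thread-safe on its own, but a sequence of calls is not atomic), and the producer and drain-task shapes are reconstructed from the description above, not copied from the HttpSession code.

```cpp
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for tbb::concurrent_queue: every member call is
// thread-safe by itself, but empty() followed by push() is not atomic.
class RequestQueue {
public:
    bool Empty() {
        std::lock_guard<std::mutex> lock(mu_);
        return q_.empty();
    }
    void Push(std::string req) {
        std::lock_guard<std::mutex> lock(mu_);
        q_.push_back(std::move(req));
    }
    bool TryPop(std::string& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }
private:
    std::mutex mu_;
    std::deque<std::string> q_;
};

struct ReplayResult {
    bool task_running;      // is any drain task still alive?
    bool request_stranded;  // is a request left in the queue?
};

// Replays, step by step on one thread, the racy interleaving.
ReplayResult ReplayRacyInterleaving() {
    RequestQueue queue;
    bool task_running = true;  // a drain task is already running...
    queue.Push("req-A");       // ...and request A is still queued

    // Producer, step 1: "queue not empty, so a task must be draining it".
    bool need_new_task = queue.Empty();  // false, because A is still there

    // The parallel drain task runs to completion in between.
    std::string req;
    while (queue.TryPop(req)) {
        // handle req
    }
    task_running = false;  // queue looked empty, so the task exits

    // Producer, step 2: enqueue B, acting on its stale check.
    queue.Push("req-B");
    if (need_new_task) task_running = true;  // not taken

    return {task_running, !queue.Empty()};
}
```

The replay ends with request B in the queue and no task alive to process it, which matches the hung session seen in the core.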

Please find the debug info pasted below.

We should consider replacing this task creation and concurrent queue with a workqueue, if that would help.
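
Whatever work queue is used, the property that matters is that the queue contents and the "a task is running" flag change under one lock. A minimal sketch, assuming a hypothetical SerialWorkQueue class (this is not Contrail's WorkQueue API): Enqueue() pushes and decides whether a drain task is needed in one critical section, and Drain() clears the flag in the same critical section in which it observes the queue empty.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical serial work queue: the items and the "drain task running"
// flag share one mutex, so "push, then decide whether to start a task"
// is a single atomic step.
template <typename T>
class SerialWorkQueue {
public:
    explicit SerialWorkQueue(std::function<void(const T&)> handler)
        : handler_(std::move(handler)) {}

    // Returns true when the caller must start a task that calls Drain().
    bool Enqueue(T item) {
        std::lock_guard<std::mutex> lock(mu_);
        items_.push_back(std::move(item));
        if (running_) return false;  // the running task will see this item
        running_ = true;             // caller claims the right to start one
        return true;
    }

    // Body of the drain task. The final empty check and the clearing of
    // running_ happen under the same lock that Enqueue() uses.
    void Drain() {
        for (;;) {
            T item{};
            {
                std::lock_guard<std::mutex> lock(mu_);
                if (items_.empty()) {
                    running_ = false;
                    return;
                }
                item = std::move(items_.front());
                items_.pop_front();
            }
            handler_(item);  // handle outside the lock
        }
    }

private:
    std::mutex mu_;
    std::deque<T> items_;
    bool running_ = false;
    std::function<void(const T&)> handler_;
};
```

A producer would call `if (wq.Enqueue(req))` and, only when that returns true, start a task that runs `wq.Drain()`. Because a new drain task can only be started after the previous one has cleared the flag, requests are still handled one at a time.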

(gdb) p SandeshHttp::hServ_
$1 = (HttpServer *) 0x7ff59806dc50
(gdb) pset $1->session_ref_ boost::intrusive_ptr<TcpSession>
elem[0]: $2 = {
  px = 0x30720f0
}
Set size = 1
(gdb) p (HttpSession *) 0x30720f0
$3 = (HttpSession *) 0x30720f0
(gdb) p $3->request_
request_builder_ request_queue_
(gdb) p $3->request_queue_
$4 = (tbb::strict_ppl::concurrent_queue<HttpRequest*, tbb::cache_aligned_allocator<HttpRequest*> >) {
  <tbb::strict_ppl::internal::concurrent_queue_base_v3<HttpRequest*>> = {
    <tbb::strict_ppl::internal::concurrent_queue_page_allocator> = {
      _vptr.concurrent_queue_page_allocator = 0x1103770
    },
    members of tbb::strict_ppl::internal::concurrent_queue_base_v3<HttpRequest*>:
    my_rep = 0x3093e00
  },
  members of tbb::strict_ppl::concurrent_queue<HttpRequest*, tbb::cache_aligned_allocator<HttpRequest*> >:
  my_allocator = {<No data fields>}
}
(gdb) p $3->context_
context_map_ context_str_
(gdb) p $3->context_str_
$5 = {
  static npos = <optimized out>,
  _M_dataplus = {
    <std::allocator<char>> = {
      <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
    members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
    _M_p = 0x3041db8 "http%10.204.217.75:8085::10.204.216.7:44164"
  }
}
(gdb) p $3->context_map_
$6 = (HttpSession::map_type *) 0x301aaf0
(gdb) p *$3->context_map_
$7 = {
  _M_t = {
    _M_impl = {
      <std::allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::intrusive_ptr<HttpSession> > > >> = {
        <__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::intrusive_ptr<HttpSession> > > >> = {<No data fields>}, <No data fields>},
      members of std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::intrusive_ptr<HttpSession> >, std::_Select1st<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::intrusive_ptr<HttpSession> > >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<...


Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → Megh Bhatt (meghb)
importance: Undecided → High
tags: added: base
summary: - contrail-vrouter-agent:Agent introspect does not respond to equest
+ contrail-vrouter-agent:Agent introspect does not respond to request sometimes
Megh Bhatt (meghb)
Changed in juniperopenstack:
status: New → In Progress
venu kolli (vkolli) wrote :

I observe this issue on the HA setup as well, and more often.

tags: added: blocker
Megh Bhatt (meghb) wrote :

Fixed via

Reviewed: https://review.opencontrail.org/9894
Committed: http://github.org/Juniper/contrail-controller/commit/c1802eeaf3035fc4bd4fd21a7eaae2e4a827c92e
Submitter: Zuul
Branch: master

commit c1802eeaf3035fc4bd4fd21a7eaae2e4a827c92e
Author: Megh Bhatt <email address hidden>
Date: Mon May 4 17:38:44 2015 -0700

Use return value of concurrent queue try_pop to determine
whether to enqueue HTTP session request queue processing
task. The current method of using empty with try_pop/push
without mutex can lead to situation where concurrent queue
is not empty and no task is enqueued.
Closes-Bug: #1447837

Change-Id: Ia475d3fa6b1d0fb540a159d5aed47a3dde96f2fc
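
The commit message describes the fix only at a high level, and the sketch below does not reproduce its diff. It illustrates one common way to make the start/stop decision race-free without a mutex around the whole sequence: keep an atomic count of enqueued-but-unhandled requests, let a single fetch_add/fetch_sub on that count decide who starts the drain task and when it stops, and have the drain task rely on try_pop's return value rather than a separate empty() call. ConcurrentQueue (a mutex-based stand-in for tbb::concurrent_queue), RequestDispatcher and the spawn callback are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for tbb::concurrent_queue<T>: each call is
// thread-safe; TryPop reports success through its return value.
template <typename T>
class ConcurrentQueue {
public:
    void Push(T v) {
        std::lock_guard<std::mutex> lock(mu_);
        q_.push_back(std::move(v));
    }
    bool TryPop(T& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }
private:
    std::mutex mu_;
    std::deque<T> q_;
};

// Hypothetical dispatcher: starting and stopping the drain task is decided
// by one atomic read-modify-write on pending_, never by an empty() check.
template <typename T>
class RequestDispatcher {
public:
    using Spawn = std::function<void(std::function<void()>)>;

    RequestDispatcher(std::function<void(const T&)> handler, Spawn spawn)
        : handler_(std::move(handler)), spawn_(std::move(spawn)) {}

    void Enqueue(T req) {
        queue_.Push(std::move(req));       // publish before counting
        if (pending_.fetch_add(1) == 0) {  // 0 -> 1: no drain task alive
            spawn_([this] { Drain(); });
        }
    }

private:
    void Drain() {
        do {
            T req{};
            // pending_ >= 1 here and every counted request was pushed
            // before it was counted, so this pop is expected to succeed.
            if (queue_.TryPop(req)) handler_(req);
        } while (pending_.fetch_sub(1) != 1);  // 1 -> 0: last one, stop
    }

    ConcurrentQueue<T> queue_;
    std::atomic<std::size_t> pending_{0};
    std::function<void(const T&)> handler_;
    Spawn spawn_;
};
```

The spawn callback stands in for the task scheduler; a test can run the task inline or on a std::thread. Only the 0 → 1 and 1 → 0 transitions of pending_ start or stop a task, so a request cannot be left queued with no task alive.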

Changed in juniperopenstack:
status: In Progress → Fix Committed
information type: Proprietary → Public
Raj Reddy (rajreddy)
tags: added: analytics
Megh Bhatt (meghb) wrote :