I'm using kolla mitaka.
The elasticsearch service is logging a lot of WARN messages like the following, which seem to be related to the limit on open files in the system.
Combined with https://bugs.launchpad.net/kolla/+bug/1634223, my controller ends up with a full disk.
{"log":"[2016-12-14 10:32:17,357][WARN ][cluster.action.shard ] [10.194.148.130] [log-2016.12.14][3] received shard failed for target shard [[log-2016.12.14][3], node[Dc0drTfbTVmaPqnRcgO-VQ], [P], v[2562], s[INITIALIZING], a[id=3bSbLDfFQRCMsCaQt_H6XQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-12-14T10:32:17.284Z], details[failed recovery, failure IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/data/kolla_logging/nodes/0/indices/log-2016.12.14/3/translog/translog-1610097421722097931.tlog: Too many open files]; ]]], indexUUID [a205oP2fQA2H4i_4Cvn8bA], message [failed recovery], failure [IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/data/kolla_logging/nodes/0/indices/log-2016.12.14/3/translog/translog-2129181083727687428.tlog: Too many open files]; ]\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357502633Z"}
{"log":"[log-2016.12.14][[log-2016.12.14][3]] IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/data/kolla_logging/nodes/0/indices/log-2016.12.14/3/translog/translog-2129181083727687428.tlog: Too many open files];\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357537296Z"}
{"log":"\u0009at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:250)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357557636Z"}
{"log":"\u0009at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357573309Z"}
{"log":"\u0009at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.35758758Z"}
{"log":"\u0009at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357601736Z"}
{"log":"\u0009at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357618113Z"}
{"log":"\u0009at java.lang.Thread.run(Thread.java:745)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357633582Z"}
{"log":"Caused by: [log-2016.12.14][[log-2016.12.14][3]] EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/data/kolla_logging/nodes/0/indices/log-2016.12.14/3/translog/translog-2129181083727687428.tlog: Too many open files];\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357647649Z"}
{"log":"\u0009at org.elasticsearch.index.engine.InternalEngine.\u003cinit\u003e(InternalEngine.java:155)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357662288Z"}
{"log":"\u0009at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357677185Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1515)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357691334Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1499)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357705544Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:972)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357719223Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:944)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357745674Z"}
{"log":"\u0009at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:241)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357762113Z"}
{"log":"\u0009... 5 more\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357777274Z"}
{"log":"Caused by: java.nio.file.FileSystemException: /var/lib/elasticsearch/data/kolla_logging/nodes/0/indices/log-2016.12.14/3/translog/translog-2129181083727687428.tlog: Too many open files\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357791456Z"}
{"log":"\u0009at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357806146Z"}
{"log":"\u0009at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357820267Z"}
{"log":"\u0009at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357843713Z"}
{"log":"\u0009at sun.nio.fs.UnixCopyFile.copyFile(UnixCopyFile.java:245)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357859138Z"}
{"log":"\u0009at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:579)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357873197Z"}
{"log":"\u0009at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357886438Z"}
{"log":"\u0009at java.nio.file.Files.copy(Files.java:1227)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357900243Z"}
{"log":"\u0009at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:344)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357914278Z"}
{"log":"\u0009at org.elasticsearch.index.translog.Translog.\u003cinit\u003e(Translog.java:179)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357928207Z"}
{"log":"\u0009at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:208)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357942722Z"}
{"log":"\u0009at org.elasticsearch.index.engine.InternalEngine.\u003cinit\u003e(InternalEngine.java:151)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357957482Z"}
{"log":"\u0009... 11 more\r\n","stream":"stdout","time":"2016-12-14T10:32:17.357971582Z"}
{"log":"[2016-12-14 10:32:17,390][WARN ][indices.memory ] [10.194.148.130] failed to set shard [log-2016.12.14][4] index buffer to [4mb]\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391099998Z"}
{"log":"org.apache.lucene.store.AlreadyClosedException: translog [13] is already closed\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391150226Z"}
{"log":"\u0009at org.elasticsearch.index.translog.TranslogWriter.ensureOpen(TranslogWriter.java:329)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391160003Z"}
{"log":"\u0009at org.elasticsearch.index.translog.BufferingTranslogWriter.updateBufferSize(BufferingTranslogWriter.java:143)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.3911688Z"}
{"log":"\u0009at org.elasticsearch.index.translog.Translog.updateBuffer(Translog.java:406)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391177599Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.updateBufferSize(IndexShard.java:1170)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.39118565Z"}
{"log":"\u0009at org.elasticsearch.indices.memory.IndexingMemoryController.updateShardBuffers(IndexingMemoryController.java:232)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391192183Z"}
{"log":"\u0009at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:286)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391205206Z"}
{"log":"\u0009at org.elasticsearch.indices.memory.IndexingMemoryController.forceCheck(IndexingMemoryController.java:245)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391212067Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.markLastWrite(IndexShard.java:1052)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391241708Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:970)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391249238Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:944)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.39125676Z"}
{"log":"\u0009at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:241)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391265237Z"}
{"log":"\u0009at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391271753Z"}
{"log":"\u0009at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391279651Z"}
{"log":"\u0009at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391287946Z"}
{"log":"\u0009at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391295084Z"}
{"log":"\u0009at java.lang.Thread.run(Thread.java:745)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391303032Z"}
{"log":"Caused by: java.nio.file.FileSystemException: /var/lib/elasticsearch/data/kolla_logging/nodes/0/indices/log-2016.12.14/4/translog/translog.ckp: Too many open files\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391312051Z"}
{"log":"\u0009at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391319534Z"}
{"log":"\u0009at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391326595Z"}
{"log":"\u0009at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391335493Z"}
{"log":"\u0009at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391342506Z"}
{"log":"\u0009at java.nio.channels.FileChannel.open(FileChannel.java:287)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391351404Z"}
{"log":"\u0009at java.nio.channels.FileChannel.open(FileChannel.java:334)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391357774Z"}
{"log":"\u0009at org.elasticsearch.index.translog.Checkpoint.write(Checkpoint.java:88)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391365528Z"}
{"log":"\u0009at org.elasticsearch.index.translog.TranslogWriter.writeCheckpoint(TranslogWriter.java:314)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391373645Z"}
{"log":"\u0009at org.elasticsearch.index.translog.TranslogWriter.checkpoint(TranslogWriter.java:304)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391381487Z"}
{"log":"\u0009at org.elasticsearch.index.translog.BufferingTranslogWriter.sync(BufferingTranslogWriter.java:132)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391411092Z"}
{"log":"\u0009at org.elasticsearch.index.translog.TranslogWriter.syncUpTo(TranslogWriter.java:288)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391446397Z"}
{"log":"\u0009at org.elasticsearch.index.translog.Translog.ensureSynced(Translog.java:672)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.39145355Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.sync(IndexShard.java:1633)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391460547Z"}
{"log":"\u0009at org.elasticsearch.action.support.replication.TransportReplicationAction.processAfterWrite(TransportReplicationAction.java:1035)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391468177Z"}
{"log":"\u0009at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:295)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391475551Z"}
{"log":"\u0009at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391491689Z"}
{"log":"\u0009at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391501027Z"}
{"log":"\u0009at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391509248Z"}
{"log":"\u0009at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391522671Z"}
{"log":"\u0009at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391531395Z"}
{"log":"\u0009at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391539113Z"}
{"log":"\u0009at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.39154619Z"}
{"log":"\u0009at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391553592Z"}
{"log":"\u0009... 3 more\r\n","stream":"stdout","time":"2016-12-14T10:32:17.391560986Z"}
{"log":"[2016-12-14 10:32:17,406][WARN ][index.translog ] [10.194.148.130] [log-2016.12.14][3] failed to delete temp file /var/lib/elasticsearch/data/kolla_logging/nodes/0/indices/log-2016.12.14/3/translog/translog-620932528422074502.tlog\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406464699Z"}
{"log":"java.nio.file.NoSuchFileException: /var/lib/elasticsearch/data/kolla_logging/nodes/0/indices/log-2016.12.14/3/translog/translog-620932528422074502.tlog\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406495014Z"}
{"log":"\u0009at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406503474Z"}
{"log":"\u0009at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406511354Z"}
{"log":"\u0009at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406516924Z"}
{"log":"\u0009at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406523258Z"}
{"log":"\u0009at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406530125Z"}
{"log":"\u0009at java.nio.file.Files.delete(Files.java:1079)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406538358Z"}
{"log":"\u0009at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:358)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406545081Z"}
{"log":"\u0009at org.elasticsearch.index.translog.Translog.\u003cinit\u003e(Translog.java:179)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406553862Z"}
{"log":"\u0009at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:208)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406565086Z"}
{"log":"\u0009at org.elasticsearch.index.engine.InternalEngine.\u003cinit\u003e(InternalEngine.java:151)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.40657703Z"}
{"log":"\u0009at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406587722Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1515)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406598903Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1499)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406623683Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:972)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406637516Z"}
{"log":"\u0009at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:944)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.4066508Z"}
{"log":"\u0009at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:241)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406664457Z"}
{"log":"\u0009at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406677259Z"}
{"log":"\u0009at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.40669053Z"}
{"log":"\u0009at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406703672Z"}
{"log":"\u0009at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406716748Z"}
{"log":"\u0009at java.lang.Thread.run(Thread.java:745)\r\n","stream":"stdout","time":"2016-12-14T10:32:17.406730381Z"}
We may have to increase the ulimit for nofile:
http://stackoverflow.com/questions/24318543/how-to-set-ulimit-file-descriptor-on-docker-container-the-image-tag-is-phusion
This is doable by passing the "--ulimit" argument to the "docker run" command of elasticsearch.
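As a rough sketch of what this could look like, the Python snippet below prints the nofile limit of the current process and builds the corresponding "--ulimit nofile=soft:hard" argument for "docker run". The helper name, the value 65536, and the image placeholder are assumptions for illustration only; this is not how kolla currently starts the container, and the right limit depends on the deployment.

```python
import resource


def ulimit_nofile_args(soft, hard):
    """Build the docker run arguments that set the nofile soft and hard limits."""
    if soft > hard:
        raise ValueError("soft limit must not exceed hard limit")
    return ["--ulimit", "nofile={}:{}".format(soft, hard)]


# Show the limit the current process runs with (Unix only).
current_soft, current_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("current nofile limit: soft={} hard={}".format(current_soft, current_hard))

# 65536 is an illustrative value, not a tested recommendation for kolla.
command = ["docker", "run"] + ulimit_nofile_args(65536, 65536) + ["<elasticsearch image>"]
print(" ".join(command))
```

Running the same getrlimit check inside the elasticsearch container would confirm whether the new limit was actually applied.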