We are currently facing a situation where we cannot avoid a full collection scan. We have already optimized the query and the data structure, but we would like to go further and take full advantage of sharding and replication.
Configuration
- MongoDB version: 3.2
- mongo-java-driver: 3.2
- storage engine: WiredTiger
- compression: snappy
- database size: 6 GB
Document structure:
individuals collection
{
"_id": 1,
"name": "randomName1",
"info": {...}
},
{
"_id": 2,
"name": "randomName2",
"info": {...}
},
[...]
{
"_id": 15000,
"name": "randomName15000",
"info": {...}
}
values collection
{
"_id": ObjectId("5804d7a41da35c2e06467911"),
"pos": NumberLong("2090845886852"),
"val":
[0, 0, 1, 0, 1, ... 0, 1]
},
{
"_id": ObjectId("5804d7a41da35c2e06467912"),
"pos": NumberLong("2090845886857"),
"val":
[1, 1, 1, 0, 1, ... 0, 0]
}
The "val" array contain an element for each individual (so the length of the array is up to 15000). The id of the individual is it's corresponding index in the "val" array.
Query
The query is to find the documents in the values collection where the sum of val[individual._id], over a given list of individuals, is above a specific threshold. We can't simply pre-compute the sum of the array, because the list of individuals we want changes at runtime (we may want the result for only the first 2000 individuals, for example). This query uses the aggregation framework.
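For reference, here is a minimal sketch of what one such subquery could look like with the 3.2 Java driver, using the $arrayElemAt and $sum operators (both available in MongoDB 3.2). This is an illustration rather than our exact code: SubqueryExample, individualIds, lo, hi and threshold are placeholder names, and it assumes an individual's _id maps directly to its index in "val".

import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubqueryExample {
    // Runs one pos-range subquery: sum val[id] for the requested individuals
    // and keep only documents whose sum reaches the threshold.
    static List<Document> runSubquery(MongoCollection<Document> values,
                                      List<Integer> individualIds,
                                      long lo, long hi, int threshold) {
        // Build {$sum: [{$arrayElemAt: ["$val", id]}, ...]} for the wanted ids.
        // Use (id - 1) here instead if _id 1 maps to array index 0.
        List<Document> terms = new ArrayList<>();
        for (int id : individualIds) {
            terms.add(new Document("$arrayElemAt", Arrays.asList("$val", id)));
        }
        List<Document> pipeline = Arrays.asList(
            new Document("$match", new Document("pos",
                new Document("$gt", lo).append("$lte", hi))),
            new Document("$project", new Document("pos", 1)
                .append("score", new Document("$sum", terms))),
            new Document("$match", new Document("score",
                new Document("$gte", threshold)))
        );
        return values.aggregate(pipeline).into(new ArrayList<Document>());
    }
}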
What we're currently doing:
We split the query into 100-500 subqueries and run them in parallel, 5 at a time.
The first subquery covers documents where pos > 0 and pos <= 50000, the second where pos > 50000 and pos <= 100000, and so on (see the sketch below).
We would like to run more subqueries at the same time, but we see a performance loss when running more than 5 against a single mongod instance.
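Reusing runSubquery from the sketch above, the scheduling looks roughly like this (ParallelScan and all parameter names are illustrative):

import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScan {
    // Split the pos space into fixed-size ranges and run the subqueries with
    // a bounded level of parallelism (5 in our case, beyond which a single
    // mongod starts to degrade).
    static List<Document> scan(final MongoCollection<Document> values,
                               final List<Integer> individualIds,
                               final int threshold,
                               long maxPos, long rangeSize, int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        List<Future<List<Document>>> futures = new ArrayList<>();
        for (long lo = 0; lo < maxPos; lo += rangeSize) {
            final long from = lo;
            final long to = lo + rangeSize;
            futures.add(pool.submit(new Callable<List<Document>>() {
                public List<Document> call() {
                    return SubqueryExample.runSubquery(
                        values, individualIds, from, to, threshold);
                }
            }));
        }
        List<Document> results = new ArrayList<>();
        for (Future<List<Document>> f : futures) {
            results.addAll(f.get()); // the pool caps concurrency at 'parallelism'
        }
        pool.shutdown();
        return results;
    }
}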
So the question is: should we go for replication, for sharding, or for both, in order to run the maximum number of subqueries at the same time? And how should we configure MongoDB to dispatch the subqueries among the replicas/shards as evenly as possible?
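In case it helps frame an answer: with the 3.2 Java driver we could route reads to secondaries like this (hostnames and the database name are placeholders), but we don't know whether a replica set with secondary reads, a sharded cluster with pos as the shard key, or a combination of both would scale our subqueries best:

import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ReadPreference;
import com.mongodb.ServerAddress;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.Arrays;

public class ReplicaSetClient {
    // Connect to a replica set and allow reads on secondaries, so subqueries
    // can be spread across all members instead of hitting only the primary.
    static MongoCollection<Document> openValues() {
        MongoClientOptions options = MongoClientOptions.builder()
                .readPreference(ReadPreference.secondaryPreferred())
                .build();
        MongoClient client = new MongoClient(
                Arrays.asList(new ServerAddress("node1", 27017),
                              new ServerAddress("node2", 27017),
                              new ServerAddress("node3", 27017)),
                options);
        return client.getDatabase("mydb").getCollection("values");
    }
}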