Boosting dark matter searches at muon colliders with Machine Learning: the mono-Higgs channel as a case study

The search for dark-matter (DM) candidates at high-energy colliders is one of the most promising avenues to understand the nature of this elusive component of the universe. Several searches at the Large Hadron Collider (LHC) have strongly constrained a wide range of simplified models. The combination of the bounds from the LHC with those from direct-detection experiments excludes the most minimal scalar singlet DM model. To address this, lepton portal DM models are suitable candidates in which DM is predominantly produced at lepton colliders, since the DM candidate interacts only with the lepton sector through a mediator that carries lepton number. In this work, we analyse the production of DM pairs in association with a Higgs boson decaying into two bottom quarks at future muon colliders in the framework of the minimal lepton portal DM model. We find that the usual cut-based analysis methods fail to probe heavy DM masses in both the resolved regime (where the decay products of the Higgs boson can be resolved as two well-separated small-$R$ jets) and the merged regime (where the Higgs boson is clustered as one large-$R$ jet). We therefore build a search strategy based on Boosted Decision Trees (BDTs). We optimise the hyperparameters of the BDT model both to achieve a high signal-to-background ratio and to avoid overtraining effects. We find that the signal significance is enhanced with respect to the cut-based analysis by factors of $8$--$50$, depending on the regime (resolved or merged) and the benchmark point. Using this BDT model on a one-dimensional parameter space scan, we find that future muon colliders with $\sqrt{s}=3$ TeV and ${\cal L} = 1~{\rm ab}^{-1}$ can exclude DM masses up to $1$ TeV at the $95\%$ CL.


I. INTRODUCTION
Weakly interacting massive particles (WIMPs) are suitable candidates which extend the Standard Model (SM) in order to solve the dark-matter (DM) problem [1][2][3][4]. In particular, WIMPs with masses of about 100 GeV yield a DM density of $\Omega_{\rm DM} h^2 = 0.1198 \pm 0.0015$, in agreement with the Planck observation [5]. In addition, WIMPs can be connected to other problems such as the hierarchy problem, the baryon asymmetry of the universe or the nature of the neutrino mass generation mechanism. The quest to understand the nature of WIMPs has driven many searches in direct detection, indirect detection and collider experiments. At the Large Hadron Collider (LHC), the ATLAS and CMS collaborations have carried out several searches in channels dubbed mono-X, i.e. channels where a visible particle recoils against a large missing transverse energy ($E_T^{\rm miss}$). Unfortunately, none of these searches found a signal beyond the expected SM backgrounds, and model-independent bounds were placed on the production cross section as a function of the DM mass. Strong bounds from direct-detection experiments were also imposed as a consequence of the absence of new DM signals [6,7]. By combining the results from both collider and direct-detection experiments, we can narrow down the range of plausible beyond-the-SM scenarios aimed at addressing the DM problem. While the constraints from direct-detection experiments can be circumvented if, for example, the DM candidate is a right-handed fermion$^1$, earlier and current collider searches strongly constrain minimal models (see for example a recent reinterpretation of multijet+$E_T^{\rm miss}$ searches [11]). To avoid these issues, the DM candidate may couple solely to the lepton sector of the SM. In such scenarios the DM is called leptophilic, and the models that contain such a DM candidate are called lepton portal models. In the case of fermionic DM candidates, the interaction Lagrangian resembles that of the slepton-lepton-neutralino interaction in supersymmetric models. The most minimal choice in this case consists of extending the SM with only two $SU(2)_L$ singlets. The phenomenology of these minimal models has been studied in Refs. [12][13][14][15][16][17][18][19].
The LHC programme for DM searches is not yet over, with the possibility of probing DM masses as high as 0.5-2 TeV depending on the model. However, there is a priori no reason not to consider alternative future collider experiments which can both achieve high center-of-mass energies and provide very clean environments. Muon colliders are expected to provide both these characteristics at relatively low cost as compared to e.g. future circular hadron colliders (FCC-hh). There is a growing interest in the physics potential of muon colliders since they can probe new physics beyond the SM at very high scales [20][21][22]. There are several reasons for this interest. First, muon colliders can achieve high signal-to-background ratios as compared to the LHC. Second, given that muons are elementary particles, the center-of-mass energy required to achieve the same beam-level cross section is always orders of magnitude smaller than that in $pp$ collisions. Finally, at energies much higher than the production threshold of heavy resonances, muon colliders become vector-boson colliders where the dominant production channels occur through vector-boson fusion (VBF) [23,24], in which case many beyond-the-SM processes are free of backgrounds. Extensive studies of both the SM and beyond the SM have been carried out in the literature [19,.
The main production mechanism of DM at muon colliders is through mono-X channels. For muon colliders, the phenomenology of DM in the mono-photon and mono-muon channels has been studied at the parton level in Ref. [28]. The mono-Higgs channel is however unique: due to the smallness of the muon Yukawa coupling, the SM Higgs boson can only be produced from the interaction with the dark sector, unlike the photon in the mono-photon channel, which can also be produced from Initial-State Radiation (ISR). Therefore, the mono-Higgs channel can be a very important channel to study the characteristics of the underlying model. The mono-Higgs channel was suggested some time ago in Ref. [64] and has been studied extensively for both hadron colliders and lepton colliders (see e.g. Refs. [65][66][67][68][69][70][71][72][73][74][75][76][77]). In Ref. [19], a comprehensive analysis of DM production within the minimal lepton portal DM model was carried out, including mono-Higgs production. This article presents a first study in a series of upcoming works in which we fully analyse the most prominent channels for DM at muon colliders using state-of-the-art tools. In this study, we analyse the production of DM in association with a Higgs boson decaying into bottom quarks at future muon colliders using a center-of-mass energy of 3 TeV and a total luminosity of 1 ab$^{-1}$. The final state consists of jets and missing energy. We consider both the resolved regime, where the decay products of the Higgs boson are two well-separated jets, and the merged regime, where the Higgs boson is identified with a single large-$R$ jet. First, we use a cut-based analysis inspired by the searches for DM in the mono-Higgs channel that have been carried out by the ATLAS and CMS collaborations [78,79]. We find that cut-based search strategies are not sensitive to heavy DM masses. We then employ a boosted decision tree (BDT) algorithm as implemented in XGBoost to further improve the signal-to-background ratio. We find that the BDT model enhances the significance by factors of 8-50 depending on the analysis region and the DM mass, and that DM masses up to 1 TeV can be probed at future muon colliders if one employs BDTs.
The rest of this paper is organised as follows. In section II we briefly describe the theoretical setup, including the benchmark points and the scanning procedure. In section III we define the technical setup and discuss the signal and background cross sections and the object definitions at the detector level. Section IV is devoted to a detailed cut-based analysis. We then study the sensitivity reach using the BDT model in section V. We conclude in section VI.

II. THEORETICAL SETUP
In this section, we discuss the theoretical setup used in this work. We briefly introduce the model, its particle content and its parameters. We close this section with a discussion of the benchmark points and the scanning procedure to be used in the collider analysis.

A. The model
We extend the SM by two $SU(2)_L$ gauge-singlet fields: a charged scalar ($S$) and a right-handed fermion ($N_R$). The quantum numbers of these new states are shown below.

TABLE I. The new particles of the model and their representations.

In this setup, we have introduced an accidental discrete symmetry ($Z_2$) under which the new particles are odd while all the SM particles are even. In this case, the right-handed fermion, being a neutral particle, is a suitable candidate for DM, assuming it is lighter than the singlet scalar. We furthermore assume that the charged scalar singlet carries lepton number, which implies that it only interacts with the SM charged leptons (right-handed leptons in particular). The resulting interaction is similar to the slepton-neutralino-lepton interaction in supersymmetric theories, with the exception that here a single slepton-like particle interacts with all the lepton generations. Under these assumptions, the most general Lagrangian is given by

where $\mathcal{L}_S$ represents the kinetic term for the charged scalar singlet, given by

Here, in the first line, the first term refers to the kinetic energy of $S$, while the second term represents the triple interaction of $S$ with the photon and the $Z$ boson. The second line describes the quartic interaction of $S$ with $\gamma/Z$. In this equation, $\theta_W$ denotes the weak mixing angle. The notation $\mathcal{L}_N + \mathcal{L}_{SN}$ refers to the Lagrangian of the right-handed fermion and its interaction with $S$, which can be expressed as follows:

where $M_N$ is the mass of the $N_R$ particle and $Y_\ell$, $\ell = e, \mu, \tau$, are assumed to be real-valued couplings. In the last term, the sum over the lepton generations is implicit. Finally, the scalar potential is given by

where $\Phi$ is the SM Higgs doublet,

and $\upsilon = (\sqrt{2} G_F)^{-1/2}$ is the vacuum expectation value (VEV), while $G^0$ and $G^+$ are the Nambu-Goldstone bosons that form the longitudinal polarizations of the $Z$ and $W$ bosons, respectively. Following electroweak symmetry breaking, we are left with three scalars: the CP-even scalar, identified as the 125 GeV Higgs boson, and a pair of charged scalars denoted as $H^\pm$ in the subsequent discussion. Their tree-level masses are determined by:

In addition to the SM parameters, this model has seven more parameters defined by

The model is subject to various theoretical and experimental constraints. In this study, we take into account constraints from the stability of the scalar potential, unitarity of the scattering amplitudes, bounds from Higgs boson decays and lepton flavour violation. Additionally, we account for the constraints associated with the dark matter relic density and the direct-detection experiments. Finally, recasting the LHC searches for sleptons and neutralinos implies that mediator masses below 400 GeV are excluded. For more details, we refer the interested reader to Refs. [16,19].

B. Benchmark points and scanning procedure
We start by examining four benchmark points, shown in Table II, which have been previously considered in Ref. [19]. The differential distributions and the performance of our algorithms will be discussed in detail for these benchmark scenarios. Interestingly, the model predicts very simple relations between physical observables (production cross sections, relic density and spin-independent cross sections) and model parameters as follows:

where $F$, $G$ and $H$ are functions of $M^2_{N_R}$ and $M^2_{H^\pm}$. Therefore, the physical observables exhibit simple scaling properties when changing from one choice of the parameters $Y_\mu$ and $\lambda_3$ to another. We therefore perform the following scan over the mass of the DM particle of the model for

TABLE III. The total cross sections times the branching ratio ($\sigma \times {\rm BR}$) and the expected number of signal events for the $N_R N_R H_{\rm SM}(\to b\bar{b})$ process. We consider three representative center-of-mass energies of 3, 10 and 30 TeV and show the results for the benchmark points defined in Table II.

III. TECHNICAL SETUP

A. Signal and backgrounds
In this work, we investigate the potential discovery of dark matter at muon colliders using the mono-Higgs channel: $N_R N_R H_{\rm SM}(\to b\bar{b})$. Our analysis focuses on a collision energy of $\sqrt{s} = 3$ TeV and an integrated luminosity of 1 ab$^{-1}$. This process leads to a final state comprising missing energy and at least two $b$-tagged jets (in the resolved case) or at least one large-$R$ jet (in the boosted regime). The parton-level Feynman diagrams for the signal and the backgrounds are depicted in figures 1 and 2. The signal cross section receives two contributions, both occurring in the $t$- and $u$-channels. The first contribution occurs through the double exchange of the charged singlet scalar (Fig. 1-a), while the second occurs through the exchange of a muon in the $t$-channel (Fig. 1-b). The second contribution is negligible since it is suppressed by the smallness of the Higgs-muon Yukawa coupling. The backgrounds can be split into two categories depending on their exact signature at the parton level:
• Irreducible backgrounds: This category involves either the production of the SM Higgs boson in association with two SM neutrinos, or the production of two gauge bosons ($ZZ$/$WZ$/$WW$) where one gauge boson decays hadronically while the other decays invisibly (in the case of the $Z$ boson) or leptonically (in the case of the $W$ boson) with the charged lepton escaping detection. Note that diboson production can be significantly reduced by requiring the invariant mass of the hadronically decaying gauge boson to lie outside its on-shell mass window.
• Reducible backgrounds: This category contains the production of $t\bar{t}$ and $t\bar{t} + W/Z$, or production through neutral-current VBF (i.e. involving two charged leptons). This category can be significantly reduced by requirements on the number of hard charged leptons, or on the invariant mass of the $b\bar{b}$ system that forms the Higgs candidate.

FIG. 2. Example of Feynman diagrams for the background processes contributing to the $b\bar{b} + E_T^{\rm miss}$ final state. Here we show the muon-annihilation channels (upper panel) and VBF channels (lower panel).
We present the cross-section results for the signal process in Table III, corresponding to the four benchmark points defined in Table II, and show them in Figure 3 as a function of the DM mass ($M_{N_R}$). The background cross sections are shown in Table IV.
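The expected event yields quoted alongside the cross sections follow from the usual relation $N = (\sigma \times {\rm BR}) \times {\cal L}$. A minimal sketch of this arithmetic is given here; the function name and the numerical cross section are hypothetical and are not taken from Table III.

```python
def expected_events(sigma_times_br_fb: float, luminosity_ab: float) -> float:
    """Expected yield N = (sigma x BR) x L, with sigma x BR in fb and L in ab^-1."""
    fb_inv_per_ab_inv = 1000.0  # 1 ab^-1 = 1000 fb^-1
    return sigma_times_br_fb * luminosity_ab * fb_inv_per_ab_inv


# Hypothetical cross section, for illustration only (not a value from Table III).
n_signal = expected_events(sigma_times_br_fb=0.5, luminosity_ab=1.0)
print(n_signal)
```

The only point to watch is the unit conversion between fb and ab$^{-1}$.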

B. Monte Carlo event generation
Samples for the signal and the backgrounds were generated using MadGraph5_aMC@NLO version 3.4.1 [80], where we have used a dedicated model file in the UFO format [81] which we have produced using FeynRules version 2.3.0 [82]. The model file along with instructions on how to use it can be found in this link. For the generation of both the signal and the background events, we have imposed some generator-level cuts: $p_T^j > 10$ GeV and $|\eta_j| < 6$. Background processes, on the other hand, receive sizeable contributions from the VBF channels, i.e. $VV \to X$. The rates of these processes can be computed either by considering the VBF/VBS of two gauge bosons through the $\mu^+\mu^-$ process along the lines of Ref. [23], or by treating the electroweak gauge bosons as partons within the muons, described by parton distribution functions (PDFs) [24]. Given that there is no validated treatment of initial-state gauge boson PDFs for parton showers, we use the first approach in our simulation of the VBF processes in the SM. For example, to simulate $\mu^+\mu^- \to W^*W^* \to t\bar{t} + X$, we use the following syntax in MadGraph5_aMC@NLO:

    > import model sm
    > generate mu+ e- > t t~ vm~ ve
    > add process mu+ e- > t t~ mu+ e-
    > output MyOutput

This syntax is necessary to isolate VBF contributions from the corrections of initial-state radiation (ISR) or final-state radiation (FSR) to $s$-channel contributions. Note that the syntax above corresponds to $\mu^+ e^-$ scattering, and similar results are found for the charge-conjugate process, i.e. $\mu^- e^+$ scattering. In our calculations, we have considered both neutral-current and charged-current contributions to the VBF processes. For example, $VV \to t\bar{t}$ in Table IV includes contributions from both $\mu^+ e^- \to t\bar{t}\,\bar{\nu}_\mu \nu_e$ and $\mu^+ e^- \to t\bar{t}\,\mu^+ e^-$. We have checked that our calculations of the background cross sections are in excellent agreement with the results of Ref. [23]. As for the signal processes, we found that there is no contribution to the production of $N_R$ through VBF. This can be understood since the $N_R$ particles are $SU(2)_L \otimes U(1)_Y$ singlets and therefore do not couple directly to the $\gamma/Z$ or $W$ gauge bosons. For each signal benchmark point and for all the backgrounds we have generated about $9\times10^5$-$3\times10^6$ parton-level events. The produced events are passed to Pythia version 8.307 [83] to add resonance decays, parton showering and hadronisation.

C. Object definitions
In this section, we discuss the object definitions at the reconstruction level that we have used in our analysis. In this work, we define charged leptons (electrons or muons), hadronically decaying tau leptons ($\tau_h$), small-$R$ jets, large-$R$ jets and missing transverse momentum. The details are given below:
• Electron candidates: Electron candidates are required to have $p_T^e > 7$ GeV and $|\eta| < 6$.
• Muon candidates: Muon candidates are required to have $p_T > 7$ GeV and $|\eta| < 6$.
• Small-R Jets: Candidate jets are reconstructed with the anti-$k_t$ algorithm with a radius parameter $R = 0.4$ and are referred to as "small-$R$ jets" [84]. These jets are further required to have $p_T > 25$ GeV and $|\eta_j| < 6$. We use a ghost-based approach to tag small-$R$ jets as $b$-jets, where we assume a 70% $b$-tagging efficiency and a $p_T$-dependent mistagging efficiency of light and charm jets as $b$-jets, i.e.
• Large-R Jets: In our analysis, the SM Higgs boson can be produced with very high transverse momentum. In such cases, the hadronic decay products of the SM Higgs boson cannot be resolved into two isolated jets, and clustering based on a small-$R$ jet radius therefore has very small efficiencies. For this reason, we also utilize large-$R$ jets in our analysis. A large radius parameter is chosen so that a single large-$R$ jet captures all the constituents produced in the decay of a boosted Higgs boson. We perform two independent clustering procedures along the lines of the ATLAS [78] and CMS [79] analyses. First, we cluster jets using the anti-$k_t$ clustering algorithm with a jet radius of $R = 1$ [84]; these jets are labeled as AK10 jets. Furthermore, we apply a trimming algorithm to remove soft radiation [85]. For this purpose we use the $k_t$ algorithm [86], removing any subjet of radius $R = 0.2$ that carries less than 5% of the total AK10 jet energy. Second, we cluster jets using the Cambridge-Aachen algorithm with a jet radius of $R = 1.5$ [87]; these jets are denoted as CA15 jets. In this case, the soft-drop jet grooming algorithm [88] is employed to remove soft and wide-angle radiation from the CA15 jets, where we use $\beta = 1$ and $z_{\rm cut} = 0.1$. For the training on the signal and the backgrounds, and in order to optimise the sensitivity of the analysis, we use ratios of the energy correlation functions constructed from the CA15 jets. In this analysis, we construct two variables, $N_2$ and $M_2$, which were found to be very powerful in discriminating two-prong boosted objects from QCD jets or three-prong boosted jets [89]. They are defined as
$$M_2^{(\beta)} = \frac{{}_1e_3^{(\beta)}}{{}_1e_2^{(\beta)}}, \qquad N_2^{(\beta)} = \frac{{}_2e_3^{(\beta)}}{\left({}_1e_2^{(\beta)}\right)^2},$$
where ${}_ke_i^{(\beta)}$ are the generalized $i$-point energy correlation functions for the $k$ pair-wise angles that enter their products, and $\beta$ is a parameter that controls the overall angular scaling of these operators. In this analysis, we choose $\beta$ to be equal to 1. All the large-$R$ jets are required to have $p_T > 200$ GeV and $|\eta| < 2.5$.
• Missing Transverse Energy: The missing transverse momentum $\vec{p}_T^{\rm \,miss}$ (with magnitude $E_T^{\rm miss}$) is the negative vector sum of the $p_T$ of all selected and calibrated objects in the event, including a term to account for energy from soft particles in the event which are not associated with any of the selected objects.
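The two grooming criteria quoted for the large-$R$ jets can be illustrated with a short sketch. It assumes the standard soft-drop condition $z > z_{\rm cut}(\Delta R / R_0)^\beta$ with $R_0$ set to the CA15 jet radius, and a trimming step that keeps subjets carrying at least 5% of the jet energy. The function names and input values are hypothetical, and this is not the authors' implementation, which relies on FastJet.

```python
def trim_subjets(subjet_energies, jet_energy, f_cut=0.05):
    """Keep subjets carrying at least f_cut of the ungroomed jet energy."""
    return [e for e in subjet_energies if e >= f_cut * jet_energy]


def passes_soft_drop(pt1, pt2, delta_r, z_cut=0.1, beta=1.0, r0=1.5):
    """Soft-drop condition for one declustering step.

    z is the momentum fraction of the softer branch; r0 is assumed to be
    the jet radius used for the CA15 clustering.
    """
    z = min(pt1, pt2) / (pt1 + pt2)
    return z > z_cut * (delta_r / r0) ** beta


# Hypothetical inputs, for illustration only.
print(trim_subjets([60.0, 37.0, 3.0], jet_energy=100.0))
print(passes_soft_drop(100.0, 20.0, delta_r=0.75))
print(passes_soft_drop(100.0, 5.0, delta_r=1.5))
```

In a full soft-drop procedure, a failing step discards the softer branch and the test is repeated on the harder one; only the single-step condition is shown here.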
We further impose isolation requirements on charged leptons. A lepton isolation criterion is defined by imposing a cut on the following quantity

where the sum includes all tracks (excluding the lepton candidate itself) within the cone defined by $\Delta R < R_{\rm cut}$ about the direction of the charged lepton. The value of $R_{\rm cut}$ is the smaller of $r_{\rm min}$ and $10~{\rm GeV}/p_T^\ell$, where $r_{\rm min}$ is set to 0.3 for both the electron and the muon candidates, and $p_T^\ell$ is the lepton transverse momentum. All the charged lepton candidates must satisfy $I_R/p_T^\ell < 0.3$, which defines a loose-isolation criterion. Overlap removal is used to remove leptons or jets if they are within some defined $\Delta R$ of a given object. Electron (muon) candidates that lie within $\Delta R = 0.2$ (0.4) of a jet candidate are removed. Jets are also required to be separated by $\Delta R > 0.4$ from other leptons or jets in the event.
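The variable-cone isolation described above can be sketched as follows. Since the defining equation for $I_R$ is not reproduced here, the sketch assumes that $I_R$ is the scalar sum of the track transverse momenta inside the cone; the dictionary layout and function names are hypothetical.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the azimuthal difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def is_loosely_isolated(lepton, tracks, r_min=0.3, iso_max=0.3):
    """Loose isolation: I_R / pT < iso_max, with R_cut = min(r_min, 10 GeV / pT).

    Assumption: I_R is the scalar pT sum of tracks within the cone. The
    caller must exclude the lepton's own track from `tracks`.
    """
    r_cut = min(r_min, 10.0 / lepton["pt"])
    i_r = sum(
        t["pt"]
        for t in tracks
        if delta_r(lepton["eta"], lepton["phi"], t["eta"], t["phi"]) < r_cut
    )
    return i_r / lepton["pt"] < iso_max


# Hypothetical lepton and tracks (pT in GeV), for illustration only.
lepton = {"pt": 50.0, "eta": 0.0, "phi": 0.0}
tracks = [
    {"pt": 10.0, "eta": 0.1, "phi": 0.1},   # inside the cone (R_cut = 0.2)
    {"pt": 100.0, "eta": 0.3, "phi": 0.0},  # outside the cone
]
print(is_loosely_isolated(lepton, tracks))
```

The cone shrinks as the lepton becomes harder, so that energetic leptons close to other activity are not rejected unnecessarily.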
Fast detector simulation is performed using the SFS module [90] in MadAnalysis 5 [91][92][93][94][95][96]. The calculation of the jet observables, including the clustering of large-$R$ jets and the removal of the soft radiation, is done with the help of a customised C++ analysis within the Substructure module [97]. All the jets were clustered using FastJet version 3.4.0 [98]. The momentum smearing and identification efficiencies for electrons are implemented from the detector design of FCC-hh that is shipped in Delphes 3.4.0 [99] (can be found in https://github.com/delphes/delphes/blob/master/cards/delphes_card_MuonColliderDet.tcl). For the muon smearing and identification efficiencies we use the results of Ref. [100].

FIG. 4. Differential cross section per bin for the four benchmark points defined in Table II and the background processes shown as stacked histograms in the resolved regime. From left to right, we show the missing transverse energy, the transverse momentum of the Higgs candidate, and the invariant mass of the Higgs candidate. More details can be found in the text.

IV. CUT-BASED ANALYSIS
In this section we discuss the basic approach for the signal-to-background analysis, which consists of performing a basic event selection on both the signal and the background processes. We follow the strategies of the recent ATLAS [78] and CMS [79] searches for DM produced in association with a Higgs boson decaying to bottom quarks.

A. Resolved regime
For the resolved regime, we assume that the Higgs boson candidate is reconstructed from two well-separated small-$R$ $b$-tagged jets. The key distributions for the signal-to-background optimisation are shown in Fig. 4. First, we require that the missing transverse energy is larger than 100 GeV. Electrons and muons that pass the loose isolation criteria defined in the previous section are vetoed. This criterion reduces the VBF processes that occur through neutral-current interactions, i.e. $Z^*Z^* \to X$. Events that contain at least one hadronically decaying $\tau$-lepton satisfying $p_T^\tau > 15$ GeV and $|\eta_\tau| < 6$ are vetoed. Events are further required to contain no photon with $p_T > 10$ GeV and $|\eta| < 6$. We furthermore require that the events contain at least three small-$R$ untagged jets that satisfy $p_T > 25$ GeV and $|\eta| < 6$, where at least two of them are $b$-tagged with $|\eta_b| < 2.5$. To reduce backgrounds where $E_T^{\rm miss}$ arises from mismeasurement of the jet $p_T$ or from leptonic decays of heavy flavours, we require the minimum $\Delta\phi$, defined by
$$\Delta\phi_{\rm min} \equiv \min\left\{\Delta\phi(\vec{p}^{\rm \,miss}, j_1),\ \Delta\phi(\vec{p}^{\rm \,miss}, j_2),\ \Delta\phi(\vec{p}^{\rm \,miss}, j_3)\right\},$$
to be larger than $20^\circ$. Here, $\Delta\phi(x, y) \equiv |\phi_x - \phi_y|$ and $j_{1,2,3}$ are the momenta of the three leading jets in the event. The Higgs boson candidate is reconstructed from the momenta of the two leading $b$-jets. Therefore, we require that all the events have exactly two $b$-jets. To restrict the analysis to the resolved regime, we further require that $300~{\rm GeV} < E_T^{\rm miss} \leq 1000$ GeV. Note that this requirement is slightly different from the ATLAS definition of the resolved region [78]. Higgs boson candidates are required to have a transverse momentum larger than 300 GeV. To ensure that the reconstructed Higgs boson candidate and the missing transverse momentum ($\vec{p}^{\rm \,miss}$) are back-to-back, we require that $\Delta\phi(\vec{p}^{\rm \,miss}, \vec{p}_H) > 2\pi/3$. Backgrounds where $E_T^{\rm miss}$ originates from a leptonically decaying $W$ boson have the particular property that the transverse mass formed by $E_T^{\rm miss}$ and either the leading or the subleading $b$-jet is bounded from above by the top quark mass. These two variables are defined as

with p lead GeV. The total number of jets, including both tagged and untagged jets, is required to be less than 3. The effect of this cut is however very minor for all the processes. Finally, we require the invariant mass of the Higgs candidate to satisfy $80~{\rm GeV} < m_{b\bar{b}} < 160$ GeV.

TABLE V. The cutflow table for the event selection used in the resolved region for the backgrounds and an example of the signal (BP1). For each entry we show the number of events after each selection step along with the statistical uncertainty. We also show the efficiency after each selection as defined in eq. 12.
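The angular requirements of the resolved selection can be sketched as follows. The sketch assumes that azimuthal differences are folded into $[0, \pi]$, which the definition $|\phi_x - \phi_y|$ in the text does not state explicitly; function names and input angles are hypothetical.

```python
import math


def delta_phi(phi_a, phi_b):
    """Absolute azimuthal separation, folded into [0, pi] (an assumption)."""
    return abs(math.remainder(phi_a - phi_b, 2.0 * math.pi))


def min_delta_phi(met_phi, jet_phis, n_leading=3):
    """Smallest separation between the missing momentum and the leading jets.

    `jet_phis` must be ordered by decreasing jet pT and be non-empty.
    """
    return min(delta_phi(met_phi, phi) for phi in jet_phis[:n_leading])


def passes_angular_cuts(met_phi, jet_phis, higgs_phi):
    """Delta-phi_min > 20 degrees and Delta-phi(p_miss, p_H) > 2*pi/3."""
    return (
        min_delta_phi(met_phi, jet_phis) > math.radians(20.0)
        and delta_phi(met_phi, higgs_phi) > 2.0 * math.pi / 3.0
    )


# Hypothetical angles in radians, for illustration only.
print(passes_angular_cuts(0.0, [3.0, -2.5, 0.5], higgs_phi=3.0))
print(passes_angular_cuts(0.0, [0.1, 3.0], higgs_phi=3.0))
```

Only the three leading jets enter the minimum, matching the definition of $\Delta\phi_{\rm min}$ above.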
In Table V we show the cutflow table of the resolved analysis for both the backgrounds ($VV + X$, $t\bar{t} + X$ and $H + X$) and the signal. For the signal we show only the example of BP1. For each selection we calculate the MC uncertainty arising from the limited statistics, and the acceptance times efficiency, defined as
$$\varepsilon_i = \frac{N_i}{N_{i-1}},$$
where $N_i$ and $N_{i-1}$ correspond to the number of events that survive the selections $i$ and $i-1$ respectively. We can see that the requirement of at least two $b$-tagged jets removes about 82% of the signal events, which is not in agreement with the naive expectation of $\varepsilon \propto \epsilon_b^2 \approx 49\%$ ($\epsilon_b$ is the $b$-tagging efficiency)$^3$. After all the selections we obtain an accumulated efficiency of about 0.9%-1.2% for the signal, which depends slightly on the DM mass, while for the backgrounds we obtain an overall efficiency of 0.1%. We also calculate the significance using the Asimov formula [102]. The results of this calculation for the four benchmark points are shown in Table VI.
where $z_b$ and $z_{\bar{b}}$ are the momentum fractions of the bottom and the anti-bottom quarks.
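The per-step efficiency and the significance used in the cutflow discussion can be sketched as follows. The significance function assumes the standard Asimov expression without background systematic uncertainty, $Z = \sqrt{2[(s+b)\ln(1+s/b) - s]}$, which may differ from the exact form used in the analysis; all yields are hypothetical.

```python
import math


def step_efficiencies(yields):
    """Efficiency of each selection step, eps_i = N_i / N_(i-1)."""
    return [n_i / n_prev for n_prev, n_i in zip(yields, yields[1:])]


def asimov_significance(s, b):
    """Asimov significance for s signal and b > 0 background events.

    Assumption: no systematic uncertainty on the background.
    """
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


# Hypothetical yields after three successive selections.
print(step_efficiencies([1000.0, 500.0, 90.0]))
# Naive two-b-tag efficiency for a 70% per-jet b-tagging efficiency.
print(0.7 ** 2)
print(asimov_significance(s=10.0, b=100.0))
```

For $s \ll b$ this expression approaches the familiar $s/\sqrt{b}$.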

B. Merged regime
As was shown in the previous subsection, the SM Higgs boson produced in the mono-Higgs channel leads to unresolved hadronic decay products. Therefore, one expects that the boosted (or merged) selection would have a higher sensitivity reach. In this subsection we perform a cut-based analysis of the mono-Higgs channel using jet substructure techniques. We perform two independent search strategies inspired by the ATLAS [78] and CMS [79] boosted analyses. We thus employ two different jet clustering algorithms, although with the same selection criteria (see section III C for more details):
• First analysis category: We cluster jets using the anti-$k_t$ algorithm and a jet radius of $R = 1$ (denoted by AK10). We use a trimming algorithm based on the $k_t$ algorithm, removing subjets of radius $R = 0.2$ that carry less than 5% of the total AK10 jet energy.
• Second analysis category: We cluster jets using the Cambridge-Aachen algorithm and a jet radius of $R = 1.5$ (denoted by CA15). We use a soft-drop algorithm to remove soft and wide-angle radiation.
The differential distributions for the key observables are shown in Fig. 5 (for the selection based on AK10 jets) and in Fig. 6 (for the selection based on the CA15 jets). We require that events do not contain any isolated lepton (electron or muon) with $p_T > 7$ GeV and $|\eta| < 6$. Furthermore, we veto events that contain a hadronically decaying $\tau$-lepton with $p_T > 15$ GeV and $|\eta| < 2.5$. We then require that the missing transverse energy satisfies $E_T^{\rm miss} > 300$ GeV. Signal-like events are required to have at least one AK10 jet with $p_T > 150$ GeV and $|\eta| < 2.5$, or at least one CA15 jet with $p_T > 150$ GeV and $|\eta| < 2.5$. The fat jets that pass these requirements are either trimmed (for AK10 jets) or soft-dropped (for CA15 jets). Therefore, events are required to contain at least one trimmed jet for the ATLAS-like analysis or at least one soft-dropped jet for the CMS-like analysis. The leading trimmed or soft-dropped jet is required to have $M > 20$ GeV. Finally, we require the leading fat jet to have an invariant mass satisfying $70~{\rm GeV} < M < 180$ GeV. The last cut defines our signal region. The cutflow tables are shown in Tables VII and VIII. The acceptance times efficiency for the signal varies in the range 34%-41% for the AK10 jets and in the range 29%-39% for the CA15 jets. Finally, we calculate both the significance (defined in eq. 14) and the purity$^4$ of the signal for the four benchmark points and display the results in Table IX. We find that the signal significance in the merged regime is a factor of 6-10 larger than in the resolved regime.

FIG. 6. Differential cross section per bin for the four benchmark points defined in Table II and the background processes shown as stacked histograms in the merged regime for the AK10 jet category. We show the invariant mass of the leading trimmed jet $m_J$ (left panel), the missing transverse energy $E_T^{\rm miss}$ (middle panel) and the transverse momentum of the leading trimmed jet $p_T^J$ (right panel).

V. OPTIMISATION USING BOOSTED-DECISION TREES
A. General setup An improvement of the previous results can be achieved by using Machine Learning (ML) algorithms, such as decision trees (BDTs).The BDT training is performed using the four benchmark points described in Table II which are merged in one signal sample and for the background sample we merge all the SM background processes.All the processes are weighted by their generator-level cross sections since each process, for both the signal and the background, has a different cross section.Furthermore, in the case where the MC samples for the signal contain more events than the background samples, we reweight the signal and background samples using weight computed via the compute_sample_weight as implemented in Scikit-Learn [103].The BDT algorithm was implemented using XGBoost classifier [104].The model has been trained using a feature set consisting of the following variables: • Resolved regime: • Boosted regime with AK10 jets: • Boosted regime with CA15 jets: where m T is defined as We briefly describe the event preselection criteria applied in our analysis.For the resolved regime, we follow the same selection steps as in the cut-based analysis but halt the selection process once we achieve the requirement of having exactly two b-tagged jets.No further cuts on the magnitude of the missing energy are applied, except the basic requirement of E miss T > 100 GeV.For the boosted regime, we do not impose requirements on the invariant mass of the trimmed leading AK10 jet or the soft-dropped CA15 jets.With these requirements, we ensure enough statistics for the training.We found that some of the variables used in this study are highly correlated to m bb (in the resolved regime) and to m J (in the boosted regime).To reduce these correlations of these variables we scale p b T and p bb T by m bb and scale p J T by m J .Note that we do not apply a StandardScaler() function which removes the mean and reduces the variance to unity but instead, we apply a customised scaling 
whose sole aim is to reduce the correlations. A careful inspection of the input variables through the calculation of the feature importance is crucial to assess which variables are the best signal-to-background discriminators. This can be seen in Fig. 7, where we show the feature importance for each of the input variables in the resolved regime (left panel), the merged regime with AK10 jets (middle panel) and the merged regime with CA15 jets (right panel). As expected, the missing transverse energy is the most sensitive feature for the model training. The importance of the other variables depends on the regime. For the resolved regime, the transverse momentum of the Higgs boson candidate ($p_{T,bb}$) and the azimuthal separation between the leading $b$-jet and the missing momentum are very important. For the merged regime, the transverse mass $m_T$, the azimuthal separation between the leading fat jet and the missing momentum, $\Delta\phi(\vec{J}, \vec{p}^{\,\rm miss})$, and the invariant mass of the leading fat jet ($m_J$) are very important features.
4 The signal purity is defined in terms of $n_s$ ($n_b$), the number of signal (background) events after the full selection.
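The class rebalancing and the mass scaling described above can be sketched in plain Python. This is only an illustration on synthetic numbers, not the analysis code: `balanced_sample_weight` reproduces the rule behind Scikit-Learn's `compute_sample_weight` with `class_weight="balanced"` (the text does not state which option was used, so this choice is an assumption), and the toy distributions, the helper names and the random seed are hypothetical.

```python
import math
import random
from collections import Counter


def balanced_sample_weight(labels):
    """Per-event weights n_samples / (n_classes * n_events_in_class).

    Same rule as scikit-learn's class_weight="balanced" (assumed option).
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy events (hypothetical): a di-b-jet mass m_bb and a pT that grows with it.
rng = random.Random(42)
m_bb = [rng.uniform(50.0, 200.0) for _ in range(2000)]
pt_bb = [m * rng.uniform(0.5, 2.0) for m in m_bb]

# Customised scaling: divide the transverse momentum by the mass.
pt_over_m = [pt / m for pt, m in zip(pt_bb, m_bb)]

corr_raw = pearson(pt_bb, m_bb)         # correlation before scaling
corr_scaled = pearson(pt_over_m, m_bb)  # correlation after scaling

# Imbalanced toy labels: 3 signal events (1) and 1 background event (0).
weights = balanced_sample_weight([1, 1, 1, 0])
```

In this toy setup the raw transverse momentum is correlated with the mass by construction while the ratio is not; in a real sample the residual correlation depends on the kinematics. With the balanced rule, the summed weights of the two classes are equal.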
To avoid overtraining effects, the standard procedure is to randomly split the data into two independent datasets: a training dataset and a testing dataset. Good agreement between the model response on the training dataset and on the testing dataset is a good indicator of the absence of overtraining effects. However, in this study, we adopt an alternative approach. A cross-validation strategy with 5 folds is employed for the training: the data is split into 5 equal parts, a BDT model is trained on each fold and applied to the remaining 4 folds, and the final BDT score is taken to be the average of the 5 BDT model outputs. The 5 BDT models use exactly the same hyperparameters, which were optimised using the grid-search technique. The optimised hyperparameters are given in Table X. To define the final BDT score binning, the BDT score (the average of the 5 BDT model outputs) is scanned for maximum significance using the Asimov formula. Each BDT bin is required to have at least one background event to ensure good statistics. The result of the scan shows that the BDT score bin [0.99, 1] gives the highest significance (as expected) for the different benchmarks; thus this bin is used to define the signal region.
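The scan of the BDT score described above can be illustrated with a short sketch. It uses the standard Asimov significance without systematic uncertainties, $Z = \sqrt{2[(s+b)\ln(1+s/b) - s]}$, which we assume is the form meant in the text; the function names, the threshold grid and the weighted toy events are hypothetical.

```python
import math


def asimov_significance(n_s, n_b):
    """Asimov significance for n_s signal and n_b background events (no systematics)."""
    if n_b <= 0.0:
        raise ValueError("n_b must be positive")
    return math.sqrt(2.0 * ((n_s + n_b) * math.log(1.0 + n_s / n_b) - n_s))


def scan_bdt_cut(sig, bkg, thresholds, min_bkg=1.0):
    """Return (best_cut, best_Z) over lower cuts on the BDT score.

    sig and bkg are lists of (score, weight) pairs; a cut is only
    considered if at least min_bkg weighted background events survive.
    """
    best_cut, best_z = None, -1.0
    for cut in thresholds:
        n_s = sum(w for score, w in sig if score >= cut)
        n_b = sum(w for score, w in bkg if score >= cut)
        if n_b < min_bkg:
            continue  # too few background events for a reliable estimate
        z = asimov_significance(n_s, n_b)
        if z > best_z:
            best_cut, best_z = cut, z
    return best_cut, best_z


# Toy weighted events (hypothetical numbers): the signal peaks at high score.
signal = [(0.995, 20.0), (0.97, 5.0), (0.60, 2.0)]
background = [(0.992, 1.5), (0.95, 30.0), (0.40, 500.0), (0.10, 2000.0)]

best_cut, best_z = scan_bdt_cut(signal, background, [0.0, 0.5, 0.9, 0.99])
```

The `min_bkg` argument mirrors the requirement of at least one background event per bin; without it, a cut that removes all the background would make the significance ill-defined.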

B. Results
In this section we discuss the results of the BDT analysis. We start by showing the signal purity ($p$), the background efficiency ($\epsilon_B$) and the signal efficiency ($\epsilon_S$) as a function of the BDT score for the four benchmark points in Fig. 8. We can see that the bin with the highest BDT score ($> 0.99$) maximises not only the significance but also the signal purity. The number of signal events is found to be quite large for most of the benchmark points, with larger yields in the merged regime than in the resolved regime. The signal purity varies in the range of 40%-99%, where the higher values are reached for the benchmark points BP1 and BP2. High values of the signal purity imply unprecedented opportunities to perform post-discovery analyses to assess the nature of DM at muon colliders. We stress that the classifier has a high discriminating power, since the area under the Receiver-Operating Characteristic (ROC) curve lies in the range of 0.95-0.97.
TABLE XI. Signal significance ($S$) for the four benchmark points using the BDT signal region. For each entry, we show the significance for ${\cal L} = 100~{\rm fb}^{-1}$ and after the full run at ${\cal L} = 1000~{\rm fb}^{-1}$. The results are shown for the resolved regime with AK4 jets and for the two cases of the merged regime with AK10 jets and CA15 jets.
We also calculate the signal significance using the Asimov formula [102] for both ${\cal L} = 100~{\rm fb}^{-1}$ and ${\cal L} = 1~{\rm ab}^{-1}$ for the BDT bin $> 0.99$. The results are shown in Table XI. We can see that even for a luminosity of $100~{\rm fb}^{-1}$ the BDT search strategy leads to quite large signal significances for BP1, BP2 and BP3, where the highest values are reached for the boosted regime, as expected. To reach a high signal significance for BP4 (corresponding to heavy DM), the full luminosity of $1~{\rm ab}^{-1}$ is required. Comparing the results of Table XI with those shown in Tables VI and IX, we notice very important improvements with respect to the cut-based analysis. The results are
improved by factors of about 8-50 depending on the benchmark point and the kinematic regime. For instance, BP4 receives the highest improvement, especially for the resolved regime where $S$ increases from $2.62 \times 10^{-2}$ to $1.42$.
Finally, we use the trained algorithm to optimise the signal-over-background ratio for DM masses in the interval defined in equation 8. In other words, no further training has been performed at this stage. To quantify the sensitivity reach of this analysis we calculate both the significance and the $CL_s^{95\%}$. The significance is calculated by assuming some uncertainty on the background yields and is defined as
$$S = \sqrt{2\left[(n_s + n_b)\log\left(\frac{(n_s + n_b)(n_b + \delta_b^2)}{n_b^2 + (n_s + n_b)\delta_b^2}\right) - \frac{n_b^2}{\delta_b^2}\log\left(1 + \frac{\delta_b^2\, n_s}{n_b (n_b + \delta_b^2)}\right)\right]},$$
where $\delta_b = x \times n_b$ is the uncertainty on the background yields, with $x = 5\%$ assumed. Moreover, we assess the sensitivity reach by computing the expected $CL_s$ [105] using pyhf [106]. The $CL_s$ estimator is defined in terms of $p_{b+s}$ and $p_b$, the signal-plus-background and the background-only probabilities, respectively. In the calculation of the $CL_s$ we assume that the number of observed events is equal to the background expectation. Furthermore, we assume that the uncertainty on the background yield is 5%. The results are shown in Fig. 9, where we show the signal significance (left) and the $CL_s$ (right) as a function of the DM mass for the resolved and the boosted regimes. We can see that the BDT analysis can probe DM masses up to 1 TeV, with the two statistical prescriptions leading to similar results. Finally, we note that the boosted regime has a higher sensitivity than the resolved regime, as expected.
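As a numerical illustration of how a background uncertainty $\delta_b = x \times n_b$ degrades the sensitivity, the sketch below implements the widely used Asimov significance with a background uncertainty and compares it with its $\delta_b \to 0$ limit. We assume that this standard expression is the one intended in the text; the function name and the event yields are hypothetical and are not taken from the analysis.

```python
import math


def significance_with_uncertainty(n_s, n_b, x=0.05):
    """Asimov significance with background uncertainty delta_b = x * n_b.

    For x = 0 it reduces to sqrt(2 * ((n_s + n_b) * ln(1 + n_s / n_b) - n_s)).
    """
    if n_b <= 0.0:
        raise ValueError("n_b must be positive")
    n_tot = n_s + n_b
    if x == 0.0:
        return math.sqrt(2.0 * (n_tot * math.log(1.0 + n_s / n_b) - n_s))
    var_b = (x * n_b) ** 2  # delta_b squared
    first = n_tot * math.log(n_tot * (n_b + var_b) / (n_b**2 + n_tot * var_b))
    second = (n_b**2 / var_b) * math.log(1.0 + var_b * n_s / (n_b * (n_b + var_b)))
    return math.sqrt(2.0 * (first - second))


# Hypothetical yields: 30 signal events on top of 100 background events.
z_stat_only = significance_with_uncertainty(30.0, 100.0, x=0.0)
z_5_percent = significance_with_uncertainty(30.0, 100.0, x=0.05)
z_tiny_unc = significance_with_uncertainty(30.0, 100.0, x=1e-4)
```

For these toy yields the 5% uncertainty lowers the significance relative to the statistics-only value, and a vanishing uncertainty recovers the statistics-only formula, which is a useful sanity check on any implementation.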

VI. CONCLUSIONS
In this work we have studied the discovery potential of DM at muon colliders in the mono-Higgs channel. This production channel is unique in the sense that it would allow the study of the interactions between the mediator and the SM Higgs sector. As a proof of principle, we have analysed this channel for the minimal lepton portal DM model, which extends the SM with two $SU(2)_L$ singlets: a charged scalar that plays the role of the mediator and a right-handed fermion that is assumed to be the DM candidate of the model. After studying the characteristics of the benchmark points allowed by the various constraints, we have studied the production of DM in this channel as well as all the possible backgrounds for center-of-mass energies of 3, 10 and 30 TeV. We have found that the initial signal-to-background ratio for this channel, before any cuts, degrades very quickly with the center-of-mass energy. Therefore, we have analysed the sensitivity reach only for $\sqrt{s} = 3$ TeV and ${\cal L} = 1~{\rm ab}^{-1}$. We then performed simple cut-based analyses inspired by the previous ATLAS and CMS searches for DM produced in association with a Higgs boson decaying into bottom quarks. Using different jet clustering algorithms to reconstruct the Higgs boson candidates, we have found poor sensitivities for benchmark points corresponding to heavy DM masses. We have then built an algorithm based on Boosted Decision Trees (BDTs) using the XGBoost library. By optimising the hyperparameters of the model and training it on both the signal and the backgrounds, we have found improvements by factors of 8-50 with respect to the cut-based analysis. Finally, we have analysed the sensitivity reach by applying this algorithm to the range of DM masses kinematically allowed by the center-of-mass energy, i.e. $M_{N_R} \in [50, 1435]$ GeV. We have found that DM masses up to 1 TeV can be excluded at the 95% CL using the BDT analysis in the mono-Higgs channel.

FIG. 4. Differential cross section per bin for the four benchmark points defined in Table II and the background processes shown as stacked histograms in the resolved regime. From left to right, we show the missing transverse energy, the transverse momentum of the Higgs candidate, and the invariant mass of the Higgs candidate. More details can be found in the text.

T and $p_T^{\rm slead}$ refer to the $p_T$ of the leading and the subleading $b$-jets, respectively, and $\Delta\phi(x, y) \equiv |\phi_x - \phi_y|$.
We require that $m_T^{\rm min} > 170$ GeV and $m_T^{\rm max} > 200$ GeV.
For the SM Higgs boson with transverse momentum in the range [500, 1000] GeV decaying democratically, i.e. $z_b = z_{\bar{b}} = 1/2$, we have $\Delta R_{b\bar{b}} \approx 0.06$-$0.1$.

FIG. 5. Differential cross section per bin for the four benchmark points defined in Table II and the background processes shown as stacked histograms in the merged regime for the AK10 jet category. We show the invariant mass of the leading trimmed jet $m_J$ (left panel), the missing transverse energy $E^{\rm miss}$

FIG. 8. The background efficiency (blue), signal efficiency (red) and signal purity (green) as a function of the cut on the BDT score. Results are shown for the resolved regime (left upper panel), the merged regime with AK10 jets (right upper panel) and the merged regime with CA15 jets (lower panels). The calculations are done for BP1 (solid), BP2 (dashed), BP3 (dotted) and BP4 (dash-dotted).

FIG. 9. Signal significance (left) and $CL_s^{95\%}$ (right) as a function of the DM mass ($M_{N_R}$). The results are shown for the resolved regime (red), the boosted regime with AK10 jets (green) and the boosted regime with CA15 jets (blue). The black solid and dashed lines in the left panel correspond to $S = 2$ and $S = 5$. In the right panel, the solid black line corresponds to $CL_s = 0.95$, above which the mass value is excluded at the 95% CL.

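$$S = \sqrt{2\left[(n_s + n_b)\log\left(\frac{(n_s + n_b)(n_b + \delta_b^2)}{n_b^2 + (n_s + n_b)\delta_b^2}\right) - \frac{n_b^2}{\delta_b^2}\log\left(1 + \frac{\delta_b^2\, n_s}{n_b (n_b + \delta_b^2)}\right)\right]}$$
2, we have $e = \sqrt{4\pi\alpha_{\rm EM}}$ as the electric charge, $\theta_W$ denoting the Weinberg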

TABLE IV. Parton-level cross sections for the background processes that are taken into account in this study.

TABLE VI. Signal significance for the four benchmark points in the resolved regime.

TABLE VII. Same as in Table V but for the boosted regime with AK10 jets.

TABLE IX. Signal significance ($S$) and purity ($p$) for the four benchmark points in the boosted regime for the AK10 jets (first rows) and CA15 jets (second rows).