
How to use Slurm, with a side of ipynb..

김개발^^ 2022. 4. 18. 09:32

Submitting a job to Slurm with a bash script

[USER@login1]$ cat submission.sh
#!/bin/bash
#SBATCH --nodes=1
# Execute the notebook and write the executed copy out as a new .ipynb
srun jupyter nbconvert --to notebook --execute mynotebook.ipynb

[USER@login1]$ sbatch submission.sh

 

Firing off the command directly from the command line

[USER@login1]$ conda activate my_env
(my_env)[USER@login1]$ srun jupyter nbconvert --to notebook --execute mynotebook.ipynb
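
If the defaults are not enough, resource options can go straight onto srun. A minimal sketch, assuming --ntasks / --cpus-per-task / --mem / --time are the knobs you want to turn:

(my_env)[USER@login1]$ srun --ntasks=1 --cpus-per-task=4 --mem=8G --time=02:00:00 jupyter nbconvert --to notebook --execute mynotebook.ipynb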

 

 

sinfo

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up 2-00:00:00      3   idle node[1-3]
optiplex     up   infinite      0    n/a
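
When you want a per-node view instead of the per-partition summary, sinfo's node-oriented long listing helps; a quick sketch (flags only, the output depends on the cluster):

$ sinfo -N -l         # one line per node: state, CPUs, memory, reason
$ sinfo -p normal     # restrict the listing to a single partition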

sbatch

#!/bin/bash

#SBATCH -J ensemble          # job name
#SBATCH -o ensemble.%j.out   # standard output and error log
#SBATCH -p normal            # queue name or partition name
#SBATCH -t 70:00:00          # Run time (hh:mm:ss) 

samtools view alignments.bam > alignments.sam

Another example, this time a long-running R job:

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --job-name="A long job"
#SBATCH --mem=5GB
#SBATCH --output=long-job.out
cd /path/where/to/start/the/job

# This may vary per HPC system. At USC's hpc system
# we use: source /usr/usc/R/default/setup.sh
module load R

Rscript --vanilla long-job-rscript.R
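
If the cluster also has GPU nodes, the same pattern extends to a GPU job. This is only a sketch: the partition name, GPU count, and script name are assumptions to adapt to your system.

#!/bin/bash
#SBATCH -J train-gpu            # job name (example)
#SBATCH -p gpu                  # GPU partition: replace with your cluster's name
#SBATCH --gres=gpu:1            # request one GPU on the node
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH -t 12:00:00
#SBATCH -o train-gpu.%j.out

srun python train.py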

 

srun (once the allocation starts, a new bash session starts on one of the granted nodes)
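
A minimal interactive-session sketch with srun (the resource amounts are just placeholders):

[USER@login1]$ srun --nodes=1 --ntasks=1 --time=01:00:00 --pty bash
[USER@node1]$ hostname    # the shell is now on the allocated compute node
node1
[USER@node1]$ exit        # leaving the shell ends the job and frees the allocation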

 

salloc (the new bash session starts on the login node.)
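
The salloc counterpart: the prompt stays on the login node and work is pushed onto the allocation with srun (the job id and node name below are placeholders):

[USER@login1]$ salloc --nodes=1 --time=01:00:00
salloc: Granted job allocation 6543
[USER@login1]$ srun hostname    # runs on the allocated node
node1
[USER@login1]$ exit             # release the allocation
salloc: Relinquishing job allocation 6543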

 

squeue

$ squeue

JOBID  NAME          STATE     USER     GROUP    PARTITION       NODE NODELIST CPUS TRES_PER_NODE TIME_LIMIT  TIME_LEFT  
6539   ensemble      RUNNING   dhj1     usercl   TITANRTX        1    n1       4    gpu:4         3-00:00:00  1-22:57:11 
6532   bash          PENDING   gildong  usercl   2080ti          1    n2       1    gpu:8         3-00:00:00  2-03:25:06
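
A few common filters (the user name and job id are placeholders):

$ squeue -u $USER      # only my jobs
$ squeue -j 6539       # a single job
$ squeue --start       # estimated start times for pending jobs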

scancel

$ scancel 6539
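
scancel can also target jobs by user, name, or state; a short sketch:

$ scancel -u $USER                    # cancel all of my jobs
$ scancel --name=ensemble             # cancel jobs with this name
$ scancel --state=PENDING -u $USER    # cancel only my pending jobs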

Checking a job in detail

$ scontrol show job 3217
JobId=3217 JobName=ssw_test
   UserId=moasys1(100001107) GroupId=in0011(1000011) MCS_label=N/A
   Priority=4294901630 Nice=0 Account=kat_user QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:05 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2018-04-30T17:54:07 EligibleTime=2018-04-30T17:54:07
   StartTime=2018-04-30T17:54:07 EndTime=2018-04-30T18:54:08 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=ivy_v100_2 AllocNode:Sid=login-tesla02:9203
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=tesla[03-04]
   BatchHost=tesla03
   NumNodes=2 NumCPUs=40 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=40,node=2,gres/gpu=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=gpu Reservation=(null)
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=./kat-2.sh
   WorkDir=/scratch2/moasys1/ssw/moasys1/kat_test
   StdErr=/scratch2/moasys1/ssw/moasys1/kat_test/ssw.e3217
   StdIn=/dev/null
   StdOut=/scratch2/moasys1/ssw/moasys1/kat_test/ssw.o3217
   Power=
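
scontrol show job only works while the job is still in the queue (or very shortly after it ends). For finished jobs, the accounting database via sacct is the usual fallback, assuming accounting is enabled on the cluster:

$ sacct -j 3217 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS,ExitCode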

 

Adjusting job priority

$ sudo scontrol update job=1465 nice=-100
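
Note the sign: a negative nice value raises priority and normally requires administrator rights (hence the sudo above). A regular user can only push their own job back with a positive nice value:

$ scontrol update job=1465 nice=100    # deprioritize my own job; no sudo needed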

 

References)

https://doheejin.github.io/linux/2021/02/18/linux-slurm.html

https://wycho.tistory.com/63

https://www.biostars.org/p/453787/

https://support.nesi.org.nz/hc/en-gb/articles/360001316356-Slurm-Interactive-Sessions

https://dandyrilla.github.io/2017-04-11/jobsched-slurm/

 
