How to use Slurm, with a side of ipynb..
Submitting a job to Slurm with a bash script
[USER@login1]$ cat submission.sh
#!/bin/bash
#SBATCH --nodes=1
srun jupyter nbconvert --to notebook --execute mynotebook.ipynb
[USER@login1]$ sbatch submission.sh
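In practice you will usually ask for a few more resources in the script. A minimal sketch of a fuller submission file (the partition name, time limit, and resource counts are placeholders; adapt them to your cluster):

#!/bin/bash
#SBATCH --job-name=nb-run        # name shown in squeue
#SBATCH --partition=normal       # placeholder partition name
#SBATCH --nodes=1                # run on a single node
#SBATCH --cpus-per-task=4        # CPU cores for the notebook
#SBATCH --time=01:00:00          # wall-clock limit (hh:mm:ss)
#SBATCH --output=nb-run.%j.out   # %j expands to the job ID

# Execute the notebook and write the executed copy back as a .ipynb
srun jupyter nbconvert --to notebook --execute mynotebook.ipynb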
Running the command directly from the command line
[USER@login1]$ conda activate my_env
(my_env)[USER@login1]$ srun jupyter nbconvert --to notebook --execute mynotebook.ipynb
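The same resource options work directly on the srun command line; for example (the values are placeholders):

(my_env)[USER@login1]$ srun --nodes=1 --cpus-per-task=4 --time=01:00:00 jupyter nbconvert --to notebook --execute mynotebook.ipynb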
sinfo
$ sinfo
PARTITION  AVAIL  TIMELIMIT   NODES  STATE  NODELIST
normal*    up     2-00:00:00  3      idle   node[1-3]
optiplex   up     infinite    0      n/a
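For per-node detail, or to choose your own columns, sinfo takes -N -l and an -o format string (both standard sinfo options):

$ sinfo -N -l                   # one line per node, long format
$ sinfo -o "%P %a %l %D %t %N"  # partition, avail, time limit, node count, state, node list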
sbatch
#!/bin/bash
#SBATCH -J ensemble # job name
#SBATCH -o ensemble.%j.out # standard output and error log
#SBATCH -p normal # queue name or partition name
#SBATCH -t 70:00:00 # Run time (hh:mm:ss)
samtools view alignments.bam > alignments.sam
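Assuming the script above is saved as ensemble.sh (the file name is just an example), it is submitted like any other batch script, and sbatch prints the assigned job ID:

$ sbatch ensemble.sh
Submitted batch job 6539

Another example script, this time an R job: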
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --job-name="A long job"
#SBATCH --mem=5GB
#SBATCH --output=long-job.out
cd /path/where/to/start/the/job
# This may vary per HPC system. On USC's HPC system
# we use: source /usr/usc/R/default/setup.sh
module load R
Rscript --vanilla long-job-rscript.R
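If one batch job has to wait for another, sbatch can chain them with --dependency (a standard sbatch option); the script names here are hypothetical:

$ jobid=$(sbatch --parsable preprocess.sh)        # --parsable prints only the job ID
$ sbatch --dependency=afterok:$jobid long-job.sh  # starts only if preprocess.sh succeeds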
srun (once the allocation starts, a new bash session is opened on one of the allocated nodes)
salloc (a new bash session starts on the login node)
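A typical interactive session with each, assuming your cluster permits interactive jobs (resource values are placeholders):

$ srun --nodes=1 --time=01:00:00 --pty bash   # shell opens on the compute node
$ salloc --nodes=1 --time=01:00:00            # shell stays on the login node
$ srun hostname                               # but srun now runs inside the allocation
$ exit                                        # release the allocation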
squeue
$ squeue
JOBID  NAME      STATE    USER     GROUP   PARTITION  NODE  NODELIST  CPUS  TRES_PER_NODE  TIME_LIMIT  TIME_LEFT
6539   ensemble  RUNNING  dhj1     usercl  TITANRTX   1     n1        4     gpu:4          3-00:00:00  1-22:57:11
6532   bash      PENDING  gildong  usercl  2080ti     1     n2        1     gpu:8          3-00:00:00  2-03:25:06
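Common filters (all standard squeue options):

$ squeue -u $USER          # only my jobs
$ squeue -p normal         # jobs in one partition
$ squeue -j 6539 --start   # estimated start time of a pending job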
scancel
$ scancel 6539
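scancel can also match by user, job name, or state instead of a single job ID:

$ scancel -u $USER             # all of my jobs
$ scancel --name ensemble      # jobs with a given name
$ scancel -u $USER -t PENDING  # only my jobs that are still pending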
Checking the details of a job
$ scontrol show job 3217
JobId=3217 JobName=ssw_test
UserId=moasys1(100001107) GroupId=in0011(1000011) MCS_label=N/A
Priority=4294901630 Nice=0 Account=kat_user QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:05 TimeLimit=01:00:00 TimeMin=N/A
SubmitTime=2018-04-30T17:54:07 EligibleTime=2018-04-30T17:54:07
StartTime=2018-04-30T17:54:07 EndTime=2018-04-30T18:54:08 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=ivy_v100_2 AllocNode:Sid=login-tesla02:9203
ReqNodeList=(null) ExcNodeList=(null)
NodeList=tesla[03-04]
BatchHost=tesla03
NumNodes=2 NumCPUs=40 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=40,node=2,gres/gpu=2
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=gpu Reservation=(null)
OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
Command=./kat-2.sh
WorkDir=/scratch2/moasys1/ssw/moasys1/kat_test
StdErr=/scratch2/moasys1/ssw/moasys1/kat_test/ssw.e3217
StdIn=/dev/null
StdOut=/scratch2/moasys1/ssw/moasys1/kat_test/ssw.o3217
Power=
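scontrol can also hold a pending job and release it again (standard scontrol subcommands; the job ID is illustrative):

$ scontrol hold 3217      # keep a pending job from starting
$ scontrol release 3217   # allow it to be scheduled again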
Adjusting job priority
$ sudo scontrol update job=1465 nice=-100
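A negative nice value raises the job's priority and requires administrator privileges, hence the sudo; a regular user can only lower the priority of their own job with a positive value:

$ scontrol update job=1465 nice=100   # deprioritize my own job; no sudo needed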