  • #57
Issue created May 12, 2017 by Jens Korinth (@jk, Developer)

Slurm DSE: squeue causes RuntimeException

Apparently `squeue` fails quite frequently, with error messages such as:

slurm_load_jobs error: Socket timed out on send/recv operation
slurm_load_jobs error: Socket timed out on send/recv operation
slurm_load_jobs error: Socket timed out on send/recv operation
...

This leads to further warnings:

[17:30:12 <scala-execution-context-global-202: Slurm$> WARN] Slurm `squeue` failed: java.lang.RuntimeException: Nonzero exit value: 1
[17:30:12 <scala-execution-context-global-237: Slurm$> WARN] Slurm `squeue` failed: java.lang.RuntimeException: Nonzero exit value: 1
[17:30:12 <scala-execution-context-global-230: Slurm$> WARN] Slurm `squeue` failed: java.lang.RuntimeException: Nonzero exit value: 1
...
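
The `Nonzero exit value: 1` message is what `scala.sys.process` reports when `!!` sees a failing command, so one possible mitigation for a transient socket timeout is to retry the `squeue` call a few times before giving up. The following is only a minimal sketch under that assumption; `SqueueRetry`, `retry`, `attempts` and `delayMs` are hypothetical names and not part of the tapasco code base.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not taken from tapasco.
object SqueueRetry {
  /** Runs `op` up to `attempts` times. After each failed try it sleeps
    * `delayMs` milliseconds and doubles the delay for the next try.
    * Returns the first success, or the last failure. */
  @tailrec
  def retry[A](attempts: Int, delayMs: Long)(op: () => A): Try[A] =
    Try(op()) match {
      case s @ Success(_) => s
      case Failure(_) if attempts > 1 =>
        Thread.sleep(delayMs)
        retry(attempts - 1, delayMs * 2)(op)
      case f @ Failure(_) => f
    }
}
```

A call site could then look like `SqueueRetry.retry(5, 500L)(() => Seq("squeue", "-u", user).!!)` with `import scala.sys.process._` in scope, where `user` is assumed to hold the submitting user's name. Whether five attempts and a 500 ms initial delay suit a given cluster is a guess and would need tuning.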

This in turn appears to crash the DSE with an assertion failure:

[17:30:12 <scala-execution-context-global-44: DesignSpaceExplorationTask> ERROR] exception: java.lang.AssertionError: assertion failed: elements length (25) does not match results length (0), stacktrace: de.tu_darmstadt.cs.esa.tapasco.dse.Exploration$Events$BatchFinished.<init>(Exploration.scala:249)
de.tu_darmstadt.cs.esa.tapasco.dse.ConcreteBatch.start(Batch.scala:33)
de.tu_darmstadt.cs.esa.tapasco.dse.ConcreteExploration.apply(Exploration.scala:144)
de.tu_darmstadt.cs.esa.tapasco.dse.ConcreteExploration.start(Exploration.scala:200)
de.tu_darmstadt.cs.esa.tapasco.task.DesignSpaceExplorationTask.job(DesignSpaceExplorationTask.scala:75)
de.tu_darmstadt.cs.esa.tapasco.task.Tasks$ProcessingRunnable.$anonfun$run$1(Tasks.scala:156)
scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12)
scala.concurrent.Future$.$anonfun$apply$1(Future.scala:653)
scala.util.Success.$anonfun$map$1(Try.scala:251)
scala.util.Success.map(Try.scala:209)
scala.concurrent.Future.$anonfun$map$1(Future.scala:287)
scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:140)
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
[17:30:13 <main: DesignSpaceExploration$> INFO] all DSE tasks have finished
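
The assertion above fires because 25 elements were submitted but no results came back. One way to keep such an invariant from being violated is to record a failure for every element whose result is missing, instead of dropping it. The sketch below illustrates only that idea; `BatchResults` and `align` are hypothetical names, and how `Exploration.scala` actually collects results is not shown in this report.

```scala
import scala.util.{Failure, Try}

// Hypothetical helper, not taken from tapasco.
object BatchResults {
  /** Pairs every element with an outcome. Elements for which no result
    * is available are paired with a Failure, so the output always has
    * exactly as many entries as `elements`. Surplus results are ignored. */
  def align[E, R](elements: Seq[E], results: Seq[Try[R]]): Seq[(E, Try[R])] = {
    val missing: Try[R] =
      Failure(new RuntimeException("no result (job status unknown)"))
    elements.zip(results.padTo(elements.length, missing))
  }
}
```

With this shape, a batch of 25 elements and an empty result list yields 25 failed entries, which the caller can log or resubmit rather than aborting the whole exploration.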

Example run:

tapasco --slurm dse --composition [arrayupdate x 1] --dimensions freq,area --batchSize 25 --frequency 50 --platforms pynq