Changelog for
torque-devel-5.1.3-1.55.x86_64.rpm :
* Wed Feb 01 2017 scorotAATTfree.fr- Version 5.1.3
* For a complete list of changes see torqueReleaseNotes5.1.3.pdf on http://docs.adaptivecomputing.com
* Sun Nov 29 2015 scorotAATTfree.fr- Version 5.1.2
* TRQ-3239. Track signal sending to job pids to eliminate repeat sending.
* TRQ-3245. Enable reporter mom to correctly handle UNKNOWN role.
* TRQ-2788. Allow jobs submitted on hold to be moved through a routing queue.
* TRQ-3242. Fix problem where the resource string argument to the prologue script was garbled.
* TRQ-3232. Start threadpool at pbs_mom start.
* TRQ-2675. Fix small errors in suse init.d scripts.
* TRQ-3235. Fix problem when path to error, output or execution environment contains one or more spaces.
* TRQ-3098. Add the ability to set a parameter = exit_code_canceled_job to force all canceled jobs to have the same exit code regardless of the state they were in when they were canceled.
* TRQ-3185. Create subdirs when server attribute use_job_subdirs set.
* TRQ-2836. Make node health check run on sister nodes when configured for job start and job end as well.
* TRQ-2843. Add the qmgr setting dont_write_nodes_file to make it so that nodes cannot be edited dynamically
* TRQ-2897. Add the ability to adopt running processes into a job with pbs_track.
* TRQ-3189. Never delete a running job because of a dependency.
* Sun Nov 01 2015 scorotAATTfree.fr- version 5.1.1.2
* TRQ-3197. Add support for RHEL7 and SLES12.
* TRQ-2947. Fix a race condition on deleting jobs which are failing to start.
* TRQ-3068. Fix a race condition where a job may be deleted but its pointer may still be in the alljobs container.
* TRQ-2753. Fix a memory leak in generating the authoritative okclients list.
* TRQ-2332. Fix a job dependency problem when the failover server comes up. This only affects users running high availability.
* TRQ-3023. Fix a bug when ALPS incorrectly returns a permanent confirmation failure.
* TRQ-2833. Set CUDA_VISIBLE_DEVICES to only the indices for this host when it will be set.
* TRQ-3039. Fix a deadlock when deleting a job where other jobs have after any dependencies on the first job.
* TRQ-2782. Distribute job files into subdirectories when server attribute use_jobs_subdirs set to true. Default is false (do not distribute job files).
* TRQ-3116. Make qsub only retry on transient errors.
* TRQ-3122. Fix a problem with login_property not working correctly (cray only).
* TRQ-3114. Fix an issue where an asynchronously started job is stuck with a substate of starting after a failed job start.
* TRQ-3110. Handle slot limits correctly when jobs are preempted.
* TRQ-3095. Add the server setting disable_automatic_requeue to stop jobs from being requeued if they experience a transient failure on the mom.
* TRQ-2307. Fix problems where mom restarts intermittently fail.
* TRQ-2946. Make qmgr able to handle Cray numeric node ids.
* TRQ-2790. Make offlining cray compute nodes persist across restarts.
* TRQ-3104. Add millisecond precision to the Torque log file
* TRQ-2881. Add node health check error messages to a node's notes and therefore pbsnodes output.
* TRQ-3166. Add another safety check before killing stray jobs.- Build requires boost-devel- changes in systemd
* Use provided service files for systemd
* drop pbs_mom.service pbs_sched.service pbs_server.service and trqauthd.service files- remove rpath from /usr/bin/pbs-config to silence rpmlint
* Mon Oct 26 2015 scorotAATTfree.fr- fix build conditional for NUMA on SLE 12- remove conditional for old suse version
* Thu Aug 06 2015 scorotAATTfree.fr- BuildRequires sendmail which fixes mail submission from torque server
* Tue Aug 04 2015 scorotAATTfree.fr- update to torque 4.2.10
* Mainly bug fixes. See CHANGELOG- Update url
* Sun Nov 02 2014 scorotAATTfree.fr- add missing rclink- add systemd requirements
* Sat Nov 01 2014 scorotAATTfree.fr- update to version 4.2.9
* Many changes and bug fixes. See CHANGELOG- fix systemd support according to rpmlint warnings
* Sat Mar 15 2014 scorotAATTfree.fr- fix error in server init script introduced with init.patch- create missing credentials directory not created by make install
* Wed Mar 12 2014 scorotAATTfree.fr- enable NUMA support on SLE 11 which is often used on high end systems
* Tue Mar 11 2014 scorotAATTfree.fr- update to torque version 4.1.7
* Make job_starter work for parallel jobs as well as serial.
* Fix one issue with being able to submit jobs to the cray while offline.
* Add mom parameter job_oom_score_adjust - affects the oom score for jobs run by this mom. Positive means more likely to be killed.
* Add mom parameter mom_oom_immunize, making the mom immune to being killed in out of memory conditions. Default is now true.
* Don't count completed jobs against max_user_queuable. TRQ-1420.
* make pbs_track compatible with display_job_server_suffix = false. The user has to set NO_SERVER_SUFFIX in the environment.
* Fix the way we monitor if a thread is active.
* TRQ-1751. Add some code to handle a corrupted job file where the job file says it is running but there is no exec host list. These jobs now will receive a system hold
* Cray: nppn wasn't being specified in reservations. Fix this. TRQ-1660.
* TRQ-1653. Arrays depending on non-array jobs were broken. Fix this.
* Add retries on transient failures to setuid and seteuid calls. TRQ-1541.
* Add a timeout for mother superior when cleaning up a job. Instead of waiting infinitely for sisters to confirm that a job has exited, consider the job dead after 10 minutes. This time can be adjusted by setting $job_exit_wait_time in the mom's config file (time in seconds). This prevents jobs from being stuck infinitely if a compute node crashes or if a mom daemon becomes unresponsive. TRQ-1776.
* If privileged ports are disabled, make pbs_moms not check if incoming connections from mother superior are on privileged ports. TRQ-1669.
* Add two mom config parameters: max_join_job_wait_time and resent_join_job_wait_time. TRQ-1790.
* TRQ-1709. Fix -l gpus=X,other_things being parsed incorrectly.
* TRQ-1826. mppdepth is now passed correctly to the ALPS reservation.
* TRQ-1802. Make the environment variable $PBS_NUM_NODES accurate for multi-req jobs.
* TRQ-1832. Add the ability to add a login_property to a job at the queue level by setting required_login_property on the queue.- add SuSEfirewall configuration files- enable systemd support for 12.2 and higher- add patch torque-4.1.5.1-init.patch- add patch torque-4.1.7-fix-tcl-interp.patch
* Sat Apr 06 2013 scorotAATTfree.fr- update to version 4.1.5.1
* If the job is no longer valid after attempting to lock the array in get_jobs_array(), make sure the array is valid before attempting to unlock it. TRQ-1598.
* Don't log an invalid connection message when close_conn() is called on 65535 (PBS_LOCAL_CONNECTION). TRQ-1557.
* Don't strip quotes from values in scripts before specific processing. TRQ-1632
* Fix a deadlock when submitting two large arrays consecutively, the second depending on the first. TRQ-1646 (reported by Jorg Blank, 4.2.1).
* Changed communication between clients and trqauthd to use only unix domain sockets
* Fix a segfault in req_jobobit due to an off-by-one error. TRQ-1361.
* Fix a race condition in mom hierarchy reporting. TRQ-1378.
* Fixed pbs_mom so epilogue will only run once. TRQ-1134.
* Fix some debug output escaping into job output. TRQ-1360.
* Changed momctl to do retries to get connections to make it more robust on busy systems. TRQ-1328.
* Fix crashes due to unprotected array accesses. TRQ-1395.
* Fixed segfault in req_movejob where the job ji_qhdr was NULL. TRQ-1416.
* Many other bug fixes and enhancements. See CHANGELOG file for a full list- enable gpu support since more users have gpus and mom works fine even if no gpus are detected- enable cpuset support in order to get better performance on multicore CPU systems
* Sat Nov 17 2012 scorotAATTfree.fr- version 4.1.3
* fix a security loophole that potentially allowed an interactive job to run as root due to not resetting a value when $attempt_to_make_dir and $tmpdir are set. TRQ-1078.
* Have pbs_server save the queues each time before exiting so that legacy formats are converted to xml after upgrading. TRQ-1120.
* Make issue_Drequest wait for the reply and have functions continue processing immediately after instead of the added overhead of using the threadpool.
* tm_adopt() calls caused pbs_mom to crash. Fix this. TRQ-1210.
* Modified output for qstat -r. Expanded Req'd Time to include seconds and centered Elap Time over its column.
* Fix mismanagement of the ji_globid. TRQ-1262.
* Setting display_job_server_suffix=false crashed with job arrays. Fixed. bugzilla #216
* Made it so pbs_server will come up even if a job cannot recover because of a missing job dependency. TRQ-1287
* Retry cleanup with the mom every 20 seconds for jobs that are stuck in an exiting state. TRQ-1299.
* Fix a double free if the same chan is stored on two tasks for a job. TRQ-1299.
* Many bug fixes. See CHANGELOG file for a full list.
* Thu Oct 25 2012 scorotAATTfree.fr- Add libtorque2 in the devel package requirements
* Wed Oct 24 2012 scorotAATTfree.fr- Update to 4.1.2
* Changelog too long. See CHANGELOG file included with this package in /usr/share/doc/packages/torque/CHANGELOG- spec file reformatting
* Fri Nov 11 2011 burnusAATTnet-b.de- Update to 2.5.9
* A new torque.cfg option named TRQ_IFNAME was added. This option allows the administrator to select the outbound tcp interface by interface name for qsub commands.
* Notable bug fixes:
  . Added function DIS_tcp_close, which frees buffer memory used for sending and receiving tcp data. This reduces the running memory size of TORQUE.
  . Fix for a server seg-fault when using the record_job_info.
  . Fix for afteranyarray and afterokarray where dependent jobs would not run after the dependent array requirements were satisfied.
  . Fix to delete .AR array files from the $TORQUE_HOME/server_priv/arrays directory.
  . Fix to recover previous state of job arrays between restarts of pbs_server.
  . Fix to prevent the server from hanging when moving jobs from one server to another server.
  . Fix to stop a segfault if using munge and the munge daemon was not running.
  . Security fix to munge authorization to prevent users from gaining access to TORQUE when munge was not running.
  . Fix to allow pam_pbssimpleauth to work properly.
* To see a complete list of changes please see the CHANGELOG.
* Sun Sep 04 2011 burnusAATTnet-b.de- Update to 2.5.8
* Several bug fixes.
* Thu Jun 30 2011 burnusAATTnet-b.de- Update to 2.5.7
* Added new qsub argument -F. This argument takes a quoted string as an argument. The string is a list of space separated commandline arguments which are available to the job script.
* Added an option to asynchronously delete jobs (currently cannot work for qdel -a all due to limitations of single threads)
* Several bug fixes.
* Wed Jun 08 2011 burnusAATTnet-b.de- Update to 2.5.6
* Added new symbol JOB_EXEC_OVERLIMIT. When a job exceeds a limit (e.g. walltime) the job will fail with the JOB_EXEC_OVERLIMIT value and also produce an abort case for mailing purposes. Previous to this change a job exceeding a limit returned 0 on success and no mail was sent to the user if requested on abort.
* Added a new queue resource named procct. procct allows the administrator to set queue limits based on the number of total processors requested in a job.
* Allow more than 5 concurrent connections to TORQUE using pbsD_connect. Increase it to 10.
* Allow an administrator using the proxy user submission to also set the job id to be used in TORQUE. This makes TORQUE easier to use in grid configurations.
* Added the ability to detect Nvidia gpus using nvidia-smi (default) or NVML. (Not enabled in this RPM.)
* The -e and -o options of qsub allow a user to specify a path or optionally a filename for output. If the path given by the user ended with a directory name but no '/' character at the end then TORQUE was confused and would not convert the .OU or .ER file to the final output/error file. The code has now been changed to stat the path to see if the end path element is a path or directory and handled appropriately.
* Added new MOM configuration option $rpp_throttle. The syntax for this in the $TORQUE_HOME/mom_priv/config file is $rpp_throttle <value>, where <value> is a long representing microseconds. Setting this value causes rpp data to pause for <value> microseconds after every sendto. This may help with large jobs where full data does not arrive at sister nodes.
* Several bug fixes
* Wed Jun 08 2011 burnusAATTnet-b.de- Fix spec
* Fri Mar 04 2011 burnusAATTnet-b.de- Fix spec
* Fri Mar 04 2011 burnusAATTnet-b.de- Update to 2.5.4
* Added submit_host and init_work_dir as job attributes, displayed by qstat -f.
* If a host in the nodes file cannot be resolved at startup the server will try once every 5 minutes until the node will resolve and it will add it to the nodes list.
* Add code to verify the group list as well when VALIDATEGROUPS is set in torque.cfg
* Several bug fixes.- Remove obsoleted patch
* Thu Jan 06 2011 burnusAATTnet-b.de- Fix syntax in pbs_server init script.
* Thu Jan 06 2011 burnusAATTnet-b.de- RPMlint fixes in the .spec file
* Thu Jan 06 2011 burnusAATTnet-b.de- Update to 2.5.4
* Added the ability to track gpus. Users set gpus=X in the nodes file for relevant node, and then request gpus in the nodes request: -l nodes=X[:ppn=Y][:gpus=Z].
* Fix potential buffer overrun in pbs_sched (Bugzilla #98).
* Check if a process still exists before killing it and sleeping. This speeds up the time for killing a task exponentially.
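The gpu request syntax -l nodes=X[:ppn=Y][:gpus=Z] introduced in 2.5.4 can be illustrated with a small parser. This is only a sketch of the numeric form quoted above, not TORQUE's own parsing code: the function name parse_nodes_request is hypothetical, the defaults (ppn=1, gpus=0) are assumptions, and real node specs may also carry hostnames and properties, which this sketch rejects.

```python
def parse_nodes_request(spec):
    """Parse the value of '-l nodes=', e.g. '2:ppn=4:gpus=1'.

    Hypothetical helper covering only the X[:ppn=Y][:gpus=Z] form.
    """
    count, *options = spec.split(":")
    # Assumed defaults for options that are left out.
    request = {"nodes": int(count), "ppn": 1, "gpus": 0}
    for option in options:
        key, sep, value = option.partition("=")
        if not sep or key not in ("ppn", "gpus"):
            raise ValueError(f"unsupported option in node spec: {option!r}")
        request[key] = int(value)
    return request


if __name__ == "__main__":
    print(parse_nodes_request("2:ppn=4:gpus=1"))
```

A request such as qsub -l nodes=2:ppn=4:gpus=1 would correspond to the spec string "2:ppn=4:gpus=1" in this sketch.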
* Tue Nov 09 2010 burnusAATTnet-b.de- Update to 2.5.3
* Add the variables PBS_NUM_NODES and PBS_NUM_PPN to the job environment.
* Security bug on the way checkpoint is being handled (Bug 84).
* Change so checkpoint files are transferred as the user, not as root.
* Created the ability to log all jobs to a file.
* qpeek now has the options --ssh, --rsh, --spool, --host, -o, and -e.
* Added the server parameters job_log_file_max_size, job_log_file_roll_depth and job_log_keep_days to help manage job log files.
* Serverdb is optionally in XML format. (Not enabled in this build.)
* Added support for munge authentication. (Not enabled in this build.)
* Wed Sep 08 2010 burnusAATTnet-b.de- Update to 2.5.2
* Allow the nodes file to use the syntax node[0-100] for nodes node0, node1, ..., node100 and node[000-100] for node000, node001, ... node100.
* Allow input of walltime in the format of [DD]:HH:MM:SS
* Several bug fixes.- Remove unlink patch for bug 61, which is included in 2.5.1.
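The two syntax additions in 2.5.2 (node ranges in the nodes file and walltime with an optional day field) can be sketched as follows. Both helpers are hypothetical illustrations, not TORQUE code. The zero-padding rule (pad to the width of the start value when it has leading zeros) is inferred from the node[000-100] example above, and treating shorter walltime strings as right-aligned fields is an assumption.

```python
import re

# Matches one bracketed numeric range, e.g. "node[000-100]".
_NODE_RANGE = re.compile(r"^(?P<prefix>.*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>.*)$")


def expand_node_range(name):
    """Expand 'node[0-2]' into ['node0', 'node1', 'node2']."""
    match = _NODE_RANGE.match(name)
    if match is None:
        return [name]  # plain node name, nothing to expand
    start, end = match.group("start"), match.group("end")
    # Assumption: a start value with leading zeros fixes the field width.
    width = len(start) if len(start) > 1 and start.startswith("0") else 0
    return [
        f"{match.group('prefix')}{str(n).zfill(width)}{match.group('suffix')}"
        for n in range(int(start), int(end) + 1)
    ]


def walltime_to_seconds(text):
    """Convert '[DD]:HH:MM:SS' style text to seconds."""
    fields = [int(field) for field in text.split(":")]
    if not 1 <= len(fields) <= 4:
        raise ValueError(f"unexpected walltime format: {text!r}")
    units = (1, 60, 3600, 86400)  # seconds, minutes, hours, days
    return sum(value * unit for value, unit in zip(reversed(fields), units))


if __name__ == "__main__":
    print(expand_node_range("node[000-100]")[:3])
    print(walltime_to_seconds("01:02:03:04"))
```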
* Mon Aug 16 2010 burnusAATTnet-b.de- Security fix: Use proper effective UID/GID when unlinking files (clusterresources.org bug #61)
* Thu Jul 22 2010 burnusAATTnet-b.de- Update to 2.5.1
* Improved job arrays (not backward compatible, drain all job arrays before upgrading). Includes: slot limits, job dependencies based on entire arrays and on individual jobs.
* Improved wildcard support for queue and server parameters.
* New server config option alias_server_name to be able to handle alias ip addresses.
* Enabled TORQUE to be able to parse the -l procs=x node spec.
* Created permission checking of submitted jobs
* Added new qmgr server attributes (clone_batch_size, clone_batch_delay, checkpoint_defaults, job_start_timeout).
* Allow users to delete a range of jobs from the job array (qdel -t).
* Added a slot limit to the job arrays - this restricts the number of jobs that can concurrently run from one job array.
* By default show only a single entry in qstat output for the whole array.
* Changed array names from jobid-index to jobid[index] for consistency
* Added server parameter job_force_cancel_time.
* Expand acl host checking to allow * in the middle of hostnames, not just at the beginning. Also allow ranges like a[10-15] to mean a10, a11, ..., a15.
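The expanded acl host checking in 2.5.1 (a wildcard anywhere in the hostname, plus numeric ranges such as a[10-15]) can be sketched by translating each pattern into a regular expression. This is an illustration only, not the server's implementation: the names host_allowed and acl_pattern_to_regex are hypothetical, and letting the wildcard match dots is an assumption.

```python
import re

# A numeric range such as "[10-15]" inside an ACL pattern.
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def _literal(text):
    """Escape plain text, turning each '*' into 'match anything'."""
    return ".*".join(re.escape(part) for part in text.split("*"))


def acl_pattern_to_regex(pattern):
    """Translate an ACL host pattern into a compiled regex."""
    parts = []
    pos = 0
    for match in _RANGE.finditer(pattern):
        parts.append(_literal(pattern[pos:match.start()]))
        low, high = int(match.group(1)), int(match.group(2))
        # a[10-15] becomes a(?:10|11|12|13|14|15)
        numbers = "|".join(str(n) for n in range(low, high + 1))
        parts.append(f"(?:{numbers})")
        pos = match.end()
    parts.append(_literal(pattern[pos:]))
    return re.compile("".join(parts))


def host_allowed(host, acl):
    """Return True if host matches any pattern in the ACL list."""
    return any(acl_pattern_to_regex(p).fullmatch(host) for p in acl)


if __name__ == "__main__":
    acl = ["a[10-15]", "node*.example.org"]
    for host in ("a12", "a16", "node7.rack2.example.org"):
        print(host, host_allowed(host, acl))
```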
* Fri Jun 25 2010 burnusAATTnet-b.de- Update to 2.4.8
* added QSUBSENDGROUPLIST to qsub. This allows the server to know the correct group name when disable_server_id_check is set to true and the user doesn't exist on the server.
* mapped 'qsub -P user:group' to qsub -P user -W group_list=group
* smaller additions and bug fixes
* Mon Apr 12 2010 burnusAATTnet-b.de- Update to 2.4.7
* Added -P to qsub for running as root.
* Asynchronous option -a for qsig.
* qsub's -W can now parse attributes with quoted lists, for example: qsub script -W attr="foo,foo1,foo2,foo3" will set foo,foo1,foo2,foo3 as attr's value.
* added two server parameters: display_job_server_suffix and job_suffix_alias. The first defaults to true and controls whether .server_name is appended to job ids. The second defaults to NULL, but if it is defined it will be appended at the end of the jobid, i.e. jobid.job_suffix_alias.
* added -l option to qstat so that it will display a server name and an alias if both are used. If these aren't used, -l has no effect.
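The quoted-list handling added to qsub -W in 2.4.7 amounts to splitting an attribute list on commas only when they fall outside double quotes. The sketch below shows that idea in isolation. The function name split_attributes is hypothetical and the code is not taken from qsub; it assumes the shell has already passed the quotes through to the program.

```python
def split_attributes(text):
    """Split 'a=1,b="x,y"' into {'a': '1', 'b': 'x,y'}.

    Commas inside double quotes are kept as part of the value.
    """
    items = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes  # toggle state, drop the quote itself
        elif ch == "," and not in_quotes:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unbalanced double quote in attribute list")
    items.append("".join(current))

    attrs = {}
    for item in items:
        if item:
            key, _, value = item.partition("=")
            attrs[key] = value
    return attrs


if __name__ == "__main__":
    print(split_attributes('attr="foo,foo1,foo2,foo3"'))
```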
* Wed Feb 17 2010 burnusAATTnet-b.de- Update to 2.4.4:
* Added qmgr server attribute job_start_timeout, specifies timeout to be used for sending job to mom. If not set, tcp_timeout is used.
* Bug fixes.
* Fri Jan 22 2010 burnusAATTnet-b.de- Revert: Build torque.rpm as noarch as it does seem to affect all RPMs- Update to 2.4.3:
* Added logging for email events
* Bug fixes
* Tue Jan 19 2010 burnusAATTnet-b.de- Mark torque.rpm as noarch.
* Mon Dec 14 2009 burnusAATTnet-b.de- Update to 2.4.3
* Bug fixes, especially for "torque 2.4.X breaks OSC's mpiexec"
* Wed Nov 25 2009 burnusAATTnet-b.de- Update to 2.4.2
* pbs_mom -p is now the default option, -q can be used for the previous behavior
* add qchkpt command
* add RERUNNABLEBYDEFAULT parameter to torque.cfg
* new boolean queue attribute "is_transit" that allows jobs to exceed server resource limits (queue limits are respected)
* allow the user to request a specific processor geometry for their job using a bitmap, and then bind their jobs to those processors using cpusets.
* add administrator customizable email notifications (see manpage for pbs_server_attributes)
* new fifo scheduler config option. ignore_queue: queue_name allows the scheduler to be instructed to ignore up to 16 queues on the server
* change so queued jobs that get deleted go to complete and get displayed in qstat based on keep_completed
* changed TORQUE_MAXCONNECTTIMEOUT to be a global variable that is now changed by the MOM to be smaller than the pbs_server and is also configurable on the MOM ($max_conn_timeout_micro_sec)
* added new parameter "log_keep_days" to both pbs_server and pbs_mom
* added qmgr option accounting_keep_days, specifies how long to keep accounting files.
* added a "-w" option to qsub to override the working directory
* added a prologue and epilogue option to the list of resources for qsub -l which allows a per job prologue or epilogue script. The syntax for the new option is qsub -l prologue= epilogue=
* Added a new server parameter np_default. This allows the administrator to change the number of processors to a unified value dynamically for the entire cluster.- rpmlint fixes