Changelog for torque-mom-4.2.7-2.2.x86_64.rpm:
Fri Apr 18 14:00:00 2014 aginiesAATTsuse.com
- update to 4.2.7 release
Sun Mar 16 13:00:00 2014 scorotAATTfree.fr
- fix error in server init script introduced with init.patch
- create the missing credentials directory, which make install does not create
Wed Mar 12 13:00:00 2014 scorotAATTfree.fr
- enable NUMA support on SLE 11, which is often used on high-end
systems
Tue Mar 11 13:00:00 2014 scorotAATTfree.fr
- update to torque version 4.1.7
* Make job_starter work for parallel jobs as well as serial.
* Fix one issue with being able to submit jobs to the cray while
offline.
* Add mom parameter job_oom_score_adjust, which affects the OOM
score for jobs run by this mom. Positive values make jobs more
likely to be killed.
* Add mom parameter mom_oom_immunize, making the mom immune to
being killed in out of memory conditions. Default is now true.
* Don't count completed jobs against max_user_queuable. TRQ-1420.
* make pbs_track compatible with display_job_server_suffix = false.
The user has to set NO_SERVER_SUFFIX in the environment.
* Fix the way we monitor if a thread is active.
* TRQ-1751. Add some code to handle a corrupted job file where the
job file says it is running but there is no exec host list.
These jobs will now receive a system hold.
* Cray: nppn wasn't being specified in reservations. Fix this.
TRQ-1660.
* TRQ-1653. Arrays depending on non-array jobs were broken.
Fix this.
* Add retries on transient failures to setuid and seteuid calls.
TRQ-1541.
* Add a timeout for mother superior when cleaning up a job.
Instead of waiting infinitely for sisters to confirm that a job
has exited, consider the job dead after 10 minutes. This time
can be adjusted by setting $job_exit_wait_time in the mom's
config file (time in seconds). This prevents jobs from being
stuck infinitely if a compute node crashes or if a mom daemon
becomes unresponsive. TRQ-1776.
* If privileged ports are disabled, make pbs_moms not check if
incoming connections from mother superior are on privileged
ports. TRQ-1669.
* Add two mom config parameters: max_join_job_wait_time and
resent_join_job_wait_time. TRQ-1790.
* TRQ-1709. Fix incorrect parsing of -l gpus=X,other_things.
* TRQ-1826. mppdepth is now passed correctly to the ALPS
reservation.
* TRQ-1802. Make the environment variable $PBS_NUM_NODES
accurate for multi-req jobs.
* TRQ-1832. Add the ability to add a login_property to a job at
the queue level by setting required_login_property on the queue.
- add SuSEfirewall configuration files
- enable systemd support for 12.2 and higher
- add patch torque-4.1.5.1-init.patch
- add patch torque-4.1.7-fix-tcl-interp.patch
Sat Apr 6 14:00:00 2013 scorotAATTfree.fr
- update to version 4.1.5.1
* If the job is no longer valid after attempting to lock the array
in get_jobs_array(), make sure the array is valid before
attempting to unlock it. TRQ-1598.
* Don't log an invalid connection message when close_conn() is
called on 65535 (PBS_LOCAL_CONNECTION). TRQ-1557.
* Don't strip quotes from values in scripts before specific
processing. TRQ-1632.
* Fix a deadlock when submitting two large arrays consecutively,
the second depending on the first. TRQ-1646 (reported by Jorg
Blank, 4.2.1).
* Changed communication between clients and trqauthd to use
only unix domain sockets
* Fix a segfault in req_jobobit due to an off-by-one error.
TRQ-1361.
* Fix a race condition in mom hierarchy reporting. TRQ-1378.
* Fixed pbs_mom so epilogue will only run once. TRQ-1134.
* Fix some debug output escaping into job output. TRQ-1360.
* Changed momctl to do retries to get connections to make it more
robust on busy systems. TRQ-1328.
* Fix crashes due to unprotected array accesses. TRQ-1395.
* Fixed segfault in req_movejob where the job ji_qhdr was NULL.
TRQ-1416.
* Many other bug fixes and enhancements. See the CHANGELOG file
for a full list.
- enable GPU support, since more users have GPUs and the mom works
fine even if no GPUs are detected
- enable cpuset support to get better performance on multicore
CPU systems
Sat Nov 17 13:00:00 2012 scorotAATTfree.fr
- version 4.1.3
* fix a security loophole that potentially allowed an interactive
job to run as root due to not resetting a value when
$attempt_to_make_dir and $tmpdir are set. TRQ-1078.
* Have pbs_server save the queues each time before exiting so that
legacy formats are converted to xml after upgrading. TRQ-1120.
* Make issue_Drequest wait for the reply and have functions
continue processing immediately afterward, avoiding the
overhead of using the threadpool.
* tm_adopt() calls caused pbs_mom to crash. Fix this. TRQ-1210.
* Modified output for qstat -r. Expanded Req'd Time to include
seconds and centered Elap Time over its column.
* Fix mismanagement of the ji_globid. TRQ-1262.
* Setting display_job_server_suffix=false crashed with job arrays.
Fixed. bugzilla #216
* Made it so pbs_server will come up even if a job cannot recover
because of a missing job dependency. TRQ-1287
* Retry cleanup with the mom every 20 seconds for jobs that are
stuck in an exiting state. TRQ-1299.
* Fix a double free if the same chan is stored on two tasks for a
job. TRQ-1299.
* Many bug fixes. See CHANGELOG file for a full list.
Thu Oct 25 14:00:00 2012 scorotAATTfree.fr
- Add libtorque2 in the devel package requirements
Wed Oct 24 14:00:00 2012 scorotAATTfree.fr
- Update to 4.1.2
* Changelog too long. See the CHANGELOG file included with this
package in /usr/share/doc/packages/torque/CHANGELOG
- spec file reformatting
Fri Nov 11 13:00:00 2011 burnusAATTnet-b.de
- Update to 2.5.9
* A new torque.cfg option named TRQ_IFNAME was added. This
option allows the administrator to select the outbound tcp
interface by interface name for qsub commands.
* Notable bug fixes:
. Added function DIS_tcp_close which frees buffer memory used
for sending and receiving tcp data. This reduces the running
memory size of TORQUE.
. Fix for a server seg-fault when using record_job_info.
. Fix for afteranyarray and afterokarray where dependent jobs
would not run after the dependent array requirements were
satisfied.
. Fix to delete .AR array files from the
$TORQUE_HOME/server_priv/arrays directory.
. Fix to recover previous state of job arrays between restarts
of pbs_server
. Fix to prevent the server from hanging when moving jobs from
one server to another server
. Fix to stop a segfault if using munge and the munge daemon
was not running
. Security fix to munge authorization to prevent users from
gaining access to TORQUE when munge was not running.
. Fix to allow pam_pbssimpleauth to work properly.
* For a complete list of changes, please see the CHANGELOG.
Sun Sep 4 14:00:00 2011 burnusAATTnet-b.de
- Update to 2.5.8
* Several bug fixes.
Thu Jun 30 14:00:00 2011 burnusAATTnet-b.de
- Update to 2.5.7
* Added new qsub argument -F. This argument takes a quoted string
as an argument. The string is a list of space-separated
command-line arguments which are available to the job script.
* Added an option to asynchronously delete jobs (currently cannot
work for qdel -a all due to limitations of single threads)
* Several bug fixes.
Wed Jun 8 14:00:00 2011 burnusAATTnet-b.de
- Update to 2.5.6
* Added new symbol JOB_EXEC_OVERLIMIT. When a job exceeds a limit
(e.g. walltime), the job will fail with the JOB_EXEC_OVERLIMIT
value and also produce an abort case for mailing purposes.
Previously, a job exceeding a limit returned 0 (success) and no
mail was sent to the user even if mail on abort was requested.
* Added a new queue resource named procct. procct allows the
administrator to set queue limits based on the number of total
processors requested in a job.
* Allow more than 5 concurrent connections to TORQUE using
pbsD_connect. Increase it to 10.
* Allow an administrator using the proxy user submission to also
set the job id to be used in TORQUE. This makes TORQUE easier
to use in grid configurations.
* Added the ability to detect Nvidia gpus using nvidia-smi
(default) or NVML. (Not enabled in this RPM.)
* The -e and -o options of qsub allow a user to specify a path
or optionally a filename for output. If the path given by the
user ended with a directory name but no '/' character at the
end, TORQUE was confused and would not convert the .OU or
.ER file to the final output/error file. The code has now been
changed to stat the path to see if the final path element is a
file or a directory and handle it appropriately.
* Added new MOM configuration option $rpp_throttle. The syntax
for this in the $TORQUE_HOME/mom_priv/config file is
$rpp_throttle <value>
where value is a long representing microseconds. Setting this
value causes rpp data to pause for <value> microseconds after
every sendto. This may help with large jobs where full data
does not arrive at sister nodes.
* Several bug fixes
Wed Jun 8 14:00:00 2011 burnusAATTnet-b.de
- Fix spec
Fri Mar 4 13:00:00 2011 burnusAATTnet-b.de
- Fix spec
Fri Mar 4 13:00:00 2011 burnusAATTnet-b.de
- Update to 2.5.4
* Added submit_host and init_work_dir as job attributes,
displayed by qstat -f.
* If a host in the nodes file cannot be resolved at startup, the
server will retry once every 5 minutes until the node resolves,
and then add it to the nodes list.
* Add code to verify the group list as well when VALIDATEGROUPS
is set in torque.cfg
* Several bug fixes.
- Remove obsoleted patch
Thu Jan 6 13:00:00 2011 burnusAATTnet-b.de
- Fix syntax in pbs_server init script.
Thu Jan 6 13:00:00 2011 burnusAATTnet-b.de
- RPMlint fixes in the .spec file
Thu Jan 6 13:00:00 2011 burnusAATTnet-b.de
- Update to 2.5.4
* Added the ability to track GPUs. Users set gpus=X in the
nodes file for the relevant node, and then request GPUs in the
nodes request: -l nodes=X[:ppn=Y][:gpus=Z].
* Fix potential buffer overrun in pbs_sched (Bugzilla #98).
* Check if a process still exists before killing it and sleeping.
This greatly reduces the time needed to kill a task.
Tue Nov 9 13:00:00 2010 burnusAATTnet-b.de
- Update to 2.5.3
* Add the variables PBS_NUM_NODES and PBS_NUM_PPN to
the job environment.
* Fix a security bug in the way checkpoints are handled (Bug 84).
* Change so checkpoint files are transferred as the user,
not as root.
* Created the ability to log all jobs to a file.
* qpeek now has the options --ssh, --rsh, --spool, --host,
-o, and -e.
* Added the server parameters job_log_file_max_size,
job_log_file_roll_depth and job_log_keep_days to help
manage job log files.
* Serverdb is optionally in XML format.
(Not enabled in this build.)
* Added support for munge authentication.
(Not enabled in this build.)
Wed Sep 8 14:00:00 2010 burnusAATTnet-b.de
- Update to 2.5.2
* Allow the nodes file to use the syntax node[0-100] for
nodes node0, node1, ..., node100 and node[000-100] for
node000, node001, ... node100.
* Allow input of walltime in the format of [DD]:HH:MM:SS
* Several bug fixes.
- Remove unlink patch for bug 61, which is included in 2.5.1.
Mon Aug 16 14:00:00 2010 burnusAATTnet-b.de
- Security fix: Use proper effective UID/GID when unlinking files
(clusterresources.org bug #61)
Thu Jul 22 14:00:00 2010 burnusAATTnet-b.de
- Update to 2.5.1
* Improved job arrays (not backward compatible; drain all job
arrays before upgrading). Includes: slot limits, job
dependencies based on entire arrays and on individual jobs.
* Improved wildcard support for queue and server parameters.
* New server config option alias_server_name to be able to handle
alias ip addresses.
* Enabled TORQUE to be able to parse the -l procs=x node spec.
* Created permission checking of submitted jobs
* Added new qmgr server attributes (clone_batch_size,
clone_batch_delay, checkpoint_defaults, job_start_timeout).
* Allow users to delete a range of jobs from the job array
(qdel -t).
* Added a slot limit to the job arrays - this restricts the
number of jobs that can concurrently run from one job array.
* By default show only a single entry in qstat output for the
whole array.
* Changed array names from jobid-index to jobid[index] for
consistency
* Added server parameter job_force_cancel_time.
* Expand acl host checking to allow '*' in the middle of
hostnames, not just at the beginning. Also allow ranges like
a[10-15] to mean a10, a11, ..., a15.
Fri Jun 25 14:00:00 2010 burnusAATTnet-b.de
- Update to 2.4.8
* added QSUBSENDGROUPLIST to qsub. This allows the server to know the
correct group name when disable_server_id_check is set to true and
the user doesn't exist on the server.
* mapped 'qsub -P user:group' to 'qsub -P user -W group_list=group'
* smaller additions and bug fixes
Mon Apr 12 14:00:00 2010 burnusAATTnet-b.de
- Update to 2.4.7
* Added -P to qsub for running as root.
* Asynchronous option -a for qsig.
* qsub's -W can now parse attributes with quoted lists, for example:
qsub script -W attr="foo,foo1,foo2,foo3" will set foo,foo1,foo2,foo3
as attr's value.
* added two server parameters: display_job_server_suffix and job_suffix_alias.
The first defaults to true and controls whether job IDs are suffixed
with .server_name. The second defaults to NULL, but if it is defined it
will be appended at the end of the job ID, i.e. jobid.job_suffix_alias.
* added -l option to qstat so that it will display a server name and an
alias if both are used. If these aren't used, -l has no effect.
Wed Feb 17 13:00:00 2010 burnusAATTnet-b.de
- Update to 2.4.4:
* Added qmgr server attribute job_start_timeout, specifies timeout to be
used for sending job to mom. If not set, tcp_timeout is used.
* Bug fixes.
Fri Jan 22 13:00:00 2010 burnusAATTnet-b.de
- Revert: Build torque.rpm as noarch as it does seem to affect
all RPMs
- Update to 2.4.3:
* Added logging for email events
* Bug fixes
Tue Jan 19 13:00:00 2010 burnusAATTnet-b.de
- Mark torque.rpm as noarch.
Mon Dec 14 13:00:00 2009 burnusAATTnet-b.de
- Update to 2.4.3
* Bug fixes, especially for "torque 2.4.X breaks OSC's mpiexec"
Wed Nov 25 13:00:00 2009 burnusAATTnet-b.de
- Update to 2.4.2
* pbs_mom -p is now the default option, -q can be used for the
previous behavior
* add qchkpt command
* add RERUNNABLEBYDEFAULT parameter to torque.cfg
* new boolean queue attribute "is_transit" that allows jobs to
exceed server resource limits (queue limits are respected)
* allow the user to request a specific processor geometry for
their job using a bitmap, and then bind their jobs to those
processors using cpusets.
* add administrator customizable email notifications (see
manpage for pbs_server_attributes)
* new fifo scheduler config option ignore_queue: queue_name, which
allows the scheduler to be instructed to ignore up to 16
queues on the server
* change so queued jobs that get deleted go to complete and get
displayed in qstat based on keep_completed
* changed TORQUE_MAXCONNECTTIMEOUT to be a global variable that
is now changed by the MOM to be smaller than the pbs_server and
is also configurable on the MOM ($max_conn_timeout_micro_sec)
* added new parameter "log_keep_days" to both pbs_server and
pbs_mom
* added qmgr option accounting_keep_days, specifies how long to
keep accounting files.
* added a "-w" option to qsub to override the working directory
* added a prologue and epilogue option to the list of resources
for qsub -l, which allows a per-job prologue or epilogue script.
The syntax for the new option is
qsub -l prologue=<prologue script path> epilogue=<epilogue script path>
* Added a new server parameter np_default. This allows the
administrator to change the number of processors to a unified
value dynamically for the entire cluster.
- rpmlint fixes