Control-M Server

Unavailable agents need to be cleared from the server. To gather a list of incorrect agent configurations within Control-M, run the following step for discovery and removal. From the Control-M server logon (on Unix), execute:

ctm_agstat -list "*" | grep unavailable | sort

This will produce a list of the currently unavailable agents. Some of these may be unavailable because services are not started, tracking is off, the machine is offline, the network connection is lost, etc. You must first establish whether the machine is still active and connected to the network. If not, you can remove the configuration. If the machine is active and connected, you must troubleshoot the unavailability.
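The first check, whether each unavailable machine still answers on the network, can be scripted. The following is a minimal sketch, not a BMC utility: check_hosts is a hypothetical helper name, it expects one host name per line on standard input (how you extract host names from the ctm_agstat output depends on its layout on your release), and it assumes that "ping -c 1" sends a single probe on your platform. Set PROBE to another command where that is not the case.

```shell
#!/bin/sh
# check_hosts: hypothetical helper, not part of Control-M.
# Reads host names (one per line) from stdin and reports whether each answers a probe.
# PROBE defaults to "ping -c 1" (an assumption; adjust for your Unix flavor).
check_hosts() {
    probe=${PROBE:-ping -c 1}
    while IFS= read -r host; do
        # Skip blank lines
        [ -n "$host" ] || continue
        # $probe is intentionally unquoted so "ping -c 1" splits into words
        if $probe "$host" </dev/null >/dev/null 2>&1; then
            printf '%s reachable\n' "$host"
        else
            printf '%s unreachable\n' "$host"
        fi
    done
}
```

For example, with the host names saved in a file of your choosing: check_hosts < unavailable_hosts.txt. A host reported as unreachable may still be behind a firewall that drops ping, so treat the result as a hint only.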

To remove the bad configurations, it's best to shut down Control-M during this process. In the ~/ctm/data/AGSTAT and ~/ctm/data/AGPRM directories, remove the entries that correspond to the bad agents.
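A cautious way to do the removal is to move the entries aside instead of deleting them, so they can be restored if an agent turns out to be needed. The sketch below assumes that each agent is stored as an entry named exactly after the agent inside AGSTAT and AGPRM; verify this on your installation first. retire_agent and the backup location are hypothetical names.

```shell
#!/bin/sh
# retire_agent: hypothetical helper, not part of Control-M.
# Usage: retire_agent <data dir> <agent name> <backup dir>
# Assumption: AGSTAT/<agent> and AGPRM/<agent> are the entries for that agent.
retire_agent() {
    data_dir=$1
    agent=$2
    backup_dir=$3
    for sub in AGSTAT AGPRM; do
        src="$data_dir/$sub/$agent"
        if [ -e "$src" ]; then
            # Keep the same sub-directory layout in the backup
            mkdir -p "$backup_dir/$sub" || return 1
            mv "$src" "$backup_dir/$sub/" || return 1
            printf 'moved %s\n' "$src"
        fi
    done
}
```

Run it only while Control-M is shut down, for example: retire_agent ~/ctm/data badhost /tmp/agent_backup (badhost is a placeholder).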

To find the jobs that are set up to run on a "bad" agent, run the following against the Control-M database (Sybase in this example):

select JOBNAME, MEMNAME, NODEID, NODEGRP from CMS_JOBDEF where NODEID = '<bad agent name>'
go

NODEGRP = '<bad agent name>' can also be used in place of NODEID, and you can also look in CMS_JOBAJF. These jobs need to be either removed or corrected to reflect the appropriate agent.

Sysout display is slow or non-responsive in the Control-M EM GUI. This may be due to the status directory building up, caused by a Control-M agent or another issue. If the ~/ctm/status directory has many files (100+) with old creation dates, this is your first clue that there may be a problem. The typical symptom is that rebooting the Control-M server resolves the issue immediately and everything is fine for a period of time, then things get progressively slower and eventually non-responsive again.
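A quick way to check for this buildup is to count the old files in the status directory. This is a rough sketch: count_old_files is a hypothetical helper, and it uses modification time because most Unix versions of find cannot test a creation date.

```shell
#!/bin/sh
# count_old_files: hypothetical helper, not part of Control-M.
# Usage: count_old_files <directory> [days]
# Prints the number of regular files (including those in sub-directories)
# whose age in whole days is greater than <days> (default 1).
count_old_files() {
    dir=$1
    days=${2:-1}
    # tr strips the padding some wc implementations add
    find "$dir" -type f -mtime +"$days" | wc -l | tr -d ' '
}
```

For example, count_old_files ~/ctm/status 1 returning a number above 100 would match the warning sign described above.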

Sybase issue: the orderID tracking in the db has exceeded its character limit. In Sybase this was supposed to have been corrected in 6.1.02/FixPack4. The first set of numbers in the names of the files in the status directory represents the orderID in a multiplication factor. If this first set of numbers exceeds 7 characters, there is a good chance this is the problem. A phone call to BMC support may be required, with an escalation to development.
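The 7-character check can be run across the whole status directory. The sketch below assumes that the "first set of numbers" is the part of the file name before the first underscore; long_prefixes is a hypothetical helper name.

```shell
#!/bin/sh
# long_prefixes: hypothetical helper, not part of Control-M.
# Usage: long_prefixes <status directory>
# Prints every file name whose leading part (before the first "_")
# is longer than 7 characters.
long_prefixes() {
    for f in "$1"/*; do
        # Skip the unexpanded pattern when the directory is empty
        [ -e "$f" ] || continue
        name=${f##*/}
        prefix=${name%%_*}
        if [ "${#prefix}" -gt 7 ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Any output from long_prefixes ~/ctm/status is worth mentioning when you call BMC support.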

Assuming Sybase is not the issue and there are many files in the status directory, a few things can be done to identify whether an agent or something else is causing the problem. First, look for files that share the same set of first characters (prior to the first underscore "_"), and sort the directory listing to find the oldest of these repeating files; this may be the agent causing the problem. Execute the following from the Unix command line on the file(s):

p_36 "*<first part of file name>"
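Finding the repeating prefixes and the oldest file of each group by eye is tedious when the directory is large. The following sketch automates that search; repeated_prefixes is a hypothetical helper, it relies on modification time as reported by ls, and it assumes file names contain no newlines.

```shell
#!/bin/sh
# repeated_prefixes: hypothetical helper, not part of Control-M.
# Usage: repeated_prefixes <status directory>
# For each prefix (text before the first "_") shared by more than one file,
# prints: <file count> <prefix> <oldest file name>, most frequent first.
repeated_prefixes() {
    # ls -tr lists the oldest files (by modification time) first,
    # so the first file seen for a prefix is the oldest one.
    ls -1tr "$1" | awk -F_ '
        NF > 1 {
            if (!($1 in oldest)) oldest[$1] = $0
            count[$1]++
        }
        END {
            for (p in count)
                if (count[p] > 1) print count[p], p, oldest[p]
        }' | sort -rn
}
```

The second column of each output line is the value to substitute for <first part of file name> in the p_36 command.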

The p_36 command returns the Control-M orderID as it appears in the GUI. Once you have it, search for the orderID in the GUI, locate the agent (node/hostname) in question, and review that agent's functionality. Start by running ctm_diag_comm <nodeID> from the server and ag_diag_comm from the node. If communication looks fine, check the agent's status, procid and proclog directories. Depending on your retention settings you may see old data (check the creation dates); if there are older files than you expect to see, there is a good chance there is an issue with the agent.

Before continuing, be advised that jobs that were executing will stop running during the next steps. Shut down the agent (this will kill any jobs that may have been running), back up the files/folders to a tmp directory, and clear out the original files (this will return NOTOK to jobs that may have finished, or were still running, but could not update the GUI). Restart the agent only after rebooting the Control-M server and cleaning the status directory (see those steps below).
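The backup-and-clear step on the agent can be sketched as follows. This is only an illustration under stated assumptions: backup_agent_dirs is a hypothetical helper, the location of the status, procid and proclog directories under a single agent directory is an assumption you must check on your installation, and the agent must already be shut down.

```shell
#!/bin/sh
# backup_agent_dirs: hypothetical helper, not part of Control-M.
# Usage: backup_agent_dirs <agent dir> <backup root>
# Copies status, procid and proclog (when present) into a time-stamped
# folder under <backup root>, then removes the original files.
# Prints the path of the backup folder.
backup_agent_dirs() {
    agent_dir=$1
    backup_root=$2
    dest="$backup_root/agent_backup_$(date +%Y%m%d%H%M%S)"
    mkdir -p "$dest" || return 1
    for d in status procid proclog; do
        src="$agent_dir/$d"
        [ -d "$src" ] || continue
        # Copy first; only clear the originals if the copy succeeded
        cp -Rp "$src" "$dest/" || return 1
        find "$src" -type f -exec rm -f {} +
    done
    printf '%s\n' "$dest"
}
```

The directories themselves are left in place so the agent finds them on restart; only the files inside are removed.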

Shutting down the Control-M Server to clear the status directory. Do not clear out the status directory without first shutting down the Control-M server. By default, the status directory is flushed when the Control-M new day process begins. There may be a few files that remain from the previous day, but for the most part they are purged. After shutting down the Control-M server (ctm_menu, option 1, etc.; leave the Control-M db up), clear the status directory, then restart the Control-M server.

If the issue persists, contact BMC support.
