
IBM Director

Sometimes IBM Director seems to lose the navigation pane, tab bar, etc., so I only see the current "window". There is no way "back". There is no logout option, because it's part of the navigation pane. If I just go to logout.do, then it says "Cross Site Forgery", aka "SCREW OFF! WE NEVER MAKE MISTAKES!"

I had to log in with a different browser to see what the logout link is, then go back to the broken one, and find a link with the SS variable. This is the session key. Then, replace the front part with /ibm/console/logout.do. *sigh*
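
The URL surgery described above can be sketched in a few lines of Python. This is only an illustration: the host, port, original path, and key value in the example link are made up, and the sketch assumes the session key travels in the query string of the link you copy from the broken session.

```python
from urllib.parse import urlsplit, urlunsplit

LOGOUT_PATH = "/ibm/console/logout.do"  # path quoted in the post


def logout_url(session_link: str) -> str:
    """Keep scheme, host and query string (session key); swap only the path."""
    parts = urlsplit(session_link)
    return urlunsplit((parts.scheme, parts.netloc, LOGOUT_PATH, parts.query, ""))


# Hypothetical link copied from the broken session window.
link = "https://director.example.com:8422/ibm/console/someAction.do?SS=abc123"
print(logout_url(link))
# https://director.example.com:8422/ibm/console/logout.do?SS=abc123
```

Paste the printed URL into the broken browser session to log out.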

p260 Flex Node duplicate automatic MAC addresses - Revised

I ran into an issue that might be procedural, but I thought you guys might want to know anyway.

We are pursuing with IBM HW support as of 2013-07-18.
I am going to test further in my lab. I suspect this may be related to
bkprofdata and rstprofdata copying over some internal seed for MAC addresses.

I plan to try this in my lab on p5 rackmount servers via an HMC.
If I can reproduce it there, then I expect support's response to be "don't do that".
As such, I also will try a factory reset to see if that will clear the condition.

If I cannot reproduce it there, then it's either SDMC/FSM related (which is going away),
or it's blade/Flex Node related (No other test resources, but maybe L3 can help).

If L3 decides that rstprofdata cannot be used on a different system,
then I would want them to A) Limit the command to that functionality,
and B) Update documentation for both commands to reflect this.

### BEGIN NOTICE ###
bkprofdata & rstprofdata were used to clone the LPAR layout from one blade to another.
To reset the WWNs, I was able to delete and re-add the virtual fibre adapters.
New LPARs and new virtual fibre adapters automatically get WWNs with the blade/node number as part of the WWN.
This part works as I would expect.

To reset the MAC addresses, this did not work.
Deleting and re-adding virtual ethernet adapters does not change the MAC addresses.
Adding a new adapter that did not exist before to the same slot number,
on the same LPAR ID, on two different Flex nodes, gives both the same MAC address.

Current resolution is to override the MAC address with a user specified value in the LPAR profile.
This can be done from Profile -> Virtual -> Ethernet -> Advanced -> checkbox

Change from commandline:
chsyscfg -m Server-7895-23X-SN1012345 -r prof -i \
'name=DefaultProfile,lpar_id=4,"virtual_eth_adapters=""2/0/1//0/1/ETHERNET0/DEADBEEF0402/all/none"""'


To remove and re-add:
chsyscfg -m Server-7895-23X-SN1012345 -r prof -i \
'name=DefaultProfile,lpar_id=4,"virtual_eth_adapters-=""2/0/1//0/1/ETHERNET0//all/none"""'
chsyscfg -m Server-7895-23X-SN1012345 -r prof -i \
'name=DefaultProfile,lpar_id=4,"virtual_eth_adapters+=""2/0/1//0/1/ETHERNET0/DEADBEEF0402/all/none"""'
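
If you have to override a lot of adapters, generating the values is less error-prone than typing them. Below is a rough Python sketch, not anything from IBM: the scheme of "4-byte prefix + LPAR ID + slot" is my assumption (it happens to reproduce the DEADBEEF0402 value above for LPAR 4, slot 2), and the function names are made up. Each node would need its own prefix, otherwise the duplicates come right back. The first octet should have the multicast bit clear and the locally administered bit set, which the sketch checks.

```python
def manual_mac(lpar_id: int, slot: int, prefix: str = "DEADBEEF") -> str:
    """Build a 12-hex-digit MAC: 4-byte prefix + LPAR ID byte + slot byte."""
    if not (0 <= lpar_id <= 0xFF and 0 <= slot <= 0xFF):
        raise ValueError("lpar_id and slot must each fit in one byte")
    if len(prefix) != 8:
        raise ValueError("prefix must be exactly 8 hex digits")
    first_octet = int(prefix[:2], 16)
    int(prefix, 16)  # raises ValueError if the prefix is not valid hex
    if first_octet & 0x01:
        raise ValueError("multicast bit is set in the first octet")
    if not first_octet & 0x02:
        raise ValueError("locally administered bit is not set in the first octet")
    return f"{prefix.upper()}{lpar_id:02X}{slot:02X}"


def veth_field(slot: int, pvid: int, mac: str, vswitch: str = "ETHERNET0") -> str:
    """Format one adapter entry with the same field positions as the example above."""
    return f"{slot}/0/{pvid}//0/1/{vswitch}/{mac}/all/none"


print(veth_field(2, 1, manual_mac(4, 2)))
# 2/0/1//0/1/ETHERNET0/DEADBEEF0402/all/none
```

The printed string drops into the virtual_eth_adapters value of the chsyscfg lines above.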


NOTES:
I've never seen this happen on any other POWER series servers, and I've built a lot of p7 systems, ranging from p710 to p780, including matching LPARs between CECs. This is on top of the whole slew of LPARable systems I've built and/or supported.

I looked into the profile data backup files themselves, and there is no mention of system serial, system name, WWN prefix, or MAC prefix.

I restored mode 3 of the profile data backups prior to any config work, and when adding new virtual NICs to LPARs, the MAC addresses still mirror each other.

I plan to test this with two p505 systems on an HMC to see if similar issues occur.

I don't have the resources to test this on blades, or on another SDMC.

We are pursuing with IBM HW support as of 2013-07-18

### END NOTICE ###


After a week, still no response from support,
but I think I found out why this was a problem.

On physical hardware, "lssyscfg -r lpar" will show virtual_eth_mac_base_value=
On the flex nodes, this value is not exposed.

I can't tell if this is an SDMC/FSM limitation, or a flex node limitation.
I know that IVM sees it, but am not sure about HMC.

So, when LPAR profiles are copied over, they will bring the VEMBV,
and there is no way to change it short of deleting and re-creating.

All in all, it may just be easier to use mksyscfg from the start.
An example might be:

mksyscfg -r lpar -m Server-8205-E6D-SN10FFFFF -i profile_name=DefaultProfile,\
name=production,lpar_id=4,lpar_env=aixlinux,min_mem=1024,desired_mem=8192,\
max_mem=16384,mem_mode=ded,proc_mode=shared,min_proc_units=0.1,\
desired_proc_units=0.1,max_proc_units=8.0,min_procs=1,desired_procs=1,\
max_procs=8,sharing_mode=uncap,uncap_weight=8,max_virtual_slots=10,\
\"virtual_scsi_adapters=2/client/1/vioserver/24/1\",\
\"virtual_eth_adapters=8/0/25//0/1,9/0/863//0/1\",auto_start=0


But there's already reference online for this sort of command.

Also, while working on a p740 via IVM, I ran into more differences from HMC/SDMC.
When you add a client LPAR with virtual SCSI, IVM automagically creates the VIO server virtual scsi server adapter. In addition, +1 from that slot it creates a virtual serial adapter for mkvterm.

If you're used to adding virtual scsi adapters in order, and you don't skip a slot on the mksyscfg lines, then you'll get this error:
[VIOSE01050173-0290] Cannot create virtual serial adapter in the management partition in the virtual slot number specified 20.


I couldn't find this error anywhere else on the internet, and it was a little confusing since I wasn't making a virtual serial adapter.
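
A trivial Python sketch of the slot planning that avoids this, assuming the behavior described above (IVM takes the slot after each virtual SCSI server adapter for the virtual serial adapter). The function name and the starting slot are made up for illustration.

```python
def ivm_server_slots(first_slot: int, adapters: int) -> list[int]:
    """Return VIOS-side slots spaced two apart so that slot+1 stays free."""
    if adapters < 0:
        raise ValueError("adapters must not be negative")
    return [first_slot + 2 * i for i in range(adapters)]


print(ivm_server_slots(19, 3))
# [19, 21, 23]
```

Use the returned numbers as the server-side slot in each client's virtual_scsi_adapters entry instead of counting up by one.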

DS6800 errors

Info from digging into DS6800 (baby shark) which I didn't find online.
It's pretty sparse for the actual raw troubleshooting.
Basically though, it looks like there are recurring I/O channel failures for the primary processor card.
This has been replaced before, so it's probably a chassis issue.
There were power failures, so maybe it's a poor power regulation issue.
The PSU should take the hit and not kill the proc cards.

Since the site PDU was replaced, hopefully those problems will be gone.

Anyway, a CE is coming out to replace this processor card.

POWER7+

So, it came out 2 months ago, but here's the summary:
* D model numbers (9117-MMD, 9179-MHD, etc)
* Up to 128 cores in a p780+ (vs 96 in MMC)
* Double the RAM capacity (4 TB)
* MINIMUM CPU ENTITLEMENT 0.05 VS 0.10
* CoD CPU and RAM can be in a pool shared by multiple systems.
* NO p795 POWER7+ OPTION AT THIS TIME

Requirements:
* AIX 7.1 TL2, 71TL01SP06, 71TL00SP08, 61TL08, 61TL07SP06, 61TL06SP10
* AIX 5.3 TL12 SP07 (Expected but not released yet. Only for extended support)
* VIO 2220, VIO2215 (Dec19)
* HMC 7.7.6 (CR3 or later, and 3GB of RAM if over 256 LPARs total)
* i6.1 only supported through VIO or i7.1 client.

New 1.8" SSD enclosure:
* UltraSSD: New 1U drawer with 30SSDs (GX++ PCIe Cable and two SAS RAID controllers)
* UltraSSD 1U drawer has four 4xSAS ports for running two EXP24S 2U drawers.
* UltraSSD controllers will support EASY TIER for AIX and VIO in 2013.
* UltraSSD will be added to DS8k line in 2013.

New Disk-as-Tape device:
* "RDX" Removable Disk - looks like tape, but it is a hot-swap disk to replace pre-LTO tech.
* RDX SATA supported on iSeries as optical
* RDX USB supported on AIX and VIO as well.

New I/O components:
* IBM Rackswitch options, with 1GB, 10GB and 40GB ethernet ports.
* PCIe2 dual-port Remote DMA over Ethernet (vs Infiniband for low latency MPI)
* GX++ Dual Port 10gbit FCoE or 10gbit FC Adapters for p770/p780 (no Linux. iSeries through VIO)
* GX++ Dual Port 16GBFC or 10GBFCoE Adapters for p795 (no Linux or iSeries)

Hardware Enhancements:
* 4 sockets per CPU card (vs 2-sockets)
* Supports 64GB DIMMs (vs 32GB)
* Lower heat/power consumption with 32nm vs 45nm
* Better performance per core with 10MB L3 Cache vs 4MB and up to 4.4GHz
* Active Memory Expansion performance improved with on-chip Compression Accelerator
* Crypto accelerator for AES, SHA and RSA
* Random number generator on die
* Four floating point pipelines vs 2 (single precision takes 1, DP takes 2)
* Higher concurrency during firmware updates (Can reset one core at a time)
* Higher uptime with redundant lanes in cache and in CEC interconnect cables
* CPU upgrades for MMA, MMB and MMC will include new CEC enclosures.
* Free CoD: Includes 240GB memory days and 15 processor days per CPU initially shipped
* Free CoD: Includes 90 days of full activation (one shot)
* http://www-03.ibm.com/partnerworld/partnerinfo/src/atsmastr.nsf/WebIndex/TD105846

FLEX Hardware:
* p260 and p460 dual-port FCoE Mezzanine to support dual VIO
* New FCoE 8-port switch module to support new FCoE mezzanine cards
* New FC switch module
* New v7000 module
* New USB-3 storage drawer (1x RDX, 2X DVD-RAM)

Hardware Withdrawals:
* No PCI-X, HSL, RIO-1, or IOP support in POWER8
* 3.5" SAS drawers to be withdrawn in 2013.
* SCSI DISK SUPPORT IS DROPPED!!! SCSI Tape still okay on PCI-X #5736 in I/O drawer

Multiple VLANs on PowerVM VIO Servers

I always run into issues when I work in a multiple VLAN environment, because it's not *that* common for my builds. This is a reminder for me.

The magic is when using multiple VLANs:
1) Don't use the real VLAN ID for the trunk PVID unless you know for certain that was set on the switch. It is stripped off of all packets, and who knows what the PVID of the switch is, if any.
2) Any mismatch between PVID on the SEA and the trunk will cause packets to be dropped.
3) Don't use IEEE VLAN mode for the client adapter unless you're going to add VLAN interfaces from AIX. When not in VLAN mode, the PVID is ADDED to all packets on client adapters.
4) When using multiple trunks on one SEA, they all have to be the same trunk priority. ha_mode=sharing does not balance using trunk priority; it balances based on the order of the virt_adapters field.

Decoding SCSI Additional Sense log pages

This is from a decade ago, so I thought it time to update the URLs and post it to LJ.

Here is information on how to decode SCSI Sense Data. This revolved around IBM Magstar products since that is where I was first exposed to the guts of SCSI errors.

The AIX Error Report records for TAPE_ERR# (usually 1-6) often include SENSE DATA in the Detail section. A SCSI LOG PAGE 06h can be parsed manually to provide the SENSE KEY, ASC and ASCQ values, as well as the ERROR CODE which will tell us if it is current or past errors being reported. An example Log Page 6 is below:
	0600 0000 0300 0000 FF80 0000 0000 0000 0000 0000 7000 0000 0000 0015 0000 000B 
	0000 0000 001C 7F00 2000 0033 7E58 0000 0000 0000 0000 0000 0000 0000 0000 0000 
	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 B041 0000 0000 

If you'll notice, byte 0 is 06. Also note that there are 32 bytes per line, and two hex digits per byte.

Byte 20 represents the SCSI error class. Valid classes are:
    * 70 - Current Error (Direct Access Logical Block NOT From Sense Data).
    * F0 - Current Error (Direct Access Logical Block IS From Sense Data)
    * 71 - Deferred Error (Direct Access Logical Block NOT From Sense Data).
    * 7F - Vendor Spec. Error (Direct Access Logical Block NOT From Sense Data).
    * EE - Encryption Error
    * F1 - Deferred Error (Direct Access Logical Block IS From Sense Data).
    * FF - Vendor Spec. Error (Direct Access Logical Block IS From Sense Data).

In this example, EC (byte 20) is 70, which is valid and means this is a current error.

When the error class is valid, we can get the sense key from byte 22.

In this example, the sense key is 00 (zero) which means "No Sense". Only the low nibble of the byte is the sense key, which is why the list below shows the high nibble as X. The standard list of sense keys is:
	X0 - No Sense         X6 - Unit Attention    XC - Equal.
	X1 - Recovered Error  X7 - Data Protect      XD - Volume Overflow.
	X2 - Not Ready        X8 - Blank Check       XE - Miscompare.
	X3 - Medium Error     X9 - Vendor Specific   XF - RESERVED.
	X4 - Hardware Error   XA - Copy Aborted
	X5 - Illegal Request  XB - Aborted Command

ASC is at byte 32 (first byte on line 2) and ASCQ is byte 33.
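
The manual decode above can be sketched in Python. This is a minimal illustration using only the byte offsets given in this post (20, 22, 32, 33) and the class and sense key lists above; the function and variable names are mine, and it does not look up ASC/ASCQ text.

```python
ERROR_CLASSES = {
    0x70: "Current Error", 0xF0: "Current Error",
    0x71: "Deferred Error", 0xF1: "Deferred Error",
    0x7F: "Vendor Specific Error", 0xFF: "Vendor Specific Error",
    0xEE: "Encryption Error",
}

SENSE_KEYS = [
    "No Sense", "Recovered Error", "Not Ready", "Medium Error",
    "Hardware Error", "Illegal Request", "Unit Attention", "Data Protect",
    "Blank Check", "Vendor Specific", "Copy Aborted", "Aborted Command",
    "Equal", "Volume Overflow", "Miscompare", "Reserved",
]


def decode_log_page_6(dump: str) -> dict:
    """Pull error class, sense key, ASC and ASCQ out of an errpt SENSE DATA dump."""
    data = bytes.fromhex("".join(dump.split()))  # strip spaces, tabs, newlines
    if len(data) < 34:
        raise ValueError("dump is too short to hold ASC/ASCQ at bytes 32-33")
    if data[0] != 0x06:
        raise ValueError("byte 0 is not 06, so this is not Log Page 6")
    error_class = data[20]
    if error_class not in ERROR_CLASSES:
        raise ValueError(f"invalid error class {error_class:02X} at byte 20")
    sense_key = data[22] & 0x0F  # low nibble only
    return {
        "error_class": error_class,
        "error_class_name": ERROR_CLASSES[error_class],
        "sense_key": sense_key,
        "sense_key_name": SENSE_KEYS[sense_key],
        "asc": data[32],
        "ascq": data[33],
    }


# First two lines of the example dump from this post.
SAMPLE = """
0600 0000 0300 0000 FF80 0000 0000 0000 0000 0000 7000 0000 0000 0015 0000 000B
0000 0000 001C 7F00 2000 0033 7E58 0000 0000 0000 0000 0000 0000 0000 0000 0000
"""

result = decode_log_page_6(SAMPLE)
print(f"{result['error_class_name']}, sense key {result['sense_key']:X} "
      f"({result['sense_key_name']}), ASC/ASCQ {result['asc']:02X}/{result['ascq']:02X}")
# Current Error, sense key 0 (No Sense), ASC/ASCQ 00/00
```

Take the ASC/ASCQ pair it prints and look it up in the code listing.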

The ASC and ASCQ chart is pretty extensive. Please see the ASC/ASCQ Code Listing from the SCSI Technical Committee for an authoritative reference:

Note also that sometimes the ASC/ASCQ pair you're looking up may fall under a different sense key than is expected. The Sense key gives general information, such as "Recovered error", "hardware error", or the like. The ASC/ASCQ pair tells what the actual problem is. This isn't always 100% helpful, but is close.

Good reference was had from the 3590 Maintenance Information Guide, Msgs section. This gives 90% of what anyone would need to decode SCSI LOG PAGE 06h messages for IBM tape drives. The Jaguar Tape Drives (IBM 3590 & 3592) Information Center is at:

Included within are how to decode SIM/MIM Records, Log Page 6, and other related information. The 3590 Hardware Reference Guide, Appendix B also shows decent information in regards to non SIM/MIM errors. It makes reference to sense key and ASC/ASCQ bytes. You can acquire PDF copies of tape removable media storage systems' manuals via the following URLS:

The Magstar Maintenance and Ultrium SCSI Reference books make reference to "Fault Symptom Codes" which are more definitive; however, due to confidentiality of the 3590 microcode, a complete list of fault symptom codes is not available.

For encryption records, see the Troubleshooting section of the IBM TS3500 Tape Library (IBM 3584) Information Center:

The above also has general SCSI SENSE KEY/ASC/ASCQ and extended IBM codes under the Reference section.

There are other ways to get this information, but this was easiest for me.

Yours truly,
Josh Davis

Why does AIX 7.1 have shr.o, shr4.o and shr_64.o in root?

On most AIX 7.1 systems, I find stray object files in /.

I finally got around to looking at them, and they are libiconv shared objects.

This is most likely an error in packaging of bos.rte.iconv.

The ones inside of /usr/lib/libiconv.a are from 2010 (7.1.1.0),
but the ones in / are from 2011 (7.1.1.15)

It's rare to run into NLS problems, so it's not been worth the hassle of calling in.

I typically leave them there, in case there is a real reason, or if IBM fixes/cleans them up in a future PTF.

HOWTO SDDPCM upgrade AFTER AIX upgrade, when using SAN boot

If you've ended up upgrading to a newer AIX, and are SAN booting from SDDPCM, there is hope:
* mksysb -eXpi /dev/rmt0 ## just in case it all blows up
* ## Stop/quiesce what you can, unmount filesystems, vary off volume groups.
* vi /usr/lpp/devices.sddpcm.53/deinstl/devices.sddpcm.53.rte.pre_d
* ## Add "exit 0" as the first line after the shebang.
* ZZ
* installp -ug devices.sddpcm.53
* installp -acXYgd /export/lpp_source/sddpcm devices.fcp.disk.ibm.mpio.rte devices.sddpcm.71.rte
* lspv | grep rootvg | cut -f 1 -d \ | xargs -n1 bosboot -ad
* shutdown -Fr now

This worked for me at 2.6.0.3 on several different systems.

svcconfig backup no longer works?

I updated firmware from 6.3.0.0 to 6.4.0.2 on this new v7000.

Trying to re-backup the config, I get:
IBM_2076:v7000:xaminmo>svcconfig backup
CMMVC6202E This command can only be run by the superuser.

CMMVC6156W SVCCONFIG processing completed with errors


I searched on this and got 3 hits, all of them the docs.

Reference > Block storage system messages and codes > Command-line interface messages
CMMVC6202E

The cluster was not modified because the IP address is not valid.
Explanation

An attempt was made to change the IP address of a cluster to an address that is not valid.
User response

Correct the address and reissue the command.
Parent topic: Command-line interface messages


What makes it most annoying is that:
A) You cannot use the same authorized_keys SSH key for more than one user
B) Only one key attempt is allowed

So, if using Pageant, you have to delete the key you used for the other user, and add the one for superuser. You can't have both.