UBNT Infrastructure Part 5 – Replacing the PDC Core
So it had all been worked out. I had chosen the brand I would be using for my wired and wireless replacement – check. I had chosen whether I would use EdgeSwitch or UniFi Switch – check. Now came the time to do the actual work. And, surprisingly, replacing a core switch is not a ten-minute job.
The start of this work took place in the lead-up to the Christmas break. I had unpacked all the switches and configured them with generic settings like names, IPs, VLANs, tagging etc. And on the 8th of January, I would be installing the switches.
I was locked and loaded. And unprepared.
The plan had been simple, but I had made one mistake which would turn my planned 5-hour project into a 48-hour one. But we will get to that.
Figures 1 & 2. The old core, and a core-sized hole in my life.
The school had been running an HP ProCurve 5400 core filled with a mix of fibre ports and ethernet ports. I am not going to pretend that it didn’t do its job, or that it was a switch I was glad to get rid of. The HP ProCurve and now HP Aruba switches are amazing gear. But they are expensive. The new core would be the following:
- 4x ES-48-500W
- 7x ES-16-XG
I would use four 48-port ethernet switches to handle my ICT Department network connections and seven 16-port fibre switches for building uplinks and server fibre connections.
Figure 3. The new core is in… time for cables.
So far so good. Everything was going as planned and by 9 am I was starting to plug the core back in. I was happy things were going well. By 10 am I had all the servers connected again and I was bringing things back online.
Figure 4. Cables are in. Though still messy.
Now I started to run into issues. I couldn’t figure out why our internet connection was not coming back up. It was at this point that I started to realise where I had stuffed up. In every other company I had worked in, the internet connection was terminated in the server room and connected directly to a router, which was then connected via a firewall to the network. However, the internet here didn’t work like that. It was a wireless link that was terminated in our B Block. From there, via a segregated section of the network, it ran through the building fibre link to the ICT department and into our firewall, and from the firewall to the core. Outbound traffic repeated the process in reverse, sending the segregated traffic back out through B Block to the wireless link.
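To make that path easier to follow, here is a minimal Python sketch of the outbound route as described above. The hop names and the `first_broken_hop` helper are made up for illustration and are not taken from our actual configuration; the idea is simply to walk the path in order and stop at the first hop that does not answer.

```python
from typing import Callable, Optional, Sequence

# Hypothetical labels for the outbound path described above (core to ISP).
WAN_PATH = [
    "core",
    "firewall",
    "ict-fibre-uplink (segregated)",
    "b-block-switch (segregated)",
    "b-block-wireless-link",
    "isp-next-hop",
]


def first_broken_hop(
    path: Sequence[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first hop that fails the check, or None if every hop passes."""
    for hop in path:
        if not is_reachable(hop):
            return hop
    return None


if __name__ == "__main__":
    # Stand-in check: pretend everything on site answers but the ISP side does not.
    def fake_check(hop: str) -> bool:
        return hop != "isp-next-hop"

    print(first_broken_hop(WAN_PATH, fake_check))
```

In practice the `is_reachable` check could be anything from a ping to looking at link lights. The point of walking the hops in order is that a failure at the far end, on the ISP side, looks exactly the same from the core as a failure anywhere in between.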
It sounds simple enough, and when I had been getting ready I had traced the right cables and thought I had it all planned. But while things were being moved around, cables got mixed up, and at one point I had plugged them in wrong. I discovered later that I had sent internal network traffic out via my WAN. I know… I stuffed up.
I hadn’t realised this at that point. By 2 pm I had it all plugged in right and I still couldn’t get the internet connection working. I tried everything I could think of. I even brought the 5400 back at one point to see if it was a Ubiquiti issue, but that didn’t make any difference.
By 4 pm I had spoken to my predecessor and he told me I had it configured right, but I still couldn’t get the internet working. I had also called our ISP and asked them if there were any service issues. And here we come to another mistake I made: I didn’t open an official ticket with them.
By 7 pm I was still confused, and I called the Director of ICT to tell him I was completely lost*. He offered to come in and I accepted. He said he would be in when he could, and I kept working in the meantime.
*NOTE: For any young system or network admins, I have always found it best to admit to your bosses when you are lost or out of your depth. Refusing to do so actually makes things worse. Your boss would rather hear that you are having issues and help you get over the hurdle.
It was 10 pm when he arrived. I was still confused about the internet, though on the plus side I had finished configuring the rest of the core so that all internal traffic was working just as it had before I replaced the HP 5400. We went through the setup and how the internet should be configured. By 10:30 pm he was confused as well. By 11:30 pm we had called the ISP again. This time we had decided that we would connect one of our notebooks to the router in B Block and attempt to connect to the internet with it. We asked for all the needed details like our external IP, next hop etc. We were given the info, but we didn’t think to open an actual ticket… again, a mistake.
By 1:00 am we were both extremely tired, and in our tired minds we decided the issue was that our local ARP table was different to the upstream router’s. We knew that it wasn’t really the answer to our problem, but we were both beyond tired, and at some point you have to call it quits. We both went home to get some sleep before what would be another long day.
Turning back up to work at 6:30 am, I still couldn’t get it working. By 8:30 am I had decided that the issue had to be at the ISP level. I called them up, gave my full details, explained that I was having an internet issue, and finally opened a ticket. A step I should have taken 12 hours earlier.
“Of course, we blocked your internet connection at 1:30 pm yesterday. And we then sent you an email.”
The words floored me. The beyond frustrating thing is that this had happened before, and we had told them that our mail is on-prem. Therefore, if they block us before they email us, we never get that email and never know we have been blocked.
At this point, at 8:30 am, I started the process of getting unblocked, assuring them everything was set up correctly now. And I am not joking, it took almost 4 hours to unblock our internet. Apparently, the tech who was able to do this was on the train.
Needless to say, they are no longer our ISP, and our new Fibre connection at more than twice the speed is much nicer and terminates in the server room.
By 10:30/11 am the internet was back up and all services were back to normal. Thankfully, being so early in the year and technically during our ICT Shutdown window, staff weren’t too annoyed, but there were some who weren’t happy about it.
Now all wasn’t perfect, but the system was up, and everything worked.
It wasn’t until a week later that I would fully see the issues I had to deal with, but I will post about them soon.