If you're getting an "invalid hostname" error, it doesn't mean your DNS isn't responding... it (in all likelihood) means your DNS server is answering your query with an NXDOMAIN, which means no record for that hostname exists in the DNS zone. "That can't be possible, I do an nslookup immediately after and it resolves, so I know it's in DNS." Read on...
You'll need to understand two things: what an NXDOMAIN response is, and how your DNS client/resolver handles using multiple DNS servers.
First, an NXDOMAIN response. This is a valid response... it's different from a timeout. An NXDOMAIN is a DNS server saying "I host a DNS zone for domain.com (I'm authoritative), and there is no record in that zone for yourserver.domain.com". Your DNS client/resolver will CACHE that answer in your local DNS cache (this is sometimes referred to as negative caching), just as it would cache a response from the DNS server saying "yourserver.domain.com resolves to 10.10.10.17".
Usually, a DNS zone's administrator (or the DNS server software's defaults) will set the negative caching TTL lower than the standard TTL... so your DNS cache might hold a "yourserver.domain.com resolves to 10.10.10.17" for 2 hours, but holds a "there's no record for yourserver.domain.com" for only 5 minutes. This explains why a scenario like this can seem so random and transient. It also explains why an "ipconfig /flushdns" can resolve the issue.
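To make the positive vs. negative TTL idea concrete, here's a minimal Python sketch of a toy cache. It is not how the Windows resolver is implemented. The class name DnsCache, its methods, and the hard-coded 2 hour / 5 minute TTLs (taken from the example above) are all assumptions for illustration; a real resolver takes the TTLs from the DNS response itself.

```python
import time

POSITIVE_TTL = 2 * 60 * 60  # 2 hours, illustrative value from the example above
NEGATIVE_TTL = 5 * 60       # 5 minutes, illustrative value from the example above


class DnsCache:
    """Toy resolver cache (hypothetical, for illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # hostname -> (IP string or None, expiry time)

    def store(self, hostname, answer):
        # answer is an IP string, or None to record an NXDOMAIN response.
        ttl = POSITIVE_TTL if answer is not None else NEGATIVE_TTL
        self._entries[hostname] = (answer, self._clock() + ttl)

    def lookup(self, hostname):
        """Return ("hit", ip), ("negative", None) or ("miss", None)."""
        entry = self._entries.get(hostname)
        if entry is None or self._clock() >= entry[1]:
            # Unknown or expired: the resolver has to ask a DNS server again.
            self._entries.pop(hostname, None)
            return ("miss", None)
        answer = entry[0]
        if answer is None:
            return ("negative", None)  # cached "no such name"
        return ("hit", answer)

    def flush(self):
        # Rough equivalent of what "ipconfig /flushdns" does to the OS cache.
        self._entries.clear()
```

While the negative entry is alive, every lookup fails without a DNS server ever being asked again. Once it expires (or the cache is flushed), the next lookup goes back out to DNS, which is why the problem seems to fix itself.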
An aside... also remember that APPLICATIONS will cache DNS as well... so even if you flush your PC's DNS cache, if the application process is still running, it may be holding that negative cached value without checking back in the DNS cache, and it in all likelihood isn't observing TTL values. Closing the app and relaunching it is the only way to definitively clear it.
But the real question is why your DNS infrastructure is returning two conflicting results: one time the IP, and another time an NXDOMAIN response. This is where you need to understand how your PC's DNS resolver works. Check your TCP/IP settings and you'll probably find you're configured to use 2 DNS servers for DNS resolution. One is primary, the other secondary. This means your DNS resolver will ALWAYS query the primary first and ONLY query the secondary when the primary doesn't respond. So here's a real-life scenario of what we've seen happen with a County government client...
Their PCs were configured to use two DNS servers:
- primary was one managed by the county
- secondary was one managed by the state
On the primary DNS server, managed by the county, a DNS zone for somecounty.somestate.gov was hosted, and it had an A record for laserficheserver.somecounty.somestate.gov resolving to 10.10.10.17. 98% of the time this DNS server was perfectly responsive: users' PCs queried this server and got back the IP address.
2% of the time, that server did not respond (for load/performance reasons). In that scenario, a user's PC would then query the secondary DNS server, the state's. It had a DNS zone for somestate.gov... but it did not have a record for laserficheserver.somecounty.somestate.gov within that zone, so it replied with an NXDOMAIN response, i.e. "I'm authoritative for this domain, and I can say authoritatively that no such hostname record exists in this domain/zone."
The symptoms... randomly, intermittently, and rarely (but often enough to get everyone's attention), a user would get this invalid hostname error. Different user every time. Lasting only for a short duration. Fixable by rebooting the PC or flushing the DNS cache (because on their next connection attempt, the primary DNS server would actually respond).
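If it helps to see the failure mode spelled out, here's a small Python simulation of the scenario above. It's only a sketch: the function names (county_dns, state_dns, resolve, simulate) are made up for illustration, a "timeout" is modeled as None, and the 2% failure rate is just the rough figure from this example.

```python
import random

RECORD = "laserficheserver.somecounty.somestate.gov"


def county_dns(hostname, responsive):
    """Primary: hosts the county zone. Returns None to model a timeout."""
    if not responsive:
        return None
    return "10.10.10.17" if hostname == RECORD else "NXDOMAIN"


def state_dns(hostname):
    """Secondary: authoritative for somestate.gov, but has no county record."""
    return "NXDOMAIN"


def resolve(hostname, primary_responsive):
    """Ask the primary first; fall back to the secondary only on a timeout."""
    answer = county_dns(hostname, primary_responsive)
    if answer is None:
        answer = state_dns(hostname)
    return answer


def simulate(lookups, failure_rate=0.02, seed=1):
    """Count how many lookups end in NXDOMAIN for a given primary failure rate."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(lookups):
        responsive = rng.random() >= failure_rate
        if resolve(RECORD, responsive) == "NXDOMAIN":
            failures += 1
    return failures
```

Calling simulate(1000) gives a small, scattered number of NXDOMAIN answers for a name that "definitely exists in DNS", which is exactly what the users were seeing.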
How to prove this is happening? A couple ways:
- Perform a DNS query directly against both the primary and the secondary DNS server configured on the affected user's PC and see if you get a different answer from each. You can do this using nslookup, i.e.:
  - nslookup laserficheserver.yourdomain.com primarydnsserverIP
  - nslookup laserficheserver.yourdomain.com secondarydnsserverIP
  - Presumably, you'll get the IP you're expecting back from the primary, but you'll get a different response from the secondary.
- Another approach is to record what's in your DNS cache the moment it happens. Remember, negative caching TTLs are generally very short, so this is a narrow window. We armed our client's IT staff with a batch file that would dump the DNS cache to a TXT file, and told them that the moment a user reported the issue, they should email that user the batch file and have them run it immediately. The DNS cache can be large, so dumping it to a TXT file makes reviewing it easier. The content of our batch file was a single line:
  - ipconfig /displaydns > dnscachecontents.txt
  - Then look through that TXT file for the entry for your server...
An entry for a DNS record that returned an IP address will look like this:
answers.laserfiche.com
----------------------------------------
Record Name . . . . . : answers.laserfiche.com
Record Type . . . . . : 1
Time To Live . . . . : 8853
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 68.71.246.184
An entry for a DNS record that does not exist within the DNS zone, i.e. an NXDOMAIN response, will be recorded like this:
somerandomname.laserfiche.com
----------------------------------------
Name does not exist.
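Since the dump can be long, a few lines of Python can pull out just the negative entries. This is a rough sketch that assumes the English-language layout shown above (a hostname line, then a line of dashes, then either record details or "Name does not exist."). The function name find_negative_entries is made up for this example.

```python
def find_negative_entries(dump_text):
    """Return the hostnames that the dump lists as 'Name does not exist.'"""
    negatives = []
    previous = ""   # last non-blank line seen
    current = None  # hostname of the entry currently being read
    for raw in dump_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"-"}:
            # A dashed separator means the line just before it was the name.
            current = previous
        elif line == "Name does not exist." and current is not None:
            negatives.append(current)
        previous = line
    return negatives
```

To use it, read the dump in and pass the text along, for example find_negative_entries(open("dnscachecontents.txt", errors="replace").read()). If your server's hostname shows up in the result, you've caught the negative cache entry in the act.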
And bingo... now just figure out why your particular DNS infrastructure is responding with this NXDOMAIN and you'll have this wrapped up in no time.
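If you'd rather script the direct-query check from the first approach above than type it out each time, something like this Python sketch would do it. It just runs nslookup once per server and prints whatever comes back, without trying to interpret the output. The function names and the 15 second timeout are assumptions for illustration.

```python
import subprocess


def build_queries(hostname, servers):
    """One nslookup command per DNS server, in the order given."""
    return [["nslookup", hostname, server] for server in servers]


def run_queries(hostname, servers, timeout=15):
    """Run each query and print the raw output so the answers can be compared."""
    for command in build_queries(hostname, servers):
        print("$", " ".join(command))
        try:
            result = subprocess.run(
                command, capture_output=True, text=True, timeout=timeout
            )
            print(result.stdout + result.stderr)
        except (OSError, subprocess.TimeoutExpired) as error:
            # nslookup missing, or the server never answered in time.
            print("query failed:", error)


# Example call (substitute your own hostname and DNS server IPs):
# run_queries("laserficheserver.yourdomain.com",
#             ["primarydnsserverIP", "secondarydnsserverIP"])
```

If the two servers print different answers for the same hostname, that's your conflict.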