Recovering A Corrupt OpenLDAP Database On OSX Server

Last night we noticed that some services provided by an OSX Leopard Server instance were not working correctly: the iChat, AFP and Web services were failing to authenticate users. In Server Admin.app, the “Overview” tab of the Open Directory service reported:
LDAP Server is: Not Running
Password Server is: Running
Kerberos is: Not Running
Looking at the server error logs through Console.app, the following pair of messages was occurring every 10 seconds:
com.apple.launchd[1] (org.openldap.slapd[27382]) Exited with exit code: 1
com.apple.launchd[1] (org.openldap.slapd) Throttling respawn: Will start in 10 seconds
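If you'd rather check from Terminal than scroll through Console.app, a small helper can count those slapd exit messages. This is a minimal sketch. It assumes that the launchd messages end up in /var/log/system.log; pass another file otherwise. The function name `slapd_respawn_count` is purely illustrative.

```bash
# Sketch: count how many times launchd has logged a slapd exit.
# Assumption: on Leopard Server these launchd messages are written to
# /var/log/system.log; pass another file as $1 if yours differ.
slapd_respawn_count() {
  local log_file="${1:-/var/log/system.log}"
  # Matches lines like:
  #   com.apple.launchd[1] (org.openldap.slapd[27382]) Exited with exit code: 1
  # grep -c prints the number of matching lines (0 if none).
  grep -cE 'org\.openldap\.slapd\[[0-9]+]\) Exited with exit code' "$log_file"
}
```

A count that keeps climbing every 10 seconds matches the throttled respawn loop above.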
The slapd daemon was evidently failing to start, with launchd respawning it over and over. Jumping to the command line, I tested the configuration using the `slapd -Tt` command, which runs slapd in slaptest mode:
core:openldap admin$ sudo /usr/libexec/slapd -Tt
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
bdb(dc=openrain,dc=com): PANIC: fatal region error detected; run recovery
bdb_db_open: Database cannot be opened, err -30978. Restore from backup!
bdb(dc=openrain,dc=com): DB_ENV->lock_id_free interface requires an environment configured for the locking subsystem
backend_startup_one: bi_db_open failed! (-30978)
slap_startup failed (test would succeed using the -u switch)
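The telling line is the Berkeley DB `PANIC ... run recovery` message, which points at a damaged database environment rather than a configuration typo. When scripting this check, a hypothetical helper like `needs_db_recovery` can make that distinction. It is only a sketch: it greps captured slaptest output on stdin for the two phrases seen above.

```bash
# Sketch: decide whether captured `slapd -Tt` output points at a damaged
# Berkeley DB environment rather than a configuration mistake.
# Reads the output on stdin; returns 0 (true) if recovery looks necessary.
needs_db_recovery() {
  # Both phrases appear in the failing output above:
  #   "... PANIC: fatal region error detected; run recovery"
  #   "... Database cannot be opened, err -30978. Restore from backup!"
  grep -q -e 'run recovery' -e 'Restore from backup!'
}
```

For example: `sudo /usr/libexec/slapd -Tt 2>&1 | needs_db_recovery && echo "time for db_recover"`.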
With a little research (see http://discussions.apple.com/message.jspa?messageID=9548971), I concluded that:
  1. The OpenLDAP database had been corrupted, and
  2. The `slapd_db_recover` tool (as it is named on some Linux installations) is instead called `db_recover` on OSX. Ah!
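The tool's name varies between platforms, so a small lookup avoids guessing. The following sketch (`find_db_recover` is a made-up name) checks only the two names mentioned here. It prints the full path of the first one it finds on `PATH`.

```bash
# Sketch: locate the Berkeley DB recovery tool under whichever of the two
# names mentioned above this system uses.
find_db_recover() {
  local candidate
  for candidate in slapd_db_recover db_recover; do
    if command -v "$candidate" >/dev/null 2>&1; then
      # Print the resolved path so callers can use it directly.
      command -v "$candidate"
      return 0
    fi
  done
  echo "no Berkeley DB recovery tool found on PATH" >&2
  return 1
}
```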
After carefully backing up the /var/db/openldap folder, I ran the recovery tool and re-tested the configuration:
core:openldap admin$ sudo db_recover -h /var/db/openldap/openldap-data/
core:openldap admin$ sudo /usr/libexec/slapd -Tt
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
config file testing succeeded
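The backup-then-recover sequence could be scripted roughly as follows. This sketch makes several assumptions:

- slapd is not running, because launchd keeps failing to start it.
- The data lives in the default /var/db/openldap/openldap-data.
- The archive destination you pass is outside /var/db/openldap.

The `OPENLDAP_DIR`, `DB_RECOVER` and `SLAPD` variables and the function names are illustrative. They exist only so the paths can be overridden.

```bash
# Sketch of "back up /var/db/openldap, run db_recover, re-test".
# Assumptions: slapd is NOT running, the default Leopard paths are in use,
# and the destination passed to recover_openldap is outside OPENLDAP_DIR.
# The variables below only exist so the paths/commands can be overridden.
OPENLDAP_DIR="${OPENLDAP_DIR:-/var/db/openldap}"
DB_RECOVER="${DB_RECOVER:-db_recover}"
SLAPD="${SLAPD:-/usr/libexec/slapd}"

backup_openldap() {
  local dest_dir="$1" archive
  archive="$dest_dir/openldap-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps the paths inside the archive relative (openldap/...).
  tar -czf "$archive" -C "$(dirname "$OPENLDAP_DIR")" "$(basename "$OPENLDAP_DIR")" || return 1
  echo "$archive"
}

recover_openldap() {
  local dest_dir="$1"
  # Never touch the database without a fresh copy to fall back on.
  backup_openldap "$dest_dir" || { echo "backup failed; aborting" >&2; return 1; }
  "$DB_RECOVER" -h "$OPENLDAP_DIR/openldap-data/" || return 1
  # Same check as before: look for "config file testing succeeded".
  "$SLAPD" -Tt
}
```

From a root shell, `recover_openldap /var/root` prints the archive path followed by the slaptest output. If slaptest still mentions `run recovery`, the tarball is there to roll back to.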
The errors in Console.app stopped, and the Server Admin.app panel started reporting:
LDAP Server is: Running
Password Server is: Running
Kerberos is: Running
I had to restart the AFP, iChat and Web services on the machine to get everything working again, but all seems well now.
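Restarting those services can also be done from the command line with `serveradmin`. This minimal sketch assumes that the serveradmin identifiers are `afp`, `web` and `jabber` (iChat Server). Verify them with `sudo serveradmin list` before relying on it. The function name is hypothetical.

```bash
# Sketch: bounce the services that had stopped authenticating.
# Assumption: serveradmin identifiers are afp, web and jabber (iChat Server);
# confirm with `sudo serveradmin list`. Run from a root shell.
SERVERADMIN="${SERVERADMIN:-serveradmin}"

restart_od_dependent_services() {
  local svc failed=0
  for svc in "$@"; do
    # Stop then start each service; keep going if one of them fails.
    "$SERVERADMIN" stop "$svc" || failed=1
    "$SERVERADMIN" start "$svc" || failed=1
  done
  return "$failed"
}
```

For example: `restart_od_dependent_services afp jabber web`.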

Comments

44 responses to “Recovering A Corrupt OpenLDAP Database On OSX Server”

  1. Paul

    This tip saved my ass! Thank you.

  2. preston.lee

    @Paul
    You’re welcome!

  3. Terrance

    Followed your instructions, worked like a champ!! Thanks!!

  4. jan

    Thank you! Weirdly enough, the configuration tester already fixed it for me!

  5. Bob

    Add me to the grateful throngs. The LDAP db on my Leopard Server was corrupted after a restart that didn’t. After a panicked morning, I found your post. Worked like a champ!

    (Does 4 count as a throng?)

  6. Craig

    This worked great but it was corrupted again this morning after a restart, any reason why this would keep happening? Thanks!

  7. preston.lee

    @Craig
    I haven’t seen that happen before. The only time this has happened to me is when I’ve had to hard reset the machine and/or manually kill the daemons: in other words not shutting stuff down properly. My best guess is that there may be something external to LDAP that is corrupting the files on disk, or perhaps a disk issue itself.

  8. Craig

    Hi,
    Thanks for your quick response, and actually we are having to force shutdown the machine as it is getting stuck on shutdown. Thanks so much for the insight.

    @preston.lee

  9. preston.lee

    @Craig
    No problem. If you’re consistently needing to force shutdown, try running `sync` at the command line before hitting the power button. That should force the disks’ write buffers to be flushed. Hopefully that’ll at least keep LDAP from getting corrupted.
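Going a step further than `sync`, the following sketch first asks launchd to stop slapd, so that slapd can close its Berkeley DB files. It then flushes the buffers. The plist path is an assumption inferred from the `org.openldap.slapd` label in the logs, so check /System/Library/LaunchDaemons on your server. `quiesce_before_power_off` is a made-up name.

```bash
# Sketch: quiesce slapd before a forced power-off.
# Assumption: the launchd job lives at the plist path below (inferred from the
# org.openldap.slapd label in the logs). Run from a root shell.
SLAPD_PLIST="${SLAPD_PLIST:-/System/Library/LaunchDaemons/org.openldap.slapd.plist}"
LAUNCHCTL="${LAUNCHCTL:-launchctl}"

quiesce_before_power_off() {
  # Unloading the job stops slapd so it can close its database files.
  if ! "$LAUNCHCTL" unload "$SLAPD_PLIST"; then
    echo "could not unload slapd; flushing buffers anyway" >&2
  fi
  # Flush pending writes to disk, as suggested above.
  sync
}
```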

  10. Marco Papa

    Preston, another great thanks from me. Anyway, my situation was identical to yours. Same identical messages, and corruption generated by hitting the power button.

    I did the db_recover and it worked. Only thing, Kerberos is still listed as “Stopped”. Any ideas why? Everything seems to work.

  11. preston.lee

    @Marco Papa
    If you’ve already tried restarting the “Open Directory” service a couple times, I’m not sure. Is there anything in the system logs that seems relevant? Could it be a DNS problem?

  12. Adam

    Thank you!

  13. Mel

    Preston, yet another big vote of thanks for this – you saved our bacon big time!

  14. Adam

    MAN YOU SAVED OUR BACON!! Thanks a million we owe you a PINT. 🙂

  15. Quentin

    Great post & instructions; like the others, it got me out of a jam after an update hung the restart and left me with a corrupt db.

  16. Ranj

    Thanks a lot, this helped me with a similar problem I was having.

    I had to run a few more commands in Terminal to get it working, though:

    1) sudo to root

    sudo -i
    2) shut down the Open Directory server

    service org.openldap.slapd stop

    3) dump a copy of the Open Directory database to an LDIF format text file

    mkdir /var/root/opendirectory
    cd /var/root/opendirectory
    slapcat -l dir.ldif
    4) move the old (corrupt) database files out of the way (or remove them).

    cd /var/db/openldap/openldap-data
    mkdir SAVE
    mv *.bdb SAVE/

    Be sure you don’t move, rename or delete the file named DB_CONFIG. It’s needed.

    5) recreate the database from the LDIF format file

    cd /var/root/opendirectory
    slapadd -l dir.ldif
    slapindex
    You will see some harmless warnings during slapadd. Ignore them.

    6) restart open directory

    service org.openldap.slapd start
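Steps 3 and 4 are the risky ones. If `slapcat` produced an empty dump, moving the .bdb files aside leaves nothing to rebuild from. The following sketch covers just those two steps and adds two checks:

- It refuses to continue unless the LDIF contains at least one `dn:` record. This catches an empty dump, not a partial one.
- It confirms that DB_CONFIG is still in place afterwards.

`DATA_DIR` and the function names are illustrative. The default path is the one used in step 4.

```bash
# Sketch of steps 3-4 above with two safety checks added.
# DATA_DIR and the function names are illustrative; the default path is Ranj's.
DATA_DIR="${DATA_DIR:-/var/db/openldap/openldap-data}"

ldif_entry_count() {
  # Every LDIF record starts with a "dn:" line; grep -c prints 0 if there are none.
  grep -c '^dn:' "$1"
}

set_aside_bdb_files() {
  local ldif="$1" count
  count=$(ldif_entry_count "$ldif") || count=0
  # An empty dump is caught here; a partial one is not.
  if [ "$count" -eq 0 ]; then
    echo "refusing: $ldif contains no entries" >&2
    return 1
  fi
  mkdir -p "$DATA_DIR/SAVE" || return 1
  mv "$DATA_DIR"/*.bdb "$DATA_DIR/SAVE/" || return 1
  # Ranj's warning: DB_CONFIG must stay where it is.
  if [ ! -f "$DATA_DIR/DB_CONFIG" ]; then
    echo "DB_CONFIG is missing from $DATA_DIR" >&2
    return 1
  fi
  echo "moved .bdb files to $DATA_DIR/SAVE; $count entries ready for slapadd"
}
```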

  17. Jon Zgoda

    Thank you! Thank you! Thank you!!!

    I thought I was being good, doing a backup through ServerAdmin before any updates…but then I wasn’t able to restore through ServerAdmin when this problem occurred.

    Your solution worked perfectly, and was back up in minutes.

  18. john lewis

    Ranj’s instructions worked for me. THANKS GUYS! Phew!

  19. Moshik

    Ranj 10x Alot…!!! saved us also…

  20. Daniel

    Thank you.

    This has happened to me three times.
    Two times I rebuilt from scratch.

    This time – thank you!

  21. Junjun

    This works! I wouldn’t have known db_recover was the right tool without reading your post.

    On my Xserve, I have to use this to recover the database.

    sudo db_recover -cev -h /var/db/openldap/openldap-data/
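For reference, the Berkeley DB `db_recover` flags used here do the following:

- `-c` requests catastrophic recovery, which reads all available log files instead of starting from the last checkpoint.
- `-e` retains the environment afterwards.
- `-v` turns on verbose output.

A cautious sketch tries ordinary recovery first and only falls back to `-c` if that fails. `recover_with_fallback` is a hypothetical name, and this ordering is a suggestion, not necessarily what was needed on this Xserve.

```bash
# Sketch: try ordinary recovery first, then catastrophic recovery (-c),
# which replays all available log files, only if the first attempt fails.
# -e (used above) is omitted here; add it if you want the environment kept.
DB_RECOVER="${DB_RECOVER:-db_recover}"

recover_with_fallback() {
  local home="$1"
  if "$DB_RECOVER" -v -h "$home"; then
    return 0
  fi
  echo "normal recovery failed; trying catastrophic recovery" >&2
  "$DB_RECOVER" -c -v -h "$home"
}
```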

  22. MacDave

    Worked great for me — thanks!

  23. BradDS

    I may try this, given all the positive responses and the fact that I am currently in the same boat. But did you lose all of your LDAP settings, or did they stay intact? I see you said you carefully backed up the openldap directory; does this require you to replace it, or does it fix the existing LDAP database?

  24. admin

    For *me* it fixed my existing database.

  25. BradDS

    That would be wonderful. One last question, if I may: you also mentioned you had to restart services such as AFP afterwards. What was the reason? Did it prevent LDAP from fully starting, or did running the recovery cause those services to stop?

  26. BradDS

    Chalk up another Success Story! Thanks!

  27. BigClay

    Thanks for the post, it saved my ‘bacon’ as well. Now on to making a better backup approach for the OS X server!!!

  28. Gill

    KARMA is a boomerang and you have a lot of good KARMA coming your way my friend. This fix is fantastic! And a life saver for all who lose their LDAP data!!

    TY TY TY !!

  29. admin

    No problem and I certainly hope so! 🙂

  30. Brian Jønch

    Followed your instructions, worked like a champ. Thanks a lot.

    /BJ

  31. Eric

    You are the man!!! You saved me from a LOT of work, time, frustration, profanity, ulcers, high blood pressure, etc.

  32. Geoff Smyth

    Elevated to legend status you are.

  33. Michael

    Just about to poo my pants until I came across this article. Many thanks Guys!

  34. Kenny

    LDAP Server up and running again.
    Thanks a lot, this did it, great !!!!!

  35. Steve

    Preston, thanks for the article. I’m experiencing a similar problem, though OD reports that LDAP, Password Server and Kerberos are all running. But in my LDAP log I get messages similar to yours:

    Oct 4 11:07:36 s1 slapd[26448]: bdb(cn=accesslog): DB_ENV->lock_id interface requires an environment configured for the locking subsystem
    Oct 4 11:07:36: — last message repeated 3 times —
    Oct 4 11:07:36 s1 slapd[26448]: findbase failed! 80

    When I tested the config, I get this:

    s1:~ sadmin$ sudo /usr/libexec/slapd -Tt
    bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
    bdb_db_open: database “cn=accesslog”: unclean shutdown detected; attempting recovery.
    bdb_db_open: database “cn=accesslog”: recovery skipped in read-only mode. Run manual recovery if errors are encountered.
    bdb_db_open: database “cn=accesslog”: alock_recover failed
    bdb_db_open: could not restore bdb backend -1config file testing succeeded
    bdb_db_close: database “cn=accesslog”: alock_close failed
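This output mentions a second database, `cn=accesslog`, which has its own Berkeley DB files. Recovering only openldap-data would leave that one untouched. The following sketch could run db_recover on every directory that holds .bdb files under /var/db/openldap. Treating each such directory as a separate environment is an assumption. The `directory` setting of each database in the slapd configuration is the authoritative list. The function names are hypothetical.

```bash
# Sketch: run db_recover on every directory under the OpenLDAP tree that
# holds .bdb files (e.g. openldap-data plus an accesslog database).
# Assumption: each such directory is its own Berkeley DB environment; check
# the `directory` setting of each database in your slapd configuration.
DB_RECOVER="${DB_RECOVER:-db_recover}"

bdb_env_dirs() {
  # Unique parent directories of all *.bdb files, sorted.
  find "${1:-/var/db/openldap}" -type f -name '*.bdb' -exec dirname {} \; | sort -u
}

recover_all_envs() {
  local dir status=0
  while IFS= read -r dir; do
    echo "recovering $dir" >&2
    # </dev/null so the tool cannot consume the directory list on stdin.
    "$DB_RECOVER" -h "$dir" </dev/null || status=1
  done < <(bdb_env_dirs "$1")
  return "$status"
}
```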

    Do you think I should follow your steps above and use the db_recover tool, or is that not applicable to my situation? Authentication and all services are working on the server, but I’m afraid this problem could manifest itself in ugly ways if I let it continue.

  36. Steve

    After some other troubleshooting I tried the db_recover tool, and rather than recovering the database, I received these messages:

    Oct 4 11:46:18 s1 slapd[1116]: bdb(dc=*****,dc=*****): PANIC: fatal region error detected; run recovery
    Oct 4 11:46:18 s1 slapd[1116]: SASL [conn=29] Failure: no user in database _ldap_replicator

    I’ve rebooted the machine, manually unloaded & loaded slapd, and ran the recovery tool multiple times, and nothing has worked. Any other ideas? Getting pretty nervous now.

  37. Steve

    Not sure if anyone is checking this thread anymore, but just in case someone with the same problem reads it, here was my solution. None of the commands above worked in my situation – I suppose the corruption was too bad for the db_recover tool (or the manual .ldif steps), so I took one of my healthy OD Replicas and promoted it to Master. I then destroyed OD on the previous Master, rebooted and made it a Replica of the new Master. This resulted in a fresh, clean database on the old Master, as well as not losing any records or passwords on the new one. Took me hours of troubleshooting and trial and error tests, but I should have done this as my very first step. There was minimal downtime and to most users the transition was transparent.

  38. leolor

    Had exactly the same problem on Leopard 10.5.8 Server. Thanks, it works again for me!!!!!!!

  39. MV

    Seriously, put up a Paypal link since you saved me a few hours of research. Or a mailing address.

  40. Sanjiv Singh

    Thanks man !!

  41. chris

    Just saved my Sunday. thanks!

  42. TD

    Thank you, problem I encountered was identical.

  43. Bryan

    Thank you for the help. It is 2015 but I am still running 10.6.8 on my server and this solved the problem.
