If you have player state that needs to persist between Scenes (such as player health or power-ups), you may want to keep the list of players outside the Scene and let each Scene borrow them as needed.
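As a rough sketch of the borrowing idea, something like the following could work. The `Player` fields and the `Scene` methods here are made up for illustration and are not taken from your diagram; the only point is that the Scene holds a non-owning pointer to a list that lives elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical player state that should survive a Scene change.
struct Player {
    int health = 100;
    int powerUps = 0;
};

// Hypothetical Scene: it borrows the player list and never owns it.
class Scene {
public:
    explicit Scene(std::vector<Player>& players) : players_(&players) {}

    // Apply damage to every borrowed player.
    void damageAll(int amount) {
        for (Player& p : *players_) {
            p.health -= amount;
        }
    }

    // Give one power-up to the player at the given index.
    void grantPowerUp(std::size_t index) {
        assert(index < players_->size());
        ++(*players_)[index].powerUps;
    }

private:
    std::vector<Player>* players_;  // borrowed; the owner outlives the Scene
};
```

Whatever owns the `std::vector<Player>` (the game object, for example) creates each Scene with a reference to it, so health and power-ups carry over when one Scene is destroyed and the next is created. The owner has to outlive every Scene that borrows from it.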
The diagram doesn't show what is in charge of drawing things. Presumably the actors draw themselves, but nothing in the diagram connects them to the DX9 API part.
If you have Actor-driven sounds (such as sounds when the player jumps, lands, or attacks, or when an Enemy dies), then your Sound output system should probably not be so separate from the Enemies and Player. Also, there's no reason for SceneManager to know about the Sound stuff. In my engine, Sound is a bunch of static functions that are called by Components that handle sound.
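To give an idea of what I mean, here is a stripped-down sketch, not my actual engine code. The names `Sound::play` and `SoundComponent` and the clip file names are placeholders, and the "backend" just records requests instead of calling a real audio API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sound facade: static functions only, so nothing has to hold
// or pass around a sound-system object. A real version would forward to the
// audio backend; this stand-in only records which clips were requested.
class Sound {
public:
    static void play(const std::string& clip) {
        assert(!clip.empty());
        log().push_back(clip);
    }
    static const std::vector<std::string>& played() { return log(); }
    static void clear() { log().clear(); }

private:
    static std::vector<std::string>& log() {
        static std::vector<std::string> entries;
        return entries;
    }
};

// Hypothetical component attached to an actor. It decides which clip fits
// which event, and calls Sound directly.
class SoundComponent {
public:
    SoundComponent(std::string jumpClip, std::string deathClip)
        : jumpClip_(std::move(jumpClip)), deathClip_(std::move(deathClip)) {}

    void onJump() const { Sound::play(jumpClip_); }
    void onDeath() const { Sound::play(deathClip_); }

private:
    std::string jumpClip_;
    std::string deathClip_;
};
```

With this shape, SceneManager never touches sound at all: the actor's component reacts to the jump or death event and calls `Sound::play` itself.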
Are Enemies and the Player fundamentally different enough to warrant being entirely separate? They both require collision, animation, and sounds. You could have a single Actor class, and the difference between the player and enemies is that Enemies' behavior is driven by a simple AI while the Player's behavior is driven by player inputs.
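One possible way to structure that, assuming a one-dimensional position to keep it short: a single `Actor` class that asks a controller what to do each update. `Controller`, `InputController`, `ChaseAi`, `InputState`, and `Command` are names invented for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// What an actor wants to do this frame (hypothetical, deliberately tiny).
struct Command {
    int moveX = 0;
    bool attack = false;
};

// Source of decisions for an Actor: player input or AI.
class Controller {
public:
    virtual ~Controller() = default;
    virtual Command decide(int selfX, int targetX) = 0;
};

// Hypothetical snapshot of the input devices for this frame.
struct InputState {
    bool left = false;
    bool right = false;
    bool attack = false;
};

// Player behavior: translate the current input into a Command.
class InputController : public Controller {
public:
    explicit InputController(const InputState& input) : input_(&input) {}
    Command decide(int /*selfX*/, int /*targetX*/) override {
        Command c;
        c.moveX = (input_->right ? 1 : 0) - (input_->left ? 1 : 0);
        c.attack = input_->attack;
        return c;
    }

private:
    const InputState* input_;  // borrowed; owned by the input system
};

// Enemy behavior: step toward the target and attack once on top of it.
class ChaseAi : public Controller {
public:
    Command decide(int selfX, int targetX) override {
        Command c;
        if (targetX > selfX) c.moveX = 1;
        else if (targetX < selfX) c.moveX = -1;
        c.attack = (targetX == selfX);
        return c;
    }
};

// One Actor class for both the player and enemies. Collision, animation,
// and sound would live here or in shared components.
class Actor {
public:
    Actor(int x, std::unique_ptr<Controller> controller)
        : x_(x), controller_(std::move(controller)) {
        assert(controller_ != nullptr);
    }

    void update(int targetX) {
        const Command c = controller_->decide(x_, targetX);
        x_ += c.moveX;
        attacking_ = c.attack;
    }

    int x() const { return x_; }
    bool attacking() const { return attacking_; }

private:
    int x_;
    bool attacking_ = false;
    std::unique_ptr<Controller> controller_;
};
```

The player is then an `Actor` built with an `InputController`, and an enemy is an `Actor` built with a `ChaseAi`. Everything else about the two is the same code.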