{{Multiple issues|
{{refimprove |date=January 2009}}
{{technical |date=February 2010}}
{{tone |date=February 2010}}
{{weasel |date=April 2010}}
{{update|date=October 2013}}
}}
 
'''High availability''' is a [[system design]] approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
 
Users want their systems (for example, hospitals, production computers, and the [[electrical grid]]) to be ready to serve them at all times. [[Availability]] refers to the ability of the user community to obtain a service or good and to access the system, whether to submit new work, update or alter existing work, or collect the results of previous work. If a user cannot access the system, it is, from the user's point of view, ''unavailable''.<ref>{{cite book|author=Floyd Piedad, Michael Hawkins|title=High Availability: Design, Techniques, and Processes|url=http://books.google.com/books?id=kHB0HdQ98qYC&dq=high+availability+floyd+piedad+book&printsec=frontcover&source=bn&hl=en&ei=gs0LSrLvBKjm6gOT3ISPCA&sa=X&oi=book_result&ct=result&resnum=7|isbn=9780130962881|publisher=Prentice Hall|year=2001}}</ref> Generally, the term ''[[downtime]]'' is used to refer to periods when a system is unavailable.
 
==Scheduled and unscheduled downtime==
{{unreferenced section|date=June 2008}}
A distinction can be made between scheduled and unscheduled downtime. Typically, scheduled downtime is a result of maintenance that is disruptive to system operation and usually cannot be avoided with a currently installed system design. Scheduled downtime events might include patches to [[system software]] that require a [[booting|reboot]] or system configuration changes that only take effect upon a reboot. In general, scheduled downtime is usually the result of some logical, management-initiated event. Unscheduled downtime events typically arise from some physical event, such as a hardware or software failure or environmental anomaly. Examples of unscheduled downtime events include power outages, failed [[CPU]] or [[RAM]] components (or possibly other failed hardware components), an over-temperature related shutdown, logically or physically severed network connections, security breaches, or various [[Application software|application]], [[middleware]], and [[operating system]] failures.
 
Many computing sites exclude scheduled downtime from availability calculations, assuming that it has little or no impact upon the computing user community. By doing this, they can claim to have phenomenally high availability, which might give the illusion of continuous availability. Systems that exhibit truly continuous availability are comparatively rare and higher priced, and most have carefully implemented specialty designs that eliminate any [[single point of failure]] and allow online hardware, network, operating system, middleware, and application upgrades, patches, and replacements. For certain systems, scheduled downtime does not matter, for example system downtime at an office building after everybody has gone home for the night.
 
==Percentage calculation==
 
Availability is usually expressed as a percentage of uptime in a given year. [[Service level agreement]]s often refer to monthly downtime or availability in order to calculate service credits to match monthly billing cycles. The following table shows the downtime that is allowed for a particular percentage of availability, presuming that the system is required to operate continuously, i.e. the amount of time a system with that availability would be unavailable per year, month, or week.
 
<div align="center">
{| class="wikitable" style="text-align:right;"
!Availability %
!Downtime per year
!Downtime per month*
!Downtime per week
|-
| align="left" | 90% ("one nine")
|36.5 days
|72 hours
|16.8 hours
|-
| align="left" |95%
|18.25 days
|36 hours
|8.4 hours
|-
| align="left" |97%
|10.96 days
|21.6 hours
|5.04 hours
|-
| align="left" |98%
|7.30 days
|14.4 hours
|3.36 hours
|-
| align="left" |99% ("two nines")
|3.65 days
|7.20 hours
|1.68 hours
|-
| align="left" |99.5%
|1.83 days
|3.60 hours
|50.4 minutes
|-
| align="left" |99.8%
|17.52 hours
|86.4 minutes
|20.16 minutes
|-
| align="left" |99.9% ("three nines")
|8.76 hours
|43.2 minutes
|10.1 minutes
|-
| align="left" |99.95%
|4.38 hours
|21.6 minutes
|5.04 minutes
|-
| align="left" |99.99% ("four nines")
|52.56 minutes
|4.32 minutes
|1.01 minutes
|-
| align="left" |99.999% ("five nines")
|5.26 minutes
|25.9 seconds
|6.05 seconds
|-
| align="left" |99.9999% ("six nines")
|31.5 seconds
|2.59 seconds
|0.605 seconds
|-
| align="left" |99.99999% ("seven nines")
|3.15 seconds
|0.259 seconds
|0.0605 seconds
 
|}
<small>* A 30-day (720-hour) month is assumed for the monthly figures.</small>
</div>
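
As a rough illustration of the arithmetic behind the table, the following minimal sketch (in Python, assuming a 365-day year, a 30-day month, and a 7-day week) converts an availability percentage into the corresponding allowed downtime:

<syntaxhighlight lang="python">
# Minimal sketch: convert an availability percentage into allowed downtime.
# Assumes a 365-day year, a 30-day month, and a 7-day week.

PERIOD_HOURS = {
    "year": 365 * 24,   # 8760 hours
    "month": 30 * 24,   # 720 hours
    "week": 7 * 24,     # 168 hours
}

def allowed_downtime_hours(availability_percent, period="year"):
    """Return the downtime (in hours) permitted by the given availability."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * PERIOD_HOURS[period]

# Example: "three nines" (99.9%) allows about 8.76 hours of downtime per year
# and about 0.72 hours (43.2 minutes) per 30-day month.
print(round(allowed_downtime_hours(99.9, "year"), 2))   # 8.76
print(round(allowed_downtime_hours(99.9, "month"), 2))  # 0.72
</syntaxhighlight>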
 
[[Uptime]] and [[availability]] are not synonymous. A system can be up, but not available, as in the case of a [[network outage]].
 
Percentages of a particular order of magnitude are sometimes referred to by the number of nines or "class of nines" in the digits.  For example, electricity that is delivered without interruptions ([[Power outage|blackout]]s, [[brownout (electricity)|brownout]]s or [[Voltage spike|surge]]s) 99.999% of the time would have 5 nines reliability, or class five.<ref>[http://www.cs.kent.edu/~walker/classes/aos.s00/lectures/L25.ps Lecture Notes] M. Nesterenko, Kent State University</ref>  In particular, the term is used in connection with [[mainframes]]<ref>[http://comet.lehman.cuny.edu/cocchi/CIS345/LargeComputing/05_Availability.ppt Introduction to the new mainframe: Large scale commercial computing Chapter 5 Availability] IBM (2006)</ref><ref>[http://www.youtube.com/watch?v=DPcM5UePTY0 IBM zEnterprise EC12 Business Value Video] at ''youtube.com''</ref> or enterprise computing.
 
In general, the number of nines is not often used by a network engineer when modeling and measuring availability, because it is hard to apply in formulas. More often, the unavailability is expressed as a [[probability]] (such as 0.00001), or a [[downtime]] per year is quoted. Availability specified as a number of nines is often seen in [[marketing]] documents.{{Citation needed|date=August 2008}}
 
The use of the "nines" has been called into question, since it does not appropriately reflect that the impact of unavailability varies with its time of occurrence.<ref>[http://searchstorage.techtarget.com/tip/0,289483,sid5_gci921823,00.html Evan L. Marcus, ''The myth of the nines'']</ref>
 
For large numbers of nines, the "unavailability" index (a measure of downtime rather than uptime) is easier to handle, which is also why it is used for quantities such as hard disk bit error rates.
 
A formulation of the ''class of 9s''  <math>c</math>  based on a system's [[unavailability]] <math>x</math> would be
 
:<math> c := \lfloor  - \log_{10} x  \rfloor</math>
 
(cf. [[Floor and ceiling functions]]).
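
A direct reading of this definition in code might look like the following minimal sketch (the function name is illustrative):

<syntaxhighlight lang="python">
import math

def class_of_nines(unavailability):
    """Class of nines: c = floor(-log10(x)) for an unavailability x, 0 < x < 1."""
    return math.floor(-math.log10(unavailability))

# An unavailability of 0.00001 (99.999% availability) is "class five".
print(class_of_nines(0.00001))  # 5
print(class_of_nines(0.001))    # 3 ("three nines")
</syntaxhighlight>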
 
A [[Nine (purity)|similar measurement]] is sometimes used to describe the purity of substances.
 
==Measurement and interpretation==
Clearly, availability measurement is subject to some degree of interpretation. A system that has been up for 365 days in a non-leap year might have been eclipsed by a network failure that lasted for 9 hours during a peak usage period; the user community will see the system as unavailable, whereas the system administrator will claim 100% [[uptime]]. However, given the true definition of availability, the system will be approximately 99.9% available, or three nines (8751 hours of available time out of 8760 hours per non-leap year). Also, systems experiencing performance problems are often deemed partially or entirely unavailable by users, even when the systems are continuing to function. Similarly, unavailability of select application functions might go unnoticed by administrators yet be devastating to users&nbsp;&mdash; a true availability measure is holistic.
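
As an illustrative sketch of the calculation above (the outage data are hypothetical), measured availability is simply the available time divided by the total time in the measurement period:

<syntaxhighlight lang="python">
# Illustrative sketch: measured availability over a non-leap year (8760 hours).
total_hours = 365 * 24          # 8760
outage_hours = [9.0]            # e.g. one 9-hour network failure

available_hours = total_hours - sum(outage_hours)
availability = available_hours / total_hours

print(f"{availability:.4%}")    # 99.8973%, i.e. roughly "three nines"
</syntaxhighlight>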
 
Availability must be measured to be determined, ideally with comprehensive monitoring tools ("instrumentation") that are themselves highly available. If there is a lack of instrumentation, systems supporting high volume transaction processing throughout the day and night, such as credit card processing systems or telephone switches, are often inherently better monitored, at least by the users themselves, than systems which experience periodic lulls in demand.
 
An alternative metric is [[mean time between failures]] (MTBF).
 
==Closely related concepts==
Recovery time (or estimated time of repair (ETR)), also known as the [[recovery time objective]] (RTO), is closely related to availability: it is the total time required for a planned outage, or the time required to fully recover from an unplanned outage. Another metric is [[mean time to recovery]] (MTTR). Recovery time could be infinite with certain system designs and failures, i.e. full recovery is impossible. One such example is a fire or flood that destroys a data center and its systems when there is no secondary [[disaster recovery]] data center.
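
Where both [[mean time between failures|MTBF]] and [[mean time to recovery|MTTR]] are known, a commonly used steady-state approximation expresses availability as MTBF / (MTBF + MTTR). The following is a minimal sketch with hypothetical values:

<syntaxhighlight lang="python">
def steady_state_availability(mtbf_hours, mttr_hours):
    """Common steady-state approximation: availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical values: a failure every 1000 hours, repaired in 1 hour on average.
print(f"{steady_state_availability(1000, 1):.4%}")  # 99.9001%
</syntaxhighlight>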
 
Another related concept is data availability, that is, the degree to which databases and other information storage systems faithfully record and report system transactions. Information management specialists often focus separately on data availability in order to determine acceptable (or actual) data loss with various failure events. Some users can tolerate application service interruptions but cannot tolerate data loss.
 
A [[service level agreement]] ("SLA") formalizes an organization's availability objectives and requirements.
 
==System design for high availability==
Paradoxically, adding more components to an overall system design can undermine efforts to achieve high availability, because complex systems inherently have more potential failure points and are more difficult to implement correctly. Some analysts put forth the theory that the most highly available systems adhere to a simple architecture (a single, high-quality, multi-purpose physical system with comprehensive internal hardware redundancy); however, this architecture suffers from the requirement that the entire system must be brought down for patching and operating system upgrades. More advanced system designs allow for systems to be patched and upgraded without compromising service availability (see [[load balancing (computing)|load balancing]] and [[failover]]).
 
High availability requires less human intervention to restore operation in complex systems; the reason for this is that the most common cause of outages is human error.<ref name=humanerror>{{cite web|date=October 27, 2010|url=http://www.gartner.com/id=1458131|title=Top Seven Considerations for Configuration Management for Virtual and Cloud Infrastructures |publisher=[[Gartner]]|accessdate=October 13, 2013}}</ref>
[[Redundancy (engineering)]] is used to create systems with high levels of availability (e.g. aircraft flight computers). In this case, high levels of failure detectability and avoidance of common-cause failures are required. Two kinds of redundancy are passive redundancy and active redundancy.
Passive redundancy is used to achieve high availability by including enough excess capacity in the design to accommodate a performance decline. The simplest example is a boat with two separate engines driving two separate propellers. The boat continues toward its destination despite failure of a single engine or propeller. A more complex example is multiple redundant power generation facilities within a large system involving [[electric power transmission]]. Malfunction of single components is not considered to be a failure unless the resulting performance decline exceeds the specification limits for the entire system.
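
As a rough illustration of the effect of passive redundancy, the following sketch (assuming independent component failures, which real systems may only approximate) computes the availability of a system that keeps operating as long as at least one of its redundant components is up:

<syntaxhighlight lang="python">
def parallel_availability(component_availability, n_components):
    """Availability of n redundant components, assuming independent failures:
    the system is down only if all components are down simultaneously."""
    return 1.0 - (1.0 - component_availability) ** n_components

# E.g. two engines, each available 99% of the time -> about 99.99% combined.
print(f"{parallel_availability(0.99, 2):.4%}")  # 99.9900%
</syntaxhighlight>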
 
Active redundancy is used in complex systems to achieve high availability with no performance decline. Multiple items of the same kind are incorporated into a design that includes a method to detect failure and automatically reconfigure the system to bypass failed items using a voting scheme. This is used with complex computing systems that are linked. Internet [[routing]] is derived from early work by Birman and Joseph in this area.<ref>RFC 992</ref> Active redundancy may introduce more complex failure modes into a system, such as continuous system reconfiguration due to faulty voting logic.
 
Zero downtime system design means that modeling and simulation indicates [[mean time between failures]] significantly exceeds the period of time between [[planned maintenance]], [[upgrade]] events, or system lifetime. Zero downtime involves massive redundancy, which is needed for some types of aircraft and for most kinds of [[communications satellite]]. The [[Global Positioning System]] is an example of a zero downtime system.
 
Fault [[instrumentation]] can be used in systems with limited redundancy to achieve high availability. Maintenance actions occur during brief periods of down-time only after a fault indicator activates. Failure is only significant if this occurs during a [[mission critical]] period.
 
[[Modeling and simulation]] is used to evaluate the theoretical reliability for large systems. The outcome of this kind of model is used to evaluate different design options. A model of the entire system is created, and the model is stressed by removing components. Redundancy simulation involves the N-x criteria, where N represents the total number of components in the system and x is the number of components used to stress the system. N-1 means the model is stressed by evaluating performance with all possible combinations where one component is faulted. N-2 means the model is stressed by evaluating performance with all possible combinations where two components are faulted simultaneously.
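
A minimal sketch of N-x stress evaluation is shown below; the component capacities and the requirement threshold are illustrative assumptions rather than part of any particular model:

<syntaxhighlight lang="python">
from itertools import combinations

# Illustrative model: each component contributes some capacity, and the system
# "survives" a fault combination if the remaining capacity meets the requirement.
components = {"gen_a": 40, "gen_b": 40, "gen_c": 30, "gen_d": 30}  # hypothetical
required_capacity = 70

def survives(faulted):
    remaining = sum(cap for name, cap in components.items() if name not in faulted)
    return remaining >= required_capacity

def n_minus_x(x):
    """Evaluate every combination of x simultaneously faulted components (N-x)."""
    cases = list(combinations(components, x))
    failures = [c for c in cases if not survives(set(c))]
    return len(cases), failures

for x in (1, 2):
    total, failures = n_minus_x(x)
    print(f"N-{x}: {total} cases, {len(failures)} violate the requirement")
</syntaxhighlight>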
 
==Reasons for unavailability==
A survey among academic availability experts in 2010 ranked reasons for unavailability of enterprise IT systems. All reasons refer to '''not following best practice''' in each of the following areas (in order of importance):<ref>Ulrik Franke, Pontus Johnson, Johan König, Liv Marcks von Würtemberg: Availability of enterprise IT systems – an expert-based Bayesian model, ''Proc. Fourth International Workshop on Software Quality and Maintainability'' (WSQM 2010), Madrid, [http://www.kth.se/ees/forskning/publikationer/modules/publications_polopoly/reports/2010/IR-EE-ICS_2010_047.pdf?l=en_UK]</ref>
 
# Monitoring of the relevant components
# [[Requirements management|Requirements]] and procurement
# Operations
# Avoidance of network failures
# Avoidance of internal application failures
# Avoidance of external services that fail
# Physical environment
# Network redundancy
# Technical solution of backup
# Process solution of backup
# Physical location
# Infrastructure redundancy
# Storage architecture redundancy
 
The factors themselves are based on the work of [[Evan Marcus]] and [[Hal Stern]].<ref>{{cite book |first=E. |last=Marcus |first2=H. |last2=Stern |title=Blueprints for high availability |edition=Second |location=Indianapolis, IN |publisher=John Wiley & Sons |year=2003 |isbn=0-471-43026-9 }}</ref>
 
==Costs of unavailability==
In a 1998 report from IBM Global Services, unavailable systems were estimated to have cost American businesses $4.54 billion in 1996, due to lost productivity and revenues.<ref>IBM Global Services, ''Improving systems availability'', IBM Global Services, 1998, [http://www.dis.uniroma1.it/~irl/docs/availabilitytutorial.pdf]</ref>
 
==See also==
* [[Fault-tolerant system]]
* [[Reliability, availability and serviceability (computer hardware)]]
* [[Reliability (computer networking)]]
* [[Reliability engineering]]
 
==References==
{{reflist}}
 
==External links==
* [http://www.cisco.com/en/US/tech/tk869/tk769/technologies_white_paper09186a00800a998b.shtml Cisco IOS Management for High Availability Networking] – Best Practices White Paper
* [http://jedi.informatik.uni-leipzig.de/de/index_en.html Homepage of the Dept. for Computer Science of the University of Leipzig]
* [http://www-ti.informatik.uni-tuebingen.de/~spruth/ECvorles/index.html Lecture Notes on Enterprise Computing] University of Tübingen
 
{{DEFAULTSORT:High Availability}}
[[Category:System administration]]
[[Category:Quality control]]
[[Category:Applied probability]]
[[Category:Reliability engineering]]
[[Category:Measurement]]
